23 Replies Latest reply on Sep 1, 2013 7:49 AM by Test Screen Name

# How can I check text rotation when extracting text from pdf file?

If a PDF file contains rotated text, how can I detect this rotation in the output stream?

• ###### 1. Re: How can I check text rotation when extracting text from pdf file?

Rotation is an aspect of the effective text matrix (combined effect of cm and Tm). Also the entire page may be rotated by setting the Rotate key in the Page dictionary.
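To illustrate how the page-level Rotate key interacts with a text angle, here is a minimal Python sketch. It assumes you have already measured the text angle in default user space (counter-clockwise positive) and read Rotate from the Page dictionary, where it means degrees clockwise in multiples of 90. The function name `displayed_angle` is made up for this example.

```python
def displayed_angle(text_angle_ccw, page_rotate_cw=0):
    """Combine a text angle with the page's Rotate value.

    text_angle_ccw: text rotation in user space, degrees, counter-clockwise.
    page_rotate_cw: value of the Rotate key, degrees clockwise.
    Returns the on-screen angle normalised to the range (-180, 180].
    """
    angle = (text_angle_ccw - page_rotate_cw) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


# Text rotated 90 degrees clockwise on a page with Rotate = 270
# ends up looking upright again.
print(displayed_angle(-90.0, 270))
```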

• ###### 2. Re: How can I check text rotation when extracting text from pdf file?

Must I combine cm and Tm? If so, how can I detect rotation by combining them?

• ###### 3. Re: How can I check text rotation when extracting text from pdf file?

Certainly you must combine the CTM and text matrix. This is as stated in 9.4.4, where Trm must be calculated for each single character that is placed.

Once you have the matrix, and understand the effect of matrices as necessary, you will be able to examine its components to find out whether the text is rotated. An alternative approach is to transform a bounding rectangle for the text, and then examine the coordinates of the resulting rectangle to see if the Y coordinates of the baseline are identical. This is basically an application of Trm rather than an entirely different technique.
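The baseline check described above could be sketched in Python roughly as follows. This assumes Trm is already available as the six numbers [a b c d e f]; the helper names and the tolerance are made up for the example. The extra left-to-right test is my own addition, because text rotated by 180 degrees also has identical baseline Y coordinates.

```python
def transform(m, x, y):
    """Apply a PDF matrix [a b c d e f] to the point (x, y)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def baseline_is_horizontal(trm, width=1.0, tol=1e-6):
    """Transform both ends of the baseline and compare their Y coordinates.

    Returns True only if the Y values match (within tol) and the text
    still runs left to right, so a 180 degree rotation is not mistaken
    for upright text.
    """
    x0, y0 = transform(trm, 0.0, 0.0)
    x1, y1 = transform(trm, width, 0.0)
    return abs(y1 - y0) <= tol and x1 > x0


print(baseline_is_horizontal([1, 0, 0, 1, 72, 720]))   # plain translation
print(baseline_is_horizontal([0, -12, 12, 0, 72, 720]))  # quarter turn
```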

• ###### 4. Re: How can I check text rotation when extracting text from pdf file?

Thanks very much for this information, but can I get the specific angle of this rotation from the matrix?

• ###### 5. Re: How can I check text rotation when extracting text from pdf file?

Perhaps you could do it by trigonometry. Transform a horizontal line with the matrix, work out the angle between the lines.
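A minimal Python sketch of that trigonometry, assuming the matrix is given as [a b c d e f]: the horizontal unit vector (1, 0) is mapped to the direction (a, b), since the translation part cancels when you subtract the two end points, so the angle is the arctangent of b over a. The function name is made up for the example.

```python
import math


def rotation_degrees(m):
    """Angle of the transformed horizontal line, in degrees.

    m is a PDF matrix [a b c d e f]. The image of the direction (1, 0)
    is (a, b); atan2 gives its angle, counter-clockwise positive.
    """
    a, b = m[0], m[1]
    return math.degrees(math.atan2(b, a))


# A 30 degree rotation combined with uniform scaling by 12.
theta = math.radians(30)
m = [12 * math.cos(theta), 12 * math.sin(theta),
     -12 * math.sin(theta), 12 * math.cos(theta), 100, 200]
print(round(rotation_degrees(m), 6))
```

Uniform scaling and translation do not change the result; skewing or mirroring does make a single angle less meaningful.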

• ###### 6. Re: How can I check text rotation when extracting text from pdf file?

Many Thanks

• ###### 7. Re: How can I check text rotation when extracting text from pdf file?

"Rotations shall be produced by [cos θ sin θ -sin θ cos θ 0 0], which has the effect of rotating the coordinate system axes by an angle θ counter clockwise."

I took this statement from the Graphics section of the PDF Reference. Can you tell me how to calculate this matrix and how to get the correct angle for this rotation? I have read a lot of this book but I can't understand any of it!

• ###### 8. Re: How can I check text rotation when extracting text from pdf file?

You can try and invert the matrix but it will only work for simple rotations. In practice, rotations may be combined with skewing and I've never seen a suitable formula.

I stand by my reply #5. If I was unfortunate enough to face this problem, that would be my approach. Transform with matrix, then basic trigonometry.
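One way to tell whether a single rotation angle is even meaningful is to transform both axes and check that they stay perpendicular. The Python sketch below is only an illustration under that assumption; the function name and tolerance are made up, and it is not a general matrix decomposition.

```python
import math


def is_rotation_with_scaling(m, tol=1e-6):
    """True if the matrix [a b c d e f] only rotates, scales and translates.

    The X axis maps to (a, b) and the Y axis to (c, d). With skewing the
    two images are no longer perpendicular; with mirroring the determinant
    becomes negative. In both cases one angle does not describe the matrix.
    """
    a, b, c, d = m[:4]
    lengths = math.hypot(a, b) * math.hypot(c, d)
    perpendicular = abs(a * c + b * d) <= tol * lengths
    not_mirrored = a * d - b * c > 0
    return perpendicular and not_mirrored


print(is_rotation_with_scaling([0, -43.92, 43.92, 0, 0, 0]))  # quarter turn
print(is_rotation_with_scaling([1, 0, 0.5, 1, 0, 0]))         # skewed
```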

• ###### 9. Re: How can I check text rotation when extracting text from pdf file?

I would like to know your name and where you are from.

• ###### 10. Re: How can I check text rotation when extracting text from pdf file?

So would many people!

• ###### 11. Re: How can I check text rotation when extracting text from pdf file?

Because I couldn't get a solution!
Given the stream and Tm matrix below, how can I calculate the rotation angle for a specific piece of text in the PDF?

q

/AbsoluteColorimetric ri

1 g

/GS1 gs

36.013 36.977 540 720 re

f

BT

/F1 1 Tf

0 -43.92 43.92 0 334.093 672.9771 Tm

0 g

0.0088 Tc

0.0877 Tw

(Fonts, Fonts, and more Fonts!)Tj

• ###### 12. Re: How can I check text rotation when extracting text from pdf file?

Maybe this will help, if getting the angle between the lines is the hard part (the rest is standard PDF and matrix transforms)

http://www.wikihow.com/Find-the-Angle-Between-Two-Vectors

• ###### 13. Re: How can I check text rotation when extracting text from pdf file?

And whatever algorithm you come up with does NOT just apply to text, but also to any object – since all you are doing is working with the CTM.

• ###### 14. Re: How can I check text rotation when extracting text from pdf file?

Thanks for this helpful link, but I need to know which two matrices I should work with to get the angle, and what size they are.

• ###### 15. Re: How can I check text rotation when extracting text from pdf file?

The matrix you use is Trm. Transform a horizontal line with Trm. Then work out the angle between the two lines (i.e. the transformed line and the original horizontal line).
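As a worked sketch in Python, here is that procedure applied to the Tm from reply 11. It assumes no cm operator is in effect (so the CTM is the identity), a font size of 1 from `/F1 1 Tf`, and default horizontal scaling and rise. Trm is built as the text state matrix times Tm times CTM, following 9.4.4; the function names are made up for the example.

```python
import math

IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def multiply(m, n):
    """Return m x n for PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def transform(m, x, y):
    """Apply a PDF matrix to the point (x, y)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def text_rendering_matrix(tm, ctm, font_size, h_scale=1.0, rise=0.0):
    """Trm = [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM."""
    state = [font_size * h_scale, 0.0, 0.0, font_size, 0.0, rise]
    return multiply(multiply(state, tm), ctm)


def angle_between(u, v):
    """Signed angle in degrees from vector u to vector v (CCW positive)."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(cross, dot))


tm = [0.0, -43.92, 43.92, 0.0, 334.093, 672.9771]
trm = text_rendering_matrix(tm, IDENTITY, font_size=1.0)

# Transform the horizontal line from (0, 0) to (1, 0) and compare directions.
x0, y0 = transform(trm, 0.0, 0.0)
x1, y1 = transform(trm, 1.0, 0.0)
angle = angle_between((1.0, 0.0), (x1 - x0, y1 - y0))
print(round(angle, 3))
```

Under these assumptions the result is about -90 degrees, i.e. the text runs down the page, rotated a quarter turn clockwise.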

• ###### 16. Re: How can I check text rotation when extracting text from pdf file?

The caped PDF crusader! I always forget, are you the guy with the PDF utility belt, or do you change in a phone booth?

Karl Heinz Kremer

PDF Acrobatics Without a Net

PDF Software Development, Training and More...

khk@khk.net

http://www.khkonsulting.com

• ###### 17. Re: How can I check text rotation when extracting text from pdf file?

This is an output stream. Can you please show me the two matrices in it?

q

/AbsoluteColorimetric ri

1 g

/GS1 gs

36.013 36.977 540 720 re

f

BT

/F1 1 Tf

0 -43.92 43.92 0 334.093 672.9771 Tm

0 g

0.0088 Tc

0.0877 Tw

(Fonts, Fonts, and more Fonts!)Tj

0 -31.92 31.92 0 234.973 492.9771 Tm

• ###### 18. Re: How can I check text rotation when extracting text from pdf file?

Do you have a specific problem with understanding 32000-1 (the PDF standard), and deriving Trm? We would like to help you understand the specification; that is why this forum is here.

• ###### 19. Re: How can I check text rotation when extracting text from pdf file?

PDF 32000-1 is very ambiguous and it is difficult to understand anything from it!

For example, the rotation topic is not clear in the book. I can't understand how PDF rotates text, which two matrices I should deal with, or how to get the angle. Its explanation is difficult and ambiguous!

• ###### 20. Re: How can I check text rotation when extracting text from pdf file?

Please point me to the section that explains this, and if anything in it is incomprehensible I will tell you.

• ###### 21. Re: How can I check text rotation when extracting text from pdf file?

Is that the ENTIRE stream?   Because everything in the stream could potentially impact the CTM.

When you find the cm operator, concat it with your CTM. When you find the Tm operator, set the text matrix (Tm replaces the text matrix rather than concatenating onto it).

Once you have the matrix, then it's up to you to figure out what it means for your application.
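The bookkeeping described above could be sketched in Python along these lines. This is a deliberately naive walker, not a full content-stream parser: it assumes literal strings without nested parentheses, handles only q, Q, cm, BT, Tm, Td, TD and Tj, and ignores hex strings, TJ arrays, font size and the page's Rotate key. All names in it are made up for the example.

```python
import math
import re

# Naive tokenizer: a literal string, or a run of non-space characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")
IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def multiply(m, n):
    """Return m x n for PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def text_angles(stream):
    """Return a list of (string, angle in degrees) for each Tj operator."""
    ctm, saved = IDENTITY, []
    tm = tlm = IDENTITY
    operands, results = [], []
    for token in TOKEN.findall(stream):
        if NUMBER.fullmatch(token) or token[0] in "(/":
            operands.append(token)       # collect operands until an operator
            continue
        if token == "q":
            saved.append(ctm)            # save graphics state (CTM only here)
        elif token == "Q" and saved:
            ctm = saved.pop()
        elif token == "cm":
            ctm = multiply([float(t) for t in operands[-6:]], ctm)
        elif token == "BT":
            tm = tlm = IDENTITY
        elif token == "Tm":
            tm = tlm = [float(t) for t in operands[-6:]]   # replaces, no concat
        elif token in ("Td", "TD"):
            tx, ty = (float(t) for t in operands[-2:])
            tm = tlm = multiply([1.0, 0.0, 0.0, 1.0, tx, ty], tlm)
        elif token == "Tj":
            a, b = multiply(tm, ctm)[:2]  # image of the horizontal direction
            angle = math.degrees(math.atan2(b, a))
            results.append((operands[-1][1:-1], angle))
        operands = []
    return results


sample = """
q
BT
/F1 1 Tf
0 -43.92 43.92 0 334.093 672.9771 Tm
(Fonts, Fonts, and more Fonts!)Tj
0 -31.92 31.92 0 234.973 492.9771 Tm
(EuroTeX 2001)Tj
-0.2932 -1.4436 TD
(Tom Kacvinsky)Tj
ET
Q
"""
for text, angle in text_angles(sample):
    print(f"{angle:7.1f}  {text}")
```

For this excerpt of the stream posted in the thread, every string should come out at about -90 degrees, because neither TD nor the absence of cm changes the rotation set by Tm.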

• ###### 22. Re: How can I check text rotation when extracting text from pdf file?

Yes, this is the full stream:

q

/AbsoluteColorimetric ri

1 g

/GS1 gs

36.013 36.977 540 720 re

f

BT

/F1 1 Tf

0 -43.92 43.92 0 334.093 672.9771 Tm

0 g

0.0088 Tc

0.0877 Tw

(Fonts, Fonts, and more Fonts!)Tj

0 -31.92 31.92 0 234.973 492.9771 Tm

0.0014 Tc

0.0142 Tw

(EuroTeX 2001)Tj

-0.2932 -1.4436 TD

0.0081 Tc

0.0814 Tw

(Tom Kacvinsky)Tj

1 -1.4361 TD

0.0043 Tc

0 Tw

(2001/09/25)Tj

ET

Q

Please just tell me which two matrices I should work with.

• ###### 23. Re: How can I check text rotation when extracting text from pdf file?

There are some ambiguities in 32000-1, but there are not so many as it might seem. Most confusion comes from people who try to read just a little of the standard rather than read it all, carefully and repeatedly, which is needed for any serious work.

But if you find a specific thing that you believe to be ambiguous, this forum is the perfect place to ask about it.

The description of Tm is in 9.4.4; it is vital to understand all of this.

8.3.4 is a full (so far as I know) definition of how a rotation matrix is to be created; the prerequisite knowledge of matrix arithmetic is not defined because it is just maths. The PDF standard does not tell you how to get the angle because you do not need to know this information either to create or to render a PDF. So if you are unlucky enough to want this information, you have to do your own mathematical analysis to work this out, in the cases where the question is even meaningful.