Mohammed_Mostafa

How can I check text rotation when extracting text from a PDF file?

Aug 20, 2013 4:45 AM

If a PDF file contains rotated text, how can I detect that rotation in the output stream?

 
Replies
  • Currently Being Moderated
    Aug 20, 2013 5:05 AM   in reply to Mohammed_Mostafa

    Rotation is an aspect of the effective text matrix (combined effect of cm and Tm). Also the entire page may be rotated by setting the Rotate key in the Page dictionary.

     
  • Currently Being Moderated
    Aug 20, 2013 5:17 AM   in reply to Mohammed_Mostafa

    Certainly you must combine the CTM and the text matrix. This is as stated in 9.4.4 of ISO 32000-1, where the text rendering matrix Trm must be calculated for each single character that is placed.

     

    Once you have the matrix, and understand the effect of matrices as necessary, you will be able to examine its components to find out whether the text is rotated. An alternative approach is to transform a bounding rectangle for the text, and then examine the coordinates of the resulting rectangle to see if the Y coordinates of the baseline are identical. This is basically an application of Trm rather than an entirely different technique.
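As a rough illustration of the two steps above, here is a minimal Python sketch that builds Trm as the product of the text-state matrix, Tm, and the CTM (the formula given in 9.4.4), then applies the "are the baseline Y coordinates identical" test. The function names, the 6-tuple matrix layout `(a, b, c, d, e, f)`, and the tolerance are assumptions made for this example, not part of any PDF library.

```python
from typing import Tuple

# A PDF matrix [a b c d e f] stored as a 6-tuple (assumed layout for this sketch).
Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return m1 x m2 using the PDF row-vector convention (m1 is applied first)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def text_rendering_matrix(font_size: float, h_scale: float, rise: float,
                          tm: Matrix, ctm: Matrix) -> Matrix:
    """Trm = [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM, with h_scale = 1.0 meaning 100%."""
    text_state = (font_size * h_scale, 0.0, 0.0, font_size, 0.0, rise)
    return multiply(multiply(text_state, tm), ctm)


def apply_matrix(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Transform the point (x, y) by m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def baseline_is_horizontal(trm: Matrix, tol: float = 1e-9) -> bool:
    """Compare the Y coordinates of two transformed points on the text baseline."""
    _, y_start = apply_matrix(trm, 0.0, 0.0)
    _, y_end = apply_matrix(trm, 1.0, 0.0)
    return abs(y_end - y_start) <= tol
```

Note that identical Y coordinates alone do not rule out a 180 degree rotation; checking that X increases along the baseline would cover that case. A page-level Rotate entry is also outside the scope of this sketch.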

     
  • Currently Being Moderated
    Aug 20, 2013 5:44 AM   in reply to Mohammed_Mostafa

    Perhaps you could do it by trigonometry. Transform a horizontal line with the matrix, work out the angle between the lines.

     
  • Currently Being Moderated
    Aug 21, 2013 12:29 AM   in reply to Mohammed_Mostafa

    You can try and invert the matrix but it will only work for simple rotations. In practice, rotations may be combined with skewing and I've never seen a suitable formula.

     

    I stand by my reply #5. If I was unfortunate enough to face this problem, that would be my approach. Transform with matrix, then basic trigonometry.
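On the skewing point: one way to tell whether a single rotation angle is even meaningful is to test whether the linear part of the matrix is a rotation combined with a uniform scale, which requires a = d and b = -c. The following is only a sketch under that assumption; the function name and the relative tolerance are made up for the example.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]


def is_rotation_with_uniform_scale(m: Matrix, rel_tol: float = 1e-6) -> bool:
    """True if [a b c d] has the form s*[cos t, sin t, -sin t, cos t] with s > 0.

    Skewed, mirrored, or non-uniformly scaled matrices return False, and for
    those a single "rotation angle" is not well defined.
    """
    a, b, c, d, _e, _f = m
    size = max(abs(a), abs(b), abs(c), abs(d))
    if size == 0.0:
        return False  # degenerate matrix, nothing to measure
    return abs(a - d) <= rel_tol * size and abs(b + c) <= rel_tol * size
```

When this returns False, the baseline direction can still be measured by transforming a horizontal vector, but the glyphs themselves will be sheared or flipped relative to it.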

     
  • Currently Being Moderated
    Aug 21, 2013 1:18 AM   in reply to Mohammed_Mostafa

    So would many people!

     
  • Currently Being Moderated
    Sep 1, 2013 3:51 AM   in reply to Test Screen Name

    I am asking again because I could not reach a solution.
    Given the stream and Tm matrix below, how can I calculate the rotation angle for a specific piece of text in the PDF?

    q
    /AbsoluteColorimetric ri
    1 g
    /GS1 gs
    36.013 36.977 540 720 re
    f
    BT
    /F1 1 Tf
    0 -43.92 43.92 0 334.093 672.9771 Tm
    0 g
    0.0088 Tc
    0.0877 Tw
    (Fonts, Fonts, and more Fonts!)Tj

     
  • Currently Being Moderated
    Sep 1, 2013 4:13 AM   in reply to baccah

    Maybe this will help, if getting the angle between the lines is the hard part (the rest is standard PDF and matrix transforms)

    http://www.wikihow.com/Find-the-Angle-Between-Two-Vectors

     
  • Currently Being Moderated
    Sep 1, 2013 5:07 AM   in reply to baccah

    And whatever algorithm you come up with does NOT just apply to text, but also to any object – since all you are doing is working with the CTM.

     
  • Currently Being Moderated
    Sep 1, 2013 5:54 AM   in reply to Test Screen Name

    Thanks for the helpful link, but I need to know which two matrices I should work with to get the angle, and what size they are.

     
  • Currently Being Moderated
    Sep 1, 2013 6:02 AM   in reply to baccah

    The matrix you use is Trm. Transform a horizontal line with Trm. Then work out the angle between the two lines (i.e. the transformed line and the original horizontal line).
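The transform-then-trigonometry approach described here might look like the following Python sketch. It assumes Trm is already available as a 6-tuple `(a, b, c, d, e, f)`; the function names are hypothetical. Only the linear part of the matrix is applied, because translation does not change a direction.

```python
import math
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]
Vector = Tuple[float, float]


def transform_vector(m: Matrix, dx: float, dy: float) -> Vector:
    """Transform a direction (dx, dy); the translation terms e and f are ignored."""
    a, b, c, d, _e, _f = m
    return (a * dx + c * dy, b * dx + d * dy)


def signed_angle_degrees(u: Vector, v: Vector) -> float:
    """Signed angle from u to v in degrees, counterclockwise positive."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(cross, dot))


def baseline_angle(trm: Matrix) -> float:
    """Angle between a horizontal line and the same line transformed by Trm."""
    horizontal = (1.0, 0.0)
    return signed_angle_degrees(horizontal, transform_vector(trm, *horizontal))
```

For the matrix `0 -43.92 43.92 0 334.093 672.9771` quoted in this thread, the horizontal vector (1, 0) maps to (0, -43.92), so the baseline points straight down, a quarter turn clockwise in default user space where Y increases upward. This does not account for any Rotate key on the page.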

     
  • Currently Being Moderated
    Sep 1, 2013 6:06 AM   in reply to Test Screen Name

    The caped PDF crusader! I always forget, are you the guy with the PDF utility belt, or do you change in a phone booth?

     

     

    Karl Heinz Kremer

    PDF Acrobatics Without a Net

    PDF Software Development, Training and More...

     

    khk@khk.net

    http://www.khkonsulting.com

     

     

     

  • Currently Being Moderated
    Sep 1, 2013 6:10 AM   in reply to Test Screen Name

    This is an output stream. Could you please tell me which two matrices to take from it?

    q
    /AbsoluteColorimetric ri
    1 g
    /GS1 gs
    36.013 36.977 540 720 re
    f
    BT
    /F1 1 Tf
    0 -43.92 43.92 0 334.093 672.9771 Tm
    0 g
    0.0088 Tc
    0.0877 Tw
    (Fonts, Fonts, and more Fonts!)Tj
    0 -31.92 31.92 0 234.973 492.9771 Tm

     
  • Currently Being Moderated
    Sep 1, 2013 6:15 AM   in reply to baccah

    Do you have a specific problem with understanding 32000-1 (the PDF standard), and deriving Trm? We would like to help you understand the specification; that is why this forum is here.

     
  • Currently Being Moderated
    Sep 1, 2013 6:20 AM   in reply to Test Screen Name

    PDF 32000-1 is very ambiguous and it is difficult to understand anything from it!

    For example, the rotation topic is not clear in the document. I can't understand how PDF rotates text, which two matrices I should deal with, or how to get the angle. Its explanation is difficult and ambiguous!

     
  • Currently Being Moderated
    Sep 1, 2013 6:38 AM   in reply to Test Screen Name

    Please point me to the section that explains this, and if anything in it is incomprehensible I will tell you.

     
  • Currently Being Moderated
    Sep 1, 2013 7:41 AM   in reply to baccah

    Is that the ENTIRE stream?   Because everything in the stream could potentially impact the CTM.

     

    Start with an identity matrix for both matrices.

     

    When you find the cm operator, concat it with your CTM.   When you find the Tm operator set or concat the text matrix.

     

    Once you have the matrix, then it's up to you to figure out what it means for your application.
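The bookkeeping described above (start from identity, concatenate on cm, set on Tm) could be sketched in Python roughly as follows. This is a deliberately simplified reader, not a real content-stream parser: the regular-expression tokenizer assumes string literals without nested parentheses, only `q`, `Q`, `cm`, `BT`, `Tm`, `Td`, `TD` and `Tj` are interpreted, and the advance of the text matrix after each `Tj` is ignored because it shifts the position but not the direction. All names are assumptions made for the example.

```python
import math
import re
from typing import List, Tuple

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
# Either a simple (string literal) or a run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")


def multiply(m1, m2):
    """Return m1 x m2 using the PDF row-vector convention (m1 is applied first)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def text_angles(stream: str) -> List[Tuple[str, float]]:
    """Return (text, baseline angle in degrees) for every Tj in the stream."""
    ctm = IDENTITY
    saved_ctms = []           # graphics-state stack for q / Q
    line_matrix = IDENTITY    # text line matrix, set by Tm and moved by Td / TD
    operands = []
    results = []
    for token in TOKEN.findall(stream):
        if token[0] in "(/":  # string or name operand
            operands.append(token)
            continue
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass              # not a number, so treat it as an operator
        if token == "q":
            saved_ctms.append(ctm)
        elif token == "Q" and saved_ctms:
            ctm = saved_ctms.pop()
        elif token == "cm":
            ctm = multiply(tuple(operands[-6:]), ctm)
        elif token == "BT":
            line_matrix = IDENTITY
        elif token == "Tm":
            line_matrix = tuple(operands[-6:])
        elif token in ("Td", "TD"):
            tx, ty = operands[-2:]
            line_matrix = multiply((1.0, 0.0, 0.0, 1.0, tx, ty), line_matrix)
        elif token == "Tj":
            a, b, *_ = multiply(line_matrix, ctm)
            results.append((operands[-1][1:-1], math.degrees(math.atan2(b, a))))
        operands = []         # every operator consumes its operands
    return results
```

Font size and horizontal scaling are left out because they scale the matrix by positive factors along the text axes and do not change the baseline direction returned here. Run against the stream posted in this thread, which contains no cm operator, every Tj is reported with a baseline angle of about -90 degrees.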

     
  • Currently Being Moderated
    Sep 1, 2013 7:47 AM   in reply to lrosenth

    Yes, this is the full stream:

    q
    /AbsoluteColorimetric ri
    1 g
    /GS1 gs
    36.013 36.977 540 720 re
    f
    BT
    /F1 1 Tf
    0 -43.92 43.92 0 334.093 672.9771 Tm
    0 g
    0.0088 Tc
    0.0877 Tw
    (Fonts, Fonts, and more Fonts!)Tj
    0 -31.92 31.92 0 234.973 492.9771 Tm
    0.0014 Tc
    0.0142 Tw
    (EuroTeX 2001)Tj
    -0.2932 -1.4436 TD
    0.0081 Tc
    0.0814 Tw
    (Tom Kacvinsky)Tj
    1 -1.4361 TD
    0.0043 Tc
    0 Tw
    (2001/09/25)Tj
    ET
    Q

     

    Please simply tell me which two matrices I should work with.

     
  • Currently Being Moderated
    Sep 1, 2013 7:49 AM   in reply to baccah

    There are some ambiguities in 32000-1, but there are not so many as it might seem. Most confusion comes from people who try to read just a little of the standard rather than read it all, carefully and repeatedly, which is needed for any serious work.

     

    But if you find a specific thing that you believe to be ambiguous, this forum is the perfect place to ask about it.

     

    The description of Tm is in 9.4.4; it is vital to understand all of this.

     

    8.3.4 is a full (so far as I know) definition of how a rotation matrix is to be created; the prerequisite knowledge of matrix arithmetic is not defined because it is just maths. The PDF standard does not tell you how to get the angle because you do not need to know this information either to create or to render a PDF. So if you are unlucky enough to want this information, you have to do your own mathematical analysis to work this out, in the cases where the question is even meaningful.
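As one possible version of that analysis, here is the arithmetic for the first Tm in the posted stream, `0 -43.92 43.92 0 334.093 672.9771`. It assumes the matrix is a rotation by θ (in the form 8.3.4 gives, `[cos θ  sin θ  -sin θ  cos θ  0  0]`) combined with a uniform scale s, that the CTM is the identity because the stream contains no cm operator, and that the page has no Rotate entry. The symbols s and θ are introduced only for this example.

```latex
\begin{aligned}
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
  &= s \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
   = \begin{bmatrix} 0 & -43.92 \\ 43.92 & 0 \end{bmatrix} \\
s &= \sqrt{a^{2} + b^{2}} = 43.92 \\
\theta &= \operatorname{atan2}(b,\, a) = \operatorname{atan2}(-43.92,\ 0) = -90^{\circ}
\end{aligned}
```

Under those assumptions the text is scaled to 43.92 units and turned a quarter turn clockwise. If a ≠ d or b ≠ -c, the matrix also contains skew or non-uniform scaling and this decomposition does not apply.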

     
    |
    Mark as:

More Like This

  • Retrieving data ...

Bookmarked By (0)

Answers + Points = Status

  • 10 points awarded for Correct Answers
  • 5 points awarded for Helpful Answers
  • 10,000+ points
  • 1,001-10,000 points
  • 501-1,000 points
  • 5-500 points