17 Replies Latest reply on Sep 30, 2016 3:13 PM by rjplummer

    getPageNthWordQuads fails

    rjplummer

      I have a set of PDF pages where getPageNthWordQuads returns the wrong coordinates. The coordinates appear to be offset 15 pts up and to the left. Has anybody else seen this, or does anyone have a suggestion for how to detect that a page has this issue?

       

      I checked all the values returned by getPageBox, and nothing seemed different from pages that return correct results.

       

      Every word on the page is offset by the same amount, so it's a translation error, not a scaling error.

        • 1. Re: getPageNthWordQuads fails
          George_Johnson MVP & Adobe Community Professional

          It's hard to say what's wrong without looking at the actual file.

          • 3. Re: getPageNthWordQuads fails
            George_Johnson MVP & Adobe Community Professional

            I'm not seeing any problems. When I ran a script (using Acrobat 9.5.5) to add a strikeout markup for every word using the same quads, they were all correctly placed. Can you give an example of a word in that document and the corresponding quad that you believe isn't correct?
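            A test along the lines described above might look like the following sketch. It assumes `doc` is an Acrobat Doc object (for example `this` in the JavaScript console); the function name `strikeOutAllWords` is hypothetical, and this is not necessarily the script that was actually run.

```javascript
// Sketch only: add a StrikeOut markup over every word on one page,
// passing the quads from getPageNthWordQuads straight to addAnnot.
// `doc` is assumed to be an Acrobat Doc object; the function name is made up.
function strikeOutAllWords(doc, page) {
    var count = doc.getPageNumWords(page);
    var annots = [];
    for (var i = 0; i < count; i++) {
        // The quads are in default user space, which markup annotations use.
        var quads = doc.getPageNthWordQuads(page, i);
        annots.push(doc.addAnnot({ page: page, type: "StrikeOut", quads: quads }));
    }
    return annots;
}
```

            In the console this could be called as strikeOutAllWords(this, 0). If the markups land on the words, the quads themselves are consistent with the page content.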

            • 4. Re: getPageNthWordQuads fails
              Test Screen Name Most Valuable Participant

              Perhaps you're assuming that (0,0) is a corner of the visible page, rather than just a relative measure.

              • 5. Re: getPageNthWordQuads fails
                rjplummer Level 1

                You're right. I appear to have misstated the issue. I'm trying to add a link. The example from Adobe's JavaScript Reference using the Matrix2D class draws the offset box. Since the page properties seem to indicate that the coordinate systems are the same, I tried creating a link using coordinates from the quads directly and got the same results.

                 

                But using the quads to create an annotation that's created directly from quads seems to work.

                So my question becomes: How do I tell that the two coordinate systems are different? And why doesn't Matrix2D work?

                 

                Here's the code based on Adobe's example:

                 

                var q = this.getPageNthWordQuads(0, 200);
                // Convert quads in default user space to the rotated
                // user space used by links.
                var m = (new Matrix2D).fromRotated(this, 0);
                var mInv = m.invert();
                var r = mInv.transform(q);
                // Flatten the transformed quads into a list of coordinate strings.
                r = r.toString();
                r = r.split(",");
                var l = this.addLink(0, [r[4], r[5], r[2], r[3]]);
                l.borderColor = color.red;
                l.borderWidth = 1;
                l.setAction("this.getURL('http://www.adobe.com/');");

                • 6. Re: getPageNthWordQuads fails
                  rjplummer Level 1

                  Thanks for your interest.

                   

                  If this were the case, wouldn't some value of getPageBox show this? If not, how do I determine that the origin is offset? And why doesn't Adobe's Matrix2D class take this into account?

                  • 7. Re: getPageNthWordQuads fails
                    Test Screen Name Most Valuable Participant

                    The crop box would give you the effective, visible, origin. But I'd expect the APIs to use the same coordinate system. I can't say because I don't know what Matrix2D is.

                     

                    The problem may be that a quad is not a rect; that's why there are two types. A rect is identified by lower-left x, lower-left y, upper-right x, and upper-right y. But a quad is identified by the four corners of a quadrilateral. Crucially:

                    (a) a quadrilateral may not be a rectangle.

                    (b) a quadrilateral may be a rotated rectangle, e.g. at 45 degrees.

                    (c) the corners of a quadrilateral may be for a rotated object, e.g. one that is upside down, so the lower left of the object is not the lowest or the leftmost point in the page coordinate system.

                     

                    You have to decide how to convert, if going to an annotation type that doesn't accept quads. One way is to get the enclosing axis-aligned rectangle, by taking min(x1,x2,x3,x4), min(y1,y2,y3,y4), max(x1,x2,x3,x4), max(y1,y2,y3,y4).
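                    The min/max conversion described above can be sketched as follows. It assumes a single quad given as a flat array of eight numbers [x1, y1, x2, y2, x3, y3, x4, y4]; the function name `quadToBoundingRect` is hypothetical and not part of the Acrobat API.

```javascript
// Sketch only: enclosing axis-aligned rectangle of one quad.
// Input:  [x1, y1, x2, y2, x3, y3, x4, y4] (corner order does not matter).
// Output: [minX, minY, maxX, maxY].
function quadToBoundingRect(quad) {
    var xs = [quad[0], quad[2], quad[4], quad[6]];
    var ys = [quad[1], quad[3], quad[5], quad[7]];
    return [
        Math.min.apply(null, xs),
        Math.min.apply(null, ys),
        Math.max.apply(null, xs),
        Math.max.apply(null, ys)
    ];
}
```

                    Because only the minimum and maximum are used, the result is the same whether the quad is listed upside down or in some other corner order. The element order expected by a particular API, such as addLink, may differ and should be checked before passing the result along.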

                    • 8. Re: getPageNthWordQuads fails
                      George_Johnson MVP & Adobe Community Professional

                      Here's a link to a good tutorial that might help: https://acrobatusers.com/tutorials/auto_placement_annotations

                      • 9. Re: getPageNthWordQuads fails
                        rjplummer Level 1

                        Thanks. I know the quads are horizontal rectangles from examining them. I considered the possibility that the quads were upside-down, which might cause the vertical offset (since the vertical offset may be the height of the rectangle), but it couldn't cause the horizontal offset.

                        • 10. Re: getPageNthWordQuads fails
                          rjplummer Level 1

                          Thanks for the suggestion. I understand the geometry and what the Matrix2D class does. I can't figure out why it's not working for a handful of pages out of hundreds.

                          • 11. Re: getPageNthWordQuads fails
                            rjplummer Level 1

                            I'm back to my original issue. I look at the values returned by getPageNthWordQuads and, from my measurements, they don't correspond to the position of the word on the page. My guess is that the origin of certain pages is not in the corner of the page. Adobe's Matrix2D class doesn't seem to take this into account either. Values for getPageBox aren't any different for pages that have this problem and pages that don't.

                             

                            I'm happy to live with this issue if somebody can tell me how to programmatically identify these pages.

                            • 12. Re: getPageNthWordQuads fails
                              Test Screen Name Most Valuable Participant

                              Certainly you must not assume the origin is the corner of the page. You should consider

                              1. The Crop Box. If there is one, the corner is from the Crop Box, relative to the Media Box.

                              2. The Media Box. This defines the corner of the original media. For example, if the bottom left is 72,72 then 0,0 is one inch below and to the left of the page.

                              3. The Rotate value, which will rotate the viewed page after all of the above is applied.
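                              The Media Box example in item 2 can be worked through in a small sketch. It assumes the true lower-left corner of the box is already known and ignores rotation; the function name `relativeToBoxCorner` is hypothetical and not part of the Acrobat API.

```javascript
// Sketch only: express a user-space point relative to the lower-left
// corner of a page box. `boxLowerLeft` is [llx, lly] of the Media or
// Crop Box as stored in the PDF (an assumed input). Rotation is ignored.
function relativeToBoxCorner(point, boxLowerLeft) {
    return [point[0] - boxLowerLeft[0], point[1] - boxLowerLeft[1]];
}
```

                              With a box whose bottom left is 72,72, relativeToBoxCorner([0, 0], [72, 72]) gives [-72, -72], i.e. the origin sits 72 pt (one inch) below and to the left of the page corner.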

                              • 13. Re: getPageNthWordQuads fails
                                Bernd Alheit Adobe Community Professional & MVP

                                The code creates correct links when I create a new document from your document with printing to Adobe PDF.

                                • 14. Re: getPageNthWordQuads fails
                                  rjplummer Level 1

                                  Thanks for your answer.

                                   

                                  Crop and Media have exactly the same values, also the same as pages where I can draw link boxes correctly.

                                   

                                  If I show rulers, I can see that addLink is drawing a box at the position I specify based on the quads returned for the word. There's no value returned by getPageBox that tells me why getPageNthWordQuads returns coordinates for a box that's offset from the ruler measurements.

                                  • 15. Re: getPageNthWordQuads fails
                                    rjplummer Level 1

                                    Thanks for responding.

                                     

                                    I'm sure the code works for you. The code works for probably 99% of pdf pages. It's that other 1%, e.g., http://plummer.us/BadPage.pdf

                                     

                                    If you can tell me why the code doesn't work on my example page, I'd be grateful

                                    • 16. Re: getPageNthWordQuads fails
                                      Karl Heinz Kremer Adobe Community Professional

                                      The problem is that Doc.getPageBox() will not give you the actual media or crop box, it will do some cleanup and then give you something that in this case is different from the actual media/crop box. When you bring up the preflight tool, and then browse the PDF contents, you will see this for the page boxes:

                                       

                                      [Screenshot 2016-09-30_15-26-07.png: page boxes as shown in the preflight tool]

                                       

                                      As you can see, both the media and the crop box do not start at (0,0); they have an offset of almost +/-12 pt. I assume that's also the offset that you see between the word you want to place the link on and the link that's actually placed on the page.

                                       

                                      I don't see any way you can get the true coordinates from this document (or any other document with the same type of page boxes) in JavaScript. A plug-in can do this - or an application based on the Adobe PDF library.

                                      • 17. Re: getPageNthWordQuads fails
                                        rjplummer Level 1

                                        That certainly makes sense. So the problem is that getPageBox is returning results (whether correct or not) that cause Adobe's Matrix2D class and the rulers in Acrobat to give incorrect results. When I get a chance, I'll see if using setPageBoxes to clear them fixes the page.