It's hard to say what's wrong without looking at the actual file.
I'm not seeing any problems. When I ran a script (using Acrobat 9.5.5) to add a strikeout markup for every word using the same quads, they were all correctly placed. Can you give an example of a word in that document and the corresponding quad that you believe isn't correct?
Perhaps you're assuming that (0,0) is a corner of the visible page, rather than just a relative measure.
But passing the quads directly to an annotation type that accepts quads seems to work.
So my question becomes: How do I tell that the two coordinate systems are different? And why doesn't Matrix2D work?
Here's the code based on Adobe's example:
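var q = this.getPageNthWordQuads(0, 200);
// Convert quads in default user space to the rotated
// user space used by Links.
var m = (new Matrix2D).fromRotated(this, 0);
var mInv = m.invert();
var r = mInv.transform(q);
// Flatten to a comma-separated string, then split into the eight coordinates.
r = r.toString();
r = r.split(",");
// Corner indices as in Adobe's sample for addLink.
var l = this.addLink(0, [r[4], r[5], r[2], r[3]]);
l.borderColor = color.red;
l.borderWidth = 1;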
Thanks for your interest.
If this were the case, wouldn't some value of getPageBox show this? If not, how do I determine that the origin is offset? And why doesn't Adobe's Matrix2D class take this into account?
The crop box would give you the effective, visible, origin. But I'd expect the APIs to use the same coordinate system. I can't say because I don't know what Matrix2D is.
The problem may be that a quad is not a rect; that's why there are two types. A rect is identified by lower-left x, lower-left y, upper-right x, and upper-right y. But a quad is identified by the four corners of a quadrilateral. Crucially:
(a) a quadrilateral may not be a rectangle.
(b) a quadrilateral may be a rotated rectangle, e.g. at 45 degrees.
(c) the corners of a quadrilateral may be for a rotated object, e.g. upside down, so the lower left of the object is not the lowest or the leftmost point in the page coordinate system.
You have to decide how to convert, if going to an annotation type that doesn't accept quads. One way is to get the enclosing axis-aligned rectangle, by taking min(x1,x2,x3,x4), min(y1,y2,y3,y4), max(x1,x2,x3,x4), max(y1,y2,y3,y4).
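A minimal sketch of that min/max conversion, in plain JavaScript so it can be pasted into the Acrobat console. The function name quadToBoundingRect is made up for illustration, and it assumes one quad is a flat array of eight numbers [x1, y1, x2, y2, x3, y3, x4, y4]:

```javascript
// Hypothetical helper (not part of the Acrobat API): returns the
// axis-aligned rectangle [minX, minY, maxX, maxY] enclosing one quad.
// The order of the four corners in the quad does not matter.
function quadToBoundingRect(quad) {
    var minX = Math.min(quad[0], quad[2], quad[4], quad[6]);
    var minY = Math.min(quad[1], quad[3], quad[5], quad[7]);
    var maxX = Math.max(quad[0], quad[2], quad[4], quad[6]);
    var maxY = Math.max(quad[1], quad[3], quad[5], quad[7]);
    return [minX, minY, maxX, maxY];
}
```

Because only min and max are taken, an upside-down or rotated quad still gives a sensible enclosing rectangle. The result is in the same coordinate system as the input quad, so any offset in the page origin carries through unchanged.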
Thanks, I know the quads are horizontal rectangles from examining the quads. I considered the possibility that the quads were upside-down, which might cause the vertical offset (since the vertical offset may be the height of the rectangle), but it couldn't cause the horizontal offset.
Thanks for the suggestion. I understand the geometry and what the Matrix2D class does. I can't figure out why it's not working for a handful of pages out of hundreds.
I'm back to my original issue. I look at the values returned by getPageNthWordQuads and, from my measurements, they don't correspond to the position of the word on the page. My guess is that the origin of certain pages is not in the corner of the page. Adobe's Matrix2D class doesn't seem to take this into account either. Values for getPageBox aren't any different for pages that have this problem and pages that don't.
I'm happy to live with this issue if somebody can tell me how to programmatically identify these pages.
Certainly you must not assume the origin is the corner of the page. You should consider:
1. The Crop Box. If there is one, the corner is from the Crop Box, relative to the Media Box.
2. The Media Box. This defines the corner of the original media. For example, if the bottom left is 72,72 then 0,0 is one inch below and to the left of the page.
3. The Rotate value, which will rotate the viewed page after all of the above is applied.
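As a rough illustration of points 1 and 2, here is a sketch that expresses a point relative to the lower-left corner of a page box. The helper name relativeToBoxCorner is made up, and it assumes the box is given as [llx, lly, urx, ury] exactly as stored in the PDF page dictionary. It does not handle the Rotate value from point 3.

```javascript
// Hypothetical helper (not part of the Acrobat API).
// point: [x, y] in default user space.
// box:   [llx, lly, urx, ury], assumed to be the raw box from the page dictionary.
// Returns the point measured from the box's lower-left corner.
function relativeToBoxCorner(point, box) {
    return [point[0] - box[0], point[1] - box[1]];
}
```

With a Media Box whose bottom left is 72,72, relativeToBoxCorner([0, 0], [72, 72, 684, 864]) gives [-72, -72], i.e. one inch below and to the left of the page corner, matching the example in point 2.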
The code creates correct links when I create a new document from your document by printing to Adobe PDF.
Thanks for your answer.
Crop and Media have exactly the same values, also the same as pages where I can draw link boxes correctly.
If I show rulers, I can see that addLink is drawing a box at the position I specify based on the quads returned for the word. There's no value returned by getPageBox that tells me why getPageNthWordQuads returns coordinates for a box that's offset from the ruler measurements.
Thanks for responding.
I'm sure the code works for you. The code works for probably 99% of PDF pages. It's that other 1%, e.g., http://plummer.us/BadPage.pdf
If you can tell me why the code doesn't work on my example page, I'd be grateful.
The problem is that Doc.getPageBox() will not give you the actual media or crop box; it does some cleanup and then returns something that in this case differs from the actual media/crop box. When you bring up the preflight tool, and then browse the PDF contents, you will see this for the page boxes:
As you can see, both the media and the crop box do not start at (0,0); they have an offset of almost +/-12pt. I assume that's also the offset that you see between the word you want to place the link on and the link that's actually placed on the page.
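If you can get at the raw box values some other way (read off the preflight browser, or extracted with an external PDF tool; getPageBox won't do, as explained above), a check like the following could flag the affected pages. This is only a sketch: boxOriginOffset is a made-up name, the box is assumed to be [llx, lly, urx, ury] as stored in the file, and the tolerance default is arbitrary.

```javascript
// Hypothetical helper (not part of the Acrobat API).
// box: [llx, lly, urx, ury], assumed to be the raw box from the page dictionary.
// tolerance: optional, in points; the default of 0.01 is an arbitrary choice.
// Reports how far the box's lower-left corner is from (0,0).
function boxOriginOffset(box, tolerance) {
    var tol = (tolerance === undefined) ? 0.01 : tolerance;
    var dx = box[0];
    var dy = box[1];
    return {
        dx: dx,
        dy: dy,
        isOffset: Math.abs(dx) > tol || Math.abs(dy) > tol
    };
}
```

For a page whose boxes start at (0,0), isOffset is false. For a box with illustrative values such as [12, -12, 624, 780], it is true, and dx and dy give the amounts by which a link rect would presumably need to be shifted.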
That certainly makes sense. So the problem is that getPageBox is returning results (whether correct or not) that cause Adobe's Matrix2D class and the rulers in Acrobat to give incorrect results. When I get a chance, I'll see if using setPageBoxes to clear them fixes the page.