I am working on a tool to extract images and text boxes from IDML documents and use this information to draw the corresponding bounding boxes on an image (and more, but that is what matters now). In order to do this, I need to transform the IDML item coordinates to page image coordinates, where origin (0, 0) is at the top left corner.
I am applying all item transforms up to the top parent, then the page transform, then the spread transform. For master items, I apply the parent transforms, then all inherited master page transforms, then the page transform, and lastly the spread transform. I thought this would be the right way to do it since it looked correct for many sample documents. I have now encountered documents with a more complicated IDML structure where this does not produce the correct result.
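As a minimal sketch of the chain described above (not the actual tool code), the following applies a list of transforms from the innermost item outwards. The function names are hypothetical. It assumes each transform is given in the six-value `a b c d tx ty` form used by the `ItemTransform` attribute, and that a point maps as x' = a*x + c*y + tx, y' = b*x + d*y + ty.

```python
from typing import Iterable, Tuple

# Assumed layout of an IDML ItemTransform: "a b c d tx ty".
Matrix = Tuple[float, float, float, float, float, float]
Point = Tuple[float, float]


def parse_transform(text: str) -> Matrix:
    """Parse a six-value transform string into a tuple of floats."""
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, got {len(values)}")
    return (values[0], values[1], values[2], values[3], values[4], values[5])


def apply_transform(m: Matrix, point: Point) -> Point:
    """Apply one affine transform to a point (assumed convention, see above)."""
    a, b, c, d, tx, ty = m
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)


def apply_chain(transforms: Iterable[Matrix], point: Point) -> Point:
    """Apply transforms in the given order: item first, outermost last."""
    for m in transforms:
        point = apply_transform(m, point)
    return point


# Order used for a normal spread item in the description above:
# item (and its parents), then page, then spread.
chain = [
    parse_transform("1 0 0 1 10 20"),   # hypothetical item transform
    parse_transform("1 0 0 1 0 -50"),   # hypothetical page transform
    parse_transform("1 0 0 1 0 0"),     # hypothetical spread transform
]
result = apply_chain(chain, (5.0, 5.0))
```

For a master item, the same `apply_chain` call would receive the inherited master page transforms between the parent transforms and the page transform, which is the ordering I am unsure about.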
In particular, I am encountering documents where the left and right pages of a spread have geometric bounds (bounding boxes) that do not start at the same y-value. To compensate for this, I have added an extra transform that moves the page origin to (0, 0). To complicate things further, master pages can also have different origins: for example, a spread page may have its origin at (10, 60), the applied master page at (0, 0), and the parent master page at (0, 50). Apparently, master pages also do not need to match the size of the corresponding spread pages.
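The extra compensation step can be sketched as a pure translation built from the page bounds. This is only an illustration with hypothetical function names. It assumes `GeometricBounds` is ordered `top left bottom right` (that is, `y1 x1 y2 x2`) and that the point has already been brought into the same coordinate space as those bounds.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # a b c d tx ty
Bounds = Tuple[float, float, float, float]  # top, left, bottom, right (assumed order)


def parse_geometric_bounds(text: str) -> Bounds:
    """Parse a GeometricBounds string, assumed to be 'y1 x1 y2 x2'."""
    top, left, bottom, right = (float(v) for v in text.split())
    return (top, left, bottom, right)


def origin_shift(bounds: Bounds) -> Matrix:
    """Translation that moves the top-left corner of the bounds to (0, 0)."""
    top, left, _bottom, _right = bounds
    return (1.0, 0.0, 0.0, 1.0, -left, -top)


def apply_transform(m: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Apply one affine transform to a point."""
    a, b, c, d, tx, ty = m
    x, y = point
    return (a * x + c * y + tx, b * x + d * y + ty)


# A spread page whose top-left corner sits at (10, 60), as in the example above.
page_bounds = parse_geometric_bounds("60 10 860 610")
shift = origin_shift(page_bounds)
local_point = apply_transform(shift, (15.0, 70.0))
```

What I cannot work out is which bounds to feed into this step for a master item: those of the spread page, the applied master page, or the parent master page.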
So what I'm asking for is a step-by-step description of exactly which transforms are needed to move all master items, as well as normal spread items, into the same page coordinate system, where the top left is always at (0, 0). I'd also like to know the correct method for determining which page an item belongs to. I am currently checking whether the item bounding box intersects the page geometric bounds. This gets a bit messy for master items: I get lost as to when in the transformation chain to do this check and which bounding box to use.
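The intersection check I am currently using looks roughly like the sketch below. The names are hypothetical, rectangles are written as `(left, top, right, bottom)`, and it assumes the item box and the page boxes have already been transformed into one shared coordinate space, which is exactly the part I am unsure about for master items.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # left, top, right, bottom


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap with non-zero area."""
    a_left, a_top, a_right, a_bottom = a
    b_left, b_top, b_right, b_bottom = b
    return (
        a_left < b_right
        and a_right > b_left
        and a_top < b_bottom
        and a_bottom > b_top
    )


def pages_for_item(item_box: Rect, page_boxes: Dict[str, Rect]) -> List[str]:
    """Return the names of all pages whose bounds the item box intersects."""
    return [name for name, box in page_boxes.items() if intersects(item_box, box)]


# Two facing pages in spread coordinates (hypothetical values).
pages = {
    "left": (0.0, 0.0, 100.0, 200.0),
    "right": (100.0, 0.0, 200.0, 200.0),
}
straddling = pages_for_item((90.0, 10.0, 120.0, 50.0), pages)
left_only = pages_for_item((10.0, 10.0, 20.0, 20.0), pages)
```

An item that straddles the spine is reported for both pages here, so a rule for picking a single page would still have to be added on top of this check.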
Anyhow, all help is greatly appreciated. I can provide samples of the IDML documents I need to work with if needed. These documents were generated with InDesign CC, whereas the documents I could previously handle correctly came from CS6. I don't know if that matters.