-
1. Re: Adobe Reader crashes on duplicate pages
lrosenth Sep 19, 2014 6:57 AM (in response to jamescarter2)
Well, it shouldn't crash. Can you post a sample PDF that demonstrates the problem?
Having duplicate page objects in the page tree is ambiguous with respect to the spec. It's not prohibited, but it's also not normally done, and most people would recommend against it.
I also find it pretty interesting that you are finding pages that are 100% duplicates (including page content, resources, etc.).
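To check a file for the situation described above, one can walk the page tree and count how often each page object is reached. This is a minimal sketch assuming a hypothetical in-memory model: `objects` maps object numbers to already-parsed dictionaries, and `Ref` stands in for an indirect reference (a real tool would get both from a PDF parser). The names `Ref`, `leaf_page_refs` and `duplicate_pages` are made up for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference (generation ignored)."""
    num: int


def leaf_page_refs(objects, node, ancestors=frozenset()):
    """Return the /Page leaves under `node` in document order.

    `ancestors` tracks the current path so a malformed, cyclic tree
    raises instead of recursing forever.
    """
    if node.num in ancestors:
        raise ValueError(f"cycle in page tree at object {node.num}")
    node_dict = objects[node.num]
    if node_dict["Type"] == "Page":
        return [node]
    pages = []
    for kid in node_dict["Kids"]:
        pages.extend(leaf_page_refs(objects, kid, ancestors | {node.num}))
    return pages


def duplicate_pages(objects, root):
    """Map object number -> occurrence count for page objects used more than once."""
    counts = Counter(leaf_page_refs(objects, root))
    return {ref.num: n for ref, n in counts.items() if n > 1}


# Example: page object 2 appears twice in the root node's /Kids.
objects = {
    1: {"Type": "Pages", "Kids": [Ref(2), Ref(3), Ref(2)], "Count": 3},
    2: {"Type": "Page", "Parent": Ref(1)},
    3: {"Type": "Page", "Parent": Ref(1)},
}
print(duplicate_pages(objects, Ref(1)))  # {2: 2}
```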
-
2. Re: Re: Adobe Reader crashes on duplicate pages
jamescarter2 Sep 19, 2014 7:28 AM (in response to lrosenth)
Thanks. Here are the details:
If I load the file into the latest OS X Reader 11.0.09 and then scroll down to the second page, I see a wrongly sized second page (a little white square), and then it crashes with what looks like a bad pointer being dereferenced into the zero page:
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000086
I have built a minimal 1046-byte example:
The cases where I have seen this in the wild are ones where blank or "this page is left intentionally blank" pages have been inserted into a document. Such pages therefore lack content streams, or have identical content streams and resources. For example, I have a 40,000-page output from a third-party electricity bill generator. It shares content streams for all its blank pages but has separate page objects. So when my program squeezes it, it notices that all the /Page objects for those "blank" pages are identical and coalesces them.
(Side note: the same program seems to create /Contents arrays for its non-blank pages that all share a common first element, so as to share the background image. Adobe Reader doesn't mind several /Contents entries pointing to the same object.)
-
3. Re: Adobe Reader crashes on duplicate pages
lrosenth Sep 19, 2014 8:15 AM (in response to jamescarter2)
Content elements can be shared, even on the same page, because the content is basically read/streamed in as a single chunk.
Pages are harder, because they can be manipulated (edited, deleted, moved, etc.), so having duplicate entries creates all sorts of unexpected complications for management and caching.
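The point that shared content is harmless follows from how /Contents works: when it is an array, the PDF spec treats the referenced streams as if they were concatenated, in order, into a single stream. The sketch below models that under simplifying assumptions: `streams` maps object numbers to already-decoded stream bytes, `page_content` is a hypothetical helper (not part of any library), and the content operators are placeholders (`/Bg` and `/F1` are made-up resource names). A stream referenced twice is simply read twice.

```python
def page_content(streams, page):
    """Return a page's content as one byte string.

    `streams` maps object numbers to decoded stream data (hypothetical model);
    `page` is a page dictionary whose "Contents" is absent, a single object
    number, or a list of object numbers.
    """
    contents = page.get("Contents")
    if contents is None:
        return b""  # no /Contents: the page is blank
    refs = contents if isinstance(contents, list) else [contents]
    # Join with whitespace so the last token of one stream cannot run into
    # the first token of the next; a shared stream is just read again.
    return b"\n".join(streams[r] for r in refs)


# Object 10 is a background drawn on every page; 11 and 12 are per-page text.
streams = {
    10: b"q /Bg Do Q",
    11: b"BT /F1 12 Tf (A) Tj ET",
    12: b"BT /F1 12 Tf (B) Tj ET",
}
page_a = {"Type": "Page", "Contents": [10, 11]}
page_b = {"Type": "Page", "Contents": [10, 12]}
print(page_content(streams, page_a))  # b'q /Bg Do Q\nBT /F1 12 Tf (A) Tj ET'
```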
-
4. Re: Adobe Reader crashes on duplicate pages
Test Screen Name Sep 20, 2014 4:50 AM (in response to jamescarter2)
Actually, duplicate pages are catastrophic, but content can otherwise be shared. To see why, imagine a PDF reader trying to follow a link to a page object. It has the object and can display it, but it now needs to know the page number, and which page number is it? A reader might constantly check the current page number, which it discovers from the page object. Just don't do it. Also imagine an app adding an annotation to a page. If your construct is legal, the app must always check for duplicated page dictionaries and clone them.
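Both problems can be made concrete with a small sketch. It assumes a flattened, hypothetical model in which `page_order` lists the object number of each page in document order (standing in for the leaves of the page tree) and `objects` maps object numbers to page dictionaries; all function names are made up. A reverse index from page object to page number has more than one answer for a duplicated page, so a link target is ambiguous, and adding an annotation safely means cloning the shared dictionary first. A real tool would also have to update the page tree's /Kids array, which this model leaves out.

```python
import copy
from collections import defaultdict


def page_numbers_by_object(page_order):
    """Map each page object number to the 1-based page numbers where it appears."""
    index = defaultdict(list)
    for page_no, obj_num in enumerate(page_order, start=1):
        index[obj_num].append(page_no)
    return dict(index)


def resolve_link_target(index, obj_num):
    """Return the page number a link to `obj_num` should jump to."""
    hits = index.get(obj_num, [])
    if len(hits) != 1:
        # Zero hits: dangling link. Several hits: the duplicate-page ambiguity.
        raise LookupError(f"object {obj_num} is on pages {hits}")
    return hits[0]


def add_annotation(objects, page_order, position, annot_num, next_free):
    """Attach annotation `annot_num` to the page at 0-based `position`.

    If the page dictionary is shared, clone it first (as object `next_free`)
    so the annotation appears only on the intended page. Returns the object
    number that received the annotation.
    """
    obj_num = page_order[position]
    if page_order.count(obj_num) > 1:
        objects[next_free] = copy.deepcopy(objects[obj_num])
        page_order[position] = next_free
        obj_num = next_free
    objects[obj_num].setdefault("Annots", []).append(annot_num)
    return obj_num


page_order = [5, 6, 5]  # object 5 is used for both page 1 and page 3
index = page_numbers_by_object(page_order)
print(index)  # {5: [1, 3], 6: [2]}
```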
I was wary of using the same object as multiple contents, but I've never seen problems with this. I imagined editing problems, but the lack of in-place stream editing prevents them.
-
5. Re: Adobe Reader crashes on duplicate pages
jamescarter2 Sep 20, 2014 5:13 AM (in response to Test Screen Name)
Right, following a link in a reader is a killer problem: we don't know which of the identical pages to jump to.
So I'll make sure my squeezer doesn't coalesce equal page objects. The question, though, is this: if some applications can't cope with duplicated page objects to the extent that they crash, will they be able to cope with duplicated entries in, say, number trees or any other PDF data structure?
Perhaps the standard should have a sentence on the topic, at least for the page tree, since it's so special.
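The fix described in this post can be sketched as an object coalescer that merges structurally identical objects but exempts page tree nodes. This is a minimal illustration, not the poster's actual squeezer. It assumes a hypothetical in-memory model where `objects` maps object numbers to parsed values (dicts, lists, atoms and `Ref` indirect references, with stream data held under a made-up "data" key). It makes a single pass; merging can expose further identical objects, so a real tool might repeat it until nothing changes. The names `Ref`, `canonical`, `rewrite`, `coalesce` and `NEVER_MERGE` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference (generation ignored)."""
    num: int


# Page tree nodes keep their own identity even when they are identical.
NEVER_MERGE = {"Page", "Pages"}


def canonical(value):
    """Build a hashable key so structurally equal values compare equal."""
    if isinstance(value, dict):
        return ("dict", tuple(sorted((k, canonical(v)) for k, v in value.items())))
    if isinstance(value, list):
        return ("list", tuple(canonical(v) for v in value))
    # Include the type name so that, e.g., 1 and True do not collide.
    return ("atom", type(value).__name__, value)


def rewrite(value, mapping):
    """Redirect references to merged objects to their survivors."""
    if isinstance(value, Ref):
        return Ref(mapping.get(value.num, value.num))
    if isinstance(value, dict):
        return {k: rewrite(v, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [rewrite(v, mapping) for v in value]
    return value


def coalesce(objects):
    """One pass: keep the lowest-numbered copy of each identical object."""
    survivor_by_key = {}
    mapping = {}  # merged object number -> surviving object number
    for num in sorted(objects):
        obj = objects[num]
        if isinstance(obj, dict) and obj.get("Type") in NEVER_MERGE:
            continue
        key = canonical(obj)
        if key in survivor_by_key:
            mapping[num] = survivor_by_key[key]
        else:
            survivor_by_key[key] = num
    return {num: rewrite(obj, mapping)
            for num, obj in objects.items() if num not in mapping}


# Two blank pages with separate (but identical) empty content streams.
objects = {
    1: {"Type": "Pages", "Kids": [Ref(2), Ref(3)], "Count": 2},
    2: {"Type": "Page", "Parent": Ref(1), "Contents": Ref(4)},
    3: {"Type": "Page", "Parent": Ref(1), "Contents": Ref(5)},
    4: {"Length": 0, "data": b""},
    5: {"Length": 0, "data": b""},
}
squeezed = coalesce(objects)
print(sorted(squeezed))  # [1, 2, 3, 4]
```

After the pass, pages 2 and 3 are identical dictionaries (both now point at stream 4), which is exactly the case that triggers the crash when collapsed; here both survive because `Page` is in `NEVER_MERGE`.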



