Because PDF isn't a text format, there's little value in just copy/pasting the text. The line endings could be wrong. Almost certainly, byte addresses will change. However, I can see the problem here.
The problem is the duplicated page object. Everything else, e.g. Contents and Resources, can be shared as indirect objects.
The reason is that Acrobat needs to be able to say "here is an object, it's a page object. What page is it?" -- then negotiate the Pages tree to find it.
I wasn't expecting anyone to run the listing - just explain why it would not display four instances of the same page.
"The reason is that Acrobat needs to be able to say "here is an object, it's a page object. What page is it?" -- then negotiate the Pages tree to find it."
But why can it not do that? The /Pages dictionary simply contains four references to the same indirect page object.
Can you point me to anywhere in the PDF Reference documentation that explicitly states whether this is (or is not) an admissible structure?
As mentioned the Chrome PDF reader displays all four pages as I had hoped.
1. It may surprise you, but more than 50% of the problems which people ask about with PDF generators turn out to be related to the exact byte layout, and disappear (or get worse) with copy/paste. People with tools for PDF examination like to be able to use them and know they are working on the exact/same file.
"But why can it not do that."
Because the question is unanswerable. Imagine an API call
Given object 5 0 R and asking for the page number, what is the correct answer - 1, 2, 3, or 4? This is of more than academic interest. For example, the page object might be the destination of a link, and an interactive viewer needs to know which page to navigate to. So, while a simple PDF viewer could display it, many things might not work.
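To make the ambiguity concrete, here is a minimal sketch of such a lookup. It is not Acrobat's API or any real library's: the function name page_numbers_of and the tuple representation of indirect references are assumptions made purely for illustration.

```python
# Hypothetical model: each indirect reference "N G R" is an (N, G) tuple,
# and the /Kids array of the Pages node is a plain list of such tuples.
def page_numbers_of(kids, page_ref):
    """Return every 1-based position at which page_ref appears in /Kids."""
    return [i for i, ref in enumerate(kids, start=1) if ref == page_ref]

# /Kids [5 0 R 5 0 R 5 0 R 5 0 R]: the same page object listed four times.
duplicated_kids = [(5, 0), (5, 0), (5, 0), (5, 0)]
# /Kids [5 0 R 6 0 R 7 0 R 8 0 R]: one page object per page.
distinct_kids = [(5, 0), (6, 0), (7, 0), (8, 0)]

print(page_numbers_of(duplicated_kids, (5, 0)))  # [1, 2, 3, 4]
print(page_numbers_of(distinct_kids, (5, 0)))    # [1]
```

With distinct page objects the lookup has exactly one answer. With the duplicated object it has four, so a caller that needs "the" page number has nothing to choose between them.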
ISO 32000-1 doesn't forbid it. But there isn't much mileage in taking the high ground and saying that one is making good files but Acrobat isn't compliant.
Firstly I was not aware of 'taking the high ground' and claiming the Adobe reader is not compliant - just asking why it did not display four pages.
It also wouldn't surprise me if most PDF problems did stem from incorrectly composed documents but that is not, AFAICS, relevant in this case.
Secondly, since, as you say, the ISO does not address the issue it seems to me a moot point as to whether Adobe or Chrome is providing the correct implementation.
"Given object 5 0 R and asking for the page number, what is the correct answer - 1, 2, 3, or 4"
Obviously there isn't one. But again I can find nothing in ISO 32000 that implies such an answer should be determinable from the document structure?
You weren't taking the high ground, but I've been there and found it most unhelpful, so I was trying to steer the discussion back to what works rather than the detail of what the ISO spec says.
If we do want to pursue it we could look at 12.3.2.1: "A destination defines a particular view of a document, consisting of ... The page of the document that shall be displayed", and not "one or more pages". So in practical terms such structures are unhelpful, and some might argue implicitly forbidden.
This structure will also mess up users who take their PDFs through editing tools. Any addition of content, or annotations, will update the single page object and hence all the pages, which may well confuse both user and software. I think you'll find other software getting confused if it goes beyond viewing.
It would be helpful for ISO 32000-2 to be more specific.
I accept the points you made (although I am not providing any outline information)
However, in my scenario, the final document is created 'on-the-fly' purely for printing and certain pages within the document need to be printed more than once. In practice the user would be *more* confused (and pretty upset) if the duplicated pages could differ.
I have got round the problem by adding more pages (one for each physical page) but pointing the Contents for each duplicated page to the same Stream object. This works with both Adobe and Chrome - but with the disadvantage that the resulting PDF is, necessarily, somewhat larger. And, of course, if opened for editing would be just as, if not more, confusing to a user - in fact I have no idea what the resulting behaviour would be.
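For anyone wanting to try the same workaround, the following is a rough sketch of that layout, not the poster's actual generator: one page object per physical page, all pointing at a single content stream. The function name build_pdf, the object numbering, the page size and the Helvetica font are all assumptions chosen to keep the example small.

```python
def build_pdf(copies):
    """Build a PDF with `copies` distinct page objects sharing one content stream."""
    content = b"BT /F1 24 Tf 72 720 Td (Shared page content) Tj ET"
    first_page = 5  # objects 1-4: catalog, page tree, content stream, font
    kids = " ".join(f"{first_page + i} 0 R" for i in range(copies))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {copies} >>".encode(),
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    # Every page is its own object, but /Contents always refers to object 3.
    page = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
            b"/Contents 3 0 R /Resources << /Font << /F1 4 0 R >> >> >>")
    objects += [page] * copies

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

pdf = build_pdf(4)
```

The returned bytes can be written to a file opened in binary mode. Each extra copy costs only one small page dictionary and one cross-reference entry; the content stream itself is stored once.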
Thank you for your input, onwards........
I use the technique of sharing identical content streams myself and no problems have yet been reported. This may also be due to the Acrobat model of low level object editing; streams cannot be edited, only recreated. So an updated content stream stored in one page object would then "break free" of the other duplicates.
Identical Resources could give problems, but don't seem to in practice. It is common practice in any case to add the same resources throughout a document, so editors should start by duplicating direct resources before starting per-page edits. I would not recommend identical Annots.