I am trying to import one PDF document into another using a Reference XObject. I am looking at the PDF specification, and I thought I was doing things the right way, but it's not working. The PDF specification doesn't have an example of how to do this, only an explanation.
Below is the PDF file I am testing with (if anyone can be bothered to look at it). The file embeds another PDF file, and is supposed to show a part of the embedded file at a certain position.
Again, if anyone can be bothered to look at my PDF and see what I am doing wrong here, I would appreciate it.
(There is nothing wrong with the embedded file itself...)
PDF-file embedding another PDF-file and using a Reference XObject to display the embedded file:
%PDF-1.5
1 0 obj
<<
/Type /XObject
/Subtype /Form
/BBox [0 0 100 100]
/Ref <<
/F (4fa27162e72547a00771606.pdf)
/Page 1
>>
>>
endobj
2 0 obj
<<
/Type /Filespec
/F (4fa27162e72547a00771606.pdf)
/EF <</F 3 0 R>>
>>
endobj
3 0 obj
<<
/Type /EmbeddedFile
/Length 854
>>
stream
%PDF-1.5
1 0 obj
<<
/Length 334
>>
stream
q
1 0 0 1 128.10769621539 70.591821183642 cm
1 0 0 1 0 0 cm
1 0 0 1 0 0 cm
1 0 0 1 0 0 cm
1 0 0 1 -128.10769621539 -70.591821183642 cm
/DeviceCMYK CS
1 1 1 1 SCN
/DeviceCMYK cs
0 0.89 0.89 0.05 scn
0.2743205486411 w
2.743205486411 M
0 J
0 j
-1.9202438404877 -2.6517653035306 260.05588011176 146.48717297435 re
h
B
Q
endstream
endobj
2 0 obj
<<
/Type /Page
/Parent 3 0 R
/Resources <<
/Font <<>>
/XObject <<>>
>>
/MediaBox [0 0 255.11811023622 141.73228346457]
/Contents [1 0 R]
>>
endobj
3 0 obj
<<
/Type /Pages
/Kids [2 0 R]
/Count 1
>>
4 0 obj
<<
/Type /Catalog
/Pages 3 0 R
>>
endobj
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000401 00000 n
0000000568 00000 n
0000000624 00000 n
trailer
<<
/Size 5
/Root 4 0 R
>>
startxref
679
%%EOF
endstream
endobj
4 0 obj
<<
/Length 322
>>
stream
q
1 0 0 1 93.268986537973 71.826263652527 cm
1 0 0 1 0 0 cm
1 0 0 1 0 0 cm
1 0 0 1 0 0 cm
1 0 0 1 -93.268986537973 -71.826263652527 cm
55.961391922784 45.628651257302 74.615189230378 52.39522479045 re
W n
1 0 0 1 55.961391922784 42.062484124968 cm
74.615189230378 0 0 55.961391922784 0 0 cm
/XobjectPDF1 Do
Q
endstream
endobj
5 0 obj
<<
/Type /Page
/Parent 6 0 R
/Resources <<
/Font <<>>
/XObject << /XobjectPDF1 1 0 R >>
>>
/MediaBox [0 0 255.11811023622 141.73228346457]
/Contents [4 0 R]
>>
endobj
6 0 obj
<<
/Type /Pages
/Kids [5 0 R]
/Count 1
>>
7 0 obj
<<
/Type /Catalog
/Pages 6 0 R
>>
endobj
xref
0 8
0000000000 65535 f
0000000010 00000 n
0000000144 00000 n
0000000238 00000 n
0000001172 00000 n
0000001551 00000 n
0000001738 00000 n
0000001794 00000 n
trailer
<<
/Size 8
/Root 7 0 R
>>
startxref
1849
%%EOF
A Reference XObject is used for referring to ONE PAGE of another PDF – usually a completely separate file, though it could also be embedded. It is not a well-supported feature of PDF – it requires Acrobat/Reader 9 or later. I am not aware of any other PDF viewer that supports it ☹.
It’s extremely hard to read/follow what you post. If you want a PDF looked at – please post a link to an actual PDF.
PDF is a binary format - even though sometimes it looks like straight ASCII
when you open it in an editor. This means that we cannot really do anything
with what you've posted. If I saved it and opened it in my debug tool,
it would report a corrupt file.
Reference XObjects are a pretty tricky subject, and a lot of planets have
to perfectly align to make them work. Take a look at this blog post from
four years ago that explains what you need to do in terms of setting up
Reader or Acrobat, but also has links to sample files in the comments:
https://blogs.adobe.com/ReferenceXObjects/2008/06/reference_xobjects_1.html#comments
Karl Heinz Kremer
PDF Acrobatics Without a Net
PDF is a binary format - even though sometimes it looks like straight ASCII
when you open it in an editor.
This means that we cannot really do anything
with what you've posted. If I saved it and opened it in my debug tool,
it would report a corrupt file.
I don't get that. According to the specification "A non-encrypted PDF can be entirely represented using byte values corresponding to the visible printable subset of the character set defined in ANSI X3.4-1986, plus white space characters."
Binary data can of course be included in the PDF, but again, "The tokens that delimit objects and that describe the structure of a PDF file shall use the ASCII character set."
(From section 7.2.1 General)
My code does not contain any binary data, so if I copy and paste the code into any text editor and save it as an ASCII text file, it opens fine in the PDF readers I have tried.
I know it may look a little messy like this. Sorry about that, but I just thought it was an easy way of doing it.
PDF is a binary format. The most important piece in a PDF file is the cross
reference table. It contains byte offsets to the different objects in a
file. If you insert just one space (which in a true ASCII file will not
change much), you are making all byte offsets after the insertion point
invalid. They now all point to something different than the start of an
object.
Because there is no standard for how line endings are encoded in text
files, every line ending may add (or remove) a byte - depending on what computer
system and what editor you use to convert your PDF to fake-ASCII, and then
again on what I use to try to convert it back to PDF. If you are using a
Windows system and I'm using a Mac, the cross reference table gets
corrupted. And that's even before the web server does its magic and
potentially adds data.
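To make the byte-offset point concrete, here is a minimal Python sketch. It is not taken from the file posted above: the tiny two-object PDF fragment and the helper names `object_offsets` and `offset_is_valid` are made up for illustration. It records where each object starts in an LF-terminated file, then checks those same offsets after the line endings have been converted to CRLF, as a copy/paste round trip might do.

```python
import re

def object_offsets(data: bytes) -> dict:
    """Map object number -> byte offset of its 'N 0 obj' header line."""
    return {int(m.group(1)): m.start()
            for m in re.finditer(rb"^(\d+) 0 obj", data, re.MULTILINE)}

def offset_is_valid(data: bytes, num: int, offset: int) -> bool:
    """True if an 'N 0 obj' header really starts at the given byte offset."""
    return data.startswith(b"%d 0 obj" % num, offset)

# Hypothetical miniature file body, saved with LF line endings.
pdf_lf = (b"%PDF-1.5\n"
          b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n")

# These are the offsets an xref table written for pdf_lf would contain.
recorded = object_offsets(pdf_lf)

# The same text after an editor or forum round trip switched to CRLF.
pdf_crlf = pdf_lf.replace(b"\n", b"\r\n")

for num, offset in sorted(recorded.items()):
    print(num, offset,
          "LF ok:", offset_is_valid(pdf_lf, num, offset),
          "CRLF ok:", offset_is_valid(pdf_crlf, num, offset))
```

Every recorded offset is correct for the LF bytes and wrong for the CRLF bytes, even though the two versions look identical in a text editor.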
If you want somebody to look at your file, please post a link to the actual PDF.
Have you looked at the post I linked to? The settings in Acrobat are
important. You need to enable Reference XObjects, AND you have to declare
the directory where your target file is stored as trusted.
From what I can see in your PDF code, you are doing a whole bunch of cm
operations. You may want to consolidate all those into one operation. I
have not done a detailed analysis, but you could potentially shift your
data off the page. Keep it simple and see if that makes a difference.
Karl Heinz Kremer
PDF Acrobatics Without a Net
PDF is a binary format. The most important piece in a PDF file is the cross reference table. It contains byte offsets to the different objects in a file. If you insert just one space (which in a true ASCII file will not change much), you are making all byte offsets after the insertion point invalid. They now all point to something different than the start of an object.
True, you can't edit the file without editing the xref-table. But as long as you don't edit the file without editing the xref-table, and as long as you don't save in a wrong encoding, my code should work fine.
(I have tried to copy and paste it from the post without any problems.)
Thanks for even bothering to look at the code, though.
The reason there are so many 'cm' operations is that the PDF is generated by a little program I made.
The 'cm' operations don't have any effect in this case though, because in the posted code they always start with a certain translation, followed by 4 identity matrices (1 0 0 1 0 0 cm), before ending with the inverse translation.
(I know the 'cm's are not the problem because I can do the same operations with an image XObject instead of a Reference XObject, and then things work fine...)
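The claim about the five generated `cm` operators can be checked with a few lines of Python. This is only a sketch: the helper names `multiply`, `apply_cm` and `transform` are made up here, and the numbers are copied from the page content stream posted above. It also composes the two matrices that come right before `Do`, since those are not identity.

```python
def multiply(m1, m2):
    """Multiply two PDF matrices given as (a, b, c, d, e, f), i.e. m1 x m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def apply_cm(ctm, ops):
    """Apply 'cm' operators in order; each one premultiplies the current CTM."""
    for m in ops:
        ctm = multiply(m, ctm)
    return ctm

def transform(point, m):
    """Map a point through a PDF matrix."""
    x, y = point
    a, b, c, d, e, f = m
    return (x * a + y * c + e, x * b + y * d + f)

IDENTITY = (1, 0, 0, 1, 0, 0)
tx, ty = 93.268986537973, 71.826263652527

# The five generated operators: translate, three identities, translate back.
generated = [(1, 0, 0, 1, tx, ty), IDENTITY, IDENTITY, IDENTITY,
             (1, 0, 0, 1, -tx, -ty)]
ctm = apply_cm(IDENTITY, generated)
print("after generated cm ops:", ctm)

# The two operators directly before '/XobjectPDF1 Do'.
placement = apply_cm(ctm, [
    (1, 0, 0, 1, 55.961391922784, 42.062484124968),
    (74.615189230378, 0, 0, 55.961391922784, 0, 0),
])
print("unit square corner (1, 1) ->", transform((1, 1), placement))
print("BBox corner (100, 100)   ->", transform((100, 100), placement))
```

The first five operators do cancel out. The last two scale by roughly 74.6 and 56, which suits an image XObject because an image is painted into the unit square. A form XObject is painted using its own BBox coordinates, so, assuming the form has no /Matrix entry, the same scaling applied to a 100x100 BBox would put its far corner well outside the 255x142 MediaBox. That may be worth ruling out before concluding the matrices are harmless.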
OK, I uploaded it to a file-sharing site. Hope it works.
This should be the link:
http://i.minus.com/1336142392/P0Wc6W01M9-bfpohRCTqCg/dbxLJXYihJSUjU.pdf
That link does not work.
How many pages do you have in your referenced document? You are trying to
pull in the second page. If this is only a one-page document, it will of
course not work. Remember that page numbers in the PDF world are
zero-based. Assuming that your referenced document does indeed have a 100x100pt
page as indicated by your BBox statement, the following content stream
should give you a correctly placed reference XObject:
q
1 0 0 1 0 0 cm
/XobjectPDF1 Do
Q
Again, you need to make sure that your directory is correctly configured in
Acrobat's preferences, and that you have Reference XObjects enabled for all
documents (and not just PDF/X-5, which is the default).
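The zero-based page index is easy to get wrong, so a quick sanity check can help. The following Python sketch uses a hypothetical helper, `ref_page_in_range`, and assumes /Page is given as an integer index rather than a page label string. It compares the /Page value of the reference dictionary with the /Count of the target's page tree, using the values from the file posted earlier in this thread.

```python
import re

def ref_page_in_range(ref_dict: str, pages_dict: str) -> bool:
    """Check that an integer /Page index (zero-based) exists in the target."""
    page_index = int(re.search(r"/Page\s+(\d+)", ref_dict).group(1))
    page_count = int(re.search(r"/Count\s+(\d+)", pages_dict).group(1))
    return 0 <= page_index < page_count

# Values as posted: /Page 1 in the /Ref dictionary, /Count 1 in the target.
posted_ref = "/Ref << /F (4fa27162e72547a00771606.pdf) /Page 1 >>"
target_pages = "<< /Type /Pages /Kids [2 0 R] /Count 1 >>"

print(ref_page_in_range(posted_ref, target_pages))                            # False
print(ref_page_in_range(posted_ref.replace("/Page 1", "/Page 0"), target_pages))  # True
```

With a one-page target, only `/Page 0` refers to an existing page.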
Karl Heinz Kremer
PDF Acrobatics Without a Net