7 Replies Latest reply: Sep 29, 2014 5:57 AM by sammousa1

    Can't construct a valid cross reference stream.

    sammousa1 Community Member

      I'm building a PDF generation library from scratch.

      Currently I'm having trouble generating a valid cross-reference stream, and I'm totally lost as to why it is invalid.

      %PDF-1.7

      0 0 obj

      <<

      /Pages 1 0 R

      /Type /Catalog

      >>

      endobj

      1 0 obj

      <<

      /Type /Pages

      /Kids [2 0 R]

      /Count 1

      >>

      endobj

      2 0 obj

      <<

      /Parent 1 0 R

      /Type /Page

      /MediaBox [0 0 612 792]

      /Contents 3 0 R

      >>

      endobj

      3 0 obj

      <<

      /Length 0

      >>

      stream

      endstream

      endobj

      4 0 obj

      <<

      /Type /XRef

      /W [1 2 0]

      /Size 6

      /Length 16

      >>

      stream

      ... endstream endobj startxref 254 %%EOF

      The full pdf file can be found here: https://www.dropbox.com/s/mvn0xptf0lasb28/test.pdf?dl=0

       

      According to the spec, a PDF file can consist entirely of objects, with the exception of the first line (the second is a comment) and the part from startxref onward.

      Any tips would be greatly appreciated.

      For simplicity I've added the stream (extracted via a hex editor) below:

      0A 01 00 0D 01 00 3E 01 00 77 01 00 CE 01 00 FE 0A

      Note that the stream starts and ends with a newline character. There are 17 bytes and the last line ending is not part of the stream length.

      The remaining 16 bytes hold 15 bytes of data (the first line ending is ignored, right?):

      01 00 0D

      01 00 3E

      01 00 77

      01 00 CE

      01 00 FE
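
The entry layout above can be checked mechanically. The following is a minimal sketch, assuming the 15 data bytes listed above and the /W [1 2 0] array from the stream dictionary; `parse_xref_entries` is a hypothetical helper name, not part of any library. A width of zero means the field is absent and takes its default, which for the third field of a type 1 entry is generation 0.

```python
def parse_xref_entries(data: bytes, widths: list[int]) -> list[tuple[int, ...]]:
    """Split raw cross-reference stream data into (type, field2, field3) tuples."""
    entry_size = sum(widths)
    if entry_size == 0 or len(data) % entry_size != 0:
        raise ValueError("data length is not a multiple of the entry size")
    entries = []
    for start in range(0, len(data), entry_size):
        pos = start
        fields = []
        for index, width in enumerate(widths):
            if width == 0:
                # Absent field: the type defaults to 1, other fields to 0 here.
                value = 1 if index == 0 else 0
            else:
                value = int.from_bytes(data[pos:pos + width], "big")
            fields.append(value)
            pos += width
        entries.append(tuple(fields))
    return entries


# The 15 data bytes from the post, grouped three per entry.
data = bytes.fromhex("01000D 01003E 010077 0100CE 0100FE")
entries = parse_xref_entries(data, [1, 2, 0])
for obj_num, (kind, offset, gen) in enumerate(entries):
    # Assumes no /Index array, so numbering starts at object 0.
    print(obj_num, kind, offset, gen)
```

Under these assumptions the data decodes to five type 1 entries, for objects 0 through 4, at offsets 13, 62, 119, 206 and 254.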

       

      As far as I can tell this PDF file's cross reference stream is valid. Any help would be greatly appreciated!

        • 1. Re: Can't construct a valid cross reference stream.
          Test Screen Name CommunityMVP

          It's never worth looking closely at a PDF considered as text. It's impossible for the reader to see problems with byte addresses and line endings. If you want it examined please share it on a file sharing site or your own. (You cannot email it).

           

          However, your statement that the stream contains line endings is worrying. It must not, though the normal rules of whitespace around the stream and endstream keyword apply; the newlines they permit or require are not part of the stream, just as they are not part of the Flate compressed data in a page. The format of an XRef stream has no requirement to ignore a newline (unless you see that written down).

          • 2. Re: Can't construct a valid cross reference stream.
            Test Screen Name CommunityMVP

            Sorry, just saw you had a file link.  Your statements about your stream are incorrect.

            "There are 17 bytes" -- clearly, there are 16 bytes (Length value)

            "the last line ending is not part of the stream length." Yes, it is. The line ending after stream is a normal part of PDF syntax however, and not part of the stream. So the first bytes of the stream are at offset 0x13d, value 01000d...


            It seems to me that if your entries are 3 bytes the stream length must be a multiple of 3. And certainly that 0A at the end is wrong.
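
The multiple-of-3 observation can be expressed as a quick consistency check. This is a minimal sketch, assuming the /Length 16 and /W [1 2 0] values from the posted dictionary; `check_stream_length` is a hypothetical name.

```python
def check_stream_length(length: int, widths: list[int]) -> tuple[int, int]:
    """Return (whole_entries, leftover_bytes) for a cross-reference stream."""
    entry_size = sum(widths)
    if entry_size <= 0:
        raise ValueError("entry size must be positive")
    return divmod(length, entry_size)


whole, leftover = check_stream_length(16, [1, 2, 0])
print(whole, leftover)  # 5 1
```

A non-zero leftover means the declared length cannot be an exact number of entries.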

             

            • 3. Re: Can't construct a valid cross reference stream.
              sammousa1 Community Member

              Thanks for the quick responses.

              I figured it out by looking at a lot of example files.

               

              The spec says Size should be 1 + the "highest object number".

              QPDF, a library implementing PDF transformations, requires the Length to be equal to Size * sum(W).

              However according to the specs if I have objects numbered 1, 2 and 3 the Size parameter should be 4.

              What I did was pad the stream with null bytes to the appropriate length.

               

              It still seems that the spec could be clearer and should explicitly state that the Size parameter is the number of objects in the stream + 1 and that there needs to be a null entry as the first reference.

              (This requirement is mentioned for xref tables but not xref streams).
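
The padding fix described above can be sketched as follows, assuming /W [1 2 0], no /Index array, and objects numbered consecutively from 1. The offsets are made up for illustration and `build_xref_data` is a hypothetical helper; the all-zero first entry is the null padding from this post, standing in for object 0.

```python
def build_xref_data(offsets: list[int]) -> tuple[bytes, int]:
    """Build xref stream data for objects 1..n plus a leading entry for object 0.

    Returns (data, size), where size is the value for the /Size key.
    Assumes /W [1 2 0], so every offset must fit in two bytes.
    """
    data = bytearray(b"\x00\x00\x00")  # object 0: type 0, next free object 0
    for offset in offsets:
        if not 0 <= offset <= 0xFFFF:
            raise ValueError("offset does not fit in a 2-byte field")
        data.append(1)  # type 1: in-use, uncompressed object
        data += offset.to_bytes(2, "big")
    return bytes(data), len(offsets) + 1


data, size = build_xref_data([15, 64, 121])  # illustrative offsets only
# The Length == Size * sum(W) relation mentioned in the post.
assert len(data) == size * 3
print(size, len(data))  # 4 12
```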

               

              Cheers,

              • 4. Re: Can't construct a valid cross reference stream.
                Test Screen Name CommunityMVP

                The spec is clear - but not self-contained at this point - and your approach of filling with nulls is just extraordinarily good luck. You need to read the section about normal (old-fashioned) xref structures, and the special meaning of object 0, which is required and cannot be a regular object in the file.

                • 5. Re: Can't construct a valid cross reference stream.
                  sammousa1 Community Member

                  I disagree. Let me show some excerpts from the spec:

                   

                  Cross reference table section

                  The first entry in the table (object number 0) is always free and has a generation number of 65,535; it is the head of the linked list of free objects.

                   

                  Cross reference stream section

                  For files that use cross-reference streams entirely (that is, files that are not hybrid-reference files; see “Compatibility with Applications That Do Not Support PDF 1.5” on page 109), the keywords xref and trailer are no longer used.

                  Therefore, with the exception of the startxref address %%EOF segment and comments, a PDF 1.5 file is entirely a sequence of objects.

                   

                  Nowhere in the cross-reference stream section is a required entry for object number 0 specified.

                   

                  It makes absolutely no sense to "assume" that a requirement for the xref table translates to a requirement for an xref stream.

                  Filling with #0 works because the type is 0, the next free object doesn't exist and the generation number is #0.

                   

                  Even worse, in the example several pages further on, the following ASCIIHex-encoded cross-reference stream is shown:

                  stream 

                  01 0E8A 0 % Entry for object 2 (0x0E8A = 3722)

                  02 0002 00 % Entry for object 3 (in object stream 2, index 0) 

                  02 0002 01 % Entry for object 4 (in object stream 2, index 1) 

                  02 0002 02 % … 

                  02 0002 03 

                  02 0002 04 

                  02 0002 05 

                  02 0002 06 

                  02 0002 07 % Entry for object 10 (in object stream 2, index 7) 

                  01 1323 0 % Entry for object 11 (0x1323 = 4899) 

                  endstream 
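
The comments in the excerpt number the entries from object 2 to object 11, so the first entry is not object 0. A minimal sketch of that mapping, assuming the stream dictionary (not shown in the excerpt) carries a first-object/count pair of 2 and 10, which is what the specification's /Index array holds; `object_numbers` is a hypothetical helper.

```python
def object_numbers(index: list[int]) -> list[int]:
    """Expand [first, count, first, count, ...] pairs into object numbers."""
    if len(index) % 2 != 0:
        raise ValueError("index must contain (first, count) pairs")
    numbers = []
    for first, count in zip(index[0::2], index[1::2]):
        numbers.extend(range(first, first + count))
    return numbers


nums = object_numbers([2, 10])  # assumed pair, inferred from the comments
print(nums[0], nums[-1], len(nums))  # 2 11 10
```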

                   

                  So now the real requirement seems to be that the 0 object must be in the crossreference stream only if there is no xref table...

                  Still, if a document requires a special object 0, that should be specified in its own right and not only as part of the xref table section, since that section is (or should be) irrelevant for files that don't use cross-reference tables.

                  • 6. Re: Can't construct a valid cross reference stream.
                    lrosenth Adobe Employee

                    What document are you using?   This doesn't read like ISO 32000-1:2008, which is the PDF standard.

                    • 7. Re: Can't construct a valid cross reference stream.
                      sammousa1 Community Member

                      The excerpts are actually from the Adobe 1.7 version, but section 7.5.8.3 tells you exactly the same.