8 Replies Latest reply on Oct 3, 2008 11:08 PM by BKBK

    isPDFFile() and CFPDF won't recognize a PDF

    zbis12
      I have a PDF that I can open in Adobe Acrobat Reader 8.0 without problems. However, if I use the CFPDF tag with action="read", I get the following error:
      "Error: Invalid Document D:\temp\test.pdf specified for source or directory. "

      If I use the isPDFFile() with the same file, I get a return of false.

      The only way around this is to read the file with <cffile action="readBinary" ...> and serve it with the <cfcontent> tag.

      I had this same problem on 8.0 on a version 1.2 document, but installing 8.0.1 fixed it. This is a 1.3 document.

      Here is a link to the file
      test.pdf

      Is this a bug in CF?
        • 1. Re: isPDFFile() and CFPDF won't recognize a PDF
          Adam Cameron Level 7
          I can confirm the behaviour you're seeing. I think it's a bug. You should
          raise it as such with Adobe.

          --
          Adam
          • 2. Re: isPDFFile() and CFPDF won't recognize a PDF
            -==cfSearching==- Level 4
            Yes, I get the same results with some sort of exception about the trailer.

            com.adobe.internal.pdftoolkit.core.exceptions.PDFCosParseException: Expected 'trailer' : 66066
            at com.adobe.internal.pdftoolkit.core.cos.XRefTable.readTrailer(Unknown Source)
            at com.adobe.internal.pdftoolkit.core.cos.XRefTable.parseTableXrefChain(Unknown Source)
            ..

            Interestingly, cfpdf can read the file if you copy it with iText first. So maybe it is a bug or, if the document is malformed, maybe iText is just more forgiving?

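            <cfscript>
            // Paths for the original PDF and for the copy that iText will write
            pdfFileIn = "c:\pathToFile\test.pdf";
            pdfFileOut = "c:\pathToFile\testCopy.pdf";
            // Read the original with iText and let PdfStamper write it back out unchanged
            pdfReader = createObject("java", "com.lowagie.text.pdf.PdfReader").init( pdfFileIn );
            streamOut = createObject("java", "java.io.FileOutputStream").init( pdfFileOut );
            pdfStamper = createObject("java", "com.lowagie.text.pdf.PdfStamper").init( pdfReader, streamOut );
            // Closing the stamper writes the copy to the output stream
            pdfStamper.close();
            streamOut.close();
            </cfscript>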

            <cfpdf action="read" source="#pdfFileOut#" name="pdfContent">
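
The "Expected 'trailer'" exception suggests the parser did not find a trailer where the file's cross-reference data told it to look. Below is a minimal sketch for checking that on a given file. It is written in Python rather than CFML, and it is not how ColdFusion or iText work internally. It reads the byte offset that follows the last `startxref` keyword and reports whether an `xref` table starts at that offset and whether a `trailer` keyword follows it. The helper name `inspect_xref` is hypothetical, and the check assumes a classic cross-reference table (as in a PDF 1.3 file); it does not handle cross-reference streams.

```python
import re


def inspect_xref(data: bytes) -> dict:
    """Report where the last startxref points and what is found there."""
    # The startxref keyword sits near the end of the file and is
    # followed by the byte offset of the cross-reference section.
    tail = data[-2048:]
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        return {"startxref": None, "points_to_xref": False, "has_trailer": False}

    offset = int(matches[-1])
    # A classic cross-reference table begins with the keyword "xref".
    points_to_xref = data[offset:offset + 4] == b"xref"
    # The trailer dictionary is expected somewhere after that table.
    has_trailer = points_to_xref and b"trailer" in data[offset:]
    return {
        "startxref": offset,
        "points_to_xref": points_to_xref,
        "has_trailer": has_trailer,
    }
```

For the file from the original post, `inspect_xref(open(r"D:\temp\test.pdf", "rb").read())` would return those three values. If `points_to_xref` is false, that would be consistent with a damaged cross-reference offset, though it would not prove it.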
            • 3. isPDFFile() and CFPDF won't recognize a PDF
              BKBK Adobe Community Professional & MVP
              Me smell a fish. The document is very likely corrupt. In more ways than one.

              First, it is corrupt as a PDF binary. ColdFusion has done you a favour and told you so.

              Secondly, it is corrupt in the sense of being counterfeit. Two reasons:

              1) Pages 2, 3 and 4 contain copyright information in the margin, which page 1 lacks.
              2) The font type changes when you go from page 1 to page 2.

              IsPDFFile is a blunt tool, and makes no firm promises. The documentation tells you, "This function returns False if the value is not a valid pathname to a PDF file, the pathname is null, the PDF file is not valid, or the PDF file is corrupted."

              The fact that this is a legal document makes the fish even smellier. By the way, if those are really someone's private details, what are they doing here?




              • 4. Re: isPDFFile() and CFPDF won't recognize a PDF
                zbis12 Level 1
                Adam - Thanks for taking a look. I submitted the bug.

                CFSearching - Thanks for taking a look and providing the sample with iText. That made it work for me too. This will help me create a workaround for now. I haven't familiarized myself with iText yet. That's going to change.

                BKBK - You may be correct about it being 'corrupt as a PDF binary'. However, I had this same problem with a version 1.2 document when using 8.0 to display it. After doing the 8.0.1 update, that document looked fine, so it's hard to tell if it's an Acrobat issue or a CF issue. Either way, it's an Adobe issue.
                On the counterfeit remark - please don't dig too much into that. This is a court document that an attorney created using bankruptcy software and some word processor, then printed, scanned, OCR'd, and submitted to our system. The copyright info you see in the margin is for the bankruptcy software he uses.
                On the personal information - This document is a matter of public record and can be downloaded on the internet for a fee (.08/page). You may also drive to Birmingham, AL and view it for free in our reception area. Once you file for bankruptcy, your case information becomes public.

                • 5. Re: isPDFFile() and CFPDF won't recognize a PDF
                  -==cfSearching==- Level 4
                  quote:

                  Originally posted by: zbis12
                  On the counterfeit remark - please don't dig too much into that. This is a court document that an attorney created using bankruptcy software and some word processor, then printed, scanned, OCR'd, and submitted to our system.

                  The copyright info you see in the margin is for the bankruptcy software he uses.



                  Not being a lawyer, I cannot comment on copyrights. But I will say that, given the number of steps involved, there certainly seems to be an opportunity for file corruption.

                  quote:

                  Originally posted by: zbis12
                  On the personal information - This document is a matter of public record and can be downloaded on the internet for a fee (.08/page). You may also drive to Birmingham, AL and view it for free in our reception area. Once you file for bankruptcy, your case information becomes public.



                  I suspect you are 100% correct about it being public record. That said, my personal approach falls in the "just because you can, does not mean you should" category ;-) Even when I am not bound by some sort of NDA, I try not to disclose client-related information. IMO it is just good business sense, for both the client and myself. Plus, it shows some basic courtesy to all parties involved, something that is often in short supply these days.

                  (My $0.02, for what it is worth ;-)
                  • 6. isPDFFile() and CFPDF won't recognize a PDF
                    BKBK Adobe Community Professional & MVP
                    Zbis12,
                    I understand what you say. However, it doesn't go against my remarks. In fact, you can verify what I said.

                    There is a change in font. There are also clear signs that the material has undergone image processing, and perhaps a merge operation, before it finally became a PDF document. All of this increases the chances that the file is corrupt.

                    The fact that you can open a PDF with certain software, including Adobe's own Reader, is not a guarantee that the file is error-free. Observe it for yourself.

                    Open your file, test.pdf, in a text editor. Replace each occurrence of 65775 with 66066. Save the file as test2.pdf. Now, perform the exercise that -==cfSearching==- gave earlier, using test2.pdf instead. You will see that isPDFFile("testCopy.pdf") returns "Yes", even though you've blatantly corrupted the original file.

                    I wouldn't submit a bug report on this one. I think it is the document that has a bug, not ColdFusion.

                    • 7. Re: isPDFFile() and CFPDF won't recognize a PDF
                      Adam Cameron Level 7
                      > I wouldn't submit a bug report on this one. I think it is the document that
                      > has a bug, not Coldfusion.

                      I'm split on this one. I can see where you're coming from, and am mostly
                      inclined to agree.

                      However, I think an equitable benchmark for the function could be "will it
                      open in Acrobat (Reader)?"

                      It does, so it's reasonable for that test to pass.

                      A relevant aside here... I presume docs created using a current version of
                      Acrobat will not necessarily open in older versions of Acrobat, unless some
                      sort of compatibility mode is used. In this light, shouldn't the function
                      take an optional "PDF version" argument too? Or is there a minimum
                      standard that qualifies as "is a valid PDF"?

                      (I never work with PDFs, so am completely ignorant of this sort of thing).
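
On the "PDF version" question: a PDF file declares its version in a header comment such as `%PDF-1.3` at the start of the file. Below is a minimal sketch of reading that value. It is in Python, not CFML, and `pdf_header_version` is a hypothetical helper, not part of ColdFusion. It assumes the header appears within the first 1024 bytes, and it ignores the `/Version` entry that a document catalog may use to override the header in PDF 1.4 and later.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Search only the start of the file; 1024 bytes is an assumed window.
    match = re.search(rb"%PDF-(\d+)\.(\d+)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A caller could compare the result with a minimum such as `(1, 3)`. A matching header only says what the file claims to be, not that the rest of the file is well-formed.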

                      --
                      Adam
                      • 8. Re: isPDFFile() and CFPDF won't recognize a PDF
                        BKBK Adobe Community Professional & MVP
                        Adam Cameron wrote:
                        A relevant aside here... I presume docs created using a current
                        version of Acrobat will not necessarily open in older versions of
                        Acrobat, unless some sort of compatibility mode is used. In this
                        light, shouldn't the function take an optional "PDF version"
                        argument too? Or is there a minimum standard that qualifies
                        as "is a valid PDF"?


                        Good question. Readers from Adobe should take note, pun intended, naturally.