0 Replies Latest reply on Sep 2, 2010 2:25 PM by kevran

    Programmatically convert/extract text from PDF

    kevran

      Hey there -- I have been struggling with this all week. I am trying to take a PDF that we are sent daily and extract its data (text) so it can be loaded into our database. I have tried multiple PHP classes and functions, as well as running a Perl script through PHP.

       

      The methods above worked for a sample PDF I downloaded from here: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm072322.pdf

       

      The problem is getting the PDF that our vendor sends us to convert as well. That document is generated with Amyuni PDF Converter version 4.0.0.7, and the only difference I can see between the two PDFs is in the raw data when I open them in Notepad.

       

      Sample PDF:

      %PDF-1.3%âãÏÓ

      376 0 obj<< /Linearized 1 /O 379 /H [ 1063 556 ] /L 220094 /E 92903 /N 12 /T 212455 >> endobj

                                                           xref376 20 0000000016 00000 n

       

       

      Vendor PDF:

      %PDF-1.3%ÿÿÿÿ1 0 obj<</Title (þÿ I n t u i t _ Q B O B _ I n t e r n a l . p d f)/Producer (Amyuni PDF Converter version 4.0.0.7)/ CreationDate (D:20100830160629-07'00')>>endobj7 0 obj<< /Length 8 0 R /Filter /FlateDecode >>streamxœ ›M®ã6 €O&#144;;ä õˆú³  ’¼dÑ]Ñwƒ)º
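
For reference, the `/Filter /FlateDecode` entry in the vendor header above means that object's stream data is zlib/deflate-compressed, so its text is not readable directly in Notepad. Below is a minimal sketch for inflating such streams in order to inspect them. It is written in Python rather than PHP purely for illustration. The function name `inflate_streams` and the regex-based scan are assumptions of this sketch, not part of any PDF library, and a proper parser would use each stream's `/Length` entry instead of searching for `endstream`.

```python
import re
import zlib

# Captures the raw bytes between the "stream" and "endstream" keywords.
# Assumption of this sketch: the keywords never occur inside the stream data.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated bytes of every stream that looks like zlib data."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        # decompressobj() tolerates the end-of-line bytes that sit between
        # the compressed data and the "endstream" keyword.
        decompressor = zlib.decompressobj()
        try:
            inflated.append(decompressor.decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (unfiltered or otherwise encoded); skip it.
            continue
    return inflated
```

Reading the file in binary mode, for example `open(path, "rb").read()` with `path` pointing at the vendor PDF, and passing the bytes to `inflate_streams` should expose the page content operators. The text inside may still be encoded through font-specific character maps, so this helps with inspection rather than being a complete extractor.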

       

       

       

       

      Is there anything out there that I might be able to use to convert this particular PDF document?

       

      Thanks!

       

      Kevin