0 Replies Latest reply on Sep 2, 2010 2:25 PM by kevran

    Programmatically convert/extract text from PDF


      Hey there -- I have been struggling with this all week. I am trying to take a PDF that we are sent daily and extract its data (text) for placement in our database. I have tried multiple PHP classes and functions, as well as running a Perl script through PHP.


      The methods I used above worked for a sample PDF I downloaded from here: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm072322.pdf


      So the problem I am having is getting the PDF that our vendor sends us to convert as well. That PDF is generated with Amyuni PDF Converter, and the only difference I can see between the two PDFs is in the raw data when I open them in Notepad.


      Sample PDF:


      376 0 obj<< /Linearized 1 /O 379 /H [ 1063 556 ] /L 220094 /E 92903 /N 12 /T 212455 >> endobj

                                                           xref376 20 0000000016 00000 n



      Vendor PDF:

      %PDF-1.3%ÿÿÿÿ1 0 obj<</Title (þÿ I n t u i t _ Q B O B _ I n t e r n a l . p d f)/Producer (Amyuni PDF Converter version CreationDate (D:20100830160629-07'00')>>endobj7 0 obj<< /Length 8 0 R /Filter /FlateDecode >>streamxœ ›M®ã6 €O&#144;;ä õˆú³  ’¼dÑ]Ñwƒ)º
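
One thing the vendor dump does show is `/Filter /FlateDecode`, and the `xœ` right after `stream` is how Notepad renders the zlib header bytes 0x78 0x9C. So the page content in that file is zlib-compressed, and no text will be visible until the stream is inflated. Below is a minimal Python sketch of that step, not a full PDF parser. The function names `inflate_streams` and `literal_strings` are made up for illustration, and the sketch assumes plain FlateDecode streams with no encryption and no additional filters. Even after inflating, the strings may not be readable if the fonts use a custom encoding.

```python
import re
import zlib

# Bytes between the "stream" keyword (followed by CRLF or LF) and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Literal strings such as (Hello), allowing backslash escapes but not
# nested unescaped parentheses (a simplification of the real syntax).
STRING_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)")


def inflate_streams(pdf_bytes):
    """Return the decompressed bytes of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (for example an image using another filter).
            continue
    return results


def literal_strings(content):
    """Return the raw bytes of each parenthesised string in a content stream."""
    return [m.group(1) for m in STRING_RE.finditer(content)]
```

To try it, read the file in binary mode (for example `open("vendor.pdf", "rb").read()`, where `vendor.pdf` is a placeholder name), pass the bytes to `inflate_streams`, and run `literal_strings` on each result to see what the text-showing operators contain.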





      Is there anything out there that I might be able to use to convert this particular PDF document?