0 Replies Latest reply on May 28, 2010 5:21 PM by Ginney

    cfpdf action=extracttext


      In CF9, I am attempting to use cfpdf action=extracttext to pull text from a PDF document.  Extracting the text works fine, but I need to go a step further and extract it in a structured format.  That is, I need to preserve line breaks, centering, indents/tabs, etc.  I had hoped that using useStructure="true" along with honourspaces="true" would do the trick, but I cannot see that either has done anything.


      I have tried both of the following:


      <cfpdf pages="1" useStructure="true" honourspaces="true" type="xml" action="extracttext" source="pdfFile.pdf"  name="pdfXml" />

      <cfpdf pages="1" useStructure="true" honourspaces="true" type="string" action="extracttext" source="pdfFile.pdf"  name="pdfString" />


      It does not seem to matter whether I extract the text in string format or in XML format.  Either way, the text all runs together without any structure: no line breaks, no indents, etc.


      Am I missing something, or is it just impossible to obtain the basic structure of the PDF document when extracting the text?