Paiz

Can you find a string inside of a PDF using cfpdf in ColdFusion?

Apr 9, 2012 3:06 PM

Tags: #pdf #coldfusion #cfpdf

Is it possible to determine whether a string, for example "Not Possible", exists in a PDF using CFPDF or another function in ColdFusion? If so, any suggestions on how to do this would be appreciated.

 

Thanks!

 
Replies
  • Apr 10, 2012 1:44 AM in reply to Paiz

    Yes, it is possible. In the following example, the two files (textFromPDF.cfm and myDDX.ddx) and the PDF (myDoc.pdf) are all in the same directory.

     

    textFromPDF.cfm

     

    <!--- Convert from PDF to text and search text --->

    <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

    <cfset ddxfile = "#currentDir#myDDX.ddx">

    <cfset inputStruct=StructNew()>

    <cfset inputStruct.Doc1= "#currentDir#myDoc.pdf">

     

    <cfset outputStruct=StructNew()><!--- ColdFusion saves the extracted text as an XML file --->

    <cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

     

    <cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

     

    <cfif myDDXVar.out1 is "successful"><!--- read the text and search it only if the DDX step succeeded --->

        <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

        Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

    <cfelse>

        Text extraction did not succeed, so there is nothing to search.

    </cfif>

     

    myDDX.ddx

     

    <?xml version="1.0" encoding="UTF-8"?>

    <DDX xmlns="http://ns.adobe.com/DDX/1.0/"

       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

       xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

       <DocumentText result="Out1">

          <PDF source="Doc1"/>

       </DocumentText>

    </DDX>
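    One caveat with searching for the exact string: if "Not Possible" wraps onto a new line in the PDF, the extracted text may contain a line break rather than a single space between the two words, and findNoCase would then return 0. A regular expression that accepts any run of whitespace between the words is a more tolerant option. This is a minimal sketch; the sample string stands in for my_PDF_doc_as_text, and the names textToSearch, searchPattern and matchPosition are illustrative only.

```cfml
<!--- Illustrative sample text: in practice, search my_PDF_doc_as_text instead --->
<cfset textToSearch = "The answer is Not#chr(10)#Possible today.">

<!--- \s+ matches one or more spaces, tabs or line breaks between the two words --->
<cfset searchPattern = "Not\s+Possible">

<!--- reFindNoCase returns the 1-based start position of the match, or 0 if there is none --->
<cfset matchPosition = reFindNoCase(searchPattern, textToSearch)>

<cfoutput>Position of search text "Not Possible": #matchPosition#</cfoutput>
```

    With the sample string, matchPosition is 15, whereas findNoCase("Not Possible", textToSearch) returns 0 because of the line break.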

     
  • Apr 10, 2012 1:54 PM in reply to Paiz

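    I suspect you made the same mistake I did in the beginning. Note that there is a space before the word coldfusion in the schemaLocation value. xsi:schemaLocation takes a namespace URI followed by a schema location, separated by whitespace, so the space is required: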

    "http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd"

     
  • Apr 10, 2012 2:38 PM in reply to Paiz

    Hmm, challenging construction! What about first writing the PDF to disk? I know it's inefficient, but let us first get a working example.

     

    <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

    <cfset ddxfile = "#currentDir#myDDX.ddx">

     

    <!--- pdfBinaryVariable holds the PDF bytes, e.g. the byte array read from your database --->
    <cffile action="write" file="#currentDir#myNewDoc.pdf" output="#pdfBinaryVariable#" >

     

    <cfset inputStruct=StructNew()>

    <cfset inputStruct.Doc1= "#currentDir#myNewDoc.pdf">

     

    <cfset outputStruct=StructNew()><!--- ColdFusion saves the extracted text as an XML file --->

    <cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

     

    <cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

     

    <cfif myDDXVar.out1 is "successful"><!--- read the text and search it only if the DDX step succeeded --->

        <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

        <!---<cfdump var="#my_PDF_doc_as_text#">--->

        Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

    <cfelse>

        Text extraction did not succeed, so there is nothing to search.

    </cfif>

     
  • Apr 10, 2012 2:49 PM in reply to Paiz

    Can you open the newly created PDF? Is its content what you expected?

     
  • Apr 10, 2012 2:57 PM in reply to Paiz

    Could you please show us your code.

     
  • Apr 10, 2012 11:44 PM in reply to Paiz

    Paiz wrote:

     

    By the way, CF9 has an extracttext action on cfpdf!

     

    <cfpdf action="read" source="pdfFile.pdf" name="mypdf">

    <cfpdf

        action="extracttext"

        source= "mypdf"

        pages = "*"

        type = "xml"

        destination = "#currentDir#testxml.xml" >

    Ahhh, there's the kind of efficiency we want! I went with DDX from memory, as I had used it a lot in a project. I honestly didn't think of 'extractText'! Thanks for bringing it in and lightening the load.

     

    However, as I see it, the main problem remains how to get from the byte array in the database to the text. What about something like this:

     

    <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

     

    <!--- query.pdf is the query column that holds the PDF bytes --->
    <cffile action="write" file="#currentDir#myNewDoc.pdf" output="#query.pdf#" >

    <cfpdf action="extracttext" source="#currentDir#myNewDoc.pdf" name="txtFromPdf">

     

    <p>Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",txtFromPdf)#</cfoutput></p>
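    If this has to run for many database rows, the same steps can be wrapped in a function. This is a minimal sketch, assuming ColdFusion 9 (for cffinally and the extracttext action) and the type="string" option of extracttext; pdfContainsText is a hypothetical name, not a built-in. It writes the bytes to a uniquely named file in the temporary directory instead of the web root, and always deletes that file afterwards.

```cfml
<!--- Hypothetical helper: returns true if searchText occurs (case-insensitively) in the PDF bytes --->
<cffunction name="pdfContainsText" returntype="boolean" output="false">
    <cfargument name="pdfBinary" type="binary" required="true">
    <cfargument name="searchText" type="string" required="true">

    <!--- Uniquely named temporary file, so concurrent requests do not overwrite each other --->
    <cfset var tempPdf = getTempDirectory() & createUUID() & ".pdf">
    <cfset var pdfText = "">

    <cftry>
        <!--- Write the byte array (e.g. from the database) to disk --->
        <cffile action="write" file="#tempPdf#" output="#arguments.pdfBinary#">

        <!--- Extract the text of all pages into the local variable pdfText --->
        <cfpdf action="extracttext" source="#tempPdf#" pages="*" type="string" name="pdfText">

        <cffinally>
            <!--- Remove the temporary file whether or not extraction succeeded --->
            <cfif fileExists(tempPdf)>
                <cffile action="delete" file="#tempPdf#">
            </cfif>
        </cffinally>
    </cftry>

    <cfreturn findNoCase(arguments.searchText, pdfText) GT 0>
</cffunction>
```

    Usage, with the same query column as above: <cfoutput>#pdfContainsText(query.pdf, "Not Possible")#</cfoutput>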

     
  • Apr 11, 2012 12:52 PM in reply to Paiz

    That must feel like a bit of a downer, after your discovery of the extractText action!

     
