17 Replies Latest reply: Apr 11, 2012 12:52 PM by BKBK

    Can you find a string inside of a PDF using cfpdf in ColdFusion?

    Paiz Community Member

      Is it possible to determine whether a string, for example "Not Possible", exists in a PDF using CFPDF or another function in ColdFusion?  If so, any suggestions on how to do this would be appreciated.

       

      Thanks!

        • 1. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
          BKBK MVP

          Yes, it is possible. In the following example, the two files (textFromPDF.cfm and myDDX.ddx) and the PDF ('myDoc.pdf') are in the same directory.

           

          textFromPDF.cfm

           

          <!--- Convert from PDF to text and search text --->

          <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

          <cfset ddxfile = "#currentDir#myDDX.ddx">

          <cfset inputStruct=StructNew()>

          <cfset inputStruct.Doc1= "#currentDir#myDoc.pdf">

           

          <cfset outputStruct=StructNew()><!--- ColdFusion saves the extracted text as an XML file --->

          <cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

           

          <cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

           

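          <cfif myDDXVar.out1 is "successful"><!--- read the text and search it --->

              <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

              Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

          <cfelse><!--- the text variable does not exist if DDX processing failed --->

              DDX processing did not succeed: <cfoutput>#myDDXVar.out1#</cfoutput>

          </cfif>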

           

          myDDX.ddx

           

          <?xml version="1.0" encoding="UTF-8"?>

          <DDX xmlns="http://ns.adobe.com/DDX/1.0/"

             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

             xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

             <DocumentText result="Out1">

                <PDF source="Doc1"/>

             </DocumentText>

          </DDX>

          • 2. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
            Paiz Community Member

            BKBK,

             

            Thanks for your help.  You've gotten me off to a great start! 

             

            I keep getting a "DDX is invalid" error: "Check for invalid construct or restricted keywords."  Is it possible that your DDX is somehow malformed?

            • 3. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
              BKBK MVP

              I suspect you made the same mistake I did in the beginning. Note that there is a space before the word coldfusion in:

              "http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd"

              • 4. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                Paiz Community Member

                Thanks!  Actually I did end up eventually finding that space.

                 

                I think the issue I'm having is that I'm trying to do this from binary data.  I'm storing my PDFs in a database as BLOBs.  I can successfully read them out of the database, but I'm having trouble incorporating the binary BLOB into the example above.

                 

                I think it would be something like

                 

                <cfset inputStruct.Doc1= "#ToString(query.pdfBinaryVariable)#">

                 

                But I can't get that to work out.  Any thoughts?

                • 5. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                  Paiz Community Member

                  I decided to just write the file locally to see if that would help.  It successfully writes the file, and I can open the PDF with Adobe Reader.

                   

                  When I run the code, it never writes the my_PDF_doc_as_text.xml file.    The myDDXVar keeps reporting back failed. 

                   

                  When I dump myDDXVar, it says

                   

                  failed: 0, Size: 0

                  • 6. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                    BKBK MVP

                    Hmm, challenging construction! What about first writing the PDF to disk? I know it's inefficient, but let us first get a working example.

                     

                    <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

                    <cfset ddxfile = "#currentDir#myDDX.ddx">

                     

                    <cffile action="write" file="#currentDir#myNewDoc.pdf" output="#pdfBinaryVariable#" >

                     

                    <cfset inputStruct=StructNew()>

                    <cfset inputStruct.Doc1= "#currentDir#myNewDoc.pdf">

                     

                    <cfset outputStruct=StructNew()><!--- ColdFusion saves the extracted text as an XML file --->

                    <cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

                     

                    <cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

                     

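                    <cfif myDDXVar.out1 is "successful"><!--- read the text and search it --->

                        <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

                        <!---<cfdump var="#my_PDF_doc_as_text#">--->

                        Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

                    <cfelse><!--- the text variable does not exist if DDX processing failed --->

                        DDX processing did not succeed: <cfoutput>#myDDXVar.out1#</cfoutput>

                    </cfif>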

                    • 7. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                      Paiz Community Member

                      BKBK,

                       

                      Same result when I write the file to disk.

                       

                      When I dump myDDXVar, it says

                       

                      failed: 0, Size: 0

                      • 8. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                        BKBK MVP

                        Can you open the newly created PDF? Is its content what you expected?

                        • 9. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                          Paiz Community Member

                          Yes.  When I open the new PDF, it is exactly what I expect to come out of the DB.  I just can't get CF to produce the text file.

                          • 10. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                            BKBK MVP

                            Could you please show us your code?

                            • 11. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                              Paiz Community Member

                              I apologize!  I should have done that first!

                               

                              <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

                               

                              <cfset ddxfile = "#currentDir#myDDX.ddx">

                               

                              <cffile action="write" file="#currentDir#myNewDoc.pdf" output="#query.pdf#" >

                               

                              <cfset inputStruct=StructNew()>

                              <cfset inputStruct.Doc1= "#currentDir#myNewDoc.pdf">

                               

                              <cfset outputStruct=StructNew()><!--- ColdFusion saves the extracted text as an XML file --->

                              <cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

                               

                              <cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

                               

                              <br><cfoutput>#myDDXVar.Out1#</cfoutput><br>

                              <cfdump var="#myDDXVar#">

                               

                              <cfif myDDXVar.Out1 is "successful"><!--- read the text --->

                                  <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

                                  <!---<cfdump var="#my_PDF_doc_as_text#">--->

                              </cfif>

                               

                              Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

                              • 12. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                Paiz Community Member

                                My DDX file:

                                 

                                 

                                <?xml version="1.0" encoding="UTF-8"?>

                                <DDX xmlns="http://ns.adobe.com/DDX/1.0/"

                                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

                                   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

                                   <DocumentText result="Out1">

                                      <PDF source="Doc1"/>

                                   </DocumentText>

                                </DDX>

                                • 13. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                  Paiz Community Member

                                  Interesting.  

                                   

                                  I tried using a different PDF: just a random PDF I had on my file system, not one that came out of the DB.  With the random PDF, the code worked perfectly.

                                   

                                  Using the PDF that comes out of the DB gets me the error, even though that PDF opens without an issue in Adobe Reader.

                                   

                                  The random file has

                                   

                                  PDF Producer: Adobe PDF Library 9.9

                                  PDF Version: 1.6 (Acrobat 7.x)

                                   

                                  The file out of the database has

                                  PDF Producer: iText 2.0.2 (by lowagie.com)

                                  PDF Version: 1.4 (Acrobat 5.x)

                                   

                                   

                                   

                                  * I thought maybe it was a version issue.  I tried reading the DB file and using cfpdf to write it back to the file system as a different version.  Unfortunately, that failed as well.
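
For anyone comparing a working and a failing PDF like this, the same kind of producer and version details can probably be read from inside ColdFusion as well. The following is only a sketch: the two file names are hypothetical placeholders, and it simply dumps whatever cfpdf's getinfo action returns rather than assuming particular keys. isPDFFile() is a built-in check, but whether it flags a PDF that Adobe Reader tolerates is not something this thread establishes.

```cfml
<!--- Sketch only: randomDoc.pdf and myNewDoc.pdf are hypothetical file names --->
<cfset currentDir = getDirectoryFromPath(expandPath('*.*'))>
<cfset pdfPaths = [currentDir & "randomDoc.pdf", currentDir & "myNewDoc.pdf"]>

<cfloop array="#pdfPaths#" index="pdfPath">
    <!--- First ask ColdFusion whether it considers the file a valid PDF --->
    <cfoutput><p>#pdfPath#: isPDFFile = #isPDFFile(pdfPath)#</p></cfoutput>
    <cftry>
        <!--- Dump the document information struct so the two files can be compared --->
        <cfpdf action="getinfo" source="#pdfPath#" name="pdfInfo">
        <cfdump var="#pdfInfo#" label="#pdfPath#">
        <cfcatch type="any">
            <cfoutput><p>getinfo failed: #cfcatch.message#</p></cfoutput>
        </cfcatch>
    </cftry>
</cfloop>
```

If getinfo itself fails on one of the files, the caught message may give a more specific hint than the generic "failed" status from the DDX call.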

                                  • 14. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                    Paiz Community Member

                                    By the way,  CF9 has an extracttext action on cfpdf!

                                     

                                    <cfpdf action="read" source="pdfFile.pdf" name="mypdf">

                                    <cfpdf

                                        action="extracttext"

                                        source= "mypdf"

                                        pages = "*"

                                        type = "xml"

                                        destination = "#currentDir#testxml.xml" >

                                    • 15. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                      BKBK MVP

                                      Paiz wrote:

                                       

                                      By the way,  CF9 has an extracttext action on cfpdf!

                                       

                                      <cfpdf action="read" source="pdfFile.pdf" name="mypdf">

                                      <cfpdf

                                          action="extracttext"

                                          source= "mypdf"

                                          pages = "*"

                                          type = "xml"

                                          destination = "#currentDir#testxml.xml" >

                                      Ahhh, there's the kind of efficiency we want! I went with DDX from memory, as I had used it a lot in a project. I honestly didn't think of 'extractText'! Thanks for bringing it in and lightening the load.

                                       

                                      However, as I see it, the main problem remains how to get from the byte array in the database to the extracted text. What about something like this:

                                       

                                      <cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

                                       

                                      <cffile action="write" file="#currentDir#myNewDoc.pdf" output="#query.pdf#" >

                                      <cfpdf action="extracttext" source="#currentDir#myNewDoc.pdf" name="txtFromPdf">

                                       

                                      <p>Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",txtFromPdf)#</cfoutput></p>
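
A slightly more defensive variant of the snippet above, as a sketch only: it assumes, as in the posts above, that query.pdf holds the PDF as binary data and that the current directory is writable. The names pdfPath, searchText, and position are introduced here purely for illustration. Wrapping the extraction in cftry means a PDF that cfpdf cannot process produces a message instead of an unhandled error.

```cfml
<!--- Sketch only: assumes query.pdf is binary PDF data from the database --->
<cfset currentDir = getDirectoryFromPath(expandPath('*.*'))>
<cfset pdfPath = currentDir & "myNewDoc.pdf">
<cfset searchText = "Not Possible">

<!--- Write the binary data to disk so cfpdf can read it as a file --->
<cffile action="write" file="#pdfPath#" output="#query.pdf#">

<cftry>
    <cfpdf action="extracttext" source="#pdfPath#" name="txtFromPdf">
    <!--- findNoCase returns 0 when the text is not found --->
    <cfset position = findNoCase(searchText, txtFromPdf)>
    <cfoutput><p>Position of search text "#searchText#": #position#</p></cfoutput>
    <cfcatch type="any">
        <cfoutput><p>Could not extract text: #cfcatch.message#</p></cfoutput>
    </cfcatch>
</cftry>
```

A position of 0 means the string was not found; any other value is the 1-based position of the first match in the extracted text.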

                                      • 16. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                        Paiz Community Member

                                        BKBK,

                                         

                                        Unfortunately, I think the issue is with the PDF, not the code.  We use two different methods to generate the PDFs, depending on what we need.  One of those methods is an AFP2PDF process, and it appears those PDFs are somehow corrupted.  Adobe Reader opens them just fine, but internally they are somehow corrupted.  It's similar to how a web browser is forgiving and will still display a web page that has malformed HTML.

                                         

                                        I got the code to work with other PDFs, just not the ones I need it to work on.  I need to research how we generate those PDFs.

                                         

                                        Thanks again for all your help!

                                        • 17. Re: Can you find a string inside of a PDF using cfpdf in ColdFusion?
                                          BKBK MVP

                                          That must feel like a bit of a downer, after your discovery of the extractText action!