Can you find a string inside of a PDF using cfpdf in ColdFusion?

Explorer, Apr 09, 2012

Is it possible to determine whether a string, for example "Not Possible", exists in a PDF using CFPDF or another function in ColdFusion? If so, any suggestions on how to do this would be appreciated.

Thanks!

Community Expert, Apr 10, 2012

Yes, it is possible. In the following example, the two files (textFromPDF.cfm and myDDX.ddx) and the PDF ('myDoc.pdf') are in the same directory.

textFromPDF.cfm

<!--- Convert from PDF to text and search text --->

<cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

<cfset ddxfile = "#currentDir#myDDX.ddx">

<cfset inputStruct=StructNew()>

<cfset inputStruct.Doc1= "#currentDir#myDoc.pdf">

<cfset outputStruct=StructNew()><!--- Coldfusion automatically saves the text as XML file --->

<cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

<cfif myDDXVar.out1 is "successful"><!--- read the text --->

    <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

</cfif>

Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

myDDX.ddx

<?xml version="1.0" encoding="UTF-8"?>

<DDX xmlns="http://ns.adobe.com/DDX/1.0/"

   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

   <DocumentText result="Out1">

      <PDF source="Doc1"/>

   </DocumentText>

</DDX>

Explorer, Apr 10, 2012

BKBK,

Thanks for your help. You've gotten me off to a great start!

I keep getting a "DDX is invalid" error that says to check for an invalid construct or restricted keywords. Is it possible that your DDX is somehow malformed?

Community Expert, Apr 10, 2012

I suspect you made the same mistake I did in the beginning. Note that there is a space before the word coldfusion in:

"http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd"
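
One way to check the DDX before calling cfpdf is ColdFusion's isDDX() function, which reports whether a file is valid DDX. The following is only a sketch, assuming myDDX.ddx sits next to the template as in the example above. The thread does not confirm whether isDDX() flags this particular missing-space mistake, so treat it as a first check rather than a guarantee.

```cfml
<!--- Sketch: validate the DDX file before handing it to cfpdf. --->
<!--- Assumes myDDX.ddx is in the same directory as this template. --->
<cfset currentDir = getDirectoryFromPath(getCurrentTemplatePath())>
<cfset ddxfile = currentDir & "myDDX.ddx">

<cfif isDDX(ddxfile)>
    <p>The DDX file is valid.</p>
<cfelse>
    <p>The DDX file is not valid. Check the namespace and the xsi:schemaLocation value,
    including the space between its two parts.</p>
</cfif>
```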

Explorer, Apr 10, 2012

Thanks! I did eventually find that space.

I think the issue I'm having is that I'm trying to do this from binary data. I'm storing my PDFs in a database as blobs. I can successfully read them out of the database, but I'm having trouble incorporating the binary blob into the example above.

I think it would be something like

<cfset inputStruct.Doc1= "#ToString(query.pdfBinaryVariable)#">

But I can't get that to work out.  Any thoughts?

Explorer, Apr 10, 2012

I decided to just write the file locally to see if that would help. It writes the file successfully, and I can open the PDF with Adobe Reader.

When I run the code, it never writes the my_PDF_doc_as_text.xml file. myDDXVar keeps reporting a failure.

When I dump myDDXVar, it says

failed: 0, Size: 0

Community Expert, Apr 10, 2012

Hmm, challenging construction! What about first writing the PDF to disk? I know it's inefficient, but let us first get a working example.

<cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

<cfset ddxfile = "#currentDir#myDDX.ddx">

<cffile action="write" file="#currentDir#myNewDoc.pdf" output="#pdfBinaryVariable#" >

<cfset inputStruct=StructNew()>

<cfset inputStruct.Doc1= "#currentDir#myNewDoc.pdf">

<cfset outputStruct=StructNew()><!--- Coldfusion automatically saves the text as XML file --->

<cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

<cfif myDDXVar.out1 is "successful"><!--- read the text --->

    <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

    <!---<cfdump var="#my_PDF_doc_as_text#">--->

</cfif>

Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

Explorer, Apr 10, 2012

BKBK,

Same result when I write the file to disk.

When I dump myDDXVar, it says

failed: 0, Size: 0

Community Expert, Apr 10, 2012

Can you open the newly created PDF? Is its content what you expected?

Explorer, Apr 10, 2012

Yes. When I open the new PDF, it is exactly what I expect to come out of the DB. I just can't get CF to produce the text file.

Community Expert, Apr 10, 2012

Could you please show us your code?

Explorer, Apr 10, 2012

I apologize! I should have done that first!

<cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

<cfset ddxfile = "#currentDir#myDDX.ddx">

<cffile action="write" file="#currentDir#myNewDoc.pdf" output="#query.pdf#" >

<cfset inputStruct=StructNew()>

<cfset inputStruct.Doc1= "#currentDir#myNewDoc.pdf">

<cfset outputStruct=StructNew()><!--- Coldfusion automatically saves the text as XML file --->

<cfset outputStruct.Out1="#currentDir#my_PDF_doc_as_text.xml">

<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

<br><cfoutput>#myDDXVar.Out1#</cfoutput><br>

<cfdump var="#myDDXVar#">

<cfif myDDXVar.Out1 is "successful"><!--- read the text --->

    <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">

    <!---<cfdump var="#my_PDF_doc_as_text#">--->

</cfif>

Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",my_PDF_doc_as_text)#</cfoutput>

Explorer, Apr 10, 2012

My DDX file:

<?xml version="1.0" encoding="UTF-8"?>

<DDX xmlns="http://ns.adobe.com/DDX/1.0/"

   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

   <DocumentText result="Out1">

      <PDF source="Doc1"/>

   </DocumentText>

</DDX>

Explorer, Apr 10, 2012

Interesting.

I tried a different PDF: a random one from my file system, not one coming out of the DB. With the random PDF, the code worked perfectly.

With the PDF coming out of the DB, I get the error, even though it opens without an issue in Adobe Reader.

The random file has

PDF Producer: Adobe PDF Library 9.9

PDF Version: 1.6 (Acrobat 7.x)

The files out of the database have

PDF Producer: iText 2.0.2 (by lowagie.com)

PDF Version: 1.4 (Acrobat 5.x)

* I thought it might be a version issue. I tried reading the DB file and using cfpdf to write it back to the file system as a different version. Unfortunately, that failed as well.

Explorer, Apr 10, 2012

By the way,  CF9 has an extracttext action on cfpdf!

<cfpdf action="read" source="pdfFile.pdf" name="mypdf">

<cfpdf

    action="extracttext"

    source= "mypdf"

    pages = "*"

    type = "xml"

    destination = "#currentDir#testxml.xml" >

Community Expert, Apr 10, 2012

Paiz wrote:

By the way,  CF9 has an extracttext action on cfpdf!

<cfpdf action="read" source="pdfFile.pdf" name="mypdf">

<cfpdf

    action="extracttext"

    source= "mypdf"

    pages = "*"

    type = "xml"

    destination = "#currentDir#testxml.xml" >

Ahhh, there's the kind of efficiency we want! I went with DDX from memory, as I had used it a lot in a project. I honestly didn't think of 'extractText'! Thanks for bringing it in and lightening the load.

However, as I see it, the main problem remains how to get from the byte array in the database to the text file. What about something like this:

<cfset currentDir = getDirectoryFromPath(expandpath('*.*'))>

<cffile action="write" file="#currentDir#myNewDoc.pdf" output="#query.pdf#" >

<cfpdf action="extracttext" source="#currentDir#myNewDoc.pdf" name="txtFromPdf">

<p>Position of search text "Not Possible": <cfoutput>#findNoCase("Not Possible",txtFromPdf)#</cfoutput></p>
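
A slightly more defensive variant of the same idea is sketched below. It assumes, as in the code above, that query.pdf holds the PDF as binary data. The temp-file name, the tempPdf and position variables, and the error message are illustrative choices, not something from the thread. Because a failed extraction leaves txtFromPdf as an empty string, the final search reports 0 instead of throwing an undefined-variable error.

```cfml
<!--- Sketch: write the blob to a uniquely named temp file, extract its text, then clean up. --->
<!--- Assumes query.pdf holds the PDF as binary data, as in the code above. --->
<cfset tempPdf = getTempDirectory() & createUUID() & ".pdf">
<cfset txtFromPdf = "">

<cftry>
    <cffile action="write" file="#tempPdf#" output="#query.pdf#">
    <cfpdf action="extracttext" source="#tempPdf#" name="txtFromPdf">
    <cfcatch type="any">
        <!--- Extraction failed: txtFromPdf stays empty, and the reason is shown. --->
        <cfoutput><p>Could not extract text: #htmlEditFormat(cfcatch.message)#</p></cfoutput>
    </cfcatch>
    <cffinally>
        <!--- Remove the temp file whether or not extraction succeeded. --->
        <cfif fileExists(tempPdf)>
            <cffile action="delete" file="#tempPdf#">
        </cfif>
    </cffinally>
</cftry>

<!--- toString() guards against the result not being a plain string. --->
<cfset position = findNoCase("Not Possible", toString(txtFromPdf))>
<cfoutput><p>Position of search text "Not Possible": #position#</p></cfoutput>
```

Note that the extracted text is wrapped in XML markup by default, so a position greater than 0 only shows that the phrase occurs somewhere in that output.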

Explorer, Apr 11, 2012

BKBK,

Unfortunately, I think the issue is with the PDF, not the code. We use two different methods to generate the PDFs, depending on what we need. One of those methods is an AFP2PDF process, and it appears those PDFs are somehow corrupted. Adobe Reader opens them just fine, but internally they are somehow corrupted, similar to how a web browser is forgiving and will still display a page with malformed HTML.

I got the code to work with other PDFs, just not the ones I need it to work on. I need to research how we generate those PDFs.
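
For anyone sorting files in a similar situation, ColdFusion's isPDFFile() function offers a quick first check of whether the server recognizes a file as a PDF. The sketch below uses a hypothetical pdfPaths array with file names borrowed from earlier in the thread. A file that passes this check may still fail text extraction, so it narrows the problem down without settling it.

```cfml
<!--- Sketch: report which files ColdFusion recognizes as PDFs. --->
<!--- pdfPaths is a hypothetical list; replace it with the files you want to check. --->
<cfset pdfPaths = [expandPath("myDoc.pdf"), expandPath("myNewDoc.pdf")]>

<cfloop array="#pdfPaths#" index="pdfPath">
    <cfoutput><p>#htmlEditFormat(pdfPath)#:
    <cfif NOT fileExists(pdfPath)>
        file not found
    <cfelseif isPDFFile(pdfPath)>
        recognized as a PDF
    <cfelse>
        not recognized as a PDF
    </cfif>
    </p></cfoutput>
</cfloop>
```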

Thanks again for all your help!

Community Expert, Apr 11, 2012

That must feel like a bit of a downer, after your discovery of the extractText action!
