2 Replies Latest reply on Jul 6, 2011 2:40 PM by Joe The Engineer

    Using Excel VBA to save PDF file as a text (plain) file?

    Unprintable Color

      I'm using 8.0 Acrobat Pro.

       

      Third party apps are not an option (I'm restricted from installing anything on my office PC).

       

      I could simply save each PDF manually, but given the number of files this would not be practical.

       

      The text it will output does not have to be pretty by any means (and I'm aware it won't be).

       

      As long as I can get the PDF files in a text file format, there would be no further obstacles for me - because I would then be able to get Excel to gather-import all the data and consolidate-clean-organize everything accordingly at that point.

       

      What's in these PDF files are simply names, numbers, and email addresses (basic office text type data stuff simple).

       

      I wish to save them all as text files, and then use Excel to suck all the data from those newly created text files and consolidate-organize them accordingly.

       

      The VBA module I've seen referrenced around the web is the following beneath; however, I am having no luck with it.

       

       

       

      I keep getting the error message:

      ActiveX component can't create object or  return reference to this object (Error 429)

       

      I am running Microsoft Office 2010 | Excel (and using Adobe Acrobat Pro 8.0).

       

      Any help is extremely appreciated.

       

      Here is the VB code in Excel I'm having trouble with:  What this is supposed to do is simply find the specified PDF file and save it as a Plain Txt File.

       

       

      Sub Convert_PDF_To_Text_File()

      '---------------------------------

      Dim AcroXApp As Object

      Dim AcroXAVDoc As Object

      Dim AcroXPDDoc As Object

       

      Set AcroXApp = CreateObject("AcroExch.App")

      AcroXApp.Hide

       

      Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")

      AcroXAVDoc.Open strPDFPath, "Acrobat"

      AcroXAVDoc.BringToFront

       

      Set AcroXPDDoc = AcroXAVDoc.GetPDDoc

       

      Dim jsObj As Object

      Set jsObj = AcroXPDDoc.GetJSObject

       

      jsObj.SaveAs strOutputFile, "com.adobe.acrobat.plain-text"

       

      AcroXAVDoc.Close False

      AcroXApp.Exit

      '---------------------------------

      End Sub

        • 1. Re: Using Excel VBA to save PDF file as a text (plain) file?
          Joe The Engineer

          Have you referenced the adobe/acrobat type library in the VBA editor?  (Tools -->References ----->  Whichever type library references all that PDF junk)  I can't remember which type library it is...maybe AcroBrokerLib or maybe it's just Acrobat.  Either way, if you're not referencing it in your code, then your VB code doesn't know what to do with it.  First, I'd say try referencing the Adobe Acrobat 8.0 Type Library....

          • 2. Re: Using Excel VBA to save PDF file as a text (plain) file?
            Joe The Engineer Level 1

            one other thought is that you can create text files with a batch process inside of acrobat.  Advanced ---> Document Processing ----> Batch Processing.

             

            Choose new sequence, give it a name

             

            Step 1: Select Commands ----> Choose "execute javascript", then hit "Add."  Click on "Execute Javascript" that was just added on the right, then hit the edit button and toss in the script below that I grabbed from the javascript samples (also, if your docs have multiple pages you'll need to embed the for loop into another for loop that cycles through each page).  You'd also want to change the name of the saved txt file to the name of pdf you're doing it to.  It shouldn't be that hard, but I'm guessing you're probably alot more familiar with VBA then javascript which is why you're choosing to do it from excel.  I'm kind of in the same boat of having written a bunch of VBA, but not much javascript.  I highly recommend investing some time into learning acrobat scripting b/c VB does have it's limitations when it comes to messing with PDF's.

             

            /**

            * function to extract the text content of the current page and save to a file.

            */

             

            try  {

            var p = this.pageNum;

            var n = this.getPageNumWords(p);

            app.alert("Number of words in the page: " + n);

             

            var str = "";

            for(var i=0;i<n;i++) {

            var wd = this.getPageNthWord(p, i, false);  

            if(wd != "") str = str + wd;  

            }

             

            // save the string into a data object

            this.createDataObject("whatever.txt",str); 

             

            // pop up a file selection box to export the data

            this.exportDataObject("whatever.txt");

             

            // clean up

            this.removeDataObject("whatever.txt");

             

            } catch (e)  { 

            app.alert(e)

            };