1 Reply Latest reply on Jul 12, 2017 11:43 AM by try67

    How to extract Form table from PDF to Excel using JavaScript

    AinulHasan072 Level 1

      Hi,

      I have been trying to extract a certain set of tables, containing the transactions and the portfolio value, from my monthly portfolio statements. I tried converting the entire PDF document with the export feature in Acrobat DC, but the output was so badly formatted that cleaning it up would take more time than it saves. Right now I have to copy the data from these PDF statements manually, and there are hundreds of them.

      Can someone please tell me whether there is a way, without using any third-party software, to extract only these tables from the statements?

      Statement details are as follows:

      1. It has tables named 'Account Holding', 'Account Activity Details', 'Account Summary', 'Product Summary', 'Income Summary', and 'Realized Gain/Loss Summary'. Underneath each heading is the data that belongs to it.

      2. There is text such as the date range, e.g. 'April 1, 2015 - April 30, 2015', on the top left, and likewise 'Account #' on the top right with the number below it. The other important text is the holding party's name, just slightly below the date on the top right of the page. These details are common to every page.

      All I want is to extract only these tables and get them aligned in Excel, so that I can use them for further analysis. Can someone please help me with JavaScript, or any other means, to sort out this issue? That would be very helpful.

      If this can be done with VBA code, then please help me with that too. Thanks a lot in advance. Peace.

        • 1. Re: How to extract Form table from PDF to Excel using JavaScript
          try67 MVP & Adobe Community Professional

          There's no such thing as a table in a PDF file. The text and graphics are just aligned to look like a table, but it's not a real object as such.

          Therefore, it is very difficult (and sometimes impossible) to extract such data in a reliable manner. Acrobat does a pretty good job of it most of the time, but doing it with a script is extremely challenging. Without seeing some sample files, it's impossible to say for sure whether it can be done.
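
          To illustrate why this is hard, here is a rough sketch of what a script would have to do: since there is no table object, it can only read each word together with its position and guess rows and columns from the alignment. The helper names (groupWordsIntoRows, splitRowIntoCells, toCsvLine, collectWords), the word shape { text, left, right, y }, and the tolerance values are all made up for this example. collectWords relies on the Acrobat methods getPageNumWords, getPageNthWord and getPageNthWordQuads, and it assumes the document's security settings allow text extraction. Treat it as a starting point under those assumptions, not a tested solution for these statements.

```javascript
// Sketch only: rebuild table-like rows from words and their page positions.
// A "word" is a plain object { text, left, right, y } (hypothetical shape).

// Group words whose vertical position is within `tolerance` points of the
// first word of a row. PDF y grows upward, so larger y means higher on the page.
function groupWordsIntoRows(words, tolerance) {
  var sorted = words.slice().sort(function (a, b) {
    return b.y - a.y || a.left - b.left;
  });
  var rows = [];
  sorted.forEach(function (word) {
    var last = rows[rows.length - 1];
    if (last && Math.abs(last.y - word.y) <= tolerance) {
      last.words.push(word);
    } else {
      rows.push({ y: word.y, words: [word] });
    }
  });
  // Within each row, order the words left to right.
  rows.forEach(function (row) {
    row.words.sort(function (a, b) { return a.left - b.left; });
  });
  return rows;
}

// Join neighbouring words into one cell; a horizontal gap of at least
// `minGap` points is taken as a column boundary (a guess, not a rule).
function splitRowIntoCells(row, minGap) {
  var cells = [];
  var prev = null;
  row.words.forEach(function (word) {
    if (prev && word.left - prev.right < minGap) {
      cells[cells.length - 1] += " " + word.text;
    } else {
      cells.push(word.text);
    }
    prev = word;
  });
  return cells;
}

// Quote every cell so commas inside amounts do not break the CSV.
function toCsvLine(cells) {
  return cells.map(function (cell) {
    return '"' + String(cell).replace(/"/g, '""') + '"';
  }).join(",");
}

// Acrobat-only part (not runnable outside Acrobat): read the words of one
// page. `doc` is the Doc object, e.g. `this` in a document-level script.
function collectWords(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    // bStrip = false keeps punctuation; trailing white space is trimmed here.
    var text = doc.getPageNthWord(pageNum, i, false).replace(/\s+$/, "");
    var quad = doc.getPageNthWordQuads(pageNum, i)[0]; // 8 numbers: 4 corners
    var xs = [quad[0], quad[2], quad[4], quad[6]];
    var ys = [quad[1], quad[3], quad[5], quad[7]];
    words.push({
      text: text,
      left: Math.min.apply(null, xs),
      right: Math.max.apply(null, xs),
      y: Math.min.apply(null, ys) // bottom edge, used as an approximate baseline
    });
  }
  return words;
}

// Turn one page into CSV text, one line per detected row.
function pageToCsv(doc, pageNum, rowTolerance, minGap) {
  return groupWordsIntoRows(collectWords(doc, pageNum), rowTolerance)
    .map(function (row) { return toCsvLine(splitRowIntoCells(row, minGap)); })
    .join("\n");
}
```

          In Acrobat you could try something like pageToCsv(this, 0, 2, 10) from the JavaScript console and adjust the two numbers until the rows and columns look right for one statement. Wrapped cell text, empty cells and columns that sit close together will all confuse this kind of guessing, which is exactly why the result depends so much on the actual files.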