Converting Election PDFs to Excel

Nov 25, 2012 8:19 PM

Tags: #converting #convert_pdf_to_word_document #acrobat_xi

When I convert several different election PDFs to Excel, the conversion does not come out correctly. Individual rows in the PDFs are grouped together in sets of three in the corresponding Excel file, which makes the resulting Excel file very difficult to work with.

 

For example, try converting the election PDF linked from this page: http://www.acgov.org/rov/current.htm

 

The result in Excel looks like this (note how three rows are grouped in each cell):

[Each block of three lines below is a single spreadsheet row; every cell in it holds three stacked values.]

200200 - Election Day Reporting  715   256   35.80   227    4   1   19   2   1   2   0
200300 - Vote by Mail Reporting  589   317   53.82   283    2   1   23   6   0   1   0
200300 - Election Day Reporting  589   168   28.52   148    5   0   13   0   1   1   0

200600 - Vote by Mail Reporting  684   353   51.61   312   13   1   21   2   1   2   1
200600 - Election Day Reporting  684   220   32.16   186    7   0   23   2   1   0   0
200700 - Vote by Mail Reporting  671   347   51.71   305   12   1   25   1   2   1   0

200700 - Election Day Reporting  671   199   29.66   185    1   0    9   1   2   0   0
200800 - Vote by Mail Reporting  617   333   53.97   302    0   0   25   4   0   2   0
200800 - Election Day Reporting  617   188   30.47   170    7   0    6   4   1   0   0

 

I have an Adobe account and used the online converter. I also had problems converting such files with Adobe Acrobat Pro XI.

 

Are there any options or other tricks that can be used to convert such files?

 

Thanks

 
Replies
  • Nov 27, 2012 11:20 AM   in reply to g_choquette

    It looks like your first column just needs to be widened a bit in Excel to display all the text. I just converted a PDF page to Excel, and it looks exactly like the PDF table. Here is a snapshot:

     

    [Screenshot: excel.png]

     
  • Nov 27, 2012 1:28 PM   in reply to LoriAUC

    Lori,

     

    If you change the row height to, say, 27 or 30, do you see multiple logical rows/lines in a single row of the spreadsheet?

    .

    Be well...

     
  • Nov 27, 2012 1:32 PM   in reply to g_choquette

    Short response:
    "You cannot make a silk purse from a sow's ear."


    Expanded response:
    I just looked at "sovc2012-11-06.pdf".
    From looking at the PDF and at the content exported to Excel:
    The Excel file has multiple rows in a single Excel row.
    re: (note how three rows are grouped in each cell)
    .
    Root cause of the exported content needing cleanup: the PDF is not a tagged PDF.
    Viewing the content, one "sees" and extrapolates the intended logical hierarchy.
    However, the PDF content has no logical hierarchy imposed, as it is not a well-formed tagged PDF.
    ("well-formed" as defined by ISO 14289-1, the ISO standard for PDF/UA).
    .
    Export needs the content's logical hierarchy to be known.
    When it is not present, heuristics are used. How effective they are depends on how the content was created and processed to PDF.
    The PDF I viewed is intended to represent tabular data.
    That is not how the content in the PDF was created/painted.
    .
    If you have Acrobat Pro, open the PDF, then open the Content Panel.
    Ignore the "Path" entries.
    Select a "Text:" entry.
    From the Panel's Options menu, select the "Highlight" choice.
    Now move down, selecting "Text:" entries for a given page.
    Observe how the highlighted content bounces about to different locations on the page.
    Whatever application was used to feed Distiller (which produced the PDF) did not provide the requisite logical hierarchy.
    .
    The page bounce I mentioned displays the order in which textual content was painted to the PDF page.
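To illustrate why paint order matters: with no <Table> structure to follow, an exporter has to guess rows from where each text fragment sits on the page. Below is a minimal Python sketch of one such heuristic, which groups fragments whose baselines fall within a tolerance of each other. It is not Acrobat's actual algorithm; the TextRun type, the coordinates, and the 2-point tolerance are hypothetical, and only the strings come from the data posted above.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A text fragment as painted on the page (hypothetical structure)."""
    x: float   # left edge, in points
    y: float   # baseline, in points; PDF y grows upward from the page bottom
    text: str


def group_into_rows(runs, y_tolerance=2.0):
    """Guess visual rows from text runs supplied in paint order.

    Runs are visited top to bottom; a run joins the current row when its
    baseline is within y_tolerance of that row's first run. Each row's
    cells are then ordered left to right.
    """
    rows = []  # list of (row_baseline, [runs in that row])
    for run in sorted(runs, key=lambda r: -r.y):
        if rows and abs(rows[-1][0] - run.y) <= y_tolerance:
            rows[-1][1].append(run)
        else:
            rows.append((run.y, [run]))
    return [[r.text for r in sorted(cells, key=lambda r: r.x)] for _, cells in rows]


if __name__ == "__main__":
    # Hypothetical coordinates, listed in a "bouncing" paint order.
    painted = [
        TextRun(300, 700, "589"),
        TextRun(20, 712, "200200 - Election Day Reporting"),
        TextRun(20, 700, "200300 - Vote by Mail Reporting"),
        TextRun(300, 712, "715"),
    ]
    for row in group_into_rows(painted):
        print(row)
    # ['200200 - Election Day Reporting', '715']
    # ['200300 - Vote by Mail Reporting', '589']
```

With a looser tolerance (say y_tolerance=20), both lines collapse into a single row holding four cells. That is only a toy version, not a diagnosis, of how separate logical rows can end up bundled together when a converter has nothing but positions to go on.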
    .
    A consequence of how the PDF source content was mastered, and of how it was processed to PDF, is that rows we see/know to be logically discrete get bundled during export and placed into a single row.
    This would not have occurred had the PDF content been created as proper tabular data, with correct application of the PDF <Table> element and its child elements/tags.
    .
    If this PDF had been mastered properly, with appropriate tag management, the tagged output PDF would have had a well-formed <Table> element with proper child elements.
    An export of this to Excel would've required little to no cleanup in the Excel file.
    .
    Unfortunately, you do not have the "well-formed tagged PDF".
    Consequently, ExportPDF and Acrobat's export are hampered.
    n.b., manually tagging to provide a proper <Table> is, typically, not practicable.
    For this particular content it'd be a waste of life-minutes.
    .
    Unfortunately, you'll have to perform manual cleanup of the content exported to Excel to get where you want to be.
    As good as ExportPDF and Acrobat's export are, they are constrained by what they are given as input.
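For that manual cleanup, a small script can handle the most tedious part. The following is a minimal Python sketch, assuming the exported sheet is first saved from Excel as CSV and that the stacked values in each cell are separated by line breaks (which is what the pasted output in the original post suggests). The script name, the file names, and the UTF-8 default are assumptions; Excel may save CSV in another encoding, so pass the right one if needed. It splits every multi-line cell so that each logical row gets its own spreadsheet row.

```python
import csv
import sys


def split_stacked_rows(rows):
    """Expand rows whose cells hold several values separated by line breaks.

    The k-th line of every cell goes into output row k; cells with fewer
    lines are padded with "". Rows with no text at all are dropped.
    """
    out = []
    for row in rows:
        parts = [cell.splitlines() for cell in row]
        depth = max((len(p) for p in parts), default=0)
        for k in range(depth):
            out.append([p[k] if k < len(p) else "" for p in parts])
    return out


def convert(src, dst, encoding="utf-8"):
    # newline="" lets the csv module handle quoted cells containing line breaks.
    with open(src, newline="", encoding=encoding) as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(split_stacked_rows(rows))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python split_rows.py export.csv fixed.csv
    convert(sys.argv[1], sys.argv[2])
```

Padding short cells with "" is a deliberate choice: a single-line cell is not copied onto the rows below it, so any column like that still needs a look by hand afterwards.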
    .
    It's unfortunate that Alameda did not provide well-formed tagged PDFs on its public-facing website.
    Consequently, the PDFs are not accessible to many (i.e., in the Section 508 sense of accessibility), do not fully support "re-purposing" via export, and do not fully support use on mobile devices.
    This is disappointing, because with Acrobat Pro (and perhaps some ancillary supporting tools) it is not so difficult to produce well-formed tagged PDFs.

    .
    Be well...

     

     

    
     
  • Nov 27, 2012 2:02 PM   in reply to CtDave

    I just tried expanding rows 27 and 30, and I only see one row of data. What version of Excel are folks using? I’m running 2010.

     
  • Nov 27, 2012 4:17 PM   in reply to LoriAUC

    Lori,

     

    For this round I'm using Acrobat X and Office 2007.

     

    Just an aside here, but while the blivets that crop up tend to command attention, it is worth stepping back to observe just how well ExportPDF and Acrobat's export perform when fed something other than a well-formed tagged PDF. Speaking for myself, I'm impressed.

    .

    Be well...

     
  • Nov 27, 2012 9:30 PM   in reply to CtDave

    I got a rather nasty surprise with the Mac version of Acrobat XI: in order to convert the PDF, I have to take out a $19.95-a-month subscription. Fortunately, on the Mac you can have more than one version installed, because everything is compartmentalized. So I opened Acrobat X and was able to convert to an Excel document with no problem. It looks pretty good; I had to reformat a little.

     
  • Nov 28, 2012 6:39 AM   in reply to Phillip Jones

    That is a rather nasty surprise. The only problem I found was that columns E, F, and G didn't quite come out correctly, but that's just due to the way the table was originally created, as Dave mentioned.

     
  • Nov 28, 2012 8:24 AM   in reply to LoriAUC

    I simply changed column widths and row heights as needed (after I converted with Acrobat X). The workbook was so large that it took minutes to process. It looks like they should have divided the data into separate sheets; that would have made it much easier to format.

     
  • Dec 30, 2012 9:02 AM   in reply to Phillip Jones

    Phillip, can you confirm that you were using Acrobat XI Pro on the Mac, and not Reader XI, when you were prompted for the ExportPDF service?

     
  • Dec 30, 2012 12:00 PM   in reply to LoriAUC

    Yes, I know the difference. I was using Acrobat XI.

     