11 Replies Latest reply: Dec 30, 2012 12:00 PM by Phillip Jones

    Converting Election PDF's to Excel;

    g_choquette Community Member

      When converting several different election PDFs to Excel, the conversion does not come out correctly. Individual rows in the PDFs are grouped together in sets of three in the corresponding Excel file, which makes the resulting Excel file very difficult to work with.

       

      For example try converting this PDF election file: http://www.acgov.org/rov/current.htm

       

      The result in Excel becomes this (note how three rows are grouped in each cell):

      200200 - Election Day Reporting
      200300 - Vote by Mail Reporting
      200300 - Election Day Reporting
      715
      589
      589
      256
      317
      168
      35.80
      53.82
      28.52
      227
      283
      148
      4
      2
      5
      1
      1
      0
      19
      23
      13
      2
      6
      0
      1
      0
      1
      2
      1
      1
      0
      0
      0
      200600 - Vote by Mail Reporting
      200600 - Election Day Reporting
      200700 - Vote by Mail Reporting
      684
      684
      671
      353
      220
      347
      51.61
      32.16
      51.71
      312
      186
      305
      13
      7
      12
      1
      0
      1
      21
      23
      25
      2
      2
      1
      1
      1
      2
      2
      0
      1
      1
      0
      0
      200700 - Election Day Reporting
      200800 - Vote by Mail Reporting
      200800 - Election Day Reporting
      671
      617
      617
      199
      333
      188
      29.66
      53.97
      30.47
      185
      302
      170
      1
      0
      7
      0
      0
      0
      9
      25
      6
      1
      4
      4
      2
      0
      1
      0
      2
      0
      0
      0
      0

       

      I have an Adobe account and used the online converter. I also had problems converting such files with Adobe Acrobat Pro XI.

       

      Are there any options or other tricks that can be used to convert such files?

       

      Thanks

        • 1. Re: Converting Election PDF's to Excel;
          LoriAUC Community Member

          It just looks like your first column needs to be expanded a bit in Excel to display all the text. I just tried to convert a PDF page to Excel and it looks exactly like the PDF table. Here is a snapshot:

           

          excel.png

          • 2. Re: Converting Election PDF's to Excel;
            CtDave Community Member

            Lori,

             

            If you change the row height to, say 27 or 30, do you have multiple logical rows/lines in a single row of the spreadsheet?

            .

            Be well...

            • 3. Re: Converting Election PDF's to Excel;
              CtDave Community Member

              Short response:
              "You cannot make a silk purse from a sow's ear."


              Expanded response:
              Just looked at "sovc2012-11-06.pdf".
              From looking at the PDF and at the content exported to Excel:
              The Excel file has multiple rows in a single Excel row.
              re: (note how three rows are grouped in each cell)
              .
              Root cause for export content needing cleanup = the PDF is not a tagged PDF.
              Viewing the content one "sees" and extrapolates the intended logical hierarchy.
              However the PDF content has no logical hierarchy imposed as it is not a well-formed tagged PDF.
              ("well-formed" as defined by ISO 14289-1, the ISO standard for PDF/UA).
              .
              Export needs the content's logical hierarchy to be known.
              When this is not present, heuristics are used. How effective they are depends on how the content was created and processed out to PDF.
              The PDF I viewed is intended to represent tabular data.
              That is not how the content in the PDF was created/painted.
              .
              If you have Acrobat Pro open the PDF then open the Content Panel.
              Ignore the "Path" entries.
              Select a "Text:" entry.
              From the Panel's Options menu select the "Highlight" choice.
              Now move down, selecting "Text:" entries for a given page.
              Observe how the highlight bounces between different locations on the page.
              Whatever application was used to feed Distiller (which produced the PDF) did not provide the requisite logical hierarchy.
              .
              The page bounce I mentioned displays the order in which textual content was painted to the PDF page.
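To make the paint-order point concrete, here is a minimal sketch of the kind of positional guesswork an exporter has to fall back on when no tags are present. It is not how Acrobat works internally. The function name, the tolerance value and the coordinates are made up for illustration; only the cell values come from the original post. It assumes PDF-style coordinates, where y grows toward the top of the page.

```python
def rows_from_fragments(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows by y position, then order each row by x.

    Hypothetical helper: a simple stand-in for positional heuristics,
    not Acrobat's actual export logic.
    """
    rows = []  # each entry is (y of the row's first fragment, [(x, text), ...])
    # Larger y is higher on the page, so sort descending to read top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Within a row, read left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Fragments listed in an invented paint order: a whole column is painted
# before the next one, so consecutive entries jump around the page.
paint_order = [
    (50, 700, "200200 - Election Day Reporting"),
    (50, 688, "200300 - Vote by Mail Reporting"),
    (50, 676, "200300 - Election Day Reporting"),
    (300, 700, "715"),
    (300, 688, "589"),
    (300, 676, "589"),
    (350, 700, "256"),
    (350, 688, "317"),
    (350, 676, "168"),
]

for row in rows_from_fragments(paint_order):
    print(row)
```

With clean, evenly spaced input like this the grouping works; real pages with uneven baselines or wrapped text are where such guesses start merging rows.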
              .
              A consequence of how the PDF source content was mastered and how this was processed to PDF is that what we see/know to be, logically, discrete rows get bundled during export and placed into a single row.
              This would not occur had the PDF content been created as proper tabular data with correct application of the PDF <Table> element and this element's child elements/tags.
              .
              If this PDF had been mastered properly and appropriate tag management used, the tagged output PDF would've had a well-formed <Table> element with proper child elements.
              An export of this to Excel would've required little to no cleanup in the Excel file.
              .
              Unfortunately you do not have the "well-formed tagged PDF".
              Consequently ExportPDF or Acrobat export is hampered.
              n.b., manually tagging to provide a proper <Table> is, typically, not practicable.
              For this particular content it'd be a waste of life-minutes.
              .
              Unfortunately, you'll have to perform manual cleanup of the content exported to Excel to get where you want to be.
              As good as ExportPDF and Acrobat export are they are constrained by what is provided as input.
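Part of that manual cleanup can be scripted. The sketch below assumes the grouped cells have already been read into Python strings and that the stacked values inside each cell are separated by line breaks; the thread does not confirm the separator, so check it against your own export. The function name `ungroup_row` is hypothetical, and the sample values are the first few cells from the original post.

```python
def ungroup_row(cells, sep="\n"):
    """Split one spreadsheet row whose cells each hold several stacked values
    into one list per logical row. Short cells are padded with empty strings."""
    parts = [str(cell).split(sep) for cell in cells]
    height = max((len(p) for p in parts), default=0)
    return [
        [p[i].strip() if i < len(p) else "" for p in parts]
        for i in range(height)
    ]


# One exported row: each cell carries three logical rows' worth of values.
grouped = [
    "200200 - Election Day Reporting\n"
    "200300 - Vote by Mail Reporting\n"
    "200300 - Election Day Reporting",
    "715\n589\n589",
    "256\n317\n168",
    "35.80\n53.82\n28.52",
]

for row in ungroup_row(grouped):
    print(row)
```

Rows that were exported correctly (a single value per cell) pass through unchanged as one row, so the same function can be applied to every row of the sheet.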
              .
              It's unfortunate that Alameda did not provide well-formed tagged PDFs on its public-facing web site.
              Consequently the files are not accessible to many (i.e., Section 508 accessibility), they do not fully support "re-purposing" via export, and they do not fully support use on mobile devices.
              Disappointing, because with Acrobat Pro (and perhaps some ancillary supporting tools) it is not so difficult to provide well-formed tagged PDF.

              .
              Be well...

               

               

              • 4. Re: Converting Election PDF's to Excel;
                LoriAUC Community Member

                I just tried expanding row 27 and 30 and I only see one row of data. What version of Excel are folks using? I’m running 2010.

                • 5. Re: Converting Election PDF's to Excel;
                  g_choquette Community Member

                  I'm using MS Excel 2010. In the spreadsheet, rows 5,6,7 get converted correctly, but the rest are grouped in three rows per cell.

                   

                  I'll accept CtDave's explanation that the PDF file was not created correctly.

                   

                  What's pretty bad is most election results (nationwide) are output this way and they don't have any other output format, such as .csv, .txt or .xls.

                   

                  It makes analysis of elections very difficult for that reason alone. The election machine manufacturers don't want to change their output formats.

                  • 6. Re: Converting Election PDF's to Excel;
                    CtDave Community Member

                    Lori,

                     

                    For this round I'm using Acrobat X and Office 2007.

                     

                    Just an aside here, but while the blivets that crop up tend to command attention, it is worth stepping back to observe just how well ExportPDF and Acrobat's export perform when fed something other than a well-formed tagged PDF. Speaking for myself, I'm impressed.

                    .

                    Be well...

                    • 7. Re: Converting Election PDF's to Excel;
                      Phillip Jones Community Member

                      I got a rather nasty surprise on the Mac version of Acrobat XI: in order to convert the PDF I have to sign up for a $19.95-a-month subscription. Fortunately, on the Mac you can have more than one version installed because everything is compartmentalized. So I opened Acrobat X and was able to convert to an Excel document with no problem. It looks pretty good; I had to reformat a little.

                      • 8. Re: Converting Election PDF's to Excel;
                        LoriAUC Community Member

                        That is a rather nasty surprise. The only problem I found was that columns E, F, and G didn't quite come out correctly, but that's just due to the way the table was originally created -- as Dave mentioned.

                        • 9. Re: Converting Election PDF's to Excel;
                          Phillip Jones Community Member

                          I simply changed column and row widths as needed (after I converted with Acrobat X). The workbook was so large it took minutes to process. You'd think they would have divided the data up into sheets; that would have made it much easier to format.

                          • 10. Re: Converting Election PDF's to Excel;
                            LoriAUC Community Member

                            Phillip - can you confirm that you were using Acrobat XI Pro on the Mac and not Reader XI when you were prompted for the ExportPDF service?

                            • 11. Re: Converting Election PDF's to Excel;
                              Phillip Jones Community Member

                              Yes, I know the difference. I was using Acrobat XI.