I'm trying to convert a PDF file into Excel. The PDF document is set up with lines and columns. When converting over to Excel, some lines are breaking numbers out into their own cells, and other lines are including every PDF column in one long merged Excel cell.
How can I convert this so each column on each line is in its own cell in Excel?
Converting from PDF to Word, Excel or any other format is one of the most complex things you can try to do with a PDF file. It works very well in some cases; in other cases the output has very little to do with the original file. The key to success is that the PDF file needs to be "tagged", which means that it contains information about the structure of the content displayed in the file (for example, which text belongs to which table cell). The best way to make sure that a PDF file is tagged correctly is to create it from Word or Excel with Acrobat's PDFMaker (that's the Acrobat ribbon or toolbar in those applications).
Unfortunately, there is not much you can do to improve the output without spending a lot of time (e.g. by manually tagging the file). Also, if you are using Adobe's ExportPDF service and don't have access to Acrobat, that is not even an option.
The only thing you can do is complain to the original author of the file and tell them that they used a bad PDF generator to create the PDF file.
Sometimes it helps to save the PDF file as a set of high resolution (e.g. 600dpi) images, then import these images back into Acrobat, run OCR and then export to Word or Excel again.
Karl, these PDF files were created from scans of a paper report form a few years ago. They weren't originally created from an Excel file. Based on that, it sounds like we can't get a clean conversion?
If Acrobat does not convert the file correctly, there is nothing you can do to configure Acrobat to treat your file differently. It does what it does.
There is another conversion tool that is much harder to use than what's built into Acrobat, but that sometimes gives better results: Tabula, a tool for extracting tables from PDFs.
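Tabula's "stream" mode guesses column boundaries from the whitespace between values. A much cruder version of the same idea can help when a row has already ended up merged into a single cell. The following is only a sketch and rests on several assumptions. It expects that you already have the rows as plain text, for example copied from an OCR'd export. It also expects that columns are separated by at least two spaces, while single spaces only appear inside a value. The function names `split_merged_rows` and `rows_to_csv` are hypothetical. The script writes CSV, which Excel opens with one value per cell.

```python
import csv
import io
import re

# Assumption: columns are separated by two or more whitespace characters,
# while a single space only occurs inside a value (e.g. "Widget A").
COLUMN_GAP = re.compile(r"\s{2,}")


def split_merged_rows(lines):
    """Split each text line into a list of cell values on runs of 2+ spaces."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines left over from the export
        rows.append(COLUMN_GAP.split(stripped))
    return rows


def rows_to_csv(rows):
    """Return CSV text that Excel opens with one value per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample of merged rows, as they might look after export.
sample = [
    "Item        Qty    Price",
    "Widget A    12     3.50",
    "",
    "Widget B    7      10.00",
]
rows = split_merged_rows(sample)
print(rows_to_csv(rows))
```

This only addresses rows where all columns were merged into one cell. It cannot rejoin a number that was broken across several cells, and it is no substitute for a properly tagged or OCR'd PDF.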