We have a LOT of scanned PDFs at work (about 8,400) that we need to export into Excel, and I'm trying to find out whether there's a way to automate all of this. There are a few tricky parts. These are all individual, single-page PDFs, though I could combine them into one multipage PDF if that makes the automation easier. The major issue is that we only want a small portion of each PDF exported to the spreadsheet: the top left corner, which has some basic information (name and address). We do not need the rest of the page. Does anyone know if that can be done?
Remember that the output of a scanner is an image; it contains no actual text.
Running OCR on the scanned image produces text (hidden behind the image or rendered as visible glyphs, depending on which OCR method is used).
The OCR output can then be exported. Don't expect 100% recognition of the bitmap images of the characters.
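If scripting is an option, here is a rough sketch of how the crop, OCR, and export steps might look in Python. Everything in it is an assumption rather than a tested recipe for your files: it relies on the third-party packages pdf2image (which needs Poppler installed) and pytesseract (which needs Tesseract installed), the helper names are made up for illustration, the crop fractions are guesses you would tune against a few sample pages, and treating the first OCR line as the name and the rest as the address is only a guess at your layout. It writes a CSV file, which Excel opens directly.

```python
import csv
from pathlib import Path


def top_left_box(width, height, frac_w=0.5, frac_h=0.25):
    """Return a (left, upper, right, lower) pixel box for the top-left region.

    frac_w and frac_h are guessed defaults; tune them on sample pages.
    """
    return (0, 0, int(width * frac_w), int(height * frac_h))


def clean_lines(text):
    """Split OCR output into non-empty lines with whitespace trimmed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ocr_top_left(pdf_path, dpi=300):
    """Render page 1 of a scanned PDF, crop the top-left corner, and OCR it."""
    # Third-party imports are kept local so the helpers above work without them.
    from pdf2image import convert_from_path  # assumption: Poppler is installed
    import pytesseract                       # assumption: Tesseract is installed

    page = convert_from_path(str(pdf_path), dpi=dpi, first_page=1, last_page=1)[0]
    region = page.crop(top_left_box(*page.size))
    return pytesseract.image_to_string(region)


def export_folder(pdf_dir, out_csv):
    """OCR every PDF in pdf_dir and write one CSV row per file."""
    # utf-8-sig adds a byte-order mark so Excel detects the encoding.
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "name", "address"])
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            lines = clean_lines(ocr_top_left(pdf))
            # Assumed layout: first line is the name, the rest is the address.
            name = lines[0] if lines else ""
            address = ", ".join(lines[1:])
            writer.writerow([pdf.name, name, address])
```

Calling something like export_folder("scans", "contacts.csv") would process a folder of individual PDFs, so there is no need to merge them first. Since recognition will not be perfect, keeping the file name in each row makes it easier to spot-check a result against its original scan.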