If you print to the Adobe PDF printer, go to Adobe PDF > Properties > Settings tab and select the PDF/A settings file (I don't remember the exact name, but it should be obvious). If you are using PDFMaker in Word or similar, open the Create PDF preferences and again select the PDF/A settings file.
If you have already created a PDF, you can also use Preflight and select the Convert to PDF/A profile.
(I need to go to my other computer if you need names of scripts and such.)
It would be against the point of PDF/A to make the PDF/A file and then OCR it.
The PDF/A file is supposed to be (a) final, for archiving, (b) treatable as effectively uneditable, and (c) carrying the maximum possible information for text extraction, which means it should already be OCR'd. Also, even if the OCR worked, the file would no longer be PDF/A afterwards.
So, change the order: OCR first, then PDF/A. Converting to PDF/A must always be the last step.
I understand that OCR-ing a PDF/A file would defeat the purpose.
My problem is:
I have a PDF file which has been OCR-ed.
When I convert the file to PDF/A, I seem to lose the OCR capability.
Is this what is supposed to happen? Perhaps I don’t understand what OCR does to the file – embed something?
Penelope (Penny)
After OCR-ing, you have the ability to do “find” for a name – or other text.
These files are paper timesheets which have been scanned to PDF.
I am trying to convert them to PDF/A for use as backup for IRS regulations.
I “assumed” (yeah, I know) that the OCR capability would carry forward in the conversion.
If this is not the case, just let me know and we’ll look them up another way.
Thanks for your patience.
Penny (Penelope) Dudley
To help clarify the OCR a bit: you can only OCR an imaged PDF one time. If the PDF is not a set of pure images, it cannot be OCR'd. OCR stands for optical character recognition, and once it has been done there is no need to redo it. There are multiple options with the OCR. 1. ClearScan replaces the text it finds with its best guess. If you wanted to convert to a text document and deal with editing, this would be the choice. 2-3. The searchable-image options leave the image alone (probably what you want, to keep the original timesheet view intact) and add searchable text in a layer behind the image. This text is what allows the search capability.
That being said, there is no real sense in retaining OCR capability once you have OCR'd a document. The only question is whether the search capability is retained. As I mentioned before, you would probably use Preflight to do this conversion to PDF/A.
I just did an OCR (searchable image, on a graphic PDF). I then went to Preflight and used the conversion to PDF/A. I first tried the 1a version and got errors, then repeated with 1b and it said it worked. After that the file was still searchable. I used AA9 for reference. With that said, it would seem that the process works as I understand your need.
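If you want to double-check a converted file outside Acrobat, here is a minimal Python sketch that reports which PDF/A level a file claims to be. It only reads the PDF/A identification (the `pdfaid:part` and `pdfaid:conformance` entries) from the XMP metadata; it does not validate the file, so Preflight remains the real check. The function names are made up for illustration, and the sketch assumes the XMP packet is stored uncompressed as UTF-8 text, which should be the case for PDF/A-1 files.

```python
import re

# PDF/A identification fields from the XMP "pdfaid" schema.
# XMP may serialize them as elements or as attributes, so match both forms.
_PART = re.compile(rb'pdfaid:part(?:>|\s*=\s*["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|\s*=\s*["\'])\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes):
    """Return e.g. '1b' if the bytes carry a PDF/A identification, else None.

    This reports what the file claims, not whether it actually conforms.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return level


def claimed_pdfa_level_of_file(path: str):
    """Convenience wrapper (hypothetical helper): read a file and inspect it."""
    with open(path, "rb") as fh:
        return claimed_pdfa_level(fh.read())
```

For example, `claimed_pdfa_level_of_file("timesheet.pdf")` (a placeholder file name) should give a PDF/A-1b style answer for a file converted as described above, and `None` for an ordinary PDF. Whether the text is still searchable is a separate question; the quickest test for that is still Find in Acrobat.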
I have tested this two ways online:
Open a PDF file with OCR capability
SAVE AS “appropriate file name”
SAVE AS TYPE PDF/A (*.pdf)
Preflight setting = SAVE AS PDF/A-1b
Just re-tried this and it worked! I can “find”!
Open a PDF file with OCR capability
SAVE AS OTHER “appropriate file name”
ARCHIVABLE PDF (PDF/A)
Just re-tried this and it also worked! I can “find”!
Guess I must have done something “strange” the first time, but no problems now.
Thanks for your help & patience.
Penelope (Penny) Dudley