This article may be useful.
As "a" and "b" mean the same as in PDF/A-1 and "u" relates to Unicode mapping, I suspect that, from a practical standpoint, you can attempt to obtain "b".
Trying to get "a" from OCR output is rather problematic.
Trying for "u" means you'd have to get the OCR output to properly map to Unicode.
That's not to say something workable is not possible if you are willing to spend the money to obtain one of the high-end third party applications.
Many thanks for your reply.
Getting a searchable, PDF/A-2b compliant PDF is exactly what I am trying to achieve.
Do you know how to do this with Adobe Acrobat Pro X?
Just a guess, but you might be facing the "2-ton load in a 1-ton pickup truck" scenario, what with everything the Action is doing.
Are you back fitting PDFs you have already processed to 2b?
Doing the OCR of such after the fact might be contributing to the issue(s).
FWIW, on a small sample of PDFs of scanned textual content that had already been processed with Searchable Image (Exact), the Acrobat X Pro Preflight "Convert to PDF/A-2b (sRGB)" analyze and fix produced valid "2b".
I too am experiencing exactly the same thing that you are. I can turn a scanned document into PDF/A-2b, but if I then use OCR to recognize text in the document, I can't turn that OCR'd document into PDF/A-2b. If I use Searchable Image (Exact) I get the .notdef problem, and if I use ClearScan I get the Width information problem.
Is there any way to convert scanned OCR documents into PDF/A with Acrobat X?
I also have this problem, and I'm a little confused that Acrobat X Pro seems unable to make its own OCR output compatible with PDF/A-2b.
I'd be very thankful for any hints and workarounds!
Sorry for the lateness of this reply.
I managed to fix / solve my problem by doing something that in my humble opinion is illogical.
What I did, essentially, was change the order of the OCR and PDF/A steps.
Originally I had the Preflight (Convert to PDF/A-2b) as the very last step, since to my mind that is the logical place for it.
However, as mentioned, I changed the order so that the Preflight (Convert to PDF/A-2b) step comes BEFORE the Recognize Text (OCR) step, i.e. you OCR the document AFTER you convert it to PDF/A-2b. (I know, weird! How can it still be PDF/A-2b compliant? But it is!)
So my complete batched processing action is now as follows:
Step 1 - Remove Hidden Information (everything checked!)
Step 2 - Add Document Description
Step 3 - Preflight (Convert to PDF/A-2b)
Step 4 - Recognize Text (using OCR) (English UK, Searchable Image (Exact))
Save to (step 5 essentially) I have made it add "_Processed" to the original file name and used PDF Optimizer on the Output Format (loads of settings here, too many to mention).
This now compresses my PDFs by a good 50-60%, OCRs them and makes them fully PDF/A-2b compliant.
I have processed in the region of 3000 PDFs (another 5000 to go!) with no issues.
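For anyone who wants to spot-check a large batch afterwards without opening every file, here is a minimal Python sketch. It is not part of the Acrobat action and it is not a validator: it only reads the PDF/A level that a file claims in its XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties), so Preflight remains the real check. It assumes the XMP packet is stored uncompressed in the file; if it is not found, the function simply returns None. The function names are made up for this illustration.

```python
import re
from pathlib import Path

# XMP may serialise the properties as elements (<pdfaid:part>2</pdfaid:part>)
# or as attributes (pdfaid:part="2"); these patterns accept both forms.
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return the claimed level, e.g. '2B', or None if no claim is found.

    This reports what the file says about itself, not whether it is valid.
    """
    part = _PART.search(data)
    conf = _CONF.search(data)
    if not part or not conf:
        return None
    return part.group(1).decode() + conf.group(1).decode().upper()


def processed_name(path: Path) -> Path:
    """Mirror the '_Processed' suffix used in the Save step of the action."""
    return path.with_name(path.stem + "_Processed" + path.suffix)
```

Typical use would be to loop over the output folder, call `claimed_pdfa_level(p.read_bytes())` for each file, and send anything that does not report `2B` back through Preflight for a closer look.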
I hope this helps.
This problem still seems to exist!
I'm using Acrobat Pro 11.0.06.
Your "workaround" doesn't work for me (anymore?). I can't use OCR on a PDF/A, either in an action or manually, without removing the PDF/A compliance.
The problem seems to be the use of .notdef-glyphs in Acrobat's OCR.
On some documents the OCR produces .notdef-glyphs (which it probably shouldn't in 2014). Those glyphs aren't allowed in PDF/A-2 and PDF/A-3 anymore.
There is a function in the Preflight PDF/A-presets which is supposed to replace .notdef-glyphs but doesn't do anything after all. I even created a custom Preflight profile which should only replace .notdef-glyphs but even this doesn't work!
That's why Preflight reconverts every single PDF via PostScript, thereby losing all the OCR-text.
PDF/A-1b still works with OCRed scans since PDF/A-1 still allows .notdef-glyphs. But it doesn't allow e.g. jpeg2000!
I guess Acrobat shouldn't use .notdef-glyphs in its OCR anymore, and the Preflight function that is supposed to replace them should be made to work!
I reported those problems with PDF/A conversions in May 2012 but Adobe didn't fix anything.
Is there anybody else still having these problems?
Or a new workaround?