We have improved the OCR engine a lot in Acrobat 10. I would urge you to try OCRing the file with Acrobat 10. To download the trial version of Acrobat 10, please go to https://www.adobe.com/cfusion/tdrc/index.cfm?product=acrobat_pro&loc=ap
Okay. How do you get Acrobat 9 to do this???
My company just upgraded our Acrobat software to version 10. I tried the OCR on the same document, and it still does not recognize the vertical text.
Please provide instructions on how to get Acrobat 10 to recognize vertical text (and angled text).
Yes, I would like to know how to get this to work in version 9 and/or 10. I have tried a test file with both versions and both give me the same exact results...a skew of +4° to -1°. Anything beyond that doesn't get recognized. I have attached a test file for you to try and get any different results...there is a "0" at the 0-degree location and each piece of text is rotated 1 degree (http://www.seanduphily.com/OCRtest.pdf).
This file looks too complicated for the OCR engine. The engine determines one particular deskew angle before it starts the OCR process. When it analyzes the skew angle in this file, it must be getting multiple skew angles, so it picks the one that looks most apt to it. For OCR to work on a file, the content should therefore be skewed at the same angle overall.
Can you please send any real world document on which you are facing the issue?
Additionally, there is no specific setting you need to specify for OCRing a skewed file.
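To make the single-deskew-angle behavior described above a bit more concrete, here is a rough Python sketch. It is not Adobe's actual algorithm, which isn't documented in this thread; it just assumes the engine votes on one dominant angle and then reads only text close to it. The function names, the 1-degree bins, and the 3-degree tolerance are all made up for illustration.

```python
from collections import Counter


def dominant_skew(angles, bin_width=1.0):
    """Return the centre (in degrees) of the most populated angle bin.

    `angles` are per-text-block skew estimates. Binning and voting is an
    assumption for illustration, not a documented Acrobat behavior.
    """
    if not angles:
        raise ValueError("need at least one angle estimate")
    bins = Counter(round(a / bin_width) for a in angles)
    # Most votes wins; ties go to the bin closest to 0 degrees.
    best_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    return best_bin * bin_width


def recognizable(angles, tolerance=3.0, bin_width=1.0):
    """Return the angles that fall within `tolerance` of the dominant skew.

    The tolerance value is hypothetical; the thread only reports an
    observed window of roughly +4 to -1 degrees.
    """
    skew = dominant_skew(angles, bin_width)
    return [a for a in angles if abs(a - skew) <= tolerance]
```

With mostly horizontal text plus a few vertical or angled labels, e.g. `recognizable([0, 0, 0, 1, 90, 90, 45])`, only the near-horizontal entries survive, which is consistent with what people are reporting here.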
Thank you for replying. However, based on your response, I'm still confused about how the OCR engine is able to recognize "vertical text". I tried a test based on your comment "Thus, when you run OCR on the file, the content should be skewed at same angle", so I created a simple one: http://www.seanduphily.com/OCR/OCR_test_35_45.pdf . Well, I wasn't surprised...nothing was picked up, even though all the content was at nearly the same angle.
OK...let's try truly vertical...http://www.seanduphily.com/OCR/OCR_test_90.pdf . Still doesn't work. OK...maybe it doesn't like the font or some other setting is messed up...http://www.seanduphily.com/OCR/OCR_test_0_90.pdf . OK...nothing is messed up...the horizontal text gets picked up with no problems.
Is there a document you can provide that shows this vertical text you're referring to being picked up by the OCR engine? There must be some sort of example you can provide, so that we can see what our own issue may be.
As for a "real world" example...http://www.seanduphily.com/OCR/OCR_test_plan01.pdf . The text I'm trying to pick up in this document is the text for the parcels. Unfortunately, the vertical text doesn't get picked up at all.
This site won't let me upload a PDF (which is quite ironic, if you think about it), so please see this "real world" example: ftp://ftp.ni.com/ddraw/cu_sbrio-96xx.pdf
I used the OCR in Adobe Acrobat X Version 10.1.2 on the drawing in the link above, and it did not pick up any of the vertical text.
In fact, it did a very poor job of picking up the horizontal text, as well.
Keep in mind that this is a user-2-user forum and not "formal" Adobe Support.
Yes, sometimes Adobe employees drop by. However, for formal support interaction you'll need to visit the Support page.
If chat does not suffice you'd go to the Support Portal, start a ticket (and make payment), and roll the dice.
Here, the primary resource is the other users.
I did run OCR (Searchable Image (Exact)) across the OCR test plan01 PDF.
Got most (not all) of both vertical and horizontal text.
This was with X and 9 Pro.
Cannot speak to why you get nothing for the vertical.
As the "intent" of Searchable Image / Searchable Image (Exact) is to support find/search rather than recreation of the attributes of the authoring file from which the paper came, I don't expect that from the hardcopy out of CAD that I often must process with Acrobat. But getting nothing has to be an aggravation.
CAD to PDF file with no intermediate paper-scan-ocr step is what I like best.
Unfortunately, I can't access the OCR_test_plan01.pdf (company firewall is blocking it), so I can't try OCR'ing this file.
Can you try this file? ftp://ftp.ni.com/ddraw/cu_sbrio-96xx.pdf
For me, it wasn't able to pick up any of the vertical text, and it did a lackluster job of recognizing the horizontal text, too.
$39 per incident? That's highway robbery... =/
Used Acrobat 9 on "sbrio".
Again, got horizontal and vertical but not all.
Just a feeling but what's used to output the PDF may be a contributing variable.
I'm thinking that, if the source CAD application supports it, you might run a drawing through its native PDF output routine or, if the CAD application has PDFMaker support, try outputting a PDF through that.
This ought to yield the text as renderable content on a layer. Flatten layers or not but the text is renderable and accessible via find/search.
Here's what I got.
Thanks for your patience. I have analyzed the files and here are my observations:
1: http://www.seanduphily.com/OCR/OCR_test_35_45.pdf - Please note that this is not a skewed file. When we refer to skew for OCR, we mean a file with minor orientation problems introduced while creating the document (from a scanner, an image, or any other source). Text angled at thirty-five or forty-five degrees cannot be considered skewed text. Moreover, in real world documents, the majority of the content is aligned at the same angle and not at random angles.
2: http://www.seanduphily.com/OCR/OCR_test_90.pdf - This file has a simple workaround: rotate the file and run OCR. One observation I have is that this file contains only vector art and no images. Was the file created using some third party application?
3: http://www.seanduphily.com/OCR/OCR_test_0_90.pdf - Same points as above. I rotated the file and some text was recognized. We already have an issue logged in our database: at times we are not able to recognize vertical text when the file also contains horizontal text.
4: http://www.seanduphily.com/OCR/OCR_test_plan01.pdf - The file has a lot of noise, which is preventing OCR from running successfully on it. We are looking at this file and will update you shortly regarding the issue.
5: ftp://ftp.ni.com/ddraw/cu_sbrio-96xx.pdf - This is a CAD file and we are facing some issues with it. We are looking at the file, and if we have any update, I will let you know.
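The "rotate the file and run OCR" workaround from items 2 and 3 can be sketched as a loop over the four page orientations. This is only an illustration in Python under stated assumptions: the page is modelled as a simple grid of characters, and `recognize` is a hypothetical caller-supplied function standing in for whatever OCR step you actually use. Nothing here is an Acrobat API.

```python
def rotate_cw(grid):
    """Rotate a 2D raster (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def ocr_all_orientations(grid, recognize):
    """Run `recognize` on the page at 0, 90, 180 and 270 degrees.

    `recognize` is a hypothetical callable that takes a raster and returns
    the words it can read as horizontal text. The result maps each word to
    the first clockwise rotation (in degrees) at which it was read.
    """
    found = {}
    current = grid
    for angle in (0, 90, 180, 270):
        for word in recognize(current):
            found.setdefault(word, angle)  # keep the first orientation only
        current = rotate_cw(current)
    return found
```

The idea is that text the engine misses at 0 degrees may be read on a later pass once the page has been turned, and the per-pass results are then merged into one searchable set.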
For your reference, I have shared some files. Please find the files at:
Just curious, what is sbrio?
Unfortunately, I don't have access to most native files, so I can't create my own PDF's from them. (On native files that I can access, I'm able to create PDF's that recognize the text.)
I reviewed the OCR'ed file you posted (https://acrobat.com/#d=arq5XMPXHnRABRPS1rr8VQ). I'm not sure what you are seeing, but your results appear to be exactly the same as mine (some horizontal text recognized, no vertical text recognized). If you attempt to search for any of the vertical text using Acrobat's find/search function, nothing is found.
My end goal is to be able to search for and find all characters (text, numbers, and ascii symbols) in a CAD drawing using Acrobat's find function. Currently, I have to manually search (by eye) for words or numbers. In a circuit card assembly drawing with lots of reference designators, this can take a very long time. Being able to automatically find words or numbers using the find function would save my company countless hours (and thus a lot of money).
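For what it's worth, once the text is actually in the PDF (via OCR or native output) and has been pulled out page by page with whatever tool you have, indexing reference designators is the easy part. Here is a small Python sketch. The designator pattern (one to three capital letters followed by one to four digits, like R12 or C104) is my assumption about the naming scheme, and `pages` is assumed to be a plain list of per-page text strings.

```python
import re

# Assumed designator shape: 1-3 uppercase letters then 1-4 digits (R12, C104, U3).
DESIGNATOR = re.compile(r"\b[A-Z]{1,3}\d{1,4}\b")


def find_designators(pages):
    """Map each designator to the sorted 1-based page numbers it appears on.

    `pages` is a list of already-extracted page texts; this function does
    no OCR or PDF parsing itself.
    """
    index = {}
    for page_no, text in enumerate(pages, start=1):
        for name in DESIGNATOR.findall(text):
            index.setdefault(name, set()).add(page_no)
    return {name: sorted(nums) for name, nums in index.items()}
```

Of course this only helps if the vertical text makes it into the extracted text in the first place, which is the whole problem in this thread.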
Sit'n on the dock watching the tide roll out so missed this.
"sbrio" is the file "cu_sbrio-96xx.pdf" (the target file at the URL posted in reply #14).
I get to work with CAD drawings landed in PDF rather often.
Those drawings, when properly output to PDF, have all horizontal and vertical text renderable and accessible to Find / Search (fwiw, the authoring application is Bentley MicroStation).