
Can Acrobat's OCR software recognize vertical text?

Apr 11, 2012 4:27 PM

Tags: #acrobat #text #ocr #vertical #acrobat_9

Version:  Adobe Acrobat 9 Standard 9.5.0

OS:  Windows XP Pro

 

The PDF document that I am trying to convert has text going in multiple directions.  Some text is horizontal (going from left to right on the page).  And some text is vertical (going from bottom to top on the page).

 

Is there a way to get the OCR software to recognize the vertical text?  I can only get it to recognize the horizontal text.

 

I tried rotating the page (Document > Rotate Pages).  This didn't work.

 

 

Thank you.

 
Replies
  • Apr 16, 2012 5:05 AM   in reply to cjkngrt8765

    We have improved the OCR engine considerably in Acrobat 10. I would urge you to try running OCR on the file with Acrobat 10. To download the trial version of Acrobat 10, please go to https://www.adobe.com/cfusion/tdrc/index.cfm?product=acrobat_pro&loc=a p

     
  • Apr 23, 2012 12:58 AM   in reply to cjkngrt8765

    Yes, both Acrobat 9 and Acrobat 10 support vertical and slightly skewed text.

    Thanks.

     
  • Apr 30, 2012 6:03 AM   in reply to apangasa

    Yes, I would like to know how to get this to work in version 9 and/or 10.  I have tried a test file with both versions and both give me exactly the same results: a skew of +4° to -1°.  Anything beyond that doesn't get recognized.  I have attached a test file for you to try and see whether you get any different results. There is a "0" at the 0-degree location and each piece of text is rotated 1 degree (http://www.seanduphily.com/OCRtest.pdf).

     
  • May 2, 2012 11:28 PM   in reply to sduphily

    This file looks too complicated for the OCR engine. The OCR engine determines one particular deskew angle before starting the OCR process. When the engine analyzes the skew angle of this file, it must be getting multiple skew angles, and hence it picks the one that looks most apt to it. Thus, when you run OCR on the file, the content should be skewed at same angle overall.
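
To illustrate the idea of a single page-level deskew angle, here is a minimal Python sketch. It is not Adobe's algorithm. The function name `dominant_skew_angle`, the 1-degree bins, and the 5-degree `max_skew` cutoff are all assumptions made for illustration, and the per-line angle estimates are taken as given.

```python
from collections import Counter


def dominant_skew_angle(line_angles, bin_width=1.0, max_skew=5.0):
    """Pick one page-level deskew angle from per-line angle estimates.

    Hypothetical helper: angles are in degrees, and anything beyond
    +/- max_skew is treated as rotation rather than skew (assumed cutoff).
    Returns None when no angle falls inside the skew range.
    """
    # Keep only small angles; large rotations are not treated as skew.
    candidates = [a for a in line_angles if abs(a) <= max_skew]
    if not candidates:
        return None

    # Group the remaining angles into bins and find the most populated bin.
    bins = Counter(round(a / bin_width) for a in candidates)
    best_bin, _count = bins.most_common(1)[0]

    # Average the angles that fell into the winning bin.
    members = [a for a in candidates if round(a / bin_width) == best_bin]
    return sum(members) / len(members)
```

With a page whose lines are mostly near 2 degrees, a sketch like this settles on roughly 2 degrees and ignores stray 35 or 45 degree text. With a page where every angle is different, no single answer fits all of the text, which would be consistent with the behavior described above.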

     

    Can you please send any real world document on which you are facing the issue?

     

    Additionally, I would like to tell you there is no specific setting you need to specify for OCRing any skewed file.

     
  • May 3, 2012 4:30 AM   in reply to apangasa

    Thank you for replying with an answer. However, based on your response, I'm still confused about how the OCR engine is able to recognize "vertical text".  I tried a test based on your comment "Thus, when you run OCR on the file, the content should be skewed at same angle", so I created a simple one: http://www.seanduphily.com/OCR/OCR_test_35_45.pdf . Well, I wasn't surprised...nothing was picked up even though all the content was closer to a different angle.

    OK...let's try truly vertical...http://www.seanduphily.com/OCR/OCR_test_90.pdf . Still doesn't work.  OK...maybe it doesn't like the font or some other setting is messed up...http://www.seanduphily.com/OCR/OCR_test_0_90.pdf . OK...nothing is messed up...the horizontal text gets picked up with no problems.

     

    Is there a document that you can provide that shows the vertical text you're referring to being picked up by the OCR engine?  There must be some sort of example that you can provide so that we can see what our own issue may be.

     

    As for a "real world" example...http://www.seanduphily.com/OCR/OCR_test_plan01.pdf .  The text I'm trying to pick up in this document is for the parcels. Unfortunately, the vertical text doesn't get picked up at all.

     
  • May 10, 2012 7:04 AM   in reply to cjkngrt8765

    I would still like to see an example from apangasa that has vertical text that Adobe's OCR engine can recognize.

     
  • May 10, 2012 10:06 AM   in reply to cjkngrt8765

    @cjkngrt8765 -

    Keep in mind that this is a user-2-user forum and not "formal" Adobe Support.

    Yes, sometimes Adobe employees drop by. However, for formal support interaction you'll need to visit the Support page.

    http://www.adobe.com/support/acrobat/supportinfo/

    If chat does not suffice, you'd go to the Support Portal, start a ticket (and make payment), and roll the dice.

     

    Here, the primary resource is the other users.

     

    Be well...

     
  • May 10, 2012 10:17 AM   in reply to sduphily

    @sduphily -

    I did run OCR (Searchable Image (Exact)) across the OCR test plan01 PDF.

    Got most (not all) of both vertical and horizontal text.

    This was with X and 9 Pro.

    Cannot speak to why you get nothing for the vertical.

    As the "intent" of Searchable Image / Searchable Image (Exact) is to support find/search rather than to recreate the attributes of the authoring file from which the paper came, I don't expect that from the hardcopy out of CAD that I often must process with Acrobat. But getting nothing has to be an aggravation.

    CAD to PDF file with no intermediate paper-scan-ocr step is what I like best.

     

    Be well...

     
  • May 15, 2012 7:54 PM   in reply to cjkngrt8765

    @cjkngrt8765 -

     

     

    Used Acrobat 9 on "sbrio".

    Again, got horizontal and vertical but not all.

    Just a feeling, but what's used to output the PDF may be a contributing variable.

    I'm thinking that, if the source CAD application supports it, you might run a drawing through its native PDF output routine or, if the CAD application has PDFMaker support, try outputting a PDF through that.

    This ought to yield the text as renderable content on a layer. Flatten layers or not, the text is renderable and accessible via find/search.

     

    Here's what I got.

    https://acrobat.com/#d=arq5XMPXHnRABRPS1rr8VQ

     

    Be well...

     
  • May 16, 2012 4:13 AM   in reply to CtDave

    Thanks for your patience. I have analyzed the files and here are my findings:

    1: http://www.seanduphily.com/OCR/OCR_test_35_45.pdf - Please note that this is not a skewed file. When we refer to skew in the context of OCR, we mean minor orientation problems introduced while creating the document (from a scanner, an image, or any other source). Text angled at thirty-five or forty-five degrees cannot be considered skewed text. Moreover, in real-world documents, the majority of the content is aligned at the same angle and not at random angles.

    2: http://www.seanduphily.com/OCR/OCR_test_90.pdf -  This file has a simple workaround: rotate the file and run OCR. One observation about this file is that it contains only vector art and no images. Was the file created using a third-party application?

    3: http://www.seanduphily.com/OCR/OCR_test_0_90.pdf -  Same points as above. I rotated the file and some text was recognized. We already have an issue in our database noting that we are at times unable to recognize vertical text when the file also contains horizontal text.

    4: http://www.seanduphily.com/OCR/OCR_test_plan01.pdf  - The file has a lot of noise, which prevents OCR from running successfully on it. We are looking at this file and will post an update on the issue shortly.

    5: ftp://ftp.ni.com/ddraw/cu_sbrio-96xx.pdf - This is a CAD file and we are facing some issues with it. We are looking at the file, and if we have any update, I will let you know.
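
The rotate-and-rerun workaround from items 2 and 3 can be sketched generically in Python. This is only an illustration of the idea and does not use any Acrobat API. The page is modeled as a small grid of characters, and `recognize` is a hypothetical stand-in for whatever OCR step reads horizontal text; both are assumptions.

```python
def rotate_cw(grid):
    """Rotate a 2D grid (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def ocr_all_orientations(grid, recognize):
    """Run `recognize` on the page at 0, 90, 180 and 270 degrees.

    `recognize` is a hypothetical callable that takes a grid and returns
    the list of words it can read horizontally. The result maps each
    clockwise rotation angle to the words found at that orientation.
    """
    results = {}
    current = grid
    for angle in (0, 90, 180, 270):
        results[angle] = recognize(current)
        # Turn the page a further 90 degrees for the next pass.
        current = rotate_cw(current)
    return results
```

Text that runs from bottom to top on the original page reads left to right after one clockwise rotation, so it would show up under the 90 degree key while the horizontal text shows up under 0.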

     

    For your reference, I have shared some files. Please find them at:

    https://acrobat.com/#d=Isje3HucsXQkZHuhUYfZuA

    https://acrobat.com/#d=ykO4PxqxCiuJjvoS7qZWrw

    https://acrobat.com/#d=69XLv7xeGHX1Gqc7q5RdNg

    https://acrobat.com/#d=SzNW9bDIIG2OpuHuXrDeCw

     

    Thanks.

     
  • Feb 17, 2014 5:12 PM   in reply to cjkngrt8765

    Sit'n on the dock watching the tide roll out so missed this.

     

    "sbrio" is the file "cu_sbrio-96xx.pdf" (the target file at the URL posted in reply #14).

    I get to work with CAD drawings landed in PDF rather often.
    Those PDF files, when properly processed out to PDF, have all horizontal and vertical renderable text accessible to Find / Search (fwiw the authoring application is Bentley MicroStation).


    Be well...

     