21 Replies Latest reply on Sep 15, 2016 10:00 AM by Andrew Fincke

    OCR in Adobe Acrobat Pro DC

    lanorc56965981

      I just want to definitively know if Adobe Acrobat Pro DC has OCR that will allow me to turn PDFs into searchable documents.

        • 2. Re: OCR in Adobe Acrobat Pro DC
          chrisc55043406

          Could you direct me to the instructions for this? The only thing I can find is to convert to text. That would be really helpful. I'm under a major time crunch on reviewing contracts and this is killin' me! LOL

          • 4. Re: OCR in Adobe Acrobat Pro DC
            Andrew Fincke Level 1

            A while back you helped me get the OCR to recognize German characters.  When I save the file I'm to hit "Settings" and specify "German."  Now I'm trying to enlarge a Hebrew magazine article in order to make it easier to read.  I'm going page by page, inserting a blank page to give Adobe Acrobat room to move and then cutting text from page 1 and pasting it to page 2, where it will then be doubled in font size.  When I paste the Hebrew, it's ending up gibberish (English transcriptional characters) instead of Hebrew.  Here's page one of the article, which concerns Hebrew semantics in the books of Samuel

             

            To morrow, by that time the ſonne be hote, ye ſhal haue helpe   1 Sam. 11:9

             

            Wordplays_Page_01.jpg

            • 5. Re: OCR in Adobe Acrobat Pro DC
              Andrew Fincke Level 1

              Here's the link to the whole file: [OneDrive link]. I can't get it to work with Acrobat Pro DC, which I'm still testing, nor with Pro X. The title means "Wordplays, coined words and Etiologies as a rhetorical tool in the Book of Samuel."

              the name of it called Babel; becauſe the LORD did there confound the language of all the earth   Gen. 11:9

              • 6. Re: OCR in Adobe Acrobat Pro DC
                Karl Heinz Kremer Adobe Community Professional

                Acrobat's OCR is not as feature-rich as a dedicated OCR application. One drawback is that it only supports one language per OCR job. This is what may trip up your OCR results. Did you select Hebrew in the language settings?

                 

                Using Acrobat's own OCR engine is convenient when you want quick results, but for more challenging OCR jobs, I keep a dedicated OCR application around (I use Abbyy's FineReader for anything that is too complex for Acrobat).

                • 7. Re: OCR in Adobe Acrobat Pro DC
                  Andrew Fincke Level 1

                  It's kind of odd, Karl Heinz, for two Aryans like us to be knocking heads over a Hebrew magazine article.  But here's the link to page 2, whose predominately Hebrew text leaves little room for language-confusion.  After creating the file through extraction, I switched keyboards and performed a search on several Hebrew words to get the Hebrew OCR operating, then saved the OCR-compliant file to OneDrive. https://1drv.ms/b/s!AlNTAVddbaJxqWZL_Qjvml2NwfjN  We should be able to open this with Adobe Pro DC, insert a blank second page, cut and paste the bottom half of page one into the blank page and then double the font size of both pages to get a two-page large print, easy to read version.

                  The thoughts of the righteous are right: but the counſels of the wicked are deceit  Prov. 12:5

                  • 8. Re: OCR in Adobe Acrobat Pro DC
                    Andrew Fincke Level 1

                    Let's see if I correctly reconstruct what happened.  I opened the file WordPlaysPage2 from the OneDrive link in my last message.  I then chose "Download" in order to get a pdf for Adobe Pro DC.  I had to wait 5 seconds while the OCR did heaven knows what.  When the .pdf opened in Acrobat, every small bit of text was enclosed in boxes.  I chose "Organize Pages" and added a blank page 2.  I hit "Organize Pages" again and chose the option "Edit PDF" from the drop down menu.  Here's what I got to work with: [OneDrive link].  I named it WordPlaysPage2Screwedup.  Is this some kind of antizionist plot against Israel?  Moshe Garsiel - the author of the article - is a respected Bible scholar, who enjoys writing backwards (right to left) in a strange script.

                    and ſpeake not to vs in the Iewes language, in the eares of the people that are on the wall  Isaiah 36:11

                    • 9. Re: OCR in Adobe Acrobat Pro DC
                      Karl Heinz Kremer Adobe Community Professional

                      You do not have much control over the OCR process in Acrobat: you can select the language and the desired output format. That's it. I don't think there is anything else you can do to improve the results using this document.

                       

                      When you zoom into the document, you'll see a lot of pixelation in the characters - more than the resolution of 200dpi would suggest. If you cannot get a better version of this document (one with a higher resolution and/or less pixelation), I doubt that Acrobat can do anything different with this document. I did run the document from your Aug 21 comment through FineReader, and it did a much better job (but still not perfect, which is very likely due to the low quality image in the source PDF): http://khkonsulting.com/files/AUC/AF_8956910 _Wordplays_FR.pdf

                       

                      At this point, you would need somebody from Adobe's Acrobat team to take a look at your files to see if the OCR algorithm can be fixed/tweaked/modified so that it can do a better job with your document. The best way to get their attention is to file a bug report: Feature Request/Bug Report Form

                      • 10. Re: OCR in Adobe Acrobat Pro DC
                        Andrew Fincke Level 1

                        Thanks Karl Heinz.

                        I sent a bug report.  The link to KHconsulting was a dead end.  "Low quality image" is a euphemism.  I've been working with the Hebrew Bible for 30 years, and my understanding of it is still "low quality," as you put it.

                        And I wept much, becauſe no man was found worthy to open and to reade the booke, neither to looke thereon   Rev. 5:4

                        • 11. Re: OCR in Adobe Acrobat Pro DC
                          Andrew Fincke Level 1

                          My impression is that Adobe Pro DC is not for me.  While editing .pdfs is the eighth world wonder, adding a letter, at most a sentence, here and there is not worth $15 a month.  When I'm working with a .pdf, the program itself pipes in and says, "Stop what you're doing!  You're tasking my resources! Export the file to Word!"

                          And he confeſſed, and denied not;    John 1:20

                          • 12. Re: OCR in Adobe Acrobat Pro DC
                            Andrew Fincke Level 1

                            Karl Heinz,

                            You won't believe it.  Here's another Hebrew article - this one by the same guy, Moses Garsiel, on the topic of the compositional history of the Biblical books of Samuel (story of David and Saul): [OneDrive link].  When I opened it in Adobe Pro DC, the program destroyed it, putting boxes all through the OCR'd page (the program goes page by page).  When you click on the page that's been OCR'd, the text turns into a mumbo jumbo of letters shoved into each other and large gaps between the clumps.

                             

                            they could not reade the writing, nor make knowen to the king the interpretation thereof    Dan. 5:8

                            • 13. Re: OCR in Adobe Acrobat Pro DC
                              Andrew Fincke Level 1

                              So I put GarsielBooksofSamuel on a Flash Drive and walked to the other laptop, which still has Acrobat Pro X.  I opened GarsielBooksofSamuel and performed several searches on words near the end of the article, where an extensive bibliography contains a lot of English words.  The aim was to get the OCR working in English and Hebrew.  I extracted page 28 and saved it as GarsielSamuel28.pdf ([OneDrive link]), which I then opened in Acrobat Pro X.  I hit VIEW and SHOW/HIDE and TOOLBAR ITEMS and SELECT AND ZOOM and SELECT TOOL, which - after it appeared on the task bar - I activated and selected the top half of page 28.  I then hit EDIT and COPY.  I then hit FILE and CREATE and PDF FROM CLIPBOARD.  Its name is GarsielSamuel28Top.  It looked like a new-born gosling, with a scrawny font and misaligned text (left to right instead of right to left), but it was legible.  I rushed the burping babe in its flash drive bassinet to the Acrobat Pro DC Trial computer, where I aligned the text, increased the font-size in stages to 26, the maximum the page editor allows, and activated the Windows Hebrew keyboard to repair the OCR screw-ups: ו (vav) for י (yud), ) for (, ח (chet) for ה (he).  I also eliminated 4 or 5 extraneous line breaks.  Here's the result: [OneDrive link].  The question is: Could I have done all this without Pro X 11?  Does Pro DC have a SELECT tool?  Please reply before my 25 days of trial expire!

                              • 14. Re: OCR in Adobe Acrobat Pro DC
                                Karl Heinz Kremer Adobe Community Professional

                                Yes, you can select and copy text with Acrobat DC as well. The easiest way to do this (which also works with older versions of Acrobat) is to go into Acrobat's Preferences (e.g. Edit>Preferences on Windows), then select the "General" category and check the setting "Make hand tool select text & images". Now you can select text (or images) without having to select a special tool, as long as the hand tool is selected (the hand tool is the "Hand" icon on the toolbar, and is selected by default).

                                 

                                If you want to use the actual select tool, it's the arrow right next to the hand tool:

                                 

                                2016-08-25_10-17-46.png

                                • 15. Re: OCR in Adobe Acrobat Pro DC
                                  Andrew Fincke Level 1

                                  Here's the mess Pro DC makes of my Hebrew article when I hit EDIT PDF: [OneDrive link].  Pro X doesn't do that.

                                  and they maruelled at his anſwere, and helde their peace   Luke 20:26

                                  • 16. Re: OCR in Adobe Acrobat Pro DC
                                    Karl Heinz Kremer Adobe Community Professional

                                    If Acrobat DC is giving you different results than previous versions of Acrobat, you may want to file a bug report: Feature Request/Bug Report Form

                                    • 17. Re: OCR in Adobe Acrobat Pro DC
                                      Tariq Dar Adobe Employee

                                      Hello Everyone,

                                       

                                      Sorry for the inconvenience.

                                       

                                      This issue is known to us, and the product team will be investigating it. If I require any further information, such as sample files, I will send you a private message.

                                       

                                      Thank you for your patience.

                                       

                                      Regards,

                                      Tariq Dar.

                                      • 18. Re: OCR in Adobe Acrobat Pro DC
                                        Andrew Fincke Level 1

                                        For those Antisemites who think this Hebrew is a plot to introduce viruses here's a translation of GarsielSamuel28top (starting after the first period):

                                        Political (and also academic) problems hinder the efforts of researchers to get access to areas beyond the Green Line.  This short summary tells us that the archaeological achievements are far from definitive and that the road is long and the distance to the end is enormous.

                                             Therefore it seems that the analysis of Jerusalem in the time of David by Jonathan Aharoni, writing more than 30 years ago, is still valid: "The momentous changes which Israel underwent at this time are especially evident in Jerusalem, but the knowledge of ancient Jerusalem is sadly defective despite the extensive continuing digs of the last 100 years."   Aharoni cites 3 reasons: 1) theft of stones from the early days by later settlers; 2) Jerusalem never knew a time of vacancy from then until now; and thus most of the early buildings were destroyed, leaving few remnants hidden beneath piles of rubble and bottomless dumps; 3) Foreign parts of the city are off-limits to the digger's spade, because they are either places of worship, cemeteries or modern buildings.  It seems that the assessment of the distinguished scholar is more balanced than that of some of his colleagues and students who have rushed to make generalizations based on a handful of items from the days of David and Solomon.

                                        and the writing of the letter was written in the Syrian tongue, and interpreted in the Syrian tongue.   Ezra 4:7

                                        • 19. Re: OCR in Adobe Acrobat Pro DC
                                          try67 MVP & Adobe Community Professional

                                          What does this have to do with anything discussed here?

                                           

                                          Also, maybe enough with the biblical quotes at the end of every post?

                                          • 20. Re: OCR in Adobe Acrobat Pro DC
                                            Andrew Fincke Level 1

                                            Take the double-page scan "Hillary.pdf".  Make a copy: HillaryRightPages.pdf.  Close and save.  Two files, each 20 pages long.  Take Adobe Acrobat 11 Pro X.  Bring up the thumbnails at the left of the work window.  Crop the file to make only left pages.  Save.  You've got 20 pages of Hillary lefts only.  Open HillaryRightPages.pdf.  Do the same thing, only this time cropping to right halves of the double scans.  Close and save.  Open or access Hillary.pdf.  Go to thumbnail 20.  Insert HillaryRightPages.pdf after page 20 of Hillary.pdf.  Drag pages 21-40 one by one into their proper places - 21 after 2, 22 after 4, 23 after 6, etc.  Check to see everything's in correct order and cropped correctly.  Print your booklet, or double-sided fit to page if you want bigger.  It's all much more complicated - if at all possible - with Adobe Acrobat Pro DC.
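
For anyone who wants to check the final page sequence before dragging thumbnails around, here is a rough sketch of the reordering described above. It is not an Acrobat feature, only plain Python that lists which page of the combined 40-page file should end up in each position. The function name `interleave_order` is made up for illustration, and it assumes the left half of each scan is read before the right half.

```python
def interleave_order(n_scans: int) -> list[int]:
    """Return 1-based page numbers of the combined file in reading order.

    Assumption: the combined file holds the cropped left halves as pages
    1..n_scans, followed by the cropped right halves as pages
    n_scans+1..2*n_scans (as after inserting HillaryRightPages.pdf).
    """
    order = []
    for i in range(1, n_scans + 1):
        order.append(i)            # left half of scan i
        order.append(n_scans + i)  # right half of scan i
    return order


order = interleave_order(20)
print(order[:6])  # the first six pages of the final sequence
```

Under these assumptions, old page 21 lands in position 2, old page 22 in position 4, and so on. If the right half should come first (as in a right-to-left Hebrew publication), swap the two `append` lines.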

                                            Olde things are paſt away: behold, al things are become new  2 Cor. 5:17

                                            • 21. Re: OCR in Adobe Acrobat Pro DC
                                              Andrew Fincke Level 1

                                              That's a wonderful thing: Adobe Acrobat Pro DC.  Maybe the next version will incorporate the Adobe Acrobat Pro X 11 features that permit large scale editing of documents.

                                               

                                              And the land was not able to beare them, that they might dwell together Gen. 13:6