-
1. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
PoojaSehgal Feb 27, 2013 8:59 PM (in response to TwitchOSX)
Hello
Can you provide us with the PDF with which you tried this?
-
2. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Bernd Alheit Feb 28, 2013 3:51 AM (in response to TwitchOSX)
Which OCR option did you use?
-
3. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Feb 28, 2013 10:52 AM (in response to PoojaSehgal)
The PDF is 42 MB...
-
4. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Feb 28, 2013 10:54 AM (in response to Bernd Alheit)
I just opened it in Acrobat, went to Tools on the right, chose Recognize Text and clicked "In This File", which pops up the Recognize Text window. I left it on All Pages and left the default Settings, which are:
Primary OCR Language: English (US)
PDF Output Style: Searchable Image
Downsample To: 600 dpi
After I hit OK, it did its thing, and then I did a File / Save As / Microsoft Word / Word 97 - 2003 Document.
-
5. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Feb 28, 2013 10:58 AM (in response to TwitchOSX)
There you go then. You asked for a "searchable image".
This keeps the original image that you OCR'd, and adds hidden text so you can search for text.
When you export to Word perhaps the hidden text is kept, perhaps it is lost, but certainly the huge scans are going to go into the Word document because they are really there in every sense: what you see on screen, what you print.
I have to say, if I wanted to OCR to Word I would not involve Acrobat.
This mistake has been made ever since Acrobat 1.0. It goes: "I want to go from X to Y. I see I can go from X to PDF, and from PDF to Y. So obviously that's a good way to do it". In fact, it's generally a terrible way to do it, and only useful if all other avenues have been exhausted!
-
6. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Feb 28, 2013 11:05 AM (in response to Test Screen Name)
Uh... yeah, the PDF itself ends up being a searchable document, but when you save to Word, it just saves the text as a Word document. What does that have to do with a searchable PDF at that point?
The setting is "PDF Output Style: Searchable Image", not "Word Output Style".
-
7. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Feb 28, 2013 11:23 AM (in response to TwitchOSX)
When you save to Word, images are preserved, right? Even if you also get visible editable text. (Is it editable text?)
I don't understand your last point, what do you mean by "Word output style"?
-
8. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Feb 28, 2013 11:43 AM (in response to Test Screen Name)
What I meant was, I didn't give him the PDF with searchable text (which is only 48 MB). I sent a Word document. And within that Word document, there is exactly 1 graphic that the scan picked up. Why would a Word document with 1 image in it be 158 MB? I ended up having to burn him a disc because I couldn't email it at 158 MB.
When you choose PDF Output Style: Searchable Image, that's what the PDF output is. That should have nothing to do with saving as a Word document, right?
-
9. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Feb 28, 2013 4:19 PM (in response to TwitchOSX)
Of course the PDF you make has EVERYTHING to do with saving as Word format, because the contents of the PDF are turned into Word. And the PDF has images, so the Word file has images. It seems probable to me that you need to change your OCR options IF you must do things this way.
-
10. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
PoojaSehgal Mar 1, 2013 3:58 AM (in response to TwitchOSX)
Hi
Can you please upload your scanned PDF on workspaces.acrobat.com and share the document with me (sehgal@adobe.com)?
Thanks
Pooja
-
11. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Mar 1, 2013 2:09 PM (in response to Test Screen Name)
Hmm... from what I can tell, all it SHOULD do is extract the text and create a Word document. Why the Word document would become 3 times larger than the PDF is beyond me. It's just an 80 page Word document. It should only be a couple hundred KB.
-
12. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Mar 1, 2013 2:16 PM (in response to TwitchOSX)
I'm not sure why you think it should do that. Have you read anything to say that converting to Word will ignore the pictures?
I'd expect it to get much bigger in Word because PDF compresses pictures much better than Word.
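To put rough numbers on why the page scans dominate the file size, here is a minimal sketch of the uncompressed size of one scanned page. It is only an illustration: the US Letter page size and the bit depths are assumptions, the function names are made up for this example, and 600 dpi is taken from the OCR settings quoted in post 4.

```python
# Rough estimate of the uncompressed size of a scanned page image.
# Assumptions (not from the thread): US Letter pages, these bit depths.
# 600 dpi matches the "Downsample To" setting quoted in post 4.

def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def megabytes(n_bytes):
    """Convert bytes to decimal megabytes."""
    return n_bytes / 1_000_000


if __name__ == "__main__":
    depths = [("1-bit black and white", 1), ("8-bit grayscale", 8), ("24-bit color", 24)]
    for label, bpp in depths:
        size = raw_page_bytes(8.5, 11, 600, bpp)
        print(f"{label}: {megabytes(size):.1f} MB per page")
```

Under these assumptions a single page comes to about 4.2 MB in black and white, 33.7 MB in grayscale and 101.0 MB in color before any compression, so how well each format compresses 80 such images matters far more than the text does.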
-
13. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Mar 1, 2013 3:18 PM (in response to Test Screen Name)
There are no pictures! Well, one picture. The entire thing is text. That's what I'm saying.
-
14. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Mar 1, 2013 3:28 PM (in response to TwitchOSX)
I know that's what you are saying, but I don't agree. You said you chose searchable image when doing the OCR. So there is an image because you asked for one. A great big high resolution image covering each page.
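One way to check whether an export really contains page-sized images is to look inside the file. The following is a minimal sketch, assuming the document is re-exported as .docx rather than as a Word 97 - 2003 .doc: a .docx is a ZIP package that normally keeps embedded pictures under word/media/, while the older .doc format is not a ZIP, so this will not work on it. The function name and the file name in the usage comment are hypothetical.

```python
import sys
import zipfile


def list_embedded_media(docx_path):
    """Return (name, uncompressed_size_bytes) for each file under word/media/."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if info.filename.startswith("word/media/")
        ]


# Hypothetical usage: python list_media.py export.docx
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in list_embedded_media(sys.argv[1]):
        print(f"{size / 1_000_000:8.2f} MB  {name}")
```

If the list shows roughly one large image per page, the scans were carried over into the Word file. If it shows a single small picture, the size is coming from somewhere else.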
-
15. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Mar 1, 2013 3:35 PM (in response to Test Screen Name)
Yeah... "searchable image" for the PDF output. As in, after it OCRs the document, you get a searchable PDF. But when you export to Word, it only brings over the text and any images that it sees in the document, not each PDF "graphic" page.
-
16. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
Test Screen Name Mar 1, 2013 3:46 PM (in response to Test Screen Name)
Since we can't agree, I recommend you follow up on the offer in post 10.
-
17. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
TwitchOSX Mar 1, 2013 4:02 PM (in response to Test Screen Name)
I did. Thanks for the discussion though.
-
18. Re: Ran OCR on a 80 page document. Exported to Word. HUGE Word doc. HUH? It's just text!
PoojaSehgal Mar 3, 2013 9:44 PM (in response to TwitchOSX)
Hi Chris
I exported the document that you sent me through "send now".
The scanned PDF has 55 pages. After exporting it to .doc format, the file size was 2.3 MB, and the file contained text only.
Please note that you do not need to run OCR explicitly on the document. Just keep the OCR option on when saving the file as a Word document: the 'Save As Other' dialog has a Settings button where you can leave OCR enabled.
Acrobat XI Pro (11.0.2)
Windows 7 64-bit
Regards
Pooja




