I don't know the answer for OCR. I would be inclined to investigate ways to avoid it by loading the content in a SWF format, e.g. FlashPaper, or to search for some other way to convert the PDF to SWF whilst retaining the formatting and appearance.
The FlashPaper API has, for example, text searching capability built in. I'm just not sure how much of the interface you can hide (or whether the licensing agreement permits hiding it). And if necessary, it's also possible to use BitmapData copies of the SWF once it has loaded, if you need to manipulate it as an image.
Of course, if the original source is a bitmap you would still need to address the OCR problem. Don't know if that helps.
Thanks for the reply GWD.
I think I agree that I need to look at other techniques, but everything I've tried falls short in text layout, image quality or both. That FlashPaper API sounds like a useful tool, I'll give that a try. Any good references? I can alter the conversion method of the PDF to output bitmap data if required (although file sizes will increase).
One of the problems with dll'ing an OCR program is that most OCR packages require some form of human input, and I want to make the entire process as web-automatic as possible.
Plus there are a lot of users on the site at the moment and I don't want server downtime because an OCR beast is kicking and screaming in the background, haha.
Sorry I can't suggest references, it's been a while since I played with FlashPaper...
With FlashPaper (which is a printer driver) you can print to either FlashPaper format or PDF. Both formats retain all the formatting, which is what you need. There is an API exposed for manipulation, and it has text search etc. built in, so you could just load in the FlashPaper SWFs instead of static images. But my suggestion is just a conceptual approach. It's what I would investigate if I were trying to do something like what you are describing. It may not end up being suitable... I just can't think of anything else.
Thanks for your advice GWD, it's greatly appreciated. Just had a look at FlashPaper, and the example on the homepage is exactly what I'm after. Plus at £78 you can't really argue.
All the best
You're welcome... and good luck with it.