5 Replies Latest reply on Aug 9, 2007 7:19 AM by Greg Dove

    word search image

    pemcconnell
      Hi guys,

      This one's quite advanced...

      I'm working on an online book project that takes a .pdf, converts each page into a JPEG and stores the images in a unique folder. Once the .pdf has uploaded, the user is directed to a Flash file I have created, which reads the folder location and total number of pages from a querystring. This, as you have probably guessed, is the online book, which has all the bells and whistles: page-turn animations, a go-to-page function, etc.
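      As a rough illustration of the setup described above, here is a minimal sketch of turning such a querystring into a list of page image paths. The parameter names `folder` and `pages` and the `pageN.jpg` naming scheme are assumptions made for the example; the post does not say what the real ones are.

```python
from urllib.parse import parse_qs


def page_image_paths(query_string):
    """Build the list of page image paths from a querystring.

    Assumes hypothetical parameters `folder` (where the JPEGs live)
    and `pages` (total page count), and a hypothetical file naming
    scheme of page1.jpg, page2.jpg, ...
    """
    params = parse_qs(query_string)
    folder = params["folder"][0]
    total_pages = int(params["pages"][0])
    return [f"{folder}/page{n}.jpg" for n in range(1, total_pages + 1)]


if __name__ == "__main__":
    for path in page_image_paths("folder=books/abc123&pages=3"):
        print(path)
```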

      My problem is this:
      I need the application to have a word search function. I know that ideally the .pdf would have been converted to text, but I couldn't do that, as the layout of the text relative to the images was nowhere near perfect in the tests I ran.

      Is there any OCR-style approach that I could use in Flash? I have created Java applets that performed this task before, but never within a .swf.

      One option is to install an OCR program on the server that runs automatically after the .pdf is uploaded and stores the text in a database. However, at best (I assume), this will only allow the user to be taken to the page, and not the line, let alone creating a translucent highlight, for example, over the top of the text.
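      On the page-versus-line concern: many OCR engines can report a bounding box for each recognised word alongside the text. Assuming the server-side OCR step can store those boxes, a search could return a page number plus a rectangle to draw a highlight over. The sketch below is only an illustration under that assumption; the tuple layout and the function names `build_index` and `find_word` are hypothetical, and no particular OCR package is implied.

```python
from collections import defaultdict


def build_index(ocr_words):
    """Index OCR output by lower-cased word.

    `ocr_words` is assumed to be an iterable of
    (page, text, x, y, width, height) tuples, with coordinates
    in pixels of the source page image (hypothetical layout).
    """
    index = defaultdict(list)
    for page, text, x, y, w, h in ocr_words:
        # Strip surrounding punctuation so "world." matches "world".
        key = text.strip(".,;:!?\"'()").lower()
        if key:
            index[key].append((page, x, y, w, h))
    return index


def find_word(index, query, scale=1.0):
    """Return (page, x, y, w, h) hits, scaled to the displayed page size."""
    hits = index.get(query.lower(), [])
    return [(page, x * scale, y * scale, w * scale, h * scale)
            for page, x, y, w, h in hits]


if __name__ == "__main__":
    words = [(1, "Hello", 10, 20, 50, 12), (2, "world.", 30, 40, 60, 12)]
    index = build_index(words)
    # If the page is shown at half the size of the source JPEG, scale=0.5.
    print(find_word(index, "world", scale=0.5))
```

      The viewer would then jump to the returned page and draw a translucent rectangle at the scaled coordinates.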

      I don't expect an answer, but any feedback would be greatly appreciated.

      Thanks in advance,

      Peter McConnell

      ---------------------------
      EG-Consulting.com
      pmcconnell@eg-consulting.com
        • 1. Re: word search image
          Greg Dove Level 4
          I don't know the answer for OCR. I would be inclined to investigate ways to avoid it that involve loading the content in a swf format, e.g. like FlashPaper, or to search for some other way to convert the pdf to swf whilst retaining the formatting/appearance.
          The FlashPaper API, for example, has text searching capability built in. I'm just not sure how much of the interface you can hide (or may not be permitted to hide under the licensing agreement). And if necessary, it's also possible to use bitmapData copies of a swf if you need to manipulate it as an image once the swf has loaded.
          Of course, if the original source is bitmap you would still need to address the OCR problem. Don't know if that helps.
          • 2. Re: word search image
            pemcconnell Level 1
            Thanks for the reply GWD.

            I think I agree that I need to look at other techniques, but everything I've tried falls short on text layout, image quality or both. That FlashPaper API sounds like a useful tool; I'll give that a try. Any good references? I can alter the conversion method of the pdf to output bitmap data if required (although file sizes will increase).

            One of the problems with dll'ing an OCR program is that most OCR packages require some form of human input, and I want to make the entire process as web-automatic as possible.

            Plus there are a lot of users on the site at the moment and I don't want server downtime because an OCR beast is kicking and screaming in the background, haha.
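            One common way to limit that kind of load, if server-side OCR were used, is to queue uploads and process them one at a time in a background worker rather than during the upload request. The sketch below shows the general pattern only; `start_ocr_worker` and the `process_upload` callback are hypothetical names, and the actual OCR call is left out.

```python
import queue
import threading


def start_ocr_worker(process_upload):
    """Start one background thread that handles queued uploads in order.

    `process_upload` is a hypothetical callback that would run the OCR
    step for one uploaded file. Error handling is omitted for brevity.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            path = jobs.get()
            try:
                if path is None:  # sentinel: stop the worker
                    break
                process_upload(path)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


if __name__ == "__main__":
    processed = []
    jobs, thread = start_ocr_worker(processed.append)
    jobs.put("book1.pdf")
    jobs.put("book2.pdf")
    jobs.put(None)
    jobs.join()
    print(processed)
```

            Because there is only one worker, at most one OCR job runs at any moment, however many users upload at once.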
            • 3. Re: word search image
              Greg Dove Level 4
              Sorry, I can't suggest references; it's been a while since I played with FlashPaper...

              With FlashPaper (which is a printer driver) you can print to either FlashPaper format or pdf. Both formats retain all the formatting, which is what you need. There is also an API exposed for manipulation, and it has text search etc. built in. So you could just load in the FlashPaper swfs instead of static images... But my suggestion is just a conceptual approach. It's what I would investigate if I was trying to do something like what you are describing. It may not end up being suitable... I just can't think of anything else.
              • 4. Re: word search image
                pemcconnell Level 1
                Thanks for your advice GWD, it's greatly appreciated. Just had a look at FlashPaper, and the example on the homepage is exactly what I'm after. Plus at £78 you can't really argue.

                Cheers again,

                All the best

                Peter McConnell
                ---------------------
                E.G Consulting
                pmcconnell@eg-consulting.com
                • 5. Re: word search image
                  Greg Dove Level 4
                  You're welcome... and good luck with it.