7 Replies Latest reply on Nov 9, 2011 10:35 AM by John Hawkinson

    Scanning a math book then convert to epub???

    arthurema

      Hmm... I have tried many ways to create an EPUB from a scanned math book, and the task has proven to be quite a challenge. Can anyone recommend the steps to take in order to create a quality EPUB file? I have tried importing from a scanner into Acrobat using ClearScan and then saving as a PDF. That PDF is then placed into InDesign and exported as an EPUB. This conversion creates a 12 MB EPUB from a 3 MB PDF that is no longer searchable and hardly portable. This is needed for a student with physical disabilities, and I have the publisher's authorization for this conversion. You can imagine the challenges of having formatted math equations in the textbook. Thank you in advance for your assistance.

        • 1. Re: Scanning a math book then convert to epub???
          John Hawkinson Level 5

          As you have gathered, once you get it in PDF form, InDesign ceases to be a useful tool for EPUBs.

          I think you want to take your scanned document with OCR and save it as a Microsoft Word (or RTF) file. Ideally your scanning software will do that with the equations and so forth as inline images.


          Then, place that document in InDesign and autoflow it. Then export to EPUB.

           

          But there may be other workflows that work better than InDesign here. Have you tried using Calibre or Sigil?

           

          InDesign is really useful for EPUBs if you already have an InDesign document that you want to repurpose for EPUB. But it's not a generic EPUB management tool.

           

          Disclaimer: I don't do EPUBs, I just watch from the wings.

          • 3. Re: Scanning a math book then convert to epub???
            tonyharmer Level 3

            Hi

             

            I'm sorry to be the bearer of, well, not bad news exactly, but not the news you're hoping for. There's no shortcut available for this one, I'm afraid.

            You're going to need to author the thing almost from scratch, and use a product to manage the MathML (Mathematical Markup Language) aspects of this. ePub files need linear, structured text to work well; it's as simple as that, really.
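
            To give a feel for what "authoring the math" means, here is a minimal sketch in Python of one equation, x² = 4, written out as MathML. The function name and the equation are made up for illustration, and this is not what MT-Editor or any other product produces; it only shows that every symbol becomes its own element in a linear structure rather than a picture.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def build_equation_markup() -> str:
    """Return the equation x^2 = 4 as a MathML string (illustrative only)."""
    # Serialize MathML as the default namespace so the tags come out unprefixed.
    ET.register_namespace("", MATHML_NS)

    def tag(name: str) -> str:
        return f"{{{MATHML_NS}}}{name}"

    math = ET.Element(tag("math"))
    row = ET.SubElement(math, tag("mrow"))

    # x squared: <msup> takes a base and an exponent as its two children.
    power = ET.SubElement(row, tag("msup"))
    ET.SubElement(power, tag("mi")).text = "x"   # identifier
    ET.SubElement(power, tag("mn")).text = "2"   # number

    ET.SubElement(row, tag("mo")).text = "="     # operator
    ET.SubElement(row, tag("mn")).text = "4"

    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(build_equation_markup())
```

            Multiply that by every equation in a 750-page book and it becomes clear why a dedicated equation tool is worth looking at.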

             

            Scanning to PDF will only give you pictures of the pages, so as for searching, there is nothing to search for (just a bunch of pixels). On the scanning and OCR front, it is almost certainly not going to work with equations.
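
            One rough way to confirm the "just a bunch of pixels" problem is to look inside the EPUB itself, which is an ordinary ZIP archive. The sketch below, in Python, counts how much real text and how many image bytes a file contains. The function name, the file extensions checked, and the crude tag stripping are all assumptions made for illustration; it is a quick diagnostic, not a validator.

```python
import re
import zipfile

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a rough count

TEXT_SUFFIXES = (".xhtml", ".html", ".htm")
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def epub_text_report(epub) -> dict:
    """Count non-whitespace text characters and image bytes inside an EPUB.

    `epub` may be a file path or a binary file-like object.
    """
    text_chars = 0
    image_bytes = 0
    with zipfile.ZipFile(epub) as archive:
        for info in archive.infolist():
            name = info.filename.lower()
            if name.endswith(TEXT_SUFFIXES):
                markup = archive.read(info).decode("utf-8", errors="replace")
                plain = TAG_RE.sub(" ", markup)
                text_chars += len("".join(plain.split()))
            elif name.endswith(IMAGE_SUFFIXES):
                image_bytes += info.file_size  # uncompressed size
    return {"text_chars": text_chars, "image_bytes": image_bytes}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(epub_text_report(sys.argv[1]))
```

            A file with megabytes of images and almost no text characters is a set of page pictures, and no reader will be able to search it.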

             

            Ferdinand Schwoerer produces an InDesign plug-in, MT-Editor, which you can order by email: order@movemen.com

             

            Like I said, sorry - but there's not really a shortcut here! On the upside though, if you convert it you could probably do a deal with the existing publishers that might make you a buck or two!

             

             

            • 4. Re: Scanning a math book then convert to epub???
              John Hawkinson Level 5

              Tony:

              Scanning to PDF will only give you pictures of the pages, so as for searching, there is nothing to search for (just a bunch of pixels). On the scanning and OCR front, it is almost certainly not going to work with equations.

              My assumption is that OCR will operate on the main text, and images of equations will be sufficient, and that the OCR/scanning software can be set up to properly make equations images.

               

              A lot depends on how the text is structured, of course, since a lot of inline equations may wreak havoc. But I suspect that, from a search perspective, it is sufficient to search for the words and not the equations.
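
              As a minimal sketch of that idea in Python: the body text stays as real text from OCR, each equation is kept as a cropped image with a short description, and search only looks at the words. The class names, the image path, and the sample sentences are hypothetical; no particular OCR package produces exactly this structure.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class TextBlock:
    text: str  # words recovered by OCR


@dataclass
class EquationImage:
    src: str   # path to the cropped equation image (hypothetical)
    alt: str   # short description, useful for screen readers


Block = Union[TextBlock, EquationImage]


def render(blocks: List[Block]) -> str:
    """Render blocks as simple XHTML paragraphs, with equations as <img>."""
    parts = []
    for block in blocks:
        if isinstance(block, TextBlock):
            parts.append(f"<p>{escape(block.text)}</p>")
        else:
            parts.append(
                f'<p><img src="{escape(block.src)}" alt="{escape(block.alt)}"/></p>'
            )
    return "\n".join(parts)


def search(blocks: List[Block], term: str) -> List[int]:
    """Return indexes of text blocks containing `term`; images are skipped."""
    needle = term.lower()
    return [
        index
        for index, block in enumerate(blocks)
        if isinstance(block, TextBlock) and needle in block.text.lower()
    ]


if __name__ == "__main__":
    page = [
        TextBlock("The quadratic formula gives the roots."),
        EquationImage("images/eq-001.png", "the quadratic formula"),
        TextBlock("Here a must be nonzero."),
    ]
    print(render(page))
    print(search(page, "quadratic"))
```

              Heavy use of inline equations would break a paragraph into many small text and image pieces, which is where the havoc comes from.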

              • 5. Re: Scanning a math book then convert to epub???
                tonyharmer Level 3

                Hi John

                 

                I actually have a client with some experience of this one (which is how I come to know about inMath and MT-Editor), and to be perfectly honest, there's so much clean-up work involved and so much potential for error that it really raises the question of whether it's worth it.

                 

                So, while there probably is a workflow to be had in your suggestion, I'd love to hear how that works out, should it be employed.

                 

                I think that some people out there (and this is not aimed at anyone here in this instance, so please don't be offended) are after Hogwarts when it comes to ePub, not InDesign.

                 

                 

                • 6. Re: Scanning a math book then convert to epub???
                  arthurema Level 1

                  Great remarks everyone, thank you. After using ClearScan OCR, the 750-page PDF comes to 22 MB, which is quite large. The PDF is very representative of the original book. The file was then exported as a Word-type file and yes, it requires much formatting and clean-up, too much to be useful. I found that some of the equations are converted to text and some are a mix of text and image, which results in major issues. Retaining the PDF seems to be the simplest solution at this time, since Adobe has released Acrobat Reader for mobile. In the meantime, I'll look into MathML and MT-Editor.

                  • 7. Re: Scanning a math book then convert to epub???
                    John Hawkinson Level 5

                    There is OCR software that renders directly to Word, rather than to PDF.

                    I guess ClearScan OCR is an Adobe technology, so it probably does not qualify.

                    Once you go through PDF, you're likely to lose enough structure to make it very hard.

                    I would look at using other OCR software instead.