May 5, 2011

    How to batch process OCR with files that have searchable text cover pages




      I have a lot of academic papers and articles that I want to make searchable.


      The problem I'm faced with is that they all have a computer-generated cover page with searchable text (the disclaimer, title, citation and so on). The rest of the pages are all scanned images.


      I could obviously do each one manually and omit the first page, but there are far too many for that.


      Is there another way: a way to batch process OCR while omitting the first page of each file?


      EDIT: The best solution I've come up with is to extract and delete the first page of each file, so that it's saved automatically as "filename 1", which makes combining the files via the Explorer context menu easier later.
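
      For anyone who would rather script that workaround, here is a rough Python sketch of the split and recombine steps. It assumes the files are PDFs and that the third-party pypdf library is installed. The function names and the "filename 2" name for the scanned pages are made up for illustration, and the OCR itself is not done here: it would be run with whatever OCR tool you use on the "filename 2" files between the two steps.

```python
from pathlib import Path


def split_names(pdf_path):
    """Return (cover_path, body_path) for one source PDF.

    The cover uses the "filename 1" name from the workaround above; the
    "filename 2" name for the scanned pages is an assumption of this sketch.
    """
    pdf_path = Path(pdf_path)
    cover = pdf_path.with_name(f"{pdf_path.stem} 1{pdf_path.suffix}")
    body = pdf_path.with_name(f"{pdf_path.stem} 2{pdf_path.suffix}")
    return cover, body


def split_cover(pdf_path):
    """Write the cover page and the remaining pages to two separate files."""
    # Imported here so split_names() works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(pdf_path))
    page_count = len(reader.pages)
    if page_count < 2:
        return None  # only a cover page (or nothing), so nothing to split

    cover_path, body_path = split_names(pdf_path)
    cover_writer, body_writer = PdfWriter(), PdfWriter()
    cover_writer.add_page(reader.pages[0])
    for index in range(1, page_count):
        body_writer.add_page(reader.pages[index])

    with open(cover_path, "wb") as handle:
        cover_writer.write(handle)
    with open(body_path, "wb") as handle:
        body_writer.write(handle)
    return cover_path, body_path


def merge_back(pdf_path, out_path):
    """Combine the cover with the (by now OCR'd) body into out_path."""
    from pypdf import PdfReader, PdfWriter

    cover_path, body_path = split_names(pdf_path)
    writer = PdfWriter()
    for part in (cover_path, body_path):
        for page in PdfReader(str(part)).pages:
            writer.add_page(page)
    with open(out_path, "wb") as handle:
        writer.write(handle)


def split_folder(folder):
    """Split every PDF in a folder, skipping files made by an earlier run."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith((" 1", " 2")):
            continue
        split_cover(pdf)
```

      Calling split_folder on the folder of articles would produce the "filename 1" and "filename 2" pairs. After batch OCR on the "filename 2" files, merge_back puts each cover back in front. Note that the skip check in split_folder would also skip any original whose name happens to end in " 1" or " 2", so it may need adjusting for your file names.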