Crypozza

Help with pdf text extraction

Jun 15, 2012 6:04 AM

Hello, I have a problem extracting text from one PDF file.
All my other PDFs work fine (tested on 100+ files), but this one fails because my parser can't find the BT, ET, Tj, etc. operators.
The file does contain a lot of text, though, and other applications manage to parse it.
My question: is placing text between BT and ET the only way to store text in a PDF?

Or are there other techniques? I couldn't find the answer in the PDF Reference.

 

Thanks.

 
Replies
  • Jun 15, 2012 7:34 AM   in reply to Crypozza

    Is the text inside Form XObjects? Is the text in a Pattern? Or in Annotation Appearances?

     

    These are all types of Content Streams that text can appear in…
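
To illustrate the Form XObject case, here is a minimal sketch in Python of how a parser might follow `Do` operators from a page's content stream into the Form XObjects it invokes. Everything about the data model is an assumption made for illustration: each stream is a plain dict with already-decoded `content` bytes (any stream filter such as FlateDecode is assumed to have been applied beforehand), a `subtype`, and a `resources` dict; `collect_text` is a hypothetical helper, not part of any library. It only handles simple literal strings shown with `Tj`, so it is not a complete text extractor.

```python
import re

# Matches either "(literal) Tj" or "/Name Do" so operators are visited in order.
# Simplified on purpose: no escaped parentheses, no hex strings, no TJ arrays.
OPERATOR = re.compile(rb"\((.*?)\)\s*Tj|/(\w+)\s+Do")


def collect_text(stream, active=None):
    """Return strings shown by Tj in this stream and in invoked Form XObjects."""
    if active is None:
        active = set()
    if id(stream) in active:  # guard against a form that invokes itself
        return []
    active.add(id(stream))

    xobjects = stream.get("resources", {}).get("XObject", {})
    found = []
    for match in OPERATOR.finditer(stream["content"]):
        if match.group(1) is not None:
            # Text-showing operator found directly in this content stream.
            found.append(match.group(1).decode("latin-1"))
        else:
            # "Do" paints a named XObject; recurse only if it is a form.
            target = xobjects.get(match.group(2).decode("ascii"))
            if target is not None and target.get("subtype") == "Form":
                found.extend(collect_text(target, active))

    active.discard(id(stream))
    return found


if __name__ == "__main__":
    # Hypothetical page whose only direct text is "Page"; "Hello" lives in a form.
    form = {
        "subtype": "Form",
        "content": b"BT /F1 12 Tf (Hello) Tj ET",
        "resources": {},
    }
    page = {
        "content": b"q /Fm0 Do Q BT (Page) Tj ET",
        "resources": {"XObject": {"Fm0": form}},
    }
    print(collect_text(page))  # prints ['Hello', 'Page']
```

A parser that scans only the page-level content stream would miss "Hello" in this example. The same recursive idea would apply to Pattern streams and annotation appearance streams, although those are reached through different dictionary entries than the `Do` operator.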

     
  • Jun 15, 2012 10:06 AM   in reply to Crypozza

    See ISO 32000-1:2008, section 8.10, “Form XObjects”.

     