7 Replies Latest reply on Feb 20, 2014 2:39 PM by Test Screen Name

    Acrobat 11: what to do with the f character.

    NoPrevaricator Level 1

      Hello all:

       

         I am parsing text from a PDF using the Acrobat JavaScript API.  The code looks like this:

         for (var a = 0; a < PageCount; a++)
         {
             for (var b = 0; b < NumWords; b++)
             {
                 // third argument false: keep punctuation and whitespace
                 var TheWord = this.getPageNthWord(a, b, false);

      ...

       

      The problem with this code is that if the word being returned starts with an f, or sometimes with Th, getPageNthWord() does not return the whole word.

       

      For example, the test passage contains the word "flyway".  getPageNthWord() returns this word in two segments.  The first two letters, "fl", are translated into the single character with decimal code 186, and the last four characters are returned as they are.  So I get two returns from getPageNthWord():

       

      the Fahrenheit short-hand sign

      "yway"

       

      In another part of the document the word "fishing" appears.  It is parsed into

       

      blank value

      shing

       

      In other words, in every case here the word is split in two: the first return gobbles up the first two characters, while the second returns the remaining characters of the word.  What the first two characters are actually interpreted as varies.  I can find no pattern, though I suspect there is one.
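
      To hunt for the pattern, it may help to print the decimal code of every character in each returned word. The sketch below is only an illustration: describeWord is a hypothetical helper name, not part of the Acrobat API, and it uses nothing beyond standard JavaScript string methods.

```javascript
// Hypothetical helper: list the decimal code of each character in a word,
// so that odd fragments (such as the character with code 186) stand out.
function describeWord(word) {
    var codes = [];
    for (var i = 0; i < word.length; i++) {
        codes.push(word.charCodeAt(i));
    }
    return codes.join(" ");
}

// Inside Acrobat this could be called from the inner loop, for example:
//   console.println(describeWord(this.getPageNthWord(a, b, false)));
```

      Comparing the codes logged for words like "flyway" and "fishing" against the visible text should show whether the same letter pair always maps to the same code.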

       

      Does anyone have any thoughts on what might be going on here and how to counteract it?

       

      FYI, the rest of the document is reported on faithfully.
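
      As a possible stopgap, a lookup table could rejoin the pieces after the fact. This is a rough sketch under assumptions, not a confirmed fix: FRAGMENT_FIXES and rejoinWords are hypothetical names, only the 186 to "fl" entry comes from the behavior described above, and it assumes the odd fragment arrives as a word of its own with no extra whitespace attached.

```javascript
// Hypothetical lookup table: odd fragment -> the letters it seems to replace.
// Only the entry for character code 186 is taken from the observed output;
// any further entries would have to be filled in from observation.
var FRAGMENT_FIXES = {};
FRAGMENT_FIXES[String.fromCharCode(186)] = "fl";

// Merge each known fragment with the word that follows it.
function rejoinWords(words) {
    var out = [];
    for (var i = 0; i < words.length; i++) {
        var isFragment = Object.prototype.hasOwnProperty.call(FRAGMENT_FIXES, words[i]);
        if (isFragment && i + 1 < words.length) {
            out.push(FRAGMENT_FIXES[words[i]] + words[i + 1]);
            i++; // skip the word that was just merged
        } else {
            out.push(words[i]);
        }
    }
    return out;
}
```

      The words returned by getPageNthWord() for a page would first be collected into an array and then passed through rejoinWords(). A fragment that comes back as a blank value, as with "fishing", would need its own entry once its actual character code is known.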

       

      R,

      John