10 Replies Latest reply on Nov 19, 2017 11:14 AM by Thom Parker

    Get text/number co-ordinates in a pdf

    syedu35318304

      Hi. I'm trying to extract the x,y coordinates of any word/number matches in the PDF. Below is the code I am using. This works great for words like "California", but when I use numbers with symbols like 3.0-234 it doesn't work. Please help, as this is very urgent for me. (I have tried passing 'false' as the 3rd parameter of getPageNthWord, but it still doesn't work.)

       

       

      for (var p = 0; p < this.numPages; p++)
      {
        var numWords = this.getPageNumWords(p);
        for (var i = 0; i < numWords; i++)
        {
          var ckWord = this.getPageNthWord(p, i, true);
          var num = 'California';
          var n = num.toString();
          if (ckWord == n)
          {
            app.alert("Mouse position is: " + this.mouseX + "," + this.mouseY, 3);
          }
        }
      }

        • 1. Re: Get text/number co-ordinates in a pdf
          try67 MVP & Adobe Community Professional

          Try printing out all the words in the file to the console and then you'll see what the issue is.

           

          On Sun, Nov 12, 2017 at 6:52 PM, syedu35318304 <forums_noreply@adobe.com>

          • 2. Re: Get text/number co-ordinates in a pdf
            try67 MVP & Adobe Community Professional

            By the way, your code does not extract the coordinates of the word you're looking for, but of the mouse cursor...

            • 3. Re: Get text/number co-ordinates in a pdf
              gkaiseril MVP & Adobe Community Professional

              Searching for "words" or text strings gets tricky since there are many types of characters that may or may not need to be accounted for. The biggest issue will be what is called white space: usually non-printable characters like space, new line, carriage return, horizontal tab, vertical tab and form feed. Punctuation such as ";", ":" and "." also needs to be accounted for.

               

              You should review the getPageNthWord method and pay close attention to the "bStrip" parameter. I expect you will need to search for single words and then also write code to search for multiple word combinations.

               

               

              With the "bStrip" parameter set to "false" (it defaults to "true" when omitted) I get a sample output of:

               

              0 word: |Word | length: 5

              1 word: |30.24-| length: 6

              2 word: |0 | length: 2

              3 word: |California. | length: 12

              4 word: |test | length: 5

              5 word: |word

              | length: 6

               

              With the "bStrip" parameter set to "true" I get a sample output of:

               

              0 word: |Word| length: 4

              1 word: |30.24| length: 5

              2 word: |0| length: 1

              3 word: |California| length: 10

              4 word: |test| length: 4

              5 word: |word| length: 4
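
              A listing like the two above can be produced with a small loop. This is only a sketch: `dumpWords` is a made-up helper name, and `doc` is assumed to be the Acrobat Doc object (`this` in the console). Only `getPageNumWords` and `getPageNthWord` come from the Acrobat API.

```javascript
// Sketch: list every word on one page, stripped or unstripped.
// `dumpWords` is a hypothetical helper; `doc` is assumed to be an Acrobat Doc object.
function dumpWords(doc, pageNum, strip) {
  var lines = [];
  var numWords = doc.getPageNumWords(pageNum);
  for (var i = 0; i < numWords; i++) {
    var word = doc.getPageNthWord(pageNum, i, strip);
    // The pipes make trailing spaces and line breaks visible.
    lines.push(i + " word: |" + word + "| length: " + word.length);
  }
  return lines;
}

// In the Acrobat JavaScript console, for the first page:
// console.println(dumpWords(this, 0, false).join("\n"));
```

              Running it once with `false` and once with `true` shows exactly which characters the stripping removes from each word.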

              • 4. Re: Get text/number co-ordinates in a pdf
                syedu35318304 Level 1

                Hi, thanks for the quick response. Do you think it's a good/right approach to get the coordinates from the mouse location? If not, what is the preferred way to get them?

                • 5. Re: Get text/number co-ordinates in a pdf
                  try67 MVP & Adobe Community Professional

                  No, it's not. There's no relation between the mouse's location and the location of the word.

                  You need to use the getPageNthWordQuads method to get an array that defines the location(s) of the word on the page.
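
                  As a rough sketch of that approach (the helper name `findWordQuads` and the shape of the returned objects are my own choices, not part of the Acrobat API), the search loop from the original post could collect quads instead of reading the mouse position:

```javascript
// Sketch: collect the quads of every word that matches `target`.
// `findWordQuads` is a hypothetical helper; `doc` is assumed to be an Acrobat Doc object.
function findWordQuads(doc, target) {
  var hits = [];
  for (var p = 0; p < doc.numPages; p++) {
    var numWords = doc.getPageNumWords(p);
    for (var i = 0; i < numWords; i++) {
      // bStrip = true: compare against the word without punctuation/white space.
      if (doc.getPageNthWord(p, i, true) === target) {
        hits.push({ page: p, index: i, quads: doc.getPageNthWordQuads(p, i) });
      }
    }
  }
  return hits;
}

// In the Acrobat JavaScript console:
// var hits = findWordQuads(this, "California");
// console.println(hits.length + " match(es)");
```

                  Each entry in `quads` is an array of 8 numbers (four x,y corner pairs), and a word can have more than one quad, for example when it is split across lines.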

                  • 6. Re: Get text/number co-ordinates in a pdf
                    Thom Parker Adobe Community Professional

                    There are also different types of coordinate systems for a PDF. Here is an article that explains them:

                     

                    https://acrobatusers.com/tutorials/auto_placement_annotations

                    • 7. Re: Get text/number co-ordinates in a pdf
                      syedu35318304 Level 1

                      Hi all,

                       

                      I have successfully extracted the coordinates using getPageNthWordQuads, but I have a problem with extracting special characters like " - ".

                      For example, 13-jul-2011 is extracted as 3 words. Is there any possibility of extracting quads with the special characters included in the word?

                       

                       

                      Thanks in Advance,

                      • 8. Re: Get text/number co-ordinates in a pdf
                        Thom Parker Adobe Community Professional

                        The quad always includes the associated punctuation. Try this

                         

                        Find the index of a word that contains punctuation with this code

                         

                        var pageNum = 0; // example value: zero-based page to inspect
                        var len = this.getPageNumWords(pageNum);
                        for (var i = 0; i < len; i++)
                          console.println(i + ": " + this.getPageNthWord(pageNum, i, false));

                        Run it in the console window

                         

                        Then run this code on the word that includes a comma or dash. In this case it's word number 3

                         

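                        var qds = this.getPageNthWordQuads(pageNum, 3);
                        // Annotation rect is [left, bottom, right, top], taken from the first quad.
                        var rect = [qds[0][0], qds[0][5], qds[0][2], qds[0][1]];
                        this.addAnnot({page: pageNum, rect: rect, type: "Square"});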

                         

                        You'll see that the added rectangle surrounds the punctuation
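
                        If you need one rectangle for a whole token such as 13-jul-2011, one possible approach is to join consecutive unstripped words and take the bounding box of all their quads. The sketch below assumes that punctuation and white space stay attached to the end of the preceding word when bStrip is false (as in the sample output earlier in this thread). `quadsBounds` and `findToken` are made-up helper names.

```javascript
// Sketch: bounding box [left, bottom, right, top] of a list of quads.
// Each quad is an array of 8 numbers (four x,y pairs).
function quadsBounds(quads) {
  var xs = [], ys = [];
  for (var q = 0; q < quads.length; q++) {
    for (var k = 0; k < 8; k += 2) {
      xs.push(quads[q][k]);
      ys.push(quads[q][k + 1]);
    }
  }
  return [Math.min.apply(null, xs), Math.min.apply(null, ys),
          Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// Sketch: find `token` (e.g. "13-jul-2011") as a run of consecutive words.
// Assumes trailing punctuation/white space is part of the unstripped word.
function findToken(doc, pageNum, token) {
  var numWords = doc.getPageNumWords(pageNum);
  for (var start = 0; start < numWords; start++) {
    var text = "";
    var quads = [];
    for (var i = start; i < numWords; i++) {
      var raw = doc.getPageNthWord(pageNum, i, false);
      text += raw;
      quads = quads.concat(doc.getPageNthWordQuads(pageNum, i));
      var trimmed = text.replace(/\s+$/, "");
      if (trimmed === token) {
        return { first: start, last: i, rect: quadsBounds(quads) };
      }
      // Give up on this start index once the text stops being a prefix
      // of the token, or once white space ends the run.
      if (token.indexOf(trimmed) !== 0 || /\s$/.test(raw)) break;
    }
  }
  return null;
}

// In the Acrobat JavaScript console:
// var hit = findToken(this, 0, "13-jul-2011");
// if (hit) this.addAnnot({page: 0, rect: hit.rect, type: "Square"});
```

                        The bounding box is only meaningful when the token sits on a single line; a token that wraps across lines would need one rectangle per line.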

                        • 9. Re: Get text/number co-ordinates in a pdf
                          syedu35318304 Level 1

                          Thanks Thom. One more question. How do I export the output values to a text file (anywhere on the local drive)?

                          Following is my code for finding a text and generating the x,y coordinates.

                           

                           

                          for (var p = 0; p < this.numPages; p++)
                          {
                            var numWords = this.getPageNumWords(p);
                            for (var i = 0; i < numWords; i++)
                            {
                              var ckWord = this.getPageNthWord(p, i, true);
                              if (ckWord == "Adobe")
                              {
                                // Use the first quad: [x1, y1, x2, y2, x3, y3, x4, y4].
                                var q = this.getPageNthWordQuads(p, i)[0];
                                var x1 = q[0];
                                var y1 = q[1];
                                var x4 = q[6];
                                var y4 = q[7];
                                // Midpoint of the diagonal between corners 1 and 4.
                                // The parentheses matter: divide the sum, not just the second term.
                                var x = (x1 + x4) / 2;
                                var y = (y1 + y4) / 2;
                              }
                            }
                          }

                           

                          My question is: how do I write the x and y values to a text file?

                          • 10. Re: Get text/number co-ordinates in a pdf
                            Thom Parker Adobe Community Professional

                            There is no way with JavaScript alone to write an arbitrary file to the local file system. There are a couple of workarounds.

                            There used to be a way to do this with the "doc.exportDataObject" function, but it has since been restricted.

                             

                            1. The easy way. Write the text data to a file attachment with the "doc.createDataObject" function. I do this a lot. Although the data is not written to the file system, it's very easy for the user to drag and drop it anywhere they want. And there is an advantage to having the data attached to the PDF where it was created.

                             

                            2. Create a new PDF with the "Report" object. Write the xy text to the "Report", then save it as text. This writes the text data to a specific location on the file system.

                             

                            There are several variations on this theme. For example, you could create a blank PDF with "app.newDoc" then add one large form field or text annotation and write all the text data to the field/annot. Then flatten and save as text.

                             

                            3. There are other tricks if you can write a plug-in or an IAC app. I once wrote a VBA add-in for Excel that sucked data out of a PDF, then saved the Excel file.
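
                            For option 1, a minimal sketch could look like the following. The helper name `attachCoordinates`, the attachment name, and the tab-separated layout are assumptions for illustration; `createDataObject` is the Acrobat call, used here with only its name and value arguments.

```javascript
// Sketch: write collected coordinates into a text attachment on the PDF.
// `attachCoordinates` is a hypothetical helper; `rows` is assumed to be an
// array of {page, x, y} objects built by your own search loop.
function attachCoordinates(doc, name, rows) {
  var lines = ["page\tx\ty"];
  for (var i = 0; i < rows.length; i++) {
    lines.push(rows[i].page + "\t" + rows[i].x + "\t" + rows[i].y);
  }
  var text = lines.join("\r\n");
  // Creates a file attachment holding the text.
  doc.createDataObject(name, text);
  return text;
}

// In the Acrobat JavaScript console:
// attachCoordinates(this, "coords.txt", [{page: 0, x: 120.5, y: 640}]);
```

                            The attachment then shows up in the Attachments panel, from where the user can save it to any folder.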