7 Replies Latest reply on Oct 31, 2012 4:53 AM by TauChiba

    ExtendScript: Get all text from a document

    TauChiba Level 1

      Hi all.

       

      I have the following task: I need to translate a document into another language using ExtendScript. As input I have a document with text, graphics, tables, etc. in Language_1, and a "somehow-separated" file that contains the translations into Language_2. E.g.:

       

      Some_text_in_language_1     Some_text_in_language_2

      Some_other_text_in_language_1     Some_other_text_in_language_2
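
      A two-column file like this could be loaded into a lookup table before touching the document. The following is only a sketch: it assumes the separator is a single tab, and parseTranslations is a hypothetical helper name, not something from the FrameMaker API. It takes the file contents as a string, so reading the file itself is left out.

```javascript
// Sketch: build a source -> target lookup from tab-separated lines.
// Assumption: one pair per line, separated by the first tab on the line.
function parseTranslations(contents) {
    var map = {};
    var lines = contents.split(/\r\n|\r|\n/);
    for (var i = 0; i < lines.length; i += 1) {
        var tab = lines[i].indexOf("\t");
        if (tab <= 0) {
            continue; // skip blank lines and lines without a source text
        }
        var source = lines[i].substring(0, tab);
        var target = lines[i].substring(tab + 1);
        map[source] = target;
    }
    return map;
}
```

      Lines without a tab are skipped rather than reported, which keeps the sketch short; a real script would probably want to log them.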

       

      To get the source text from the document, I've tried to use this:

       

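      var pgf = doc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
      while (pgf.ObjectValid()) {
           // Request only the string items of this paragraph.
           var textItems = pgf.GetText(Constants.FTI_String);
           var text = "";
           for (var i = 0; i < textItems.len; i += 1) {
                // Append the raw fragment; trimming each fragment would glue words together.
                text = text + textItems[i].sdata;
                PrintTextItem(textItems[i]); // helper not shown here
           }
           // Trim once per paragraph instead of once per fragment.
           text = text.replace(/^\s+|\s+$/g, '');
           pgf = pgf.NextPgfInFlow;
      }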

       

      But with this, I can only access the regular text in the document (e.g. the text in tables remains untouched). Is there any way I can get all the textual data from a specified document? Or maybe a full list of the objects that can contain text, so I can iterate through them and extract it one by one? Or maybe there's a better way to solve this problem?

       

      Thanks in advance! Any advice would be greatly appreciated.

        • 1. Re: ExtendScript: Get all text from a document
          Wiedenmaier Level 3

          Hi,

          GetText delivers an array of text items.

          Text items can be text, but also table anchors, markers, etc.

          You'll never get the text of a table if you call pgf.GetText(Constants.FTI_String).

           

          If you want both text and tables, you have to call

          var textItems = pgf.GetText(Constants.FTI_String | Constants.FTI_TblAnchor);

          After that you have to loop through the text items and check for table anchors. Then you can get the text from each table, or rather from its cells.
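
          The loop described above might look roughly like the sketch below. It assumes the usual FrameMaker object model properties (dtype, sdata and obj on text items; FirstRowInTbl, NextRowInTbl, FirstCellInRow, NextCellInRow and FirstPgf on tables, rows and cells). The function names collectPgfText and collectTblText are made up for illustration, and the code has not been run against a real document.

```javascript
// Sketch: collect a paragraph's text, then the text of every table anchored in it.
// Assumes it runs inside FrameMaker, where Constants is a global object.
function collectPgfText(pgf, out) {
    var items = pgf.GetText(Constants.FTI_String | Constants.FTI_TblAnchor);
    var text = "";
    var tables = [];
    for (var i = 0; i < items.len; i += 1) {
        var item = items[i];
        if (item.dtype === Constants.FTI_String) {
            text += item.sdata;          // plain text fragment
        } else if (item.dtype === Constants.FTI_TblAnchor) {
            tables.push(item.obj);       // the anchored Tbl object
        }
    }
    out.push(text);
    for (var t = 0; t < tables.length; t += 1) {
        collectTblText(tables[t], out);
    }
    return out;
}

// Walk rows, cells and cell paragraphs of one table.
function collectTblText(tbl, out) {
    var row = tbl.FirstRowInTbl;
    while (row.ObjectValid()) {
        var cell = row.FirstCellInRow;
        while (cell.ObjectValid()) {
            var pgf = cell.FirstPgf;
            while (pgf.ObjectValid()) {
                collectPgfText(pgf, out); // recursion also covers nested tables
                pgf = pgf.NextPgfInFlow;
            }
            cell = cell.NextCellInRow;
        }
        row = row.NextRowInTbl;
    }
    return out;
}
```

          Calling collectPgfText(pgf, []) for each paragraph of the main flow would return one string per paragraph, with the table cell paragraphs following the paragraph that holds the anchor. The table title is not covered by this sketch.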

           

          If you want all kinds of text item types, you can call

          var textItems = pgf.GetText(-1);

           

          Hope this helps

          Markus

          • 2. Re: ExtendScript: Get all text from a document
            TauChiba Level 1

            Thanks a lot for the answer, Markus. Indeed, the "-1" seems like my salvation.

            Just to clarify one thing: with this construction, I get all the textual data both directly and through anchors. In my test document, I've noticed only table anchors. Are there any other elements that can contain text and will be returned by this construction as anchors?

            • 3. Re: ExtendScript: Get all text from a document
              4everJang Level 3

              There is another way to loop through ALL paragraphs in a document, regardless of whether they are in a table or in the main text flow. You can use the FirstPgfInDoc property of the document and loop through all Pgf objects using the NextPgfInDoc property of each Pgf until you reach an invalid object. Note that this also includes all paragraphs on the master and reference pages, so it might be useful to check where the Pgf is located (on a body page or not). There is a script on this forum that does that; I believe it was created and posted by Rick Quatro.
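
              As a rough illustration of that loop (not the forum script mentioned above), a minimal sketch could look like this. forEachPgfInDoc and the callback parameter are hypothetical names; only FirstPgfInDoc, NextPgfInDoc and ObjectValid come from the description. The body-page check is deliberately left to the callback, since it needs extra navigation that is not shown here.

```javascript
// Sketch: visit every Pgf in the document's paragraph list.
// This includes table cells as well as master and reference pages.
function forEachPgfInDoc(doc, callback) {
    var count = 0;
    var pgf = doc.FirstPgfInDoc;
    while (pgf.ObjectValid()) {
        callback(pgf);          // caller decides what to do with each paragraph
        count += 1;
        pgf = pgf.NextPgfInDoc;
    }
    return count;               // number of paragraphs visited
}
```

              Inside FrameMaker it could be called as forEachPgfInDoc(app.ActiveDoc, function (pgf) { ... }), with the callback skipping paragraphs that are not on a body page.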

               

              Working your way through the main text flow does not guarantee that you have all the visible text in the doc. There may be multiple flows and there may also be text frames that are placed inside anchored frames. Those text frames are not contained directly in the main flow of the document.

               

              Good luck with your scripting

               

              Jang

              • 4. Re: ExtendScript: Get all text from a document
                TauChiba Level 1

                Thanks for the response, Jang.

                "Working your way through the main text flow does not guarantee that you have all the visible text in the doc. There may be multiple flows and there may also be text frames that are placed inside anchored frames. Those text frames are not contained directly in the main flow of the document."

                Wow, that's frustrating. I feel like I'm trying to dig a hole with a spoon. Well, that was the reason I posted my task. Maybe you could give some advice on an alternative way to achieve this goal?

                • 5. Re: ExtendScript: Get all text from a document
                  Wiedenmaier Level 3

                  "Just to clarify one thing: with this construction, I get all the textual data both directly and through anchors. In my test document, I've noticed only table anchors. Are there any other elements that can contain text and will be returned by this construction as anchors?"

                   

                  Markers, cross-references, variables, footnotes, hypertext, equations, text insets, and callouts placed on graphics, such as text frames or text lines.

                  Some hints:

                  The table title is available from the table object through its "FirstPgf" property.

                  Markers have a "MarkerText" property.

                  For cross-references, you have to get the cross-reference format and read its definition.

                  For variables, you have to get the variable format and read its definition.

                  For equations, you have to get the MathFullForm property.

                  And so on.
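
                  A few of these hints can be sketched as a dispatch on the text item type. This is only an illustration: it assumes the items come from pgf.GetText(-1), that the format definition is exposed as the Fmt property of the XRefFmt and VarFmt objects, and textFromItem is a made-up helper name. Footnotes, equations and the other item types are left out.

```javascript
// Sketch: return the translatable string behind one text item, or null.
// Assumes it runs inside FrameMaker, where Constants is a global object.
function textFromItem(item) {
    switch (item.dtype) {
    case Constants.FTI_String:
        return item.sdata;              // ordinary text
    case Constants.FTI_MarkerAnchor:
        return item.obj.MarkerText;     // marker contents
    case Constants.FTI_XRefBegin:
        return item.obj.XRefFmt.Fmt;    // cross-reference format definition
    case Constants.FTI_VarBegin:
        return item.obj.VarFmt.Fmt;     // variable format definition
    default:
        return null;                    // not handled in this sketch
    }
}
```

                  Note that a format definition is shared by every cross-reference or variable that uses the format, so it would normally be translated once per format rather than once per occurrence.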

                  • 6. Re: ExtendScript: Get all text from a document
                    Wiedenmaier Level 3

                    BTW: you can use Save as XML (in an unstructured document, too).

                    So you will have the content of your text flow in the XML file and can process it with a very simple XSLT stylesheet.

                    Be aware: as far as I can see, not all objects (markers and so on) are exported to XML in the standard way.

                    Markus

                    • 7. Re: ExtendScript: Get all text from a document
                      TauChiba Level 1

                      Okay, thanks again to all for your comments.

                       

                      I realized that the problem was in my approach, and I ended up scripting the file translation based on the "Find/Replace" function (I'm noting this for anyone who faces the same problem).

                       

                      Good luck!