14 Replies Latest reply on Mar 13, 2015 5:49 AM by K.Daube

    How to get formatted text into arrays

    K.Daube Level 1

      Dear experts and helpers,

      For my project I import an RTF file and then read the data from it into three arrays. This works fine when I only use the string contents of the paragraphs. However, the final script must be able to read and replace formatted text...
      Why use the intermediate arrays? Because otherwise I would need to switch back and forth between two FM documents (and one of them may be a book component).

      The imported file starts with a number of lines, each split into two items by a TAB (» denotes a TAB; FM reports it as \x08):
      [[Garneau, 1990 #12]]    »   [9]
      The right item may also be locally formatted text, e.g. [9]
      These are followed by the same (or a smaller) number of paragraphs with formatted text, like this:
      [9] » D. Garneau, Ed., National Language Support Reference Manual (National language Information Design Guide. Toronto, CDN: IBM National Language Technical Centre, 1990.

       

      Is it possible to replace the following piece in the body of the function below

        while(pgf.ObjectValid()) {
          pgfText = GetText (pgf, newDoc);
          gaBibliography.push(pgfText);
          pgf = pgf.NextPgfInFlow;
        }
      

      with this

        while(pgf.ObjectValid()) { 
          gaBibliography.push(pgf);
          pgf = pgf.NextPgfInFlow;
        }
      

      Do I need a special declaration for the array gaBibliography?
      And how can I get the right part of the intro lines, with its formatting, into the array gaFmtCitsFmt?

       

      Currently I read only the strings into the arrays (function GetText not shown):

      var gaFmtCitsRaw  = [];                           // left column in processed RTF
      var gaFmtCitsFmt  = [];                           // right column in processed RTF
      var gaBibliography= [];                           // bibliography lines from processed RTF
      // filename is something like E:\_DDDprojects\FM+EN-escript\FM-testfiles\BibFM-collected-IEEE.rtf 
      
      function ReadFileRTF (fileName) {
        var nCits=0, nBib = 0, openParams, openReturnParams, newDoc, pgf, pgfText ;
        var TAB = String.fromCharCode(8);               // FM reports TAB as char code 8, not ASCII 9
        var parts = [];
        
        openParams = GetOpenDefaultParams();
        openReturnParams =  new PropVals();  
        newDoc = Open (fileName, openParams, openReturnParams);  
        pgf = newDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;  // get first pgf in flow
      
      // --- read the temp/formatted citations  
        while(pgf.ObjectValid()) {
          pgfText = GetText (pgf, newDoc);
          if (pgfText.substring (0,2) == "[[") {        // citation lines start with [[
            parts = pgfText.split(TAB);                 // get the two parts of the line
            gaFmtCitsRaw.push (parts[0]);               // Push the result onto the global array
            gaFmtCitsFmt.push (parts[1]);
            pgf = pgf.NextPgfInFlow;
          } else { break }
        }
      
      // --- read the bibliography
        while(pgf.ObjectValid()) {                      // until end of doc
          pgfText = GetText (pgf, newDoc);
          gaBibliography.push(pgfText);
          pgf = pgf.NextPgfInFlow;
        }
        newDoc.Close (Constants.FF_CLOSE_MODIFIED);
      } // --- end ReadFileRTF
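The string-only part of ReadFileRTF can be separated from the FrameMaker calls and tried on its own. A minimal sketch, assuming the paragraph texts have already been collected into an array of strings (for example with GetText); splitImportedLines is a hypothetical helper, not part of the FrameMaker API. It also guards against an intro line without a TAB, where parts[1] would otherwise be undefined. It is written in ES3 style (var, plain for loops) so it should also run in ExtendScript.

```javascript
// Sketch only: works on plain strings, not on FrameMaker Pgf objects.
var TAB = String.fromCharCode(8);   // FM reports a TAB as char code 8

// Split the imported paragraph texts into the three arrays used above.
function splitImportedLines(lines) {
  var result = { raw: [], fmt: [], bib: [] };
  var i = 0;
  // Leading citation lines start with "[[" and hold two TAB-separated parts.
  while (i < lines.length && lines[i].substring(0, 2) === "[[") {
    var parts = lines[i].split(TAB);
    result.raw.push(parts[0]);
    result.fmt.push(parts.length > 1 ? parts[1] : "");  // no TAB: empty right part
    i += 1;
  }
  // Everything from the first non-citation line on is bibliography.
  for (; i < lines.length; i += 1) {
    result.bib.push(lines[i]);
  }
  return result;
}
```

Keeping the parsing in a pure function like this makes it possible to check the splitting rules before any document has to be opened.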
      

       

      The next question will then be how to modify Ian Proudfoot's FindAndReplace script to handle formatted text as the replacement. IMHO I will need to use copy/paste...

        • 1. Re: How to get formatted text into arrays
          frameexpert Level 4

          Hi Klaus, you can push paragraph objects into an array without a special declaration. I am pressed for time, but will try to look at the rest of your question later. -Rick
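To illustrate Rick's point with a plain-JavaScript sketch: an ordinary [] is enough, and the array stores references to the objects, not copies of their text or formatting. The objects below are hypothetical mocks standing in for Pgf objects. The practical consequence, stated here as an assumption to verify in FrameMaker: ReadFileRTF closes the imported document at the end, so stored Pgf references would then point into a closed document and should be checked with ObjectValid() before use.

```javascript
// Hypothetical mock standing in for a FrameMaker Pgf object.
function makeMockPgf(text) {
  return { text: text, valid: true };
}

var gaBibliography = [];            // no special declaration needed
var pgf = makeMockPgf("D. Garneau, Ed., National Language Support Reference Manual");
gaBibliography.push(pgf);           // stores a reference, not a copy

// Simulate the source document being closed after the array was filled.
pgf.valid = false;

// The array element is the very same object, so it sees the change.
var stillSame = gaBibliography[0] === pgf;
var usable = gaBibliography[0].valid;
```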

          • 2. Re: How to get formatted text into arrays
            Russ Ward Level 4

            Klaus, I would suggest that copy/paste might be the easiest way. However, I would not claim that it is 100% reliable. It usually is, I think, but I would not bet on it.

             

            The alternative is to query the text range of each paragraph for any format changes, store each set of properties from the original, then iterate over the new text and reapply. You can find out where formatting changes occur with something like:

             

            textItems = doc.GetTextForRange (textRange, Constants.FTI_CharPropsChange);

             

            Now, I realize this doesn't tell you much and the truth is that it is a complicated concept. I would have to spend all day writing about it, because you need an intimate knowledge of text ranges and text item structures to make it work. Obviously, I can't do that.
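The core idea can at least be modelled without FrameMaker. Assuming each FTI_CharPropsChange text item reports (through its offset property) a position where the character properties change, those positions cut the paragraph into runs that each share one set of properties. The sketch below works on a plain string and an array of such offsets; splitIntoRuns is a hypothetical helper, and collecting the offsets from the real text items is left out.

```javascript
// Split a paragraph string into runs at the given change offsets.
// changeOffsets: ascending character offsets (0 = start of paragraph).
function splitIntoRuns(text, changeOffsets) {
  var runs = [];
  var start = 0;
  for (var i = 0; i < changeOffsets.length; i += 1) {
    var cut = changeOffsets[i];
    // Skip cuts that would produce an empty run or lie outside the text.
    if (cut > start && cut < text.length) {
      runs.push({ beg: start, end: cut, text: text.substring(start, cut) });
      start = cut;
    }
  }
  // The last run extends to the end of the paragraph.
  runs.push({ beg: start, end: text.length, text: text.substring(start) });
  return runs;
}
```

Each run's beg/end pair is what would go into tr.beg.offset and tr.end.offset to read, store, or reapply that run's formatting.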

             

            What I can do is provide a working sample that shows the concept, although for a somewhat different application. I ran into this same type of issue with a script that applies character formatting, where I wanted to have an Undo feature as well. In order to accomplish an undo, I have to effectively remember the original formatting of the entire text snippet where the new formatting was applied. This is similar to what I think you want... to remember (and reapply) the original formatting of text snippets from the imported RTF content. If you are interested, go here and get the script called ADVANCED_Create_formatting_shortcuts.jsx:

             

            FrameMaker ExtendScript Samples - West Street Consulting

             

            Then, look up the following functions:

             

            CaptureChrFormatUndoSnapshot()

            UndoChrFormatApply()

             

            Please accept the disclaimer that this is a complicated concept embedded within a complicated script. I hope it can be of some assistance.

             

            Russ

            • 3. Re: How to get formatted text into arrays
              K.Daube Level 1

              Thanks to Rick and Russ for the initial feedback. Russ, your example is really complicated, but thanks to your extensive comments I should gain at least some insight.

              My major problem seems to be understanding text ranges.

              - How can I 'grab' a full paragraph?

              - How can I 'grab' part of a paragraph, such as the part after the first TAB character?

              I know that you all do not have much time, particularly compared with me as a retired person. I hope you will have enough patience with me. I'm experimenting a lot to enhance my knowledge, mainly based on examples from others.

              • 4. Re: How to get formatted text into arrays
                Russ Ward Level 4

                Klaus,

                 

                Working with text is about the most complicated thing to do within FrameMaker. It seems counter-intuitive, since it is about the easiest thing to do with the GUI. But alas, once you remove the ability to select with a mouse and type with a keyboard, text becomes a wild jungle of complexity.

                 

                Text ranges are not too bad, once you get the general idea. It is just that... a range of text, like something you would select with a mouse. Like a mouse selection, it starts before some character in some paragraph and ends after some character in some paragraph. It may be the same paragraph, which is a selection within a paragraph. The character can even be the same, which is then just an insertion point (cursor) somewhere.

                 

                So, a text range is a data structure that defines two paragraphs and two characters. In the jargon of scripting, the character is called an "offset." An offset is simply the number of characters past the beginning of said paragraph, where 0 is the beginning.

                 

                For example, if you want to capture the first five characters of a paragraph as a text range, you can do this, where 'pgf' is some paragraph object:

                 

                var textRange = new TextRange();
                textRange.beg.obj = pgf;
                textRange.beg.offset = 0;
                textRange.end.obj = pgf;
                textRange.end.offset = 5;

                 

                If you want to capture a whole paragraph, change that last line to the number of characters in the pgf, or you can do this:

                 

                textRange.end.offset = Constants.FV_OBJ_END_OFFSET;

                 

                ...where that constant is just some built-in thing that means "get me to the end of whatever." It's a convenience of the interface.
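To see what the offsets select, here is a plain-JavaScript model of a single-paragraph text range. The paragraph is represented by its text string instead of a Pgf object, and END_OFFSET is a hypothetical stand-in for Constants.FV_OBJ_END_OFFSET: its real value is not used here, only its meaning "up to the end of the paragraph". makeRange and rangeText are illustrative helpers, not FrameMaker functions.

```javascript
var END_OFFSET = -1;   // stand-in for Constants.FV_OBJ_END_OFFSET (assumed sentinel)

// Build an object shaped like a TextRange: beg/end each hold obj + offset.
function makeRange(pgfText, begOffset, endOffset) {
  return {
    beg: { obj: pgfText, offset: begOffset },
    end: { obj: pgfText, offset: endOffset }
  };
}

// Return the characters the range covers (single-paragraph ranges only).
function rangeText(range) {
  var text = range.beg.obj;
  var end = range.end.offset === END_OFFSET ? text.length : range.end.offset;
  return text.substring(range.beg.offset, end);
}

// Usage: first five characters, whole paragraph, and an insertion point.
var firstFive = rangeText(makeRange("National Language", 0, 5));
var whole = rangeText(makeRange("National Language", 0, END_OFFSET));
var insertionPoint = rangeText(makeRange("National Language", 3, 3));
```

A range whose two offsets are equal covers no characters; that is the insertion point (cursor) described above.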

                 

                I'll also note that a text range is actually just a pair of text location structures, one named 'beg' and one named 'end.' If you think of a text location as defined by a paragraph and an offset from the first character, maybe that will make more sense.
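Klaus's second question (the part after the first TAB) then reduces to finding the offset of that TAB. A minimal sketch, assuming the paragraph text is available as a string (for example from GetText) and that FM reports the TAB as character code 8, as noted in ReadFileRTF; offsetsAfterFirstTab is a hypothetical helper. The returned numbers are what you would assign to tr.beg.offset and tr.end.offset. Offsets counted in the string only match FrameMaker's offsets if no anchored items (markers, footnote anchors and similar) occur before the TAB, since those may occupy character positions that FTI_String text does not contain; treat that as an assumption.

```javascript
var TAB = String.fromCharCode(8);   // TAB as FM reports it (see ReadFileRTF)

// Return {beg, end} offsets covering everything after the first TAB,
// or null if the paragraph contains no TAB.
function offsetsAfterFirstTab(pgfText) {
  var tabPos = pgfText.indexOf(TAB);
  if (tabPos < 0) {
    return null;
  }
  // beg is just past the TAB; end is the end of the paragraph text.
  return { beg: tabPos + 1, end: pgfText.length };
}
```

In the real script the end could equally be set to Constants.FV_OBJ_END_OFFSET; only the begin offset actually has to be computed.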

                 

                Text item structures are a whole new mess of complexity. I can't possibly go into an explanation of them here.

                 

                I think that many ES developers (definitely myself included) still use the FDK documentation because it is considerably more comprehensive. The two interfaces are largely parallel, but of course somewhat different in the language syntax. Consider that as a potential resource.

                 

                Russ

                • 5. Re: How to get formatted text into arrays
                  K.Daube Level 1

                  Thank you, Russ, for your explanations.

                  «Working with text is about the most complicated thing to do within FrameMaker»

                  Oh Lord, and FM is all about handling text...
                  I understand the concept of a text range, and have already fiddled around with various types of text items (including concatenating them if they are strings).
                  But how do I convert a text range into a selection (e.g. for copying it to the clipboard)?

                  FDK says about SetTextRange: «Set the text selection or insertion point by setting the property that specifies the text selection»

                  var oDoc = app.ActiveDoc;
                  var pgf  = oDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
                  var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;
                  
                     var tr = new TextRange();                      //get text selection for paragraph             
                     tr.beg.obj = tr.end.obj = pgf;
                     tr.beg.offset = 0;
                     tr.end.offset = Constants.FV_OBJ_END_OFFSET;    
                  // var sel = oDoc.TextSelection(pgf);             // Err: TextRange() is not a function ???
                  
                  oDoc.Copy();             // Docu says: Copies the current selection to the FrameMaker Clipboard
                  alert ("what's in the clipboard?");               // from outside FM
                  
                  // Add a new paragraph after the current paragraph.  
                  var newPgf = oDoc.NewSeriesPgf (lastPfg);         // OK
                    oDoc.Paste();                                   // only non FM stuff is pasted
                  

                   

                  At the oDoc.Paste() call, something is pasted only if the clipboard contains plain text (from outside FM). If I manually select something in the doc and run the script, nothing is pasted.

                  If I empty the clipboard and run this snippet, nothing is pasted (and ClipBoardInspector does not see anything).

                  • 6. Re: How to get formatted text into arrays
                    frameexpert Level 4

                    Hi Klaus, before you copy the text range, you have to select it first. -Rick

                     

                    oDoc.TextSelection = tr;

                    • 7. Re: How to get formatted text into arrays
                      K.Daube Level 1

                      Thank you very much, Rick.

                      I had the following statement, but got a strange error message and hence removed it ...

                      var sel = oDoc.TextSelection(pgf);   // Err: TextRange() is not a function ??? 
                      
                      

                       

                      So now I am getting closer, little by little:

                      var gaText = new Array ();
                      var oDoc = app.ActiveDoc;
                      var pgf  = oDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
                      var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;
                      
                        var tr = new TextRange();                       //get text selection for paragraph            
                        tr.beg.obj = tr.end.obj = pgf;
                        tr.beg.offset = 0;
                        tr.end.offset = Constants.FV_OBJ_END_OFFSET;   
                      
                        oDoc.TextSelection = tr;
                        oDoc.Copy();                                    // Clipboard OK (rtf)
                      
                        gaText.push (tr);                               // not correct object
                      
                      // Add a new paragraph after the current paragraph. 
                        var newPgf = oDoc.NewSeriesPgf (lastPfg);       // OK
                      
                      // oDoc.Paste(Constants.FF_INTERACTIVE);          // no dialogue 
                      // error = oDoc.Paste(0);                         // nothing pasted
                      
                        var textLoc = new TextLoc (newPgf, 0);  
                        oDoc.AddText (textLoc, gaText [0]);             // [object TextRange]
                      

                       

                      I had no success at all with copy/paste (see the comments) and hence jumped ahead to what I really need: have the stuff in an array and place it again.

                      But I do not have the correct object in the array. The text inserted is [object TextRange]. How do I get the contents of this text range?

                       

                      Replacing the last line (oDoc.AddText) with

                      oDoc.AddText (textLoc, oDoc.GetTextForRange(gaText [0]));
                      

                      inserts nothing, but no error is reported either.

                      Do I need to handle TextItems here?

                      • 8. Re: How to get formatted text into arrays
                        frameexpert Level 4

                        Hi Klaus, here is a utility function that will get text from a text range or text object (Pgf, TextFrame, TextLine, SubCol, Cell, etc.):

                         

                        function getText (textObj, doc) {
                            // Gets the text from a text object or a text range.
                            var text = "", textItems;
                            // Get a list of the strings in the text object or text range.
                            if (textObj.constructor.name !== "TextRange") {
                                textItems = textObj.GetText(Constants.FTI_String);
                            } else {
                                textItems = doc.GetTextForRange(textObj, Constants.FTI_String);
                            }
                            // Concatenate the strings.
                            for (var i = 0; i < textItems.len; i += 1) {
                                text += textItems[i].sdata;
                            }
                            return text; // Return the text
                        }
                        

                         

                        Then the line gaText.push (tr); in your code should be:

                         

                        gaText.push (getText (tr, oDoc));
                        

                         

                        As for copying and pasting, I have had some success with it in ExtendScript. It basically works like this:

                         

                        // Get your text range from somewhere and make sure it is selected.
                        doc.TextSelection = tr;
                        
                        // Push the current contents of the clipboard onto the clipboard stack so it can be restored later.
                        PushClipboard ();
                        
                        // Copy the selected text.
                        doc.Copy ();
                        
                        // Get the target text range or text location and select it.
                        // This example shows a location at the beginning of a paragraph.
                        var targetTr = new TextRange (new TextLoc (pgf, 0), new TextLoc (pgf, 0));
                        doc.TextSelection = targetTr;
                        
                        // Paste the text from the clipboard.
                        doc.Paste ();
                        
                        // Restore the original clipboard contents.
                        PopClipboard ();
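One refinement worth considering: if Paste() (or anything between the push and the pop) throws, PopClipboard() is never reached and the user's clipboard stays replaced. A sketch that guarantees the restore with try...finally; withSavedClipboard is a hypothetical helper that takes the push/pop functions as parameters so it can be tried outside FrameMaker with stand-ins. In a real script you would pass PushClipboard and PopClipboard.

```javascript
// Run `action` with the clipboard saved before and restored afterwards,
// even if `action` throws. push/pop are injected (PushClipboard/PopClipboard
// in FrameMaker; simple stand-ins when testing elsewhere).
function withSavedClipboard(push, pop, action) {
  push();
  try {
    return action();
  } finally {
    pop();   // always runs, so the original clipboard is restored
  }
}

// In FrameMaker the call might look like this (not executed here):
// withSavedClipboard(PushClipboard, PopClipboard, function () {
//   doc.TextSelection = tr;
//   doc.Copy();
//   doc.TextSelection = targetTr;
//   doc.Paste();
// });
```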
                        
                        • 9. Re: How to get formatted text into arrays
                          K.Daube Level 1

                          Again, Rick, thank you very much for the time you are spending on me.

                          The GetText () routine is already in heavy use in my script, but here I need to find a method to place already formatted text. The formats of bibliographic citations vary, and they may contain italic, bold, underline and even superscript (exponent) as emphasis.

                          D. Garneau, Ed., National Language Support Reference Manual (National language Information Design Guide. Toronto, CDN: IBM National Language Technical Centre, 1990.

                          The first paragraph that I read with the test script contains a word in italics, and of course this format is not preserved by the GetText () function. I need to conserve the formatting to be able to replace the temporary citations with the formatted ones:

                          from [[Garneau, 1990 #12]] to [9] to the above-mentioned paragraph. If I have all these elements (from the imported RTF) in arrays, I can use indices and do not need to loop through paragraphs to find them...

                          The paragraph found this way then replaces the temporary citations in the FM footnotes.

                          I also need to figure out what to do in Russ Ward's SearchReplace script to handle this. Maybe I must find the equivalent of "Replace by Paste", and that's the reason why I was experimenting with Paste as well. But in my tests oDoc.Paste() and oDoc.Paste(0) do nothing, and I remember Russ' note about the unreliability of copy/paste...

                          I didn't think that text handling was that complicated in ExtendScript...

                          • 10. Re: How to get formatted text into arrays
                            Russ Ward Level 4

                            Klaus,

                             

                            Copy/paste should work. When I question the reliability, I don't mean that the basic functionality is unreliable. I'm just saying that I would not always trust it to absolutely maintain the original formatting, simply because FrameMaker is designed, to some degree, to resist format overrides. That said, a copy/paste will normally maintain format overrides. I just wouldn't bet my piano on it.

                             

                            If your paste action is doing nothing, I would suggest that you are doing something wrong. Either:

                             

                            - There is nothing on the clipboard, because you did not properly select some text before executing Copy()

                            - Your insertion point for the paste is an invalid location.

                             

                            The way you test this is to combine it with manual actions. For example, run just the Copy() operation alone, then manually try to paste the content somewhere. If it doesn't work, you know the problem is with the copy attempt. If it does, then try the same thing with the paste. Manually copy something, then run a script that goes straight to the paste. The clipboard is all the same... so you just have to iteratively step through the actions while troubleshooting. Lots of ES troubleshooting follows this methodology, since there is really no way to "see" what is happening inside the app.

                             

                            Russ

                            • 11. Re: How to get formatted text into arrays
                              K.Daube Level 1

                              Thanks, Russ, for this advice. The first part (filling the clipboard) was OK; I had checked the contents of the clipboard with a clipboard-inspector utility. Nevertheless, I tested again:

                              var oDoc = app.ActiveDoc;
                              var pgf  = oDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
                              var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;
                              
                                var tr = new TextRange();                       //get text selection for paragraph             
                                tr.beg.obj = tr.end.obj = pgf;
                                tr.beg.offset = 0;
                                tr.end.offset = Constants.FV_OBJ_END_OFFSET;    
                              
                                oDoc.TextSelection = tr;
                                oDoc.Copy();                                    // Clipboard can be pasted manually
                              
                              // Add a new paragraph after the current paragraph.  
                                var newPgf = oDoc.NewSeriesPgf (lastPfg);       // OK
                                var textLoc = new TextLoc (newPgf, 0);          // cursor nowhere  
                              

                              Run script
                              => new para at end
                              Put cursor therein and paste => OK
                              ... OK all the time

                               

                              However, the second part (paste) is quirky, and I don't know where the problem really is. Since the ESTK on my system has had no connection to FM since FM 10, I tested on my wife's system, with the same effects:

                              var oDoc = app.ActiveDoc;
                              var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;
                              
                              // Add a new paragraph after the current paragraph.  
                                var newPgf = oDoc.NewSeriesPgf (lastPfg);       // OK
                                var textLoc = new TextLoc (newPgf, 0);  
                                
                               oDoc.Paste ();                         // random results
                              

                              Copy the first paragraph manually
                              Run script
                              => new para at end, nothing pasted into
                              copy again manually
                              run script
                              => new para at end, nothing pasted into
                              copy again manually
                              run script
                              => new para at end, nothing pasted into
                              run script (with old clipboard contents)
                              => new para at end, pasted as second para
                              run script (with old clipboard contents)
                              => new para at end, pasted as third para

                               

                              Is this black or white magic?

                              • 12. Re: How to get formatted text into arrays
                                frameexpert Level 4

                                Klaus, OK, before you paste, you need to set the TextSelection to your insertion point.

                                 

                                // Add a new paragraph after the current paragraph.  
                                var newPgf = oDoc.NewSeriesPgf (lastPgf);
                                var textRange = new TextRange (new TextLoc (newPgf, 0), new TextLoc (newPgf, 0));
                                
                                
                                oDoc.TextSelection = textRange;
                                oDoc.Paste ();
                                

                                 

                                -Rick

                                • 13. Re: How to get formatted text into arrays
                                  Russ Ward Level 4

                                  I'll also mention that text locations/ranges are a little weird with the last paragraph in the flow. Something to do with the end-of-flow mark, or something else, I don't know. Any time I build a document paragraph by paragraph, I always leave that last paragraph as empty, then insert next-to-last paragraphs and build within those. Then at the end, just delete the last empty paragraph. I find it much cleaner.

                                   

                                  Russ

                                  • 14. Re: How to get formatted text into arrays
                                    K.Daube Level 1

                                    Thank you both! This part now works.

                                    // Copy first paragraph to the end of flow
                                    var oDoc = app.ActiveDoc;
                                    var pgf  = oDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
                                    var lastPfg = oDoc.MainFlowInDoc.FirstTextFrameInFlow.LastPgf;
                                    
                                      var tr = new TextRange();                       //get text selection for paragraph             
                                      tr.beg.obj = tr.end.obj = pgf;
                                      tr.beg.offset = 0;
                                      tr.end.offset = Constants.FV_OBJ_END_OFFSET;    
                                    
                                      oDoc.TextSelection = tr;                        // get the paragraph
                                      oDoc.Copy();                                    // Clipboard contains formatted stuff
                                    
                                      var newPgf = oDoc.NewSeriesPgf (lastPfg);       // Add a new paragraph at end of flow
                                      var textLoc = new TextLoc (newPgf, 0);  
                                      var textRange = new TextRange (textLoc, textLoc);
                                      oDoc.TextSelection = textRange;                 // set TextSelection to insertion point
                                    
                                      oDoc.Paste ();                                  // "replace" TextSelection
                                    

                                    Steep learning curve!

                                    Mind-boggling concept (or limited brain)

                                    One little step - but for walking every step has equal importance.