Replacing all text in book with sample text

Report · Jul 11, 2013

Hello fellows,

Is there a way to replace all the textual content in a book with some sample text using Extendscript?

If yes, could you please point me to the relevant APIs?

Thank you for your help in advance!

Report · Jul 14, 2013

Of course this is possible, but it might get very tricky depending on the structure of the document you want to process and on the type of sample text you want to replace the content with. If the sample text should more or less look like readable text, you have to figure out a way to cut that text up into substrings and place them in the right locations so that the end result looks like the sample you are trying to create. It is easier if the output can be completely bogus, as you will not need to care about the length of the text strings that are replaced in each of the paragraphs or subparagraphs your script will find.

Lots of issues to handle. Possibly, using the Find and Replace function would be something to look at in the FrameMaker Scripting Guide. But that function is not exactly the easiest to handle from a script. The other option is to walk through all paragraphs in the flow, then walk through all anchored frames, figure out if they have paragraphs in them, then walk through all table cells and process the text in those.

An important issue is what to do with the many markers and anchors that FrameMaker puts in the text flow. These may or may not take up space in the text (depending on the type of marker), and you do not want to remove all of them. You may want to remove the index and cross-reference markers, but certainly not the anchors for tables and such. Also, walking through all paragraphs in a flow does not lead you through the text that appears in tables or in anchored frames.

Good luck

Jang

Report · Jul 14, 2013

Hi Jang,

I appreciate your response and detailed explanations!

I changed my mind - I would rather not replace the content with bogus text, as that may also change the number of text lines in the files - and that's what I would like to avoid.

I would prefer to obfuscate the existing text without changing its amount and flow.

Are you aware of any tools/APIs that can do that?

Thanks for your suggestions in advance!

Report · Jul 14, 2013

It would still come down to walking through the entire list of paragraphs (in all flows that appear on body pages), then through all table cells and also through any text appearing in anchored frames, although I assume that leaving the text in anchored frames (call-outs or sidehead notes) untouched might be acceptable.

I will have a look at the required loops and post a possible solution later today. I do not want to post code that I have not tested first. And this type of script might come in handy for some of my clients, too.

I am assuming that replacing every single character with an 'x' does the trick for you? It would not change the formatting or text flow.

Jang

Report · Jul 14, 2013

Just a caution: replacing every character with an "x" could definitely change the text flow.

Here is the sentence above with each non-space character changed to an "x":

xxxx x xxxxxxxx xxxxxxxxx xxxxx xxxxxxxxx xxxx xx xxx xxxxx xxxxxxxxxx xxxxxx xxx xxxx xxxx

As you can see, the line is a bit shorter.

Rick

Report · Jul 14, 2013

Jang, Rick,

Hi!

First of all, thank you guys for your responses. I appreciate them very much!

I wonder if it is possible to obfuscate the content without changing the file size or the number of characters on a page. This looks like quite a challenging puzzle.

Have a great day!

Report · Jul 14, 2013

Yes, the point that Rick was making crossed my mind, too. Using regular expressions, you can easily specify which characters should be replaced. Listing a number of same-width characters and replacing them with an "x" is as easy as this:

sTextString.replace ( /[abcdefghknopqrsuvyz]/g, "x" );

This would leave the non-mentioned characters as they are. Depending a little bit on the fonts that are used, of course, as they might have different character widths for more than the ones I left out. But even if you only replace a couple of characters throughout the doc, it would be enough to get the desired effect.
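A minimal sketch of the one-liner above, in plain JavaScript that should also run in ExtendScript. The function name obfuscateSameWidth is made up for illustration, and the character list is simply the one from this post, so it is an assumption that depends on the font. Note that replace returns a new string rather than changing the original, so the result has to be assigned to something.

```javascript
// Hypothetical helper: replace a set of (assumed) same-width lowercase
// letters with "x". Characters that are not listed are left as they are.
function obfuscateSameWidth(sText) {
    // String.replace returns a new string; sText itself is not modified.
    return sText.replace(/[abcdefghknopqrsuvyz]/g, "x");
}

var sOriginal = "Just a caution";
var sObfuscated = obfuscateSameWidth(sOriginal);

// One character in, one character out: the string length is unchanged.
var bSameLength = (sObfuscated.length === sOriginal.length);
```

The character count is preserved by construction. Whether the rendered line width is preserved still depends on the fonts used in the document.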

The difficulty is getting to the text strings and also to replace them. There are many objects for which a GetText method exists, but you have to set the flags such that you actually get the right text strings out of that method. And then you have to figure out where in the doc the text string is, then delete the existing one and add the replacement. All of this has to be done without deleting any markers or anchors.

Another approach might be to set the text location to the first character in the main flow and then walk through the entire document character by character, testing each one and replacing it where required. But I don't think you would get into tables with that method, as those are linked to the running text via an anchor in that running text. So the table cells will have to be processed separately. And if you do have a method to change text in table cells, you can use that method on all paragraphs.
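For the separate pass over table cells, a sketch along the following lines might work. It only collects the paragraphs and changes nothing. The property names (FirstTblInDoc, NextTblInDoc, FirstRowInTbl, NextRowInTbl, FirstCellInRow, NextCellInRow, FirstPgf, NextPgfInFlow) are quoted from memory of the FrameMaker object model and should be checked against the scripting guide. The function name collectCellParagraphs is made up for illustration.

```javascript
// Sketch only: collect the paragraphs inside all table cells of a document.
// oDoc is assumed to behave like a FrameMaker Doc object; the property
// names below are assumptions to verify in the scripting guide.
function collectCellParagraphs(oDoc) {
    var aPgfs = [];
    var oTbl = oDoc.FirstTblInDoc;
    while (oTbl && oTbl.ObjectValid()) {
        var oRow = oTbl.FirstRowInTbl;
        while (oRow && oRow.ObjectValid()) {
            var oCell = oRow.FirstCellInRow;
            while (oCell && oCell.ObjectValid()) {
                // Each cell holds its own short chain of paragraphs.
                var oPgf = oCell.FirstPgf;
                while (oPgf && oPgf.ObjectValid()) {
                    aPgfs.push(oPgf);
                    oPgf = oPgf.NextPgfInFlow;
                }
                oCell = oCell.NextCellInRow;
            }
            oRow = oRow.NextRowInTbl;
        }
        oTbl = oTbl.NextTblInDoc;
    }
    return aPgfs;
}
```

The returned paragraphs could then go through the same replacement routine that is used for the paragraphs in the main flow.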

I think Rick has more experience in tweaking text strings. I am usually working on structured documents and only handling the element objects and their hierarchy, not so much changing the text content of documents. Rick, do you have an approach to walk through all text strings in a document and replace them without breaking anything?

Ciao

Jang

Report · Jul 14, 2013

Hi Jang,

Thanks again for your response! It definitely sheds light on the direction I should look into more. As you said, the main challenge here is to replace the text without touching table anchors, merging paragraphs, and sticking headings to the body text.

Have a beautiful day!

Report · Jul 14, 2013

Hi Jang,

While playing with the code, I came up with the following (I took bits and pieces from various posts here on the forum). I saved it as a JSX file and ran it in FM - no change whatsoever - it returns "Undefined". If I understand the code correctly, it should do the job. Any idea what I am doing wrong?

var pgf = doc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;
while(pgf.ObjectValid()){
    var test = pgf.GetText(-1);
    var text, str;
    text = "";
    for (var i=0; i < test.len ; i +=1)
    {
        var str=test.sdata.replace(/[abcdefghknopqrsuvyz]/g, "xyz");
        text = text + str;
        PrintTextItem (test);
    }
    pgf = pgf.NextPgfInFlow;
}

Thank you in advance!

Report · Jul 14, 2013

Hi,

Your script handles the text items that are retrieved from the document, not the document itself. The text strings returned by GetText are copies of whatever is in the document. So you have to get the document locations, clear the text in the document (without removing any anchors or markers), and then add the tweaked text.
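To illustrate the "get the document locations" step, here is a small sketch in plain JavaScript. It assumes each text item carries dataType, offset and sdata properties, as the items returned by GetText do in this thread. The function name planStringEdits and the nStringType parameter (standing in for Constants.FTI_String) are made up so the idea can be tried outside FrameMaker. It only builds a list of planned edits and does not touch any document.

```javascript
// Sketch: turn a list of text items into a list of planned edits.
// nStringType stands in for Constants.FTI_String (an assumption made
// so that this runs without FrameMaker).
function planStringEdits(aItems, nStringType) {
    var aEdits = [];
    for (var i = 0; i < aItems.length; i++) {
        var oItem = aItems[i];
        if (oItem.dataType !== nStringType) {
            continue; // anchors, markers and other items are left alone
        }
        aEdits.push({
            beg: oItem.offset,                      // start of the string in the paragraph
            end: oItem.offset + oItem.sdata.length, // position just past the string
            text: oItem.sdata.replace(/[a-z]/g, "x").replace(/[A-Z]/g, "X")
        });
    }
    return aEdits;
}
```

Each planned edit still has to be applied to the document itself: select the range from beg to end, clear it, and add the new text at beg. Because the replacement has the same length as the original, the offsets of later items in the paragraph should stay valid.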

This is why I mentioned it is not as simple as it looks. But my script works, so you can use that.

Thanks for the challenge

Jang

Report · Jul 14, 2013

Hello again,

I used the script on another document and Frame crashed. Use the script with care. I guess the text selection is not exactly right or the end of the loop is a little buggy. But the main part of the script works. If I find the error later, I will post it here.

Ciao

Jang

Report · Jul 14, 2013

Hi Jang,

Thank you very much for your great script and for your willingness to help newbies like me! You are a real professional!

Your script is impressive! It managed to replace 99.9% of the content in a file, and the file only lost 1KB from its size. I wonder what could make it pass over some sentences in several paragraphs, and a table. This is more than interesting and worth exploring. Another caveat: there is no "Undo" (not criticism at all -- just food for thought/regular user input ).

I guess if one wants to run this script on a book, the script must be somehow called from the sample iterator.jsx provided by Adobe or something similar. Am I right?

Thank you very much again for sharing the script with us!!!

Report · Jul 14, 2013

The Undo is a matter of Revert to Saved. I don't think any other method would be feasible, as you really want to replace almost ALL the text in a file.

In the meantime I have experimented on a couple of other, more realistic, test documents and I got some crashes in FM11. So instead of using the TextSelection and then walk through all paragraphs in the document, I now use the proper method to select the first paragraph in the first text frame in the main flow and then walk through the linked list of paragraphs down to the end of the flow.
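A "linked list of paragraphs" here simply means that each paragraph object points to the next one through a property such as NextPgfInFlow, and the walk ends when the object is no longer valid. A minimal sketch of such a walk, with a made-up function name walkParagraphs and a caller-supplied callback fnVisit (both hypothetical):

```javascript
// Sketch: visit every paragraph in a chain, starting at oFirstPgf.
// fnVisit is a hypothetical callback supplied by the caller.
function walkParagraphs(oFirstPgf, fnVisit) {
    var nCount = 0;
    var oPgf = oFirstPgf;
    while (oPgf && oPgf.ObjectValid()) {
        // Read the link before visiting, in case the visit edits the paragraph.
        var oNext = oPgf.NextPgfInFlow;
        fnVisit(oPgf);
        nCount++;
        oPgf = oNext;
    }
    return nCount;
}

// Inside FrameMaker the starting point could be, as quoted in this thread:
// walkParagraphs(app.ActiveDoc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf, fnVisit);
```

The function returns the number of paragraphs visited, which can be handy for a quick sanity check before letting the callback change anything.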

I am still testing this, as there are unforeseen problems with the selection of paragraphs, so be careful when you want to apply this script, especially in a book. Do extensive testing, single stepping and make backups of all your files before you put this into production. It might be a simple typo but I am not sure of that and I do not have time to make this rock solid - at least not today.

Good luck

Jang

Report · Jul 14, 2013

Hi Jang,

Thank you very much for your prompt response!!! I tested the script on a sample book, so no worries. I am using FMv10. It crashed on me several times while running the script. Could you please post the updated code that uses doc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;?

BTW, what is "linked list of paragraphs"? Interesting. It's the first time I encountered this concept.

Again, **Kudos** to you for crafting this script!

Report · Jul 15, 2013

Hi Jang,

I hope you are doing well!

I am trying to understand how the script works and it is not clear to me what "oTexts.offset;" does. Could you please explain this?

In the meantime, I continued testing the script. When I expose all conditions, the script makes FM crash only when it approaches the last paragraph of the file. If I do not expose all conditions, this happens even earlier. So in the meantime, I can't test if it works on a book, although I included a call to the iterations.jsx file.

Wishing you best of luck!

Report · Jul 16, 2013

Hi Jang,

I noticed that what makes the script crash is the line "oDoc.TextSelection = oTRange;". When I replace oDoc.TextSelection with oRange, the script stops crashing. However, it does not delete the original text - the replacing text is added to the original one. How can I delete the original text? You set oDoc.Clear () to 0, so it's not clear what it is used for in this code.

Please, advise.

Thank you in advance!

Report · Jul 16, 2013

You have to set oDoc.TextSelection to a text range before methods like Clear or Copy have any effect: they take the current TextSelection as the range on which they are applied. The flags in the Clear and Copy methods define options for the method, such as suppressing any warnings that might otherwise occur or what to do with hidden text.

That is why nothing is deleted if you do not set oDoc.TextSelection. What you can try is replace the 0 flag in oDoc.Clear with Constants.FF_VISIBLE_ONLY and let me know if that solves the problem.
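The order of operations described above can be sketched as a small helper. The names replaceRange and nFlags are made up for illustration. oDoc is expected to behave like a FrameMaker Doc object with the TextSelection property and the Clear and AddText methods used elsewhere in this thread. Treat this as a sketch under those assumptions, not a tested fix for the crash.

```javascript
// Sketch: replace the text in oRange with sNewText.
// oRange is expected to have beg and end text locations.
function replaceRange(oDoc, oRange, sNewText, nFlags) {
    // Clear acts on the current selection, so the selection is set first.
    oDoc.TextSelection = oRange;
    oDoc.Clear(nFlags || 0);
    // Insert the replacement where the cleared text began.
    return oDoc.AddText(oRange.beg, sNewText);
}
```

Without the assignment to TextSelection, Clear has nothing to act on, so AddText only adds the new text next to the old text.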

About an earlier question: the oTexts array that is returned by the GetText method contains text string objects and each of them has a property "offset" which gives the offset within the current paragraph.

Good luck

Jang

Report · Jul 16, 2013

Hi Jang,

Thank you for your response and for the suggestions!

Unfortunately, nothing changes when using oDoc.Clear (Constants.FF_VISIBLE_ONLY). When oDoc.TextSelection = oTRange, the script crashes consistently. If I replace oDoc.TextSelection with "oRange" defined earlier, the script adds text strings consisting of "x" but does not delete the original text.

In your previous post, you said: "So instead of using the TextSelection and then walk through all paragraphs in the document, I now use the proper method to select the first paragraph in the first text frame in the main flow and then walk through the linked list of paragraphs down to the end of the flow." Could you please demonstrate how you changed the script?

As a side note, I should say that the Adobe ExtendScript documentation is terrible and quite useless for a newbie like me. Many properties cannot be found, and the descriptions are not informative. Are you aware of any ExtendScript resource that provides more detailed information?

Thank you very much in advance!

Report · Jul 16, 2013

Hi,

I am single-stepping through the script and find that it also deletes automatic text, such as auto-numbers in a table of contents. That is where Frame dies. So I have to figure out a way to distinguish editable text from auto-generated text and leave the auto-generated text as is. I am hoping that will solve the crashes and make it rock solid.

I will post a solution later, but it might not be today.

Ciao

Jang

Report · Jul 17, 2013

OK, I think I have this nailed now. The devil is in the details, as usual. I have adapted the flags for the GetText method so that only the relevant text strings are returned, i.e. a text string that spans multiple lines will be retrieved as one single string. Some irrelevant objects are ignored, but some have to be retrieved and used so that the text that follows them is not touched. This is all pretty tricky work, but the code below does the trick without crashing, at least in some of the documents I have tested it on. It processes one single flow, so if you have documents with more than one flow, you will have to repeat the script for each flow - after first placing the text cursor at the start of that flow.

The script is getting a little too long to post here, so if you drop me an e-mail to jang at jang dot nl I will send the jsx file to you. That saves you copying and pasting and possible mishaps due to the conversion from jsx to web to jsx.

Ciao

Jang

Report · Jul 17, 2013

Hi Jang,

Thank you very much for the update. Real developers do not give up -- you are the real one! I absolutely agree with you that the devil is in details, and the devil is really strong if the details are not documented properly.

I will contact you offlist.

Have a great day, and keep up your great work!

Report · Jul 17, 2013

Hi dear Jang,

Not sure you got my email. Looking forward to testing the script on my files.

Thanks again!

Report · Jul 17, 2013

Hi Jang,

Thank you for your response! I appreciate your help!

I also kept running tests yesterday, while modifying the script in order to spot the cause of the crash. The original script crashes even when being run on a single file that does not contain automatic text. It stops somewhere in the middle of the file and FM crashes. I am talking about FM 10 files full of tables, variables, conditions, figures, etc. - not just text paragraphs. An additional problem that I haven't managed to address yet is that the table cells are being ignored. I tried adding | Constants.FTI_TblAnchor to the if statement, but it did not make any change. The more I work on this, the more interesting it becomes.

Report · Jul 14, 2013

OK, I have figured it out. Here is a script that works across all text in the main flow after opening the document.

var oDoc = app.ActiveDoc;
var oRange = oDoc.TextSelection;
// Start at the paragraph that holds the text cursor.
var oPgf = oRange.beg.obj;
var oTLoc1 = new TextLoc;
var oTLoc2 = new TextLoc;
var oTRange = new TextRange;
var sNewTxt;
var i;

while ( oPgf.ObjectValid ( ) )
{
    // Get all text items in the paragraph (strings, anchors, markers, ...).
    var oTexts = oPgf.GetText ( -1 );
    oTLoc1.obj = oPgf;
    oTLoc2.obj = oPgf;
    for ( i = 0; i < oTexts.length; i++ ) {
        // Only touch plain text strings; leave anchors and markers alone.
        if ( oTexts[i].dataType == Constants.FTI_String ) {
            oTLoc1.offset = oTexts[i].offset;
            oTLoc2.offset = oTexts[i].offset + oTexts[i].sdata.length;
            oTRange.beg = oTLoc1;
            oTRange.end = oTLoc2;
            // Clear works on the current selection, so select the string first.
            oDoc.TextSelection = oTRange;
            oDoc.Clear ( 0 );
            // Same-length replacement keeps the later offsets valid.
            sNewTxt = oTexts[i].sdata.replace ( /[a-z]/g, 'x' );
            sNewTxt = sNewTxt.replace ( /[A-Z]/g, 'X' );
            oDoc.AddText ( oTLoc1, sNewTxt );
        }
    }
    oPgf = oPgf.NextPgfInDoc;
}
