
ExtendScript: Get all text from a document

Guest
Oct 29, 2012

Hi all.

I have the following task: I need to translate a document into another language using ExtendScript. As input, I have a document with text, graphics, tables, etc. in Language_1 and a delimited file that contains the translations into Language_2. E.g.:

Some_text_in_language_1     Some_text_in_language_2

Some_other_text_in_language_1     Some_other_text_in_language_2

To get the source text from the document, I've tried to use this:

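var pgf = doc.MainFlowInDoc.FirstTextFrameInFlow.FirstPgf;

while (pgf.ObjectValid()) {
     // GetText returns an array of text items; FTI_String limits it to plain strings.
     var items = pgf.GetText(Constants.FTI_String);
     var text = "";
     for (var i = 0; i < items.len; i += 1) {
          // Trim leading and trailing whitespace from each string item.
          var str = items[i].sdata.replace(/^\s+|\s+$/g, '');
          text = text + str;
          PrintTextItem(items[i]);   // my own helper, not shown here
     }
     // Move on to the next paragraph in the same flow.
     pgf = pgf.NextPgfInFlow;
}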

But with this, I can only access the regular text in the document (e.g. the text in tables remains untouched). Is there any way to get all the textual data from a specified document? Or maybe a full list of the objects that can contain text, so that I can iterate through them and extract it one by one? Or maybe there's a better way to solve this problem?

Thanks in advance! Any advice would be greatly appreciated.

TOPICS
Scripting

Engaged
Oct 29, 2012

Hi,

GetText delivers an array of text items.

Text items could be text but also table anchors, markers etc.

You'll never get text of a table if you call pgf.GetText(Constants.FTI_String).

If you want to have text and table, you have to call

var textItems = pgf.GetText(Constants.FTI_String | Constants.FTI_TblAnchor);

After that, you have to loop through the text items and check for table anchors. From each anchor you can then get the text of that table, or rather of its cells.

If you want to have all kinds of text items, you can call

var textItems = pgf.GetText(-1);
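
The loop over text items and table anchors could look roughly like the sketch below. It is not tested inside FrameMaker. It assumes that a table-anchor text item exposes the table through its obj property and that tables are walked with FirstRowInTbl / NextRowInTbl, FirstCellInRow / NextCellInRow and the cell's FirstPgf; please check these names against the scripting guide. The function names getPgfText and getTblText are made up for this example, and the Constants object is passed in as a parameter so the functions use no hidden globals.

```javascript
// Sketch only: returns the strings of one paragraph, including the text of
// any tables anchored in it. "fti" is expected to be FrameMaker's Constants.
function getPgfText(pgf, fti) {
    var out = [];
    var items = pgf.GetText(fti.FTI_String | fti.FTI_TblAnchor);
    for (var i = 0; i < items.len; i += 1) {
        var item = items[i];
        if (item.dataType === fti.FTI_String) {
            out.push(item.sdata);
        } else if (item.dataType === fti.FTI_TblAnchor) {
            // Assumption: item.obj is the table object for this anchor.
            out = out.concat(getTblText(item.obj, fti));
        }
    }
    return out;
}

// Walks rows, cells and cell paragraphs of a table (property names assumed).
function getTblText(tbl, fti) {
    var out = [];
    var row = tbl.FirstRowInTbl;
    while (row.ObjectValid()) {
        var cell = row.FirstCellInRow;
        while (cell.ObjectValid()) {
            var pgf = cell.FirstPgf;
            while (pgf.ObjectValid()) {
                // Recursion covers tables nested inside cells.
                out = out.concat(getPgfText(pgf, fti));
                pgf = pgf.NextPgfInFlow;
            }
            cell = cell.NextCellInRow;
        }
        row = row.NextRowInTbl;
    }
    return out;
}
```

In a script this would be called as getPgfText(pgf, Constants) for each paragraph of the flow. The table title is not covered by this sketch.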

Hope this helps

Markus

Guest
Oct 29, 2012

Thanks a lot for the answer, Markus. Indeed, the "-1" looks like my salvation.

Just to clarify one thing: with this call, I get the textual data both directly and through anchors. In my test document, I noticed only table anchors. Are there any other elements that can contain text and are returned by this call as anchors?

Advocate
Oct 29, 2012

There is another way to loop through ALL paragraphs in a document, regardless of whether they are in a table or in the main text flow. Use the document's FirstPgfInDoc property and follow each Pgf's NextPgfInDoc property until you reach an invalid object. Note that this also includes all paragraphs on the master and reference pages, so it may be useful to check where each Pgf is located (on a body page or not). There is a script on this forum that does that; I believe it was created and posted by Rick Quatro.
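
A minimal sketch of that document-wide loop, assuming only the two properties named above. The function name collectPgfsInDoc and the optional keep callback are made up for illustration; the callback is where a body-page check would go, and that check itself is not shown here.

```javascript
// Sketch only: collects every paragraph reachable through
// FirstPgfInDoc / NextPgfInDoc. "keep" is an optional, caller-supplied
// filter (hypothetical), e.g. a function that tests for body pages.
function collectPgfsInDoc(doc, keep) {
    var result = [];
    var pgf = doc.FirstPgfInDoc;
    while (pgf.ObjectValid()) {
        if (!keep || keep(pgf)) {
            result.push(pgf);
        }
        pgf = pgf.NextPgfInDoc;
    }
    return result;
}
```

Called as collectPgfsInDoc(doc), it returns all paragraphs, including those on master and reference pages.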

Working your way through the main text flow does not guarantee that you have all the visible text in the doc. There may be multiple flows and there may also be text frames that are placed inside anchored frames. Those text frames are not contained directly in the main flow of the document.
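
To visit every flow rather than only the main one, a loop along the following lines may work. This is an untested sketch that assumes the document exposes FirstFlowInDoc, that each flow has NextFlowInDoc, FirstTextFrameInFlow and Name, and that NextPgfInFlow continues across the text frames of one flow. The function name collectPgfsInAllFlows is invented for this example.

```javascript
// Sketch only: lists the paragraphs of every flow in the document,
// tagged with the flow's name so that the caller can skip unwanted flows.
function collectPgfsInAllFlows(doc) {
    var result = [];
    var flow = doc.FirstFlowInDoc;
    while (flow.ObjectValid()) {
        var frame = flow.FirstTextFrameInFlow;
        // A flow without a text frame is skipped.
        var pgf = frame.ObjectValid() ? frame.FirstPgf : null;
        while (pgf && pgf.ObjectValid()) {
            result.push({ flowName: flow.Name, pgf: pgf });
            pgf = pgf.NextPgfInFlow;
        }
        flow = flow.NextFlowInDoc;
    }
    return result;
}
```

Paragraphs inside table cells are not reached this way; they still have to be collected through the table anchors.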

Good luck with your scripting

Jang

Guest
Oct 29, 2012

Thanks for the response, Jang.

"Working your way through the main text flow does not guarantee that you have all the visible text in the doc. There may be multiple flows and there may also be text frames that are placed inside anchored frames. Those text frames are not contained directly in the main flow of the document."

Wow, that's frustrating. I feel like I'm trying to dig a hole with a spoon. Well, that is exactly why I posted my task here. Maybe you could suggest an alternative way to achieve this goal?

Engaged
Oct 29, 2012

"Just to clarify one thing: with this call, I get the textual data both directly and through anchors. In my test document, I noticed only table anchors. Are there any other elements that can contain text and are returned by this call as anchors?"

Markers, cross-references, variables, footnotes, hypertext, equations, text insets, and callouts placed on graphics, such as text frames or text lines.

Some hints:

The table title is available from the table object through its "FirstPgf" property.

Markers have a "MarkerText" property.

For cross-references, you have to get the cross-reference format and read the definition there.

For variables, you have to get the variable format and read the definition there.

For equations, you have to get the MathFullForm property.

And so on.
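
A few of these hints could be combined into one helper, roughly as sketched below. Apart from MarkerText, which is named above, the identifiers are assumptions to be checked against the scripting guide: the text item types FTI_MarkerAnchor, FTI_XRefBegin and FTI_VarBegin, the XRefFmt and VarFmt properties, and the Fmt property that is taken here to hold the format definition. The function name textOfItem is made up, and footnotes, equations and text insets are left out.

```javascript
// Sketch only: maps one text item to the text it carries, or null if the
// item type is not handled. "fti" is expected to be FrameMaker's Constants.
function textOfItem(item, fti) {
    switch (item.dataType) {
    case fti.FTI_String:
        return item.sdata;
    case fti.FTI_MarkerAnchor:
        return item.obj.MarkerText;
    case fti.FTI_XRefBegin:
        // Assumption: the format definition is stored in XRefFmt.Fmt.
        return item.obj.XRefFmt.Fmt;
    case fti.FTI_VarBegin:
        // Assumption: the format definition is stored in VarFmt.Fmt.
        return item.obj.VarFmt.Fmt;
    default:
        return null;
    }
}
```

A caller would fetch the items with pgf.GetText(-1) and pass each one to textOfItem together with Constants.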

Engaged
Oct 29, 2012

BTW: you can use Save as XML (in an unstructured document, too).

That way you have the content of the text flow in the XML file and can process it with a very simple XSLT stylesheet.

Be aware: as far as I can see, not all objects (markers and so on) are exported to XML in the standard way.

Markus

Guest
Oct 31, 2012

Okay, thanks again to all for your comments.

I realized that the problem was in my approach, and I ended up scripting the file translation on top of the "Find/Replace" function (I'm noting this here for anyone who faces the same problem).
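
For anyone taking the same route, the data side of it can be sketched as follows. This only shows reading the translation pairs and applying them to a plain string; it does not call FrameMaker's Find/Replace. It assumes the translation file is tab-separated with one pair per line, which is a guess, since the separator was left open in the original question. The function names parseTranslationPairs and translateString are made up for this example.

```javascript
// Sketch only: turns "source<TAB>target" lines into an array of pairs.
function parseTranslationPairs(content) {
    var pairs = [];
    var lines = content.split(/\r\n|\r|\n/);
    for (var i = 0; i < lines.length; i += 1) {
        var line = lines[i];
        var tab = line.indexOf("\t");
        if (tab <= 0) {
            continue; // skip blank lines and lines without a source text
        }
        pairs.push({
            source: line.substring(0, tab),
            target: line.substring(tab + 1)
        });
    }
    return pairs;
}

// Replaces every occurrence of each source text, in the order of the pairs.
function translateString(text, pairs) {
    for (var i = 0; i < pairs.length; i += 1) {
        text = text.split(pairs[i].source).join(pairs[i].target);
    }
    return text;
}
```

Because the pairs are applied one after another, a later pair can match text produced by an earlier one, so it may help to put the longest source texts first.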

Good luck!
