3 Replies Latest reply on Oct 6, 2011 1:10 PM by L. Guy O'Rojo

    Extract text, unformatted, tagged by paragraph style

    srsgala

      I am using InDesign CS5 and CS5.5. I have a large publication and I want to export the text to Excel and be able to sort by paragraph style. In our publication, paragraph styles designate specific data types for in-house use. My final goal is to import the text into Excel, by paragraph, tagged by paragraph style, unformatted. I installed the Rorohiko Text Editor plug-in, and it exports all text, but it does not have a function to tag by paragraph style, and style sheet formats (tabs and line breaks) are maintained. When I export using the text exporter, or import to XML or Excel, all the line breaks in the style sheet are transferred over too, causing the paragraphs to be broken up into multiple cells. I want each paragraph to be in its own cell, and identified by paragraph style. I managed to export the tagged text. Maybe I can use a script to extract the text? I have used the "export all stories" script, but I get the same results with the line breaks. Any help would be appreciated!

        • 1. Re: Extract text, unformatted, tagged by paragraph style
          L. Guy O'Rojo Level 2

          Mac or PC? You neglected to specify.

          • 2. Re: Extract text, unformatted, tagged by paragraph style
            srsgala Level 1

            I have CS5.5 on a Mac and CS5 on a PC.

            • 3. Re: Extract text, unformatted, tagged by paragraph style
              L. Guy O'Rojo Level 2

              -- An example pulled from an old ToC script, with comments for your situation

              -- AppleScript (Mac) only, CS4; disregard the page number part, I think this has changed in CS5

               

              set paraStyleName to "Header_01" -- stylename defined here

              set aTab to tab -- delimiters used below when building doc_index

              set aCR to return

              tell application "Adobe InDesign CS4"

                -- set these at the application tell level

                -- always initialize to nothing

                set find text preferences to nothing

                set change text preferences to nothing

               

                set applied paragraph style of find text preferences to paraStyleName -- or use character style

               

                tell document 1

                set foundRefs to find text

                -- result is a list

                set doc_index to ""

                repeat with aRef in foundRefs -- turn the list into paragraphs w tab delimiter

                set doc_index to (doc_index & (text 1 thru -2 of (contents of (aRef as text))) & aTab & (name of parent of parent text frames of aRef)) & aCR

                -- that makes a ToC; you could use something like:

                -- set doc_index to (doc_index & (paraStyleName & aTab & (text 1 thru -2 of (contents of (aRef as text)))) & aCR)

                -- resulting in paraStyleName & tab & text & CR

                end repeat

               

                set doc_index to text 1 thru -2 of doc_index -- eliminate trailing CR

               

                end tell

                -- always end by setting to nothing

                set find text preferences to nothing

                set change text preferences to nothing

              end tell

              -- then write doc_index to file
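
              -- A minimal sketch of that last step, using the standard file read/write commands.
              -- The file name and Desktop location are assumptions for illustration, and the first
              -- line is only a placeholder standing in for the doc_index string built above.
              -- The result should be a tab-delimited text file, one paragraph per line, which can
              -- then be imported into Excel.

```applescript
-- Placeholder value; in the real script doc_index is built by the repeat loop above
set doc_index to "Header_01" & tab & "Sample paragraph text"

-- Hypothetical output location: a text file on the Desktop
set outPath to (path to desktop as text) & "paragraphs_by_style.txt"

set fileRef to open for access file outPath with write permission
try
	set eof of fileRef to 0 -- clear anything left from a previous run
	write doc_index to fileRef as «class utf8» -- UTF-8 so accented characters survive
	close access fileRef
on error errMsg
	close access fileRef -- do not leave the file open if the write fails
	error errMsg
end try
```

              -- To tag several styles in one file, one option is to run the find once per style name
              -- and keep appending to doc_index before writing.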