5 Replies Latest reply on May 17, 2015 8:09 AM by RiverPines

    GREP to clean up table cells

    RiverPines

      I'm importing Word documents into InDesign and using a FindChangeByList script to do a lot of cleanup on the imported text. There are lots of tables in the Word text and I would like them to get cleaned up as well – specifically, I'd like to remove all white space at the beginning and end of every cell. Here's what I'm having trouble removing:

      • single paragraph returns at the beginning of the cell

      • single paragraph returns at the end of the cell (after the last line of text)

      • single spaces after the last word in the cell (this one's not as important as the others)

       

      I was hoping to be able to add GREP searches to my FindChangeByList script but I've had no luck so far. (I can't seem to find a GREP string for "beginning of table cell" or "end of table cell.")

       

      Any help would be appreciated. Thanks!  (Using InDesign CC 2014.2)

       

      -- Don Williams

        • 1. Re: GREP to clean up table cells
          Eugene Tyson Adobe Community Professional & MVP

          There is no GREP string for the start or end of a table cell.

           

          I'm sure it could be scripted, and therefore added to the Find Change By List script you already have.

           

          Here's an example over here.

           

          http://indesignsecrets.com/tackling-tables-through-scripting.php

          • 2. Re: GREP to clean up table cells
            [Jongware] Most Valuable Participant

            Every cell has a little story of its own (which, come to think of it, sounds like a biology primer - or maybe The Shawshank Redemption). There are codes for "Beginning of Story": \A and "End of Story": \Z. So the following would work:

             

            Find: \A\r+

            Change to nothing

             

            and

             

            Find: \s+\Z

            Change to nothing

             

            The latter is \s rather than \r so it removes all kinds of whitespace: returns, spaces, tabs, and the smattering of fixed width spaces that Word sometimes inserts. (Afterthought) They remove all returns (and spaces); if by "single returns" you mean "only if there is 1, not when more", use this one:

             

            \A\r(?!\r)

             

            There is a drawback when using this in a global change, as FindChangeByList works on the entire document: all stories are searched, including the main story, footnotes, and anchored objects. You cannot easily add a qualifier such as "only in tables" to it, so if this may be an issue, ask for a specific script for tables only.
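As a rough illustration of what these expressions do to the text of a single cell, here is a minimal JavaScript sketch that runs outside InDesign. It is a stand-in under stated assumptions, not InDesign's GREP engine: JavaScript's RegExp has no \A or \Z, so ^ and $ (without the multiline flag) play the role of story start and story end, and the helper names trimCellText and stripSingleLeadingReturn are hypothetical.

```javascript
// Hypothetical helpers; a plain string stands in for the text of one table cell.
// JavaScript has no \A or \Z, so ^ and $ without the "m" flag anchor to the
// start and end of the whole string, which mimics "start/end of story".

// Mimics  Find: \A\r+   and   Find: \s+\Z   (both changed to nothing).
function trimCellText(text) {
  return text
    .replace(/^\r+/, "")   // drop every return at the very start
    .replace(/\s+$/, "");  // drop trailing whitespace: returns, spaces, tabs
}

// Mimics  Find: \A\r(?!\r)  which only acts when a single return leads the text.
function stripSingleLeadingReturn(text) {
  return text.replace(/^\r(?!\r)/, "");
}

var cleaned = trimCellText("\r\rTotal due \t\r");
var oneReturn = stripSingleLeadingReturn("\rTotal due");
var twoReturns = stripSingleLeadingReturn("\r\rTotal due");
```

Returns inside the cell text are left alone, because both patterns are anchored to the very start or very end of the string.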

            • 3. Re: GREP to clean up table cells
              RiverPines Level 1

              Thanks for the responses.

               

              The GREP find/change codes provided by Jongware work great in the Find/Change dialog box, but I was hoping to be able to include these find/changes in my existing FindChangeByList script. Unfortunately, I can't seem to make them work. Below is my revised script. The first part is just some basic text cleanup stuff so you get an idea of what's been done before the cleanup I'm trying to do in the table cells which comes at the end.


              (The script pasted in as some kind of table and I'm not sure how to format it to be more readable. I'd like to make the GREP column wider -- since it wraps weirdly -- but I can't figure out how to do that.)

               

              In the last five lines in the script I tried the GREP as provided by Jongware, and I also tried using Positive Lookahead and Positive Lookbehind. Nothing seems to work. (Note that I didn't have all five lines at the end active at the same time as I was running the script. I commented out various lines using "//" as I was trying things.)

               

              Am I doing something wrong, or is it just not possible to remove beginning and ending whitespace in table cells without more advanced scripting?

               

              Thanks!

              -- Don

               

               

              grep	{findWhat:"~e"}	{changeTo:" . . . "}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find ellipses and replace with space-period-space-period-space-period-space.
              grep	{findWhat:" /"}	{changeTo:"/"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find spaces followed by a slash and replace with a slash.
              grep	{findWhat:"/ "}	{changeTo:"/"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find slashes followed by a space and replace with a slash.
              grep	{findWhat:"\n"}	{changeTo:" "}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find soft returns and replace with a single space.

              //

              grep	{findWhat:"  +"}	{changeTo:" "}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find all double spaces and replace with single spaces.

              //

              grep	{findWhat:" \t"}	{changeTo:"\t"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find spaces followed by a tab and replace with a single tab.
              grep	{findWhat:"\t "}	{changeTo:"\t"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find tabs followed by a space and replace with a single tab.
              grep	{findWhat:"\t\t+"}	{changeTo:"\t"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find all double tabs and replace with a single tab.

              //

              grep	{findWhat:" (?=\r)"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find spaces followed by a return and remove the space.
              grep	{findWhat:"\r "}	{changeTo:"\r"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find returns followed by a space and replace with a single return.
              grep	{findWhat:"\t\r"}	{changeTo:"\r"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find tabs followed by a return and replace with a single return.
              //grep	{findWhat:"\r\t"}	{changeTo:"\r"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	NOT USED: Find returns followed by a tab and replace with a single return.
              grep	{findWhat:"\r\r+"}	{changeTo:"\r"}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find all double returns and replace with single returns.

              //

              grep	{findWhat:"^ "}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find all spaces at beginning of paragraphs (like in table cells) and delete them.

              //

              grep	{findWhat:"\A\r"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find returns at beginning of stories (like in table cells) and delete them.
              grep	{findWhat:"\s+\Z"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find whitespace at end of stories (like in table cells) and delete it.
              grep	{findWhat:"(?<=\A)\r"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find returns at beginning of stories (like in table cells) and delete them.
              grep	{findWhat:"\r(?=\Z)"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find returns at end of stories (like in table cells) and delete them.
              grep	{findWhat:" (?=\Z)"}	{changeTo:""}	{includeFootnotes:true, includeMasterPages:true, includeHiddenLayers:true, wholeWord:false}	Find spaces at end of stories (like in table cells) and delete them.

              //

              • 4. Re: GREP to clean up table cells
                [Jongware] Most Valuable Participant

                GREP expressions inside the FindChangeByList file start out as regular JavaScript text strings. That means that combinations such as "\r" and "\t" are translated by JavaScript before they are passed to InDesign's GREP find. So the "\t" in your entry is actually fed into GREP as a literal tab character. Usually this is not a problem, because it works the same. However, if JavaScript does not recognize a backslash code, it throws away the backslash -- so "\A\r" gets translated to "A(literal hard return)" and GREP tries to find that instead.

                 

                The solution is to double all of the backslashes. A double backslash gets translated into a single one, so an input of "\\A\\r" gets read as the exact string "\A\r", which is then passed to GREP and works as intended.

                 

                (Theoretically, you don't need to double up all backslashes here. `\r` and `\t` get translated to the same characters as `\\r` and `\\t` inside GREP. But there are some characters that get translated otherwise by JavaScript, into an equivalent that is not the same in GREP. From memory: \a, \b; probably some more. So it's safest to always double backslashes.)
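The effect of doubling the backslashes can be checked outside InDesign. The sketch below assumes that each braced field of the list file is turned into a JavaScript object by evaluating its text, which is what would make the string-escape rules apply; parseField is a hypothetical stand-in for that step. String.raw is used only to write the file text exactly as typed; it is modern JavaScript for Node or a browser, not ExtendScript.

```javascript
// Hypothetical stand-in for how a list-file field becomes a JavaScript object.
// Assumption: the field text is evaluated as an object literal.
function parseField(fieldText) {
  return eval("(" + fieldText + ")");
}

// String.raw keeps the backslashes exactly as they would be typed in the file.
var singleField = String.raw`{findWhat:"\A\r"}`;
var doubledField = String.raw`{findWhat:"\\A\\r"}`;

// Single backslashes: JavaScript drops the one before "A" and turns \r into
// a real return, so GREP would receive "A" followed by a return.
var single = parseField(singleField).findWhat;

// Doubled backslashes: each pair collapses to one, so GREP would receive
// the four characters  \ A \ r  and can read them as its own codes.
var doubled = parseField(doubledField).findWhat;

// \b is recognized by JavaScript, but as a backspace character (code 8),
// so GREP would not see a backslash code unless the backslash is doubled.
var backspace = parseField(String.raw`{findWhat:"\b"}`).findWhat;

console.log(single.length);   // 2
console.log(doubled.length);  // 4
```

Printing the lengths is a quick way to see the difference, since the return character in the first string is invisible when printed directly.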

                • 5. Re: GREP to clean up table cells
                  RiverPines Level 1

                  Thank you, Jongware. I would not have thought of that. And I also could kick myself for not reading the intro text more carefully in the example FindChangeList.txt file I started with. The last line in the intro mentions "escaping" the backslash characters (by adding another backslash) in the findWhat parameter.

                   

                  The script works perfectly now and it will save me lots of time.