19 Replies Latest reply on Oct 28, 2012 7:28 PM by [Jongware]

    ECleaner

    sperry1975 Level 1

      eCleaner is a mini-editor program that one can use to automatically 'clean up' one's documents by removing unwanted line breaks. Does anyone know if there is a way to do essentially the same thing in InDesign? Bear in mind that I'm not talking about going through paragraph by paragraph manually; I'm talking about cleaning an entire document quickly and easily, like eCleaner can.

        • 1. Re: ECleaner
          Steve Werner Adobe Community Professional & MVP

          If you use paragraph styles in InDesign (which is highly recommended), you can use the Justification and Hyphenation controls within your paragraph styles to have complete control over how lines break. InDesign has about the most sophisticated controls of any layout application.

          • 2. Re: ECleaner
            [Jongware] Most Valuable Participant

            If you can define "unwanted line breaks", I'm pretty sure a couple of Find/Change or GREP Find/Change operations can do it.

             

            I'm assuming you are talking about hard returns, though. Proper line breaks *inside* a paragraph can be "fixed" by changing the justification parameters, as Steve said, or by judiciously applied No Breaks, hard spaces, non-breaking hyphens, and the odd soft hyphen or touch of extra tracking. If that IS what you're talking about, I must admit it's pretty hard to believe it can be automated.

            • 3. Re: ECleaner
              sperry1975 Level 1

              I was referring to the unwanted paragraph breaks you get when copying from the web or from PDF files. If you don't believe eCleaner 2.02 works, feel free to download it and try it. It's free and works wonders.

              • 4. Re: ECleaner
                sperry1975 Level 1

                Oh, one more thing: as I mentioned in the original post, I'm NOT referring to going paragraph by paragraph, but to cleaning unwanted line breaks from entire documents. If this can be done via Find/Change or GREP, that's great. I'd love to know how.

                • 5. Re: ECleaner
                  Peter Spier Most Valuable Participant (Moderator)

                  The following GREP might do most of what you want:

                   

                  Find [\r\n]\l and change to a space character. Follow that with a find for (\s)\s+ and replace with $1

                   

                  This is looking for any line break or paragraph break followed by a lowercase letter and replacing it with a space. It is NOT foolproof, but most of the unwanted breaks will fit that pattern. The second part, also not foolproof, especially if you have tabular data, looks for multiple white spaces and deletes all but the first one. I added this because it is not unheard of to have a forced line break following a space. It will also remove breaks you want, though, if your typesetting is sloppy and you end a paragraph with whitespace after the punctuation, so you might first want to find \s+$ and replace with nothing.

                   

                  If you only want to remove paragraph breaks, but leave forced line breaks, you can use just \r\l instead of [\r\n]\l
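For trying the whitespace part of this recipe outside InDesign, here is a minimal JavaScript sketch. It assumes plain strings in which \r stands for a paragraph return and \n for a forced line break, and JavaScript's regex engine only approximates InDesign's GREP. The first step deliberately uses the narrower class [ \t] instead of \s so the sketch never deletes the breaks themselves; the function names are hypothetical.

```javascript
// Sketch only: approximates two GREP steps from the post above on a plain
// string. Assumes "\r" = paragraph return and "\n" = forced line break.

// Step 1 (find \s+$, replace with nothing): remove spaces and tabs that sit
// right before a break or at the very end of the text. Narrower than \s on
// purpose (an assumption), so the returns themselves are kept.
function stripTrailingWhitespace(text) {
  return text.replace(/[ \t]+(?=[\r\n]|$)/g, "");
}

// Step 2 (find (\s)\s+, replace with $1): keep the first whitespace
// character of any run of two or more and drop the rest.
function collapseWhitespace(text) {
  return text.replace(/(\s)\s+/g, "$1");
}

console.log(JSON.stringify(stripTrailingWhitespace("end. \rNext para"))); // "end.\rNext para"
console.log(JSON.stringify(collapseWhitespace("a space \nthen a break"))); // "a space then a break"
```

Note that step 2 also turns an empty paragraph (\r\r) into a single return and collapses runs of tabs, which is why it is not foolproof with tabular data.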

                  • 6. Re: ECleaner
                    sperry1975 Level 1

                    Both [\r\n]\l and \r\l appear to work to a degree. However, they also select the first letter of the next line, so if I replace the match with a space, the first letter of the following line is deleted. Thanks for trying!
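The behavior reported above follows from what the pattern matches: the lowercase letter is part of the match, so replacing the whole match with a space swallows it. A minimal JavaScript sketch, using [a-z] as a rough stand-in for InDesign's \l (which also covers accented lowercase letters) and a hypothetical function name:

```javascript
// Sketch: mimics Find [\r\n]\l / Change to " " on a plain string.
// "[a-z]" approximates InDesign's \l (ASCII lowercase only here).
function joinConsumingLetter(text) {
  // The match is two characters (the break AND the letter), and both are
  // replaced by a single space, so the letter disappears.
  return text.replace(/[\r\n][a-z]/g, " ");
}

console.log(JSON.stringify(joinConsumingLetter("one day that she\rwould have")));
// "one day that she ould have"
```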

                    • 7. Re: ECleaner
                      Peter Spier Most Valuable Participant (Moderator)

                      D'Oh.

                       

                      That was dumb of me. Of course it deleted the first letter.

                       

                      Try [\n\r](?=\l)

                       

                      The order of the \n and \r in the class (the surrounding brackets) doesn't matter. (?=\l) is a positive lookahead for a lowercase letter, so you are now selecting only the breaks, and only when they are followed by a lowercase letter.
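The lookahead version can be sketched the same way. This is a minimal JavaScript approximation, assuming \r and \n stand for paragraph returns and forced line breaks, and using \p{Ll} with the u flag as a stand-in for InDesign's \l (any lowercase letter). It needs a modern JavaScript engine and will not run in InDesign's ExtendScript; the function name is hypothetical.

```javascript
// Sketch of Find [\n\r](?=\l) / Change to " ".
// (?=\p{Ll}) is a zero-width lookahead: it requires a lowercase letter after
// the break but does not include it in the match, so only the break itself
// is replaced and the letter survives.
function joinBrokenLines(text) {
  return text.replace(/[\n\r](?=\p{Ll})/gu, " ");
}

const pasted = "one day that she\rwould have a little graveyard all her own.\rThere was";
console.log(JSON.stringify(joinBrokenLines(pasted)));
// "one day that she would have a little graveyard all her own.\rThere was"
```

The second return is kept because it is followed by an uppercase T, which is the "only when followed by a lowercase letter" restriction at work.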

                      • 8. Re: ECleaner
                        [Jongware] Most Valuable Participant

                        > I was referring to the unwanted paragraph breaks you get when copying from the web or from PDF files. If you don't believe eCleaner 2.02 works, feel free to download it and try it. It's free and works wonders.

                         

                        Well, that sounds like actual hard returns. "Line breaks" can refer either to manual breaks (hard returns) or to automatic line breaks.

                         

                        Peter supplied you with a GREP to remove hard returns that occur in the middle of a sentence. If your paragraphs are separated by (at least) 2 hard returns, and you need to get rid of all single ones, that's also possible. One single GREP.

                        • 9. Re: ECleaner
                          [Jongware] Most Valuable Participant

                          [Jongware] wrote:

                           

                          Peter supplied [y]ou with a GREP to remove hard returns that occur in the middle of a sentence. If your paragraphs are separated by (at least) 2 hard returns, and you need to get rid of all single ones, that's also possible. One single GREP.

                          That single GREP, by the way, would be:

                          (?<=[^\r])\r(?=[^\r])

                          Replace it with a single space. This will get rid of all single hard returns; a pair of hard returns will be left alone. Be sure to remove all white space before hard returns first.
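To see why pairs survive and why the whitespace warning matters, here is a minimal JavaScript sketch of the same GREP. It assumes \r stands for a paragraph return, needs an engine with lookbehind support (not ExtendScript), and the function name is hypothetical.

```javascript
// Sketch of Find (?<=[^\r])\r(?=[^\r]) / Change to " ".
// The lookbehind and lookahead require a non-return character on both
// sides, so only a single return matches; a pair such as "\r\r" (a blank
// line between paragraphs) is left alone.
function joinSingleReturns(text) {
  return text.replace(/(?<=[^\r])\r(?=[^\r])/g, " ");
}

const sample = "She took a notion\rinto her head.\r\rThe what, child?";
console.log(JSON.stringify(joinSingleReturns(sample)));
// "She took a notion into her head.\r\rThe what, child?"

// Why whitespace should be removed first: a space on the "empty" line turns
// the pair into two single returns, and both get replaced.
console.log(JSON.stringify(joinSingleReturns("a\r \rb"))); // "a   b"
```

If the pasted paragraphs are separated by single returns only, every return qualifies and everything is merged into one paragraph.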

                          • 10. Re: ECleaner
                            sperry1975 Level 1

                            That's great and all, but I tested it and it removed all paragraph breaks, so once again I'm left with the manual way of going through paragraph by paragraph, and I can't do an entire document unless I use eCleaner first. By the way, I could do essentially the same thing by finding \r and replacing it with a single space, and \r is much shorter.

                            • 11. Re: ECleaner
                              Peter Spier Most Valuable Participant (Moderator)

                              Did you try the [\n\r](?=\l) which should remove breaks only if followed by a lower case letter?

                               

                              The FindChangeByList sample script is a great help, too, in removing extra spaces and empty paragraphs.

                              • 12. Re: ECleaner
                                [Jongware] Most Valuable Participant

                                Yah well that's why I did say

                                 

                                If your paragraphs are separated by (at least) 2 hard returns, and you need to get rid of all single ones ..

                                 

                                Perhaps you should post a snippet of your problematic text, so we can actually see why our suggestions are not working for you.

                                • 13. Re: ECleaner
                                  sperry1975 Level 1

                                  It's amazing to me that none of these ideas work and it seems no one can figure out a way to do with InDesign what someone did with such a simple app called eCleaner.

                                  • 14. Re: ECleaner
                                    Peter Spier Most Valuable Participant (Moderator)

                                    You haven't really told us exactly what things you want to correct. I'm sure eCleaner is doing something very similar to what we are suggesting, but chaining many tests together so they execute as a single step. ID can do that, too, with scripting, but you have to define what it is that needs to be changed.

                                     

                                    Saying "clean up" the text is a little nebulous, and as you've seen, simply removing paragraph breaks without some sort of restriction on when they are unwanted can leave you with a real mess.
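The chaining idea can be sketched outside InDesign as an ordered list of find/change pairs applied one after another, which is the same idea the FindChangeByList sample script uses inside InDesign. The queries below are rough JavaScript translations of ones suggested in this thread, assuming \r = paragraph return, \n = forced line break and \p{Ll} as a stand-in for \l. The list and function names are hypothetical, and this is not a reconstruction of what eCleaner actually does.

```javascript
// Sketch: an ordered find/change list. Each entry is [pattern, replacement];
// the patterns approximate queries from this thread.
const cleanupQueries = [
  // 1. Drop spaces/tabs left at the end of a line or paragraph
  //    (narrower than \s+$ so the breaks themselves are kept).
  [/[ \t]+(?=[\r\n]|$)/g, ""],
  // 2. [\n\r](?=\l): a break followed by a lowercase letter becomes a space.
  [/[\n\r](?=\p{Ll})/gu, " "],
  // 3. Collapse runs of spaces to one space. Tabs and returns are left
  //    untouched, unlike (\s)\s+, which is safer for tabular data.
  [/ {2,}/g, " "],
];

// Apply every query in order; each one sees the output of the previous one.
function runQueries(text, queries) {
  return queries.reduce(
    (current, [pattern, replacement]) => current.replace(pattern, replacement),
    text
  );
}

const raw = "she went to find \rher father.\r“Daddy!  Daddy!” she called";
console.log(JSON.stringify(runQueries(raw, cleanupQueries)));
// "she went to find her father.\r“Daddy! Daddy!” she called"
```

Because each query sees the previous one's output, adding, removing or reordering steps is just a matter of editing the list.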

                                    • 15. Re: ECleaner
                                      sperry1975 Level 1

                                      Once again, I was referring to the unwanted paragraph breaks you get when copying from the web or from PDF files. If you want an example, open a PDF file from the web whose page size is smaller than 8.5 x 11, then open an InDesign document that is 8.5 x 11, copy the text from the PDF into the document, and you will see paragraph breaks where they are not supposed to be.

                                      • 16. Re: ECleaner
                                        Peter Spier Most Valuable Participant (Moderator)

                                        [\n\r](?=\l) or \r(?=\l) should find those when they fall in the middle of a sentence. If your line happens to break between sentences in the same paragraph, I don't think even eCleaner would be able to tell whether the next sentence should start a new paragraph.

                                        • 17. Re: ECleaner
                                          sperry1975 Level 1

                                          Okay, here's a passage of text to examine that was NOT run through eCleaner:

                                          [BEGINNING OF EXAMPLE]

                                          She took a notion into her head one day that she

                                          would have a little graveyard all her own. There was

                                          a piece of ground in the garden behind the house

                                          where nothing was planted. A long row of blackberry

                                          bushes hid this corner from the house, and she used

                                          to go down there to play. It was one day after she had

                                          been to visit Thomas Hill, the village undertaker, that

                                          she got the idea of having the grave yard. She went

                                          straight off to the woods, and brought home four pretty

                                          little trees, which she planted in the four corners of

                                          the lot she had chosen; and then, thinking it best to

                                          get permission to use the ground, she went to find

                                          her father.

                                          “Daddy! Daddy!” she called aloud, as he and several

                                          men were threshing grain in the barn. “Will you

                                          give me the northwest corner of the garden?”

                                          “The what, child?”

                                          “The northwest corner of the old garden. It is

                                          bounded on the north by the old apple tree, cast by

                                          the walk, south by the blackberry bushes, and west

                                          by the sweet-corn field.”

                                          There was a general laugh at the conclusion of

                                          this speech. Mother and Hapsey came out to see what

                                          was the matter.

                                          “You needn’t make fun of me,” exclaimed Bertha.

                                          “I tried to be particular, so I could save you the trouble

                                          of going to see the spot.”

                                          “Bertha wants me to deed her the northwest corner

                                          of the garden, mother,” said Mr. Dickinson. “Are

                                          you ready to sign the papers?”

                                          “What do you want it for, my dear?” asked mother.

                                          “Are you going to build a dollhouse?” Her mother

                                          [END OF EXAMPLE]

                                           

                                          Now here's the same passage after being run through eCleaner:

                                          [BEGINNING OF EXAMPLE]

                                          She took a notion into her head one day that she would have a little graveyard all her own.  There was a piece of ground in the garden behind the house where nothing was planted.  A long row of blackberry bushes hid this corner from the house, and she used to go down there to play.  It was one day after she had been to visit Thomas Hill, the village undertaker, that she got the idea of having the grave yard.  She went straight off to the woods, and brought home four pretty little trees, which she planted in the four corners of the lot she had chosen; and then, thinking it best to get permission to use the ground, she went to find her father.

                                          "Daddy!  Daddy!" she called aloud, as he and several men were threshing grain in the barn.  "Will you give me the northwest corner of the garden?"

                                          "The what, child?"

                                          "The northwest corner of the old garden.  It is bounded on the north by the old apple tree, cast by the walk, south by the blackberry bushes, and west by the sweet-corn field."

                                          There was a general laugh at the conclusion of this speech.  Mother and Hapsey came out to see what was the matter.

                                          "You needn't make fun of me," exclaimed Bertha.

                                          "I tried to be particular, so I could save you the trouble of going to see the spot."

                                          "Bertha wants me to deed her the northwest corner of the garden, mother," said Mr.  Dickinson.  "Are you ready to sign the papers?"

                                          "What do you want it for, my dear?" asked mother.

                                          "Are you going to build a dollhouse?" Her mother

                                          [END OF EXAMPLE]

                                          As you can see, eCleaner really does a fair amount of the work. Though it's not perfect, it's proving to be very capable.

                                           

                                          Now here's the same text run through Peter's GREP code \r(?=\l)

                                          [BEGINNING OF EXAMPLE]

                                          She took a notion into her head one day that she would have a little graveyard all her own. There was a piece of ground in the garden behind the house where nothing was planted. A long row of blackberry bushes hid this corner from the house, and she used to go down there to play. It was one day after she had been to visit Thomas Hill, the village undertaker, that she got the idea of having the grave yard. She went straight off to the woods, and brought home four pretty little trees, which she planted in the four corners of the lot she had chosen; and then, thinking it best to get permission to use the ground, she went to find her father.

                                          “Daddy! Daddy!” she called aloud, as he and several men were threshing grain in the barn. “Will you give me the northwest corner of the garden?”

                                          “The what, child?”

                                          “The northwest corner of the old garden. It is bounded on the north by the old apple tree, cast by the walk, south by the blackberry bushes, and west by the sweet-corn field.”

                                          There was a general laugh at the conclusion of this speech. Mother and Hapsey came out to see what was the matter.

                                          “You needn’t make fun of me,” exclaimed Bertha.

                                          “I tried to be particular, so I could save you the trouble of going to see the spot.”

                                          “Bertha wants me to deed her the northwest corner of the garden, mother,” said Mr. Dickinson. “Are you ready to sign the papers?”

                                          “What do you want it for, my dear?” asked mother.

                                          “Are you going to build a dollhouse?” Her mother

                                          [END OF EXAMPLE]

                                          This code appears to work in much the same way as eCleaner. Thanks for the help!

                                          • 18. Re: ECleaner
                                            Peter Spier Most Valuable Participant (Moderator)

                                            You might run across cases where the lines are broken with an empty paragraph between them, in which case my query will not work, but there is a built-in multiple returns to single return query you can run first that will fix that.
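Why the order matters can be sketched in JavaScript, assuming \r stands for a paragraph return and \p{Ll} approximates \l. The first function only approximates the built-in query mentioned above (its exact pattern may differ), and both function names are hypothetical.

```javascript
// Sketch of the two-step order: collapse runs of returns first, then join.
// Approximates the built-in multiple-return query; its exact pattern may differ.
function collapseMultipleReturns(text) {
  return text.replace(/\r{2,}/g, "\r");
}

// \r(?=\l): replace a return only when a lowercase letter follows it.
function joinBeforeLowercase(text) {
  return text.replace(/\r(?=\p{Ll})/gu, " ");
}

const doubleSpaced = "she went to find\r\rher father.\r\r“Daddy!";

// Joining alone only replaces the second return of each pair, leaving the
// first return plus a stray space at the start of the next line.
console.log(JSON.stringify(joinBeforeLowercase(doubleSpaced)));
// "she went to find\r her father.\r\r“Daddy!"

// Collapsing first gives the intended result.
console.log(JSON.stringify(joinBeforeLowercase(collapseMultipleReturns(doubleSpaced))));
// "she went to find her father.\r“Daddy!"
```

Keep in mind that collapsing returns also removes any deliberate blank paragraphs between real paragraphs.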

                                            • 19. Re: ECleaner
                                              [Jongware] Most Valuable Participant

                                              eCleaner probably scans for a 'possible original line length'. What a possible original length is, is not really clear. I tried with a monospaced font, but that didn't work; in that case, line breaks were added at positions where the next word clearly would have fit. So it sort of depends on the original font.

                                               

                                              This quickly written JavaScript calculates the average length of all lines and joins every line that is at least that long to the next one. The result is not exactly the same, but it seems to do a fair job nevertheless.

                                               

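                                              // Select a text frame or place the cursor in the story first.
                                              var text = app.selection[0].parentStory;
                                              // One entry per paragraph (i.e. per pasted line), each including its trailing return.
                                              var lines = text.paragraphs.everyItem().contents;
                                              
                                              // Average line length across the whole story.
                                              var avg = 0;
                                              for (var i = 0; i < lines.length; i++)
                                                        avg += lines[i].length;
                                              avg /= lines.length;
                                              
                                              // Walk backwards so that merging paragraph i into i+1 does not shift the
                                              // indices still to be visited; the last paragraph has no return to replace.
                                              for (var i = text.paragraphs.length - 2; i >= 0; i--)
                                              {
                                                        // Replace the closing return of every "long" line with a space.
                                                        if (text.paragraphs[i].contents.length >= avg)
                                                                  text.paragraphs[i].characters.item(-1).contents = ' ';
                                              }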
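The script above needs a live InDesign selection. To experiment with the same average-length heuristic on plain text, here is a hedged JavaScript port for a modern engine (not ExtendScript). It assumes the pasted text has already been split into an array of lines without their returns, so lengths differ by one character from what the script measures; the function name and sample data are for illustration only.

```javascript
// Plain-JavaScript port of the average-length heuristic (sketch).
// lines: pasted lines WITHOUT their trailing returns (an assumption; the
// InDesign script measures paragraph contents including the return).
// Returns the rebuilt paragraphs as an array of strings.
function joinByAverageLength(lines) {
  if (lines.length === 0) return [];
  const avg = lines.reduce((sum, line) => sum + line.length, 0) / lines.length;

  const paragraphs = [];
  let current = lines[0];
  for (let i = 1; i < lines.length; i++) {
    // A line at least as long as the average is taken to have been broken
    // by the original column width, so the next line continues it.
    if (lines[i - 1].length >= avg) {
      current += " " + lines[i];
    } else {
      paragraphs.push(current);
      current = lines[i];
    }
  }
  paragraphs.push(current);
  return paragraphs;
}

const averageSample = [
  "There was a general laugh at the conclusion of",
  "this speech. Mother and Hapsey came out to see what",
  "was the matter.",
  "“You needn’t make fun of me,” exclaimed Bertha.",
];
console.log(joinByAverageLength(averageSample));
```

With this sample, the two long lines are joined onto the short third one, and Bertha's line stays a separate paragraph.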
                                              

                                               

                                              Another strategy: all lines ending with a lowercase letter are most likely broken. This JavaScript finds the shortest such broken line, and then joins *all* lines that are at least that long.

                                               

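                                              // Select a text frame or place the cursor in the story first.
                                              var text = app.selection[0].parentStory;
                                              var lines = text.paragraphs.everyItem().contents;
                                              
                                              // Shortest line ending in a lowercase letter (trailing whitespace, such as
                                              // the return, is ignored): such lines were probably broken mid-sentence.
                                              var shortest = 99999;
                                              for (var i = 0; i < lines.length; i++)
                                              {
                                                        if (lines[i].match(/[a-z]\s*$/) && lines[i].length < shortest)
                                                                  shortest = lines[i].length;
                                              }
                                              
                                              // Walk backwards so merges do not shift the indices still to be visited,
                                              // joining every line that is at least as long as that threshold.
                                              for (var i = text.paragraphs.length - 2; i >= 0; i--)
                                              {
                                                        if (text.paragraphs[i].contents.length >= shortest)
                                                                  text.paragraphs[i].characters.item(-1).contents = ' ';
                                              }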
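As with the first script, here is a hedged plain-JavaScript port of this "shortest broken line" heuristic for experimenting outside InDesign. It assumes an array of pasted lines without their returns and keeps the script's [a-z] test; the function name and sample data are illustrative only.

```javascript
// Plain-JavaScript port of the "shortest broken line" heuristic (sketch).
// lines: pasted lines without their trailing returns (assumption).
function joinByShortestBrokenLine(lines) {
  if (lines.length === 0) return [];

  // Lines ending in a lowercase ASCII letter (as in the script's [a-z]) are
  // assumed to be broken mid-sentence; the shortest sets the threshold.
  let shortest = Infinity;
  for (const line of lines) {
    if (/[a-z]\s*$/.test(line) && line.length < shortest) {
      shortest = line.length;
    }
  }

  const paragraphs = [];
  let current = lines[0];
  for (let i = 1; i < lines.length; i++) {
    // Any line at least as long as the threshold is joined to the next one.
    if (lines[i - 1].length >= shortest) {
      current += " " + lines[i];
    } else {
      paragraphs.push(current);
      current = lines[i];
    }
  }
  paragraphs.push(current);
  return paragraphs;
}

const shortestSample = [
  "“The northwest corner of the old garden. It is",
  "bounded on the north by the old apple tree, cast by",
  "the walk, south by the blackberry bushes, and west",
  "by the sweet-corn field.”",
  "There was a general laugh at the conclusion of",
];
console.log(joinByShortestBrokenLine(shortestSample));
```

If no line ends in a lowercase letter, the threshold stays at Infinity and nothing is joined, which matches the script's behavior with its 99999 starting value.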