8 Replies Latest reply on Jan 30, 2014 5:17 AM by Peter Spier

    Avoid a dash+word to be separated

    camilo umana Level 1


      In Spanish, dashes aren't separated by spaces from the text they enclose: Placeland –a disturbed city– that, etc.

      So perhaps something like a non-breaking space could be used to keep the dash and the following word together.

       

      How can I GREP or otherwise manage this?

        • 1. Re: Avoid a dash+word to be separated
          Peter Spier Most Valuable Participant (Moderator)

          I think you could set up a character style that applies No Break, then create a GREP style to apply it to the pattern of your dash and the following letter.

           

          For an em dash that would be ~_\w and for the en dash ~=\w
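
          For anyone who wants to try the idea outside InDesign, here is a minimal Python sketch of the same pattern. It is only an approximation: `~_` and `~=` are InDesign-specific codes, so the sketch assumes Python's `re` module and substitutes the Unicode escapes for the em dash and en dash. The function name `no_break_spans` is made up for illustration.

          ```python
          import re

          # Stand-ins for InDesign's ~_ (em dash) and ~= (en dash);
          # Python's re module has no such codes.
          EM_DASH = "\u2014"
          EN_DASH = "\u2013"

          # A dash immediately followed by one word character, like ~_\w or ~=\w.
          DASH_PLUS_LETTER = re.compile("[" + EM_DASH + EN_DASH + r"]\w")


          def no_break_spans(text):
              """Return the substrings a No Break character style would cover."""
              return [m.group(0) for m in DASH_PLUS_LETTER.finditer(text)]


          sample = "Placeland \u2013a disturbed city\u2013 that"
          print(no_break_spans(sample))
          ```

          Only the opening dash and its letter are caught; the closing dash is followed by a space, so it is left alone.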

          • 2. Re: Avoid a dash+word to be separated
            camilo umana Level 1

            Yes, it is perfect! Thanks.

             

            I added a + to the end to catch the whole word: ~=\w+

             

            Peter, along the same lines: words that contain the letter pairs rr, ll and ch cannot be hyphenated at that pair.

             

            But these words can be hyphenated at other points:

             

            Carro~ma~to

            Des~en~rollar

            Chi~po~ta~zo

            le~chu~ga

             

             

            I wrote this GREP, which catches one case:

             

            ( \u\l+\Brr) *

             

             

            Ideally it would pipe the alternatives together... [ch|ll|rr]

             

            Could you suggest something less naïve?

            • 3. Re: Avoid a dash+word to be separated
              Peter Spier Most Valuable Participant (Moderator)

              Is Spanich assigned as the language? I'm a bit surprised that the Spanish dictionary would allow hyphenation for those cases.

               

              That said, is it sufficient to simply prevent the break in the pair, or do you need to keep the characters on either side as well?

              • 4. Re: Avoid a dash+word to be separated
                camilo umana Level 1

                The idea is to check the copy after the layout is done...

                Many words, such as proper names, are not in the dictionary.

                Thanks!

                • 5. Re: Avoid a dash+word to be separated
                  Peter Spier Most Valuable Participant (Moderator)

                  camilo umana wrote:

                  I added a + to the end to catch the whole word: ~=\w+

                   

                  Are you sure you want to do that? Would you never want a long word preceded by the dash to break at the end of a line? You might be better off with something that allows a break after a few characters, like ~=\w{1,4}, which I think would allow a break after the fourth character.
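
                  To see the difference between catching the whole word and catching only the first few letters, here is a rough Python sketch. It assumes Python's `re` module as a stand-in for InDesign GREP, uses the Unicode escape for the en dash in place of `~=`, and the sample sentence is invented for the comparison.

                  ```python
                  import re

                  EN_DASH = "\u2013"  # stand-in for InDesign's ~=

                  WHOLE_WORD = re.compile(EN_DASH + r"\w+")     # like ~=\w+
                  FIRST_FEW = re.compile(EN_DASH + r"\w{1,4}")  # like ~=\w{1,4}

                  # Invented sample with a long word right after the dash.
                  sample = "Placeland \u2013extraordinariamente grande\u2013 that"

                  whole = WHOLE_WORD.search(sample).group(0)
                  few = FIRST_FEW.search(sample).group(0)

                  print(whole)  # the dash plus the entire word
                  print(few)    # the dash plus at most four letters
                  ```

                  With the shorter match, only the dash and the first four letters would carry No Break, so the rest of the long word stays free to hyphenate.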

                  • 6. Re: Avoid a dash+word to be separated
                    Peter Spier Most Valuable Participant (Moderator)

                    I can't believe how bad my typing is this morning.  "Spanich"

                     

                    You didn't really answer if the words should be allowed to break on either side of the pair.

                    • 7. Re: Avoid a dash+word to be separated
                      camilo umana Level 1

                      Sorry, I had stepped away.

                       

                      No, it has nothing to do with long words...

                       

                      Words containing those combinations can never be hyphenated before the double letter: never ca~rromato...

                      But elsewhere, sure... carro~ma~to

                       

                      The idea is to grep the combinations: [ch|ll|rr]

                       

                       

                      In these places we may have a hyphen:

                      Carro~ma~to

                      Des~en~rollar

                      Chi~po~ta~zo

                      le~chu~ga

                      • 8. Re: Avoid a dash+word to be separated
                        Peter Spier Most Valuable Participant (Moderator)

                        OK, so that becomes pretty simple, I think. You could use "digraphs" inside the class, like [[.ch.][.ll.]], but I don't see "rr" among the recognized digraphs, so there's probably no point in going that way. Instead, you can go back to your original "or" statement, outside a class: LL|Ll|ll|CH|ch|RR|Rr|rr

                         

                        It seems unlikely the upper/lower pairs would ever be present in a location where they might break, so you can probably eliminate them, but if there's a possibility of all caps I'd leave in the upper case versions.
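
                        To show why the "or" statement has to sit outside a class, here is a small Python sketch. It assumes Python's `re` module as a stand-in for InDesign GREP; the bracket and pipe behavior shown is standard regex behavior, but the variable names are made up for illustration.

                        ```python
                        import re

                        # Inside brackets, | is a literal character: this matches ONE
                        # character out of c, h, l, r or |, not the two-letter pairs.
                        AS_CLASS = re.compile(r"[ch|ll|rr]")

                        # Outside brackets, | means "or": this matches the pairs themselves.
                        AS_ALTERNATION = re.compile(r"LL|Ll|ll|CH|ch|RR|Rr|rr")

                        word = "carromato"

                        print(AS_CLASS.findall(word))
                        print(AS_ALTERNATION.findall(word))
                        ```

                        The class version picks up the lone "c" as well as each "r" separately, while the alternation finds only the "rr" pair.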

                         

                        Here's a quote about digraphs from Peter Kahrel's book:

                         

                        Digraphs

                        Digraphs such as ae in aerogram and ss in Strasse are matched by [[.ae.]] and

                        [[.ss.]]. This format matches digraphs only when they’re written as two separate

                        letters, so [[.ae.]] doesn’t match æ, nor does [[.ss.]] match ß. It is true of course

                        that any occurrence of ae is matched by the simple expression ae, but there are

                        circumstances where digraphs need to be treated as single characters; see the

                        discussion following “Homemade Wildcards: Character Classes.” InDesign

                        recognizes the following digraphs: ae, Ae, AE, ch, Ch, CH, ll, Ll, LL, ss, Ss, SS,

                        nj, Nj, NJ, dz, Dz, DZ, lj, Lj, LJ. That oe, Oe, and OE aren’t included looks like an

                        omission.

                         

                        And I don't see rr there, either, so a second omission.
