9 Replies Latest reply on Feb 1, 2012 7:34 AM by Dimitra.p

    Indesign Dictionary - mass export & edit

    Dimitra.p

      Is there a way to export or locate the entire dictionary for a specific language in order to be able to edit all words therein and not just the user added words?

       

      We have a constant problem with a client where a specific hyphenation rule is required but is not provided in the hyphenation options. The client does not want the hyphenated syllables to start with a vowel after a hyphen. To avoid this, I have been right-clicking each word where it happens and adding it to the dictionary with my preferred hyphenation. However, this is a time-consuming and not very effective way to tackle the issue, since I have to locate each word manually, and it might take many months or even years to build a large enough dictionary.

       

      I'm using InDesign CS5 on Windows 7.

        • 1. Re: Indesign Dictionary - mass export & edit
          [Jongware] Most Valuable Participant

          Dimitra.p wrote:

           

          We have a constant problem with a client where a specific hyphenation rule is required but is not provided in the hyphenation options. The client does not want the hyphenated syllables to start with a vowel after a hyphen.

           

          Is that a specific demand of your client, and not related to the regular hyphenation rules for the language the document is in?

          • 2. Re: Indesign Dictionary - mass export & edit
            Dimitra.p Level 1

            Yes, that is not a regular hyphenation rule for the language; it is more of an aesthetic preference. It is something my client wants to avoid and keeps correcting manually, but because I'm working with books this is too cumbersome.

            • 3. Re: Indesign Dictionary - mass export & edit
              [Jongware] Most Valuable Participant

              So ideally, you would "export" the current dictionary to a plain text file, remove all preferred hyphens before a vowel, and then "import" this dictionary back into InDesign.

               

              It's not that simple. For most languages (if not all), InDesign does not carry a complete set of words with all of their breaking points. Instead, each language is hyphenated using a plugin -- a program that applies fixed rules to each word. There might be a small set of words that don't 'play by the rules' and would be recognized as such by the plugin, but that should rarely happen. (A well-known case is Knuth-Liang's otherwise very good algorithm for TeX. It cannot hyphenate the word "manuscript" at all!)

               

              You cannot change the behavior of an existing hyphenation plugin, and writing your own custom language hyphenation module is technically possible, but it would require you to implement every single rule correctly -- and then add the "last minute" requirement 'not before a vowel'.

               

              So I propose a different approach. If you add a GREP style to the paragraph styles that have hyphenation switched on, it can apply a No Break attribute to each vowel and the character preceding it. The GREP query is:

               

              \w[aeiou]

               

              and it should apply a character style that sets No Break, nothing more. In this screenshot you can see it works for English; the left hand column is hyphenated as usual, the right hand side has no hyphens at all before a vowel.

              Notice that some words might no longer break at any "good" place, so the text disappears into Eternally Overset Text. I presume that with regular words and a regular column width, this should not be a problem ...
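
              To see which break points the "no hyphen before a vowel" rule leaves over, here is a minimal Python sketch. It is only an illustration: the syllable lists are made-up inputs (InDesign does not expose its break points this way), and allowed_breaks is a hypothetical helper name.

```python
VOWELS = set("aeiou")

def allowed_breaks(syllables):
    """Rejoin syllables, keeping a hyphen only where the next syllable
    starts with a consonant (hypothetical helper, English vowels only)."""
    out = syllables[0]
    for syl in syllables[1:]:
        # Drop the break point when the following syllable starts with a vowel.
        sep = "" if syl[0].lower() in VOWELS else "-"
        out += sep + syl
    return out

# Illustrative syllable splits, not taken from any InDesign dictionary.
print(allowed_breaks(["hy", "phen", "a", "tion"]))  # hy-phena-tion
print(allowed_breaks(["id", "e", "a"]))             # idea
```

              The second word ends up with no break point at all, which is the situation that can push text into overset.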

               

              (This GREP style works for English. You don't say what language this is for, but if it's a heavily accented language, you would use the slightly more convoluted GREP

               

              \w[[=a=][=e=][=i=][=o=][=u=]]

               

              -- not tested. And if you don't use the Latin alphabet at all, you will have to experiment for yourself.)

               

              vowelhyphen.PNG

              • 4. Re: Indesign Dictionary - mass export & edit
                Dimitra.p Level 1

                Thank you so much, this seems to be on the right track, although it is not working entirely yet. It is the Greek language I need this for.

                 

                If I embed the GREP string in the paragraph style, it doesn't seem to work. It does something to the paragraphs, but some words are still hyphenated as before.

                 

                If I do a regular Find and Change GREP replacement with either:

                 

                \w[[=α=][=ά=][=ε=][=έ=][=η=][=ή=][=ι=][=ί=][=ϊ=][=ΐ=][=ο=][=ό=][=υ=][=ύ=][=ϋ=][=ω=][=ώ=]]

                or

                \w[αάεέηήιίϊΐοόυύϋωώ]

                 

                that applies the no-break character style, it works!

                 

                Do you have any idea why embedding the GREP style behaves differently? I am not using any other character styles in that paragraph.

                • 5. Re: Indesign Dictionary - mass export & edit
                  John Hawkinson Level 5

                  You cannot change the behavior of an existing hyphenation plugin, and writing your own custom language hyphenation module is technically possible, but it would require you to implement every single rule correctly -- and then add the "last minute" requirement 'not before a vowel'.

                  In CS5.5, which Dimitra.p is not using, there is support for Hunspell dictionaries. I'm not a Hunspell expert, but my understanding is that they permit a rich hyphenation language to define these things, and the dictionaries themselves are [generally] open source. So it's probably feasible to modify a Hunspell dictionary to add support for this requirement.

                   

                  That may or may not be a better solution.

                   

                  And again, it requires CS5.5.

                  • 6. Re: Indesign Dictionary - mass export & edit
                    [Jongware] Most Valuable Participant

                    Yes, this is kind of weird. Usually, a GREP style behaves exactly like a regular GREP find. It must have something to do with how the hyphenation plugin for Greek works, and if so, all you can do is keep trying different things.

                     

                    By the way: in GREP, the single characters α or ά search for these exact characters, but the notation [=α=] should look for *all* possible alphas -- with or without accents, uppercase or lowercase, etc. If you want to look for any vowel with any accent at all, this somewhat shorter string should be enough, with one entry for each vowel:

                    \w[[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]

                     

                    Just in case you forgot one accent.
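
                    As a rough illustration of what an equivalence class such as [=α=] is meant to cover, this Python sketch reduces a character to its unaccented, lowercase base letter before testing it. This is an assumption about the intended behaviour, not a description of how InDesign implements it, and base_letter and is_greek_vowel are hypothetical names.

```python
import unicodedata

GREEK_VOWELS = set("αεηιουω")

def base_letter(ch):
    """Strip accents and case from one character, e.g. 'ά' -> 'α'."""
    # NFD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

def is_greek_vowel(ch):
    """True if the character is any accented or unaccented Greek vowel."""
    return base_letter(ch) in GREEK_VOWELS

print(base_letter("ά"))     # α
print(is_greek_vowel("ΐ"))  # True
print(is_greek_vowel("λ"))  # False
```

                    With this approach, seven base vowels cover every accented variant, so no accent can be forgotten.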

                     

                    I just tried it on a piece of Greek text I had lying around ("Let me tell you about that fearsome traveller", or something like that), and indeed it seems to fail at random. Can you try the following in your GREP style:

                     

                    (1) change the GREP expression to this:

                    (?<=\w)[[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]

                     

                    -- it applies the No-Break to slightly fewer characters. Possibly the non-breakable string gets too long so ID breaks it anyhow?

                     

                    (2) add a thick solid red underline to the GREP character style, so you can see where it is applied in your text. Is it applied at every place you expected? That would look something like this:

                     

                    odyssey.png

                     

                    -- notice the failure on the 5th line ...

                     


                     

                    Scratch that -- all of the above! I think I found it. It may be due to how the GREP style processes a paragraph: only once, from start to end, so it only has to look at each character once. After I changed the GREP style expression to this

                     

                    \w[[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]+

                     

                    it suddenly does a lot more than the previous tries, and I think this one finally works.
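
                    The one-pass behaviour can be imitated outside InDesign. The sketch below uses Python's re module as a stand-in (an assumption: InDesign's GREP engine may differ in details) and an English word for readability. styled_positions is a hypothetical helper that reports which character indexes would receive the character style.

```python
import re

def styled_positions(pattern, text):
    """Return the sorted character indexes covered by the pattern's
    non-overlapping, left-to-right matches (hypothetical helper)."""
    covered = set()
    for m in re.finditer(pattern, text):
        covered.update(range(m.start(), m.end()))
    return sorted(covered)

word = "sea"  # consonant followed by two vowels

# 's' + 'e' is consumed as one match, so 'a' has no preceding
# character left to pair with and stays unstyled.
print(styled_positions(r"\w[aeiou]", word))       # [0, 1]

# The + lets one match swallow the whole run of vowels.
print(styled_positions(r"\w[aeiou]+", word))      # [0, 1, 2]

# A lookbehind does not consume the preceding character,
# so every vowel after a word character is styled.
print(styled_positions(r"(?<=\w)[aeiou]", word))  # [1, 2]
```

                    With the plain two-character pattern, the second vowel in a row is never styled, because the character before it was already consumed by the previous match.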

                     


                    John's suggestion of editing a Hunspell dictionary may work, but I'm afraid I have to admit zero knowledge on that particular topic.

                    • 7. Re: Indesign Dictionary - mass export & edit
                      [Jongware] Most Valuable Participant

                      Yeah, fairly sure this one does work:

                       

                      \w[[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]+

                       

                      -- but in case of doubt, add a red background so you can double-check it gets applied where you thought it would.

                       

                      odyssey2.png

                      • 8. Re: Indesign Dictionary - mass export & edit
                        Peter Spier Most Valuable Participant (Moderator)

                        [Jongware] wrote:

                         

                        Possibly the non-breakable string gets too long so ID breaks it anyhow?

                        In my experience ID will always go into overset when no-break is applied in such a way that a line cannot break and becomes too long for the column width.

                        • 9. Re: Indesign Dictionary - mass export & edit
                          Dimitra.p Level 1

                          I tried this version of the GREP string:

                          \w[[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]+

                           

                          but it applied the no-break style to almost all the text.

                           

                           

                          The GREP string that works is: [[=α=][=ε=][=η=][=ι=][=ο=][=υ=][=ω=]]

                           

                          Without the \w, it seems to apply the style only to the vowels, not to both the preceding character and the vowel, so I get much more nicely hyphenated text.

                           

                          I'm not sure yet whether the + at the end is needed; it seems to work either way.

                           

                          The other approach John Hawkinson describes, using Hunspell dictionaries, sounds interesting, but as noted I am working with InDesign CS5, which does not support it.

                           

                          Thank you all so much!