15 Replies Latest reply: Dec 1, 2011 1:14 AM by jmlevy

    Hunspell dictionary vs Proximity

    jmlevy Community Member

      I am looking for information about the differences between those two dictionaries. Until 5.5 the only one available was the Proximity one. Has anybody switched to Hunspell?

      Any feedback appreciated, thanks.

        • 1. Re: Hunspell dictionary vs Proximity
          Steve Werner ACP/MVPs

          Hunspell dictionaries are open source and you can read about them here:

           

          Hunspell - Wikipedia, the free encyclopedia

           

          I doubt they have received much testing in InDesign yet, however.

          • 2. Re: Hunspell dictionary vs Proximity
            jmlevy Community Member

            Thanks Steve, but I knew that. Practically speaking, do you know if one is better than the other, mainly for hyphenation? I am not able to understand the differences, even though I have noticed some text reflowing when switching from one to the other.

            • 3. Re: Hunspell dictionary vs Proximity
              Steve Werner ACP/MVPs

              The general principle I follow is: If things are working and there's no compelling reason to upgrade, I stay with what I have.

              • 4. Re: Hunspell dictionary vs Proximity
                jmlevy Community Member

                Yes, I deeply agree! But as a member of the IT team of a press magazine group, I have to check and test any new feature or improvement.

                • 5. Re: Hunspell dictionary vs Proximity
                  Joel Cherney MVP

                  I'm told by more than a few language pros that the hyphenation for many European languages is superior in Hunspell.

                   

                  For English, though, I think I can tell you why they are different. If you have an English word to be hyphenated, and that word does not appear in the hyphenation dictionary, then both Proximity and Hunspell will try to hyphenate algorithmically. That is, they'll guess "Oh hey, there's two vowels followed by a consonant here, so the most likely correct hyphenation would look like so." The Proximity hyphenation algorithm is a commercial product. Hunspell's hyphenation algorithm is ye olde algorithm from TeX. And, if you are the kind of person who wants to understand what's going on behind the scenes, the PDF in that second link will be completely fascinating. (Otherwise it'll be a great insomnia treatment. For my own part, I'm riveted, but your mileage may vary.)
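For readers who want to see what "hyphenate algorithmically" means in the TeX tradition, here is a minimal sketch of pattern-based hyphenation. It is not Hunspell's or Proximity's actual code: the function names and the handful of patterns are illustrative assumptions, and a real hyphenation dictionary holds thousands of patterns. Each pattern carries digits between its letters; for every gap in the word the highest digit from any matching pattern wins, and an odd result allows a break.

```python
import re

# Toy pattern set chosen for illustration only (an assumption, not a shipped dictionary).
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and its per-gap scores."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)  # score for the gap before letters[pos]
        else:
            pos += 1
    return letters, scores


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the word with '-' at every gap whose winning score is odd."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)   # levels[j] is the gap before padded[j]
    for letters, scores in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, score in enumerate(scores):
                levels[start + offset] = max(levels[start + offset], score)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min letters after a break.
    for i in range(left_min, len(word) - right_min + 1):
        if levels[i + 1] % 2 == 1:     # the gap before word[i] is padded index i + 1
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation", TOY_PATTERNS))
```

Even digits suppress a break and odd digits allow one, which is how a more specific pattern can override a more general one.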

                  • 6. Re: Hunspell dictionary vs Proximity
                    John Hawkinson Community Member

                    Like Joel, I've also heard that Hunspell is vastly superior for non-English languages. The other huge benefit is that you can just add a language dictionary someone else has developed. You couldn't do that with Proximity.

                     

                    Joel, I did find that PDF fascinating, not least because I had no idea that Hungarian could hyphenate a word like "asszonnyal" as "asz- szony- nyal." Now I know why they call it Hunspell! I have to admit, I had always wondered what the fuss was about ("How hard could it be to hyphenate these non-English languages?" I thought. What's with these stories of native speakers *laughing* at hyphenation? I couldn't imagine how you could laugh at wrong hyphenation in English. But now I see!)

                     

                    But:

                    Hunspell's hyphenation algorithm is ye olde algorithm from TeX.

                    That doesn't seem to be true at all!  That 2006 paper says:

                    The hyphenation algorithm of OpenOffice.org 2.0.2 is a generalization of TeX’s hyphenation algorithm that allows automatic non-standard hyphenation by competing standard and non-standard hyphenation patterns.

                    It's based on TeX's algorithm, but it's not "ye olde" algorithm. It's got quite a few enhancements and changes.

                     

                    Honestly, I can't tell to what extent the paper is discussing Hunspell (either the 2006 Hunspell or the 2011 Hunspell), and whether Hunspell is exactly the same as what OpenOffice uses. (I realize OpenOffice uses Hunspell dictionaries and algorithms...) Not to rain on your parade, though. The paper is fascinating.

                     

                    TeX remains awesome. Though I wouldn't really have expected TUGboat to become a vehicle scoped beyond TeX. Fascinating.

                    • 7. Re: Hunspell dictionary vs Proximity
                      jmlevy Community Member

                      Since I work in a French company, I am very interested to see if I can improve the way our texts are hyphenated with Hunspell. I will read this pdf carefully tomorrow. Thanks for the link.

                      • 8. Re: Hunspell dictionary vs Proximity
                        Joel Cherney MVP

                        The PDF I linked to will satisfy your curiosity about how the hyphenation algorithms derived from ye olde TeX algos (and not the algos of some tawdry commercial op like Proximity, there you go, John) work, but it won't tell you how well Hunspell dictionary support will improve your French hyphenation in ID. One of the cool things about Hunspell is that it has a dictionary format that is easy to use, and it was associated with the OpenOffice project. So, many, many people have contributed to these freely available dictionary files. As a result of all of that hive-mind action, these dictionaries are reportedly quite good, and in some languages often better than commercial offerings. But that article is just going to tell you how to handle hyphenation in languages like Hungarian or Catalan or whatever that have non-standard hyphenation rules. Fascinating stuff (for people like me, anyways), but not directly applicable to your situation.

                        • 9. Re: Hunspell dictionary vs Proximity
                          Joel Cherney MVP

                          some tawdry commercial op

                          I hope it's obvious that this is a free-software-nerd in-joke - no offense meant to the folks at Proximity.

                          • 10. Re: Hunspell dictionary vs Proximity
                            John Hawkinson Community Member

                            8. Nov 29, 2011 7:52 PM (in response to jmlevy)

                            9. Nov 29, 2011 7:57 PM (in response to Joel Cherney)

                            Wow, they got to you fast, Joel. I'm sure you'd have some way to send us a coded message if you needed us to head over there and bust you out.

                            • 11. Re: Hunspell dictionary vs Proximity
                              jmlevy Community Member

                              Well, I read this very instructive PDF. As you wrote, Joel, it does not give answers about French hyphenation. I made some tests with the same texts in two different files (one with Hunspell and the other with Proximity): there are not many differences, and I get better results with Proximity.

                              Thanks for your contributions!

                              • 12. Re: Hunspell dictionary vs Proximity
                                John Hawkinson Community Member

                                I'm worried about Joel. Do you think they got to him? There's plenty of tawdry free software out there too...

                                 

                                I'm under the impression that you may have to load the French Hunspell dictionary by hand to get comparable results to Proximity. CS5.5 shipped with DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8 from http://www.dicollecte.org/. They are up to 4.3 at present...

                                 

                                I think the instructions are somewhat crazy right now (I can't imagine they aren't planning on improving this). See Miguel Sousa's post at

                                http://blogs.adobe.com/typblography/2011/11/how-to-enable-more-languages-in-indesign-cs5-5.html (Method Two: Hunspell dictionary) but in short (ha, ha!):

                                 

                                1. Download the Hunspell dictionary from OpenOffice.org (etc.; follow links to wherever. Actually, maybe you want libreoffice.org now, I dunno.). Note that the hyphenation dictionary is older and separate, at least for French.

                                2. Rename the .oxt file to .zip, if necessary (doesn't seem required here)

                                3. Extract zip file, find the .aff and .dic files, also the hyph*.dic file.

                                4. Rename the files to the ISO language/country codes, e.g. fr_FR.aff, fr_FR.dic, hyph_fr_FR.dic.

                                5. Put all together in a language_country folder, e.g. fr_FR

                                6. Install in:

                                •     Win: %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin\Dictionaries
                                •     Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents/SharedSupport/Dictionaries

                                7. Check the Info.plist file and ensure that your language is listed under all three of SpellingService, HyphenationService, and UserDictionaryService, or at least as many as you want. For French, you should be good to go. This Info.plist file is at:

                                •     Win: %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin
                                •     Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents

                                8. Restart InDesign
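Steps 2 to 5 and the check in step 7 can be sketched as a small script. This is a hedged sketch, not an Adobe tool: the function names `prepare_hunspell_folder` and `services_listing` are made up for illustration, and because the internal layout of Info.plist is not described in this thread, the check simply searches the whole file for the three service keys and looks for the language code anywhere beneath each one.

```python
import plistlib
import shutil
import zipfile
from pathlib import Path


def prepare_hunspell_folder(archive, lang, out_root):
    """Copy the .aff, .dic and hyph*.dic files from an .oxt/.zip archive
    into out_root/<lang>/, renamed as in step 4 (e.g. fr_FR.aff)."""
    target = Path(out_root) / lang
    target.mkdir(parents=True, exist_ok=True)
    written = []
    # An .oxt file is a zip archive, so zipfile can open it without renaming.
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            lower = Path(info.filename).name.lower()
            if lower.startswith("hyph") and lower.endswith(".dic"):
                new_name = f"hyph_{lang}.dic"
            elif lower.endswith(".aff"):
                new_name = f"{lang}.aff"
            elif lower.endswith(".dic"):
                new_name = f"{lang}.dic"
            else:
                continue
            if new_name in written:
                # Several variants in one archive: choose one by hand instead.
                raise ValueError(f"more than one candidate for {new_name}")
            with zf.open(info) as src, open(target / new_name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(new_name)
    return sorted(written)


def services_listing(plist_path, lang,
                     services=("SpellingService", "HyphenationService",
                               "UserDictionaryService")):
    """Report, per service key, whether lang appears anywhere under that key.
    The nesting of the plist is an assumption, hence the generic search."""
    with open(plist_path, "rb") as fh:
        data = plistlib.load(fh)
    found = {name: False for name in services}

    def mentions(node):
        if isinstance(node, str):
            return node == lang
        if isinstance(node, dict):
            return any(k == lang or mentions(v) for k, v in node.items())
        if isinstance(node, list):
            return any(mentions(v) for v in node)
        return False

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in found and mentions(value):
                    found[key] = True
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return found
```

Since the French hyphenation dictionary is a separate download, `prepare_hunspell_folder` would be called once per archive with the same `lang` and `out_root`. The resulting folder still has to be copied by hand into the Dictionaries location from step 6.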

                                 

                                [ I don't know what's going on with French hyphenation. CS5.5 seems to ship with a hyph_fr_FR.dic that's marked Adobe Confidential Version 1.0.2 October 8 2010, but dicollecte.org has "Version 2.0" from 2008. Not sure where Adobe's version is from? ]

                                 

                                P.S.: I'm really out of my depth on this stuff...

                                • 13. Re: Hunspell dictionary vs Proximity
                                  jmlevy Community Member

                                  Thanks for everything, John. I followed all those instructions, step by step… and then saw that those files were already installed! And they are newer than the ones I just downloaded.

                                  • 14. Re: Hunspell dictionary vs Proximity
                                    John Hawkinson Community Member

                                    Wait, really? The machine I looked at with CS5.5 had DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8. Are you saying you have 4.3 installed? [Hrmm...maybe that machine was running 7.5.0 instead of 7.5.2? Yikes!] Are you sure? It does appear there's confusion about the hyphenation dictionary versioning, for sure, but the regular dictionary, not so much.

                                     

                                    I'm sorry for wasting your time!!

                                     

                                    p.s.: Meanwhile, Joel has been captured on the high seas by language pirates. I hope they don't send the ransom note in a bottle. That would just take the cake.

                                    • 15. Re: Hunspell dictionary vs Proximity
                                      jmlevy Community Member

                                      I'll check at work tomorrow (it's almost 9PM) which version is installed. And I did not waste any time, don't worry.

                                       

                                      @John

                                      Last check: you were right about the spellcheck dictionary. We have version 3.8 installed and the latest one available is 4.3. We do not use any spellcheck tools in InDesign, since we have a third-party spellchecker in our workflow, so my main concern is the hyphenation dictionary. The latest version available on http://www.dicollecte.org/ is version 2, which seems to be the same as the one shipped with 5.5.

                                       

                                      Message was edited at 10:15 am by: jmlevy