jmlevy

Hunspell dictionary vs Proximity

Nov 29, 2011 6:38 AM

I am looking for information about the differences between these two dictionaries. Before CS5.5, the only one available was Proximity. Has anybody switched to Hunspell?

Any feedback appreciated, thanks.

 
Replies
  • Nov 29, 2011 9:12 AM, in reply to jmlevy

    Hunspell dictionaries are open source and you can read about them here:

     

    Hunspell - Wikipedia, the free encyclopedia

     

    I doubt they have received much testing in InDesign yet, however.

  • Nov 29, 2011 9:33 AM, in reply to jmlevy

    The general principle I follow is: If things are working and there's no compelling reason to upgrade, I stay with what I have.

  • Nov 29, 2011 11:38 AM, in reply to jmlevy

    I'm told by more than a few language pros that the hyphenation for many European languages is superior in Hunspell.

     

    For English, though, I think I can tell you why they are different. If you have an English word to be hyphenated, and that word does not appear in the hyphenation dictionary, then both Proximity and Hunspell will try to hyphenate algorithmically. That is, they'll guess: "Oh hey, there are two vowels followed by a consonant here, so the most likely correct hyphenation would look like so." The Proximity hyphenation algorithm is a commercial product. Hunspell's hyphenation algorithm is ye olde algorithm from TeX. And, if you are the kind of person who wants to understand what's going on behind the scenes, the PDF in that second link will be completely fascinating. (Otherwise it'll be a great insomnia treatment. For my own part, I'm riveted, but your mileage may vary.)
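    For readers curious about the "guess" step: TeX-style hyphenation (Liang's pattern method) slides short letter patterns across the word. Each pattern carries digits between its letters; at every gap the highest digit found wins, and an odd digit permits a break while an even one forbids it. Below is a minimal Python sketch of that idea. It is not Hunspell's or Proximity's actual code. The pattern list is a toy set chosen only so the example hyphenates "hyphenation"; real languages ship thousands of patterns (for Hunspell, in the hyph_*.dic file). The function names and the left_min/right_min defaults are illustrative assumptions.

```python
# Toy pattern set for illustration only (not a real language's pattern file).
# Digits sit between letters; "." marks a word boundary.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pat):
    """Split a pattern like 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0])."""
    letters = "".join(ch for ch in pat if not ch.isdigit())
    values = [0] * (len(letters) + 1)  # one slot per gap, including both ends
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of `word` split at the points the patterns allow."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."
    points = [0] * (len(padded) + 1)  # points[j] = gap just before padded[j]
    # Try every substring of the padded word against the pattern table,
    # keeping the highest digit seen at each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for k, v in enumerate(values):
                    points[start + k] = max(points[start + k], v)
    # An odd value at the gap before word[i] permits a break there;
    # left_min/right_min keep very short fragments from being split off.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print(hyphenate("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

    The "n2at" pattern is what stops a break in "hen-ation": its even 2 would forbid the gap, but "hen5at" outbids it with an odd 5 at the same spot, which is how competing patterns encode exceptions.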

  • John Hawkinson
    Nov 29, 2011 12:16 PM, in reply to Joel Cherney

    Like Joel, I've also heard that Hunspell is vastly superior for non-English languages. The other huge benefit is that you can just add a language dictionary someone else has developed. You couldn't do that with Proximity.

     

    Joel, I did find that PDF fascinating, not least because I had no idea that Hungarian could hyphenate a word like "asszonnyal" as "asz-szony-nyal." Now I know why they call it Hunspell! I have to admit, I had always wondered what the fuss was about ("How hard could it be to hyphenate these non-English languages?" I thought. What's with these stories of native speakers *laughing* at hyphenation? I couldn't imagine how you could laugh at wrong hyphenation in English. But now I see!)

     

    But:

    Hunspell's hyphenation algorithm is ye olde algorithm from TeX.

    That doesn't seem to be true at all!  That 2006 paper says:

    The hyphenation algorithm of OpenOffice.org 2.0.2 is a generalization of TeX’s hyphenation algorithm that allows automatic non-standard hyphenation by competing standard and non-standard hyphenation patterns.

    It's based on TeX's algorithm, but it's not "ye olde" algorithm. It's got quite a few enhancements and changes.
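    To see why the "non-standard" part matters: in "asz-szony-nyal" the hyphen is not simply dropped between existing letters. The long digraphs "ssz" and "nny" get written out in full on both sides of the break. The Python sketch below is a toy post-processing step that shows only that effect. The break positions are supplied by hand, and the LONG_DIGRAPHS table and join_with_breaks function are hypothetical names. According to the paper quoted above, the real generalization encodes such changes in competing hyphenation patterns themselves, which this sketch does not attempt.

```python
# Hypothetical rule table: a doubled Hungarian digraph written in its short
# form (e.g. "ssz" = long "sz") is spelled out in full on each side of a break.
LONG_DIGRAPHS = {"ssz": ("sz", "sz"), "nny": ("ny", "ny")}


def join_with_breaks(word, breaks):
    """Insert hyphens at the given indices (a break at i falls before word[i]),
    expanding a long digraph when the break lands inside it."""
    pieces, start, carry = [], 0, ""
    for i in sorted(breaks):
        cluster = word[i - 1:i + 2]  # one letter before the break, two after
        if cluster in LONG_DIGRAPHS:
            left, right = LONG_DIGRAPHS[cluster]
            pieces.append(carry + word[start:i - 1] + left)
            carry, start = right, i + 2
        else:
            pieces.append(carry + word[start:i])
            carry, start = "", i
    pieces.append(carry + word[start:])
    return "-".join(pieces)


print(join_with_breaks("asszonnyal", [2, 6]))  # asz-szony-nyal
print(join_with_breaks("kutya", [2]))          # ku-tya
```

    Note that the hyphenated form is longer than the word itself ("asszonnyal" has 10 letters, "asz-szony-nyal" has 12 plus hyphens), which is exactly what a "just insert a hyphen" algorithm cannot produce.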

     

    Honestly, I can't tell to what extent the paper is discussing Hunspell (either the 2006 Hunspell or the 2011 Hunspell), or whether Hunspell is exactly the same as what OpenOffice uses. (I realize OpenOffice uses Hunspell dictionaries and algorithms...) Not to rain on your parade, though. The paper is fascinating.

     

    TeX remains awesome. Though I wouldn't really have expected TUGboat to become a venue for work scoped beyond TeX. Fascinating.

  • Nov 29, 2011 4:52 PM, in reply to jmlevy

    The PDF I linked to will satisfy your curiosity about how the hyphenation algorithms derived from ye olde TeX algos work (and not the algos of some tawdry commercial op like Proximity; there you go, John), but it won't tell you how much Hunspell dictionary support will improve your French hyphenation in ID. One of the cool things about Hunspell is that it has a dictionary format that is easy to use, and it was associated with the OpenOffice project. So many, many people have contributed to these freely available dictionary files. As a result of all that hive-mind action, these dictionaries are reportedly quite good, often better than commercial offerings in some languages. But that article is just going to tell you how to handle hyphenation in languages like Hungarian or Catalan or whatever, which have non-standard hyphenation rules. Fascinating stuff (for people like me, anyway), but not directly applicable to your situation.
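    As a rough illustration of that format: a Hunspell dictionary is a pair of plain-text files. The .dic file starts with an (approximate) entry count and then lists one stem per line, optionally followed by /FLAGS; the .aff file says what each flag means, for example prefix (PFX) and suffix (SFX) rules given as strip, add, condition. The Python sketch below expands a small sample, along the lines of the example in the hunspell(5) manual page, into full word forms. It handles only single-character flags and simple conditions, checks a prefix condition against the already-suffixed form, and ignores everything else in the real format (REP, TRY, compounding, morphology), so treat it as a reading aid, not a Hunspell parser.

```python
import re

# Small sample dictionary in Hunspell's .aff/.dic layout (trimmed for brevity).
AFF = """SET UTF-8
PFX A Y 1
PFX A 0 re .
SFX B Y 2
SFX B 0 ed [^y]
SFX B y ied y
"""
DIC = """3
hello
try/B
work/AB
"""


def parse_aff(text):
    """Map each flag to (kind, cross_product, [(strip, add, condition), ...])."""
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] not in ("PFX", "SFX"):
            continue  # SET, TRY, REP, ... are ignored in this sketch
        kind, flag = parts[0], parts[1]
        if flag not in rules:  # header line: kind flag Y/N count
            rules[flag] = (kind, parts[2] == "Y", [])
        else:                  # rule line: kind flag strip add condition
            strip = "" if parts[2] == "0" else parts[2]
            add = "" if parts[3] == "0" else parts[3]
            rules[flag][2].append((strip, add, parts[4]))
    return rules


def apply_rules(word, kind, entries):
    """Apply every matching prefix or suffix rule to `word`."""
    out = []
    for strip, add, cond in entries:
        if kind == "SFX" and re.search(cond + "$", word) and word.endswith(strip):
            out.append(word[:len(word) - len(strip)] + add)
        elif kind == "PFX" and re.match(cond, word) and word.startswith(strip):
            out.append(add + word[len(strip):])
    return out


def expand(dic_text, rules):
    """List every word form generated by the .dic entries."""
    forms = []
    for entry in dic_text.splitlines()[1:]:  # skip the entry-count line
        if not entry.strip():
            continue
        stem, _, flags = entry.partition("/")
        forms.append(stem)
        suffixed = []
        for flag in flags:
            kind, cross, entries = rules[flag]
            if kind == "SFX":
                suffixed += [(f, cross) for f in apply_rules(stem, kind, entries)]
        forms += [f for f, _ in suffixed]
        for flag in flags:
            kind, cross, entries = rules[flag]
            if kind == "PFX":
                forms += apply_rules(stem, kind, entries)
                # Cross product: combine with suffixed forms that also allow it.
                for f, sfx_cross in suffixed:
                    if cross and sfx_cross:
                        forms += apply_rules(f, kind, entries)
    return forms


print(expand(DIC, parse_aff(AFF)))
# ['hello', 'try', 'tried', 'work', 'worked', 'rework', 'reworked']
```

    Three stems become seven words, which hints at why volunteers could build large dictionaries by writing rules instead of listing every inflected form.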

  • Nov 29, 2011 4:57 PM, in reply to Joel Cherney

    some tawdry commercial op

    I hope it's obvious that this is a free-software-nerd in-joke; no offense meant to the folks at Proximity.

  • John Hawkinson
    Nov 29, 2011 5:07 PM, in reply to Joel Cherney

    8. Nov 29, 2011 7:52 PM (in response to jmlevy)

    9. Nov 29, 2011 7:57 PM (in response to Joel Cherney)

    Wow, they got to you fast, Joel. I'm sure you'd have some way to send us a coded message if you needed us to head over there and bust you out.

  • John Hawkinson
    Nov 30, 2011 7:02 AM, in reply to jmlevy

    I'm worried about Joel. Do you think they got to him? There's plenty of tawdry free software out there too...

     

    I'm under the impression that you may have to load the French Hunspell dictionary by hand to get comparable results to Proximity. CS5.5 shipped with DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8 from http://www.dicollecte.org/. They are up to 4.3 at present...

     

    I think the instructions are somewhat crazy right now (I can't imagine they aren't planning on improving this). See Miguel Sousa's post at

    http://blogs.adobe.com/typblography/2011/11/how-to-enable-more-languages-in-indesign-cs5-5.html (Method Two: Hunspell dictionary) but in short (ha, ha!):

     

    1. Download the Hunspell dictionary from OpenOffice.org (etc.; follow links to wherever. Actually, maybe you want libreoffice.org now; I dunno.) Note that the hyphenation dictionary is older and separate, at least for French.

    2. Rename the .oxt file to .zip, if necessary (doesn't seem required here)

    3. Extract zip file, find the .aff and .dic files, also the hyph*.dic file.

    4. Rename the files to the ISO language/country codes, e.g. fr_FR.aff, fr_FR.dic, hyph_fr_FR.dic.

    5. Put all together in a language_country folder, e.g. fr_FR

    6. Install in:

    • Win: %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin\Dictionaries
    • Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents/SharedSupport/Dictionaries

    7. Check the Info.plist file and ensure that your language is listed under all three of SpellingService, HyphenationService, and UserDictionaryService, or at least as many as you want. For French, you should be good to go. This Info.plist file is at:

    • Win: %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin
    • Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents

    8. Restart InDesign
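    Steps 2 through 6 can be scripted. The Python sketch below is one way to do it, under stated assumptions: install_hunspell and name_hint are hypothetical names, the .oxt is read as an ordinary zip archive (which is why the rename in step 2 is unnecessary here), and the first matching .aff, .dic and hyph*.dic members are used, preferring ones whose path contains name_hint (e.g. "moderne") when given. If the archive has no hyphenation file (the post notes it can be separate), that file is simply skipped. You pass in the Dictionaries folder from step 6; writing there may need administrator rights. Step 7 (checking Info.plist) is left manual.

```python
import shutil
import zipfile
from pathlib import Path


def install_hunspell(oxt_path, lang, dictionaries_dir, name_hint=None):
    """Copy the .aff, .dic and (if present) hyph*.dic files out of an .oxt
    archive into dictionaries_dir/lang, renamed to lang.aff, lang.dic and
    hyph_lang.dic (e.g. fr_FR.aff, fr_FR.dic, hyph_fr_FR.dic)."""
    target = Path(dictionaries_dir) / lang
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(oxt_path) as archive:  # .oxt files are zip archives
        names = archive.namelist()

        def pick(test):
            """Return the first member whose file name passes `test`,
            preferring paths that contain name_hint, or None."""
            found = [n for n in names if test(Path(n).name.lower())]
            if name_hint:
                hinted = [n for n in found if name_hint.lower() in n.lower()]
                found = hinted or found
            return found[0] if found else None

        aff = pick(lambda n: n.endswith(".aff"))
        dic = pick(lambda n: n.endswith(".dic") and not n.startswith("hyph"))
        hyph = pick(lambda n: n.startswith("hyph") and n.endswith(".dic"))
        if aff is None or dic is None:
            raise FileNotFoundError(f"no .aff/.dic pair found in {oxt_path}")

        wanted = [(aff, f"{lang}.aff"), (dic, f"{lang}.dic")]
        if hyph is not None:  # the hyphenation file may ship separately
            wanted.append((hyph, f"hyph_{lang}.dic"))
        for member, new_name in wanted:
            with archive.open(member) as src, open(target / new_name, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return target


# Example call ("dict-fr.oxt" is a placeholder file name; use the step 6 path
# for your system):
# install_hunspell("dict-fr.oxt", "fr_FR",
#                  "/Library/Application Support/Adobe/Linguistics/5.5/Providers/"
#                  "Plugins2/AdobeHunspellPlugin.bundle/Contents/SharedSupport/Dictionaries",
#                  name_hint="moderne")
```

    Restarting InDesign (step 8) is still needed afterwards.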

     

    [ I don't know what's going on with French hyphenation. CS5.5 seems to ship with a hyph_fr_FR.dic that's marked Adobe Confidential, Version 1.0.2, October 8 2010, but dicollecte.org has "Version 2.0" from 2008. Not sure where Adobe's version is from. ]

     

    P.S.: I'm really out of my depth on this stuff...

  • John Hawkinson
    Nov 30, 2011 9:47 AM, in reply to jmlevy

    Wait, really? The machine I looked at with CS5.5 had DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8. Are you saying you have 4.3 installed? [Hrmm...maybe that machine was running 7.5.0 instead of 7.5.2? Yikes!] Are you sure? It does appear there's confusion about the hyphenation dictionary versioning, for sure, but the regular dictionary, not so much.

     

    I'm sorry for wasting your time!!

     

    p.s.: Meanwhile, Joel has been captured on the high seas by language pirates. I hope they don't send the ransom note in a bottle. That would just take the cake.

