I'm told by more than a few language pros that the hyphenation for many European languages is superior in Hunspell.
For English, though, I think I can tell you why they are different. If you have an English word to be hyphenated, and that word does not appear in the hyphenation dictionary, then both Proximity and Hunspell will try to hyphenate algorithmically. That is, each will guess "Oh hey, there's two vowels followed by a consonant here, so the most likely correct hyphenation would look like so." The Proximity hyphenation algorithm is a commercial product. Hunspell's hyphenation algorithm is ye olde algorithm from TeX. And, if you are the kind of person who wants to understand what's going on behind the scenes, the PDF in that second link will be completely fascinating. (Otherwise it'll be a great insomnia treatment. For my own part, I'm riveted, but your mileage may vary.)
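For the curious, here is roughly how the TeX approach (Frank Liang's pattern method) works. It doesn't reason about vowels and consonants directly: it matches short letter patterns that carry digits between the letters, keeps the highest digit seen at each gap, and allows a break wherever that digit is odd. This is a minimal Python sketch, not TeX's or Hunspell's actual code; the nine patterns are a small illustrative set chosen to handle "hyphenation" (TeX's real English pattern file has several thousand), and the 2/3 minimums mirror plain TeX's default \lefthyphenmin and \righthyphenmin.

```python
import re

# Small illustrative pattern set (not TeX's full English patterns).
# Digits sit in the gaps between letters; "." would mark a word boundary.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)  # one value per gap, including both ends
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Insert hyphens at gaps whose highest matching pattern digit is odd."""
    dotted = "." + word.lower() + "."
    scores = [0] * (len(dotted) + 1)  # scores[i] = value of the gap before dotted[i]
    for letters, values in (parse_pattern(p) for p in patterns):
        start = dotted.find(letters)
        while start != -1:  # every (possibly overlapping) occurrence
            for offset, v in enumerate(values):
                scores[start + offset] = max(scores[start + offset], v)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:  # gap before word[k] is the gap before dotted[k + 1]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation", PATTERNS))  # hy-phen-ation
```

Because the patterns are just data, swapping in another language's pattern file changes the results without touching the algorithm.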
Like Joel, I've also heard that Hunspell is vastly superior for non-English languages. The other huge benefit is that you can just add a language dictionary someone else has developed. You couldn't do that with Proximity.
Joel, I did find that PDF fascinating, not least because I had no idea that Hungarian could hyphenate a word like "asszonnyal" as "asz-szony-nyal." Now I know why they call it Hunspell! I have to admit, I had always wondered what the fuss was about. ("How hard could it be to hyphenate these non-English languages?" I thought. What's with these stories of native speakers *laughing* at hyphenation? I couldn't imagine how you could laugh at wrong hyphenation in English. But now I see!)
"Hunspell's hyphenation algorithm is ye olde algorithm from TeX."
That doesn't seem to be true at all! That 2006 paper says:
The hyphenation algorithm of OpenOffice.org 2.0.2 is a generalization of TeX’s hyphenation algorithm that allows automatic non-standard hyphenation by competing standard and non-standard hyphenation patterns.
It's based on TeX's algorithm, but it's not "ye olde" algorithm. It's got quite a few enhancements and changes.
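The Hungarian example above shows why TeX's algorithm needs extending: in "asz-szony-nyal" the long consonants ssz and nny are written out in full on both sides of the break, so the hyphenated form contains letters that aren't in "asszonnyal" at all. A standard pattern can only mark a position; a non-standard one also has to carry a replacement. The sketch below shows only that rendering step, under assumptions: the `Break` tuple and its "sz=sz"-style replacement string (with "=" marking the hyphen) are a hypothetical representation for illustration, not the pattern syntax the paper or Hunspell actually uses, and it doesn't model how competing patterns choose between breaks.

```python
from typing import NamedTuple, Optional


class Break(NamedTuple):
    """Hypothetical break: replace word[start:start + length] with `replacement`,
    where '=' marks the hyphen. A standard break has length 0 and no replacement."""
    start: int
    length: int = 0
    replacement: Optional[str] = None


def render(word, breaks):
    """Build the hyphenated form, rewriting letters at non-standard breaks."""
    out, pos = [], 0
    for b in sorted(breaks, key=lambda brk: brk.start):
        out.append(word[pos:b.start])
        if b.replacement is None:
            out.append("-")  # standard break: only a hyphen is inserted
        else:
            out.append(b.replacement.replace("=", "-"))  # non-standard: letters change
        pos = b.start + b.length
    out.append(word[pos:])
    return "".join(out)


# "ssz" becomes "sz-sz" and "nny" becomes "ny-ny"
print(render("asszonnyal", [Break(1, 3, "sz=sz"), Break(5, 3, "ny=ny")]))  # asz-szony-nyal
print(render("hyphenation", [Break(2), Break(6)]))  # hy-phen-ation
```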
Honestly, I can't tell to what extent the paper is discussing Hunspell (either the 2006 Hunspell or the 2011 Hunspell), or whether Hunspell is exactly the same as what OpenOffice uses. (I realize OpenOffice uses Hunspell dictionaries and algorithms...) Not to rain on your parade, though. The paper is fascinating.
TeX remains awesome. Though I wouldn't have expected TUGboat to become a venue for work beyond TeX. Fascinating.
The PDF I linked to will satisfy your curiosity about how the hyphenation algorithms derived from ye olde TeX algos (and not the algos of some tawdry commercial op like Proximity, there you go, John) work, but not about how much Hunspell dictionary support will improve your French hyphenation in ID. One of the cool things about Hunspell is that it has a dictionary format that is easy to use, and it was associated with the OpenOffice project. So many, many people have contributed to these freely available dictionary files. As a result of all that hive-mind action, these dictionaries are reportedly quite good, often better than commercial offerings in some languages. But that article is just going to tell you how to handle hyphenation in languages like Hungarian or Catalan or whatever that have non-standard hyphenation rules. Fascinating stuff (for people like me, anyway), but not directly applicable to your situation.
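As a rough illustration of why the dictionary format is approachable: a .dic file lists stems with optional flags, and the .aff file defines what each flag does (here, suffix rules giving the letters to strip, the letters to add, and a condition on the stem's ending). The toy expander below handles only SFX rules with single-character flags and simple conditions; it is not Hunspell, and the two-word French sample dictionary is hypothetical rather than taken from the real French files.

```python
import re

# Hypothetical toy .aff / .dic contents (not from the real French dictionary).
AFF = """\
SET UTF-8
SFX S Y 1
SFX S 0 s .
SFX X Y 1
SFX X al aux al
"""

DIC = """\
2
chat/S
cheval/X
"""


def parse_suffixes(aff_text):
    """Map each suffix flag to a list of (strip, add, condition regex)."""
    rules = {}
    for line in aff_text.splitlines():
        fields = line.split()
        # Rule lines are "SFX flag strip add condition ..."; the header
        # line "SFX flag Y count" has only four fields and is skipped.
        if len(fields) >= 5 and fields[0] == "SFX":
            _, flag, strip, add, cond = fields[:5]
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add.split("/")[0]  # ignore continuation flags
            rules.setdefault(flag, []).append((strip, add, re.compile(cond + "$")))
    return rules


def expand(dic_text, rules):
    """Return every stem plus the forms generated by its suffix flags."""
    forms = []
    for entry in dic_text.splitlines()[1:]:  # first line is the approximate word count
        entry = entry.strip()
        if not entry:
            continue
        stem, _, flags = entry.partition("/")
        forms.append(stem)
        for flag in flags:  # assumes single-character flags (the default FLAG type)
            for strip, add, cond in rules.get(flag, []):
                if cond.search(stem) and stem.endswith(strip):
                    forms.append(stem[: len(stem) - len(strip)] + add)
    return forms


print(expand(DIC, parse_suffixes(AFF)))  # ['chat', 'chats', 'cheval', 'chevaux']
```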
Well, I read this very instructive PDF. As you wrote, Joel, it does not give answers about French hyphenation. I ran some tests with the same texts in two different files (one with Hunspell and the other with Proximity): there are not many differences, and I get better results with Proximity.
Thanks for your contributions!
I'm worried about Joel. Do you think they got to him? There's plenty of tawdry free software out there too...
I'm under the impression that you may have to load the French Hunspell dictionary by hand to get comparable results to Proximity. CS5.5 shipped with DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8 from http://www.dicollecte.org/. They are up to 4.3 at present...
I think the instructions are somewhat crazy right now (I can't imagine they aren't planning on improving this). See Miguel Sousa's post at
http://blogs.adobe.com/typblography/2011/11/how-to-enable-more-languages-in-indesign-cs5-5.html (Method Two: Hunspell dictionary), but in short (ha, ha!):
1. Download the Hunspell dictionary from OpenOffice.org (etc.; follow links to wherever. Actually, maybe you want libreoffice.org now, I dunno.). Note that the hyphenation dictionary is older and separate, at least for French.
2. Rename the .oxt file to .zip, if necessary (doesn't seem required here)
3. Extract the zip file and find the .aff and .dic files, plus the hyph*.dic file.
4. Rename the files to the ISO language/country codes, e.g. fr_FR.aff, fr_FR.dic, hyph_fr_FR.dic.
5. Put all together in a language_country folder, e.g. fr_FR
6. Install in:
• Win : %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin\Dictionaries
• Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents/SharedSupport/Dictionaries
7. Check the Info.plist file and make sure your language is listed under all three of SpellingService, HyphenationService, and UserDictionaryService (or at least under the ones you want). For French, you should be good to go. This Info.plist file is at:
• Win : %ProgramFiles%\Common Files\Adobe\Linguistics\5.5\Providers\Plugins2\AdobeHunspellPlugin
• Mac: /Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents
8. Restart InDesign
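If you have to repeat steps 2 to 7 for several languages or machines, they can be scripted. The Python sketch below is one way to do it, under several assumptions: the extracted files contain exactly one matching .aff/.dic pair (the `hint` argument, e.g. "moderne", picks between variants) and one hyph*.dic; Info.plist is an ordinary property list that `plistlib` can read; and since I don't know how Adobe nests entries under SpellingService and friends, the check simply looks for the language code anywhere beneath each key. The function names (`extract`, `pick`, `stage`, `services_missing`) are made up for this example.

```python
import plistlib
import shutil
import zipfile
from pathlib import Path

# Service keys named in step 7.
SERVICES = ("SpellingService", "HyphenationService", "UserDictionaryService")


def extract(archive, out_dir):
    """Steps 2-3: an .oxt is a zip archive, so zipfile opens it without renaming."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    return [p for p in Path(out_dir).rglob("*") if p.is_file()]


def pick(paths, suffix, hint="", hyph=False):
    """Return the single file with `suffix` whose name contains `hint`.
    hyph=True selects hyph*.dic; hyph=False excludes it so *.dic stays unambiguous."""
    found = [p for p in paths
             if p.suffix == suffix and hint in p.name
             and p.name.startswith("hyph") == hyph]
    if len(found) != 1:
        raise ValueError(f"expected one {suffix} file matching {hint!r}, "
                         f"got {[p.name for p in found]}")
    return found[0]


def stage(paths, lang, dest_root, hint=""):
    """Steps 4-6: copy into dest_root/<lang>/ as <lang>.aff, <lang>.dic, hyph_<lang>.dic."""
    target = Path(dest_root) / lang
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy(pick(paths, ".aff", hint), target / f"{lang}.aff")
    shutil.copy(pick(paths, ".dic", hint), target / f"{lang}.dic")
    shutil.copy(pick(paths, ".dic", hyph=True), target / f"hyph_{lang}.dic")
    return target


def find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; returns its value or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def mentions(obj, text):
    """True if `text` appears as a string value or dict key anywhere inside obj."""
    if isinstance(obj, str):
        return obj == text
    if isinstance(obj, dict):
        return any(k == text or mentions(v, text) for k, v in obj.items())
    if isinstance(obj, list):
        return any(mentions(v, text) for v in obj)
    return False


def services_missing(plist_path, lang):
    """Step 7: list the services under which `lang` does not appear."""
    with open(plist_path, "rb") as f:
        info = plistlib.load(f)
    return [s for s in SERVICES if not mentions(find_key(info, s), lang)]
```

On a Mac, that might look like `stage(files, "fr_FR", "/Library/Application Support/Adobe/Linguistics/5.5/Providers/Plugins2/AdobeHunspellPlugin.bundle/Contents/SharedSupport/Dictionaries", hint="moderne")`. Because the French hyphenation dictionary is a separate download, you can extract both archives and pass the combined file list. Writing into that folder will likely need administrator rights, and InDesign still has to be restarted afterwards.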
[I don't know what's going on with French hyphenation. CS5.5 seems to ship with a hyph_fr_FR.dic that's marked Adobe Confidential Version 1.0.2, October 8 2010, but dicollecte.org has "Version 2.0" from 2008. I'm not sure where Adobe's version comes from.]
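One way to check which versions are actually installed is to read the comment header at the top of each file. A small sketch, assuming the version is noted in those leading comments (as it apparently is in the hyph_fr_FR.dic mentioned above); it treats lines starting with "#" (comments in Hunspell .aff files) or "%" (conventionally used in hyph*.dic pattern files) as comments, and falls back to Latin-1 for older files that aren't valid UTF-8. The function name `header_comments` is made up for this example.

```python
import sys
from pathlib import Path


def header_comments(path, max_lines=40):
    """Return comment lines ('#' or '%') found in the first `max_lines` lines of a file."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # older dictionaries are often ISO-8859-1
    lines = text.splitlines()[:max_lines]
    return [ln.strip() for ln in lines if ln.lstrip().startswith(("#", "%"))]


if __name__ == "__main__":
    # e.g. python headers.py fr_FR.aff hyph_fr_FR.dic
    for name in sys.argv[1:]:
        print(name)
        for line in header_comments(name):
            print("   ", line)
```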
P.S.: I'm really out of my depth on this stuff...
Wait, really? The machine I looked at with CS5.5 had DICTIONNAIRE ORTHOGRAPHIQUE FRANÇAIS «Moderne» version 3.8. Are you saying you have 4.3 installed? [Hrmm...maybe that machine was running 7.5.0 instead of 7.5.2? Yikes!] Are you sure? It does appear there's confusion about the hyphenation dictionary versioning, for sure, but the regular dictionary, not so much.
I'm sorry for wasting your time!!
p.s.: Meanwhile, Joel has been captured on the high seas by language pirates. I hope they don't send the ransom note in a bottle. That would just take the cake.
I'll check at work tomorrow (it's almost 9 PM) which version is installed. And you didn't waste my time, don't worry.
Last check: you were right about the spellcheck dictionary. We have version 3.8 installed, and the latest one available is 4.3. We don't use any of InDesign's spellcheck tools, since we have a third-party spellchecker in our workflow, so my main concern is the hyphenation dictionary. The latest version available on http://www.dicollecte.org/ is version 2, which seems to be the same as the one shipped with CS5.5.
Message was edited at 10:15 am by: jmlevy