Just purchased a large medical dictionary with ~400K words, and InDesign is having problems importing it into a User Dictionary. Oddly, the import itself appears to go fine, but InDesign crashes completely when it tries to write the dictionary to disk (the import progress bar is complete and I can see the .udc file size increase).
The file I received has some special characters and diacritics (¡rnadÛttir, ≈kerlund, Ângstrom, ÈtagËres') but I know that I can add these to user dictionaries manually when they show up in a text file. Unfortunately, I can't share this file with anyone because it was purchased and we signed a contract limiting it to our use only.
So, the question is: what are the limits/specifications on word lists imported into user dictionaries? Size, character encoding, things to avoid, line break types, etc. The user manual is silent on the specifics, I was unable to find anything conclusive using Google, and a phone call to support told me that an application crash/exception was "not a bug".
- OSX 10.6.8
- InDesign CS5.5
My word list text file:
- encoding: Non-ISO extended-ASCII text, with CRLF line terminators
- size: 4.8M
- words: 394705 (one on each line)
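For anyone who wants to check a file like this at the byte level, here is a minimal Python sketch that reports the line terminators and the non-ASCII bytes present. The function names are my own and nothing here is specific to InDesign; it only describes the raw bytes. Bytes in the 0x80-0x9F range are control codes in ISO-8859-1, so finding them suggests the file is in some other 8-bit encoding (Windows-1252 and Mac Roman are common candidates, but that is a guess).

```python
from collections import Counter
import sys


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def inspect_wordlist(data: bytes) -> dict:
    """Summarise line terminators and non-ASCII bytes in a raw word list."""
    crlf = data.count(b"\r\n")
    cr_only = data.count(b"\r") - crlf
    lf_only = data.count(b"\n") - crlf
    high = Counter(b for b in data if b >= 0x80)
    return {
        "crlf": crlf,
        "cr_only": cr_only,
        "lf_only": lf_only,
        # Each distinct non-ASCII byte value and how often it occurs.
        "high_bytes": {hex(b): n for b, n in sorted(high.items())},
        # True if any byte falls in 0x80-0x9F (not printable in ISO-8859-1).
        "has_c1_range": any(0x80 <= b <= 0x9F for b in high),
        "valid_utf8": is_utf8(data),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(inspect_wordlist(fh.read()))
```

Run it with the word list path as the only argument; the "high_bytes" entry shows which byte values to look up in a code page table.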
Things I've tried that end in errors (see below):
- Importing the full file
- Splitting the file into smaller chunks (200K words, 100K words, 50K words); all of them error (see below)
- I can import files of 10K words each, but at that rate I'd have 40 UDCs. That doesn't seem like a good idea.
- Converting to UTF8 encoding (using iconv)
- This fails for all size files, even 10K words.
- Converting to the same encoding used by InDesign when exporting user dictionaries to a text file (ISO-8859 text, with CR line terminators). However, I was unable to get OSX to do this conversion (again using iconv). It seems that Non-ISO extended-ASCII should be the same as ISO-8859 (Latin-1) text. See http://en.wikipedia.org/wiki/Extended_ASCII
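As an alternative to iconv for the last item, the conversion can be sketched in Python. This assumes the source file is Windows-1252 (the `src_encoding` default is a guess, not something I have confirmed about this file) and targets what InDesign's own export was reported to use above: ISO-8859 text with CR line terminators. Words that cannot be represented in the target encoding are returned separately rather than silently altered. Whether InDesign then accepts the result is untested.

```python
def convert_wordlist(data: bytes,
                     src_encoding: str = "cp1252",   # assumption about the source file
                     dst_encoding: str = "latin-1",  # ISO-8859-1
                     newline: bytes = b"\r"):
    """Re-encode a one-word-per-line list and normalise its line terminators.

    Returns (converted_bytes, rejected_raw_lines).
    """
    kept, rejected = [], []
    # bytes.splitlines() splits on \r\n, \r and \n, so CRLF input is handled.
    for raw in data.splitlines():
        word = raw.strip()
        if not word:
            continue  # skip blank lines
        try:
            kept.append(word.decode(src_encoding).encode(dst_encoding))
        except (UnicodeDecodeError, UnicodeEncodeError):
            # Undefined in the source code page, or no equivalent in the target.
            rejected.append(word)
    return newline.join(kept) + newline, rejected


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 2:
        with open(sys.argv[1], "rb") as fh:
            converted, bad = convert_wordlist(fh.read())
        with open(sys.argv[2], "wb") as fh:
            fh.write(converted)
        print(f"{len(bad)} words could not be converted")
```

Passing `dst_encoding="utf-8"` or `newline=b"\r\n"` gives the other combinations, which makes it easy to test each one against a small import.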
Things I haven't tried:
- Removing special characters. I know that user dictionaries can support UTF/special characters, and removing them would limit the dictionary, so this isn't ideal. Also, when I import a very small file (< ~1000 words) that contains the special characters, they are handled just fine.
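One way to test this without throwing words away is to separate the pure-ASCII words from the ones containing special characters and import the two groups independently, in chunks of a chosen size. The sketch below does that on the raw bytes, so it makes no assumption about the encoding; the helper names are hypothetical. If the ASCII-only chunks still crash at 50K words, the special characters are probably not the cause.

```python
def partition_words(data: bytes):
    """Split a one-word-per-line list into (ascii_only, with_high_bytes)."""
    plain, special = [], []
    for raw in data.splitlines():
        word = raw.strip()
        if not word:
            continue  # skip blank lines
        if any(b >= 0x80 for b in word):
            special.append(word)
        else:
            plain.append(word)
    return plain, special


def chunks(words, size):
    """Yield consecutive slices of at most `size` words."""
    for start in range(0, len(words), size):
        yield words[start:start + size]


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as fh:
            plain, special = partition_words(fh.read())
        for label, group in (("ascii", plain), ("special", special)):
            for i, part in enumerate(chunks(group, 50000)):
                # Writes CRLF-terminated files next to the working directory.
                with open(f"{label}_{i:02d}.txt", "wb") as out:
                    out.write(b"\r\n".join(part) + b"\r\n")
```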
Process: Adobe InDesign CS5.5 
Path: /Applications/Adobe InDesign CS5.5/Adobe InDesign CS5.5.app/Contents/MacOS/Adobe InDesign CS5.5
Version: 126.96.36.1993 (7530)
Code Type: X86 (Native)
Parent Process: launchd 
Date/Time: 2012-05-29 09:44:50.791 -0400
OS Version: Mac OS X 10.6.8 (10K549)
Report Version: 6
Interval Since Last Report: 59126 sec
Crashes Since Last Report: 1
Per-App Interval Since Last Report: 59061 sec
Per-App Crashes Since Last Report: 1
Anonymous UUID: 723C81B5-3F1C-44B5-B6F3-A81AF6A67834
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 ??? 0xa08726f0 _XHNDL_trapback_instruction + 0
1 ...inguistic.LinguisticManager 0x210c8a01 prox_cladd + 200
Thread 1: Dispatch queue: com.apple.libdispatch-manager
0 libSystem.B.dylib 0x90428382 kevent + 10
1 libSystem.B.dylib 0x90428a9c _dispatch_mgr_invoke + 215
2 libSystem.B.dylib 0x90427f59 _dispatch_queue_invoke + 163
3 libSystem.B.dylib 0x90427cfe _dispatch_worker_thread2 + 240
4 libSystem.B.dylib 0x90427781 _pthread_wqthread + 390
5 libSystem.B.dylib 0x904275c6 start_wqthread + 30