Hi, there.
We have a text document in which we want to find all (Biblical) names.
Ideas that we had were:
1. Search for all words that begin with a capital letter.
PROBLEM: It will also capture the first word of every sentence [etc.]. Weeding out all the extras is very tedious.
2. Search for all words that are not in the dictionary.
QUESTION: Is that possible?
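For idea 1, a minimal sketch of how the sentence-start noise could be cut down, assuming the text has been exported from InDesign to plain text. The function name and the sample text are made up for illustration, and the sentence splitting is deliberately crude (abbreviations like "Mr." will fool it):

```python
import re

def capitalized_mid_sentence(text):
    """Return capitalized words that do not start a sentence."""
    candidates = []
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: its capital letter tells us nothing.
        for word in words[1:]:
            if word[0].isupper():
                candidates.append(word)
    return candidates

sample = "In the beginning God created the heaven. And Adam knew Eve his wife."
print(capitalized_mid_sentence(sample))  # ['God', 'Adam', 'Eve']
```

The trade-off is that a name standing at the very start of a sentence is skipped too, so this only narrows the list; it does not replace a manual check.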
Any help would be appreciated.
SK
I'd say this is pretty hard to do. If you wanted to find them in InDesign, probably your best bet would be to put together a dictionary of accepted names, which you could presumably find somewhere, and then pair it with regex(es) to find what you want in the document. This is hard and probably more trouble than it's worth (depending on how critical this is). A better option might be to export the text and find some sort of third-party AI tool that can do this for you. I highly doubt you want to write your own artificial intelligence thing.
Can I script a grep/text search for all "misspelled" words?
You can check out www.mindsteam.com/products/mindspellpro/index.html. I don't know if he's got a version for later releases of InDesign, but the plugin allows scripted access to spelling errors.
I think that, as Obi wrote, it might be a good idea to export a list to an external file for manual editing.
Note:
Not every capitalized spelling error is going to be a name.
Not every name is going to be a spelling error. Adam?
Not every first word of a sentence is going to be a non-name; some sentences start with a name.
You might want to make separate lists.
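A rough sketch of what "separate lists" could look like on exported text. Everything here is an assumption for illustration: `dictionary` stands in for whatever spelling word list you have (lowercase), `known_names` for a name list you found online, and the function name is made up.

```python
import re

def sort_candidates(text, dictionary, known_names):
    """Split capitalized words into separate lists for manual review."""
    lists = {"known_names": set(), "unknown_words": set(), "dictionary_words": set()}
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        if word in known_names:
            # A name even if the speller knows it (Adam?).
            lists["known_names"].add(word)
        elif word.lower() not in dictionary:
            # Capitalized "spelling error": maybe a name, maybe not.
            lists["unknown_words"].add(word)
        else:
            # Ordinary word that just happens to be capitalized.
            lists["dictionary_words"].add(word)
    return {key: sorted(values) for key, values in lists.items()}

# Tiny hypothetical inputs, only to show the shape of the result.
dictionary = {"and", "knew", "then", "begat"}
known_names = {"Adam", "Eve"}
sample = "And Adam knew Eve. Then Methuselah begat Lamech."
print(sort_candidates(sample, dictionary, known_names))
```

The "unknown_words" list is the one worth reading by hand; the "dictionary_words" list is mostly sentence starts and can usually be skimmed.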
You can probably find online lists of both anglicized names and Hebrew transliterations; that will be a good start.
You will also have to sort out Adams from Adam, etc.
Many IndexMatic users have had to deal with searching and/or indexing proper names. See, for example, Indiscripts :: IndexMatic 2 | Frequently Asked Questions [UPDATE], and my suggested approach (in French, sorry :-/) in Indiscripts :: IndexMatic | Stratégie d'indexation des noms propres (a strategy for indexing proper names).
In all cases the key rule is to use and gradually refine a dedicated word list (which in iX becomes a “query list”) and then to run it as a regular expression across the document. Of course I don't pretend IndexMatic is what you need to achieve your task: Peter Kahrel, Id-Extras and many of my colleagues here have developed great scripts and utilities that might do the job as well. My point is just that you likely have to use, or implement, a finely tuned regex-based script.
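To make the "word list run as a regular expression" idea concrete, here is a minimal sketch in Python on exported text. It is not how IndexMatic works internally; the function name and the sample list are made up for illustration.

```python
import re

def build_name_pattern(names):
    """Compile a word list into one whole-word regular expression."""
    alternation = "|".join(re.escape(name) for name in sorted(set(names)))
    # \b on both sides keeps "Adam" from matching inside "Adams".
    return re.compile(r"\b(?:" + alternation + r")\b")

# Hypothetical starting list; refine it as you review the hits.
query_list = ["Adam", "Eve", "Cain", "Abel"]
pattern = build_name_pattern(query_list)

sample = "Adam knew Eve. Mr. Adams read about Cain and Abel."
print(pattern.findall(sample))  # ['Adam', 'Eve', 'Cain', 'Abel']
```

Refining then means editing `query_list` (adding spelling variants, dropping false hits) and running the pattern again, rather than touching the regex by hand.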
@+
Marc
Hi,
About proper names, I'd extract all of them into a new file, sort them (1 minute to do it) and read through the list, deleting what we don't want (1 minute more, because I read very quickly! =D Just joking!).
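Something like this could produce the sorted list to read through, assuming the story has been exported as plain text. The function name and sample are made up, and the pattern simply takes every capitalized word as a candidate:

```python
import re
from collections import Counter

def candidate_report(text):
    """Return (word, count) pairs for capitalized words, sorted A to Z."""
    counts = Counter(re.findall(r"\b[A-Z][a-z]+\b", text))
    return sorted(counts.items())

sample = "Cain talked with Abel. Cain rose up against Abel. The Lord spoke."
for word, count in candidate_report(sample):
    # One candidate per line, ready to paste into a new file for pruning.
    print(f"{word}\t{count}")
```

The counts are only there to help with the pruning: a word that shows up once at a sentence start is a weaker candidate than one that appears dozens of times.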
After that, what do you want to do with this list?
(^/)