This is the basic code for any kind of search. It returns one word at a time.
You can add logic to remember the context - what was the previous word, and the word before that, etc.
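As a rough sketch of that idea (not the code originally posted), the loop below visits one word at a time and hands the callback the previous few words as context. It assumes `doc` exposes Acrobat's `numPages`, `getPageNumWords` and `getPageNthWord`; inside Acrobat you would pass `this`. The names `walkWords`, `contextSize` and `visit` are made up for illustration.

```javascript
// Sketch only: walkWords, contextSize and visit are hypothetical names.
// `doc` is assumed to behave like an Acrobat Doc object.
function walkWords(doc, contextSize, visit) {
  for (var p = 0; p < doc.numPages; p++) {
    var context = []; // most recent words on this page, oldest first
    var count = doc.getPageNumWords(p);
    for (var i = 0; i < count; i++) {
      // third argument true: strip punctuation and white space
      var word = doc.getPageNthWord(p, i, true);
      // pass a copy so the callback cannot disturb the running context
      visit(word, context.slice(), p, i);
      context.push(word);
      if (context.length > contextSize) {
        context.shift(); // forget the oldest word
      }
    }
  }
}
```

The context is reset at each page boundary here; if your phrases can span pages, move the `context` array outside the page loop.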
This is an interesting challenge for you; I hope you aren't asking us to write it for you.
I think converting to use a regex would be a bigger task than solving the problem directly. I'd be inclined to use a state transition table specific to the problem. States might be:
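To illustrate what a state transition table can look like in general, here is a minimal sketch for a made-up problem: finding the word "New" immediately followed by "York" in a list of words. The states (`START`, `SAW_FIRST`, `MATCH`), the word classes and the function names are all hypothetical and are not the states meant for the actual problem in this thread.

```javascript
// Hypothetical example: states and word classes are invented for illustration.
// TABLE[state][wordClass] gives the next state.
var TABLE = {
  START:     { first: "SAW_FIRST", second: "START", other: "START" },
  SAW_FIRST: { first: "SAW_FIRST", second: "MATCH", other: "START" }
};

// Map each word to one of the classes used as columns of the table.
function classify(word) {
  if (word === "New") return "first";
  if (word === "York") return "second";
  return "other";
}

// Return the index of the first word of every "New York" occurrence.
function findPhrase(words) {
  var state = "START";
  var hits = [];
  for (var i = 0; i < words.length; i++) {
    state = TABLE[state][classify(words[i])];
    if (state === "MATCH") {
      hits.push(i - 1); // the phrase began one word earlier
      state = "START";  // start looking for the next occurrence
    }
  }
  return hits;
}
```

Adapting this means changing only `classify` and the rows of `TABLE`; the loop stays the same, which is the main attraction over a chain of nested `if` statements.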
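I don't think hyphens will be returned to you, since they are punctuation and getPageNthWord strips punctuation by default (its third parameter, bStrip, defaults to true), but your mileage may vary.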
You might have to check two or more words at a time while stepping through the words one at a time. Also be aware that getPageNthWord returns words in the order in which they were placed on the page, which is not necessarily the reading order.
Thank you all for your time.
I am looking for the JS code.