1 Reply Latest reply on Jun 1, 2010 1:01 AM by Peter Kahrel

    a regular expression question

    jackhenrie Level 1

      I have a large body of text which I am breaking into individual words (as part of an experimental indexing project).

      I can break the text into a list by making a paragraph break for every word space (a simple find and replace).

      But I want proper names to remain unbroken.

       

      So I am trying to write a regular expression script which will find every occurrence of two contiguous words which each begin with a capital letter, and then replace the space between the two words with an underscore.

      So Sigmund Freud becomes Sigmund_Freud.

       

      Does anyone know how I would write this script?

       

      Thanks!!!

        • 1. Re: a regular expression question
          Peter Kahrel Adobe Community Professional & MVP

          You don't need a script, you can do it in the interface:

           

          Find: (\u[-\w]+)\x{20}(?=\u[-\w]+)

          Change: $1_

           

          \u[-\w]+ stands for "an upper-case letter followed by one or more hyphens or word characters"; here, the first name.

          \x{20} stands for the space.

          This is followed by another \u[-\w]+, the last name. This one is in a lookahead, so the whole expression paraphrases as "find a word that starts with an upper-case letter, followed by a space, if it's followed by another word starting with an upper-case letter".
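
For anyone who would rather try the pattern outside InDesign, here is a rough Python sketch of the same idea. It is not part of the original answer. Python's `re` module has no `\u` class, so `[A-Z]` is used as a stand-in (an assumption that limits it to unaccented ASCII capitals), and the function names are made up for illustration. Because the last name sits in a lookahead and is not consumed, it can serve as the first word of the next match, so names of three or more words are joined as well.

```python
import re

# Assumption: [A-Z] stands in for InDesign's \u (any upper-case letter),
# so this sketch only recognises unaccented ASCII capitals.
NAME_PAIR = re.compile(r"([A-Z][-\w]+) (?=[A-Z][-\w]+)")


def join_proper_names(text: str) -> str:
    """Replace the space between two capitalised words with an underscore."""
    # The lookahead does not consume the second word, so it can start
    # the next match ("Carl Gustav Jung" -> "Carl_Gustav_Jung").
    return NAME_PAIR.sub(r"\1_", text)


def one_word_per_line(text: str) -> str:
    """Join proper names first, then put each remaining word on its own line."""
    return "\n".join(join_proper_names(text).split())


print(join_proper_names("the theories of Sigmund Freud and Carl Gustav Jung"))
# prints: the theories of Sigmund_Freud and Carl_Gustav_Jung
```

One thing to watch for: a capitalised word at the start of a sentence is treated like a name, so "When Sigmund Freud" would come out as "When_Sigmund_Freud". That follows from the pattern itself, so it probably applies to the Find/Change version too.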

           

          Peter