5 Replies Latest reply on Oct 2, 2013 6:27 AM by JoaoCP

# How to Grep sequences of duplicated letters?

I have a text in which sequences like these appear frequently (the work of some Word gremlin):

perspec SPACE ective (= perspective, etc.)

demo SPACE mocracy

insa SPACE sanity

I tried to adapt the GREP query «remove duplicate lines» to fix the text, without any success.

• ###### 1. Re: How to Grep sequences of duplicated letters?

Try the pattern below.

Find:

([[:alpha:]]{2}) \1

Change:

$1
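To try the same Find/Change outside the GREP dialog used in this thread, here is a minimal sketch with Python's `re` module. Python does not support POSIX classes such as `[[:alpha:]]`, so the sketch assumes `[^\W\d_]` (roughly "any Unicode letter") as a stand-in. Python also writes the backreference in the replacement as `\1` instead of `$1`. The function name `fix_duplicates` is hypothetical.

```python
import re

# Assumed stand-in for [[:alpha:]]: a word character that is neither a digit
# nor an underscore, i.e. (roughly) a Unicode letter.
LETTER = r"[^\W\d_]"

# Two letters captured in group 1, a space, then the same two letters (\1).
FIRST_PATTERN = re.compile(rf"({LETTER}{{2}}) \1")


def fix_duplicates(text: str) -> str:
    """Collapse 'xy xy' into 'xy', like Change to $1."""
    return FIRST_PATTERN.sub(r"\1", text)


print(fix_duplicates("perspec ective, demo mocracy, insa sanity"))
# → perspective, democracy, insanity
```

This pattern is greedy about what it accepts: it also turns "sing an anthem" into "sing anthem", which is the kind of unwanted match discussed later in the thread.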

• ###### 2. Re: How to Grep sequences of duplicated letters?

Absolutely perfect.

It has a lot of things I was missing: a POSIX class, the space that separates the letters, etc.

Really, thanks a lot.

• ###### 3. Re: How to Grep sequences of duplicated letters?

You're welcome. I'm glad I could help.

But I'd like to make a little correction to my previous pattern, to avoid unwanted matches like "sing an anthem", "thinking of offices" and "apples or oranges".

(?<=[[:alpha:]])([[:alpha:]]{2}) \1

This one requires at least one letter immediately before the duplicated pair.
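A minimal Python sketch of the refined pattern, again assuming `[^\W\d_]` as a substitute for `[[:alpha:]]`. The lookbehind `(?<=...)` checks that a letter precedes the pair without consuming it, so only the duplicated pair itself is replaced. `fix_duplicates_strict` is a hypothetical name.

```python
import re

LETTER = r"[^\W\d_]"  # assumed stand-in for [[:alpha:]]

# (?<=LETTER): a letter must come right before the pair, but it is not part
# of the match, so the replacement only touches "xy xy".
REFINED_PATTERN = re.compile(rf"(?<={LETTER})({LETTER}{{2}}) \1")


def fix_duplicates_strict(text: str) -> str:
    return REFINED_PATTERN.sub(r"\1", text)


for sample in ["demo mocracy", "sing an anthem", "thinking of offices", "apples or oranges"]:
    # Only "demo mocracy" changes; in the other three the pair follows a space.
    print(f"{sample!r} -> {fix_duplicates_strict(sample)!r}")
```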

If you want a detailed explanation of the pattern, feel free to ask.

• ###### 4. Re: How to Grep sequences of duplicated letters?

Joao,

Yes, it improves on the first formula!

This GREP still needs a visual check, because chance matches of two identical letter pairs across a space, although rare, do occur:

plain inside

simple leviathan

It's like looking at a sudoku that has already been filled in: it seems easy afterwards.

In the second formula, the positive lookbehind, which I had also missed, is very good.

Thank you.

• ###### 5. Re: How to Grep sequences of duplicated letters?

You're definitely right, Camilo: with so many legitimate cases of duplicated pairs, this is a scenario where "Replace All" is absolutely forbidden. The conservative "Find->Change/Find" approach is a much safer bet :-)
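The "Find->Change/Find" workflow can be approximated in Python as a sketch (again assuming `[^\W\d_]` as a stand-in for `[[:alpha:]]`): each match is passed to a reviewer together with a little surrounding text, and it is replaced only if the reviewer approves. `review_duplicates` and its `approve` callback are hypothetical names; in an interactive script, `approve` could prompt the user with `input()`.

```python
import re
from typing import Callable

LETTER = r"[^\W\d_]"  # assumed stand-in for [[:alpha:]]
REFINED_PATTERN = re.compile(rf"(?<={LETTER})({LETTER}{{2}}) \1")


def review_duplicates(text: str, approve: Callable[[str, re.Match], bool]) -> str:
    """Visit each match in turn and change it only if approve() returns True,
    instead of replacing every match at once."""
    parts = []
    last_end = 0
    for match in REFINED_PATTERN.finditer(text):
        # Up to 10 characters on each side so the reviewer can judge the match.
        context = text[max(0, match.start() - 10):match.end() + 10]
        parts.append(text[last_end:match.start()])
        if approve(context, match):
            parts.append(match.group(1))  # same effect as Change to $1
        else:
            parts.append(match.group(0))  # leave this match untouched
        last_end = match.end()
    parts.append(text[last_end:])
    return "".join(parts)


if __name__ == "__main__":
    sample = "A demo mocracy seen from a plain inside view."
    # Scripted decisions for illustration: accept the first match, reject the second.
    decisions = iter([True, False])
    print(review_duplicates(sample, lambda context, match: next(decisions)))
    # → A democracy seen from a plain inside view.
```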