
How to grep connectors in a group, between two required words?

Mar 7, 2013 2:52 AM

I need to match certain connectors (such as "de", "de la", and "y") between two capitalized words in a very long document.

 

1     Joanna de Smith        

2     Felicitas de la Tour

3     Perpetua y Beatrice Kennedy

 

My current solution is clumsy: it takes three separate expressions, and it seems that a single expression should be enough.

 

1     \<\u\l{2,} de( \u\l{2,})+\>

2     \<\u\l{2,} de la( \u\l{2,})+\>

3     \<\u\l{2,} y( \u\l{2,})+\>

 
Replies
    Mar 7, 2013 5:42 AM   in reply to camilo umaña

    (?<=\l )(de)( la)?|y(?= \u)

     

    seems to work on your example text.

     
    Mar 7, 2013 11:50 AM   in reply to camilo umaña

    Try this one:

     

    (( ?\u\l+)+(( de)( la)?| y)( \u\l+)+)


    It seemed to work when I tested it on your sample.
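For anyone who wants to try the same idea outside InDesign, here is a rough Python sketch of the expression above. Python's `re` module has no `\u` or `\l` shorthands, so the ASCII classes `[A-Z]` and `[a-z]` stand in for them; that substitution, and the names `WORD`, `NAME_WITH_CONNECTOR`, and `find_names`, are assumptions for illustration and not part of the original expression.

```python
import re

# ASCII stand-in for InDesign's \u\l+ (assumption: accented letters are ignored).
WORD = r"[A-Z][a-z]+"

# One or more words, then a required connector (" de", " de la", or " y"),
# then one or more words.
NAME_WITH_CONNECTOR = re.compile(
    rf"(?: ?{WORD})+(?: de(?: la)?| y)(?: {WORD})+"
)

def find_names(text):
    """Return every name in text that contains one of the connectors."""
    return [m.group(0) for m in NAME_WITH_CONNECTOR.finditer(text)]

samples = [
    "Joanna de Smith",
    "Felicitas de la Tour",
    "Perpetua y Beatrice Kennedy",
]
for sample in samples:
    print(find_names(sample))
```

Each of the three sample names should come back whole, while a plain "John Smith" gives an empty list because the connector is required here.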

     
    Mar 7, 2013 12:41 PM   in reply to camilo umaña

    And thanks to Peter Spier as well. I used large parts of the code he wrote to work out this one.

     
    Mar 7, 2013 12:41 PM   in reply to camilo umaña

    > Beautiful! I have to study the use of additional (()).

     

    The most important "extra" parentheses here are the ones that Peter S. inadvertently left out:

     

    ( de( la)?| y)

     

    The vertical pipe | is the OR operator, and if you do NOT use it inside a parenthesized expression, it splits the *entire* expression into two halves. So Peter's expression matched either "Juan de la" OR "y Cetera". The extra set of parentheses around this -- and only this -- part of the expression limits the OR to just this part.
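The effect of the parentheses can be checked with a small Python sketch. It is a simplified two-word version of the pattern, not the exact expression from this thread, and it assumes ASCII `[A-Z][a-z]+` as a stand-in for `\u\l+`; the names `UNGROUPED`, `GROUPED`, and `first_match` are made up for the demonstration.

```python
import re

WORD = r"[A-Z][a-z]+"  # ASCII stand-in for InDesign's \u\l+ (assumption)

# No group around the alternatives: | splits the WHOLE pattern in two,
# i.e. "Word de( la)?"  OR  " y Word".
UNGROUPED = re.compile(rf"{WORD} de(?: la)?| y {WORD}")

# Group around the alternatives: | only chooses between " de( la)?" and " y".
GROUPED = re.compile(rf"{WORD}(?: de(?: la)?| y) {WORD}")

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

for name in ["Felicitas de la Tour", "Perpetua y Beatrice"]:
    print(repr(first_match(UNGROUPED, name)), repr(first_match(GROUPED, name)))
```

The ungrouped pattern stops at "Felicitas de la" and picks up only " y Beatrice" from the second name; the grouped one returns both names in full.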

     
    Mar 7, 2013 3:05 PM   in reply to [Jongware]

    [Jongware] wrote:

     

    The most important "extra" parentheses here are the ones that Peter S. inadvertently left out:

     

    ( de( la)?| y)

    It wasn't inadvertent at all. It was out of ignorance.

     

    Where were you this morning?

     

    I thought I was rather clever, though, figuring out the conditional "la" and the "or" part, and combining it with the lookarounds (because I misunderstood the intent -- thought all he wanted was the connectors).
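If only the connectors themselves are wanted, the lookaround approach might look something like this in Python, with the alternatives grouped so that both lookarounds apply to every connector. This is a sketch under assumptions: ASCII `[a-z]` and `[A-Z]` replace `\l` and `\u`, and `CONNECTOR` and `find_connectors` are hypothetical names.

```python
import re

# Match "de", "de la", or "y" only when preceded by a lowercase letter plus
# a space and followed by a space plus an uppercase letter.
# The lookbehind is fixed-width (two characters), as Python's re requires.
CONNECTOR = re.compile(r"(?<=[a-z] )(?:de(?: la)?|y)(?= [A-Z])")

def find_connectors(text):
    """Return just the connectors, without the surrounding names."""
    return CONNECTOR.findall(text)

for sample in ["Joanna de Smith", "Felicitas de la Tour",
               "Perpetua y Beatrice Kennedy"]:
    print(find_connectors(sample))
```

Because the lookarounds consume nothing, the match is the connector alone, and the "y" at the end of a word such as "Kennedy" is not picked up.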

     
    Mar 8, 2013 5:18 AM   in reply to camilo umaña

    Maybe you should tell us again exactly what you are trying to do (find). Do you want to find all proper names, whether they have a connector or not?

     
    Mar 8, 2013 6:33 AM   in reply to camilo umaña

    Make the entire connection part *optional* by following it with a ? like this:

     

    (( ?\u\l+)+(( de)( la)?| y)?( \u\l+)+)
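A quick way to see what the added ? changes is this Python sketch. It assumes ASCII `[A-Z][a-z]+` in place of `\u\l+`, and `ANY_NAME` and `match_name` are illustrative names only.

```python
import re

WORD = r"[A-Z][a-z]+"  # ASCII stand-in for InDesign's \u\l+ (assumption)

# Same shape as the expression above: the connector group is followed by ?,
# so names with and without a connector both match.
ANY_NAME = re.compile(rf"(?: ?{WORD})+(?: de(?: la)?| y)?(?: {WORD})+")

def match_name(text):
    """Return the first multi-word name found in text, or None."""
    m = ANY_NAME.search(text)
    return m.group(0) if m else None

for sample in ["Joanna de Smith", "John Smith", "Madonna"]:
    print(match_name(sample))
```

Note that the pattern still needs at least two capitalized words, so a single word is not matched, and any two consecutive capitalized words in running text would match whether or not they are a name.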

     