
Change multiple finds - Erase all Diacritics

Apr 22, 2012 4:49 AM

Hi all,

 

I made a script to remove a range of diacritics (the squiggly bits above and below letters) from selected text. It works, but I thought it could be made more efficient by using findText().

My question is: can one search for a range of Unicode values (or, for that matter, a list of words like mom, mum, mommy, mummy, mam, etc.) so that they can all be deleted or changed to the same thing (in the case of the word list, to mother) without having to loop through every character in the selection?

 

My script is:


#target "InDesign"
app.doScript("main()", ScriptLanguage.JAVASCRIPT, undefined, UndoModes.FAST_ENTIRE_SCRIPT, "Remove Vowels");

function main()
{
    var t, chars, d, unicode;
    t = app.selection[0];
    chars = t.characters;
    // Walk backwards so that removing a character does not shift the ones still to be checked
    for (d = chars.length - 1; d >= 0; d--) {
        try {
            unicode = chars[d].contents.charCodeAt(0);
            // Unicode range to remove
            if ((unicode > 0x0590 && unicode < 0x05BE) ||
                (unicode > 0x05C0 && unicode < 0x05C3) ||
                (unicode > 0x05C3 && unicode < 0x05C6) ||
                unicode == 0x05BF ||
                unicode == 0x05C7) {
                chars[d].remove();
            }
        }
        catch (noUnicode) {} // contents was not a plain string, so there is no char code to test
    }
}

 

I would also like to know whether one can change a Unicode range or a word list using the regular InDesign Find/Change interface.

 

Thanks in advance.

 

Trevor

 
Replies
    Apr 22, 2012 11:10 AM   in reply to Trevorׅ

    Find unicode ranges:

     

    [\x{0590}-\x{05BE}]  (find range 0590-05BE)

    [\x{0590}-\x{05BE}\x{05C0}-\x{05C6}]  (find ranges 0590-05BE and 05C0-05C6)
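If the ranges already live in a script, as in Trevor's, the character class could also be generated rather than typed. A minimal sketch in plain JavaScript: `hex4` and `toGrepClass` are made-up helper names, and the ranges are my reading of the original script's comparisons, rewritten as inclusive bounds. The backslashes are doubled only because they sit inside JavaScript string literals.

```javascript
// Pad a code point to at least four upper-case hex digits, e.g. 0x5BF -> "05BF".
function hex4(codePoint) {
    var hex = codePoint.toString(16).toUpperCase();
    while (hex.length < 4) {
        hex = "0" + hex;
    }
    return hex;
}

// Turn a list of inclusive [first, last] pairs into a GREP character class.
// (Hypothetical helper, not part of InDesign's scripting API.)
function toGrepClass(ranges) {
    var parts = [], i, first, last;
    for (i = 0; i < ranges.length; i++) {
        first = ranges[i][0];
        last = ranges[i][1];
        parts.push("\\x{" + hex4(first) + "}" +
            (last > first ? "-\\x{" + hex4(last) + "}" : ""));
    }
    return "[" + parts.join("") + "]";
}

var hebrewMarks = toGrepClass([
    [0x0591, 0x05BD], [0x05BF, 0x05BF], [0x05C1, 0x05C2],
    [0x05C4, 0x05C5], [0x05C7, 0x05C7]
]);
// hebrewMarks now holds the text
// [\x{0591}-\x{05BD}\x{05BF}\x{05C1}-\x{05C2}\x{05C4}-\x{05C5}\x{05C7}]
```

The resulting string can be pasted into the GREP tab or assigned to findWhat in a script.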

     

    Replace items from a list with a single item:

     

    Find what: \b(mom|mum|mommy|mummy|mam)\b

    Replace with mother

     

    You need to do this in the GREP tab.
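The same alternation can be built from a list in a script. A small sketch in plain JavaScript, assuming the words contain no GREP metacharacters; `wordListPattern` is a made-up helper name, and the `RegExp` call is only there to try the pattern on an ordinary string, since InDesign's GREP engine is not JavaScript's and the two may differ in details.

```javascript
// Build \b(word1|word2|...)\b from an array of plain words.
// (Hypothetical helper; assumes no word needs escaping.)
function wordListPattern(words) {
    return "\\b(" + words.join("|") + ")\\b";
}

var pattern = wordListPattern(["mom", "mum", "mommy", "mummy", "mam"]);
// pattern holds \b(mom|mum|mommy|mummy|mam)\b with single backslashes

// Try the pattern with JavaScript's own regular expression engine.
var sample = "My mum called; so did her mommy.";
var result = sample.replace(new RegExp(pattern, "g"), "mother");
// result: "My mother called; so did her mother."
```

The word boundaries matter here: without the trailing \b, "mommy" would be matched as "mom" and come out as "mothermy".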

     

    Peter

     
    Apr 22, 2012 3:21 PM   in reply to Trevorׅ

    Trevor,

     

    The <0000> format is replaced with the corresponding character in the Find what field, which often makes it barely readable. The \x{0000} format is not replaced, and I find that easier. As to that book, you guess right!

     

    Peter

     
    Apr 23, 2012 3:25 AM   in reply to Trevorׅ

    If you write out GREP expressions in JavaScript to use with findGrep/changeGrep, you must take into account that backslashes inside a JavaScript string need escaping. Therefore you need to double each of them:

     

    \\x{0591}

     

    (etc.)

     

    The "exceptions" (there are always some) are \r, \t, and \n, but in fact those aren't as special as they seem. JavaScript translates them into the literal character codes for Carriage Return, Tab, and Line Feed, and as it happens, those can be fed into the findWhat string as well, even though you cannot type them in the interface. After inserting them with your script, you can sometimes see the GREP find field struggle to display the string.
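A quick way to see what the doubling does is to inspect the strings themselves. This is only an illustrative sketch in plain JavaScript; the variable names are made up for the example.

```javascript
// Two backslashes in the source code become one backslash in the string.
var doubled = "\\x{0591}";              // string contents: \x{0591}
var firstChar = doubled.charAt(0);      // a single backslash
var doubledLength = doubled.length;     // 8 characters, not 9

// "\t" is translated by JavaScript itself into one tab character (code 9)...
var tab = "\t";
var tabCode = tab.charCodeAt(0);        // 9

// ...while "\\t" stays as backslash + t, which the GREP engine then reads as a tab.
var escapedTab = "\\t";
var escapedTabLength = escapedTab.length;   // 2
```

Either form of the tab ends up matching a tab; the difference is only in who does the translating, JavaScript or GREP.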

     

    You could check whether the special Unicode GREP group "\p{Mn}" finds all of the non-spacing marks you want to get rid of. I think this class of commands is mentioned in Peter's book as well.
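To get a feel for what \p{Mn} covers before running it on a document, the same property can be tried in a modern JavaScript engine such as Node.js. It will not work in ExtendScript itself, whose own regular expressions predate the "u" flag. This is only a sketch and assumes the engine's \p{Mn} agrees with InDesign's; the sample word and variable names are made up for the example.

```javascript
// Requires an engine with Unicode property escapes (the "u" flag), e.g. a recent Node.js.
// The sample is: shin + shin dot + qamats, lamed, vav + holam, final mem.
var pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD";
var bare = pointed.replace(/\p{Mn}/gu, "");
// bare === "\u05E9\u05DC\u05D5\u05DD": only the four base letters remain

// The maqaf (U+05BE) is classed as a dash, not a non-spacing mark, so it is left alone.
var maqafKept = "\u05BE".replace(/\p{Mn}/gu, "") === "\u05BE";
```

Keep in mind that \p{Mn} is not limited to Hebrew: it also matches combining marks from other scripts, so an explicit range is the safer choice when the selection mixes languages.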

     
    Apr 23, 2012 4:34 AM   in reply to [Jongware]

    Ah, yes, the Unicode properties \p{ }. They're quite useful. Two of my favourites are \p{Zs} 'all spaces except tab and return' and \p{Pd} 'all hyphens and dashes'. And yes, all 37 of them are described in the book.

     

    Peter

     
    Apr 23, 2012 6:53 AM   in reply to Trevorׅ

    If it's brevity you're after:

     

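    app.findGrepPreferences = app.changeGrepPreferences = null;
    // Unicode property: non-spacing marks
    app.findGrepPreferences.findWhat = "\\p{Mn}";
    // Replace each match with nothing, i.e. delete it
    app.changeGrepPreferences.changeTo = "";
    app.selection[0].changeGrep();
    app.findGrepPreferences = app.changeGrepPreferences = null;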
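If the snippet is going to be reused, it could be wrapped in a function that always resets the preferences, even when changeGrep() throws (for instance because the selection is not text). A sketch under assumptions: `deleteGrepMatches` is a made-up name, and `app` is passed in as a parameter only so the function can be exercised outside InDesign. The preference properties and changeGrep() are the same ones used in the script above.

```javascript
// Deletes every match of grepPattern inside target and returns the number of changes.
// (Hypothetical wrapper; "target" is anything with a changeGrep() method, e.g. a text selection.)
function deleteGrepMatches(app, target, grepPattern) {
    var changed;
    app.findGrepPreferences = app.changeGrepPreferences = null;
    try {
        app.findGrepPreferences.findWhat = grepPattern;
        app.changeGrepPreferences.changeTo = "";
        changed = target.changeGrep();   // returns an array of the changed text objects
    } finally {
        // Always leave the Find/Change settings clean, even after an error.
        app.findGrepPreferences = app.changeGrepPreferences = null;
    }
    return changed.length;
}

// In InDesign this would be called as, for example:
// deleteGrepMatches(app, app.selection[0], "\\p{Mn}");
```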
    

     

    Peter

     
    Apr 23, 2012 8:35 AM   in reply to Peter Kahrel

    Peter Kahrel wrote:

     

    Ah, yes, the Unicode properties \p{ }. They're quite useful. Two of my favourites are \p{Zs} 'all spaces except tab and return' and \p{Pd} 'all hyphens and dashes'. And yes, all 37 of them are described in the book.

     

    Wow. How have I gone this long without knowing about these? Guess I should have read your book. Here's another resource.

     

    Jeff

     
    Apr 23, 2012 8:47 AM   in reply to absqua

    It's never too late, Jeff! That source you mention is indeed very good. It's where I first learnt GREP, back in the CS2 days. It's not InDesign-specific, though, so not everything discussed there applies to InDesign. A good site nevertheless. Those codes are illustrated with an InDesign document here: http://www.kahrel.plus.com/indesign/grep_mapper.html

     

    Peter

     