Highlight other language texts

Feb 16, 2017

Hi,

I have a document that contains both Chinese and English text, and each language has its own font. I am trying to find out whether any Chinese text has the English font applied to it. If so, I need to apply a specific color to that text.

app.findGrepPreferences = app.changeGrepPreferences = NothingEnum.NOTHING;
app.findGrepPreferences.findWhat = '[a-zA-Z0-9]';
app.findGrepPreferences.appliedFont = "ITC Avant Garde Gothic Std";
var myarray = [];
var found = app.documents[0].findGrep();
for (var i = 0; i < found.length; i++) {
    if (found[i].contents) {
        myarray.push(found[i].contents);
        // Color each match directly; assumes a swatch named "Chinese" exists.
        found[i].fillColor = "Chinese";
    }
}
app.findGrepPreferences = NothingEnum.NOTHING;
alert(myarray);

How can I find text in other languages, the same way the find/replace code above finds English characters?

Thanks,

K

Feb 16, 2017

Hello Experts,

Could I get any suggestions?

Feb 20, 2017

Hello Experts,

I still haven't found a solution on my end. Could I get any suggestions?

Feb 20, 2017

You'll need to put together a list of all possible Chinese characters. You can search for ranges of characters in GREP using their Unicode values. So in the script above, something like this will find the basics:

app.findGrepPreferences.findWhat = '[\\x{4E00}-\\x{9FFF}]';
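To see what that basic range covers, here is a rough plain-JavaScript analogue. It runs in Node.js, not inside InDesign, and JavaScript writes the range as \u4E00-\u9FFF where InDesign GREP uses \x{4E00}-\x{9FFF}; `findBasicHan` is a hypothetical name used only for illustration.

```javascript
// Plain-JavaScript analogue of the GREP class [\x{4E00}-\x{9FFF}]
// (CJK Unified Ideographs). Illustration only: Node.js, not InDesign GREP.
const CJK_BASIC = /[\u4E00-\u9FFF]/g;

function findBasicHan(text) {
  // Every character of `text` that falls in U+4E00..U+9FFF.
  return text.match(CJK_BASIC) || [];
}

const sample = "Booklet 手册 (2017) 中文。";
console.log(findBasicHan(sample));
```

Only the four ideographs are returned; the ideographic full stop 。 (U+3002) is not matched, because CJK punctuation lives in other Unicode blocks.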

But to be more thorough, you will have to decide exactly which Chinese characters, special characters and punctuation you're looking for. It's a big discussion; see for example: cjk - What's the complete range for Chinese characters in Unicode? - Stack Overflow

Ariel

Feb 20, 2017

Hi,

Which OS and version of InDesign are you using?

P.

Feb 20, 2017

I am using InDesign CS6; the OS details are shown below.

Feb 20, 2017

I have a plugin that highlights languages in a document. PM me and I will send it to you later today.

Actually, could you use the style highlighter script?

Indiscripts :: The Hidden Way to Highlight Styles

P.

Feb 20, 2017

Hi Pickory,

I've just sent you a PM. Yes, the highlighter script won't help in this regard.

Feb 20, 2017

Hi Ariel,

Thank you for the reply. But the booklet contains both English and the corresponding Chinese words, so I can't list all of them in GREP.

Regards,

K

Feb 20, 2017

You don't need to list all the words! You just need to list all the possible characters! Get a list of all possible characters. Type them into the GREP Find field. Set the formatting to the English font. Now do a GREP search.

The result will be all Chinese characters that have the English font applied to them.

QED
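Taw's "list all possible characters" approach could be scripted by keeping the Unicode ranges as data and generating the GREP class from them. A minimal sketch follows; `buildGrepClass` and `toHex` are hypothetical helper names, not part of InDesign's scripting API. It sticks to ES3 syntax (var, plain loops) on the assumption that it could be pasted into an ExtendScript, but it has only been reasoned about as plain JavaScript, not tested inside InDesign CS6.

```javascript
// Hypothetical helpers (not InDesign API): turn [start, end] code-point
// pairs into a GREP character class like [\x{4E00}-\x{9FFF}\x{3002}].
// ES3 syntax only: var, function declarations, plain for loops.
function toHex(cp) {
  return cp.toString(16).toUpperCase();
}

function buildGrepClass(ranges) {
  var parts = [];
  for (var i = 0; i < ranges.length; i++) {
    var start = ranges[i][0];
    var end = ranges[i][1];
    if (start === end) {
      // A single character, e.g. the ideographic full stop U+3002.
      parts.push("\\x{" + toHex(start) + "}");
    } else {
      parts.push("\\x{" + toHex(start) + "}-\\x{" + toHex(end) + "}");
    }
  }
  return "[" + parts.join("") + "]";
}

var pattern = buildGrepClass([[0x4E00, 0x9FFF], [0x3400, 0x4DBF], [0x3002, 0x3002]]);
```

The resulting string contains single backslashes, which is what the GREP engine expects; it would go into app.findGrepPreferences.findWhat, with appliedFont set to the English font as in the original script.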

Feb 20, 2017

Hi TaW,

Thanks, but I'm afraid I might miss some characters in the list. I was hoping we could do the GREP search for everything except the characters listed in my original code.

For example, instead of if(found.contents){

could I use something like if(!found.contents){

Feb 20, 2017

Well, yes, sure, you can do that too. Instead of:

app.findGrepPreferences.findWhat = '[a-zA-Z0-9]'

Just use

app.findGrepPreferences.findWhat = '[^a-zA-Z0-9]'

If you're sure that's what you want.
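A quick plain-JavaScript check shows why the negated class may catch more than intended: it matches anything that is not an ASCII letter or digit, so spaces, commas and accented Latin letters are found along with the Chinese characters. This runs in Node.js, not InDesign, and `nonAlnum` is a hypothetical name.

```javascript
// Plain-JavaScript illustration (Node.js): [^a-zA-Z0-9] matches every
// character that is not an ASCII letter or digit, Chinese or not.
function nonAlnum(text) {
  return text.match(/[^a-zA-Z0-9]/g) || [];
}

// "Caf\u00E9" is "Café" written with a precomposed e-acute.
var hits = nonAlnum("Caf\u00E9 2017, 中文");
```

Here `hits` holds six characters: the é, two spaces, the comma and the two ideographs, so only two of the six are actually Chinese.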

Feb 20, 2017

Have you seen this thread?

Find text with different appliedLanguage

P.

Feb 20, 2017

TᴀW wrote
… If you're sure that's what you want.

Hi Ariel,

hm…

I'm not so sure. No, I don't think that's what Kartik wants.

My screenshot below shows some dummy text with a condition applied to the found text:

I think the correct way would be to search for Unicode ranges, as you already suggested in reply 3.

And then apply something to the found text. A condition like the one in my screenshot would be great.

One could also choose a color for the condition that can be printed or exported to PDF.

FWIW: There is really no need for applying a character style.

On the contrary: We should spare character styles for other kinds of formatting, because maybe character styles are already in use and it would be destructive to apply a character style for the found Chinese characters.

The GREP could include the ranges of the Unicode blocks containing Han ideographs, as suggested on GitHub:

Block	Range	Comment
CJK Unified Ideographs	4E00-9FFF	Common
CJK Unified Ideographs Extension A	3400-4DBF	Rare
CJK Unified Ideographs Extension B	20000-2A6DF	Rare, historic
CJK Unified Ideographs Extension C	2A700-2B73F	Rare, historic
CJK Unified Ideographs Extension D	2B740-2B81F	Uncommon, some in current use
CJK Unified Ideographs Extension E	2B820-2CEAF	Rare, historic
CJK Compatibility Ideographs	F900-FAFF	Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement	2F800-2FA1F	Unifiable variants
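The table can be used directly as data. Below is a minimal plain-JavaScript sketch (Node.js, not InDesign; `HAN_BLOCKS`, `hanBlock` and `classify` are hypothetical names) that reports which of these blocks, if any, a character falls in. Extension B and later lie outside the Basic Multilingual Plane, so the code compares full code points instead of UTF-16 units.

```javascript
// The Han blocks from the table above: [name, first code point, last code point].
const HAN_BLOCKS = [
  ["CJK Unified Ideographs", 0x4E00, 0x9FFF],
  ["CJK Unified Ideographs Extension A", 0x3400, 0x4DBF],
  ["CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF],
  ["CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F],
  ["CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F],
  ["CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF],
  ["CJK Compatibility Ideographs", 0xF900, 0xFAFF],
  ["CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1F],
];

function hanBlock(ch) {
  // codePointAt(0) returns the full code point, even for astral characters.
  const cp = ch.codePointAt(0);
  for (const [name, start, end] of HAN_BLOCKS) {
    if (cp >= start && cp <= end) return name;
  }
  return null; // not in any of the listed blocks (e.g. punctuation)
}

function classify(text) {
  // Array.from uses the string iterator, which walks code points, so
  // characters beyond U+FFFF are not split into surrogate halves.
  return Array.from(text, (ch) => [ch, hanBlock(ch)]);
}
```

For example, hanBlock("中") gives "CJK Unified Ideographs", while a fullwidth comma gives null.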

But the ranges above are not sufficient, as the following little experiment shows.
I copied some Chinese text from the web (no idea what it says) into my InDesign page and ran the following GREP on it:

[\x{4E00}-\x{9FFF}\x{3400}-\x{4DBF}\x{20000}-\x{2A6DF}\x{2A700}-\x{2B73F}\x{2B740}-\x{2B81F}\x{2B820}-\x{2CEAF}\x{F900}-\x{FAFF}\x{2F800}-\x{2FA1F}]

What is obviously missing are punctuation characters, brackets and quotation marks.
Plus, at least in this text sample, a plain space character that maybe should not be there (second line of the Chinese text).
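The same kind of experiment can be reproduced outside InDesign. This plain-JavaScript sketch (Node.js; `HAN`, `PUNCT` and `leftovers` are hypothetical names, and the sample string is made up, not Uwe's text) lists the characters a class does not match. Adding the CJK Symbols and Punctuation block (3000-303F) and the Halfwidth and Fullwidth Forms block (FF00-FFEF) is an assumption about what such text typically needs; even then, ASCII parentheses and a plain space slip through, just as in Uwe's sample.

```javascript
// HAN mirrors the GREP above, written in JavaScript's \u{...} syntax.
const HAN = "\\u{4E00}-\\u{9FFF}\\u{3400}-\\u{4DBF}\\u{20000}-\\u{2A6DF}" +
  "\\u{2A700}-\\u{2B73F}\\u{2B740}-\\u{2B81F}\\u{2B820}-\\u{2CEAF}" +
  "\\u{F900}-\\u{FAFF}\\u{2F800}-\\u{2FA1F}";
// Assumed additions: CJK Symbols and Punctuation, Halfwidth and Fullwidth Forms.
const PUNCT = "\\u{3000}-\\u{303F}\\u{FF00}-\\u{FFEF}";

function leftovers(text, classBody) {
  // The "u" flag is needed for \u{...} escapes above U+FFFF.
  const re = new RegExp("[" + classBody + "]", "u");
  // Characters of `text` (by code point) that the class does NOT match.
  return Array.from(text).filter((ch) => !re.test(ch));
}

// Made-up sample: fullwidth comma, corner brackets, ideographic full stop,
// then an ASCII space and ASCII parentheses.
const sample = "中文\uFF0C\u300C测试\u300D\u3002 (注)";

console.log(leftovers(sample, HAN));
console.log(leftovers(sample, HAN + PUNCT));
```

With the Han ranges alone, the CJK punctuation is left over; with the two extra blocks, only the space and the ASCII parentheses remain, and those are shared with English text.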

Source of the text:

Chinese Voices - Texts

Don't know if I'm on the right track.

Hope that helps.

Regards,
Uwe

Feb 20, 2017

Hi Uwe,

Thank you for your interest in this thread. I have quite limited access to the client file, so applying conditional text is probably not possible in this case.

That's why I am looking for a simple GREP solution.

Thanks again,

K

Feb 21, 2017

Hi Kartik,

Applying conditional text is just one of several options for marking the found text visually.

If you are sure that no character styles are applied to the text in the document, then go ahead and assign a character style to the found text.

Since I have no access to your document(s), I'd say there is no simple GREP solution.

Here is an example where I expanded the range of characters a bit, but it still does not cover all the necessary ones in my little sample:

We could still use some lookarounds to catch the missing ones.
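As a rough illustration of the lookaround idea, here is a plain-JavaScript sketch (Node.js, which supports lookbehind; `strayNeighbours` is a hypothetical name). It catches ASCII spaces and parentheses only when a CJK Unified Ideograph sits directly before or after them. InDesign's GREP also offers lookbehind and lookahead, but this exact pattern has not been checked there, and the set [ ()] is only an assumption based on the brackets and stray blank Uwe found missing.

```javascript
// Plain-JavaScript sketch (Node.js): ASCII space or parenthesis that is
// preceded (lookbehind) OR followed (lookahead) by a CJK Unified Ideograph.
const HAN = "\\u4E00-\\u9FFF";
const STRAY = new RegExp(
  "(?<=[" + HAN + "])[ ()]|[ ()](?=[" + HAN + "])",
  "g"
);

function strayNeighbours(text) {
  return text.match(STRAY) || [];
}
```

In "中 文(注)" all three ASCII characters touch an ideograph and are caught; in "see (page 2)" nothing is caught, so ordinary English brackets stay untouched.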

Not so easy…

Regards,
Uwe

Feb 21, 2017

Hi Uwe,

Yes, I have some character styles applied within the document. Also, I'd rather not build a GREP list of all Chinese characters, because I might miss something anyway. So it's better to check whether each character is English or not; if not, I mark it with a swatch.

Thanks,

K

Feb 21, 2017

As you can see from my example, the pasted Chinese text uses some typical characters that are ALSO used in English text,
among them a pair of parentheses (0028 and 0029) and a "stray" space (0020). So it's not simply "English or not", whatever English means here. For example, German text uses all the characters English does, but not the other way around.

Without seeing your document I am running out of suggestions.

Regards,
Uwe

Feb 21, 2017

Hi Uwe,

Thank you for your suggestions; they are really helpful. Yes, punctuation is a problem as well. Using GREP will reduce the manual work a little, but I will need to check the other characters, such as punctuation, separately.

I feel bad that I can't provide the client document to you. Sorry about that.

Regards,
K

Feb 21, 2017

( No need to feel sorry… )

Perhaps there are also other problems ahead:

One of the toughest things could be finding out whether some text that is meant to be English was mistakenly typed with FULLWIDTH characters somewhere in the range FF10 to FF5A. In that case you would need to map the FULLWIDTH characters to "normal" characters if you want to apply "ITC Avant Garde Gothic Std", which would not contain FULLWIDTH characters.

Same for FULLWIDTH digits perhaps.

E.g. a FULLWIDTH DIGIT ZERO (FF10) could be mapped to DIGIT ZERO (0030).
But that will depend on the individual case of course.
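A sketch of that mapping in plain JavaScript follows. It uses ES3 syntax so it could plausibly be reused in an ExtendScript, though it has not been tested inside InDesign; `fullwidthToAscii` is a hypothetical name. In the Halfwidth and Fullwidth Forms block, FULLWIDTH EXCLAMATION MARK through FULLWIDTH TILDE (FF01-FF5E) sit at a fixed offset of 0xFEE0 from ASCII 0021-007E, which is why FF10 maps to 0030.

```javascript
// Hypothetical helper: map U+FF01..U+FF5E to ASCII U+0021..U+007E by
// subtracting 0xFEE0 (e.g. U+FF10 -> U+0030). All other characters,
// including Chinese text and the ideographic space U+3000, are kept as-is.
var FULLWIDTH_OFFSET = 0xFEE0;

function fullwidthToAscii(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code >= 0xFF01 && code <= 0xFF5E) {
      out += String.fromCharCode(code - FULLWIDTH_OFFSET);
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}
```

The sketch converts all of FF01-FF5E, which is wider than the FF10-FF5A range above; whether every fullwidth character should really be converted is, as noted, a case-by-case decision.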

Regards,
Uwe

Feb 20, 2017

Hi TaW,

Yes, for now I just need to exclude some characters in my GREP search. This is great and helps me move on to my next step.

Thanks,

K

Adobe Community
