Did "the person on the other end of the export" explain why you should use ASCII? Try both encodings and see what the difference is.
Try this: type some random text in a text frame and run your ASCII script to export it. You will get the text. Now insert any 'special' character -- any at all. Accented characters, curly quotes, en dash, em space, Hebrew, Greek, a footnote, an Insert Page Number Here. Now you no longer can export to "ascii" because otherwise the file would contain characters that are not available in the ASCII format.
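To see in advance whether a given string would survive an ASCII export, a small check like the following could be run on it first. This is only a sketch: the function names isPureAscii and listNonAscii are made up for illustration, and the check works on UTF-16 code units, so characters outside the Basic Multilingual Plane are reported as two halves.

```javascript
// Hypothetical helpers, not part of InDesign's scripting API.

// True when every character is in the 7-bit ASCII range (U+0000 to U+007F).
function isPureAscii(text) {
    return !/[^\x00-\x7F]/.test(text);
}

// Returns each distinct non-ASCII character once, in order of first appearance.
function listNonAscii(text) {
    var found = text.match(/[^\x00-\x7F]/g) || [];
    var seen = {};
    var result = [];
    for (var i = 0; i < found.length; i++) {
        if (!seen[found[i]]) {
            seen[found[i]] = true;
            result.push(found[i]);
        }
    }
    return result;
}
```

Running listNonAscii on a story's contents before exporting gives a quick inventory of what would need translating.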
... the person on the other end of the export says I need to convert UTF-8 to ASCII encoding.
Inside the InDesign UI you can search for non-ASCII characters with GREP: look for
Hi -- Thanks for the explanation.
My first thought is I could build a translation resource doc for most of the non-ASCII characters, but I have run into a more significant issue: if the story has notes, I cannot save it as ASCII (the file comes out at zero K). I tried hiding the notes, but that has no effect. Removing the notes works, but obviously I do not want to do that. Can you think of any way around this?
As an alternative, can anyone suggest a command line tool for converting a UTF-8 file to ANSI? One that would turn any non-ANSI characters into ? would be acceptable. I tried iconv, but it fails to convert the files. I know this question may not be appropriate for an InDesign forum, but I suspect many of the developers here use tools like this to overcome these kinds of problems.
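Both ideas in this post, a translation table and a "?" fallback, could also be handled inside the export script itself rather than with an external tool. The following is a minimal sketch, assuming the text is already in a JavaScript string; the table ASCII_SUBSTITUTES, its entries, and the function toAscii are hypothetical and would need to be extended to suit the client.

```javascript
// Hypothetical translation table: non-ASCII character -> ASCII stand-in.
var ASCII_SUBSTITUTES = {
    "\u2018": "'",   // left single quote
    "\u2019": "'",   // right single quote
    "\u201C": '"',   // left double quote
    "\u201D": '"',   // right double quote
    "\u2013": "-",   // en dash
    "\u2014": "--",  // em dash
    "\u2026": "...", // ellipsis
    "\u00A0": " "    // non-breaking space
};

// Replaces every non-ASCII character with its table entry,
// or with "?" when the table has no entry for it.
function toAscii(text) {
    return text.replace(/[^\x00-\x7F]/g, function (ch) {
        return ASCII_SUBSTITUTES.hasOwnProperty(ch) ? ASCII_SUBSTITUTES[ch] : "?";
    });
}
```

Anything not listed in the table ends up as "?", so Greek or Hebrew text would become a run of question marks.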
There are some utf-8 characters they can't handle.
ANSI =/= ASCII
Is your client really okay with you sending "processed" text? Depending on what goes in, the output may or may not resemble anything coherent anymore. Imagine a phrase in Greek.
>There are some utf-8 characters they cannot handle.
Wot nonsense. "Some"!? It's All or Nothing, I'd say. If they send you a list of those they can, or cannot, handle, you could make a better translation.
>.. notes ..
Wait, you have notes -- as in "footnotes"? That concept simply does not exist in ASCII, ANSI, or for that matter, in UTF-8.
Sorry, typo on my part -- I meant ASCII not ANSI
What I mean by notes is the hidden text you can put in stories. They cause my ASCII exports to show up as zero K. If I export these stories as UTF-8, they show up as a white square in Notepad. If you look at them in a hex editor they read ef bb bf.
You mean, in UTF-8 they look like ef bf bd, right? That is InDesign's placeholder marker, U+FFFD (see also http://www.fileformat.info/info/unicode/char/fffd/index.htm). It's useless to include this in your export because (a) InDesign uses this code for a lot of different functions, and (b) there is nothing "associated" with it after exporting. You can safely remove them from your string before you output it as text; you don't have to remove them from your InDesign document.
Oh wait, that must be the UTF-8 code your client was having difficulties with. Try your very first UTF-8 export again, but this time remove all of these codes before writing your string to the file. It's simple, add this line before writing:
myString = myString.replace (/\uFFFD/g, '');
Well, I say "simple", but when I learned this particular trick it was a moment of "why didn't anyone tell me this five years ago!?".
Thanks ... but I inserted that line and I am still getting that code. Could I be missing something?
My bad! I knew U+FFFD could occasionally pop up in text copied straight out of InDesign, so I checked its UTF-8 encoding and that is "EF BF BD". That's why I thought you got it wrong, and advised you to remove U+FFFD.
But ... you got it right after all. The sequence you gave, "EF BB BF", is the encoding of yet another code that ID uses as a 'placeholder': U+FEFF. Since that is also a special code in Unicode (the byte order mark), your client is having problems with it.
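One way to avoid this kind of mix-up is to let the script report the UTF-8 bytes of a suspect character. The sketch below assumes the scripting engine provides the standard encodeURIComponent function and that the input is a single non-ASCII character; utf8Hex is a name made up for this example.

```javascript
// Hypothetical helper: UTF-8 bytes of one non-ASCII character as spaced hex.
// encodeURIComponent percent-encodes the UTF-8 bytes, e.g. "%EF%BB%BF".
function utf8Hex(ch) {
    return encodeURIComponent(ch).replace(/^%/, "").replace(/%/g, " ");
}

utf8Hex("\uFEFF"); // "EF BB BF"
utf8Hex("\uFFFD"); // "EF BF BD"
```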
Change the replacement line to this one (this time around, I made sure to run your script and check the result before posting ...).
myString = myString.replace (/\uFEFF/g, "");
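Since both U+FFFD and U+FEFF came up in this thread as InDesign placeholders, the two replacements could be folded into one pass. This is a sketch under the assumption that neither code is wanted anywhere in the exported text; stripPlaceholders is a hypothetical helper name.

```javascript
// Hypothetical helper: removes both placeholder codes discussed above
// (U+FEFF and U+FFFD) from a string before it is written to a file.
function stripPlaceholders(text) {
    return text.replace(/[\uFEFF\uFFFD]/g, "");
}

// Usage, mirroring the one-liners above:
// myString = stripPlaceholders(myString);
```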
Thanks ... it worked great.
I really appreciate the help.