ID reads the data file as plain text. Normally a tagged file like this would go into placeholders with the correct paragraph styles defined and you'd simply use Find/Change to remove the tags. You can also use F/C to apply character styles based on tags for text which does not match the paragraph style.
You're absolutely right. If it were only one little card, or a dozen, or even a hundred, we could do this. But it's thousands of cards we have to print. If I follow you correctly, I'd have to split the contents of the cells in that column that has the html in it and sort the snippets of text out according to their formatting, adding more columns, and each new column should fit inside a neat little box in the InDesign file, where it gets its formatting. I think I can do the split with sql and then turn it into csv. But then, the difficulty is that the cards we need to print don't have a recurring design pattern inside that html cell, hence the need to preserve the html. Did you check out the stackoverflow link? Is it possible to "teach" ID to "read" html formatting coming from DataMerge?
You can also use F/C to apply character styles based on tags for text which does not match the paragraph style. I don't get this. When you say "tags", what are you referring to? What paragraph style do you mean?
And now we have a new problem. Yesterday, ID would accept my csvs as legit data sources. Now, nothing I do can make it accept new csvs as data sources. I get:
The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again.
What is the right setup for a csv file so that ID will want to perform Data Merge with it?
By tags I mean your HTML tags.
And ALL text has some paragraph style assigned that describes the basic formatting for the text in that paragraph. Character styles are used to format "special case" text that does not adhere to this format. You could also use these tags in conjunction with Find/Change to split a single paragraph into multiple paragraphs after the merge and assign a different paragraph style based on the tag.
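To make the tag idea concrete, here is a minimal Python sketch of what a tag-driven Find/Change pass does to one merged cell: it finds each <i>...</i> or <b>...</b> pair, keeps the text inside, and notes which character style that run should get. The style names "Italic" and "Bold" and the function names are placeholders for illustration, not anything from the thread, and nested tags are not handled.

```python
import re

# Hypothetical mapping from HTML tag names to character style names.
# The style names are placeholders; use whatever your document defines.
TAG_TO_STYLE = {"i": "Italic", "b": "Bold"}

# Non-greedy match of <i>...</i> or <b>...</b>; \1 forces the closing
# tag to match the opening one. Nested tags are not handled.
TAG_PAIR = re.compile(r"<(i|b)>(.+?)</\1>", re.DOTALL)


def split_runs(text):
    """Return (style_or_None, text) runs in reading order."""
    runs = []
    pos = 0
    for m in TAG_PAIR.finditer(text):
        if m.start() > pos:
            # Untagged text keeps the paragraph style's formatting.
            runs.append((None, text[pos:m.start()]))
        runs.append((TAG_TO_STYLE[m.group(1)], m.group(2)))
        pos = m.end()
    if pos < len(text):
        runs.append((None, text[pos:]))
    return runs


def strip_tags(text):
    """Remove the tag pairs and keep only the text between them."""
    return TAG_PAIR.sub(r"\2", text)


if __name__ == "__main__":
    cell = "Dose in <i>vivo</i> is <b>5</b> mg"
    print(split_runs(cell))
    print(strip_tags(cell))
```

Inside InDesign the comparable step would be a GREP Find/Change with something like <i>(.+?)</i> in Find what, $1 in Change to, and the character style set under Change Format, run once per tag over the whole document.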
Do I do the Find-Change operation for every card, or is there a way to automate this?
How can I "carry over" a special character such as µ all the way from Excel to ID, passing through csv without messing it up?
amruffatti, Find/Change can work on any scope you wish. If all the text sits in a single long threaded story, you can use the scope "Story". If each of your cards resides in its own unlinked text frame, you can use "Document". (And if you have several separate documents and you want to change them all, open all of them and use "All Documents" -- a feature I have used many times, and usually to my advantage too.)
Getting "special characters" out of Excel may need some tinkering. Can Excel save your CSV as UTF-8 encoded text? That ought to work.
Thanks Jongware, the text comes from Data Merge.
Can Excel save your CSV as UTF-8 encoded text? That ought to work.
Recent versions of Excel will let you save UTF-8 or UTF-16, but it won't tell you which. Saving as filetype "Unicode Text" is tab-delimited UTF-16.
Well, actually it's tab-delimited little-endian UCS-2 according to Notepad++, but since UTF-16 was designed to supersede UCS-2, you'd tell InDesign at data source selection time that what you have is tab-delimited UTF-16.
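Since Excel doesn't say which encoding it wrote, one way to check outside Notepad++ is to look at the byte-order mark at the start of the file. This is only a rough Python sketch: the function name is made up for illustration, and a file saved without a BOM will simply come back as unknown.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess the encoding family from the first bytes of a file."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with
    # the same two bytes as the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown (no BOM)"


if __name__ == "__main__":
    # Usage idea: read the first four bytes of the exported file, e.g.
    #   with open("Book1.txt", "rb") as f: print(sniff_bom(f.read(4)))
    sample = codecs.BOM_UTF16_LE + "\u00b5g".encode("utf-16-le")
    print(sniff_bom(sample))
```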
Yes, I suppose you can export a variety of formats out of Excel. InDesign, however, is a picky eater, and the minute you include anything weird inside the document, like the μ character, it won't swallow it, no matter what the encoding or format or character set or whatever. It says The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again. I'd like to know what's the "official" instruction as to how to carry over these special characters.
What we're doing is correcting them once inside the ID file.
InDesign, however, is a picky eater, and the minute you include anything weird inside the document, like the μ character, it won't swallow it, no matter what the encoding or format or character set or whatever. It says The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again. I'd like to know what's the "official" instruction as to how to carry over these special characters.
It ain't so.
I just ran this merge:
and here's the first page:
Note that my data source is "Book1.txt" because I saved "Unicode Text" out of Excel, because I knew that the mu character would be a problem if I saved out CSV, which in Excel's world means "ANSI encoding." I then checked the "Import Options" box when selecting the data source so I could tell ID that it was Unicode-encoded with tab delimiters.
Did you maybe keep the file open in Excel when trying to do the merge? That's what your error sounds like to me.
So the key is to avoid .csv altogether. It should be Excel > Unicode Text in a txt file with tab delimiters > InDesign with Import Options as unicode with tab delimiters. I handed it over to someone else... I'll let you know how it goes. Thank you.
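If the data starts life as a csv (from the sql export, say) rather than in Excel, the same tab-delimited Unicode text file can be produced with a short script instead of a round trip through Excel. A minimal Python sketch follows, assuming the csv has already been read in as text with its correct encoding; the function name is hypothetical, and the output mimics the little-endian, BOM-prefixed file described above rather than anything InDesign documents as required.

```python
import codecs
import csv
import io


def csv_to_unicode_text(csv_text: str) -> bytes:
    """Convert comma-separated text to tab-delimited UTF-16 LE bytes with a BOM."""
    # newline="" lets the csv module handle line endings itself.
    rows = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # Write the byte-order mark explicitly so the result does not
    # depend on the byte order of the machine running the script.
    return codecs.BOM_UTF16_LE + out.getvalue().encode("utf-16-le")


if __name__ == "__main__":
    data = csv_to_unicode_text("name,unit\r\nmicro,\u00b5g\r\n")
    # Write in binary mode so nothing re-encodes the bytes.
    with open("cards.txt", "wb") as f:
        f.write(data)
```

The resulting cards.txt (a made-up file name) would then be selected as the data source with Import Options checked, choosing Unicode and tab-delimited as in the steps above.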
The original problem had 2 parts: the html that does the formatting, like <i></i>, <b></b>, and so on, and the html for special characters such as µ ≤ ≥ > γ <. As Joel Cherney said, these special characters should be able to pass through if you export the Excel like he said. That works as long as the special characters exist in their original form in the Excel file, that is, if you can see them as what they are and not as html code. If they exist in html code to start with, as in our particular problem, they get carried through as html code all the way to ID. Joel Cherney's operation will not interpret or render the characters, but will only keep the original form intact.
How we solved it was to bring all that code into InDesign and Find&Replace every single instance of each one of these things: μ or µ or whatever (about 100 different ones in total) and apply the visual equivalent: µ, etc. Not fun at all. And we are going to have to do this for more batches of cards.
I would have wanted to import this html code into ID and have ID render it, as formatting and as special characters, as it prints the cards. Am I just too ambitious?
Am I just too ambitious?
Mmmaybe. My response was simply about non-ASCII data merges, which is something I do all the time. Another thing I do all the time is manage multilingual content for multiple media/channels/what have you. It sounds like you're trying to do just that, barring the "multilingual" part.
Unfortunately, you have a really generic content-versus-presentation-format problem, and you already have a bucketload of HTML entities in a place where I'd argue they don't belong. If I were managing your content, I would either walk away from the job forever, or somebody would get fired. I mean, I guess that housing your content in Excel is better than just having a Big Boss say "Can't you just suck the content off of the Web site?" but not by much. Content ought to be housed far, far away from its final presentation formats, to keep your designers/Web devs/DTP wonks/etc. from doing acrobatics trying to transform one presentation format into another.
However, since you're already there and doing the acrobatics - I'd look into a script or plugin to do the heavy lifting for you. There's FindChangeByList, and Multi-Find/Change off the top of my head. Ickull looks like a very useful tool, but a classic case of overkill for your intended purpose, if it is really limited to "get this Excel content into ID, and transform all of the HTML entities into true Unicode glyphs."
I forgot to write an entire paragraph about transforming your HTML markup into InDesign styles - that is another spot where some scripting might be useful. Check out this article in which Anne-Marie Concepcion reviews Peter Kahrel's additions to the estimable [Jongware]'s preptext.js. (Phew! It's a Cavalcade of InDesign Celebrities!)