13 Replies Latest reply on Apr 25, 2014 11:08 AM by Joel Cherney

    Using html in Data Merge

    amruffatti

      We need to print several thousand cards using data merge with data coming from an Excel file. The data in that Excel file contains formatting in html: <i></i>, <b></b>, <sub></sub>, &#956;, etc., which we would like to carry over to ID and use so that we don't have to redo the formatting and special characters in ID. I've been searching forums for a couple of days now and found this:

      http://stackoverflow.com/questions/1508807/getting-website-data-into-adobe-indesign

      So it seems like it's possible. We weren't able to find the sdk they're talking about, but came to this:

      https://code.google.com/p/ickmull/

      And got stuck in the part where we had to do this:

      xsltproc --output myNewFile.icml --novalid tkbr2icml-v04x.xsl myWebPage.html

       

      So can InDesign "read" html or no? Is there any other way to do this?

        • 1. Re: Using html in Data Merge
          Peter Spier Most Valuable Participant (Moderator)

          ID reads the data file as plain text. Normally a tagged file like this would go into placeholders with the correct paragraph styles defined and you'd simply use Find/Change to remove the tags. You can also use F/C to apply character styles based on tags for text which does not match the paragraph style.

          • 2. Re: Using html in Data Merge
            amruffatti Level 1

            You're absolutely right. If it were only one little card, or a dozen, or even a hundred, we could do this. But it's thousands of cards we have to print. If I follow you correctly, I'd have to split the contents of the cells in that column that has the html in it and sort the snippets of text out according to their formatting, adding more columns, and each new column should fit inside a neat little box in the InDesign file, where it gets its formatting. I think I can do the split with sql and then turn it into csv. But then, the difficulty is that the cards we need to print don't have a recurring design pattern inside that html cell, hence the need to preserve the html. Did you check out the stackoverflow link? Is it possible to "teach" ID to "read" html formatting coming from DataMerge?

             

            "You can also use F/C to apply character styles based on tags for text which does not match the paragraph style." I don't get this. When you say "tags", what are you referring to? What paragraph style do you mean?

             

            And now we have a new problem. Yesterday, ID would accept my csvs as legit data sources. Now, nothing I do can make it accept new csvs as data sources. I get:

            The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again.

            What is the right setup for a csv file so that ID will want to perform Data Merge with it?

            • 3. Re: Using html in Data Merge
              Peter Spier Most Valuable Participant (Moderator)

              By tags I mean your HTML tags.

               

              And ALL text has some paragraph style assigned that describes the basic formatting for the text in that paragraph. Character styles are used to format "special" case text that does not adhere to this format. You could also use these tags in conjunction with Find/Change to split a single paragraph into multiple paragraphs after the merge and assign a different paragraph style based on the tag.
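One way to picture the tag-based approach described above: each tag pair marks a run of text that should get a character style, and everything outside the tags keeps the paragraph style. The Python sketch below only illustrates that idea outside InDesign. The style names in TAG_STYLES and the helper split_runs are hypothetical, and it assumes the tags are never nested.

```python
import re

# Hypothetical mapping from HTML tag to a character style name.
TAG_STYLES = {"i": "Italic", "b": "Bold", "sub": "Subscript"}

# Matches <i>...</i>, <b>...</b> or <sub>...</sub>; the backreference \1
# makes sure the closing tag is the same as the opening one.
TAG_RUN = re.compile(r"<(i|b|sub)>(.*?)</\1>", re.DOTALL)

def split_runs(text):
    """Split text into (style, text) runs; style is None for untagged text."""
    runs = []
    pos = 0
    for match in TAG_RUN.finditer(text):
        if match.start() > pos:
            # Untagged text before this tag keeps the paragraph style.
            runs.append((None, text[pos:match.start()]))
        runs.append((TAG_STYLES[match.group(1)], match.group(2)))
        pos = match.end()
    if pos < len(text):
        runs.append((None, text[pos:]))
    return runs

runs = split_runs("H<sub>2</sub>O is <i>not</i> heavy")
```

Inside InDesign, the comparable step would presumably be one GREP Find/Change per tag: find something like <i>(.+?)</i>, change to $1, and set the character style under Change Format.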

              • 4. Re: Using html in Data Merge
                amruffatti Level 1

                Do I do the Find-Change operation for every card, or is there a way to automate this?

                 

                How can I "carry over" a special character such as µ all the way from Excel to ID, passing through csv without messing it up?

                • 5. Re: Using html in Data Merge
                  [Jongware] Most Valuable Participant

                  amruffatti, Find/Change can work on any scope you wish. If all the text sits in a single long threaded story, you can use the scope "Story". If each of your cards resides in an unlinked text frame, you can use "Document". (And if you have several separate documents and you want to change them all, open all of them and use "All Documents" -- a feature I have used many times, and usually to my advantage too).

                   

                  Getting "special characters" out of Excel may need some tinkering. Can Excel save your CSV as UTF-8 encoded text? That ought to work.

                  • 6. Re: Using html in Data Merge
                    amruffatti Level 1

                    Thanks Jongware, the text comes from Data Merge.

                    • 7. Re: Using html in Data Merge
                      Joel Cherney Adobe Community Professional & MVP

                      "Can Excel save your CSV as UTF-8 encoded text? That ought to work."

                       

                      Recent versions of Excel will let you save UTF-8 or UTF-16, but they won't tell you which. Saving as filetype "Unicode Text" gives you tab-delimited UTF-16.

                       

                      Well, actually it's tab-delimited little-endian UCS-2 according to Notepad++, but due to the fact that one encoding was intended to replace the other, you'd tell InDesign at data source selection time that what you had was actually tab-delimited UTF-16.
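If it is unclear which encoding Excel actually wrote, the first few bytes of the file usually settle it, because Unicode text exports typically start with a byte order mark (BOM). Here is a minimal Python sketch of that check. The function name sniff_bom is made up for illustration, and a file without a BOM is simply reported as "unknown" rather than guessed at.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess an encoding from the byte order mark at the start of data."""
    # UTF-32 must be tested before UTF-16: the UTF-32 LE BOM begins with
    # the same two bytes (FF FE) as the UTF-16 LE BOM.
    if data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"

# Example: a little-endian UTF-16 file starts with the bytes FF FE.
example = sniff_bom(b"\xff\xfe" + "5 \u03bcm".encode("utf-16-le"))
```

To check a real file, read its first four bytes in binary mode, for example open(path, "rb").read(4), and pass them to the function.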

                      • 8. Re: Using html in Data Merge
                        amruffatti Level 1

                        Yes, I suppose you can export a variety of formats out of Excel. InDesign, however, is a picky eater, and the minute you include anything weird inside the document, like the μ character, it won't swallow it, no matter what the encoding or format or character set or whatever. It says The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again. I'd like to know what's the "official" instruction as to how to carry over these special characters.

                         

                        What we're doing is correcting them once inside the ID file.

                        • 9. Re: Using html in Data Merge
                          Joel Cherney Adobe Community Professional & MVP

                          InDesign, however, is a picky eater, and the minute you include anything weird inside the document, like the μ character, it won't swallow it, no matter what the encoding or format or character set or whatever. It says The data source cannot be opened. Confirm that the file exists and that you have rights to open it, then choose the Select Data Source command again. I'd like to know what's the "official" instruction as to how to carry over these special characters.

                           

                          It ain't so.

                           

                          I just ran this merge:

                           

                          [Screenshot: Untitled.png]

                           

                          and here's the first page:

                           

                          [Screenshot: Untitled 2.png]

                           

                          Note that my data source is "Book1.txt" because I saved "Unicode Text" out of Excel, because I knew that the mu character would be a problem if I saved out CSV, which in Excel's world means "ANSI encoding." I then checked the "Import Options" box when selecting the data source so I could tell ID that it was Unicode-encoded with tab delimiters.

                           

                          Did you maybe keep the file open in Excel when trying to do the merge? That's what your error sounds like to me.

                          • 10. Re: Using html in Data Merge
                            amruffatti Level 1

                            So the key is to avoid .csv altogether. It should be Excel > Unicode Text in a txt file with tab delimiters > InDesign with Import Options set to Unicode with tab delimiters. I handed it over to someone else... I'll let you know how it goes. Thank you.
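If the data can be exported by a script instead of Excel's Save As, the same kind of file (tab-delimited, UTF-16 with a byte order mark) could be written directly. This is a minimal Python sketch, assuming the rows are already in memory. The function write_merge_source and the sample column names are made up for illustration, and how InDesign treats quoted fields is not covered here.

```python
import csv
import os
import tempfile

def write_merge_source(path, header, rows):
    """Write a tab-delimited UTF-16 text file with a header row."""
    # Python's "utf-16" codec writes a byte order mark first.
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\r\n")
        writer.writerow(header)
        writer.writerows(rows)

# Example data: "\u03bc" is the Greek small letter mu.
path = os.path.join(tempfile.mkdtemp(), "cards.txt")
write_merge_source(path, ["Name", "Size"], [["Filter A", "5 \u03bcm"]])

# Read the file back to confirm the character survived the round trip.
with open(path, encoding="utf-16", newline="") as handle:
    round_trip = handle.read()
```

The file would then be selected as the data source with Import Options checked, as described above.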

                            • 11. Re: Using html in Data Merge
                              amruffatti Level 1

                              The original problem had 2 parts: the html that does the formatting, like <i></i>, <b></b>, and so on, and the html for special characters such as µ ≤ ≥ > γ <. As Joel Cherney said, these special characters should be able to pass through if you export the Excel like he said. That works as long as the special characters exist in their original form in the Excel file, that is, if you can see them as what they are and not as html code. If they exist in html code to start with, as in our particular problem, they get carried through as html code all the way to ID. Joel Cherney's operation will not interpret or render the characters, but will only keep the original form intact.

                               

                              How we solved it was to bring all that code into InDesign and Find&Replace every single instance of each one of these things: &#956; or &#181; or whatever (about 100 different ones in total) and apply the visual equivalent: µ, etc. Not fun at all. And we are going to have to do this for more batches of cards.

                               

                              I would have wanted to import this html code into ID and have ID render it, as formatting and as special characters, as it prints the cards. Am I just too ambitious?
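The entity replacement described in this post (turning &#956;, &#181; and the like into the characters they stand for) could also be done on the data before the merge instead of by Find/Change afterwards. Below is a minimal sketch using Python's standard html module, assuming each Excel cell is available as a string. The wrapper decode_entities is only an illustrative name.

```python
import html

def decode_entities(cell: str) -> str:
    """Replace HTML character references with the characters they encode."""
    # html.unescape handles decimal (&#956;), hexadecimal (&#x3BC;)
    # and named (&le;) references.
    return html.unescape(cell)

sample = "5 &#956;m &le; x &lt; 10 &#181;m"
decoded = decode_entities(sample)
# decoded now contains the real characters: mu, less-than-or-equal,
# a literal "<", and the micro sign.
```

One caveat: this also turns &lt; and &gt; into literal < and >, which could later be mistaken for tags if <i> and <b> are still in the text waiting for Find/Change.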

                              • 12. Re: Using html in Data Merge
                                Joel Cherney Adobe Community Professional & MVP

                                Am I just too ambitious?

                                 

                                Mmmaybe. My response was simply about non-ASCII data merges, which is something I do all the time. Another thing I do all the time is manage multilingual content for multiple media/channels/what have you. It sounds like you're trying to do just that, barring the "multilingual" part.

                                 

                                Unfortunately, you have a really generic content-versus-presentation-format problem, and you already have a bucketload of HTML entities in a place where I'd argue they don't belong. If I were managing your content, I would either walk away from the job forever, or somebody would get fired. I mean, I guess that housing your content in Excel is better than just having a Big Boss say "Can't you just suck the content off of the Web site?" but not by much. Content ought to be housed far, far away from its final presentation formats, to keep your designers/Web devs/DTP wonks/etc. from doing acrobatics trying to transform one presentation format into another.

                                 

                                However, since you're already there and doing the acrobatics - I'd look into a script or plugin to do the heavy lifting for you. There's FindChangeByList, and Multi-Find/Change, off the top of my head. ickmull looks like a very useful tool, but a classic case of overkill for your intended purpose, if it is really limited to "get this Excel content into ID, and transform all of the HTML entities into true Unicode glyphs."
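Whichever find/change tool is used, it needs a list of entity and replacement pairs. Rather than typing the hundred or so pairs by hand, the list could be derived from the data itself. This is a sketch in Python, assuming the cell text is available as a list of strings. The helper entity_table is hypothetical, and converting its output into the input format of FindChangeByList or another tool is left out because that format is not shown in this thread.

```python
import html
import re

# Matches decimal (&#956;), hexadecimal (&#xB5;) and named (&le;) references.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def entity_table(cells):
    """Map every distinct entity found in cells to its decoded character."""
    found = {}
    for cell in cells:
        for entity in ENTITY.findall(cell):
            # setdefault keeps the first decoding and skips repeats.
            found.setdefault(entity, html.unescape(entity))
    return found

table = entity_table(["5 &#956;m", "x &le; 3 &#956;m", "10 &#xB5;m"])
# table maps "&#956;" to mu, "&le;" to less-than-or-equal,
# and "&#xB5;" to the micro sign.
```

Each key is what to search for and each value is what to change it to, so the same table works for later batches of cards as long as it is rebuilt from the new data.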

                                • 13. Re: Using html in Data Merge
                                  Joel Cherney Adobe Community Professional & MVP

                                  I forgot to write an entire paragraph about transforming your HTML markup into InDesign styles - that is another spot where some scripting might be useful. Check out this article in which Anne-Marie Concepcion reviews Peter Kahrel's additions to the estimable [Jongware]'s preptext.js. (Phew! It's a Cavalcade of InDesign Celebrities!)