Jesse Jackson

Importing CSV file with Data Merge Fails

Mar 23, 2013 6:34 PM

Tags: #error #osx #indesign #csv #encoding #os_x #cs6 #unicode #data_merge

Specs

See pasted text from CSV at http://pastebin.com/mymhugpN

I am using InDesign CS6 (8.0.1)

I created the CSV by downloading it from a Google Spreadsheet as a CSV. I confirmed in Terminal, using the file command, that the character encoding is UTF-8.
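For anyone who wants to check the file beyond what the file command reports, here is a small Python sketch that looks for a byte-order mark at the start of the file (the path is just an example):

```python
import codecs

def detect_bom(path):
    """Return the encoding name implied by a leading BOM, or None if absent."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check the longer BOMs first so UTF-32 LE is not mistaken for UTF-16 LE.
    for bom, name in [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]:
        if head.startswith(bom):
            return name
    return None  # no BOM: plain ASCII or BOM-less UTF-8
```

A BOM-less UTF-8 file and a plain ASCII file both return None here, which matches how the file command reports them.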

 

Problem detailed

I am trying to import a CSV file (UTF-8) with Data Merge via the Select Data Source... command with Show Import Options checked. In the Data Source Import Options dialog, I set the following options: Delimiter: Comma, Encoding: Unicode, Platform: Macintosh. I leave Preserve Spaces in Data Source unchecked. It fails to import any variables and produces no error message. I have tried other CSV files as well (created in TextEdit, Espresso, etc.), and it seems that InDesign will not import any file if Unicode is specified as the encoding, no matter which other options are specified.

 

Can anyone else confirm this?

 

Importing as ASCII works, but obviously does not display my content correctly.

 
Replies
  • Mar 24, 2013 3:36 AM   in reply to Jesse Jackson

    pattern86 wrote:

     

    Importing as ASCII works, but obviously does not display my content correctly.

    Obviously? What's obvious? What isn't being displayed correctly? I'm not sure that Pastebin supports non-ASCII characters, but from a quick look at the paste I don't see anything that should be a problem if the file is saved in ASCII format.

     

    If you want someone to test the file to see the results we'll need a link to a sample file.

     
  • Mar 24, 2013 3:01 PM   in reply to Jesse Jackson

    I'm not going to claim to be an expert on file encodings, but if I copy your text and save it in Notepad (I'm a Windows user) as UTF-8, I am presented with the following error:

    Bad CSV UTF-8.png

     

    If I subsequently open it in Excel and remove the added characters before the text in the first cell (which I presume are the UTF-8 binary tag), I am able to load the file as ASCII and do the merge.

     
  • Mar 24, 2013 3:20 PM   in reply to Peter Spier

    I also tried saving as Unicode text instead of UTF-8 in Notepad, and that works fine. As I say, I'm no expert in file encoding, but I think you may be under the misapprehension that UTF-8 is Unicode encoding; it simply is not a format that ID will read for Data Merge.

     
  • Mar 25, 2013 1:44 AM   in reply to Jesse Jackson

    I don't think so. I think it's a problem with UTF-8 encoding. Can you save as a different flavor of Unicode from Google?

     
  • Mar 25, 2013 2:43 AM   in reply to Peter Spier

    I've had a similar problem with text that was generated on Filemaker Pro for Mac. Drove me INSANE. Was with a former employer and has not been an issue since, but I had a somewhat clumsy workaround:

     

    1. Make a new InDesign document and place the CSV text into InDesign as a text frame.
    2. While in the text frame with the text tool, select File > Export and export as text.
    3. Go back to the data merge file and, from the Data Merge panel, choose Select Data Source. The data should now import... but some of the characters will go haywire.

     

    I haven't really investigated what was going on, as the job was always red-hot urgent, and I haven't been able to replicate the fault since I don't have FileMaker Pro. If anyone reading would like to try this, have at it!

     
  • Mar 25, 2013 2:43 PM   in reply to Jesse Jackson

    What about opening the UTF-8 file in TextEdit or TextWrangler and resaving from there as Unicode?
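That resave can also be scripted. A minimal Python sketch of the conversion, assuming the source really is readable as UTF-8 (file names are placeholders):

```python
def utf8_to_utf16(src, dst):
    """Re-save a UTF-8 text file as UTF-16 with a byte-order mark."""
    # "utf-8-sig" transparently strips a leading UTF-8 BOM if one is present.
    with open(src, "r", encoding="utf-8-sig") as f:
        text = f.read()
    # Python's "utf-16" codec writes a BOM automatically; newline="" keeps
    # the original line endings untouched.
    with open(dst, "w", encoding="utf-16", newline="") as f:
        f.write(text)
```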

     
  • Mar 25, 2013 5:52 PM   in reply to Jesse Jackson

    I had no problem importing your data as either .txt or .csv. Granted, I run CS5.5, but I am also on a Mac and it went smoothly. I basically copied your data to TextEdit, saved, and then imported through Data Merge. To get the CSV I opened that text in Excel and output it as comma-delimited CSV.

     

    It looks like your data (in Pastebin) has two different encodings, or at least has been created by two different software programs or maybe computers. You can see this in the text:

    Screen Shot 2013-03-25 at 5.48.21 PM.png

    As you can see, there are two different quotation marks going on - the first looks like it's been copied straight from Word and the second looks to have been generated by a text or HTML program. This is also a common problem with websites when people supply text in Word/Excel and copy straight into HTML - those characters get messed up.

     

    Run a Find and Replace on the first set of quotes in that text and the file should load with no problem.
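That find-and-replace can be done in any editor; as a sketch, the same normalization in Python (the mapping here is an assumption about which curly characters are in the data):

```python
def straighten_quotes(text):
    """Replace curly (typographic) quotes with straight ASCII ones."""
    replacements = {
        "\u201c": '"',  # left double quotation mark
        "\u201d": '"',  # right double quotation mark
        "\u2018": "'",  # left single quotation mark
        "\u2019": "'",  # right single quotation mark
    }
    for curly, straight in replacements.items():
        text = text.replace(curly, straight)
    return text
```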

     
  • Mar 30, 2013 12:24 AM   in reply to Jesse Jackson

    I downloaded your sample. My editor of choice shows it is UTF-8 format:

     

    capture-000179.png

     

    After setting up ID to merge it:

     

    capture-000178.png

     

    Seems to work just fine. Unless I am missing something.

     

    Take care, Mike

     
  • Mar 30, 2013 3:43 AM   in reply to MW Design

    What editor are you using?

     
  • Mar 30, 2013 8:27 AM   in reply to Peter Spier

    Hi Peter, UltraEdit. I've been using it since the mid-1990s for programming and then later for HTML, PHP and whatnot.

     

    As regards this issue, UE will preserve the BOM in UTF files if it is set to do so (and not convert them to the OS's format), which is how I have it set. If one sees the odd characters at the beginning of a file, it is due to conversion to the format of the OS one is using. Otherwise the BOM is hidden from view.

     

    The section toward the bottom of the page linked below explains the BOM, if anyone cares. For UTF files it is necessary that the BOM be preserved.

     

    http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html

     

    Take care, Mike

     
  • Mar 30, 2013 8:37 AM   in reply to MW Design

    Thanks.

     
  • Mar 31, 2013 9:13 PM   in reply to Jesse Jackson

    Sorry, I've been away at a weekend golf tournament. I tried to reply with my phone, but up in the mountains I was cut off on both reply attempts.

     

    I am using CS6.

     

    Your file is a UTF-8 ASCII file. If you want Unicode, it will be a UTF-16 file and as long as you are using Western languages, you really should be using the UTF-8 ASCII.

     

    If you create a UTF-16 format CSV file, you will see that ID automatically determines it is the Unicode import type. Now, they are both Unicode (UTF-8 and UTF-16 that is). But to ID, I think the UTF-16 format(s) are used specifically for non-Western languages, or certainly a mix of Western and non-Western languages.

     

    At least that's what I find.

     
  • Apr 1, 2013 7:34 AM   in reply to Jesse Jackson
    When I import that utf-8 file

    Is this the file you uploaded to Pastebin? Frankly, I do not see the characters listed in post 18 in that file. If the missing characters you list in post 18 are supposed to be in the text, try creating a ZIP file of the text (both the .txt and the source Google Docs spreadsheet, maybe saved as an Excel or other spreadsheet type) and posting it somewhere like Dropbox. Then provide a link.

     

    The reason it will not import as Unicode, I believe, is that it is a UTF-8 file and is considered as ASCII for import.

     

    Mike

     
  • Apr 1, 2013 11:30 AM   in reply to Jesse Jackson

    OK, I think we aren't getting anywhere quickly at my end. I still do not see any upper-range characters in your file(s).

     

    I went to a web site and copied a bunch of characters and made a table. I added dollar signs where their characters were missing. But I think it illustrates that if the file is set correctly, ID automatically picks the proper format for import.

     

    capture-000181.png

     

    And the result when merged into a new file:

     

    capture-000180.png

     

    Converting the source records to an ANSI text file and importing it, ID chooses:

    capture-000182.png

    And the resultant merge equals:

     

    capture-000183.png

     

     

    Mike

     
  • Apr 1, 2013 4:20 PM   in reply to MW Design

    I downloaded the sample files, and I get the same character mapping issues. I wonder if this might be font-related. Is the font in Google Spreadsheet really Unicode-compliant, or is it some older font that has Windows-Mac character mapping differences?

     
  • Apr 1, 2013 4:33 PM   in reply to Peter Spier

    Google Doc export of the above here:

     

    http://www.wenzloffandsons.com/temp/google_doc_export.csv

     

    It is a UTF-8 ASCII.

     

    Which means if you import it allowing ID to set the import type (ASCII), then it works fine.

     

    Want me to ZIP up a Unicode (UTF-16) version? ID will then automagically pick Unicode for the type and the imports will be identical.

     

    Mike

     
  • Apr 2, 2013 5:55 PM   in reply to MW Design

    Mike is having some trouble posting in this thread (and I am too), but he sent me a PM with what he wanted to say:

     

     

    OK. I think I might have a positive answer for you.

     

     

     

    I was getting lost in the upper ASCII characters you showed. In your test file I never could see any--a case of not seeing the trees for the forest.

     

     

     

    Your quote marks are getting dropped in your test file. Now, this may or may not affect other factors but it does in some further testing. I believe ID has an issue with dropping quote marks even in a plain ASCII file if the marks are at the beginning of a sentence and the file is tab delimited. Call it a bug.

     

     

     

    Because of all the commas and quote marks in your simple file, I think you should be exporting from Google Docs' spreadsheet as a tab-delimited file. This exported file has to be opened in a text editor capable of saving it out as a UTF-16 BE (Big Endian) type of file.
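A sketch of that export step in Python, writing tab-delimited UTF-16 BE with an explicit BOM; since no CSV quoting logic is applied, field contents pass through untouched, quote marks and all:

```python
def write_tab_utf16be(rows, path):
    """Write rows of string fields as a tab-delimited UTF-16 BE file with a BOM."""
    # Tab-delimit the fields; embedded quote marks and commas are left
    # exactly as given, which is the point of using tabs here.
    body = "\r\n".join("\t".join(fields) for fields in rows) + "\r\n"
    with open(path, "wb") as f:
        f.write(b"\xfe\xff")               # UTF-16 BE byte-order mark
        f.write(body.encode("utf-16-be"))  # encode without a second BOM
```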

     

     

     

    Also, I think you are going to have to use proper quote marks throughout, or change them in the exported tab-delimited file. Best to have a correct source, though.

     

     

     

    Here is your sample ZIPped up. I think it works properly. But then again, I think I might be bleary-eyed by now.

     

     

     

    http://www.wenzloffandsons.com/temp/merge_psalms_utf-16.zip

     

     

     

    Take care, Mike

     
  • Apr 3, 2013 9:41 AM   in reply to Peter Spier

    Thanks Peter--especially if you can read this when I hit post...

     

    Mike

     
  • Apr 3, 2013 9:49 AM   in reply to MW Design

    I still don't know what the problem was with posting yesterday. Admin says the error we were seeing usually means the spam filter kicked in because of something in the text, but all I did last night was remove a line of asterisks and it posted just fine, and as far as I know a line of asterisks is not flagged by the filters.

     
  • Apr 5, 2013 11:15 PM   in reply to Peter Spier

    I have been following this thread with some interest as someone who uses Data Merge all the time. Very rarely do I come across this issue, but when I do, it is normally when the job is in a hurry.

     

    I've downloaded the CSV from the Google file and noticed the same thing the OP did: the quotes appearing as the OP described in post 18 of this thread.

     

    My solution was to go into Textwrangler on a Mac and save the file as the following:

     

    forumanswer1.png

    Then I went back into InDesign, selected "Select Data Source", and (making sure that "Show Import Options" is checked ON) selected the new database. A dialog box should now appear with these options:

    forumanswer2.png

    and voila! The data should go from bad to good!

    forumanswer3.png

     
  • Apr 6, 2013 5:37 AM   in reply to cdflash

    You are still not there, Flash.

     

    Compare the right-hand side of your last screen shot to the one below:

     

    capture-000208.png

    See the difference? Look at the raw source again. There are double quotes: a lower- and a higher-ASCII quote. The lower-ASCII quote gets stripped. Download the ZIP at the link in my last message, which Peter was kind enough to post when I couldn't.

     

    Take care, Mike

     
  • Apr 6, 2013 6:25 AM   in reply to MW Design

    Actually, I think the stripping of the outer quote is correct for a .csv file. It's there to indicate that everything inside should be treated as literal, including commas, so that you don't wind up with extra fields.
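For what it's worth, Python's csv module shows the same standard behavior on a line like the ones in the sample data:

```python
import csv
import io

# The outer quotes mark the field boundary so the embedded comma is not
# treated as a delimiter; a conforming parser strips them from the data.
line = 'Psalm 147:3,"He heals the brokenhearted, and binds up their wounds."'
row = next(csv.reader(io.StringIO(line)))
# row == ['Psalm 147:3',
#         'He heals the brokenhearted, and binds up their wounds.']
```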

     
  • Apr 6, 2013 3:23 PM   in reply to Peter Spier

    Not to rain on any parades, but IMHO Pete has it right. The opening and closing quotes of that field aren't meant to be there. I stand by my screenshot.

     

    My situation may also not be the same as the OP's, but I've had issues in the past with what FileMaker Pro calls tab-delimited files (.tab), and that is twofold:

    1. When importing into InDesign, I get the same dialog box as Peter did in the third post of this thread; and
    2. If I do manage to place the file after saving it the wrong way, I then get the problem that the OP had in the 18th post of this thread.

    Thanks to this thread I have now overcome the latter issue....

    ...However I do not know if saving as specified in my earlier post will resolve the former issue as I don't have FileMaker Pro. Has anyone experienced this before?

     

    Colly

     
  • Apr 6, 2013 4:32 PM   in reply to Peter Spier

    Not for a tab-delimited file. I know of no other application that mixes and matches delimiters. One should be able to have triple sets of quotes in a tab-delimited file.

     

    Plus, under certain circumstances, only the leading double quote is stripped.

     

    I will stand by my assertion that ID is goofing up a tab-delimited UTF-8 merge when double quotes are present.

     
  • Apr 6, 2013 4:44 PM   in reply to MW Design

    Mike,

    The file was saved as .csv, not tab delimited...

     
  • Apr 6, 2013 8:06 PM   in reply to Peter Spier

    I think there is a bit of mixed communications here so I just want to clear this up.

    The problem I am having is SIMILAR, but not IDENTICAL to the OP's issue.

    The OP's data from post #20 only had straight quotation marks in the second-to-last field of each record, presumably because the other fields did not contain commas but the second-to-last field did; the quotation marks ensure that only the text within them is the data for that field, regardless of any commas it contains.

    I have seen this happen before but can't reliably say what software does/doesn't do this.

    To confirm my theory, look at any record after the Devotion reference (e.g. Psalm 147:3): there is a comma to separate the field, then a straight quote. The end of that field does not always have a "double quote" but instead has a straight quote followed by a comma. I think the "double quote" is a red herring.

    My dramas were spelled out in post #30 of this thread, and I am on the lookout for any test data that can be supplied from FileMaker Pro, preferably exported as a .tab file and featuring soft returns in some of the sample data, so I can replicate the fault from my old client.

     
  • Apr 6, 2013 8:34 PM   in reply to Peter Spier

    Peter, mine are tab-delimited. I.e., I stripped/converted the OP's database, and I thought I had made that clear above in the post you posted for me. The reason is that with all the quotes, double quotes and commas in the narrative text, it is the only means of maintaining them; otherwise ID (properly) gets confused as to what to strip and what to retain.

     

    For 15 years, I owned a company that wrote vertical-market database publishing software, mainly for the upper end of the insurance company spectrum (AIG, Zurich, and various entities operating under the Lloyd's of London umbrella--which isn't an actual insurance company, but that is another story--etc.).

     

    We usually had to use tab-delimited merges due to the same issue, and I didn't need to jump through the hoops seen in this thread to produce the merge. Plain ASCII files for Word were used in the offices, but we mostly used large-scale databases, sometimes using ZIM to interface, sometimes connecting directly for the underwriting, quotes and actuarial reporting. The point is that I know it can be done without stripping the quote marks.

     

    Take care, Mike

     
  • Apr 7, 2013 5:51 AM   in reply to MW Design

    But just because YOU chose to convert from .csv to tab-delimited doesn't mean the quote marks belong in the merged data. You chose to preserve marks that, in the normal course of working with .csv, are stripped out, so you can't say that preserving the .csv format during any of the translations and having those marks stripped in the final merge is an error.

     

    I tend to use tab-delimited text for files like this myself, but had they been saved that way originally, I'm quite confident the outer quotes would not have been included. Reading the text, I see no indication that the entire block is a quotation with nested quotes inside (and most style guides would set nested quotes as single quote marks rather than double).

     

    I bet if you resave your tab-delimited file that preserved the outer quotes as .csv from your text editor and open it again you'll find an additional set of new quotes outside.

     
  • Apr 7, 2013 6:51 AM   in reply to Peter Spier

    It looks like I may not have the behavior for tab-delimited text entirely correct as far as the adding of quotation marks goes, but here are a couple of screenshots to illustrate what happens when you save as .csv from Excel. I've never used Google Spreadsheet, but I have no reason to think it would behave differently, or that if it did, ID would understand that a .csv coming from Google was somehow different from one coming from another application and should preserve quotes that would otherwise be discarded.

     

    The text in Excel:

    Save as CSV 1.png

    Note that there are no surrounding quotation marks.

     

    The .csv opened in Notepad:

    Save as CSV 2.png

    Now there are extra quotes surrounding the internal quote and the entire text string. I was surprised to see that Excel added the same quotes in the tab-delimited file I also saved, so it looks identical.

     

    And this is that same .csv file merged into ID (after adding a header row in Notepad):

    Save as CSV 3.png

     

    Extra quotes are stripped out as they should be.
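The round trip above can be reproduced with Python's csv module, which follows the same quoting convention Excel uses:

```python
import csv
import io

# Writing a field that contains a comma and a quotation mark: the writer
# wraps the field in quotes and doubles the embedded ones - the "extra
# quotes" seen when the .csv is opened in Notepad.
buf = io.StringIO()
csv.writer(buf).writerow(['He said, "hello" to me'])
raw = buf.getvalue()  # '"He said, ""hello"" to me"\r\n'

# Reading it back strips the wrapper and un-doubles the quote.
row = next(csv.reader(io.StringIO(raw)))
# row == ['He said, "hello" to me']
```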

     
  • Apr 7, 2013 1:14 PM   in reply to Peter Spier

    Microsoft explains that if extra quote marks are not desired, one needs to use a macro to prevent Excel from adding them. I have VBA code that does this.

     

    Without input from the OP, we may never know whether the quote marks in the various fields are desired or not. In the raw paste they are there, so I was going by that. I agree that style-wise I would use italics for proper titles, such as at the beginning of record 8. I wouldn't quote the artist's description, etc.

     

    Another (possible) means of obviating the ASCII vs. Unicode, UTF-8 vs. UTF-16 BE issue(s) might be to convert to XML. This has the added advantage of not needing to stitch the data merge's frames, copy/paste into another story, etc.

     

    capture-000211.png

    It's been a fun exercise and debate. I have not used ID's XML importing (mapping to styles, etc.) before, nor done many data merges with it. My stuff is rather pedestrian.

     

    Take care, Mike

     
  • May 12, 2013 5:30 PM   in reply to Jesse Jackson

    I've run into similar problems. I'm using OpenOffice to create a spreadsheet that I export to CSV, and use that as the data source for an InDesign CS2 document. When exporting from OO, if I use the default UTF-8 setting, and open the CSV file in TextWrangler, it shows as "UTF-8 with no BOM". If I resave it as UTF-8, it fails to import.

     

    The thing is, I have some symbols in the Apple Symbols font that I need to import into my document. When the spreadsheet is exported as UTF-8, the file imports, but the symbols don't print correctly, even though they show up correctly in TextWrangler. I found, through trial and error, that if I create the CSV file using the plain "Unicode" setting, then open it in TextWrangler, it shows up as "Unicode UTF-16". I then import this data source into InDesign CS2, using the options "Unicode" and "PC", and it works perfectly.

     

    I'm on Mac OS X 10.5.8 Intel.

     
