Data Merge in newest version of InDesign

Report · Jun 12, 2012

Hi all,

Was wondering if anyone can point me to the improvements to Data Merge in the newest version of InDesign (assuming there are some). We are specifically looking for a way to output single PDFs (a 500-page Data Merged InDesign document saved as 500 PDF files, for example), with each PDF named using a column from the attached Excel file. I realise that it will be possible to have this scripted in JavaScript, but was kind of hoping to avoid that with the newest InDesign.

Thanks,

Graham

Report · Jun 12, 2012

There are no changes in Data Merge in InDesign CS6.

Report · Jun 12, 2012

Thanks Steve.

Report · Jun 12, 2012

Have there been any changes to Data Merge since CS3?

Report · Aug 09, 2012

If you're getting 500 separate PDF files, you're doing something wrong. You should be able to save your Excel file (with graphic file references) in tab-delimited TXT format, attach it to your InDesign template through the Data Merge function, complete your layout by dragging fields, then generate your catalog. Check the InDesign Help function. The columns should be your field names.

Report · Aug 09, 2012

Cosmo,

Did you read the first post? The OP isn't getting separate PDFs, he WANTS separate PDFs and was hoping something had changed to allow it.

Report · Aug 09, 2012

Hi all,

One of the Multimedia guys here wrote a JavaScript that saves the PDF pages as separate files and names them using a column from an Excel file. It works quite nicely, and I could upload it here in case anyone is interested.

Graham

Report · Aug 20, 2012

Graham,

Please please do upload it to pastebin / here / similar, that would be immensely helpful.

Report · Aug 20, 2012

Here you go. If you run the script, it will ask which CSV file to use for the naming, and where it says 'PartnerHQ_Id' in my script, replace that with the column name you want to use. Let me know if anything is unclear; I am not really a JavaScript kind of guy ...

/* Put script title here */
var CSV = function(data) {
    var _data = data.split('\r\n'); // one entry per CSV line

    for(var i in _data) {
        if(_data[i].length > 0) {
            console.println(i + ' ' + _data[i]);
            _data[i] = _data[i].split(','); // turn each line into an array of cells
        }
    }

    var _head = _data.shift();

    return {
        length: function() {
            return _data.length - 1;
        },
        getRow: function(row) {
            return _data[row];
        },
        getRowAndColumn: function(row, col) {
            if(typeof col !== 'string') {
                return _data[row][col];
            } else {
                col = col.toLowerCase();
                for(var i in _head) {
                    if(_head[i].toLowerCase() === col) { // case-insensitive header match
                        return _data[row][i];
                    }
                }

            }
        }
    };
};

this.importDataObject("CSV Data");
var dataObject = this.getDataObjectContents("CSV Data");

var csvData = new CSV(util.stringFromStream(dataObject));

if(this.numPages != csvData.length()) {
    app.alert("Number of pages & CSV row count inconsistent");
} else {
    for(var i = 0; i < this.numPages; i++) {
        this.extractPages({nStart: i, cPath: csvData.getRowAndColumn(i, 'PartnerHQ_Id') + '.pdf'});
    }
}

Report · Aug 23, 2012

not sure if it works in CS6 (am trying this in CS5.5 mac running OSX 10.5.8) but I get a dialog with:

Error 24

Error String: this.importDataObject is not a function

on line 37

anyone else tried this script yet?

If the answer wasn't in my post, perhaps it might be on my blog at colecandoo!

Report · Mar 19, 2013

Finally on CS6 and have retested GrahamHe's script. It still gives the same error in InDesign, but that is because this is NOT an InDesign script - it's an ACROBAT script! The script is applied using the Action Wizard.

So this doesn't work the way I expected, i.e. by running the script directly from InDesign. The file is still merged to a single multi-page PDF file, and the script is then run on that PDF via the Action Wizard.

Had trouble getting the script to work initially via Acrobat, but I did amend the line near the top that splits the data into rows to read

var _data = data.split('\r');

and it worked a treat!
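
The change presumably helped because that particular CSV used bare '\r' line endings rather than '\r\n'. A minimal sketch of a split that accepts either style (plus '\n') is below. The helper name splitLines and the sample data are made up for illustration and are not part of the posted script.

```javascript
// Split CSV text into lines whatever the line-ending style.
// "\r\n" is listed first so a Windows line ending counts as one break, not two.
function splitLines(text) {
    return text.split(/\r\n|\r|\n/);
}

// Made-up sample data in the three common line-ending styles.
var windowsText = "PartnerHQ_Id,Name\r\n1001,Anna\r\n1002,Ben";
var oldMacText = "PartnerHQ_Id,Name\r1001,Anna\r1002,Ben";
var unixText = "PartnerHQ_Id,Name\n1001,Anna\n1002,Ben";

// Each sample should split into the same three lines.
var lineCounts = [
    splitLines(windowsText).length,
    splitLines(oldMacText).length,
    splitLines(unixText).length
];
```

In the posted script this would stand in for the data.split(...) call, e.g. var _data = splitLines(data);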

One warning: the names in the column selected to become the filenames need to be unique (such as a primary key) otherwise there is a risk of files overwriting each other.
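
A rough sketch of how the chosen column could be checked for repeats before any files are written. The function name findDuplicateNames is hypothetical, and comparing names case-insensitively is an assumption (many Mac and Windows volumes treat "A1.pdf" and "a1.pdf" as the same file).

```javascript
// Return every name that repeats an earlier one, reported once each.
// Assumption: names are compared case-insensitively.
function findDuplicateNames(names) {
    var seen = {};
    var duplicates = [];
    for (var i = 0; i < names.length; i++) {
        // Prefix the key so a name like "constructor" cannot clash
        // with properties every object already has.
        var key = "name:" + names[i].toLowerCase();
        if (seen[key] === 1) {
            duplicates.push(names[i]); // second sighting: report it once
        }
        seen[key] = (seen[key] || 0) + 1;
    }
    return duplicates;
}

// Made-up example column values.
var clashes = findDuplicateNames(["A1", "B2", "a1", "C3", "B2", "B2"]);
```

If the returned list is not empty, it is safer to fix the CSV than to let later files overwrite earlier ones.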

colly

Report · Jul 30, 2013

Here's an improved version of that script:

It works for multiple pages per record rather than just one page per record (and it figures out how many, e.g. 500 pages and 100 records = 5 pages each). The only limitation is, each record must have the same number of pages.
It works for UTF-8 csv files, so it doesn't just skip or screw up accented characters or other non-English characters. If you have problems, make sure you're saving UTF-8 csvs - these are the default for most things except Excel, but quite tricky in Excel, see some suggestions here http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
It allows you to specify text to go before and after each name, and a sub folder to put the PDFs in
It can cope with csvs that have commas and quotes in the cells, using CSV parsing code from here http://stackoverflow.com/questions/1293147/javascript-code-to-parse-csv-data (this should work fine but CSVs are quite variable - if you have problems, try exporting the CSV from a different program, e.g. if it's coming from Excel, try exporting it from Google Docs or OpenOffice)
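
To make the pages-per-record arithmetic concrete, here is a small sketch under the same assumption as the script (every record has the same number of pages). The function names are illustrative only. Page numbers are zero-based, as with the nStart and nEnd values passed to extractPages.

```javascript
// Pages per record, or null when the pages do not divide evenly.
function pagesPerRecord(numPages, recordCount) {
    var pages = numPages / recordCount;
    return (recordCount > 0 && pages % 1 === 0) ? pages : null;
}

// Zero-based first and last page of one record.
function pageRangeForRecord(recordIndex, pages) {
    var nStart = recordIndex * pages;
    return { nStart: nStart, nEnd: nStart + pages - 1 };
}

// The example from the post: 500 pages and 100 records.
var pages = pagesPerRecord(500, 100);          // 5
var firstRange = pageRangeForRecord(0, pages); // pages 0 to 4
var lastRange = pageRangeForRecord(99, pages); // pages 495 to 499
```

Note that the loop over records needs to run once per CSV row, not once per page, or the ranges run off the end of the document.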

If you have trouble running it in Acrobat's Actions panel, try running it in the JavaScript console instead. Hit Ctrl-J or Cmd-J, then enable the console. Copy and paste the code in, select the code you just pasted, and hit ENTER (not Return, on Mac) to run it.

In my testing (700 KB PDFs, 600 or so records, Mac Pro with 6 GB RAM), it's pretty darn fast. Everything else I'd tried takes hours and hours or dies midway; this takes less than a minute. Reading the CSV is probably the slowest part, then it spits out PDFs at a rate of several a second. It seems to take maybe 5% as long as the data merge takes.

Here's the improved code:

var CSV = function (data, delimiter) {
    var _data = CSVToArray(data, delimiter);
    var _head = _data.shift();
    return {
        length: function () {return _data.length;}, 
        adjustedLength: function () {return _data.length - 1;}, 
        getRow: function (row) {return _data[row];}, 
        getRowAndColumn: function (row, col) {
            if (typeof col !== "string") {
                return _data[row][col];
            } else {
                col = col.toLowerCase();
                for (var i in _head) {
                    if (_head[i].toLowerCase() === col) { // case-insensitive header match
                        return _data[row][i];
                    }
                }
            }
        }
    };
};

function CSVToArray( strData, strDelimiter ){
    strDelimiter = (strDelimiter || ",");
    var objPattern = new RegExp(
        (
            // Delimiters.
            "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +
            // Quoted fields.
            "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +
            // Standard fields.
            "([^\"\\" + strDelimiter + "\\r\\n]*))"
        ),
        "gi"
        );

    var arrData = [[]];
    var arrMatches = null;
    while (arrMatches = objPattern.exec( strData )){
        var strMatchedDelimiter = arrMatches[ 1 ];
        if (
            strMatchedDelimiter.length &&
            (strMatchedDelimiter != strDelimiter)
            ){
            arrData.push( [] );
        }
        if (arrMatches[ 2 ]){
            var strMatchedValue = arrMatches[ 2 ].replace(
                new RegExp( "\"\"", "g" ),
                "\""
                );
        } else {
            var strMatchedValue = arrMatches[ 3 ];
        }
        arrData[ arrData.length - 1 ].push( strMatchedValue );
    }
    return( arrData );
}

function isInt(n) {
    return typeof n === "number" && n % 1 == 0;
}

var prepend = app.response("Enter any text to go at the START of each filename:");
var append = app.response("Enter any text to go at the END of each filename:");
var pathStr = app.response("If the PDFs should be saved in a sub folder, enter the relative path here:", "", "pdf/");

this.importDataObject("CSV Data");
var dataObject = this.getDataObjectContents("CSV Data");
var csvData = new CSV(util.stringFromStream(dataObject, 'utf-8'), ',');
var pagesPerRecord = this.numPages / csvData.length();
if (isInt(pagesPerRecord)) {
    for (var i = 0; i < csvData.length(); i ++) { // one pass per CSV record, not per page
        var pageStart = i*pagesPerRecord;
        var pageEnd = (i+1)*pagesPerRecord - 1;
        var recordIndex = (i + pagesPerRecord) / pagesPerRecord;
        var filename = csvData.getRowAndColumn(i, "filename");
        if (!filename) {
            app.alert('No filenames found - using "file-XX.pdf". Press Escape after continuing to cancel.');
            filename = "file-" + i;
        }
        var settings = {nStart: pageStart, nEnd: pageEnd, cPath: pathStr+prepend+filename+append+'.pdf'};
        this.extractPages(settings);
    }
} else {
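    var message = "The number of pages per row is not an integer (" + pagesPerRecord;
    message += ", " + this.numPages + " pages, " + csvData.length() + " rows).";
    console.println(message); // show the problem in the JavaScript console
    app.alert(message);       // and in a dialog, so the failure is visible when run as an Action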
}

Report · Aug 27, 2013

It seems this script gets slower the more hyperlinks there are in the document:

Document with 1100 pages and no hyperlinks: About one 2-page PDF a second, maybe slightly more. Not bad.
Same document with 11 hyperlinks every 2 pages: About one 2-page PDF every two or three minutes. Hopeless, it would take more than 24 hours.
Same document run through Optimise PDF aggressively reducing its file size by 70% and then through Save As Reduced Size PDF turning off all backwards compatibility: About one 2-page PDF every 24 seconds. Bad, needs to be left running for 3 and a half hours at the end of the working day.

In each case it's "Saving PDF..." that is the point the process struggles with. I don't know what it is about the addition of hyperlinks that makes the PDFs so slow to save.

I think it's maybe something to do with cross-reference tables - the final PDFs are 25% cross reference tables according to Optimise PDF's audit tool, and there are no cross references so I can only guess that this is the hyperlinks. Why so much data, I have no idea.

Report · Dec 11, 2013

Thanks for posting your revised script, it would seem that it should do exactly what I require.

I think I've set it up correctly as an Action in Acrobat.

I have a 137 page PDF which I want to split into single page documents. I have a .csv file with a list of 137 unique IDs, which I would like Acrobat to use to name the individual files.

I run the action and get prompted to input anything to prefix and suffix the filename with, together with an option of a relative path.

I then get asked to choose a data file, so I choose my CSV list.

The script seems to run ok but I get no output.

I'm guessing at some point it should ask me for column name and also how many pages per file output??

Or do I need to edit that in the script first? If so, which are the parts to edit?

Thanks in advance,

Ben

Report · Jan 17, 2014

Hey Ben, sorry looks like I wasn't clear - the CSV file must contain a column headed "filename" (lowercase). It pretty much only uses that one column, so it can be a one column CSV. It's written into the code.

If you want it to be variable (e.g. if you get CSVs you can't edit), add a line around line 67 like:

var columnHeader = app.response("Enter the column header of the filename column in the CSV", "", "filename");

...then find/replace "filename" (including quotes) with columnHeader.
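
For anyone curious how the header match works, here is a rough standalone sketch. The name findColumnIndex is made up, and trimming spaces around the header is an extra that the posted script does not do.

```javascript
// Find a column by header text, ignoring case.
// Assumption: surrounding spaces in the header cells should be ignored too.
function findColumnIndex(headerRow, columnHeader) {
    var wanted = columnHeader.toLowerCase().replace(/^\s+|\s+$/g, "");
    for (var i = 0; i < headerRow.length; i++) {
        var cell = String(headerRow[i]).toLowerCase().replace(/^\s+|\s+$/g, "");
        if (cell === wanted) {
            return i; // zero-based column position
        }
    }
    return -1; // header not present
}

// Made-up header row for illustration.
var header = ["ID", " FileName ", "City"];
var filenameColumn = findColumnIndex(header, "filename");
var missingColumn = findColumnIndex(header, "postcode");
```

A result of -1 is the situation in which the script falls back to "file-XX.pdf" names.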

A few other tips:

Make sure the order of rows in your CSV matches the order of the file you used to create your merged document!
Make sure you UNTICK "Create tagged PDF" when exporting the PDF else everything goes massively slow. This is the cause of my puzzlement about hyperlinks in my last comment - hyperlinks are fine, but tagged documents go nuts.
If it doesn't start merrily churning out PDFs, scroll down in the Acrobat console box. Sometimes it hides error messages right at the end of the code, the last place anyone would expect. "The number of pages per row is not an integer" means you have a mismatch between the number of rows in your CSV and the number of records in your PDF.

Report · Feb 01, 2014

Hi,

hoping someone can help.

I have a 48 page document which I need to split into 8 documents. I have the excel spreadsheet with column headed "filename" with 8 file names.

When I select the data file to import after being asked to enter text at start of filename etc I see the message 'No filenames found - using "file-XX.pdf". Press Escape after continuing to cancel.'

and then the error -

RaiseError: The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.

Doc.extractPages:83:Console undefined:Exec

===> The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.

file-0

I have checked the number of rows in the CSV against the one I used to create the merge in InDesign, and it is the same. I have also tried using a Google Docs spreadsheet, but Acrobat doesn't recognise the URL.

Thank you.

Report · Feb 04, 2014

Sounds like you've got the CSV file open in Excel or something similar.

Certainly with Excel for Mac, it gets stroppy if the CSV file is open - Excel won't let the script read it. Not sure about any other programs but I imagine they're similar.

Make sure the CSV is closed before running the script. I find it works best to have an excel version and a CSV version, keeping the excel version open to make any edits and closing the CSV versions as soon as they're saved.

Report · Feb 04, 2014

Thanks for your reply.

I tried again after exiting Excel, but the same problem occurred.

Report · Feb 04, 2014

Hi alanomaly,

I managed to get Acrobat to read a CSV which was created in LibreOffice (as you mentioned), but I just have one more issue.

I am getting the error message 'The number of pages per row is not an integer (5.454545454545454, 60 pages, 11 rows).' which makes me believe that it doesn't realise I have 6-page documents.

Thanking you in advance.

Report · Jun 25, 2014

Hi, Did you receive an answer to this issue? I get the same error:

The number of pages per row is not an integer (5.333333333333333, 16 pages, 3 rows).

How does the code know how many pages each document should be?

Report · Jun 26, 2014

If it's saying 'The number of pages per row is not an integer (5.454545454545454, 60 pages, 11 rows)', that means the number of pages isn't divisible by the number of rows. yvesha has 60 pages and 11 rows, which means 5.4545 pages per row, and that doesn't make sense: you can't have a PDF with 5.4545 pages. It sounds like you're aiming for 6 pages per row, which means there should be 10 rows in your CSV (60 / 10 = 6).

TStanwood has 16 pages and 3 rows, but 3 doesn't go into 16. This could mean there's too many pages in the Indesign file (maybe there's one extra page and it's supposed to be 3 five-page PDFs?) or the wrong number of rows in the CSV.

Sometimes Excel adds empty rows to CSVs for no reason other than wanting to ruin your day... if the number of rows the script says you have doesn't match the actual number of rows you're seeing in the CSV in Excel, open the CSV with a plain text editor and delete any empty lines of text.
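
As an alternative to hand-editing the CSV, a sketch of a filter that drops empty rows after parsing is below. It assumes the rows are an array of arrays of cells, as CSVToArray in the improved script returns, where a trailing line break typically shows up as a final row holding one empty cell. The name dropEmptyRows and the sample rows are hypothetical.

```javascript
// Keep only rows that contain at least one non-blank cell.
function dropEmptyRows(rows) {
    var kept = [];
    for (var r = 0; r < rows.length; r++) {
        var hasContent = false;
        for (var c = 0; c < rows[r].length; c++) {
            var cell = rows[r][c];
            // Treat undefined, null, and whitespace-only cells as blank.
            if (cell !== undefined && cell !== null &&
                    String(cell).replace(/\s+/g, "") !== "") {
                hasContent = true;
                break;
            }
        }
        if (hasContent) {
            kept.push(rows[r]);
        }
    }
    return kept;
}

// Made-up parsed data: two real records plus a stray empty row at the end.
var parsedRows = [["invoice-001"], ["invoice-002"], [""]];
var cleanedRows = dropEmptyRows(parsedRows);
```

With the stray row gone, a 16-page PDF and two records would divide into 8 pages each instead of failing on 3 rows.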

Report · Jun 26, 2014

The issue is I have two rows, but it's counting the header row as one.

Todd

Report · Jun 26, 2014

Hi, thanks for responding. I only have two rows but it's counting the header row as one making it three. Todd

Report · Apr 11, 2014

How do I set this script/action up + execute it?

What does the process look like from CSV to InDesign to Acrobat to final PDF?

I'm slow!

Report · Mar 02, 2016

Hi Alan,

Thanks for the script, but like ben, it seems to run but doesn't actually give any output - When clicking the full report it states that it has been successful but only takes less than a second to complete and there are no files.

I don't know if I am putting the right info in the dialogue box that asks: "If the pdf should be saved in subfolders..." - what info needs to go here?

Really hoping I can get this to work
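
For reference, the improved script builds each output path by plain concatenation (pathStr+prepend+filename+append+'.pdf'), so whatever is typed in that box is simply stuck on the front of the filename. A small sketch of that behaviour is below. The function names and sample values are made up, and withTrailingSlash is an extra helper that the posted script does not have.

```javascript
// Mirror of the concatenation used in the improved script.
function buildOutputPath(pathStr, prepend, filename, append) {
    return pathStr + prepend + filename + append + ".pdf";
}

// Hypothetical helper: make sure a non-empty folder ends with "/".
function withTrailingSlash(folder) {
    if (folder === "" || folder.charAt(folder.length - 1) === "/") {
        return folder;
    }
    return folder + "/";
}

// Made-up example values.
var inSubfolder = buildOutputPath("pdf/", "2016-", "smith", "-final"); // "pdf/2016-smith-final.pdf"
var noSubfolder = buildOutputPath("", "", "smith", "");                // "smith.pdf"
var missingSlash = buildOutputPath("pdf", "", "smith", "");            // "pdfsmith.pdf"
var repaired = buildOutputPath(withTrailingSlash("pdf"), "", "smith", "");
```

So the box can be left empty, or hold a folder name ending in a slash such as pdf/. Whether Acrobat creates a missing folder is not stated in the thread, so creating it beforehand is the safer assumption.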
