4 Replies Latest reply on Aug 29, 2013 3:50 AM by alanomaly

    Create hyperlinks with script, without big increases to size and processing reqs of resulting PDFs?

    alanomaly Level 1

      Quick summary: I've got a script, part of a process built around a large Data Merge document, that creates hyperlinks using doc.hyperlinkURLDestinations.add(). The script simply turns text snippets into clickable hyperlinks.


      The problem is, the final PDFs that emerge from this process are 300% the size of a PDF without the script being run, and are very slow to process. Most of this extra weight comes from a crazy 740,000% increase (!!!) in the amount of data used for "Structure info" (and that's after running the version with hyperlinks through the Acrobat PDF optimiser).


      The amount of structure data in the PDFs with hyperlinks ends up being more than twice the entire file size of the PDFs without added hyperlinks. There's also a substantial increase in the size of the "Cross Reference Table".


      This big increase in weight turns a simple, fast process that takes a few minutes into a torturously slow one that takes hours, creating bloated files at a painfully slow rate in a process that is prone to crashing and failing.


      I'm hoping for another way to create hyperlinks that doesn't add all this weight.




      Detail: Here's my current process. It works fine except step 2 causes final PDF file size and the time required to run other processes to increase massively. I know there are different ways of implementing hyperlinks in a PDF - I'm hoping there's a different approach I can use at step 2 that avoids this massive bloat.


      The goal is to create 600 two-page PDFs from one InDesign template using Data Merge, with each PDF having a filename that reflects its record and live hyperlinks that vary from record to record. Here's my current process (for benchmarking: the machine is a Mac Pro running Lion, CS6, 6 GB of RAM, and all files are on a local HDD, no data transfer over any network):


      1. Set up a 2-page INDD with Data Merge placeholders, plus the spreadsheet.
      2. Create the merged INDD, importing hyperlinks and other content as plain text, then run this script in InDesign, which makes the hyperlinks live using doc.hyperlinkURLDestinations.add().
      3. Export a PDF from the 1,200-page merged INDD - from File > Export if step 2 was done, or from the Data Merge tool if it wasn't.
        • Without step 2: the PDF is around 110 MB and takes a couple of minutes to create.
        • With step 2: the PDF is around 170 MB and takes over an hour and a half to create - I don't know exactly how long, as I ended up leaving it running overnight, but it had already been running for 1 hour 20 minutes when I left and didn't look close to finishing.
      4. Run this script in Acrobat to chop the PDF into 2-page segments with filenames based on the spreadsheet data.
        • Without step 2: the process takes around a quarter of an hour, creating PDFs at slightly faster than one per second, and the PDFs are 700 KB each.
        • With step 2: the process freezes after about 10 PDFs, and takes minutes to produce each of those. Even if it didn't freeze, it would take about 25 hours to do all of them.
        • With step 2, plus reducing the PDF to 45 MB through Acrobat's PDF optimiser and 'save as reduced size' to cut out backwards compatibility: the process takes about 4 hours, producing a PDF every 20-30 seconds, and the PDFs are around 3 MB each. Watching the process, almost all the additional time comes from the "Saving PDF..." step for each file.
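      One thing that might be worth trying before the File > Export in step 3: the "Structure info" bloat pattern suggests InDesign is writing tagged-PDF structure for the hyperlinks, so switching structure/tagging off in the PDF export preferences via script could help. A minimal sketch - note the property names (includeStructure, generateThumbnails, optimizePDF) are my best guesses at the CS6 scripting DOM, so verify them against the PDFExportPreference object in the ESTK object-model viewer before relying on them:

      ```javascript
      // Sketch (assumed property names): clear structure-related options on a
      // PDF export preferences object before exporting. Written as a pure
      // function so it can be tried on app.pdfExportPreferences in InDesign.
      function tunePdfExportPrefs(prefs) {
          prefs.includeStructure = false;   // "Create Tagged PDF" checkbox (assumed name)
          prefs.generateThumbnails = false; // page thumbnails also add per-page weight
          prefs.optimizePDF = true;         // "Optimize for Fast Web View"
          return prefs;
      }

      // Inside InDesign (ExtendScript) this would be something like:
      // tunePdfExportPrefs(app.pdfExportPreferences);
      // app.activeDocument.exportFile(ExportFormat.pdfType, File("/path/to/out.pdf"));
      ```

      If the structure weight disappears with tagging off, that would confirm the tagged-PDF machinery as the culprit rather than the hyperlink annotations themselves.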


      Each 2-page PDF has 14 or 15 hyperlinks. I don't understand how 14-15 hyperlinks can result in an extra 2 MB of "Structure info" and something like 3,500% more processing time to create each PDF.


      Can anyone suggest any alterations to the step 2 script that might avoid all these overheads? Here's the full script for convenience:



      app.findGrepPreferences = app.changeGrepPreferences = null;
      var doc = app.activeDocument;
      app.findGrepPreferences.findWhat = '(http://.*$|https://.*$)';
      var objs = doc.findGrep();

      for (var i = 0; i < objs.length; i++) {
          var currTarget = objs[i];
          var lnkDest = doc.hyperlinkURLDestinations.add(currTarget.texts[0].contents);
          var lnkSrc = doc.hyperlinkTextSources.add(currTarget);
          var lnk = doc.hyperlinks.add(lnkSrc, lnkDest);
      }

      alert('Processed ' + objs.length + ' hyperlinks');
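      One possible slimming change to the script above: the loop calls doc.hyperlinkURLDestinations.add() once per match, so any URLs that repeat across records still get their own separate destination objects. Reusing one destination per unique URL should cut the number of objects InDesign has to write out. A sketch - the cache itself is plain JavaScript; the InDesign-specific calls are only shown in comments, and whether this actually shrinks the exported structure is an assumption to test:

      ```javascript
      // Wrap a destination-creating function in a per-URL cache, so each
      // unique URL gets exactly one destination object.
      function makeDestinationCache(createDest) {
          var cache = {};
          return function (url) {
              if (!cache.hasOwnProperty(url)) {
                  cache[url] = createDest(url); // e.g. doc.hyperlinkURLDestinations.add(url)
              }
              return cache[url];
          };
      }

      // Usage inside the loop (ExtendScript):
      // var getDest = makeDestinationCache(function (u) {
      //     return doc.hyperlinkURLDestinations.add(u);
      // });
      // ...
      // var lnkDest = getDest(currTarget.texts[0].contents);
      ```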



      Edit - here's a side-by-side comparison of the Acrobat PDF optimiser 'Audit Space Usage' tool, showing where the hyperlinks add bulk. I've marked the two big increases in red...



      ...so there's a massive, insane increase in the size of the "Structure info", from a tiny 0.000238 MB to 1.76 MB - two and a half times the size of the entire original file, just on "Structure info"! That's a 740,000% increase...


      There's also a big fat increase in the cross-reference table, from 0.0046 MB to 0.427 MB - the cross-reference table in the PDF with hyperlinks is more than half the size of the entire original file.
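      For what it's worth, the quoted percentages check out - a quick arithmetic check on the figures above (sizes as reported by Audit Space Usage):

      ```javascript
      // Sizes in MB, taken from the Audit Space Usage panel.
      var structBefore = 0.000238, structAfter = 1.76;
      var xrefBefore = 0.0046, xrefAfter = 0.427;

      // Size of "after" as a percentage of "before".
      var structIncreasePct = (structAfter / structBefore) * 100; // ~739,500%, i.e. the ~740,000% quoted
      var xrefIncreasePct = (xrefAfter / xrefBefore) * 100;       // ~9,300%
      ```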


      The only difference between the two PDFs is that one has 14 clickable hyperlinks attached to existing snippets of text (and the 'with hyperlinks' one is from a PDF that has been through aggressive PDF optimisation, hence the images are much smaller).