7 Replies Latest reply: Aug 29, 2013 3:51 AM by alanomaly

    How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?

    alanomaly Community Member

      This question is related to a question in the InDesign scripting forum, Create hyperlinks with script, without big increases to size and processing reqs of resulting PDFs?

       

      I'm having trouble with a process (explained in more detail at that other question) that involves creating large batches of PDFs via Acrobat and InDesign's Data Merge feature.

       

      Adding clickable hyperlinks causes an absolutely massive increase in the amount of "Structure info" in the final PDFs, according to the Audit Space Usage feature of Acrobat's PDF Optimizer. The PDFs are also extremely slow to process, presumably because of this extra weight. Here are the details of the PDF space audits:

       

      acrobat-audit.png

       

      I'm wondering if any of the Acrobat experts here can advise on what it is about the presence of hyperlinks that could cause this huge increase, and whether there's anything that can be done about it.

       

      The PDFs above are cut from a master PDF using an Acrobat script (Acrobat X). That master PDF has been through the PDF Optimiser with almost all options selected except those relating to tags and JavaScript, plus 'Save as > Reduced File Size'. The hyperlinks are created in InDesign CS6 by a script that calls doc.hyperlinkURLDestinations.add(). Other case-specific details are at the other question.

       

      The main focus of this question is, what could be the meaning of all this extra Structure Info, and what if anything can be done about it in Acrobat?

       

      ----

       

      I've tried re-saving the optimised PDF with 'Remove all document tags' selected, but Acrobat can't manage it: the process just hangs and has to be force-quit (Mac Pro, 6 GB RAM, 2.4 GHz quad-core).

        • 1. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
          Sabian Zildjian Community Member

          What kind of hyperlinks?  Are they web links to external pages/pdfs/files?  Or are they PDF bookmarks, Named Destinations, and TOC hyperlinks?  The latter types will take up considerably more data within the PDF file.  The tagging of those types of links will add more data as well.

          • 2. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
            alanomaly Community Member

            They're all regular web links to http:// web pages, except one, which is an http:// URL to a downloadable PDF. (For some reason that one opens in a different browser on my machine: Safari, while the others open in my default browser, Chrome. I think Acrobat must have a different default for URLs ending in .pdf.)

            • 3. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
              Dave Merchant CommunityMVP

              Without access to these two different files, there's no way anyone on here can tell what's going on.

              • 4. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
                alanomaly Community Member

                I'm not expecting a perfect magic answer to drop from the sky... but I'm sure people with an understanding of what this "Structure Info" is actually used for can share insight or suggest possibilities.

                 

                For example, I've done quite a lot more reading and research, and I stumbled on something that suggested the data the Audit tool calls "Structure info" corresponds to what the PDF Optimiser options call "document tags". That gives me an avenue of investigation. There were probably people on this forum who knew that off the top of their heads.

                 

                I've also learned that accessibility features can create complex tagging structures in a PDF, and that this is one known possible cause of bloated "Structure Info" - so I've started checking back through my workflow in case anything could have added any accessibility related tagging that I wasn't expecting. It might not be the cause, but it's a possibility and is something else that I'm sure some people here knew off the top of their heads.
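One quick way to check whether a given PDF carries document tags at all is to look for the names the PDF specification uses for tagged-PDF structures: `/StructTreeRoot`, `/MarkInfo` and `/StructElem`. The following is only a rough sketch, not anything Acrobat itself does. The function names are made up for illustration, and it assumes the relevant objects are stored uncompressed. Files saved with compressed object streams (which size-reduction passes may produce) can hide these names, so a count of zero is not proof that a file is untagged.

```python
import re

# Name tokens the PDF specification uses for tagged-PDF structures.
# The helper names below are hypothetical, chosen for this sketch only.
MARKERS = {
    "StructTreeRoot": rb"/StructTreeRoot\b",
    "MarkInfo": rb"/MarkInfo\b",
    "StructElem": rb"/StructElem\b",
}


def count_structure_markers(pdf_bytes: bytes) -> dict:
    """Count raw occurrences of each tagging-related name in the file bytes."""
    return {
        name: len(re.findall(pattern, pdf_bytes))
        for name, pattern in MARKERS.items()
    }


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: a visible /StructTreeRoot suggests the PDF has a tag tree."""
    return count_structure_markers(pdf_bytes)["StructTreeRoot"] > 0
```

To try it on a real file, read the bytes first, for example `looks_tagged(open("with-links.pdf", "rb").read())`, where the file name is just a placeholder. Comparing the `StructElem` counts of the two PDFs gives a crude idea of how much tagging each one contains.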

                • 5. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
                  Dave Merchant CommunityMVP

                  The 'structure' audit includes all the tagging and accessibility features; that's made clear in the help files. What you're asking is why this particular file is getting 1.5MB of extra stuff, and we can't possibly comment without seeing the file. You're evidently not comparing like with like, as the sizes of all the other audit blocks are completely different.
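As a rough sanity check on the two figures quoted in this thread, the arithmetic can be run backwards. This sketch assumes that the 740,000% in the title and the roughly 1.5MB of extra structure data describe the same pair of audits, and that "MB" means 1,000,000 bytes. Neither assumption is confirmed by the thread, since the audit figures are only available as a screenshot.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def baseline_from_increase(extra_bytes: float, pct_increase: float) -> float:
    """Infer the original size from the absolute and percentage increase."""
    return extra_bytes / (pct_increase / 100.0)


# Assumed inputs taken from the thread: ~1.5 MB extra, 740,000% increase.
extra = 1_500_000
pct = 740_000

baseline = baseline_from_increase(extra, pct)
print(f"implied baseline structure info: about {baseline:.0f} bytes")
```

Under those assumptions the file without hyperlinks would have held only a couple of hundred bytes of structure info, which is more consistent with a file that is barely tagged than with a fully tagged one.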

                  • 6. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
                    Test Screen Name CommunityMVP

                    The interesting thing is that you are clearly adding lots of JavaScript. This doesn't appear to be accounted for. Maybe the structure info is including or comprising that. Just a thought.

                     

                    However, I can't understand how the first file could possibly be tagged - tagging is a big overhead and can multiply the size of a text-only file.

                    • 7. Re: How can adding hyperlinks cause a 740,000% increase in the amount of "Structure info" data in a PDF?
                      alanomaly Community Member

                      Test Screen Name: It looks like your hunch about the difference being that the first doc is untagged is right - and it looks like tagging is the root of the problem.

                       

                      I've been back over the process, making sure that I wasn't allowing InDesign to include document tags this time, and the amount of structure info is back down to normal and Acrobat is no longer struggling to cope with the documents.

                      Why the hyperlinks cause so much additional tagging, I don't know. What the difference is under the hood between hyperlinks in a tagged and a non-tagged PDF, I don't know either (all I know about hyperlinks under the hood is that they can be implemented as JavaScript or named destinations, according to your comment on my other thread). I'm not sure how to find out more, as this is beyond the level of detail that normal manuals or guides go into. But it looks like I'm on the right track now, so maybe I don't need to.
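For anyone who wants to dig further: in the tagged-PDF model described in the PDF specification, a link annotation is normally paired with a `/Link` structure element that points back to the annotation through an object reference (`/OBJR`), and the annotation itself carries a `/StructParent` key. Whether that pairing accounts for the bloat in this particular case is unconfirmed. The sketch below only counts those names in the raw bytes so that a tagged and an untagged export can be compared. The function name is made up for illustration, and objects stored in compressed streams will not be counted.

```python
import re


def link_tagging_summary(pdf_bytes: bytes) -> dict:
    """Count link annotations and the tag-tree entries that refer to them.

    Works on raw bytes only, so anything inside compressed object
    streams is invisible to it (an accepted limitation of this sketch).
    """
    patterns = {
        # Link annotations: /Subtype /Link
        "link_annotations": rb"/Subtype\s*/Link\b",
        # Structure elements of type Link: /S /Link
        "link_struct_elems": rb"/S\s*/Link\b",
        # Object references from the tag tree to annotations
        "object_refs": rb"/Type\s*/OBJR\b",
        # Back-pointers from annotations into the tag tree
        # (\b keeps the page-level /StructParents key from matching)
        "struct_parent_keys": rb"/StructParent\b",
    }
    return {key: len(re.findall(rx, pdf_bytes)) for key, rx in patterns.items()}
```

In an untagged export one would expect only `link_annotations` to be non-zero; in a tagged one, the other three counts should roughly track the number of links.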

                      ---------

                       

                      Dave Merchant: before giving people the "RTFM" treatment, it's polite to read the manual yourself. Here's the entirety of Adobe's online help page about the Acrobat space audit feature:

                      Audit the space usage of a PDF

                       

                      Auditing the space usage gives you a report of the total number of bytes used for specific document elements, including fonts, images, bookmarks, forms, named destinations, and comments, as well as the total file size. The results are reported both in bytes and as a percentage of the total file size.

                      1. Choose File > Save As > Optimized PDF. The PDF Optimizer dialog box opens.
                      2. Click the Audit Space Usage button at the top of the dialog box.

                       

                       

                       


                       

                      Note the absence of any sentence stating anything like " 'Structure info' refers mainly to document tags, which are used for things including defining reading order of text elements for screen reading, SEO and other automatic processing. This can add a lot of weight to a document, especially if the document is intended to be fully accessible". I learned that from other (more helpful) forum threads I found on loosely related topics.

                      The only search results for "structure info" in the adobe.com Acrobat docs help section are two PDFs referring to a seemingly unrelated concept of the same name in the context of the Acrobat API and SDK.

                      I still don't know what else other than tagging comes under the heading "Structure info" (if anything else does), but I do now know one thing about what "Structure info" might mean, and that one thing is proving very useful.