3 Replies Latest reply on Jul 24, 2014 10:44 AM by tubaboy

    Questions about Tags and Reading Order


      I have been creating accessible PDFs from various types of documents for several months now, and I have a few specific questions as to whether I'm doing this correctly or not.


      1. When tagging, I am generally trying to keep the tags themselves neat, in order, etc. Does this matter? For example, grouping all tags from each page into their own <Page> tag, artifacting and deleting blank paragraph tags, etc. Is this necessary?


      2. I am working on a document now, and when I clean up the tags and then correct the reading order, it completely jumbles up my tags tree afterwards. If I do this in reverse order (set the reading order and then try to organize the tags), it puts background elements/artifacts in front of text, etc. I guess this relates to the question above as well: should I just ignore that the tags become jumbled in their trees?


      I use a number of different software packages such as Word, InDesign, etc. Any help would be greatly appreciated!

        • 1. Re: Questions about Tags and Reading Order
          CtDave Level 6

          It looks like you're walking into the bramble bush.
          An accessible PDF is one that complies with ISO 14289-1 (PDF/UA) which is a sub-set of ISO 32000-1 (the ISO standard for PDF).
          So, in 32000-1 (PDF-1) there is no "<Page>" grouping element.
          Grouping elements are only used to group other structure elements and are not directly associated with content items (a paraphrase of a "shall" statement - so, no deviations here).
          The grouping elements that are typically of interest are:
          Document | Part | Art (Article) | Sect (Section) | Div (Division)
          (there are others, see table 333 in ISO 32000-1)
          Now, you can use a "tag" name other than those identified in ISO 32000; however, any such name is required to "role map" to the appropriate standard element (e.g., one of those identified in PDF-1).
          So, in the structure tree (in the Tags panel) you might see "<Normal>", provided by Word to the tagged output PDF.
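          A rough sketch of how role mapping resolves a custom tag name such as Word's <Normal>. This is a toy model in Python, not an Acrobat or PDF-library API: the function name resolve_role, the "BodyText" entry, and the short list of standard types are assumptions made up for illustration.

```python
# Toy model of PDF role mapping; this is not an Acrobat or PDF-library API.
# A small subset of the standard structure types, for illustration only.
STANDARD_TYPES = {"Document", "Part", "Art", "Sect", "Div", "P", "H1", "Table"}


def resolve_role(tag, role_map, standard=STANDARD_TYPES):
    """Follow role-map entries until a standard type is reached.

    Returns the standard type, or None if the tag never maps to one.
    """
    seen = set()
    while tag not in standard:
        if tag in seen or tag not in role_map:
            return None  # unmapped name, or a circular mapping
        seen.add(tag)
        tag = role_map[tag]
    return tag


# "BodyText" is a hypothetical custom name that maps through "Normal".
role_map = {"Normal": "P", "BodyText": "Normal"}
print(resolve_role("Normal", role_map))    # P
print(resolve_role("BodyText", role_map))  # P
print(resolve_role("Page", role_map))      # None
```

          A name like "Page" that has no role-map entry resolves to nothing in this sketch, which is the situation to avoid in a real structure tree.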

          "<Normal>" role maps to the appropriate standard PDF element which is <P> (Paragraph).
          "Rule 1" for accessible / tagged PDF is to work out of the structure tree. 

          While it is good to try to harmonize read order with the logical hierarchy of the PDF content as established by the well-formed structure tree, this is not always possible.
          Regardless, it is the structure tree that dominates; read order is subordinate.

          AT (assistive technology) uses the structure tree to understand a tagged PDF's logical hierarchy, which it in turn provides to the end-user making use of AT.

          (Note: Adobe Reader / Acrobat Read Out Loud is *not* AT. Do AT evaluation with NVDA (open source, free, a first-string "starter" AT application), *not* with Read Out Loud.)
          Artifacts - YES - if something present does not provide real semantic content then it has to be made "artifact".
          Blank tags - ya, dump 'em.  Very often these come from using the return key to make "white space" between paragraphs, etc. Each of these 'lines' is (under the hood) associated with the style / paragraph tag in use by the authoring application at the time of the use of the Return key.
          The result: "blank" tags in the tagged output PDF.
          Do not do this. Instead, configure styles / paragraph tags to provide the desired before / after white space.
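          To make the "dump the blank tags" clean-up concrete, here is a minimal sketch on a toy structure tree. The nested-dict layout (keys "tag", "content", "kids") and the function name prune_empty are assumptions invented for this example; a real PDF is cleaned up in the Tags panel or with a PDF library, and a real tag can legitimately carry non-text content (a figure, say), which this toy ignores.

```python
# Toy structure tree as nested dicts; not a real PDF or Acrobat data model.
def prune_empty(node):
    """Return a copy of node without empty tags, or None if node is empty.

    A tag counts as empty when it has no non-whitespace content and no
    surviving child tags.
    """
    kids = [prune_empty(kid) for kid in node.get("kids", [])]
    kids = [kid for kid in kids if kid is not None]
    content = node.get("content", "")
    if not content.strip() and not kids:
        return None
    return {"tag": node["tag"], "content": content, "kids": kids}


tree = {
    "tag": "Document",
    "kids": [
        {"tag": "P", "content": "First paragraph."},
        {"tag": "P", "content": ""},  # blank line made with the Return key
        {"tag": "Sect", "kids": [{"tag": "P", "content": "   "}]},
    ],
}

cleaned = prune_empty(tree)
print([kid["tag"] for kid in cleaned["kids"]])  # ['P']
```

          Note that the <Sect> disappears too, because its only child was a blank paragraph.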
          PDF tags / elements used *must* be semantically correct in their usage.
          Proper use of the authoring application's built-in features (e.g., headings, table insert feature, styles for lists, etc.), proper mastering of content to support its logical hierarchy, and proper tag management (available and configured) provide a tagged output PDF that will have a 'good' structure tree.
          Often there is post-processing required (with Acrobat Pro) to achieve the 'well-formed' structure tree.
          Again, it is the structure tree that will make or break the accessible PDF. While working with TURO (the TouchUp Reading Order tool, for tables) or within the Content panel may be needed, these are subordinate to the structure tree.
          Be well...


          • 2. Re: Questions about Tags and Reading Order
            NathanSwyers Level 1

            Thanks for the very detailed response, Dave.


            If I am understanding you correctly, the reading order really doesn't matter in the end--assistive devices will be following the tag hierarchy.


            It sounds like I have some work to do still as far as tables are concerned and setting the column spans. Otherwise, I think I'm pretty well up to standards. That is, after removing the unneeded <Page> groupings--this is something I noticed on government documents and figured it was helpful to use.


            The documents I normally create are more or less properly tagged after exporting, aside from the stray tags that need artifacting.


            I have another question. I am at home right now and don't have access to Acrobat Pro to better explain what I'm asking, but I'll try for now: sometimes a tag will contain several <span> or <p> elements inside for a single paragraph. Is this fine? For instance, InDesign seems to do this with a block of text.


            If you had a link to an example of a more complicated PDF that is compliant with the standard that would be extraordinarily helpful.

            • 3. Re: Questions about Tags and Reading Order

              Hi - if you are ordering your tags first, any reading order changes need to be done in the CONTENT panel, not the ORDER panel. When you change the order in the ORDER panel, it also changes the tag order; this does not occur if you do it in the CONTENT panel.


              Usually it's quicker to do a first clean-up of empty tags, etc., and set the order in the ORDER panel, then go back and organize tags as needed.