5 Replies Latest reply on Jan 10, 2012 7:49 AM by BCM-MSY

    Tagging and formatting for proper reflow

    ghumdinger Level 1

      It seems to be a general rule of thumb that how well a PDF reflows depends on how well it is tagged.


      Forgive me if it's blatantly obvious, but I can't find anything relating to tagging other than accessibility tagging. That seems to be more about ordering blocks of text in Acrobat so that handicapped users can listen to a properly ordered voice read-out of the document.


      Is this the tagging that affects reflow?


      I suppose the reflow depends on formatting like paragraph marks, breaks, and other formatting characters (in Word speak). Is there a way to view all these - an Acrobat version of Word's "show formatting button", and touch them up?


      This is something I've been confused about for a time.

        • 1. Re: Tagging and formatting for proper reflow
          Bill@VT Level 7

          The tagging you are talking about is the tagging put in the PDF by PDF Maker that indicates blocks of text have a certain function in the original document. You may be able to add the reflow, but only after a lot of time figuring out how to do tagging. The idea of reflow in a PDF should be discarded. If you need to do editing that needs to reflow, go back to the original document and edit that one, not the PDF. Editing a PDF is somewhat of a luck issue, and skill might be useful. PDFs should be thought of as electronic paper and the editing process as electronic whiteout.

          • 2. Re: Tagging and formatting for proper reflow
            CtDave Level 6

            Bill -
            Discarding reflow? For you, that may be appropriate to your needs/wants.
            However, Tagged PDF is integral to making use of PDF logical structure. It is essential if one's output PDF is to be used effectively by users of mobile devices, by users who have visual impairments and make use of AT, or by providers of PDFs who want to support adequate export of PDF content, format, layout, & font information to an application such as MS Word.

            ~~~ ~~~

            ghumdinger -

            While Tagged PDF, from an Accessibility perspective, is well discussed in related Adobe documents, the PDF files span discussion from Acrobat 5 (Full release with Accessibility feature installed from OSM's "extra stuff" directory) to Acrobat 9.
            Perhaps the best was documentation published when Acrobat 7 was "new" (Adobe's Greg Pisocky's "how-to").
            However, all these are no longer readily available out in blue waters of web space.
            Too bad, because having them makes it easier to "get it" vis-a-vis Accessible PDFs.

            But, to your question.
            The sole definitive sources on "Tagged PDF" start with Adobe's PDF References and end with the ISO Standard.
            It is Tagged PDF that provides:
            --| Reflow functionality
            --| Export to other applications with format-layout-font data-etc.
            --| Copy-Paste to other applications with some fundamental retention of content format.
            and, of course
            --| Accessibility
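A rough way to see whether a given file even claims to be Tagged PDF is to look for the two catalog entries the PDF Reference associates with it: a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked true`. The sketch below is only a heuristic byte scan; the function name `looks_tagged` is made up for illustration. It assumes the catalog is stored uncompressed and that `/MarkInfo` is an inline dictionary, so a negative result is inconclusive for files that use object streams or an indirect `/MarkInfo`. A positive result says nothing about whether the tags are well formed.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> dict:
    """Heuristically scan raw PDF bytes for Tagged PDF markers.

    Assumption: the document catalog is not inside a compressed
    object stream and /MarkInfo is an inline dictionary; otherwise
    these markers are invisible to a plain byte scan.
    """
    # The structure tree root holds the logical structure (the tags).
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # /MarkInfo << /Marked true >> is how a file declares itself tagged.
    marked = re.search(
        rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes
    ) is not None
    return {"struct_tree_root": has_struct_tree, "marked": marked}


# Small in-memory samples standing in for real files.
tagged_sample = (
    b"1 0 obj\n<< /Type /Catalog /MarkInfo << /Marked true >> "
    b"/StructTreeRoot 5 0 R >>\nendobj"
)
untagged_sample = b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj"

print(looks_tagged(tagged_sample))
print(looks_tagged(untagged_sample))
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Acrobat's own Document Properties dialog reports "Tagged PDF: Yes/No" and is the more reliable check.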

            Tagged PDF was introduced with PDF Version 1.4 (Acrobat 5.x).
            Each PDF Reference, from Version 1.4 to 1.7, provided the discussion and description of Tagged PDF.
            (There are also a few documents that have some relevance tucked away in the Acrobat SDK(s).)

            PDF became an ISO Standard mid-year of 2008.
            So, from then on the foundation document for anything PDF became ISO 32000 (currently "ISO 32000-1:2008").
            The Adobe extensions to the ISO standard describe fundamentals to what Acrobat 9.x adds to PDF.

            A free Adobe version of the ISO standard is available.
            See Leonard Rosenthol's Blog post at Acrobat User Community to learn how to obtain the Adobe version of the ISO 32000 Standard.


            As to mastering content in source application authoring files; implementation of an appropriate logical hierarchy is and is not "obvious".
            Well-formed tagged output PDF requires well-formed authoring files mastered in an application that has adequate Tag management.
            Currently, there are three choices that are "adequate".
            --| Adobe FrameMaker
            --| Adobe InDesign
            --| MS Word
            (n.b., It is my understanding that the Open Office community  is making strides to provide adequate, up front tag management & I suspect the Corel-Nuance collaboration may put WordPerfect in "the circle".)


            From MS Word, in Office 2007, SP2 or better, Microsoft provides a Save as to PDF that incorporates tag management; or, with Acrobat installed there is Adobe PDFMaker.


            Two critical characteristics of an authoring file:
            --| For Headings (chapter, section, etc.) - Only the built-in Headings are used.
            --| Tables must be configured such that they comply with the discussion of the <Table> PDF mark up element.


            There are more. However the Headings and Tables issues are often what is overlooked by content authors.
            Typically, to "get it right" in the authoring file a major "make over" of existing templates is needed.
            Often, what is used came from the "make paper" days &, with some tweaks, became the "make PDF".
            Often, template authors, content authors, management, etc. are unaware of what must be done in the authoring environment in order to reach an end game of properly Tagged PDF.
            With that said, "getting it" requires making a study of what was in the PDF References and, now, the ISO Standard.
            The "how-to" comes from the "Accessibility" documents I alluded to above and a fair dinkum of "theory-to-practice" application.


            In summary, it starts with the authoring file and the content mastering.
            Done right (typically it isn't) with an application that has adequate tag management (often, it is not) the output PDF is a well-formed Tagged PDF -- that still will require post-processing by someone of competence via Acrobat Professional.
            The resultant PDF will then fulfill the Tagged PDF facilities discussed and described in the ISO 32000 Standard.
            With this you have reflow for mobile devices, accessibility for users of Assistive Technology applications and really rather good "export" of content/layout/format.
            If a PDF does not "cut it" with respect to this then one or both of these are the "root cause":
            --| Authoring application is not up to the task
            --| Content owner(s)/author(s) are not up to the task.


            Be well...



            • 3. Re: Tagging and formatting for proper reflow
              ghumdinger Level 1



              Thank you for your learned reply. I was after a general broad based understanding of how it works without delving into the technicalities and losing myself, and your explanation was just that.


              So it seems Acrobat's tagging capabilities are limited to the "Touchup reading order tool" and "add tags to doc", which would be inadequate for ensuring compliance with the ISO standard for PDF reflow.


              There is another kind of source format which I'm very curious about - HTML. I do lots of conversion of web articles to PDF using the Adobe virtual printer. Does Acrobat actually interpret and convert the formatting information in the HTML when creating the PDF file?


              My interest in reflow lies in displaying them in an e-book reader device and Windows Mobile. I have a fair amount of readings in PDF to get through, some scanned, some printed from webpages; and the glare of an LCD screen was becoming an issue.


              I did some experimenting as well with PDFs from different sources and how they display on the desktop and mobile, in reflow mode.


              1. Scanned PDFs OCRed with Acrobat Searchable Image/ Image Exact
              2. Scanned PDFs OCRed with Acrobat ClearScan
              3. Scanned PDFs OCRed with my scanner's bundled ABBYY.
              4. Web articles printed to PDF


              What was surprising was the huge difference in reflow quality for scanned PDFs depending on how they were OCRed. I did not realize how lousy Acrobat's OCR was at tagging the document.


              PDFs OCRed with Acrobat's Searchable Image reflow alright on the desktop but cannot reflow on the mobile.


              PDFs OCRed with Acrobat's ClearScan reflow badly both on the desktop and on the mobile (funny that ClearScan is worse than Searchable Image in Adobe Reader).

              The SAME pdf OCRed using ABBYY reflowed well for both.


              Web articles printed to PDF reflowed well for both.


              By any chance, are you familiar with any e-book reader that has good support for annotations? After days of researching, I gave up on finding an e-book reader that does annotation to PDF natively. Being able to read and annotate is very important for me. Being able to sync those annotations to the PC is also important, as I rely a lot on the PDF iFilter in WDS to do in-content search and indexing.


              I am, however, starting to seriously consider migrating to another format like the mobi PRC as the format of choice for reading and annotation. To be honest with myself, PDF's annotation support is not very good either. It seems more catered to editing review and collaboration. However, I've yet to find a suitable reader that syncs the annotations to the desktop.


              Be well too.



              • 4. Re: Tagging and formatting for proper reflow
                phiNDsum1 Level 1

                CS4 - I have implemented a tag structure in InDesign CS4 and initiated the PDF "use document structure" in the page properties of the exported PDF but the PDF document will not follow the reading order specified in the InDesign structure panel. 


                CS5.5 - I upgraded to InDesign CS5.5 as there has been great mention of the "leaps" forward in creating accessible documents. I have created 2 InDesign documents. 


                1. I stripped the Structure out of the document and created articles in the new "Articles Pane". I have followed the instructions for creating accessible PDF documents (selected "use for reading order" in the "Articles" option dialogue, completed PDF touch-ups {turned on "use document structure" in page properties, etc}) but still - PDF does not follow reading order created with InDesign. 


                2. I left the "structure" intact and also used the "Articles Pane" for all content in the InDesign CS5.5 document. Exported to PDF. Followed touch-up sequence - PDF still will not use structure order created in InDesign document.


                Also running into errors with passing accessibility tests.  Example: Vector graphic (with text wrap) has Alt text appended to it through InDesign "Object Export Options", it is present in the PDF as well (by mousing over the image and in the tag properties) but the accessibility report (PDF, 508 and W3C) lists the image with no alt text.


                Another InDesign CS5.5  error when creating accessible documents occurs with the new "image anchor" process (which is an incredible leap forward from the old CS4 way). The problem - text wrapped images lose their ability to margin the text away from the top of the image when the "image anchor" is set. 


                Accessibility is a major area of development for our firm right now - and even though there have been advances, it seems we are held up in our work flow process - which is impeding our efforts to create accessible documents for our eager clients (Government Agencies, Municipalities and Not-For-Profit). 


                PLEASE SEND HELP - lol - but really - much needed - much appreciated!

                • 5. Re: Tagging and formatting for proper reflow
                  BCM-MSY Level 1



                  I found this solution from another User, tested it, and it works for reading order.


                  In InDesign, reorder ALL the objects on the page using Send to Back, starting with the object you want read last and continuing until the object you want read first in Acrobat is the last one sent to the back in InDesign. The result is that the reading order is correct in Acrobat. Obviously, adding any object to a page means re-ordering the whole page. Changing copy doesn't affect this. Also, the order of objects on a page has no relationship to the page structure. It appears that the object reading order follows the order in which objects were added to the page: 1st added (and furthest back), 1st read; last added (nearest front), last read. Shuffling the objects in the Document Structure pane has no effect on this order (and vice-versa).
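The send-to-back procedure above amounts to reversing the desired reading order. The following is only a toy model of it in Python, not InDesign scripting: the object names, the list standing in for the page's stacking order (back to front), and both function names are invented for illustration.

```python
def send_to_back_sequence(desired_reading_order):
    """Order in which to apply Send to Back: last-read object first."""
    return list(reversed(desired_reading_order))


def simulate(stack_back_to_front, sequence):
    """Apply Send to Back to each named object and return the new stack."""
    stack = list(stack_back_to_front)
    for name in sequence:
        stack.remove(name)      # take the object out of its current depth
        stack.insert(0, name)   # index 0 models the very back of the page
    return stack


# Hypothetical page: objects listed from back to front, as first created.
initial_stack = ["figure", "caption", "title", "intro"]
desired = ["title", "intro", "figure", "caption"]

sequence = send_to_back_sequence(desired)
final_stack = simulate(initial_stack, sequence)

print("Send to Back in this order:", sequence)
print("Resulting back-to-front order:", final_stack)
```

Under the observation reported above (furthest back is read first), the resulting back-to-front order is the reading order, which is why every object on the page has to be touched once, in reverse.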