9 Replies Latest reply on Apr 13, 2013 7:40 PM by a C student

    Tagging PDF introduces text selection issues in Reader, why?

    3MXO

      I can't understand why generating a tagged PDF with MS Word 2010 can produce files with text selection issues. When I export tagged PDF files from MS Word 2010, some parts of the text can't be selected properly: copying the text into Notepad results in the selected text being pasted multiple times. Searching for such text in the PDF also does not work properly. In addition, if such text contains a link, the link does not work, which makes this issue very annoying. Can someone explain to me why the tags can influence text selection and search in Reader, and whether this is a bug that will be fixed?

      Update: I found out this issue is not limited to files tagged with Word; the same problem can happen with files tagged with Adobe Acrobat as well.

      I can post a sample file.

      Thanks.

        • 1. Re: Tagging PDF introduces text selection issues in Reader, why?
          a C student Level 3

          Hi 3MXO,

          Are you sure the problem is related to tagging? If you create a PDF without tags, does the problem go away? If so, what version of Acrobat are you using, and on what platform? I have encountered this problem with legacy PDFs created by others, so I don’t know what caused it. My workaround has been to use the Alt Text property to correct the text. Link spelling should also be corrected in the <Link> structure element. Hope this helps.

          a ‘C’ student

          • 2. Re: Tagging PDF introduces text selection issues in Reader, why?
            3MXO Level 1

            Hi, thanks a lot for your reply. Yes, the problem is indeed related to tagging. I am using Adobe Acrobat 11 on Windows 7. After some tests, I was able to figure out what causes the issue: it happens when you add text in the "Actual Text" field of the tag properties. Here is a simple way to reproduce it. Open a Word document and type two lines of text separated by a few hard returns, for example:

            First line of text



            Second line of text

            Then convert the file to PDF using Adobe Acrobat, which by default does not add tags. After the file is converted, open it and add tags using the "Add Tags to Document" option in the "Accessibility" menu. Now open the tag tree, select the <P> tag that contains the text, open its properties, and enter some text in the "Actual Text" field. This causes the text selection issue. Here is a sample PDF file that I created following these steps:

            http://dfiles.eu/files/wsg8rqcww

            • 3. Re: Tagging PDF introduces text selection issues in Reader, why?
              a C student Level 3

              Good to hear you solved the problem!

              • 4. Re: Tagging PDF introduces text selection issues in Reader, why?
                3MXO Level 1

                Well, I was only able to figure out what causes the issue. Avoiding the "Actual Text" property because it can cause text selection problems is not a real solution, as you may need this feature in your documents. I hope that Adobe will fix this problem in a future update.

                • 5. Re: Tagging PDF introduces text selection issues in Reader, why?
                  Test Screen Name Most Valuable Participant

                  Surely this PDF is wrong. ActualText is meant for cases where normal text extraction would not produce the same result that a sighted person would perceive. In this case the ActualText, "First line of text", differs from what a sighted person would see, "First line of text Second line of text". So you are not using the tags correctly.

                  It does raise an interesting question, since ActualText should control text extraction, and probably selection too. All a PDF viewer has is a piece of extractable text (the ActualText) and a collection of words on the page which the ActualText replaces. Because there is no way to work out where the individual characters of the ActualText sit on the page, it is impossible to highlight them individually. The tag could be ignored for selection, but then it is impossible to discover which parts of the ActualText were selected, so copying fails. I suspect ActualText should be used at the smallest possible level, perhaps for each single character that cannot otherwise be represented.
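
                  The reasoning above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Reader actually implements selection: the Run class and the extract and copy_selection functions are hypothetical, and the model assumes that a run covered by ActualText has no per-glyph mapping, so selecting any part of it copies the whole ActualText. It contrasts a paragraph-sized ActualText (as in the sample file) with a character-sized one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical model: a run of glyphs on the page, optionally covered by ActualText."""
    glyphs: str                        # text recoverable from the glyphs themselves
    actual_text: Optional[str] = None  # replacement text supplied by the tag, if any


def extract(runs: List[Run]) -> str:
    """Whole-page extraction: an ActualText value replaces its entire run."""
    return "".join(r.glyphs if r.actual_text is None else r.actual_text for r in runs)


def copy_selection(runs: List[Run], start: int, end: int) -> str:
    """Copy the glyphs in [start, end), counted over on-page glyph positions.

    Assumption of this model: a run without ActualText contributes exactly the
    selected glyphs, while a run with ActualText is indivisible, so touching any
    of its glyphs copies the whole ActualText value.
    """
    pieces = []
    pos = 0
    for r in runs:
        run_start, run_end = pos, pos + len(r.glyphs)
        lo, hi = max(start, run_start), min(end, run_end)
        if lo < hi:  # the selection overlaps this run
            if r.actual_text is None:
                pieces.append(r.glyphs[lo - run_start:hi - run_start])
            else:
                pieces.append(r.actual_text)
        pos = run_end
    return "".join(pieces)


# Paragraph-level override, as in the sample file: one ActualText for both lines.
paragraph = [Run("First line of text Second line of text", actual_text="First line of text")]
print(extract(paragraph))               # First line of text  (second line lost)
print(copy_selection(paragraph, 0, 5))  # First line of text  (not just "First")

# Character-level override: only the "fi" ligature glyph carries ActualText.
per_char = [Run("\ufb01", actual_text="fi"), Run("rst line of text")]
print(extract(per_char))                # first line of text
print(copy_selection(per_char, 0, 4))   # first
```

                  In this model a small override only distorts selections that touch that one glyph, which is the intuition behind applying ActualText at the smallest possible level.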

                  • 6. Re: Tagging PDF introduces text selection issues in Reader, why?
                    3MXO Level 1

                    Thank you very much for your reply. If I understand correctly, you are saying that the tag was not used correctly. So, to use the tag correctly, should I put "First line of text Second line of text" in the "Actual Text" property of the <P> tag containing the text?

                    Because if that is the case, the problem is that any text you put in the "Actual Text" field causes the text selection issue. So even if you use the tag correctly, the issue remains.

                    You can copy the text correctly, though, if you use the "Copy with Formatting" option in Adobe Acrobat.

                    Interestingly, this problem only happens with Adobe Reader: if you open the file with any other PDF viewer (such as Foxit, Sumatra, Nitro PDF, and all the others I tested), there are no text selection issues. All text is perfectly selectable and searchable.

                    • 7. Re: Tagging PDF introduces text selection issues in Reader, why?
                      Test Screen Name Most Valuable Participant

                      I don't know about those viewers, but I imagine many of them ignore tags, so they won't be affected by the incorrect use of tags...

                      Have you read the specification of how that tag is supposed to be used? Unless I hear otherwise, I'd repeat that the correct use is to replace the smallest possible unit of text, preferably a single character, and certainly never whole sentences. What are you trying to achieve with it?
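
                      For reference, the PDF specification also allows ActualText in the property list of a Span marked-content sequence in the page content (/Span <</ActualText (...)>> BDC ... EMC), which makes it easy to cover exactly one glyph. The following minimal sketch only builds the text of such a content-stream fragment; the helper names pdf_literal and actual_text_span, the font resource /F1, the coordinates and the glyph code <01> are all hypothetical, and nothing here claims to change how Reader selects such text.

```python
def pdf_literal(text: str) -> str:
    """Encode ASCII text as a PDF literal string, escaping backslash and parentheses."""
    if not text.isascii():
        # Non-ASCII ActualText would need a UTF-16BE string with a BOM; not handled here.
        raise ValueError("this sketch only handles ASCII text")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def actual_text_span(actual: str, show_op: str) -> str:
    """Wrap one text-showing operator in a /Span marked-content sequence
    whose property list carries ActualText (BDC ... EMC)."""
    return "/Span <</ActualText " + pdf_literal(actual) + ">> BDC\n" + show_op + "\nEMC"


# Hypothetical content stream: font /F1, position and glyph code <01> are made up.
content = "\n".join([
    "BT",
    "/F1 12 Tf",
    "72 720 Td",
    actual_text_span("fi", "<01> Tj"),  # ActualText covers one ligature glyph only
    "(rst line of text) Tj",            # ordinary text, extracted from its glyphs
    "ET",
])
print(content)
```

                      The override is scoped to a single glyph, while the rest of the line is left to normal extraction.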

                      • 8. Re: Tagging PDF introduces text selection issues in Reader, why?
                        3MXO Level 1

                        Thanks for your help. The problem is that no matter whether I use the tag correctly, I still have this text selection issue in my documents, even when using the tag to replace a single character. I don't think this issue is related to the correct use of the tag itself; it's probably a bug, and I hope it will be fixed in a future update.

                        • 9. Re: Tagging PDF introduces text selection issues in Reader, why?
                          a C student Level 3

                          Recreated the problem. Bizarre. I’m with 3MXO: Adobe, please fix this bug. While you are at it, could I please have a way to set internal links such as TOC entries to “Inherit Zoom” by default?

                          a 'C' student