      I'm trying to save a PDF as text in Reader 9.4.5 under Windows and I've run into a formatting problem. Line wrapping within a paragraph seems to be broken: soft returns are not being converted to spaces, so the last word in each line runs into the first word of the next line without either a space or a line break.


      A potential workaround would seem to be to copy and paste the full text of the file into Notepad or similar, but that converts both soft returns and hard returns into line breaks, which is a formatting nightmare. Using "Paste special" unformatted doesn't help. Both of the PDF to text converters I've tried do the same thing. One of them, SomePDF, got my hopes up when it had an option to turn off text formatting, but that stripped out all paragraph breaks as well as the line wrap. Doh!


      On a narrower screen than the file's formatted line width (which is the whole purpose of my converting to text) it's actually less annoying to read the file with the words run together than with the broken lines within paragraphs.


      The PDF in question was created in either Acrobat PDFMaker 5.0 for Word (the "Application" field in PDF Properties) or Acrobat Distiller 5.0 (Windows) (the "PDF Producer" field in PDF Properties), unless those are confusingly different names for the same application.


      I've searched through Reader's preferences without finding anything that might change this behaviour. Am I missing something? Or is this a bug in either Reader or the program which created the file? And if it's a bug in the (Adobe) application which created the file, can Reader be fixed to handle the glitch gracefully? Either way, will the Adobe folks read this forum or is there some way I need to submit a bug report?


      Can anyone shed some light on this problem? And in the meantime, does anybody know of a PDF to text converter that properly converts soft returns to spaces and hard returns to line breaks?
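
      For what it's worth, the copy-and-paste output could in principle be cleaned up afterwards. Below is a rough Python sketch of that idea. It rests on an assumption that the PDF itself does not guarantee: a line noticeably shorter than the longest line is taken to end a paragraph (a hard return), and every other line break is treated as a soft return and replaced with a space. The function name `reflow` and the `short_ratio` threshold are made up for illustration.

```python
def reflow(text, short_ratio=0.75):
    """Join wrapped lines into paragraphs.

    Heuristic (an assumption, not information stored in the PDF): a line
    much shorter than the longest line is treated as the end of a paragraph.
    """
    lines = [line.strip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    paragraphs, current = [], []
    for line in lines:
        if not line:
            # A blank line is an explicit paragraph break.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if len(line) < width * short_ratio:
            # Short line: assume a hard return ended the paragraph here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


pasted = (
    "The quick brown fox jumps over\n"
    "the lazy dog and keeps running\n"
    "until dusk.\n"
    "A second paragraph starts here\n"
    "and ends.\n"
)
result = reflow(pasted)
```

      The heuristic will misjudge a paragraph whose last line happens to fill the full width, so the output would still need a quick read-through.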


          Acrobat PDFMaker 5.0 for Word & Acrobat Distiller 5.0 (Windows) - installed with Acrobat 5.x.
          PDFMaker provided/provides the means of obtaining a PDF with the "interactivity" (links, bookmarks, etc.).

          Printing to the Adobe PDF printer just provides page content in a PDF.


          Under the Description tab of the Document Properties dialog look to see if "Tagged PDF:" is "Yes".
          I'm guessing it is "No".


          A Tagged PDF's structure tree (viewed via the Tags panel) defines the logical hierarchy and reading order of the page content within the PDF.
          Of course a malformed structure tree can preclude any advantages otherwise provided by a Tagged PDF.

          For the untagged PDF, the PDF page content can be thought of as objects painted to specified locations on a PDF page.
          The sequence/order of placement is not related to the content flow but rather to how the application(s) in play are coded.
          There is no concept of layout or format in the word processing or page layout application sense.
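
          To make the "painted objects" idea concrete, here is a minimal Python sketch. The `(x, y, text)` tuples and the `reading_order` function are hypothetical stand-ins for what a text extractor has to work with, not an actual Reader or Acrobat API. The fragments arrive in paint order, and the only way to recover lines is to guess from their positions. Coordinates are assumed to follow the usual PDF convention of an origin at the bottom left.

```python
from collections import defaultdict


def reading_order(fragments, line_tolerance=2.0):
    """Rebuild text lines from (x, y, text) fragments using position alone."""
    rows = defaultdict(list)
    for x, y, text in fragments:
        # Fragments whose y values fall in the same bucket share a line.
        key = round(y / line_tolerance)
        rows[key].append((x, text))
    lines = []
    # A larger y is higher on the page, so visit buckets top to bottom.
    for key in sorted(rows, reverse=True):
        words = [text for _, text in sorted(rows[key])]  # left to right
        lines.append(" ".join(words))
    return lines


# Paint order here has nothing to do with reading order.
painted = [
    (200, 700, "world"),
    (72, 686, "second"),
    (72, 700, "Hello"),
    (120, 686, "line"),
]
ordered = reading_order(painted)
```

          Nothing in this data says whether a line break is a soft or a hard return, or even where a space belongs, which is why an extractor working on an untagged file has to guess.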


          Basically, the painted PDF page content does not "hold" any text formatting.
          That's what's slick about a well-formed Tagged PDF; it does "hold" the information.


            You guess correctly, under "Tagged PDF:" it says "No."


            So, how does that help me?


                Acrobat Pro provides the means for tagging a PDF that is not tagged.

                This can be done manually or Acrobat can be used to get a programmatic "best guess".

                Provided you have an understanding of Tagged PDF, the "manual" approach gets you spot on.

                "Best guess" can be problematic with complex content but can establish a starting point for the requisite manual clean up.


                Regardless, Adobe Reader (any version) only provides the ability to view/read (unless the PDF has been "enabled" by Acrobat Pro for specific activities).

                Tagging a PDF or working with PDF tags is not something Adobe Reader can do.



                  Claudio González Most Valuable Participant

