5 Replies Latest reply on Jul 24, 2011 7:28 PM by Claudio González

    "Save as text" problem in Reader 9.4.5




      I'm trying to save a PDF as text in Reader 9.4.5 under Windows and I've run into a formatting problem. Line wrapping within a paragraph seems to be broken, as soft returns are not being converted to spaces. The last word in each line runs into the first word in the next line without either a space or a line break.


      A potential workaround would seem to be to copy and paste the full text of the file into Notepad or such, but that converts both soft returns and hard returns into line breaks, which is a formatting nightmare. Using "Paste special" unformatted doesn't help. Both of the PDF to text converters I've tried do the same thing. One of them, SomePDF, got my hopes up when it had an option to turn off text formatting, but that stripped out all paragraph breaks as well as the line wrap. Doh!
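
For the copy-and-paste route, where every wrapped line ends in a line break, a small script can try to tell the two kinds of break apart. The sketch below is only a heuristic and not anything Reader or SomePDF does: it assumes that a line clearly shorter than the longest line in the text marks the end of a paragraph (a hard return), and that every other break is a soft return to be replaced by a space. The function name `reflow` and the `slack` parameter are made up for illustration.

```python
def reflow(text, slack=15):
    """Join wrapped lines into paragraphs, one paragraph per output line.

    Assumption: a line more than `slack` characters shorter than the
    longest line ends a paragraph; any other break is just line wrap.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    width = max((len(ln) for ln in lines), default=0)
    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        if len(ln) < width - slack:
            # Short line: treat its break as a hard return.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    pasted = (
        "The quick brown fox jumps over the lazy\n"
        "dog and keeps running through the field\n"
        "until dusk.\n"
        "A second paragraph starts here and also\n"
        "wraps once.\n"
    )
    print(reflow(pasted))
```

The heuristic will misjudge a paragraph whose last line happens to fill the full width, so the result still needs a quick read-through.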


      On a narrower screen than the file's formatted line width (which is the whole purpose of my converting to text) it's actually less annoying to read the file with the words run together than with the broken lines within paragraphs.


      The PDF in question was created in either Acrobat PDFMaker 5.0 for Word (the "Application" field in PDF Properties) or Acrobat Distiller 5.0 (Windows) (the "PDF Producer" field in PDF Properties), unless those are confusingly different names for the same application.


      I've searched through Reader's preferences without finding anything that might change this behaviour. Am I missing something? Or is this a bug in either Reader or the program which created the file? And if it's a bug in the (Adobe) application which created the file, can Reader be fixed to handle the glitch gracefully? Either way, will the Adobe folks read this forum or is there some way I need to submit a bug report?


      Can anyone shed some light on this problem? And in the meantime, does anybody know of a PDF to text converter that properly converts soft returns to spaces and hard returns to line breaks?


      Thanks, all!



        • 1. Re: "Save as text" problem in Reader 9.4.5
          CtDave Level 5

          Acrobat PDFMaker 5.0 for Word & Acrobat Distiller 5.0 (Windows) - installed with Acrobat 5.x.
          PDFMaker provided/provides the means of obtaining a PDF with "interactivity" (links, bookmarks, etc.).

          Printing to the Adobe PDF printer just provides page content in a PDF.


          Under the Description tab of the Document Properties dialog look to see if "Tagged PDF:" is "Yes".
          I'm guessing it is "No".


          A Tagged PDF's structure tree (viewed via the Tags panel) orchestrates the logical hierarchy of content and page content within the PDF.
          Of course a malformed structure tree can preclude any advantages otherwise provided by a Tagged PDF.

          For the untagged PDF, the PDF page content can be thought of as objects painted to specified locations on a PDF page.
          The sequence/order of placement is not related to the content flow but rather to how the application(s) in play are coded.
          There is no concept of layout or format in the word processing or page layout application sense.


          Basically, the painted PDF page content does not "hold" any text formatting.
          That's what's slick about a well-formed Tagged PDF; it does "hold" the information.
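
To make the "objects painted to specified locations" point concrete, here is a minimal sketch of what any text extractor has to do with an untagged page. It is an illustration under stated assumptions, not how Reader's "Save as text" is implemented: the fragments are hypothetical `(x, y, text)` tuples, `y` grows upward as in PDF page coordinates, and the function name and `y_tolerance` parameter are invented for the example.

```python
def lines_from_fragments(fragments, y_tolerance=2.0):
    """Rebuild text lines from fragments painted at (x, y) positions.

    Assumption: fragments whose baselines are within `y_tolerance` of
    each other belong to the same visual line.
    """
    # Paint order is arbitrary, so sort top to bottom, then left to right.
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))
    rows = []  # each row is [baseline_y, [(x, text), ...]]
    for x, y, text in ordered:
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # The spaces between fragments are a guess; the page does not store them.
    return [
        " ".join(text for _, text in sorted(items, key=lambda it: it[0]))
        for _, items in rows
    ]


if __name__ == "__main__":
    painted = [
        (110, 700, "world"),
        (72, 686, "next"),
        (72, 700, "Hello"),
        (100, 686.5, "line"),
    ]
    print(lines_from_fragments(painted))
```

Nothing in the input says whether the break after the first line is line wrap or the end of a paragraph, which is exactly the information a well-formed Tagged PDF carries.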


          Be well...

          • 2. Re: "Save as text" problem in Reader 9.4.5
            MoTLD Level 1

            You guess correctly, under "Tagged PDF:" it says "No."


            So, how does that help me?


            Thanks for the quick and informative reply, but my problem still remains...



            • 3. Re: "Save as text" problem in Reader 9.4.5
              MoTLD Level 1

              I could swear I saw an "edit" option a moment ago. Can I not edit a post once it's been replied to? That's irritating!


              Anyway, I wanted to remove the "in Reader 9.4.5" part of the title, 'cause I just upgraded to "X" and the problem persists...




              PS - Anybody else think "X" looks and feels suspiciously similar to 9? Does this release really rate a new major version number? Way to go the Microsoft route of confusing and unnecessary revisioning, Adobe. Oh, and why doesn't Firefox's inline spell checker work in this text box? *grumble*

              • 4. Re: "Save as text" problem in Reader 9.4.5
                CtDave Level 5

                Acrobat Pro provides the means for tagging a PDF that is not tagged.

                This can be done manually or Acrobat can be used to get a programmatic "best guess".

                Provided you have an understanding of Tagged PDF, "manual" gets you spot on.

                "Best guess" can be problematic with complex content but can establish a starting point for the requisite manual clean up.


                Regardless, Adobe Reader (any version) only provides the ability to view/read (unless the PDF has been "enabled" by Acrobat Pro for specific activities).

                In particular, tagging a PDF or working with PDF Tags is not something Adobe Reader can do.



                Spell check - while the post a reply dialog is open there is an "abc" (with a check mark) in the upper right. Click it to spell check your pending post.



                "Edit" - I believe this is only available during one's current session.  Or, perhaps, while still "in" the thread. (?)

                -- well, you can leave the thread, return and use "Edit" - so, current session may be it.


                Be well...


                Message was edited by: CtDave



                • 5. Re: "Save as text" problem in Reader 9.4.5
                  Claudio González Most Valuable Participant

                  You can only edit a message if it has not been answered. This is to prevent confusion, such as making someone's reply look stupid because the post being answered was changed afterwards.