3 Replies Latest reply: Nov 8, 2014 10:43 AM by CtDave

    Extracting Tabular data from a column

    StrongBeaver Community Member

      I have a PDF document that has two columns. When I select one column to copy, the other column gets selected as well. Can I prevent this, or must I manually delete the contents of the second column after pasting?

        • 1. Re: Extracting Tabular data from a column
          CtDave CommunityMVP

          Hi,

          Welcome to the reality of PDF!  The objects painted to the PDF page are just that -- it's "paint by numbers", if you will.

          (See ISO 32000-1 for a full discussion - good for reading while sitting by the fireplace this winter while the snow collects outside.)

           

          At its core PDF has no "rows", "columns", "tables", "styles", etc.

           

          Back in the day this led to the advent of "logical hierarchy" and "tagged" PDF.

           

          A well-formed "tagged" PDF (e.g., ISO 14289-1, PDF/UA-1 compliant) provides two functionalities.

          It is an accessible PDF, and it is a PDF that facilitates export of the page content.

           

          A PDF that is not a well-formed tagged PDF, or is not tagged at all, is processed by Acrobat and the online subscription services that provide "export" by trying to assess what the content's tagged structure might be.

          It is a guess based on certain assumptions.

          Because of this the export may be garbage or may be good.

          "GIGO"

          The "under-the-hood" build of an untagged PDF dictates if it is garbage or good.

          Nothing you can do there.
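
          To illustrate what "a guess based on certain assumptions" can look like, here is a minimal Python sketch. It is not how Acrobat or any export service works internally; the function name `guess_column_split` and the sample positions are made up for illustration. It assumes all you know is the left-edge x position of each piece of text, and it guesses that the column boundary sits in the middle of the widest horizontal gap.

```python
def guess_column_split(left_edges):
    """Guess an x position separating two columns.

    Heuristic (an assumption, not a real product's algorithm):
    take the midpoint of the widest gap between distinct left edges.
    Returns None when there are fewer than two distinct positions.
    """
    xs = sorted(set(left_edges))
    if len(xs) < 2:
        return None
    # Pair each gap width with the midpoint of that gap.
    gaps = [(b - a, (a + b) / 2) for a, b in zip(xs, xs[1:])]
    _, midpoint = max(gaps)
    return midpoint


# Hypothetical left edges (in points) for a tidy two-column page...
clean = [72, 72, 72, 320, 320, 320]
# ...and for a page where text starts at scattered positions.
messy = [72, 150, 230, 320, 400]

print(guess_column_split(clean))
print(guess_column_split(messy))
```

          On the tidy page the guess lands between the two columns. On the scattered page the heuristic still returns an answer, even though no gap clearly marks a column boundary, which is the "garbage or good" outcome described above.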

           

          Be well...

          • 2. Re: Extracting Tabular data from a column
            StrongBeaver Community Member

            Sorry, I didn't quite understand your response.

            • 3. Re: Extracting Tabular data from a column
              CtDave CommunityMVP

              PDF has no "columns" (or rows, styling, etc.). Basically a PDF is content objects painted to specified locations on the PDF page. You get characters and, sometimes, lines.
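
              As a rough picture of "content objects painted to specified locations", here is a small Python sketch. The `Fragment` type, the `column_lines` function, and the coordinates are hypothetical and only for illustration; no particular PDF library is assumed. Each piece of text is just a position plus characters, so "a column" only exists if you decide which x range counts as one.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float   # left edge, in points from the left side of the page
    y: float   # baseline, in points from the bottom of the page
    text: str


def column_lines(fragments, x_min, x_max):
    """Return the text of fragments whose left edge is in [x_min, x_max),
    ordered top to bottom (larger y first), then left to right."""
    picked = [f for f in fragments if x_min <= f.x < x_max]
    picked.sort(key=lambda f: (-f.y, f.x))
    return [f.text for f in picked]


# Hypothetical page: nothing here says "table"; it is only positioned text.
page = [
    Fragment(72, 700, "Apples"), Fragment(320, 700, "1.20"),
    Fragment(72, 686, "Pears"),  Fragment(320, 686, "0.95"),
    Fragment(72, 672, "Plums"),  Fragment(320, 672, "2.10"),
]

left = column_lines(page, 0, 300)     # fragments starting left of x = 300
right = column_lines(page, 300, 612)  # fragments starting at or right of x = 300
print(left)
print(right)
```

              The split at x = 300 is an assumption supplied by the person reading the page, not something stored in the file. That is why a plain selection can sweep up both columns.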

              Can this be improved upon? Yes, that's what I'd alluded to. This would start in the authoring file using an appropriate authoring application.

               

              Be well...