7 Replies. Latest reply: May 26, 2009 10:32 PM by D.I.V.

    Sperren / Sperrsatz (increased letter-spacing → emphasis)

    D.I.V. Community Member

      It is possible to emphasise a word in languages written in alphabetic scripts by increasing its letter spacing. The practice is especially common in German, but it can also be used in Swedish, English, et cetera.

      The simplest way to produce it is to insert an extra  s p a c e  between the letters, like so. Typographically, however, the desired spacing is generally less than a normal word space (i.e. the space between words in unemphasised text).

      In HTML this can be approximated with markup such as

      Dies ist <span style="letter-spacing:0.125em">falsc</span>h gesperrt, so ist e<span style="letter-spacing:0.125em">s richti</span>g.

      producing this output:

      "Dies ist falsch gesperrt, so ist es richtig."

      (Taken from http://de.wikipedia.org/wiki/Diskussion:Sperrsatz#CSS.)
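      For anyone generating such markup programmatically, here is a minimal Python sketch that wraps a word the way the example does. Both spans in the example stop one letter short of the end of the spaced text; browsers typically add letter-spacing after each character, so this keeps the gap after the word at its normal width. The function name `sperren_html` and the 0.125em default (taken from the example) are illustrative choices, not part of any library.

```python
from html import escape


def sperren_html(word: str, spacing_em: float = 0.125) -> str:
    """Return HTML that letter-spaces `word` (hypothetical helper).

    All letters except the last go inside the span, mirroring the
    example, because browsers typically add the extra spacing after
    each character and we do not want to widen the gap after the
    final letter.
    """
    if len(word) < 2:
        # A single character has no inner gaps to widen.
        return escape(word)
    head, last = word[:-1], word[-1]
    return (
        f'<span style="letter-spacing:{spacing_em}em">'
        f"{escape(head)}</span>{escape(last)}"
    )


print(sperren_html("richtig"))
```

      The call prints `<span style="letter-spacing:0.125em">richti</span>g`. Note that this only reproduces the span pattern; it does not decide which neighbouring word spaces should also be widened.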

      In Microsoft Word the best way is to adjust the character spacing, via the menus Format > Font > Character Spacing > Spacing. One then sets the spacing to Expanded, either by the default of one point or by some other decimal amount. To be clear: with this technique no space character at all is inserted between the letters; the existing letters are simply allowed to occupy more room.

       

      Unfortunately, a PDF created from such a document interprets the characters poorly, even though the visual output may be perfect.

      For example, if the word "character" were emphasised, copying and pasting the text from the PDF may yield an odd result such as "ch ar a c t er", "char a cter" or "c har ac ter".
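      To make the suspected mechanism concrete, here is a rough Python model of what seems to be happening (an assumption, not Acrobat's documented behaviour). A PDF page positions glyphs and need not contain space characters at all, so a text extractor has to guess where words end; a common approach is to treat any horizontal gap larger than some fraction of the font size as a space. The uniform 5.5 pt glyph width and the 0.15 × font-size threshold below are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float       # left edge of the glyph, in points
    width: float   # advance width of the glyph, in points


def lay_out(word: str, tracking: float, advance: float = 5.5) -> list[Glyph]:
    """Place glyphs left to right, adding `tracking` points after each one.

    No space characters are created: the letters just occupy more room,
    as with Word's Expanded character spacing.
    """
    glyphs, x = [], 0.0
    for ch in word:
        glyphs.append(Glyph(ch, x, advance))
        x += advance + tracking
    return glyphs


def extract_text(glyphs: list[Glyph], font_size: float,
                 threshold: float = 0.15) -> str:
    """Rebuild a string, inserting a space wherever the gap between two
    neighbouring glyphs exceeds threshold * font_size (assumed heuristic)."""
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold * font_size:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


for tracking in (0.5, 1.0, 1.5, 2.0):
    print(tracking, extract_text(lay_out("character", tracking), 11))
```

      With these numbers the word survives up to 1.5 pt of expansion at 11 pt and falls apart at 2 pt, loosely matching the test results in reply 2. Real glyph widths, kerning and italic overhangs vary from letter to letter, which is presumably why real output breaks in irregular places such as "char a cter".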

       

      Is there a resolution to this problem?

       

      —DIV

       

      P.S. PDF/A-1a might solve the problem (which seems intuitively reasonable), but my trial version of Acrobat 9 (6 days left to run) is suddenly complaining that it needs to be activated :-P

        • 1. Problems
          D.I.V. Community Member

          Just to be clear, the main problems resulting from this incorrect interpretation of the text are:

          * copying and pasting from a PDF

          and

          * searching a PDF for a certain character string (word or phrase).
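          As an illustration of why searching fails: once the extracted text contains spurious spaces, an ordinary substring search for "text element" no longer matches. A whitespace-insensitive regular expression, shown here in Python as a hypothetical workaround (it is not an Acrobat feature), still finds it.

```python
import re


def loose_pattern(phrase: str) -> re.Pattern:
    """Compile a regex that matches `phrase` while allowing any amount of
    whitespace (including none) between consecutive non-space characters."""
    chars = [re.escape(c) for c in phrase if not c.isspace()]
    return re.compile(r"\s*".join(chars))


extracted = "This t e x t e l e m e n t is emphasised by 2-point expansion"
print("text element" in extracted)                            # False
print(bool(loose_pattern("text element").search(extracted)))  # True
```

          The trade-off is false positives: the loose pattern also matches words that merely run together across a genuine word break.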

          —DIV

          • 2. Character spacing
            D.I.V. Community Member

            My trial version of Acrobat 9 Pro has decided to work again...

             

            Here is an example.

            The attached PDFs were generated from MS Word using PDFMaker. (Printing to PDF gives the same result, except that the PDF/A-1a option is not available that way.)

             

            Although the outputs are visually identical, PDF/A-1a is the only format that generates the correct underlying text. Weirder still, Acrobat wrongly considers the text and appearance to be the same in all of the PDFs (via Document > Compare Documents).

            The PDF/A-1b:2005 (RGB) and Standard output Conversion Settings (from the joboptions files), and presumably all the others, give an incorrect result as the character spacing is increased.

            A copy-and-paste operation on each file generates the following partially garbled text.  Searching is thus impeded.

             

            STANDARD

             

            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino roman).
            This t e x t e l e m e n t is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino roman).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino italic).
            This t e x t element is emphasised by 1.5‐point expansion (in 11‐point Palatino italic).
            This t e x t e l e m ent is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino italic).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino bold).
            This t e x t e l e m e n t is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino bold).
            This t e x t e l e m e n t is emphasised by 2-point e x p a n sion (in 11-
            point Courier New roman).
            This t e x t e l eme n t is emphasised by 2-point e x p a n s i o n (in 11-point
            Verdana roman).

             


            PDF/A-1b

             

            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino roman).
            This t e x t e l e m e n t is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino roman).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino italic).
            This t e x t element is emphasised by 1.5‐point expansion (in 11‐point Palatino italic).
            This t e x t e l e m ent is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino italic).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino bold).
            This t e x t e l e m e n t is emphasised by 2‐point e x p a n s i o n (in 11‐point Palatino bold).
            This t e x t e l e m e n t is emphasised by 2-point e x p a n sion (in 11-
            point Courier New roman).
            This t e x t e l eme n t is emphasised by 2-point e x p a n s i o n (in 11-point
            Verdana roman).

             


            PDF/A-1a

             

            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 2‐point expansion (in 11‐point Palatino roman).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 2‐point expansion (in 11‐point Palatino italic).
            This text element is emphasised by 0.5‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 1.5‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 2‐point expansion (in 11‐point Palatino bold).
            This text element is emphasised by 2-point expansion (in 11-point Courier New roman).
            This text element is emphasised by 2-point expansion (in 11-point Verdana roman).
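            To spot this kind of damage in an extraction longer than the one above, a crude Python check is to look for runs of three or more single characters separated by single spaces. This is a heuristic of my own, not an Acrobat feature, and it will also flag legitimate runs such as spaced initials.

```python
import re

# Three or more single word characters separated by single spaces,
# e.g. "t e x t" (heuristic; spaced initials like "J R R" also match).
SPACED_RUN = re.compile(r"\b(?:\w ){2,}\w\b")


def find_spaced_runs(line: str) -> list[str]:
    """Return the letter-spaced runs found in one line of extracted text."""
    return SPACED_RUN.findall(line)


standard = ("This t e x t e l e m e n t is emphasised by 2-point "
            "e x p a n s i o n (in 11-point Palatino roman).")
tagged = ("This text element is emphasised by 2-point expansion "
          "(in 11-point Palatino roman).")
print(find_spaced_runs(standard))
print(find_spaced_runs(tagged))
```

            On the two sample lines (copied from the STANDARD and PDF/A-1a results) it reports the two spaced-out words in the first and nothing in the second.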

             

            —DIV

            • 3. Tags
              D.I.V. Community Member

              Apparently tags are the key to interpreting the graphic objects correctly as meaningful text. They are created as an integral part of producing a PDF/A-1a-compliant file, although for a longer document there is an option to skip them (and achieve only PDF/A-1b compliance). As you would expect, creating the PDF/A-1a-compliant file and then removing all the tags (using Acrobat 9 Pro) loses all that useful information, so that the expanded words are no longer interpreted correctly.
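              A sketch of why tagging might make the difference. My understanding (worth checking against the PDF specification's section on identifying word breaks in Tagged PDF) is that a tagged file is expected to contain real space characters at word breaks, so a reader need not infer them from glyph gaps. The Python model below compares an extractor that guesses from gaps with one that trusts explicit spaces; the extractor, the threshold and the glyph widths are all hypothetical.

```python
Glyph = tuple[str, float, float]  # (character, left x in pt, advance width in pt)


def expanded(text: str, advance: float = 5.5, tracking: float = 2.0) -> list[Glyph]:
    """Lay out `text` with `tracking` pt added after every glyph,
    including any real space characters in the text."""
    glyphs, x = [], 0.0
    for ch in text:
        glyphs.append((ch, x, advance))
        x += advance + tracking
    return glyphs


def extract(glyphs: list[Glyph], font_size: float, tagged: bool,
            threshold: float = 0.15) -> str:
    """Hypothetical extractor: trust explicit spaces in tagged files,
    otherwise guess extra word breaks from the size of the gaps."""
    if not glyphs:
        return ""
    if tagged:
        # Word breaks are assumed to be present as real space characters.
        return "".join(ch for ch, _, _ in glyphs)
    out = [glyphs[0][0]]
    for (pch, px, pw), (ch, x, _) in zip(glyphs, glyphs[1:]):
        gap = x - (px + pw)
        # Guess a space for a wide gap, unless a real space is already there.
        if gap > threshold * font_size and " " not in (pch, ch):
            out.append(" ")
        out.append(ch)
    return "".join(out)


glyphs = expanded("text element")
print(extract(glyphs, 11, tagged=False))
print(extract(glyphs, 11, tagged=True))
```

              The untagged call reproduces the "t e x t e l e m e n t" pattern seen in the STANDARD and PDF/A-1b output, while the tagged call returns "text element".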

              Importantly, it is not possible to achieve [correct] PDF/A-1a compliance 'after the fact' by automatic software processing (unlike PDF/A-1b compliance). This is true of both Adobe Acrobat and Callas Software's pdfaPilot. [Automatic software parsing could easily add tags, but it would merely formalise the incorrect interpretation; alternatively, in theory a Real Live Human could work carefully through a text, inserting every tag correctly by hand, but this would quickly become very onerous for typical documents.]

               

              —DIV

              • 4. Ultimate failure
                D.I.V. Community Member

                Unfortunately, with the long, complex document I am really doing all this for, I found that attempting to create a PDF/A-1a-compliant PDF resulted in an inescapable loss of cross-referencing functionality and corrupted the table-of-contents (etc.) linking: either every link pointed to the same (incorrect) page, or there was no link at all. Apparently the feature is still not 'bullet-proof' with Acrobat 9 Professional, Microsoft Word 2003 and Windows XP. (I don't hold out much hope for later versions of the MS products, as I've heard anecdotally that co-operation between MS and Adobe has been decreasing in recent times.)

                 

                —DIV

                • 5. Re: Ultimate failure
                  S.D.A. Community Member

                  Are you making these changes to the PDF? If so, that's the wrong place. Do your typesetting in the authoring program. Don't attempt to do this in the PDF afterwards, especially not if you want to keep your sanity. Adobe InDesign would be the application I would use for this kind of precise typesetting.

                  • 6. Typesetting in SOURCE
                    D.I.V. Community Member

                    S.D.A., as I was trying to explain above, the typesetting was only done in the source software application, which in this case was Microsoft Word.  Perhaps Adobe InDesign can do this sort of typesetting too.  Likewise, I demonstrated that it can be done with HTML/CSS code.

                    The problem is only that the PDF that is created from (probably) any of those source applications (and certainly from Word) does not correctly store the underlying text, even though the graphical representation for screen or printing is correct.  To be specific:  rather than recognising the emphasised words as consisting of contiguous characters with increased character spacing, it incorrectly interprets the emphasised words as comprising a set of 'normally-spaced' characters with extra spaces inserted between (some of) them.  Perhaps this is some sort of internal 'reverse-engineering' built into Adobe Acrobat, or perhaps it is a PostScript issue:  I don't know what the precise mechanism is, I can only speculate.
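                    One way to picture the mechanism speculated about above: I don't know what PDFMaker actually writes, but the PDF format offers at least two ways to draw expanded text, the Tc (character spacing) operator or a TJ array with a numeric adjustment between glyphs. Either way the page describes positioned glyphs rather than words, so unless word breaks are marked some other way a reader has to infer them. The Python helpers below (hypothetical names; the font resource name /F1 is also made up) build both content-stream fragments for 2 pt of expansion in an 11 pt font, assuming an identity text matrix and 100% horizontal scaling.

```python
def pdf_string(s: str) -> str:
    """Write `s` as a PDF literal string, escaping backslash and parentheses."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def with_tc(text: str, size: float = 11, spacing_pt: float = 2.0) -> str:
    """Expanded text via Tc: `spacing_pt` text-space units are added to every
    glyph's displacement (points, given an identity text matrix)."""
    return f"BT /F1 {size:g} Tf {spacing_pt:g} Tc {pdf_string(text)} Tj ET"


def with_tj(text: str, size: float = 11, spacing_pt: float = 2.0) -> str:
    """Expanded text via TJ: a number n shifts the next glyph by
    -n/1000 * font size, so a negative number widens the gap."""
    adjust = -spacing_pt / size * 1000
    parts = []
    for i, ch in enumerate(text):
        parts.append(pdf_string(ch))
        if i < len(text) - 1:
            parts.append(f"{adjust:.1f}")
    return f"BT /F1 {size:g} Tf [{' '.join(parts)}] TJ ET"


print(with_tc("text"))  # BT /F1 11 Tf 2 Tc (text) Tj ET
print(with_tj("text"))  # BT /F1 11 Tf [(t) -181.8 (e) -181.8 (x) -181.8 (t)] TJ ET
```

                    In both fragments the letters are stored contiguously with no space characters, yet the rendered gaps are identical, which is consistent with a reader guessing word breaks from geometry.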

                    The only exception to this is when a PDF/A-1a-compliant document is created, where individual words are correctly tagged, irrespective of the character spacing.  As I mentioned, although this setting allows the creation of a correctly-tagged PDF from the source document, in my case it also caused corruption of the cross-referencing feature!  Hence, while I gained in one way, I lost in another (more important) way by using this setting.

                     

                    The only manipulation of the document in Adobe Acrobat that I have described and presented was to demonstrate that

                    • removing the tags generated for PDF/A-1a-compliance causes the correct interpretation of the underlying text — relevant for text searching, and for copy–paste operations — to be lost, and
                    • if the tags are not present (either never created, or else created but then removed), then it is impossible to restore them 'automatically' — the only way to restore them without using the source file would be to add them manually within Adobe Acrobat  [I think this is the part you misunderstood].


                    Hope this clears things up.

                     

                    —DIV

                    • 7. Tags
                      D.I.V. Community Member

                      It turns out that it isn't necessary to create a PDF with the PDF/A-1a settings in order to get the text to make sense when copying-and-pasting or when searching.  PDFMaker has an option to "Enable accessibility and reflow with Tagged PDF", and this seems to be sufficient to solve the word-identification problem (at least with Acrobat 7 Professional v7.1.0).

                       

                      I had previously had trouble with this 'feature': I was unable to find strings that wrapped to the next line (on the same page), so I had always disabled it. For example, in the text

                      Here is some text that wraps around.

                      with the aforesaid option enabled I wasn't able to successfully search for a phrase like "some text that wraps".
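                      A guess at the wrap problem, offered only as an illustration: if a search looks within each extracted line separately, a phrase that crosses a line break can never match, whereas joining the lines with a space and collapsing runs of whitespace first avoids that. The Python function below is hypothetical and says nothing about how Acrobat's search is actually implemented; it also does not handle words hyphenated across a line break.

```python
import re


def find_across_lines(lines: list[str], phrase: str) -> bool:
    """Search for `phrase` in text whose lines may break mid-phrase.

    Lines are joined with a single space and any run of whitespace is
    collapsed, so "some text" / "that wraps" becomes "some text that wraps".
    """
    text = re.sub(r"\s+", " ", " ".join(lines))
    return phrase in text


lines = ["Here is some text", "that wraps around."]
print(any("some text that wraps" in line for line in lines))  # False
print(find_across_lines(lines, "some text that wraps"))       # True
```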

                       

                      So far I have only tested this option with a small test file; I'm not yet sure whether the combination 'PDF/A-1b with Tagging' will cause the same corruption of cross-references in the PDF generated from my large Word source document that I described earlier for the 'PDF/A-1a' option (which inherently includes Tagging).

                       

                      —DIV