4 replies; latest reply on May 16, 2014 1:44 AM by diggeralmu

    Zero width space / discretionary line break exported PDF issue

    diggeralmu

      Greetings,

       

      I'm currently using InDesign CS6, and there are cases where I have to insert a discretionary line break / zero-width space (U+200B) inside a word (e.g. for long numbers or long text that cannot fit inside a cell's width).

       

      The problem is that when the PDF is exported and I open it in Adobe Reader / Acrobat, selecting a word and copying it gives spaces wherever the discretionary line break was inserted (regardless of whether the line break was actually needed).

      Searching for the exact word also fails (I need to add spaces where the discretionary line breaks are located).

      -----

      e.g. Legislation (I added a zero-width space after every 3 characters)

       

      Legisl

      ation

       

      The copied word is "Leg isl ati on".

      The search keyword must be "Leg isl ati on" for the PDF reader to find the text ("Legislation" does not work).

      -----
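      To make the failure concrete, here is a small Python sketch (my own illustration, not how Acrobat works internally). It assumes the exported PDF ends up giving the reader an ordinary space for each U+200B, which is what the copy/search results above suggest, and compares that with simply dropping the character.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

# "Legislation" with a zero-width space after every 3 characters, as in the example
source = ZWSP.join(["Leg", "isl", "ati", "on"])

# Assumption: each U+200B comes out of the PDF as a plain space (U+0020)
extracted_as_space = source.replace(ZWSP, " ")

# What I expected: the invisible character simply disappears
extracted_removed = source.replace(ZWSP, "")

print(repr(extracted_as_space))             # 'Leg isl ati on'
print("Legislation" in extracted_as_space)  # False
print("Legislation" in extracted_removed)   # True
```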

      It is odd, because the discretionary (soft) hyphen works fine for both copying and searching, except that a hyphen character is shown in the displayed text (I would prefer it to look the same as in MS Word).

       

      Is there something missing in my setup, or is this a bug/issue in InDesign with the zero-width space?

      Is there any other character (apart from the two above) that I can use in InDesign so that, after exporting the PDF, I can search for the word in the PDF using the word itself?

       

      Thanks

        • 1. Re: Zero width space / discretionary line break exported PDF issue
          Peter Spier Most Valuable Participant (Moderator)

          I'm not sure what you expect here. Zero-width or not, it's still a space character.

          • 2. Re: Zero width space / discretionary line break exported PDF issue
            diggeralmu Level 1

            Please see attached samples:

             

            Soft hyphen

            s000.tinyupload.com/index.php?file_id=82726988057059548862

             

            Zero width space

            s000.tinyupload.com/index.php?file_id=66010187485652230796

             

            You can open them in Adobe Reader / Acrobat (other readers seem to fail to exclude the characters, even for the soft hyphen).

             

            Notice that in the soft hyphen sample, when you search for "Legislation", the whole text Legis-lat-ion in the cell is found and highlighted (the hyphens are excluded from the actual text).

            You can also select the text and copy it, and only "Legislation" will be copied.

             

            But in the zero-width space sample, when you search for "Legislation", the text in the cell is not included in the search results. You must type actual space (U+0020) characters where the hidden zero-width spaces are for the text in the cell to be highlighted. When selecting and copying, spaces also appear within "Legislation".
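            Until the export itself behaves differently, one possible workaround when searching text that has already been extracted (for example, text copied out of the PDF) is a pattern that tolerates whitespace between the letters. This is a hypothetical Python helper (`tolerant_pattern` is my own name); it does not change how Acrobat's own search works, and it can also produce false matches across genuine word boundaries.

```python
import re

def tolerant_pattern(word: str) -> re.Pattern:
    """Build a regex that matches `word` even if spaces were inserted between its letters.

    Hypothetical helper: optional whitespace is allowed between every pair of
    characters, which absorbs the U+0020 spaces the reader extracts in place
    of the zero-width spaces.
    """
    parts = [re.escape(ch) for ch in word]
    return re.compile(r"\s*".join(parts), re.IGNORECASE)

extracted = "Leg isl ati on"  # text as copied from the zero-width space sample
match = tolerant_pattern("Legislation").search(extracted)
print(match.group(0) if match else None)
```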

             

            It seems odd that they behave differently when both are discretionary characters. I'm not sure whether the problem lies in InDesign or in the PDF viewer. It was reported to me that exporting word-wrapped text from MS Word 2007/2010 with Save as PDF follows the exclude behavior, but I have yet to verify this, as I don't have MS Word 2007/2010.

             

            I think InDesign is effectively treating the zero-width space as a normal space when exporting to PDF (so that Adobe Reader / Acrobat extracts it as such).
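            For comparison, Unicode itself does not classify U+200B as an ordinary space: its general category is Cf (format character), the same as the soft hyphen, while U+0020 is Zs (space separator). A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

for ch in ["\u0020", "\u00ad", "\u200b"]:
    # Zs = space separator, Cf = format character (not a visible space)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<20} "
          f"category={unicodedata.category(ch)} isspace={ch.isspace()}")
```

Python's `str.isspace()` also reports False for both U+00AD and U+200B, which is consistent with treating them as invisible format characters rather than spaces.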

             

            Thanks

            • 3. Re: Zero width space / discretionary line break exported PDF issue
              Peter Spier Most Valuable Participant (Moderator)

              The zero-width space is used as a discretionary character, but it really isn't one.

              • 4. Re: Zero width space / discretionary line break exported PDF issue
                diggeralmu Level 1

                Noted on that.


                However, I have checked the actual PDF content stream, and it seems InDesign is inserting a space (U+0020) where the U+200B (zero-width space) characters are in the exported PDF file. This is clearly wrong, as I expect the PDF text content stream to match the text content of the InDesign source file as closely as possible.
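                If anyone wants to repeat this check: content streams are usually Flate-compressed, so they must be inflated before the text operators are readable. Below is a rough Python sketch using only the standard library (`re`, `zlib`). It is an illustration under assumptions, not a PDF parser: it ignores encryption, object streams and non-Flate filters, and if the font is embedded as a CID (Type 0) font, the strings shown with Tj/TJ are glyph codes, so you still need the font's ToUnicode CMap to map them back to Unicode.

```python
import re
import sys
import zlib

# "stream" (but not "endstream") followed by CRLF or LF, up to the next "endstream"
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflated_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return every stream in the file that inflates cleanly with zlib (FlateDecode)."""
    results = []
    for raw in STREAM_RE.findall(pdf_bytes):
        try:
            # decompressobj stops at the end of the zlib data, so the EOL
            # before "endstream" ends up in unused_data instead of raising
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            continue  # not Flate-encoded (e.g. JPEG image data) or damaged
    return results

def text_operators(content: bytes) -> list[bytes]:
    """Pick out the lines that contain text-showing operators (Tj / TJ)."""
    return [line.strip() for line in content.splitlines()
            if re.search(rb"\bT[jJ]\b", line)]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for stream in inflated_streams(data):
        for op in text_operators(stream):
            print(op)
```

Run it as `python dump_text_ops.py exported.pdf` (the script name is arbitrary) and look at what appears between the string fragments where the U+200B characters were in the source text.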

                 

                Can anyone confirm whether this is an InDesign bug? (I have also submitted an inquiry using the wishlist form.)

                 

                By comparison, MS Word 2007/2010 handles zero-width spaces by leaving them out of the PDF content stream entirely; the word break is done through the text-positioning and text-showing operators defined in the PDF specification. This allows copying and searching with the whole word in the PDF exported from MS Word.

                This is arguably also wrong, since characters are being removed, but the removal may be acceptable because zero-width spaces are supposed to be invisible.
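                To illustrate the MS Word approach described above, here is a hypothetical Python sketch (the function names and layout values are my own). It takes text containing zero-width spaces plus the break positions chosen by line layout, drops every U+200B, and expresses each line break purely as a `Td` move between `Tj` strings, so no space character is written. It assumes a simple font whose character codes are plain Latin-1 bytes, which is not how real CID-font output is encoded, and whether a viewer then joins the fragments into one word is up to its own text-extraction heuristics.

```python
ZWSP = "\u200b"

def pdf_literal(s: str) -> str:
    """Escape a string as a PDF literal string (backslash and parentheses)."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"

def lines_to_content(text: str, break_after: set[int], x: float, y: float,
                     leading: float, font: str = "/F1", size: float = 10) -> str:
    """Build a PDF text object for `text`, breaking lines only at zero-width spaces.

    `break_after` holds indexes of ZWSP-separated segments after which layout
    decided to break the line; all other ZWSPs are simply dropped.
    """
    segments = text.split(ZWSP)
    lines, current = [], ""
    for i, seg in enumerate(segments):
        current += seg
        if i in break_after:
            lines.append(current)
            current = ""
    if current:
        lines.append(current)

    ops = ["BT", f"{font} {size} Tf", f"{x} {y} Td"]
    for n, line in enumerate(lines):
        if n > 0:
            ops.append(f"0 {-leading} Td")  # next line via positioning only, no space glyph
        ops.append(f"{pdf_literal(line)} Tj")
    ops.append("ET")
    return "\n".join(ops)

source = ZWSP.join(["Leg", "isl", "ati", "on"])
print(lines_to_content(source, break_after={1}, x=72, y=700, leading=12))
```

With a break after the second segment this produces the "Legisl" / "ation" layout from the original example, with no U+200B or U+0020 anywhere in the shown strings.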