14 Replies Latest reply on Mar 26, 2017 6:28 AM by Laubender

    Import Word file loses non-breaking hyphens

    guardaroba Level 1

      I've had problems importing some Word files created by professional translators. They're full of non-breaking spaces and non-breaking hyphens, and when I import the files into InDesign the non-breaking hyphens are completely lost (they aren't just turned into discretionary hyphens, etc.).

       

      I've tried creating a new Word file, entering some text, adding a non-breaking hyphen, and then importing that file into InDesign, and the same problem occurs.

       

      Could somebody please try to replicate this?

       

      I'm using ID CS6 (version 8.0.1) and Word 2008 for Mac (version 12.2.3).

       

      Thanks!

        • 1. Re: Import Word file loses non-breaking hyphens
          Joel Cherney Adobe Community Professional & MVP

          Well, I can think of two ways to get a non-breaking hyphen into Word. One is to use a font that has a non-breaking hyphen in it, and the other is to use Word's own non-breaking hyphen. With the latter, you can get a glyph that acts like a no-break hyphen even in fonts that don't actually have anything at that codepoint. This is true of Word on both Windows and Mac platforms (because if 100% of your translators are using Word for Mac, I'll eat my hat). Take a look at a Word screenshot (with some annotations in red):

           

          hyphen1.png

           

          and that text saved as .docx and placed into InDesign CS6:

           

          hyphen2.png

           

          So, before placing your translators' Word docs into InDesign, you need to figure out how to clean your documents of "fake" nonbreaking hyphens and replace 'em with "real" Unicode no-break hyphens. I would make a character style in Word with a font that I knew supported non-breaking hyphens, and then do a careful find-replace, and then resave the cleaned Word doc before placing.
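
For anyone who would rather script that cleanup than do it by hand in Word, here is a minimal Python sketch that works on the .docx file directly. It assumes that Word stores its own non-breaking hyphen as an empty `<w:noBreakHyphen/>` element (with the usual `w` prefix) in `word/document.xml`, and that a literal U+2011 character in a `<w:t>` text node is what survives the InDesign import. The function names are made up for this example, and only the main document part is touched, so test it on copies first.

```python
import re
import zipfile

# Assumption: Word serializes its own non-breaking hyphen as an empty
# <w:noBreakHyphen/> element inside a run, using the "w" prefix.
NO_BREAK_HYPHEN_TAG = re.compile(r"<w:noBreakHyphen\s*/>")
# Replacement: a text node holding the Unicode NON-BREAKING HYPHEN (U+2011).
UNICODE_NB_HYPHEN_TEXT = "<w:t>\u2011</w:t>"


def replace_word_nb_hyphens(xml_text):
    """Return (new_xml, count) with each <w:noBreakHyphen/> swapped for U+2011."""
    return NO_BREAK_HYPHEN_TAG.subn(UNICODE_NB_HYPHEN_TEXT, xml_text)


def clean_docx(src_path, dst_path):
    """Copy src_path to dst_path, rewriting only word/document.xml.

    Returns the number of hyphens replaced. Headers, footers, footnotes
    and every other part of the package are copied unchanged.
    """
    replaced = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text, replaced = replace_word_nb_hyphens(data.decode("utf-8"))
                data = text.encode("utf-8")
            dst.writestr(item, data)
    return replaced
```

Calling `clean_docx("translation.docx", "translation_clean.docx")` (hypothetical file names) writes a second file and leaves the original alone. The font used in the document still needs a glyph at U+2011 for the result to display properly.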

          • 2. Re: Import Word file loses non-breaking hyphens
            guardaroba Level 1

            Thanks for the reply!

             

            When I tried to test it I inserted a nb-hyphen using the keyboard shortcut I found on the Microsoft website, CTRL+SHIFT+HYPHEN (CMD on Mac).

             

            You're saying that's not the correct way to add a nb-hyphen that will survive being imported into InDesign? Even if the font used throughout supports 'real' nb-hyphens?

             

            This still seems to be a bit of a bug with InDesign's Word import, if not in a technical sense at least in terms of usability. Wouldn't it make more sense to look for the 'fake' hyphens and either translate them into InDesign-recognized nb-hyphens or at least into plain vanilla hyphens? It also seems strange that Word wouldn't insert a proper Unicode nb-hyphen when the font -does- support them.

            • 3. Re: Import Word file loses non-breaking hyphens
              Joel Cherney Adobe Community Professional & MVP

              You're saying that's not the correct way to add a nb-hyphen that will survive being imported into InDesign? Even if the font used throughout supports 'real' nb-hyphens?

               

              This almost becomes a philosophical question for me. Scratch that, it's not "almost," it is. The "correct" way for you is whatever works with your workflow. If your workflow is unfixable, you can only file bug reports and hope. But it seems fixable to me. Here's the experiment I did:

               

              1) Opened Word and started a new doc with Lucida Sans Unicode

              2) Inserted Word non-breaking hyphens in "What the heck" with control-shift-hyphen, and then Unicode non-breaking hyphens with alt + 2011 in "does this do" (or just find the glyph named "Non-Breaking Hyphen" in Insert Symbol, works the same way)

              3) Then highlighted each hyphen and went to Insert Symbol, which reported "Non-breaking hyphen (U+2011)":

               

              test1.png

              test2.png

               

              4) Then I selected all text and changed it to Minion Pro (an Adobe font), and repeated step 3:

              test3.png

              test4.png

              Obviously, Word cares how the symbol is keyed (when it really shouldn't). If you key it in with Word's own shortcut, it won't necessarily survive a change in font. That's the control-shift-hyphen being displayed as a "space" in Word's own "what glyph is this?" tool. But the nonbreaking hyphen that was not keyed using Word's shortcut is still correctly recognized.

               

              In general, I don't trust Word when it comes to advanced typographical stuff. Heck, Word drops the ball on intermediate typographical stuff, and since 80% of my job is "get forty different refugee languages out of Word and into InDesign" I get to experience all of the many ways that Windows and Word and InDesign and etc. handle fonts differently, and drop different balls at different stages of text manipulation. So, in this case, I'm saying that the Correct Way is to figure out how to preprocess your documents so that they are rendered correctly by InDesign's Word filter. If you rely on these companies to fix their tools so that you don't have to preprocess anything, you will wait a long time. In the interim, doing experiments to figure out how to get company A's product to work with company B's product will get your translations into ID and rendered correctly.

              • 4. Re: Import Word file loses non-breaking hyphens
                Joel Cherney Adobe Community Professional & MVP

                In case it's not obvious: Word's Insert Symbol menu displays the Unicode name of the selected glyph in the lower-left-hand corner, immediately above the Autocorrect button. I don't remember if Word for Mac has this menu or not.

                • 5. Re: Import Word file loses non-breaking hyphens
                  Mayerchak Level 1

                  Hi,

                   

                  I'm using Word Mac 2011. I have Word .docx files from an editor that are full of Nonbreaking Hyphens and the editor says they never intended to type them . . . they were apparently inserted automatically. But they are not "bad" per se . . . as long as we don't lose them.

                   

                  When I place text into InDesign, if I place the .docx file, the hyphens are stripped out and the characters run together. But, if I save it as a .doc file, they come in as nonbreaking hyphens.

                   

                  Here is what one looks like in Word:

                  Screen Shot 2013-10-15 at 11.10.18 AM.png

                  When I select Insert Symbol in Word, I don't get the palette that shows the Unicode values; it doesn't even show which character is selected. Here is what I get: (it doesn't matter what char is selected).

                   

                  Screen Shot 2013-10-15 at 10.46.03 AM.png

                   

                  I looked into my Word settings for autotext and autocorrect and they don't mention adding nonbreaking hyphens anywhere. I believe the editor is on a Windows platform, so perhaps in their version that is an option.

                   

                  Do you know where the controls are in Word for automatically inserting the nonbreaking hyphens?

                  • 6. Re: Import Word file loses non-breaking hyphens
                    Joel Cherney Adobe Community Professional & MVP

                    Do you know where the controls are in Word for automatically inserting the nonbreaking hyphens?

                     

                    I don't think there are such controls. Either someone is hitting the key combo accidentally, or there is a conversion error.

                     

                    According to the InDesign Secrets guide, "tilde with dot below" is a flush space. Doesn't seem to be what you are seeing here. That is what is in your screenshot, right? A hyphen marked with a tilde and a dot below? Doesn't seem very likely that your flush space has a strikethrough...

                     

                    If you open the Glyphs panel in InDesign and highlight the funky hyphen, then mouse-over the glyph in the Glyphs menu, a mouseover popup should tell you the Unicode name, as well as some other information. What is it?

                    • 7. Re: Import Word file loses non-breaking hyphens
                      Mayerchak Level 1

                      Sorry if I wasn't clear enough - those screenshots are from Word Mac 2011. 

                       

                      In InDesign, the character looks just like a regular hyphen. There is no code to be seen. If I open the Glyph panel and highlight the Nonbreaking Hyphen, the GID # is 13 in the Dante MT Pro Medium font and 15 in the Dante MT Pro Book font - exactly the same as it is for a regular (breaking) hyphen.

                       

                      The Unicode name for both is "HYPHEN-MINUS". Not much help there.

                       

                      However, if I search with find/change, it can tell the difference between the breaking and nonbreaking hyphens. I find normal ones by typing "-" and the nonbreaking by typing "^~" (or selecting it from the flyout menu, which inserts the same code).

                       

                      I can't see any difference, but somehow InDesign can tell them apart.

                      • 8. Re: Import Word file loses non-breaking hyphens
                        Joel Cherney Adobe Community Professional & MVP

                        Well! Now that's odd. I've not experienced that behavior at all. You're saying InDesign won't break at what looks like a perfectly normal HYPHEN-MINUS if, in Word, that hyphen was a nonbreaking hyphen? And that you can find the nonbreaking hyphens with Find/Change, but when you mouseover the glyphs it reports that it's not a nonbreaking hyphen? Bizarre.

                         

                        What version of InDesign are you using? I'll see if I can reproduce this issue over here.

                         

                         

                         

                        However: I'd still say that the way for you to get around this is to find the nonbreaking hyphens in Word and replace 'em with true-blue Unicode "NON-BREAKING HYPHEN" encoded at 2011, from a font that actually has a glyph at that codepoint.
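
If it helps to see the code points being discussed side by side, here is a small Python sketch using the standard `unicodedata` module. The helper name `describe` is just for illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The ordinary keyboard hyphen, the "real" non-breaking hyphen,
# and the soft (discretionary) hyphen.
for ch in ("\u002D", "\u2011", "\u00AD"):
    print(describe(ch))
```

The first two are the names that come up in this thread: HYPHEN-MINUS is the plain keyboard hyphen, and NON-BREAKING HYPHEN is the character at 2011.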

                        • 9. Re: Import Word file loses non-breaking hyphens
                          RBastier

                          Thanks for the answer re. saving .docx as .doc, that works perfectly. Surely this is a bug that should be looked at though? I mean, if importing .docs into InDesign worked fine without faffing around with Unicode etc., why shouldn't it work for .docxs?

                          • 10. Re: Import Word file loses non-breaking hyphens
                            JolinMasson Level 1

                            +1 for me, I have the exact same problem.

                            • 11. Re: Import Word file loses non-breaking hyphens
                              JeffArdoise Level 1

                              +1 for me too, I still think this is an issue with the import of .docx in ID. In Word 2016 for Mac, if you insert a non-breaking hyphen, using either the keyboard shortcut Microsoft tells you to use or the menu, ID should be able to see it and convert it to whatever its version of a non-breaking hyphen is. Converting to .doc is not an option for us.

                              • 12. Re: Import Word file loses non-breaking hyphens
                                Laubender Adobe Community Professional & MVP

                                Hi Jeff,

                                maybe you could do a little docx forensics and dissect the container file (your docx) by unzipping it and find out how this special character is defined within the XML structure there?
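
A rough Python sketch of that kind of forensics, in case it is useful. It assumes the special character lives in `word/document.xml` and that Word writes its own non-breaking hyphen as a `w:noBreakHyphen` element in the standard WordprocessingML namespace; the function name and the dictionary keys are invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_hyphen_kinds(docx_path):
    """Count how hyphen-like specials are stored in word/document.xml."""
    with zipfile.ZipFile(docx_path) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))

    counts = {
        "word_no_break_hyphen": 0,  # <w:noBreakHyphen/> elements
        "word_soft_hyphen": 0,      # <w:softHyphen/> elements
        "unicode_u2011": 0,         # literal U+2011 inside <w:t> text
    }
    for element in root.iter():
        if element.tag == f"{{{W_NS}}}noBreakHyphen":
            counts["word_no_break_hyphen"] += 1
        elif element.tag == f"{{{W_NS}}}softHyphen":
            counts["word_soft_hyphen"] += 1
        elif element.tag == f"{{{W_NS}}}t" and element.text:
            counts["unicode_u2011"] += element.text.count("\u2011")
    return counts
```

Running it on a file that loses hyphens on import and on one that does not should show whether the difference is an element versus a character. Headers, footers and footnotes sit in other parts of the package and are not inspected here.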

                                 

                                Would it be an option to search/replace the special character in Word with something unique like a combination of §$§ perhaps?
                                After import to InDesign you could replace that with the appropriate character.

                                 

                                I understand that re-saving the docx in doc file format is not an option.

                                That would perhaps trigger different bugs with InDesign's import filter.

                                 

                                Regards,
                                Uwe

                                • 13. Re: Import Word file loses non-breaking hyphens
                                  JeffArdoise Level 1

                                  Hi Uwe,

                                   

                                  I did make a script that goes through the Word file and replaces those non-breaking hyphens with regular hyphens. BUT, since several of us have to do this whenever a Word file comes in, sometimes the script isn't run and we get feedback from the client saying some hyphens are missing, which is not good. Since I'm sure we are not the only ones having this issue with Word files provided by clients, I'm just surprised that Adobe hasn't fixed this on their end, and I'm pretty sure it would be pretty straightforward for them to do. Just saying ;-)

                                  • 14. Re: Import Word file loses non-breaking hyphens
                                    Laubender Adobe Community Professional & MVP

                                    Hm. I doubt that fixing bugs in the Word docx import filter currently gets high priority.
                                    So stay with your workflow and resolve the problem at the Word docx stage.

                                     

                                    Maybe you could expand the script for Word a bit to e.g. change the name of the docx with an addition after doing its main job.


                                    And then you have the chance to write a startup script for InDesign that checks whether this addition is in the name of the docx before import.
                                    If not, you could alert the user. Or, if you want to be super restrictive, not allow the docx to be imported at all.
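
The naming convention itself could look something like the Python sketch below. The `_nbh-cleaned` marker and both function names are hypothetical. The real InDesign startup script would be written in InDesign's own scripting language rather than Python, so the second function only illustrates the check it would need to make.

```python
from pathlib import Path

# Hypothetical marker appended to the file name once the cleanup has run.
CLEANED_MARKER = "_nbh-cleaned"


def mark_as_cleaned(docx_path):
    """Rename e.g. report.docx to report_nbh-cleaned.docx; return the new path."""
    path = Path(docx_path)
    target = path.with_name(f"{path.stem}{CLEANED_MARKER}{path.suffix}")
    path.rename(target)
    return target


def is_marked_as_cleaned(docx_path):
    """Return True if the file name (without extension) ends with the marker."""
    return Path(docx_path).stem.endswith(CLEANED_MARKER)
```

The cleanup script would call `mark_as_cleaned` as its last step, and the check before import would warn or refuse when `is_marked_as_cleaned` returns False.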

                                     

                                    Regards,
                                    Uwe