
Turning off Hyphenation for URLs

Nov 29, 2012 4:13 PM

Tags: #no #hyperlink #language #urls #hyphenation

Hello,

 

There was a fairly old trick for turning off hyphenation of URLs, but it no longer seems to work. I used the following GREP find to find and style my URLs:

 

((http://www\.|www\.|http://)\w+(\.|/)(\w+(\.|/))*\w+)|[\l\u\d_%-]+@[\l\u\d_%-]+(\.|/)\w+(\.|/)\w*

 

But when I apply a character style set to [No Language], the URL still hyphenates, as you can see in the attached screenshot. The URL in question is: http://thewallstreetjournal.org
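For anyone who wants to confirm that the find itself is not the problem, here is a rough Python translation of the GREP above. It is an approximation, not InDesign's engine: Python's `re` has no `\l` (lowercase) or `\u` (uppercase) classes, so those are written as the ASCII ranges `a-z` and `A-Z`. The translation does pick up the URL from the post, which suggests the find is catching it and the hyphenation comes from somewhere else.

```python
import re

# Rough Python translation of the GREP in the post.
# Assumption: InDesign's \l and \u (lowercase/uppercase letter) are
# approximated here with the ASCII ranges a-z and A-Z.
URL_OR_EMAIL = re.compile(
    r"((http://www\.|www\.|http://)\w+(\.|/)(\w+(\.|/))*\w+)"
    r"|[a-zA-Z\d_%-]+@[a-zA-Z\d_%-]+(\.|/)\w+(\.|/)\w*"
)

line = "The URL in question is: http://thewallstreetjournal.org"
print(URL_OR_EMAIL.search(line).group(0))  # http://thewallstreetjournal.org

# The e-mail branch requires two "." or "/" after the @, so a plain
# name@domain.tld address is not caught, while a longer one is:
print(URL_OR_EMAIL.search("jamil@example.com"))              # None
print(URL_OR_EMAIL.search("jamil@example.co.uk").group(0))   # jamil@example.co.uk
```

Whether InDesign's GREP behaves identically on the e-mail branch has not been tested here; the last two lines only show how the pattern reads in Python.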

 

Does anyone know why this is?

 

[Attached screenshot: no-hyphenation.png]

 
Replies
    Nov 30, 2012 1:58 AM   in reply to Jamil Jonna

    You are probably stretching the possibilities too far.

     

    InDesign does not like hyphenating [No Language] text, because it doesn't know how to. On the other hand ... "thewallstreetjournal" is a single 'word', and you can see the effect of not hyphenating it by applying No Break. Since it's the 4th line of a very tight column, the spacing is going to be awful. InDesign weighed "godawful spacing" against "inserting a random hyphen" and chose the Least Worst Solution.
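The trade-off described above can be sketched as a toy scoring model. InDesign's actual composer rules are not published, so this borrows the shape of TeX's documented line-breaking scheme instead: badness grows roughly as 100 times the cube of the stretch ratio, and each line costs (line penalty + badness)² + break penalty². Every stretch ratio and penalty below is hypothetical. With a finite hyphen penalty, the cheap hyphen beats a badly stretched line; if [No Language] is treated as a forbidden break (an infinite penalty), only the stretched line is left.

```python
# Toy "least worst" line-breaking model, loosely following TeX's scheme.
# Hypothetical numbers throughout; this is NOT InDesign's algorithm.

FORBIDDEN = float("inf")  # a break that must never be taken


def badness(stretch_ratio: float) -> int:
    # Roughly TeX's badness: cube of how far the spaces stretch, capped.
    return min(10000, int(100 * stretch_ratio ** 3))


def demerits(stretch_ratio: float, penalty: float, line_penalty: int = 10) -> float:
    # TeX-style cost of one line: (line_penalty + badness)^2 + penalty^2.
    if penalty == FORBIDDEN:
        return FORBIDDEN
    return (line_penalty + badness(stretch_ratio)) ** 2 + penalty ** 2


def choose(hyphen_penalty: float) -> str:
    # Option A: keep the URL whole, so the word spaces stretch a lot.
    keep_whole = demerits(stretch_ratio=2.0, penalty=0)
    # Option B: hyphenate inside the URL, so spacing stays near normal.
    hyphenate = demerits(stretch_ratio=0.4, penalty=hyphen_penalty)
    return "hyphenate" if hyphenate < keep_whole else "keep whole"


print(choose(hyphen_penalty=50))         # hyphenate
print(choose(hyphen_penalty=FORBIDDEN))  # keep whole
```

In this model the complaint in the thread amounts to [No Language] behaving like a large but finite penalty rather than a forbidden break.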

     
    Nov 30, 2012 3:04 AM   in reply to Jamil Jonna

    I tested this on a rather long URL.

     

    The premise is that a URL can't contain spaces, so this looks for everything attached to the "www", from the preceding space right up to the very end of the URL (the next space).

     

    (?<=\s).+?www.+?(?=\s)

     

    It probably won't work in all cases, though.
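The pattern above also runs unchanged in Python's `re` module, which is a handy way to probe the "won't work in all cases" caveat outside InDesign (the two engines may still differ in edge cases). Because `.` also matches spaces, the match can start at an earlier word and run forward to the "www". The variant below is not from the thread: it uses `\S` so the match stays inside one space-delimited run, and `\S*?` so a bare www address with nothing in front of it is caught too.

```python
import re

# The pattern from the post.
ORIGINAL = re.compile(r"(?<=\s).+?www.+?(?=\s)")

# Hypothetical variant: \S keeps the match inside a single run of
# non-space characters; \S*? allows "www" to start the match.
NO_SPACES = re.compile(r"(?<=\s)\S*?www\S+?(?=\s)")

text = "Go to http://www.example.com now"
print(ORIGINAL.findall(text))   # ['to http://www.example.com']
print(NO_SPACES.findall(text))  # ['http://www.example.com']

bare = "See www.example.com now"
print(ORIGINAL.findall(bare))   # []
print(NO_SPACES.findall(bare))  # ['www.example.com']
```

Both versions still need whitespace after the URL, so a URL at the very end of a paragraph is missed, and a URL followed by a period keeps the period.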

     
    Nov 30, 2012 4:28 AM   in reply to Jamil Jonna

    You can use a GREP style in your paragraph styles:

     

    [\l\u\d\.]+@[\l\u\d\.]+

     

    The character style should have No Break turned on (in Basic Character Formats); this will work.

     

     

    simon
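A quick Python check of what the GREP in the reply above catches, with `\l\u` approximated as the ASCII ranges `a-z` and `A-Z` (an assumption, since Python's `re` has no `\l` or `\u`). It targets e-mail addresses only, and because hyphens are not in the character class, part of a hyphenated address falls outside the No Break span.

```python
import re

# The e-mail GREP from the reply, with \l\u approximated as ASCII letters.
EMAIL = re.compile(r"[a-zA-Z\d\.]+@[a-zA-Z\d\.]+")

print(EMAIL.findall("Write to jamil.jonna@example.com for details"))
# ['jamil.jonna@example.com']
print(EMAIL.findall("Write to first-last@example.com for details"))
# ['last@example.com']  (the hyphen is not in the class)
print(EMAIL.findall("See http://thewallstreetjournal.org"))
# []  (no @, so plain URLs are not matched)
```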

     
    Nov 30, 2012 11:31 AM   in reply to Jamil Jonna

    It is certainly a pain in the neck (and probably breaks the hyperlink if you expect it to auto-generate, as in Reader, for example), but have you considered adding your own discretionary line breaks to URLs to control how they break?

     
    Nov 30, 2012 12:34 PM   in reply to Jamil Jonna

    > @Jongware "InDesign does not like hyphenating [No Language] text, because it doesn't know how to." I couldn't care less whether InDesign "doesn't like" hyphenating No Language: the point is, it shouldn't, since it has no logical basis for doing so.

     

    I share your beef with InDesign's tendency to hyphenate stuff that it should not hyphenate (like things marked with No Language), but I think you are misreading Jongware's brevity. When he says "doesn't like" I think he means that there is a hierarchy of techniques for altering composition which the Adobe Paragraph Composer will use, and while hyphenation of No Language should be completely forbidden from our point of view, it is instead on the list of possible composition-altering tools that the Paragraph Composer will use. I don't think he'd disagree with your claim that the Paragraph Composer should not hyphenate No Language-marked text; he is simply saying that it does when you push it.

     

    In terms of resolving your problem: I created a No Break character style, and then applied it with a GREP style using your GREP. It prevented URLs from breaking at all. Then I sat down to adjust your GREP query to add some discretionary line breaks, and then Peter posted his suggestion. I haven't been able to adjust your GREP query to add discretionary line breaks perfectly, but I think that this:

     

    Find: ([\l\u\d]+)(\.)([\l\u]+)

    Replace with: $1$2~k$3

     

    will add discretionary line breaks to most of your URLs, and this GREP style

     

    Apply Style: No Break

    To Text: [\l\u]+\.

     

    will permit hyphenation in non-URL content, yet prevent hyphenation of URLs. It worked perfectly in my test, but you will almost certainly need to fine-tune your initial GREP query to catch all of your potential %20-containing URL permutations. The only URL in my sample was actually your WSJ URL, and I see that your URL-finding GREP query accounts for far more permutations than the simple "letters separated by a period" that mine finds.
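Both GREPs can be tried out in Python under two assumptions: InDesign's `\l` and `\u` are approximated as ASCII `a-z` and `A-Z`, and the Discretionary Line Break that `~k` inserts is represented by a zero-width space (U+200B). The find/replace consumes the word after each dot, so in a three-part address only the first dot gets a break, which fits the "most of your URLs" caveat. The lookaround variant is not from the thread; it breaks after every dot. The last function shows which text the GREP style (taking `[\l\u]+\.` as the intended pattern) would put under No Break.

```python
import re

# Assumption: ~k (Discretionary Line Break) is modeled as U+200B.
ZWSP = "\u200b"


def add_breaks(text: str) -> str:
    # Find ([\l\u\d]+)(\.)([\l\u]+), Replace $1$2~k$3
    return re.sub(r"([a-zA-Z\d]+)(\.)([a-zA-Z]+)",
                  lambda m: m.group(1) + m.group(2) + ZWSP + m.group(3),
                  text)


def add_breaks_every_dot(text: str) -> str:
    # Hypothetical variant: lookarounds consume nothing, so every
    # "letter-or-digit, dot, letter" position gets a break.
    return re.sub(r"(?<=[a-zA-Z\d]\.)(?=[a-zA-Z])", ZWSP, text)


def no_break_spans(text: str) -> list:
    # Text the GREP style [\l\u]+\. would put under No Break.
    return re.findall(r"[a-zA-Z]+\.", text)


print(add_breaks("www.thewallstreetjournal.com").split(ZWSP))
# ['www.', 'thewallstreetjournal.com']
print(add_breaks_every_dot("www.thewallstreetjournal.com").split(ZWSP))
# ['www.', 'thewallstreetjournal.', 'com']
print(no_break_spans("Visit www.thewallstreetjournal.com today."))
# ['www.', 'thewallstreetjournal.', 'today.']
```

The last line shows a side effect: a word that ends a sentence also matches `[\l\u]+\.`, so it is kept from hyphenating too.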

     

    I don't know why your old method stopped working, to be honest. I suspect it is because of behind-the-scenes changes in the way the Paragraph Composer operates. I am so sick of improper hyphenation of stuff marked with No Language, and in general of Proximity algorithmic hyphenation, especially in non-English languages, that my general advice amounts to "Turn off all hyphenation everywhere unless it's absolutely necessary."

     
    Nov 30, 2012 1:15 PM   in reply to Joel Cherney

    > When he says "doesn't like" I think he means that there is a hierarchy of techniques for altering composition which the Adobe Paragraph Composer will use, and while hyphenation of No Language should be completely forbidden from our point of view, it is instead on the list of possible composition-altering tools that the Paragraph Composer will use.

     

    I came to that statement by looking at the position of the URL inside the paragraph. Not breaking it would most likely stretch word spacing beyond a reasonable limit. Disagreeing with Adobe, however, on "what's reasonable", and on what rules can be bent and what rules can be broken, is perfectly all right. There is only one catch: you can disagree all you want, but it won't change a thing.

     

    If you think it would be better to break *anywhere*, as long as no hyphen is inserted, there is actually a special character for that: look for Discretionary Line Break.

     
    Nov 30, 2012 2:31 PM   in reply to Jamil Jonna

    > I changed to the newer composer available in CS6, so I'll just try switching back. We'll see if [No Language] behaves as expected. There may be other instances where I need such behavior, which is why I'm interested.

     

    I have been using the World-Ready Composer since its unadvertised, undocumented, under-the-hood introduction in CS4. I have noticed no differences whatsoever between Latin-script paragraph composition in the ordinary Paragraph Composer and the World-Ready Composer. So I don't think that the WRC is going to get you what you want here.

     

    And, thinking about it again, I don't know whether the behavior of the Paragraph Composer has changed or not. Because the rules by which it composes paragraphs are not, to my knowledge, available to us as end users, we can't know for sure. So I tried some experiments: I made a one-page-sized text frame full of lorem ipsum. I set the language to English so it'd hyphenate. I then set the text "www.thewallstreetjournal.com" and applied No Language to it. I then sprinkled that URL throughout the sample text. I then duplicated this file and made separate files for CS3, CS4, CS5.5, and CS6. I then opened each one in its respective version and manually resized the text frame to make it very narrow. As you can see, I'm trying to recreate your issue here.

     

    At the point at which I would have expected .thewallstreet. to hyphenate, I got... an "overset text" marker. I duplicated your environment as far as I could, but could not recreate your issue. In each version of InDesign, the No Language setting did exactly what you wanted.

     

    So, there are very few possibilities, here. But so far as I can tell, even on CS6 the "no language" setting still behaves as I would expect. I'm flabbergasted, honestly; I thought your issue would be easy to recreate. Can you maybe share the file in which this is happening? Or at least save out to .idml and reopen to see if this is being caused by corruption in your document?

     
    Nov 30, 2012 3:35 PM   in reply to Joel Cherney

    Eep. Never mind. </gildaradner>

     

    I just tried to recreate your example, even going so far as to OCR the text in your screenshot. I misread your last post - I thought that you were trying out the WRC to see if it'd fix your problem. I now see that using the WRC actually induced this issue, and yes, for sure you have found a bug in the WRC. It seems to be ignoring the "No Language" criteria and hyphenates the URL when it shouldn't.

     

    Furthermore, it's a new bug in CS6 in the WRC - when I do the same test in CS5.5, all four composers (I didn't try either Japanese composer) respect the No Language setting and refuse to hyphenate the URL.

     