9 Replies Latest reply on Mar 19, 2008 2:12 PM by ACS LLC

    Parse text and hyperlink

    ACS LLC Level 1
      I posted this previously but did not find a solution. In the past I have found regexes that might do it, but I can't find the links to them anymore; one was on houseoffusion.com, but I just can't locate it.

      What I want to do is take some text from a query and parse it for any URLs, namely anything that begins with HTTP://, where the ending character is the space that follows the URL.

      I don't know anything else about the actual links, as they will all be different, but the text may contain more than one link.

      Does anybody have a solution to hand, or can you point me to a suitable one? Maybe one of those regexes?

      Thanks

      Mark
        • 1. Re: Parse text and hyperlink
          Level 7
          ACS LLC wrote:
          >
          > Does anybody have a solution to hand? Or can you point me to a suitable
          > solution? Maybe one of the regex?
          >
          > Thanks
          >
          > Mark
          >


          I think this should get you close.

          " HTTP://.*? "
            " HTTP://"  match the literal string " HTTP://" (including the leading space)
            .           match any character
            *           match zero or more of the previous item
            ?           non-greedy, i.e. match the shortest possible string
            " "         the ending space character
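A small Python sketch of what the `?` changes, using Python's `re` module as a stand-in for ColdFusion's regex engine (the lazy `*?` quantifier behaves the same way in both for this pattern). The sample text is made up; it contains two links so the greedy and non-greedy versions of the pattern give different results.

```python
import re

# Sample text with two uppercase links (made up for illustration).
text = "see HTTP://www.fox.com and HTTP://www.lazy.net for details"

# Greedy: .* grabs as much as it can, so the match runs to the LAST space.
greedy = re.search(r" HTTP://.* ", text).group(0)

# Non-greedy: .*? stops at the FIRST space after "HTTP://".
lazy = re.search(r" HTTP://.*? ", text).group(0)

print(repr(greedy))  # ' HTTP://www.fox.com and HTTP://www.lazy.net for '
print(repr(lazy))    # ' HTTP://www.fox.com '
```

The pattern as written is case-sensitive, so it will not match a lowercase http:// unless the search ignores case (reFindNoCase in ColdFusion, re.IGNORECASE in Python). It also needs a space before and after the link, so a link at the very start or end of the text is missed.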

          • 2. Re: Parse text and hyperlink
            ACS LLC Level 1
            How would this be implemented in CF code? I'm not too familiar with this particular process.
            • 3. Re: Parse text and hyperlink
              Level 7
              ACS LLC wrote:
              > how would this be implemented into a CF command?? I'm not too familiar with this particular process

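              <cfset regex = " HTTP://.*? ">
              <cfset string = "The quick brown http://www.fox.com jumped over the lazy dog.">
              <!--- reFindNoCase ignores case, so " HTTP://" also matches " http://".
                    With the fourth argument (returnSubExpressions) set to true it returns
                    a structure of pos and len arrays instead of a single position. --->
              <cfset links = reFindNoCase(regex, string, 1, true)>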

              When its fourth argument (returnSubExpressions) is true, reFind() and
              reFindNoCase() return a structure with two arrays, pos and len, holding
              the starting position and length of the match (and of any
              subexpressions). You would then use something like the mid() function
              to extract the matching substring from the parent string.
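As a rough Python analogy for the pos/len/mid() steps (not ColdFusion code): `m.start()` and the match length play the roles of pos[1] and len[1], and slicing plays the role of mid(). Python counts from 0 and ColdFusion from 1, so the sketch adds 1 to report a CF-style position; the `mid` helper is hypothetical and only imitates CF's 1-based mid().

```python
import re

regex = r" HTTP://.*? "
string = "The quick brown http://www.fox.com jumped over the lazy dog."

# re.IGNORECASE mirrors reFindNoCase: "HTTP://" also matches "http://".
m = re.search(regex, string, re.IGNORECASE)

pos = m.start() + 1            # 1-based start, like pos[1]
length = m.end() - m.start()   # like len[1]

def mid(s, start, count):
    """Hypothetical helper imitating CF's 1-based mid(string, start, count)."""
    return s[start - 1:start - 1 + count]

found = mid(string, pos, length)
print(pos, length, repr(found))  # 16 20 ' http://www.fox.com '
```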

              • 4. Re: Parse text and hyperlink
                ACS LLC Level 1
                Umm, I'm not sure that will do the job.
                • 5. Re: Parse text and hyperlink
                  Level 7
                  ACS LLC wrote:
                  > umm. Not sure if that will do the job


                  Then what is the 'job' and why won't this do it?
                  • 6. Re: Parse text and hyperlink
                    ACS LLC Level 1
                    I need to do a replace on the hyperlinks: find them, take each value, and replace it with HTML, so:

                    a) find all links (beginning with HTTP://)

                    b) replace them with <a href="thelink" target="_blank">thelink</a>

                    I'm not sure whether I can then use this to run the replace over an unspecified number of links.
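Running the replace over any number of links is what a global regex replace does (the "ALL" scope of ColdFusion's reReplace/reReplaceNoCase). A hedged Python sketch of steps a) and b), assuming the links should open in a new window via target="_blank" and that each link ends at the first whitespace character; `linkify` is a hypothetical name.

```python
import re

# \S+ means "one or more non-whitespace characters", so each link ends
# at the first space, tab or line break (or at the end of the text).
LINK = re.compile(r"http://\S+", re.IGNORECASE)

def linkify(text):
    """Hypothetical helper: wrap every http:// link in an anchor tag."""
    # re.sub replaces ALL matches, however many there are;
    # \g<0> stands for the whole matched link.
    return LINK.sub(r'<a href="\g<0>" target="_blank">\g<0></a>', text)

print(linkify("one http://a.com two http://b.net three"))
```

Because `\S+` stops before the space instead of consuming it, the spaces between words are left untouched.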

                    • 7. Re: Parse text and hyperlink
                      Level 7
                      If you just need a regex replace, that is a bit more straightforward.

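                      <cfscript>
                      // Capture "http://" plus the shortest run of characters up to the next space.
                      regex = "(http://.*?) ";
                      string = "The quick brown http://www.fox.com jumped over the http://www.lazy.net dog.";

                      // \1 is the captured URL. The regex also consumes the trailing space,
                      // so the replacement puts it back after </a>.
                      newString = reReplaceNoCase(string, regex, '<a href="\1">\1</a> ', "ALL");
                      </cfscript>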


                      For fun, I played with this a bit, and here is some code that
                      demonstrates both finding each match and the simple replace above.
                      You will probably want to expand the regex to cover links that are
                      not in the middle of a sentence and are therefore followed by
                      something other than a space, such as a period.
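One possible way to expand the regex, sketched in Python; the pattern is my own suggestion, not something from this thread. It matches the shortest run of non-whitespace after http:// that is followed by optional trailing punctuation and then whitespace or the end of the text, so a period or comma after a link stays outside it. I believe ColdFusion's regex engine also supports the `(?=...)` lookahead used here, but treat that as an assumption to check against your CF version.

```python
import re

# Suggested pattern (not from the thread):
#   http://              the literal scheme
#   \S+?                 the shortest run of non-whitespace characters...
#   (?=[.,;:!?)]*        ...that is followed by optional trailing punctuation
#      (?:\s|$))         and then whitespace or the end of the text.
LINK = re.compile(r"http://\S+?(?=[.,;:!?)]*(?:\s|$))", re.IGNORECASE)

text = "Visit http://www.fox.com. Then try http://www.lazy.net, or not."
print(LINK.findall(text))  # ['http://www.fox.com', 'http://www.lazy.net']
```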

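                      <cfscript>
                      regex = "(http://.*?) ";
                      string = "The quick brown http://www.fox.com jumped over the http://www.lazy.net dog.";

                      //LOGIC TO FIND EACH INSTANCE.
                      links = arrayNew(1);
                      start = 1;
                      link = reFindNoCase(regex, string, start, true);

                      // pos[1]/len[1] describe the whole match (including the trailing space);
                      // pos[2]/len[2] describe the first subexpression, i.e. just the URL.
                      while (link.pos[1] NEQ 0) {
                          arrayAppend(links, mid(string, link.pos[2], link.len[2]));

                          // Resume searching just after the current match.
                          start = link.pos[1] + link.len[1];
                          link = reFindNoCase(regex, string, start, true);
                      }

                      //LOGIC TO REPLACE EACH INSTANCE.
                      // Put back the space that the regex consumed after each link.
                      newString = reReplaceNoCase(string, regex, '<a href="\1">\1</a> ', "ALL");
                      </cfscript>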

                      <cfdump var="#links#">
                      <hr />
                      <cfoutput>#htmlCodeFormat(newString)#</cfoutput>
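For comparison, the same find-each-match loop in Python (an analogy, not a translation of any CF built-in): `Pattern.search(string, pos)` plays the role of reFindNoCase with a start position, `m.end()` the role of pos[1] + len[1], and `group(1)` returns just the captured URL, like reading pos[2]/len[2].

```python
import re

regex = re.compile(r"(http://.*?) ", re.IGNORECASE)
string = "The quick brown http://www.fox.com jumped over the http://www.lazy.net dog."

links = []
start = 0
m = regex.search(string, start)    # like reFindNoCase(regex, string, start, true)
while m is not None:               # like link.pos[1] NEQ 0
    links.append(m.group(1))       # captured URL only, without the trailing space
    start = m.end()                # like start = link.pos[1] + link.len[1]
    m = regex.search(string, start)

print(links)  # ['http://www.fox.com', 'http://www.lazy.net']
```

In everyday Python, `regex.findall(string)` performs this loop for you.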

                      • 8. Re: Parse text and hyperlink
                        ACS LLC Level 1
                        Nice job :) I will test this later this evening; I just have a couple of critical jobs to get out of the way. I'll let you know how it goes.

                        Thanks
                        • 9. Parse text and hyperlink
                          ACS LLC Level 1
                          It almost worked :)

                          Well, actually it did, but the regex needs more work. As you mentioned, it needs to cover situations where the hyperlink is followed by a period instead of a space; otherwise it hyperlinks the rest of the text.

                          The other issue I had is line breaks in the text: I had a link on one line with text under it, but the text under it carried on displaying as part of the hyperlink.

                          This is what I was using initially to replace the line breaks so that the <br> tags went in:

                          #replaceNoCase(GetMessage.message_body, chr(10), "<br>", "ALL")#

                          What I meant to ask at the end of this message is: do you know how to add a period and a line break to this regex? Regex is not my thing at all :(

                          Thanks!
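A likely cause of the line-break problem: a pattern that only stops at a space runs straight through a line break, or through a <br> if the newlines were converted first, and on into the next line. One hedged way around it, sketched in Python with an illustrative pattern and a hypothetical `format_message` function: linkify while the real line breaks are still in the text, using a pattern that stops at any whitespace and leaves a trailing period or comma outside the link, and only then turn the line breaks into <br> tags.

```python
import re

# Illustrative pattern: stop at any whitespace (space, tab, newline),
# and leave a trailing period or comma outside the link.
LINK = re.compile(r"http://\S+?(?=[.,]*(?:\s|$))", re.IGNORECASE)

def format_message(body):
    """Hypothetical helper: linkify first, then convert line breaks."""
    # 1. Linkify while the real line breaks are still in the text.
    html = LINK.sub(r'<a href="\g<0>" target="_blank">\g<0></a>', body)
    # 2. Only then turn line breaks into <br> tags
    #    (\r?\n covers both Windows and Unix line endings).
    return re.sub(r"\r?\n", "<br>", html)

print(format_message("See http://www.fox.com\nText under the link."))
```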