Parse text and hyperlink

Report · Mar 19, 2008

I posted this previously but did not find a solution. I have found regexes in the past that might do it, but I can't find the links to them anymore. One was on houseoffusion.com, but I just can't locate it.

What I want to do is take some text from a query and parse it for any URLs, namely anything that begins with HTTP://. The ending character is a space, which would be present after the URL.

I don't know anything else about the actual link as they will all be different, but I can say that the text will potentially have more than one link within it.

Does anybody have a solution to hand? Or can you point me to a suitable solution? Maybe one of the regex?

Thanks

Mark

Report · Mar 19, 2008

ACS LLC wrote:
>
> Does anybody have a solution to hand? Or can you point me to a suitable
> solution? Maybe one of the regex?
>
> Thanks
>
> Mark
>

I think this should get you close.

" HTTP://.*? "
            ^ [ ] ending space character
           ^ [?] non-greedy, i.e. match the shortest possible string
          ^ [*] match zero or more of the previous character
         ^ [.] match any character
 ^ match the string " HTTP://"

Report · Mar 19, 2008

How would this be implemented in a CF command? I'm not too familiar with this particular process.

Report · Mar 19, 2008

ACS LLC wrote:
> how would this be implemented into a CF command?? I'm not too familiar with this particular process

<cfset regex = " HTTP://.*? ">
<cfset string = "The quick brown http://www.fox.com jumped over the lazy dog.">
<cfset links = reFindNoCase(regex, string, 1, true)>

With the fourth argument set to true, this will return a structure holding
two arrays, pos and len, which are the starting character and length of
the first match. (reFindNoCase is used because the sample link is lower
case while the regex is upper case.) So you would then use something like
the mid() function to parse the actual matching sub-string from the
parent string.
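As a rough sketch of the mid() step described above, assuming the regex is given a parenthesised group around the link itself (an addition for illustration, not part of the post) and that reFindNoCase is called with its fourth argument set to true:

```cfml
<cfscript>
// Group 1 wraps only the link, so the trailing space is not part of it.
regex = "(http://.*?) ";
string = "The quick brown http://www.fox.com jumped over the lazy dog.";

link = "";

// With returnsubexpressions = true the result is a structure with pos and len arrays.
// Element 1 describes the whole match; element 2 describes the first group.
match = reFindNoCase(regex, string, 1, true);

// pos[1] is 0 when nothing matched.
if (match.pos[1] GT 0) {
    link = mid(string, match.pos[2], match.len[2]);
}
</cfscript>

<cfoutput>#link#</cfoutput>
```

This only picks out the first link; finding every link would need a loop that restarts the search after each match.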

Report · Mar 19, 2008

umm. Not sure if that will do the job

Report · Mar 19, 2008

ACS LLC wrote:
> umm. Not sure if that will do the job

Then what is the 'job' and why won't this do it?

Report · Mar 19, 2008

I need to do a replace on the hyperlinks: find them, take the value, and replace it with HTML, so:

a) find all links (beginning with HTTP://)

b) replace with <a href="thelink" target="_blank">thelink</a>

I'm not quite sure whether I can then run the replace over an unspecified number of links.

Report · Mar 19, 2008

If you just need a regex replace, that is a bit more straightforward.

<cfscript>
regex = "(http://.*?) ";
string = "The quick brown http://www.fox.com jumped over the http://www.lazy.net dog.";

// The pattern consumes the trailing space, so put it back after the closing tag.
newString = reReplaceNoCase(string,regex,'<a href="\1">\1</a> ',"ALL");
</cfscript>

For fun, I played with this a bit, and here is some code that
demonstrates both finding each link and the simple replace above. You
will probably want to expand the regex to cover cases where the
link is not in the middle of a sentence and thus is followed by
something other than a space, such as a period.

<cfscript>
regex = "(http://.*?) ";
string = "The quick brown http://www.fox.com jumped over the http://www.lazy.net dog.";

//LOGIC TO FIND EACH INSTANCE.
links = arrayNew(1);
start = 1;
link = reFindNoCase(regex,string,start,true);

while (link.pos[1] NEQ 0) {
	//pos[2]/len[2] describe the parenthesised group, i.e. the link without the trailing space.
	arrayAppend(links,mid(string,link.pos[2],link.len[2]));

	//Resume searching just past the whole match.
	start = link.pos[1] + link.len[1];
	link = reFindNoCase(regex,string,start,true);
}

//LOGIC TO REPLACE EACH INSTANCE.
//The pattern consumes the trailing space, so put it back after the closing tag.
newString = reReplaceNoCase(string,regex,'<a href="\1">\1</a> ',"ALL");
</cfscript>

<cfdump var="#links#">
<hr />
<cfoutput>#htmlCodeFormat(newString)#</cfoutput>

Report · Mar 19, 2008

Nice job 🙂 I will test this later this evening; I just have a couple of critical jobs to get out of the way. I'll let you know how it goes.

Thanks

Report · Mar 19, 2008

It almost worked :)

Well, actually it did, but it needs more info in the regex. As you mentioned, it needs to cover situations where there is a period instead of a space after the hyperlink; otherwise it hyperlinks the rest of the text.

The other issue I had is returns in the text. I had a link on one line with text under it, but the text carried on displaying as a hyperlink.

This is what I was using initially to replace the returns so that the <br> tags went in:

#Replacenocase(GetMessage.message_body, "
", "<br>", "ALL")#

What I meant to say at the end of this message is, do you know how to add a period and return to this? Regex is not my thing at all :(

Thanks!
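One possible way to cover both cases raised above, a period and a return right after the link, is sketched below. It is not from the thread: it assumes the link should stop at any whitespace (not only a space) and should not end on common punctuation. The variable names and sample text are hypothetical stand-ins for GetMessage.message_body.

```cfml
<cfscript>
// Hypothetical sample text: one link followed by a period and a return,
// one link in the middle of a sentence.
body = "See http://www.fox.com." & chr(13) & chr(10) & "Then visit http://www.lazy.net today.";

// http:// followed by non-whitespace characters, where the last character
// is neither whitespace nor trailing punctuation.
regex = "(http://[^[:space:]]*[^[:space:].,;:!?])";

// Step 1: wrap each link in an anchor tag. The pattern consumes nothing
// after the link, so the following space, period or return is left in place.
linked = reReplaceNoCase(body, regex, '<a href="\1" target="_blank">\1</a>', "ALL");

// Step 2: only now convert returns to <br>. Doing this first would glue
// "<br>" and the next word onto the link, because "<br>" contains no whitespace.
linked = replace(linked, chr(13) & chr(10), chr(10), "ALL");
linked = replace(linked, chr(10), "<br>", "ALL");
</cfscript>

<cfoutput>#linked#</cfoutput>
```

The order of the two steps is the main design choice here. The list of punctuation characters is a guess at what commonly follows a link in message text and may need adjusting.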

Adobe Community
