Hi,
I am trying to extract an email address from HTML that is returned in a query column. How would I do that?
I am on CF9.
example:
Query col1:
<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>
Regular Expressions are often the tool to use for that kind of string manipulation.
ColdFusion has the reFind() and reReplace() functions to tap into a large part of the power of Regular Expressions.
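As a rough sketch of how reFind() can be used here: the snippet below assumes the address is always preceded by "mailto:", as in the sample string, and treats whitespace, parentheses and angle brackets as the end of the address. The variable names (html, match, email) are made up for illustration; in practice html would hold the value of the query column.

```cfm
<!--- Sample value, copied from the question; in real code this would come from the query column --->
<cfset html = '<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>'>

<!--- With the fourth argument set to true, reFindNoCase returns a struct holding
      pos and len arrays: element 1 is the whole match, element 2 the first sub-expression --->
<cfset match = reFindNoCase("mailto:([^\s()<>]+@[^\s()<>]+)", html, 1, true)>

<cfif match.pos[1] gt 0>
    <!--- Pull out only the sub-expression, i.e. the part after "mailto:" --->
    <cfset email = mid(html, match.pos[2], match.len[2])>
    <cfoutput>#email#</cfoutput>
<cfelse>
    <cfoutput>No address found</cfoutput>
</cfif>
```

For the sample string this should output xxx@hotmail.com. The pattern is deliberately loose about what counts as an address; it relies on the "mailto:" prefix rather than on validating the address itself.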
I cannot set up the regular expression. I need some samples.
Here are some resources to help get you started using regular expressions:
The CF documentation
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7ffb.html
Tutorial website
http://www.regular-expressions.info/
Ben Forta's ColdFusion books cover regular expressions, at least in the CF6, 7, and 8 editions that I own.
http://www.forta.com/books/
Here's a function I wrote for use on some of our CF sites:
<cffunction access="public" name="isEmailAddressValid" returntype="boolean">
    <cfargument name="email" type="string" required="yes">
    <cfif refindnocase("^([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|coop|info|museum|name)))?$",arguments.email) neq 0>
        <cfreturn true>
    <cfelse>
        <cfreturn false>
    </cfif>
</cffunction>
This should get you started. Along with the referenced CF documentation, you should be able to adapt it to extract an address.
Good luck!
bh
Argh! No!
God I hate it when people knock together a regex like this and go "Look! Email address validation!"
Before one starts down this road, one should read the RFC (http://tools.ietf.org/html/rfc5322, summarised here: http://en.wikipedia.org/wiki/Email_address).
Your own regex fails my spamtrap email address (for example: adam.cameron.signup+adobeforums@gmail.com), because you've forgotten that a + is a legitimate character in the local part of an email address. Along with a bunch of other completely legit characters.
Reading on through the RFC you will realise that ANYTHING is valid in the local part of an email address, provided it's quoted (double-quote being another character your regex doesn't accept).
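To see this for yourself, here is a quick sketch that runs the pattern from the function above against two addresses. The variable names (pattern, samples, sample) are just for illustration.

```cfm
<!--- The pattern copied verbatim from isEmailAddressValid() above --->
<cfset pattern = "^([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|coop|info|museum|name)))?$">

<!--- One legitimate address containing a +, and one made-up address --->
<cfset samples = ["adam.cameron.signup+adobeforums@gmail.com", "adam@notmyaddress.com"]>

<cfloop array="#samples#" index="sample">
    <!--- reFindNoCase returns 0 when the pattern does not match --->
    <cfoutput>#sample#: #yesNoFormat(reFindNoCase(pattern, sample) neq 0)#<br></cfoutput>
</cfloop>
```

The first address should come back as No, because + is not in the character class for the local part, and the second as Yes.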
If someone doesn't want to give you their valid email address, they won't. I can give you adam@notmyaddress.com, and that will pass. If I do want to give you my address, you should make sure your code will actually accept it!
I can understand wanting to make sure the punter doesn't key their email address in incorrectly, but your method doesn't help here. It'd pass adan@ismyaddress.com, despite the fact that it should be adam@ismyaddress.com. "Close" is not good enough in these cases.
The only sensible way of doing this is to ask them to type it in twice. But this only helps people who don't just roll their eyes and copy and paste what they typed in the first box into the second box, wondering why you're wasting their time. For those who do copy and paste, the typo is carried across too, so it's no help.
If you really want to get a person's email address, deprive them of something until they respond to an email that you send them, at the email address they specified. If they actually don't mind you having their email address, they will respond. This only works if you're not simply trying to harvest email addresses for your own benefit rather than the benefit of your subscribers.
Bottom line: email address validation is a mug's game, and one not often played by people who know the rules.
--
Adam
Listen, congrats on your thesis, man.
My function will get him started; you've yet to provide anything to help get the guy going.
He's asking about EXTRACTING email addresses from a lengthy string of HTML.
Your advice on "entering twice" is moot in this regard.
Instead of getting excited about my apparently insufficient regex, why don't you read the original request and try HELPING?
Oh, don't get your tits in a tangle because I observed a shortcoming in your approach to something.
Your technique of using a regex to extract it is sound: I had nothing to add to that part of things, other than - obviously - the regex is too limited to be useful.
However, given the sample mark-up, it's going to be difficult to extract the email address reliably if one starts from the position that a simple-ish pattern can match an email address, because that's a bogus position to start from.
I think in the given situation, if the email address is simply floating around within other text with nothing else to delimit it, then perhaps just extract a run of characters between whitespace chars that contains an @, eg: \s*(.+@.+)\s* (and pull out the match for the sub-expression), or something along those lines. It can't really get any more precise than that, and it will possibly throw up some false positives, but at least it won't exclude valid email addresses.
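A minimal sketch of that idea, assuming the HTML tags have already been stripped from the text. Note that I've swapped the pattern for \S+@\S+ here: a bare .+ is greedy and will also match whitespace, so \S (non-whitespace) is closer to "a run of characters between whitespace chars". reMatch() is available from CF8 onwards; the variable names are made up.

```cfm
<!--- Body text from the sample, with the mark-up already removed --->
<cfset text = "today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here">

<!--- reMatch returns an array of every substring matching the pattern:
      here, each run of non-whitespace characters that contains an @ --->
<cfset candidates = reMatch("\S+@\S+", text)>

<cfloop array="#candidates#" index="candidate">
    <!--- Escape the candidate before output, as it comes from untrusted text --->
    <cfoutput>#htmlEditFormat(candidate)#<br></cfoutput>
</cfloop>
```

For the sample this returns the whole run, (mailto:xxx@hotmail.com), brackets and all, which is the sort of false positive I mean. Any surrounding decoration still needs trimming off afterwards.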
--
Adam
Ok, fair enough, I'm defensive of my code.
Let's take my original example, your reference to RFC and additional allowed characters, and say that the two combined will provide a pretty good start to his problem.
Hopefully this discussion proves useful to someone.