Extract email address from html

Participant, Jan 13, 2011

Hi,

I am trying to extract an email address from HTML that is stored in a query column. How would I do that?

I am on CF9.

example:

Query col1:

<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>

TOPICS
Advanced techniques

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Valorous Hero, Jan 13, 2011

Regular Expressions are often the tool to use for that kind of string manipulation.

ColdFusion has the reFind() and reReplace() functions to tap into a large part of the power of Regular Expressions.
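As a rough illustration of what reFind() can do here, the sketch below uses reFindNoCase() with its returnsubexpressions argument set to true, then pulls the captured text out with mid(). It assumes the address always follows "mailto:" as in the sample from the question; the variable names (html, found, email) are made up for the example.

```cfml
<!--- Sample string taken from the original question --->
<cfset html = "<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>">

<!--- The fourth argument (true) asks for a struct of pos/len arrays
      covering the whole match and each parenthesised sub-expression.
      The capture stops at whitespace, a closing paren, angle brackets or a quote. --->
<cfset found = reFindNoCase("mailto:([^[:space:])<>""]+)", html, 1, true)>

<cfif found.pos[1] gt 0>
    <!--- Element 1 is the whole match; element 2 is the first sub-expression --->
    <cfset email = mid(html, found.pos[2], found.len[2])>
<cfelse>
    <!--- No match: pos[1] is 0 --->
    <cfset email = "">
</cfif>

<cfoutput>#htmlEditFormat(email)#</cfoutput>
```

For the sample string this should output xxx@hotmail.com. If the addresses are not reliably prefixed with "mailto:", the pattern would need a different anchor.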

Participant, Jan 13, 2011

I can't work out how to set up the regular expression. I need a sample.

Enthusiast, Jan 14, 2011

Here are some resources to help get you started using regular expressions:

The CF documentation

http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec0a38f-7ffb.html

Tutorial website

http://www.regular-expressions.info/

Ben Forta's ColdFusion books cover regular expressions, at least in the CF6, 7, and 8 editions that I own.

http://www.forta.com/books/

Explorer, Feb 09, 2011

Here's a function I wrote for use on some of our CF sites:

<cffunction access="public" name="isEmailAddressValid" returntype="boolean" output="false">
    <cfargument name="email" type="string" required="yes">

    <!--- Anchored, case-insensitive match against the whole string.
          The trailing ? makes the group optional, so an empty string also passes. --->
    <cfreturn refindnocase("^([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.(([a-z]{2,3})|(aero|coop|info|museum|name)))?$",arguments.email) neq 0>

</cffunction>

This should get you started. Along with the referenced CF documentation, you should be able to adapt it to extract an address.
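One possible way to adapt it, offered as a sketch rather than a tested solution: drop the ^ and $ anchors and the trailing ?, and pass the pattern to reMatchNoCase(), which returns every match as an array. The function name extractEmailAddresses is hypothetical, and the top-level-domain part is loosened to [a-z]{2,} here because, without the anchors, the original alternation could stop after three letters of a longer suffix such as "info".

```cfml
<cffunction access="public" name="extractEmailAddresses" returntype="array" output="false">
    <cfargument name="text" type="string" required="yes">

    <!--- Same local-part and domain character classes as isEmailAddressValid,
          but unanchored so the pattern can match inside a longer string.
          Assumption: the TLD is simply two or more letters. --->
    <cfreturn reMatchNoCase("[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}", arguments.text)>
</cffunction>

<!--- Usage with the sample string from the question --->
<cfset html = "<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>">
<cfset emails = extractEmailAddresses(html)>
<cfoutput>#arrayLen(emails)# address(es) found: #htmlEditFormat(arrayToList(emails))#</cfoutput>
```

This inherits the limits of the original character classes, so any address using characters outside them will be missed or cut short.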

Good luck!

bh

LEGEND, Feb 09, 2011

Argh!  No!

God I hate it when people knock together a regex like this and go "Look!  Email address validation!"

Before one starts down this road, one should read the RFC (http://tools.ietf.org/html/rfc5322, summarised here: http://en.wikipedia.org/wiki/Email_address).

Your own regex fails my spamtrap email address (for example: adam.cameron.signup+adobeforums@gmail.com), because you've forgotten that a + is a legitimate character in the local part of an email address.  Along with a bunch of other completely legit characters.

Reading on through the RFC you will realise that ANYTHING is valid in the local part of an email address, provided it's quoted (double-quote being another character your regex doesn't accept).
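To make the point about the local part concrete, here is a sketch of a pattern whose unquoted local part accepts the full set of "atext" characters listed in RFC 5322. It deliberately ignores quoted local parts, and the domain part (labels of letters, digits and hyphens with at least one dot) is a simplifying assumption. The variable names are illustrative only.

```cfml
<!--- Characters RFC 5322 allows in an unquoted local part (atext).
      ## is how CFML writes a literal # inside a string. --->
<cfset atext = "[a-z0-9!##$%&'*+/=?^_`{|}~-]">

<!--- Dot-separated runs of atext, then @, then a dotted domain (assumed shape) --->
<cfset pattern = "^#atext#+(\.#atext#+)*@[a-z0-9-]+(\.[a-z0-9-]+)+$">

<cfset address = "adam.cameron.signup+adobeforums@gmail.com">

<!--- reFindNoCase returns the match position, or 0 when there is no match --->
<cfoutput>#yesNoFormat(reFindNoCase(pattern, address) gt 0)#</cfoutput>
```

With the + now inside the character class, the address above should report Yes. A quoted local part such as "john smith"@example.com would still be rejected by this sketch.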

If someone doesn't want to give you their valid email address, they won't.  I can give you adam@notmyaddress.com, and that will pass.  If I do want to give you my address, you should make sure your code will actually accept it!

I can understand wanting to make sure the punter doesn't key their email address in incorrectly, but your method doesn't help here.  It'd pass adan@ismyaddress.com, despite the fact that it should be adam@ismyaddress.com.  "Close" is not good enough in these cases.

The only sensible way of doing this is to ask them to type it in twice.  This will assist people who don't just roll their eyes and copy and paste what they typed in the first box into the second box, wondering why you're wasting their time.  For those who do copy and paste, any typo gets carried across, so it's no help.

If you really want to get a person's email address, deprive them of something until they respond to an email that you send them at the address they specified.  They'll respond if they actually don't mind you having their email address.  This only works if you're not simply trying to harvest email addresses for your own benefit rather than the benefit of your subscribers.

Bottom line: email address validation is a mug's game, and one not often played by people who know the rules.

--
Adam

Explorer, Feb 09, 2011

Listen, congrats on your thesis, man.

My function will get him started; you've yet to provide anything to help get the guy going.

He's asking about EXTRACTING email addresses from a lengthy string of HTML.

Your advice on "entering twice" is moot in this regard.

Instead of getting excited about my apparently insufficient regex, why don't you read the original request and try HELPING?

LEGEND, Feb 09, 2011

Oh, don't get your tits in a tangle because I observed a shortcoming in your approach to something.

Your technique of using a regex to extract it is sound: I had nothing to add to that part of things, other than - obviously - the regex is too limited to be useful.

However, given the sample mark-up, it's going to be difficult to reliably extract the email address if one starts from the position that a simple-ish pattern can match an email address, because that's a bogus position to start from.

I think in the given situation, if the email address is simply floating around within other text with nothing else to delimit it, then perhaps just extract a run of non-whitespace characters containing an @, eg: (\S+@\S+) (and pull out the match for the sub-expression), or something along those lines.  It can't really get any more precise than that, and it will possibly throw up some false positives, but at least it won't exclude valid email addresses.
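A rough sketch of that whitespace-delimited idea in CFML follows. It strips the tags first so markup cannot cling to a token, collects every non-whitespace run containing an @ with reMatch(), then trims wrapping punctuation and a leading "mailto:". The list of punctuation to trim is a guess based on the sample in the question, and all variable names are illustrative.

```cfml
<cfset html = "<html><head></head><body>today they emailed about it from (mailto:xxx@hotmail.com) ...hello there and here</body></html>">

<!--- Replace tags with spaces so markup cannot glue itself onto a token --->
<cfset text = reReplace(html, "<[^>]*>", " ", "all")>

<!--- Every whitespace-delimited run that contains an @ is a candidate --->
<cfset candidates = reMatch("\S+@\S+", text)>

<cfset emails = arrayNew(1)>
<cfloop array="#candidates#" index="token">
    <!--- Trim wrapping punctuation (assumed set), then any mailto: prefix --->
    <cfset token = reReplace(token, "^[(<""']+", "")>
    <cfset token = reReplace(token, "[)>.,;:""']+$", "")>
    <cfset token = reReplaceNoCase(token, "^mailto:", "")>
    <cfset arrayAppend(emails, token)>
</cfloop>

<cfoutput>#htmlEditFormat(arrayToList(emails))#</cfoutput>
```

As the post says, this errs on the side of false positives: anything with an @ in it comes back, so the results may still need a sanity check before use.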

--

Adam

Explorer, Feb 09, 2011

Ok, fair enough, I'm defensive of my code.

Let's take my original example, your reference to RFC and additional allowed characters, and say that the two combined will provide a pretty good start to his problem.

Hopefully this discussion proves useful to someone.
