• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Parsing a text file

New Here ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

I am trying to parse a long string and get specific values out. The file is actually an email and I need to pull out items from the body to populate a query. The email would look something like

Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212

I need to pull the values out to populate firstName, lastName, dayPhone, etc...

I can't get the company to provide this in any other format. I am thinking this should be possible using regular expression to strip all the spaces out, then get the value of firstName by finding Name: and then , and grabbing whats in the middle.

Any ideas on this?
TOPICS
Advanced techniques

Views

851

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Participant ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

Take a look at the documentation for the ReFind function. Youll probably also need to do a bit of research on regular expressions as well, but this should solve your problem.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Advocate ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

<cfsavecontent variable="sTest">
Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212
</cfsavecontent>

<cfset sPipeDelimList = ReReplace(sTest, "Name:[ ]?(.+)Home:[ ]?(.+)\nAddress1:[ ]?(.+)Work:[ ]?(.+)", "\1|\2|\3|\4", "all")>

<cfdump var="#sPipeDelimList#"><cfabort>

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Advocate ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

A little explaination:

Name: - matches the literal text "Name:"
[ ]? - an optional space character
(.+) - matches 1 or more characters (the parenthesis are used to back reference the value later)
\n - new line character

So the expression:

Name:[ ]?(.+)Home:\n

translates into: match any text that starts with a literal "Name:" then an optional space then any text until the literal text "Home:" followed by a new line character

Once you match the values, you can use back references (e.g. \1,\2,\3,etc) to only return the values in parenthesis. In the example I used, I created a pipe (|) delimited file that I could easily process.

Once you check out the expression, the pattern should become evident and you should be able to use it for the entire email.

Hope that helps!

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

insuractive,

Nice. I forgot about backreferences 🙂 Question, do you need the spaces/new lines in the regex or can you just trim the list values afterwards?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Apr 10, 2007 Apr 10, 2007

Copy link to clipboard

Copied

Thanks insuractive.

I'll give it a try and let you know how it turns out.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Advocate ,
Apr 11, 2007 Apr 11, 2007

Copy link to clipboard

Copied

cf_dev2,
You could trim the spaces and replace the new line characters after you create the list using Trim() and Replace(). You would have to modify the expression a little in order to accomodate the white space characters (I don't think "." matches against spaces or new lines).

The new line character is a bit of a pain, though - Especially if you forget to remove it and then you have an invisible character at the end of your string in your DB. Plus, using "\n" (regExp newline) is a bit easier than #chr(13)##chr(10)# (ASCII/CF New Line)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Advocate ,
Apr 11, 2007 Apr 11, 2007

Copy link to clipboard

Copied

Oh, and I should probably mention - I'm not really a regular expression guru. I am, however, a proud downloader of the RegEx Coach, a piece of shareware that makes testing and building regular expressions super easy. Might be a useful tool for anyone else interested in using regular expressions but not wanting to do trial and error testing in CF.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Apr 11, 2007 Apr 11, 2007

Copy link to clipboard

Copied

insuractive,

You're right about the newline character. I thought (.) matched any character (including space and newline). But apparently it doesn't : "A period (.) matches any character except newline". So your original solution is more elegant than a regex + find + replace. Thanks for the info :)




Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Apr 11, 2007 Apr 11, 2007

Copy link to clipboard

Copied

Well... After some trial and error I got it working. I had some issues with whitespace, tabs, and new lines not being pulled out with \s \t \n or [:space:]

I'm sure this was my error, but here was my solution anyway. Any suggestions on cleaning this up would be appreciated.


<cfif getMail.RecordCount> My cfpop call
<cfsavecontent variable='emailBody'><cfoutput>#getMail.body#</cfoutput></cfsavecontent>
<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>
<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>


<cfset myList = ListToArray(lobType,'|')>


It needs some cleaning up from my testing, but it works. Thanks for the help

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Apr 12, 2007 Apr 12, 2007

Copy link to clipboard

Copied

So... I've got the above working, but is their a way to insert a default value if it's null. For example...

I'm pulling Work:(.+)Address1: As \6|

but I have fed some values in where the string ends up being Work:Address1: so when I try to put this into an array it throws an exception.

I tried go back at the end and doing ReReplace(myList,'\|\|','| |','all') but it still throws and error because \6 is never defined.

Any ideas

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Guide ,
Apr 14, 2007 Apr 14, 2007

Copy link to clipboard

Copied

> but I have fed some values in where the string ends up being Work:Address1:

ThomasWood,

Can you post an example of the full values? Maybe someone else can help with the regex.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Apr 16, 2007 Apr 16, 2007

Copy link to clipboard

Copied

LATEST
PRODUCT TYPE: Homeowners

CONTACT CAN BE REACHED AT:
Name: value, test Home: (555)555-1212
Address1: 123 Test Ln Work: (555)555-1212
Address2:
City/St/Zip: Wilmington, NC, 28412 Email: wesley@mystateinsurance.com
County: New hanover
Contact Time: Anytime Respond Time: Within 24 hours
------------------------------------------------------------------------


This is what it looks like before I run the

<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>

after this is run the string looks like

PRODUCTTYPE:HomeownersCONTACTCANBEREACHEDAT:Name:value,testHome:5555551212Address1:123TestLnWork:5555551212Address2:City/St/Zip:Wilmington,NC,28412Email:wesley@mystateinsurance.comCounty:Newhanover

Then this is run

<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

and the result is

Homeowners|value|test|5555551212|123TestLn|5555551212|Wilmington|NC|28412|wesley@mystateinsurance.com|Newhanover|...rest of the string...

This works fine assuming all values are present. But if the work number, for instance was blank in the initial string then there is no space in Work:Address2: which ends up not running the entire conversion correctly.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation