Parsing a text file

Report · Apr 10, 2007

I am trying to parse a long string and get specific values out. The file is actually an email and I need to pull out items from the body to populate a query. The email would look something like

Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212

I need to pull the values out to populate firstName, lastName, dayPhone, etc...

I can't get the company to provide this in any other format. I am thinking this should be possible using regular expression to strip all the spaces out, then get the value of firstName by finding Name: and then , and grabbing whats in the middle.

Any ideas on this?

Report · Apr 10, 2007

Take a look at the documentation for the ReFind function. Youll probably also need to do a bit of research on regular expressions as well, but this should solve your problem.

Report · Apr 10, 2007

<cfsavecontent variable="sTest">
Name: Value, Test Home: (800)555-1212
Address1: 98 Test Center Dr Work: (800)5555-1212
</cfsavecontent>

<cfset sPipeDelimList = ReReplace(sTest, "Name:[ ]?(.+)Home:[ ]?(.+)\nAddress1:[ ]?(.+)Work:[ ]?(.+)", "\1|\2|\3|\4", "all")>

<cfdump var="#sPipeDelimList#"><cfabort>

Report · Apr 10, 2007

A little explaination:

Name: - matches the literal text "Name:"
[ ]? - an optional space character
(.+) - matches 1 or more characters (the parenthesis are used to back reference the value later)
\n - new line character

So the expression:

Name:[ ]?(.+)Home:\n

translates into: match any text that starts with a literal "Name:" then an optional space then any text until the literal text "Home:" followed by a new line character

Once you match the values, you can use back references (e.g. \1,\2,\3,etc) to only return the values in parenthesis. In the example I used, I created a pipe (|) delimited file that I could easily process.

Once you check out the expression, the pattern should become evident and you should be able to use it for the entire email.

Hope that helps!

Report · Apr 10, 2007

insuractive,

Nice. I forgot about backreferences 🙂 Question, do you need the spaces/new lines in the regex or can you just trim the list values afterwards?

Report · Apr 10, 2007

Thanks insuractive.

I'll give it a try and let you know how it turns out.

Report · Apr 11, 2007

cf_dev2,
You could trim the spaces and replace the new line characters after you create the list using Trim() and Replace(). You would have to modify the expression a little in order to accomodate the white space characters (I don't think "." matches against spaces or new lines).

The new line character is a bit of a pain, though - Especially if you forget to remove it and then you have an invisible character at the end of your string in your DB. Plus, using "\n" (regExp newline) is a bit easier than #chr(13)##chr(10)# (ASCII/CF New Line)

Report · Apr 11, 2007

Oh, and I should probably mention - I'm not really a regular expression guru. I am, however, a proud downloader of the RegEx Coach, a piece of shareware that makes testing and building regular expressions super easy. Might be a useful tool for anyone else interested in using regular expressions but not wanting to do trial and error testing in CF.

Report · Apr 11, 2007

insuractive,

You're right about the newline character. I thought (.) matched any character (including space and newline). But apparently it doesn't : "A period (.) matches any character except newline". So your original solution is more elegant than a regex + find + replace. Thanks for the info :)

Report · Apr 11, 2007

Well... After some trial and error I got it working. I had some issues with whitespace, tabs, and new lines not being pulled out with \s \t \n or [:space:]

I'm sure this was my error, but here was my solution anyway. Any suggestions on cleaning this up would be appreciated.

<cfif getMail.RecordCount> My cfpop call
<cfsavecontent variable='emailBody'><cfoutput>#getMail.body#</cfoutput></cfsavecontent>
<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>
<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

<cfset myList = ListToArray(lobType,'|')>

It needs some cleaning up from my testing, but it works. Thanks for the help

Report · Apr 12, 2007

So... I've got the above working, but is their a way to insert a default value if it's null. For example...

I'm pulling Work:(.+)Address1: As \6|

but I have fed some values in where the string ends up being Work:Address1: so when I try to put this into an array it throws an exception.

I tried go back at the end and doing ReReplace(myList,'\|\|','| |','all') but it still throws and error because \6 is never defined.

Any ideas

Report · Apr 14, 2007

> but I have fed some values in where the string ends up being Work:Address1:

ThomasWood,

Can you post an example of the full values? Maybe someone else can help with the regex.

Report · Apr 16, 2007

PRODUCT TYPE: Homeowners

CONTACT CAN BE REACHED AT:
Name: value, test Home: (555)555-1212
Address1: 123 Test Ln Work: (555)555-1212
Address2:
City/St/Zip: Wilmington, NC, 28412 Email: wesley@mystateinsurance.com
County: New hanover
Contact Time: Anytime Respond Time: Within 24 hours
------------------------------------------------------------------------

This is what it looks like before I run the

<cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>

after this is run the string looks like

PRODUCTTYPE:HomeownersCONTACTCANBEREACHEDAT:Name:value,testHome:5555551212Address1:123TestLnWork:5555551212Address2:City/St/Zip:Wilmington,NC,28412Email:wesley@mystateinsurance.comCounty:Newhanover

Then this is run

<cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)Home:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)County:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

and the result is

Homeowners|value|test|5555551212|123TestLn|5555551212|Wilmington|NC|28412|wesley@mystateinsurance.com|Newhanover|...rest of the string...

This works fine assuming all values are present. But if the work number, for instance was blank in the initial string then there is no space in Work:Address2: which ends up not running the entire conversion correctly.

Adobe Community

Parsing a text file