12 Replies Latest reply on Apr 16, 2007 7:11 AM by ThomasWood

    Parsing a text file

      I am trying to parse a long string and get specific values out. The file is actually an email and I need to pull out items from the body to populate a query. The email would look something like

      Name: Value, Test Home: (800)555-1212
      Address1: 98 Test Center Dr Work: (800)5555-1212

      I need to pull the values out to populate firstName, lastName, dayPhone, etc...

      I can't get the company to provide this in any other format. I am thinking this should be possible using regular expression to strip all the spaces out, then get the value of firstName by finding Name: and then , and grabbing whats in the middle.

      Any ideas on this?
        • 1. Re: Parsing a text file
          joeDangelo Level 1
          Take a look at the documentation for the ReFind function. Youll probably also need to do a bit of research on regular expressions as well, but this should solve your problem.
          • 2. Re: Parsing a text file
            insuractive Level 3
            <cfsavecontent variable="sTest">
            Name: Value, Test Home: (800)555-1212
            Address1: 98 Test Center Dr Work: (800)5555-1212

            <cfset sPipeDelimList = ReReplace(sTest, "Name:[ ]?(.+)Home:[ ]?(.+)\nAddress1:[ ]?(.+)Work:[ ]?(.+)", "\1|\2|\3|\4", "all")>

            <cfdump var="#sPipeDelimList#"><cfabort>
            • 3. Re: Parsing a text file
              insuractive Level 3
              A little explaination:

              Name: - matches the literal text "Name:"
              [ ]? - an optional space character
              (.+) - matches 1 or more characters (the parenthesis are used to back reference the value later)
              \n - new line character

              So the expression:

              Name:[ ]?(.+)Home:\n

              translates into: match any text that starts with a literal "Name:" then an optional space then any text until the literal text "Home:" followed by a new line character

              Once you match the values, you can use back references (e.g. \1,\2,\3,etc) to only return the values in parenthesis. In the example I used, I created a pipe (|) delimited file that I could easily process.

              Once you check out the expression, the pattern should become evident and you should be able to use it for the entire email.

              Hope that helps!
              • 4. Re: Parsing a text file
                cf_dev2 Level 1

                Nice. I forgot about backreferences :) Question, do you need the spaces/new lines in the regex or can you just trim the list values afterwards?

                • 5. Re: Parsing a text file
                  ThomasWood Level 1
                  Thanks insuractive.

                  I'll give it a try and let you know how it turns out.
                  • 6. Re: Parsing a text file
                    insuractive Level 3
                    You could trim the spaces and replace the new line characters after you create the list using Trim() and Replace(). You would have to modify the expression a little in order to accomodate the white space characters (I don't think "." matches against spaces or new lines).

                    The new line character is a bit of a pain, though - Especially if you forget to remove it and then you have an invisible character at the end of your string in your DB. Plus, using "\n" (regExp newline) is a bit easier than #chr(13)##chr(10)# (ASCII/CF New Line)
                    • 7. Re: Parsing a text file
                      insuractive Level 3
                      Oh, and I should probably mention - I'm not really a regular expression guru. I am, however, a proud downloader of the RegEx Coach, a piece of shareware that makes testing and building regular expressions super easy. Might be a useful tool for anyone else interested in using regular expressions but not wanting to do trial and error testing in CF.
                      • 8. Re: Parsing a text file
                        cf_dev2 Level 1

                        You're right about the newline character. I thought (.) matched any character (including space and newline). But apparently it doesn't : "A period (.) matches any character except newline". So your original solution is more elegant than a regex + find + replace. Thanks for the info :)

                        • 9. Re: Parsing a text file
                          ThomasWood Level 1
                          Well... After some trial and error I got it working. I had some issues with whitespace, tabs, and new lines not being pulled out with \s \t \n or [:space:]

                          I'm sure this was my error, but here was my solution anyway. Any suggestions on cleaning this up would be appreciated.

                          <cfif getMail.RecordCount> My cfpop call
                          <cfsavecontent variable='emailBody'><cfoutput>#getMail.body#</cfoutput></cfsavecontent>
                          <cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>
                          <cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)H ome:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)Coun ty:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

                          <cfset myList = ListToArray(lobType,'|')>

                          It needs some cleaning up from my testing, but it works. Thanks for the help
                          • 10. Re: Parsing a text file
                            ThomasWood Level 1
                            So... I've got the above working, but is their a way to insert a default value if it's null. For example...

                            I'm pulling Work:(.+)Address1: As \6|

                            but I have fed some values in where the string ends up being Work:Address1: so when I try to put this into an array it throws an exception.

                            I tried go back at the end and doing ReReplace(myList,'\|\|','| |','all') but it still throws and error because \6 is never defined.

                            Any ideas
                            • 11. Re: Parsing a text file
                              cf_dev2 Level 1
                              > but I have fed some values in where the string ends up being Work:Address1:


                              Can you post an example of the full values? Maybe someone else can help with the regex.

                              • 12. Re: Parsing a text file
                                ThomasWood Level 1
                                PRODUCT TYPE: Homeowners

                                CONTACT CAN BE REACHED AT:
                                Name: value, test Home: (555)555-1212
                                Address1: 123 Test Ln Work: (555)555-1212
                                City/St/Zip: Wilmington, NC, 28412 Email: wesley@mystateinsurance.com
                                County: New hanover
                                Contact Time: Anytime Respond Time: Within 24 hours

                                This is what it looks like before I run the

                                <cfset noFormat = ReReplace(emailBody,"[^A-Za-z0-9,:@.]+",'\1','all')>

                                after this is run the string looks like

                                PRODUCTTYPE:HomeownersCONTACTCANBEREACHEDAT:Name:value,testHome:5555551212Address1:123Test LnWork:5555551212Address2:City/St/Zip:Wilmington,NC,28412Email:wesley@mystateinsurance.com County:Newhanover

                                Then this is run

                                <cfset lobType = ReReplace(noFormat,'[A-Za-z0-9,:@.]+PRODUCTTYPE:(.+)CONTACTCANBEREACHEDAT:Name:(.+),(.+)H ome:(.+)Address1:(.+)Work:(.+)Address2:[A-Za-z0-9,:]+ityStZip:(.+),(.+),(.+)Email:(.+)Coun ty:(.+)Contact','\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12','all')>

                                and the result is

                                Homeowners|value|test|5555551212|123TestLn|5555551212|Wilmington|NC|28412|wesley@mystatein surance.com|Newhanover|...rest of the string...

                                This works fine assuming all values are present. But if the work number, for instance was blank in the initial string then there is no space in Work:Address2: which ends up not running the entire conversion correctly.