4 Replies Latest reply: Mar 28, 2012 11:11 AM by BKBK

    Just getting numbers from a string

    wmkolcz Community Member

      I used screen scraping to get HTML from a number of our web sites that we are converting to a CMS. I was asked to grab one block of code that includes links to individuals, using a numeric ID as their primary key. This is where my 'expertise' ends...lol. Let's say you have a list block like:

       

       

      <ul>

      <li><a href="details.cfm?id=20>Steve String</a></li>

      <li><a href="details.cfm?id=50>Mary String</a></li>

      <li><a href="details.cfm?id=120>Jerry String</a></li>

      </ul>

       

      I want to strip out the IDs to produce only: 20 50 120 or, optimally, 20,50,120. How can I use a regex to remove all non-numeric characters from a block of text?

        • 1. Re: Just getting numbers from a string
          wmkolcz Community Member

          OK, I used <cfset cleanDocs = reReplace( theDocs, "[^[:digit:]]", ' ', "all") />, which did in fact remove all the non-numeric characters but leaves me with duplicates (there is a weird escape(url) in with the hyperlinks, so each ID is in there twice). Does anyone know how to remove duplicates from a string?

          • 2. Re: Just getting numbers from a string
            Dan Bracuk Community Member

            cflib.org has a function called listdistinct that would probably help you out.

            • 3. Re: Just getting numbers from a string
              wmkolcz Community Member

              Thanks Dan. I ended up turning the white space into commas, which gave me a list that I then looped over to remove the duplicates. That seems to work. I couldn't tell how much white space sat between the numbers: it looked like only one space each, but it turned out to vary.
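
              The approach described in this reply could be sketched roughly as follows. This is only an illustration under assumptions: theDocs is the variable from the earlier reply, the sample markup inside it is made up, and rawIds, uniqueIds and id are hypothetical names. The loop is a plain stand-in for a cflib.org helper, not the code actually used.

              ```cfml
              <!--- Hypothetical sample input; the real theDocs would hold the scraped HTML. --->
              <cfsavecontent variable="theDocs">
              <li><a href="details.cfm?id=20>Steve String</a></li>
              <li><a href="details.cfm?id=20>Steve String</a></li>
              <li><a href="details.cfm?id=50>Mary String</a></li>
              </cfsavecontent>

              <!--- Replace every run of non-digits with a single comma, so the amount of white space no longer matters. --->
              <cfset rawIds = reReplace(theDocs, "[^[:digit:]]+", ",", "all")>

              <!--- Copy each ID across only if it is not already in the result list. --->
              <cfset uniqueIds = "">
              <cfloop list="#rawIds#" index="id">
                  <cfif not listFind(uniqueIds, id)>
                      <cfset uniqueIds = listAppend(uniqueIds, id)>
                  </cfif>
              </cfloop>

              <cfoutput>#uniqueIds#</cfoutput>
              ```

              Matching runs of non-digits ("+" after the character class) rather than single characters avoids long strings of separators, and a list loop skips empty elements, so leading or trailing commas do no harm. Duplicates are dropped while the first-seen order is kept.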

              • 4. Re: Just getting numbers from a string
                BKBK Community Member

                The following assumes ColdFusion 8 or newer, since it relies on REMatch. I have also assumed that the IDs are 2 or 3 digits long. You can easily adapt the code as appropriate.

                 

                <cfsavecontent variable="block"><ul>

                <li><a href="details.cfm?id=20>Steve String</a></li>

                <li><a href="details.cfm?id=50>Mary String</a></li>

                <li><a href="details.cfm?id=120>Jerry String</a></li>

                </ul>

                </cfsavecontent>

                 

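                <!--- REMatch returns an array of matches of the form =20>, =50>, =120>; arrayToList joins them into a comma-delimited raw list. --->

                <cfset rawList = arrayToList(REMatch("(=[0-9]{2,3}>)",block))>

                <!--- Strip the "=" and ">" around each number, leaving only the IDs. --->

                <cfset numberList = replaceList(rawList,"=,>",",")>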

                 

                <cfoutput>#numberList#</cfoutput>
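
                If the IDs can be any length, or the same ID can appear more than once (as mentioned earlier in the thread), the code above might be adapted along these lines. Treat it as a sketch under assumptions rather than a tested solution: it assumes every ID follows the literal text id= in the markup, and matches, match, id and idList are names invented for the illustration. Like REMatch, the array form of cfloop needs ColdFusion 8 or newer.

                ```cfml
                <cfsavecontent variable="block"><ul>
                <li><a href="details.cfm?id=20>Steve String</a></li>
                <li><a href="details.cfm?id=50>Mary String</a></li>
                <li><a href="details.cfm?id=120>Jerry String</a></li>
                </ul>
                </cfsavecontent>

                <!--- Match "id=" followed by one or more digits, whatever the length. --->
                <cfset matches = REMatch("id=[0-9]+", block)>

                <cfset idList = "">
                <cfloop array="#matches#" index="match">
                    <!--- Keep only the part after "=", and skip IDs already collected. --->
                    <cfset id = listLast(match, "=")>
                    <cfif not listFind(idList, id)>
                        <cfset idList = listAppend(idList, id)>
                    </cfif>
                </cfloop>

                <cfoutput>#idList#</cfoutput>
                ```

                Anchoring the pattern on id= instead of on the surrounding = and > means the match no longer depends on what follows the number, so it should also cope with links whose href attribute is properly closed with a quote.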