
regexp not working

New Here, Mar 22, 2011


I'm trying to extract strings from the result of an HTTP request using a regexp.

I'm querying a server for the PDF files in a directory, then parsing a list from the raw HTML. Unfortunately, some of the file names contain periods and commas, and the regexp only matches the part of the name after the punctuation character,

e.g.: "John M. Smith.pdf" gets matched as "Smith.pdf".


Here's the Regexp:

<cffunction name="parsePDF" access="public" returntype="string">
    <cfargument name="RAWHTML" type="string" required="yes">
    <cfset REGEXPMATCH="^[\s\S][[:punct:]]*\.pdf$">
    <cfreturn REMatchNoCase(REGEXPMATCH,RAWHTML)>
</cffunction>

TIA

Server Product: ColdFusion
Version: 8,0,1,195765
Edition: Standard

IIS 6

TOPICS
Advanced techniques

LEGEND, Mar 23, 2011


Your regex seems unnecessarily complex (esp. given it ain't doing what you need it to 😉).

What would be wrong with just:

^.*\.pdf$

?

--

Adam

New Here, Mar 23, 2011


Still doesn't work. It seems to be a problem with the periods being interpreted as CR-LF instead of as literals.

LEGEND, Mar 24, 2011


Yeah, sorry, I didn't read your requirement properly.

It's easy enough to tell where the .pdf file name ends in your mark-up (it'll end with ".pdf" ;-), but how do you tell where the file name starts? Given that a file name can contain most characters (just slash and null are prohibited by NTFS... Windows is slightly more picky, but still), it kinda means you need to have something you can identify as a boundary...?

--

Adam

New Here, Mar 24, 2011


I've tried using a ">" as the beginning boundary, since the raw HTML returns the file names as the text inside tags, but I wasn't having any luck.
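
For reference, a boundary-based pattern along those lines might look like the sketch below. It assumes each file name is the text directly between a ">" and the next "<" (the actual tag isn't shown in this thread, so the `<li>` markup here is made up), and that the regex engine accepts the positive lookahead `(?=...)`. The names `sampleMarkup` and `fileNames` are hypothetical.

```cfml
<!--- Hypothetical sample; the real markup was not posted in this thread --->
<cfset sampleMarkup = '<li>John M. Smith.pdf</li><li>Doe, Jane.pdf</li>'>

<!---
    [^<>]+   one or more characters that are not angle brackets, so a match
             cannot span a tag boundary
    \.pdf    the literal extension
    (?=<)    lookahead: the next character must open a tag, but it is not
             included in the match
--->
<cfset fileNames = REMatchNoCase("[^<>]+\.pdf(?=<)", sampleMarkup)>

<!--- Dump the resulting array of matches for inspection --->
<cfdump var="#fileNames#">
```

Because the character class excludes both angle brackets, periods and commas inside a name do not cut the match short. The pattern would need adjusting if the names appear somewhere other than directly before a tag.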

LEGEND, Mar 25, 2011


Can you elaborate on what you mean by "wasn't having any luck"? Did you try that second regex I posted?

You might need to attach some sample markup & your code.

--

Adam

New Here, Mar 25, 2011


It appears to be a problem with the ColdFusion UDF, not the regexp. The expression that ended up working was "[\s_\-A-Za-z0-9\.&,]*\.pdf". I'm going to mark the thread as solved. Thanks for your help.
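
The thread doesn't say what was wrong with the UDF. One possible culprit, offered only as an assumption, is the return type: REMatchNoCase returns an array of matches, while the function as originally posted declares returntype="string". A minimal sketch of the function with that one change and the working expression dropped in (the local variable name `pattern` is made up for illustration):

```cfml
<cffunction name="parsePDF" access="public" returntype="array" output="false">
    <cfargument name="RAWHTML" type="string" required="yes">

    <!--- Allowed file-name characters: whitespace, underscore, hyphen,
          letters, digits, period, ampersand and comma, then ".pdf" --->
    <cfset var pattern = "[\s_\-A-Za-z0-9\.&,]*\.pdf">

    <!--- REMatchNoCase returns an array holding every match, which is why
          returntype is "array" here (assumed fix, not confirmed above) --->
    <cfreturn REMatchNoCase(pattern, arguments.RAWHTML)>
</cffunction>
```

Since the character class includes whitespace and is repeated with "*", a match can start with spaces or line breaks that precede the name, so the results may need a Trim() before use.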
