Help/Advice with CFHTTP Parser

Guest
Jan 06, 2008

Hi Folks,

I have built a very simple parser using CFHTTP to grab prices and stock availability for various products. The same approach works fine with some sites but not with others, so I have picked one of the failing sites as an example to see where it is falling down.

My question is two-fold: firstly, why does this example produce an error, and secondly, is there a better way of doing this? (I don't have access to anything other than the page data.)

So, on to the script:

<!--- cfhttp scraper --->
<!--- price --->
<cfset nstartcode='<div class="standard-price"> <span class="label">Price </span> £'>
<cfset nendcode='</div>
<div class="deliveryShortcut">'>
<!--- get the page --->
<cfhttp url=" http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
<cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
<cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<!--- stock --->
<cfset astartcode='class="stock">'>
<cfset aendcode='</div>'>
<!--- get the page --->
<cfhttp url=" http://www.dixons.co.uk/martprd/product/seo/626225/?int=pleo" method="get">
</cfhttp>
<!--- parse the output --->
<cfset nStart=Find(astartcode, cfhttp.FileContent) +14>
<cfset nEnd=Find(aendcode, cfhttp.FileContent, nStart+1)>
<cfset stock_status=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
<cfset stock='#stock_status#'>
<!--- --->

The following error is returned when I run the above:

<!--- error message --->
The 3 parameter of the Mid function, which is now -14295, must be a non-negative integer

The error occurred in **removed**\parse.cfm: line 11

9 : <cfset nStart=Find(nstartcode, cfhttp.FileContent) +0>
10 : <cfset nEnd=Find(nendcode, cfhttp.FileContent, nStart+1)>
11 : <cfset liveprice=Mid(cfhttp.FileContent, nStart, nEnd - nStart)>
12 : <!--- stock --->
13 : <cfset astartcode='class="stock">'>
<!--- --->

Any help or advice would be greatly appreciated.
TOPICS
Advanced techniques

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert
Jan 07, 2008

I have examined the page's HTML source. The structure is too disorganised for marker-based extraction like that to work reliably. At that point it is no longer automated mining, but mining by hand, with fingers and nails.
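As for the error itself: the third parameter of Mid is nEnd - nStart, and Find returns 0 when it finds nothing. Because the search for the end marker starts at nStart+1, any match would lie after nStart, so a negative count means the end marker was not found at all (nEnd is 0). One likely cause, though this is an assumption on my part, is the line break inside nendcode, which has to match the page's whitespace exactly. A minimal sketch of a guarded version follows. The function name textBetween and the sample string are hypothetical, made up for illustration:

```cfml
<!--- Hypothetical helper (not from the original script): returns the text
      between two markers, or an empty string if either marker is missing. --->
<cffunction name="textBetween" returntype="string" output="false">
    <cfargument name="content" type="string" required="true">
    <cfargument name="startMarker" type="string" required="true">
    <cfargument name="endMarker" type="string" required="true">
    <cfset var startPos = Find(arguments.startMarker, arguments.content)>
    <cfset var endPos = 0>
    <!--- Find returns 0 when the substring is absent --->
    <cfif startPos EQ 0>
        <cfreturn "">
    </cfif>
    <!--- move past the start marker so it is not part of the result --->
    <cfset startPos = startPos + Len(arguments.startMarker)>
    <cfset endPos = Find(arguments.endMarker, arguments.content, startPos)>
    <cfif endPos EQ 0>
        <cfreturn "">
    </cfif>
    <cfreturn Trim(Mid(arguments.content, startPos, endPos - startPos))>
</cffunction>

<!--- made-up sample standing in for cfhttp.FileContent --->
<cfset sample = '<div class="stock">In stock</div>'>
<cfoutput>#textBetween(sample, 'class="stock">', '</div>')#</cfoutput>
```

Using Len() on the start marker replaces the hand-counted +14 offset, and the two checks mean a page whose markup has changed gives you an empty string instead of an exception. It does not make the markers themselves any less fragile.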

There are just no obvious patterns. In such cases it often helps to parse the source as XML, and pick out the data from there.
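A rough sketch of the XML route, assuming the markup is well-formed. XmlParse throws an error on markup that is not, and a real page may well need cleaning up first. The sample markup and the class name used in the XPath expression are assumptions for illustration, not taken from the actual page:

```cfml
<!--- made-up, well-formed sample standing in for cfhttp.FileContent --->
<cfset markup = '<html><body><div class="stock">In stock</div></body></html>'>
<cfset stock = "">
<cftry>
    <!--- XmlParse fails on markup that is not well-formed XML --->
    <cfset doc = XmlParse(markup)>
    <!--- XmlSearch returns an array of the elements matching the XPath --->
    <cfset stockNodes = XmlSearch(doc, "//div[@class='stock']")>
    <cfif ArrayLen(stockNodes) GT 0>
        <cfset stock = Trim(stockNodes[1].XmlText)>
    </cfif>
    <cfcatch type="any">
        <!--- parsing failed: leave stock empty rather than abort the page --->
        <cfset stock = "">
    </cfcatch>
</cftry>
<cfoutput>#HTMLEditFormat(stock)#</cfoutput>
```

The advantage is that the lookup depends on the element and its class rather than on the exact characters and whitespace around it. If the page declares a default XML namespace, the XPath expression would need adjusting.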

Guest
Jan 07, 2008

Thanks BK, I was hoping that wouldn't be the case.

I appreciate your input.
