Mining or Harvesting

Guest
Aug 17, 2008

I'm not sure what you call this. I would call it mining or harvesting.

Curious to know if anyone has attempted to do this, or has done it.

I want to be able to take our everyday yellowpages.com website, evaluate a results page, and harvest the data from a search query.

What ColdFusion functions would I use to do such a task?

Any help would be great.

Thanks.
TOPICS
Advanced techniques

Views

655

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND
Aug 17, 2008

From your yellowpages.com website? Take the code that produces the page and make it reusable using one of the methods available to do that.

Guest
Aug 17, 2008

I did not mean "our" as in my site; I used the word loosely. I want to search their database, take the results page, and harvest the desired info from that page.

LEGEND
Aug 17, 2008

Step 1. Ask for permission.

Guest
Aug 17, 2008

You don't have to ask permission to use their site or call the phone number listed on the results page... do you?

Obviously, Dan, you are not the right person here to answer my original question.

All I want to know is whether it is possible to do it, and which ColdFusion functions are involved in making it happen.

Thanks.

Advocate
Aug 17, 2008

Of course it's possible to do. Heck, you could do what you're asking with wget or curl.

Step 2. Read the ColdFusion documentation.
<cfhttp>

Then you'll have to parse the resulting page. You can do that in a number of ways. If it's valid XHTML, you can parse it into an XML doc object and do an XPath query on it. If it's not valid XHTML, then there are other ways to do it.
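As a rough sketch of the fetch-then-parse approach described above: the URL, the `span` element, and the `name` class below are all hypothetical stand-ins, so point this only at a site whose terms allow automated queries and adjust the XPath to the real markup. It assumes the response is text, tries the XML route first, and falls back to a regular expression when the page is not well-formed.

```cfml
<!--- Hypothetical URL: substitute a site whose terms permit automated queries. --->
<cfset searchUrl = "https://example.com/search?q=plumber">

<!--- Fetch the results page; the response body ends up in httpResult.fileContent. --->
<cfhttp url="#searchUrl#" method="get" result="httpResult" timeout="30">

<cfset names = []>
<cfif isXML(httpResult.fileContent)>
    <!--- Well-formed XHTML: parse it and query it with XPath.
          local-name() sidesteps the XHTML default namespace. --->
    <cfset doc = xmlParse(httpResult.fileContent)>
    <cfset nodes = xmlSearch(doc, "//*[local-name()='span'][@class='name']")>
    <cfloop array="#nodes#" index="node">
        <cfset arrayAppend(names, trim(node.xmlText))>
    </cfloop>
<cfelse>
    <!--- Not well-formed: fall back to a regular expression (brittle, but workable). --->
    <cfset matches = reMatchNoCase('<span class="name">[^<]*</span>', httpResult.fileContent)>
    <cfloop array="#matches#" index="m">
        <!--- Strip the surrounding tags to leave just the text. --->
        <cfset arrayAppend(names, trim(reReplace(m, "<[^>]+>", "", "all")))>
    </cfloop>
</cfif>

<cfdump var="#names#">
```

The regex branch breaks as soon as the markup changes (extra attributes, nested tags), so for messy real-world HTML it is usually worth cleaning the page into well-formed XML first and keeping the XPath route.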

Have you done anything like this in other languages? It'd make learning how to do it in CF very simple. If you've never done anything like this before, Google is your friend.

And for the record, I agree with Dan. Ask for permission. I also agree with you: you don't have to ask them for permission to do a search. However, you're not using the data yourself; you're rebranding it and profiting off it.

Contributor
Aug 26, 2008

cwm,

Actually you don't even have to ask for permission. The yellowpages.com site is pretty clear that you can't do what you want to do. http://www.yellowpages.com/about/terms states:

"You are prohibited from data mining, scraping, crawling, or using any process or processes that send automated queries to the YELLOWPAGES.COM Web site. You may not use the YELLOWPAGES.COM Web sites to compile a collection of listings, including a competing listing product or service."

David

Guest
Aug 26, 2008

That is correct. But it does say...

"Accordingly, You may view, use, copy, and distribute the Materials found on YELLOWPAGES.COM Web sites for internal, noncommercial, informational purposes only."

Which could be interpreted as permission to "copy" the data they provide. What I am using it for is strictly non-commercial anyway. But I have already figured out myself how to build it, and it is working beautifully.

What used to take me hours of copying and pasting now takes less than a minute: I can collect 25 addresses at a time.

Thanks though.

Chuck

Engaged
Aug 27, 2008

Common sense, albeit not that common nowadays, says that the first rule prohibiting data mining seems to address the very effort you are attempting.

The second statement, which it appears you want to interpret to suit your needs, appears to address the casual looking up of Mary Smith's information. You may then distribute Mary Smith's phone number to whomever you choose.

If I read what you are trying to achieve in the original post, and then read the published limitations and terms of use, you are doing exactly what they DON'T want you to do.

As you have no doubt learned by now, the art of programming this method is trivial. The only thing stopping you legally is the terms of use.