8 Replies Latest reply on Aug 27, 2008 11:04 AM by tclaremont

    Mining or Harvesting

      I'm not sure what you call this; I would call it mining or harvesting.

      Curious to know if anyone has attempted to do this, or has done it.

      I want to take our everyday yellowpages.com website, evaluate the results page for a search query, and harvest the data from it.

      What ColdFusion functions would I use to do such a task?

      Any help would be great.

        • 1. Re: Mining or Harvesting
          Dan Bracuk Level 5
          From your yellowpages.com website? Take the code that produces the page and make it reusable using one of the methods available to do that.
          • 2. Re: Mining or Harvesting
            cwmcguire Level 1
            I did not mean "our" as in my site; I used the word loosely. I want to search their database, take the results page, and harvest the desired info from that page.
            • 3. Re: Mining or Harvesting
              Dan Bracuk Level 5
              Step 1. Ask for permission.
              • 4. Mining or Harvesting
                cwmcguire Level 1
                You don't have to ask permission to use their site or call the phone number listed on the results page... do you?

                Obviously, Dan, you are not the right person to answer my original question.

                All I want to know is whether it is possible, and which ColdFusion functions are involved in making it happen.

                • 5. Mining or Harvesting
                  Kronin555 Level 1
                  Of course it's possible to do. Heck, you could do what you're asking with wget or curl.

                  Step 2. Read the ColdFusion documentation.

                  Then you'll have to parse the resulting page. You can do that in a number of ways. If it's valid XHTML, you can parse it into an XML doc object and do an XPath query on it. If it's not valid XHTML, then there are other ways to do it.
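
                  To make the XHTML route concrete, here is a minimal sketch in Python rather than CF. In ColdFusion the equivalent calls would be XmlParse() and XmlSearch(). The sample markup, the class names, and the extract_listings helper are all invented for illustration; no real site's page structure is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical results page. The markup and class names are made up
# for this example and do not reflect any real site's structure.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="listing">
      <span class="name">Example Bakery</span>
      <span class="phone">555-0100</span>
    </div>
    <div class="listing">
      <span class="name">Example Hardware</span>
      <span class="phone">555-0101</span>
    </div>
  </body>
</html>"""

# Valid XHTML lives in this namespace, so every path step needs the prefix.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_listings(xhtml_text):
    """Parse well-formed XHTML and pull name/phone pairs out with XPath."""
    root = ET.fromstring(xhtml_text)
    listings = []
    # ElementTree supports a limited XPath subset, including
    # attribute-equality predicates like [@class='listing'].
    for div in root.findall(".//x:div[@class='listing']", NS):
        name = div.findtext("x:span[@class='name']", default="", namespaces=NS)
        phone = div.findtext("x:span[@class='phone']", default="", namespaces=NS)
        listings.append({"name": name.strip(), "phone": phone.strip()})
    return listings


if __name__ == "__main__":
    for entry in extract_listings(SAMPLE_XHTML):
        print(entry["name"], entry["phone"])
```

                  This only works when the page is well-formed XML; ET.fromstring raises a ParseError on the first unclosed tag, which is why sloppy HTML needs a different approach.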

                  Have you done anything like this in other languages? If so, learning how to do it in CF will be very simple. If you've never done anything like this before, Google is your friend.

                  And for the record, I agree with Dan. Ask for permission. I also agree with you that you don't have to ask them for permission to do a search. However, you're not just using the data yourself; you're rebranding it and profiting off it.
                  • 6. Mining or Harvesting
                    davidsimms Level 1

                    Actually you don't even have to ask for permission. The yellowpages.com site is pretty clear that you can't do what you want to do. http://www.yellowpages.com/about/terms states:

                    "You are prohibited from data mining, scraping, crawling, or using any process or processes that send automated queries to the YELLOWPAGES.COM Web site. You may not use the YELLOWPAGES.COM Web sites to compile a collection of listings, including a competing listing product or service."

                    • 7. Re: Mining or Harvesting
                      cwmcguire Level 1
                      That is correct. But it does say...

                      "Accordingly, You may view, use, copy, and distribute the Materials found on YELLOWPAGES.COM Web sites for internal, noncommercial, informational purposes only."

                      That could be interpreted as permission to "copy" the data they provide. What I am using it for is strictly non-commercial anyway. But I have already figured out myself how to build it, and it is working beautifully.

                      What used to take me hours of copying and pasting now goes quickly: I can collect 25 addresses at a time in less than 1 minute.

                      Thanks though.

                      • 8. Re: Mining or Harvesting
                        tclaremont Level 2
                        Common sense, albeit not that common nowadays, says that the first rule prohibiting data mining seems to address the very effort you are attempting.

                        The second statement, which it appears you want to interpret to suit your needs, appears to address the casual looking up of Mary Smith's information. You may then distribute Mary Smith's phone number to whomever you choose.

                        If I read what you are trying to achieve in the original post, and then read the published limitations and terms of use, you are doing exactly what they DON'T want you to do.

                        As you have no doubt learned by now, the art of programming this method is trivial. The only thing stopping you legally is the terms of use.