3 Replies Latest reply on Jun 26, 2007 10:33 AM by Dinghus

    Recognizing Bots

    Dinghus Level 1
      How can my website recognize a bot? Especially googlebot.

      I have PPC on my site and I noticed during testing that when googlebot shows up it clicks on every single link on the page. It even sometimes does searches on the pages and clicks on THOSE links too. I can end up with over 1,000 clicks in a day, all from googlebot. Now if I had live PPC going right now there would be some VERY upset people due to click fraud. As a matter of fact, I wonder how much click fraud is really these bots.

      Anyway, I put a rule in the robots.txt file telling the bot to stay out, and that didn't help at all. I have been weeding them out as they come in by blocking their IP addresses from the reporting, but there always seem to be more of them. How many IPs does googlebot use????
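      One way to sanity-check whether a robots.txt file actually says what you meant it to say is to run it through a parser. A sketch in Python using the stdlib parser (the Disallow rule shown is just an assumed example, not the poster's actual file):

      ```python
      # Sanity-check a robots.txt rule with Python's stdlib parser.
      # The rules below are an assumed example of a "stay out" directive.
      from urllib import robotparser

      rules = """\
      User-agent: Googlebot
      Disallow: /
      """

      rp = robotparser.RobotFileParser()
      rp.parse(rules.splitlines())

      # Googlebot is blocked from everything under this rule...
      print(rp.can_fetch("Googlebot", "http://example.com/search.cfm"))   # False
      # ...while ordinary browsers are unaffected.
      print(rp.can_fetch("Mozilla/5.0", "http://example.com/search.cfm"))  # True
      ```

      If the parser reports that Googlebot is still allowed, the file probably isn't written the way you intended. Note also that well-behaved bots honor robots.txt only when crawling your own pages; it can't stop them arriving via links on other sites.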

      So, anybody know how to recognize a bot with say cgi variables or something?
        • 1. Re: Recognizing Bots
          Stressed_Simon Level 1
          The variable you are after is cgi.user_agent; you need to find out the Googlebot user agent string. The other alternative would be to load your PPC campaign via JavaScript, as bots do not execute JavaScript and so will never see the links to follow.
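          The cgi.user_agent idea amounts to a case-insensitive substring match against known bot signatures. A sketch of that logic in Python for illustration (in ColdFusion you would test cgi.user_agent the same way; the signature list here is a hypothetical starting point, not exhaustive):

          ```python
          # Hypothetical list of crawler signatures to match against the
          # user-agent header; a real deployment would need to maintain this.
          KNOWN_BOT_SIGNATURES = ("googlebot", "slurp", "msnbot")

          def is_known_bot(user_agent):
              """Case-insensitive substring match against known bot signatures."""
              ua = (user_agent or "").lower()
              return any(sig in ua for sig in KNOWN_BOT_SIGNATURES)

          print(is_known_bot(
              "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
          print(is_known_bot(
              "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))  # False
          ```

          The obvious caveat is that user-agent strings are self-reported and trivially spoofed, so this only filters honest bots.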
          • 2. Re: Recognizing Bots
            Laurence Middleton

            It might be that outside sites link to yours, and that robots follow those links to your site.

            This meta tag is supposed to prevent a robot from following links on your page(s) once it *is* there:
            <meta name="ROBOTS" content="NOFOLLOW">

            Not knowing much about robots.txt etc., I found a short article on the topic useful.

            What I like about a simple meta tag in the head element is that it doesn't change the markup in the body of the page.

            Certainly, if you identify robots by user agent and hide the links from them, that will work, but it seems like you'd have to maintain a list of known robots (or whitelist known browsers), and keep that list current either way: to keep new robots out, or to let users of new browsers click your links.
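            Since user agents can be spoofed, Google's documented advice is to verify Googlebot by reverse DNS followed by a forward-confirming lookup, rather than maintaining IP lists. A sketch (the resolvers are injectable here only so the logic can be exercised offline; in production you'd use the socket defaults):

            ```python
            # Verify a claimed Googlebot by reverse DNS, then forward-confirm.
            # Genuine Googlebot hosts resolve to googlebot.com or google.com,
            # and that hostname must resolve back to the original IP.
            import socket

            def is_verified_googlebot(ip,
                                      reverse_dns=lambda ip: socket.gethostbyaddr(ip)[0],
                                      forward_dns=socket.gethostbyname):
                try:
                    host = reverse_dns(ip)
                except OSError:
                    return False
                if not host.endswith((".googlebot.com", ".google.com")):
                    return False
                try:
                    # Forward-confirm: the claimed hostname must map back to the IP.
                    return forward_dns(host) == ip
                except OSError:
                    return False
            ```

            This avoids both the robot list and the browser whitelist for Googlebot specifically, at the cost of a couple of DNS lookups (which you'd want to cache).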

            Simon's excellent suggestion avoids the need for a robot list (or browser whitelist). The only drawback I can think of is that it may make your links inaccessible to many mobile devices, or to the (very small) percentage of users who browse w/o javascript.

            Happy robot-fighting!

            -- Laurence
            • 3. Re: Recognizing Bots
              Dinghus Level 1
              Javascript would seem to be a good idea, until I thought about it and wondered why Google can read links dynamically created with ColdFusion (of course it can: by the time the page reaches the browser, those are plain HTML links). So I put some in with js, and Google read those too. Google is getting really good at reading scripting languages, and even content inside Flash.

              Maybe if I move the js off the page into a separate file and link to it.