Recognizing Bots

Contributor, Jun 23, 2007

How can my website recognize a bot, especially Googlebot?

I have PPC on my site, and I noticed during testing that when Googlebot shows up it follows every single link on the page. Sometimes it even runs searches on the pages and follows THOSE links too. I can end up with over 1,000 clicks in a day, all from Googlebot. If I had live PPC going right now, there would be some VERY upset people due to click fraud. As a matter of fact, I wonder how much click fraud is really just these bots.

Anyway, I put in a robots.txt file telling the bot to stay out, and that didn't help at all. I have been weeding the bots out of the reporting as they come in by blocking their IP addresses, but there always seem to be more of them. How many IPs does Googlebot use?

So, does anybody know how to recognize a bot using, say, CGI variables or something similar?

Engaged, Jun 23, 2007

The variable you are after is cgi.user_agent; you need to find out the Googlebot user-agent string and check for it. The other alternative would be to load your PPC campaign via JavaScript, as bots generally do not execute JavaScript and so will not see the links to follow.
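
To make the user-agent idea concrete, here is a minimal sketch in Python rather than ColdFusion. The function names and the token list are my own illustrative assumptions, not a complete or authoritative list of crawlers; the same substring test could be applied to cgi.user_agent on the server.

```python
# Hypothetical, illustrative list of substrings that crawlers commonly
# include in their User-Agent header. It is an assumption, not exhaustive.
KNOWN_BOT_TOKENS = ("googlebot", "msnbot", "slurp")


def looks_like_bot(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def should_count_click(user_agent):
    """Count a PPC click only when the visitor does not look like a bot."""
    return not looks_like_bot(user_agent)
```

Keep in mind that the user-agent header is supplied by the client, so this only filters out bots that identify themselves honestly.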

New Here, Jun 26, 2007

@Dinghus:

It might be that outside sites link to yours, and that robots follow those links to your site.

This meta tag is supposed to prevent a robot from following links on your page(s) once it *is* there:
<meta name="ROBOTS" content="NOFOLLOW">

Not knowing much about robots.txt etc., I found this short article useful:
http://blog.searchenginewatch.com/blog/060927-074214
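
If it helps to check what a robots.txt file actually tells a given crawler, here is a rough sketch using Python's standard urllib.robotparser module. The rules, the /ads/ path, the example.com URLs and the is_allowed helper are all made up for illustration; they are not taken from anyone's real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Googlebot out of /ads/, allow everything else.
EXAMPLE_RULES = """\
User-agent: Googlebot
Disallow: /ads/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, is_allowed(EXAMPLE_RULES, "Googlebot", "http://example.com/ads/click") evaluates the Googlebot rules against that path. This only shows what the file says; robots.txt is advisory, so it depends on the crawler choosing to honor it.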

What I like about a simple meta tag in the head element is that it doesn't change the markup in the body of the page.

Certainly, if you identify robots by user-agent and hide the links from them, that will work. But it seems like you'd have to maintain either a list of known robots or a whitelist of known browsers, and keep it up to date either way: to keep new robots out, or to let users of new browsers click your links.

Simon's excellent suggestion avoids the need for a robot list (or browser whitelist) -- the only drawback I can think of is that it may make your site inaccessible to many mobile devices, or to the (very small) percentage of users who browse without JavaScript.

Happy robot-fighting!

-- Laurence

Contributor, Jun 26, 2007

JavaScript seemed like a good idea until I thought about it and wondered why Google can read links created dynamically with ColdFusion. So I added some links with JS, and Google read them. Google is getting really good at reading scripting languages, and even at reading inside Flash.

Maybe it will help if I move the JS off the page into a separate file and link to it.
