How can I prevent Google and other search engines from indexing the test server I use to develop client websites? I'm pretty sure you can do this with a robots.txt file, but I don't know what to put in the file or where to put it on my server. Thanks in advance for your help.
Here's a simple version of what you need to do:
http://www.robotstxt.org/robotstxt.html
It just goes in the web root directory of your site (e.g. /www or /public_html). I would also password-protect the site and use other methods of protection. That way, if some bot decides to ignore the robots.txt file, which it is not supposed to do but can happen, whoever visits the site is forced to your login page.
Thanks for the reply.
I probably need to put more thought into this, as access to my test site section does not require a password. Also, I only want robots to ignore my client directory, which is www.mycompany.com/clients/. Where should I put the robots.txt file to prevent robots from indexing ONLY my clients directory and its subdirectories?
See the Examples section at http://en.wikipedia.org/wiki/Robots.txt
Place robots.txt in the root folder of your website.
To exclude only one folder named "test" from indexing:
User-agent: *
Disallow: /test/
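For the /clients/ directory asked about above, the same pattern can be checked locally before uploading. This is only a rough sketch using Python's standard urllib.robotparser. The site address comes from the question, while the page paths (/clients/acme/index.html, /about.html) and the helper names robots_url and is_allowed are made up for illustration. It shows how a well-behaved crawler would read the rules, not what every bot actually does.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumed site address (from the question) and the rules to test.
SITE = "http://www.mycompany.com/"
RULES = """\
User-agent: *
Disallow: /clients/
"""


def robots_url(page_url: str) -> str:
    """Return where crawlers look for robots.txt: always the site root."""
    return urljoin(page_url, "/robots.txt")


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Check a path against RULES the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, urljoin(SITE, path))


if __name__ == "__main__":
    # Hypothetical pages, used only to exercise the rules.
    print(robots_url("http://www.mycompany.com/clients/acme/index.html"))
    print(is_allowed("/clients/acme/index.html"))
    print(is_allowed("/about.html"))
```

Note that the file lives at the site root even though only /clients/ is being blocked; a robots.txt placed inside the /clients/ folder itself would not be read.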
Remember that robots.txt works on an honor basis. It asks bots not to index. They obey your instructions only if they're programmed to do so. Google honors robots.txt according to Google.
However, Google also says:
"While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results."
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
So, without password protection, there are no guarantees your test sites won't end up on the web somewhere.
Yes, according to my reading of the Google link.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /junk-directory/
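To see why the trailing slash matters, here is a small sketch that compares the rule with and without it, again using Python's urllib.robotparser. The helper name blocked and the sibling folder /junk-directory-old/ are hypothetical. The parser matches rules as simple path prefixes, which is how I understand compliant crawlers to treat them, but individual bots may differ.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_rule: str, path: str) -> bool:
    """Return True if a single Disallow rule blocks the given path for all agents."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return not parser.can_fetch("*", path)


if __name__ == "__main__":
    # With the trailing slash, only that directory's contents are blocked.
    print(blocked("/junk-directory/", "/junk-directory/a.html"))
    print(blocked("/junk-directory/", "/junk-directory-old/a.html"))
    # Without it, the rule is a bare prefix and also catches similarly named paths.
    print(blocked("/junk-directory", "/junk-directory-old/a.html"))
```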
If you have any further questions on this, feel free to read through the Wikipedia and Google links, which should be able to provide most of the answers.