Can anyone tell me how to prevent media downloads from showing in Google search? I made them secure so they can't be opened, but they still show up. It wouldn't be a problem if Google didn't display the first part of the content, which includes names and some sensitive information. I tried placing <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> on the page where I have them listed, but this didn't help. And there is no way (as far as I know) to add a meta tag to the file itself, in this case a PDF. Any ideas? This is very important. Thanks a lot.
Thanks Mario, but where do I put that? In the media download list template? On the page where the media downloads module is? On the template holding the page? I don't see a way to put it in each individual media download. That's the issue. If we hide the page, will all the items on the page also be hidden from the robots?
The meta data should be added to the page your document links are on. You can simply embed it inside the page and the rendering engine will move the tag into the head of the document, where it should normally reside.
The robots file would be uploaded to the root folder of your site e.g. www.yoursite.com/robots.txt
You can place the URLs you don't want indexed into that file and the search engine should obey it.
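For illustration, here is a minimal sketch of what such a file could contain, checked with Python's standard urllib.robotparser module. The Disallow path and the www.yoursite.com URLs are placeholders only, and is_allowed is a helper name made up for this example. Keep in mind that robots.txt asks crawlers not to fetch the listed URLs; it is not by itself a removal request for anything already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Disallow path is an example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /LiteratureRetrieve.aspx
"""


def is_allowed(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# One URL under the disallowed path, one outside it.
blocked = is_allowed("http://www.yoursite.com/LiteratureRetrieve.aspx?ID=123")
allowed = is_allowed("http://www.yoursite.com/index.html")
print(blocked, allowed)
```

Running a check like this before uploading the file is a cheap way to confirm that a rule matches the URLs you meant it to match.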
Mario, in addition to the meta tags I put in the template hosting the pages where all the media download listings are, I also tried using robots.txt, and Google Webmaster Tools in conjunction with the robots.txt, in order to exclude the entire directory and to expedite the process of removal from the cache. When I entered a path for several individual media downloads into the Webmaster Tool, Google removed those individual files. Good. However, when I included the directory containing the pages with media download listings, it didn't have any effect. Remember, I also put the meta tags in the template. So the conclusion is that excluding the page containing the links doesn't exclude the media downloads linked from that page. It seems the crawler finds these files elsewhere. So I tried entering the page that calls the media downloads (www.domainname/LiteratureRetrieve.aspx) into the Webmaster Tool, and that didn't work either. Since we don't have the path to the directory that hosts the media downloads, we can't target the directory. Unless you can tell me what I am doing wrong and how to fix it, it appears that I need to go and exclude each individual file by pasting the path directly into the Webmaster Tool. I am talking hundreds of items. I would also have to do it for each media upload I add in the future. Please let me know.
After I posted the last comment, I went to search more about robots.txt and X-Robots-Tag. Reading several articles (one of them: http://www.seochat.com/c/a/Search-Engine-Optimization-Help/Using-the-X RobotsTag-HTTP-Header-Specifications-in-SEO-Tips-and-Tricks/ ) I confirmed that a meta tag or robots.txt will not exclude files on the server such as images, audio, videos, or PDFs, because these documents don't have a <head> tag. A solution is X-Robots-Tag, which is able to target these files; however, X-Robots-Tag can't be implemented in an HTML page. So I think the only option is to block the entire site, or each file individually, using Webmaster Tools.
If I am correct about this, then this really sucks. What is the point of having secure items when people can read the first lines of each document just by searching for it in Google?
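For reference, X-Robots-Tag is an HTTP response header, so it has to be set by whatever serves the file. On a server where you can set response headers yourself, the idea could be sketched roughly like this in Python as a WSGI middleware. This is only an illustration under assumptions: the class and helper names, the list of media types, and the demo application are all made up for the example, and nothing in this thread establishes that BC exposes a way to do this.

```python
# Content types treated as non-HTML media (an assumption for this sketch).
MEDIA_TYPES = {"application/pdf", "video/quicktime"}


class XRobotsTagMiddleware:
    """WSGI middleware that adds an X-Robots-Tag header to media responses."""

    def __init__(self, app, directives="noindex, noarchive"):
        self.app = app
        self.directives = directives

    def __call__(self, environ, start_response):
        def tagged_start_response(status, headers, exc_info=None):
            # Find the Content-Type the wrapped app is about to send.
            lowered = {name.lower(): value for name, value in headers}
            content_type = lowered.get("content-type", "")
            base_type = content_type.split(";")[0].strip().lower()
            if base_type in MEDIA_TYPES:
                headers = list(headers) + [("X-Robots-Tag", self.directives)]
            return start_response(status, headers, exc_info)

        return self.app(environ, tagged_start_response)


def make_demo_app(content_type):
    """Build a stand-in app that answers every request with content_type."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", content_type)])
        return [b""]
    return app


def response_headers(content_type):
    """Run the wrapped demo app once and return the headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    wrapped = XRobotsTagMiddleware(make_demo_app(content_type))
    wrapped({"PATH_INFO": "/LiteratureRetrieve.aspx"}, start_response)
    return captured["headers"]
```

The sketch keys on the response Content-Type rather than the file extension, because the downloads discussed here are served through a single .aspx URL with an ID parameter, so the URL path alone would not reveal that a PDF is being returned.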
I am not talking about securing an item. I am talking about preventing it from showing in search results on search engines. Before I take other actions to have them removed, go and type "Toyota Denver Napa" into Google and you will find a bunch of PDF files showing with a URL starting with www.thewellingtongroup/LiteratureRetrieve.aspx?ID= All these PDF files are Literature items that have been placed in the secure zone. If you click on them, it will ask you to put in the password. However, if you click on "Quick View" right there next to the link in Google, you will see the entire PDF file open right in front of you. So, to answer your question: yes, it is a PDF; yes, it is a literature item; yes, it is secure; and yes, you can still see the entire document in Google. The only way to avoid this is not to have it show in the search results. Well, BC doesn't give you this option, because it can't be done with meta tags or with robots.txt. It can only be done using X-Robots-Tag in PHP or by entering each file individually into the Webmaster Tool. I have 354 files to enter. If you have a better way of doing it, let me know. In the meantime, please send my regards to the development team and tell them they have a huge hole in their security concept.
Secured literature items should not be indexed by Google. Please go to the Webmaster Tools and trigger a re-index of the site; I believe the resources were indexed before you added them to the secure zone.
After the re-index process is complete the Literature items should not even appear in the search results any more.
Do I need to remove the robots.txt excluding some files and directories, as well as the meta robots tag in the template requesting nofollow, noindex, noarchive, in order to allow Google to reindex the entire site, including the folders and files I had previously excluded?
The site is re-indexed once a day. Since I started this, the site has already been crawled 3 times. The meta tag in the template now includes noarchive in order to strip away all the secure files that have been cached and are not supposed to show. (Yes, they were made secure after they had already been indexed.) I also removed the robots.txt and canceled all exclusions I placed in the Webmaster Tool to allow the robots to reindex the entire site. So if you are correct, the secure files should disappear after the next crawl. I will let you know.
Over the last 10 days I did a lot more testing using meta tags, robots.txt and Webmaster Tools, just to see which of the claims above hold up.
Here is what I found out:
- if a media download item was secured right away when posted, it will not show up in search engines.
- if a media download was not secured, it will show in the search engines.
- to remove a media download item from Google's cache, you will first have to secure the item, then use Google Webmaster Tools to remove it.
- if you have multiple items, you will need to do that for each item individually.
- once cached by a search engine, media files such as pdf or mov cannot be removed using meta tags, robots.txt, or reindexing, regardless of what you try and how you set them up.
I am still wondering why this is not mentioned in the knowledgebase.