Preventing web crawlers from indexing everything



OK, so we’ve seen how to password-protect directories to keep web crawlers out, but I don’t want to go through all that. I want to keep the page open to visitors; I just don’t want it spidered and indexed by the bots.

There are ways of doing this too; in fact, there are several. The most commonly accepted and respected way to tell a bot not to crawl certain areas of a website is a file called robots.txt. It has to sit at the root of your site, which is usually the same folder as your main index page, because crawlers only look for it at /robots.txt. It looks like this:

User-agent: *
Disallow: /
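One way to check how a well-behaved crawler interprets those two lines is Python’s standard-library urllib.robotparser module. This is a minimal sketch. The example.com URLs and the bot names are placeholders, and real search engines use their own parsers, so treat it as an approximation rather than a guarantee of how any particular bot behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt shown above: the * entry applies to every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so nothing is fetched over the network.
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder agents and URLs on a hypothetical site.
for agent in ("msnbot", "Googlebot", "SomeOtherBot"):
    for url in ("http://example.com/", "http://example.com/downloads/file.zip"):
        print(agent, url, parser.can_fetch(agent, url))
```

Every check prints False, because the * entry applies to any user agent and Disallow: / matches every path on the site.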

That robots.txt tells all (well-behaved) robots to stay out of your entire site, which might be too heavy-handed. Let’s say msnbot has been a bit too voracious with your downloads area:

User-agent: msnbot
Disallow: /downloads/
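The same module can confirm that this rule only affects msnbot. Again, this is a sketch using hypothetical URLs:

```python
from urllib.robotparser import RobotFileParser

# Only msnbot is named; there is no rule for any other agent.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a made-up site.
DOWNLOAD = "http://example.com/downloads/big-file.zip"
HOME = "http://example.com/index.html"

print(parser.can_fetch("msnbot", DOWNLOAD))     # msnbot is barred from /downloads/
print(parser.can_fetch("msnbot", HOME))         # the rest of the site stays open to msnbot
print(parser.can_fetch("Googlebot", DOWNLOAD))  # the rule names only msnbot
```

This prints False, True, True. In this implementation, the User-agent value is matched case-insensitively against the crawler’s name. An agent with no matching entry (and no * entry to fall back on) is allowed everywhere, which is why Googlebot still gets through.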

That rule should be enough to keep msnbot out of that folder. Here’s another example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
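Disallow rules are matched as path prefixes. This sketch runs a few hypothetical paths through urllib.robotparser to show which requests those two lines block for any bot. Note that can_fetch() also accepts a bare path instead of a full URL.

```python
from urllib.robotparser import RobotFileParser

# Every agent is kept out of /cgi-bin/ and /images/; everything else stays open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths; "AnyBot" stands in for any crawler name.
for path in ("/cgi-bin/search.pl", "/images/logo.png", "/about.html"):
    print(path, parser.can_fetch("AnyBot", path))
```

The first two paths are blocked and /about.html is allowed. Because matching is by prefix, the trailing slash matters: Disallow: /images without it would also block a page such as /images.html.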

You can get more complicated than this if you need to; Google’s own robots.txt file is a good example of a longer one.

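To exclude a specific page from being indexed, you might try the following meta tag in the <head> of your document:
<meta name="robots" content="noindex, nofollow">
You can also use index and follow (for example, content="noindex, follow") to fine-tune what you want to restrict or allow.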
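On the crawler’s side, honoring a robots meta tag means reading it before deciding whether to index the page and follow its links. This is a minimal sketch of that check using Python’s standard html.parser module. RobotsMetaParser, robots_directives and the sample page are hypothetical, and real search engines handle far more edge cases.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for directive in attributes.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Hypothetical helper: return the robots directives declared in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks not to be indexed or have its links followed.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private-ish page</title>
</head><body><a href="/elsewhere.html">link</a></body></html>"""

directives = robots_directives(PAGE)
may_index = "noindex" not in directives
may_follow_links = "nofollow" not in directives
print(may_index, may_follow_links)  # False False
```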

I’m not certain how widely the meta tag is respected; robots.txt is more likely to be followed.

To exclude just Googlebot, Google’s page on removing pages from the index suggests this tag:
<meta name="googlebot" content="noindex, nofollow">

Google apparently respects that tag, and other bots would still be let through.
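A crawler-specific tag works on the assumption that each bot reads the generic robots name plus its own name, and ignores tags addressed to other bots. This sketch models that assumption with a hypothetical applicable_directives() helper, which takes (name, content) pairs already pulled out of a page. The Googlebot-only tag it tests, name="googlebot" with content="noindex, nofollow", is an assumed example.

```python
def applicable_directives(meta_tags, crawler_name):
    """Hypothetical helper: merge the directives that apply to one crawler.

    Assumption: a tag named "robots" applies to every crawler, and a tag
    named after a crawler (e.g. "googlebot") applies only to that crawler.
    """
    wanted = {"robots", crawler_name.lower()}
    directives = set()
    for name, content in meta_tags:
        if name.lower() in wanted:
            directives.update(d.strip().lower() for d in content.split(",") if d.strip())
    return directives


# Assumed example: a page carrying only a Googlebot-specific tag.
PAGE_META = [("googlebot", "noindex, nofollow")]

for crawler in ("googlebot", "msnbot"):
    blocked = "noindex" in applicable_directives(PAGE_META, crawler)
    print(crawler, "skips the page" if blocked else "may index the page")
```

Googlebot skips the page while msnbot may index it, which matches the idea that the Googlebot-specific tag leaves other bots alone.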
