Preventing web crawlers from indexing everything



Ok, so we’ve seen how to password-protect directories to keep the web crawlers out, but I don’t want to go to that trouble here. I want to keep the page open to visitors; I just don’t want it spidered and indexed by the bots.

There are ways of doing this too; in fact, there are several. The most widely accepted and respected way of telling a bot not to crawl certain areas of a website is with what’s called a robots.txt file. Usually this goes in the same folder as your main site index and looks like this:

User-agent: *
Disallow: /
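
If you want to sanity check how a crawler will read your rules before you rely on them, Python ships with urllib.robotparser, which can parse a robots.txt and answer "may this user agent fetch this URL?". Here’s a rough sketch; the example.com URLs are just placeholders:

from urllib.robotparser import RobotFileParser

# The "keep everyone out" rules from above
rules = """
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Every path is off limits to every bot that honors robots.txt
print(parser.can_fetch("msnbot", "http://example.com/index.html"))     # False
print(parser.can_fetch("Googlebot", "http://example.com/downloads/"))  # False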

Those two lines will keep all robots out of your site. That might be too heavy-handed, though. Let’s say the msnbot has been a bit too voracious with your downloads area:

User-agent: msnbot
Disallow: /downloads/

That should be enough to keep it out of that folder. Here’s another example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
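
The same robotparser sketch can confirm that agent-specific records behave the way you expect: a bot that has its own record follows only that record, and everyone else falls back to the * rules. Again, example.com is just a stand-in for your own site:

from urllib.robotparser import RobotFileParser

rules = """
User-agent: msnbot
Disallow: /downloads/

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# msnbot has its own record, so only /downloads/ is off limits to it
print(parser.can_fetch("msnbot", "http://example.com/downloads/big.iso"))     # False
print(parser.can_fetch("msnbot", "http://example.com/images/logo.gif"))       # True

# Everybody else falls under the * record instead
print(parser.can_fetch("Googlebot", "http://example.com/images/logo.gif"))    # False
print(parser.can_fetch("Googlebot", "http://example.com/downloads/big.iso"))  # True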

You can get more complicated than this if you need to. For a real-world example, take a look at Google’s own robots.txt file at www.google.com/robots.txt.

To exclude a specific file from being indexed, you might try the following meta tag in the head of your document:

<meta name="robots" content="noindex, nofollow">

You can also use index and follow in place of noindex and nofollow to fine-tune what you want to restrict or allow.
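
If you want to double-check that the tag actually made it into a page, Python’s built-in html.parser can dig it back out. This is just a quick sketch; the RobotsMetaFinder class and the sample HTML are made up for the example, not anything standard:

from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content attribute of matching meta tags."""

    def __init__(self, meta_name="robots"):
        super().__init__()
        self.meta_name = meta_name.lower()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() == self.meta_name:
            self.directives.append(attrs.get("content", ""))

page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>...</body></html>'

finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)   # ['noindex, nofollow']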

I’m not certain how widely the meta tag is respected, though; robots.txt is more likely to be honored.

To exclude just the googlebot, you might try this, according to Google’s page on removing pages from the index:

<meta name="googlebot" content="noindex">

Google apparently will respect that tag, and it would still allow other bots through.
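
For what it’s worth, the hypothetical RobotsMetaFinder sketch from above would pick up this googlebot-only variant too if you point it at the other meta name:

finder = RobotsMetaFinder("googlebot")
finder.feed('<html><head><meta name="googlebot" content="noindex"></head><body>...</body></html>')
print(finder.directives)   # ['noindex']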
