January 13, 2001
However, if you create such doorway pages, you run a high risk of being penalized for spamming by the search engines. Even though each doorway page is optimized for a different search engine, the pages are still very similar to each other, and search engines can now detect when a site has created such similar-looking pages. They penalize, or even ban, such sites. To prevent your site from being penalized for spamming, you need to stop each search engine's spider from indexing pages that are not meant for it, i.e. you need to prevent AltaVista from indexing pages meant for Infoseek and vice versa. The best way to do that is to use a robots.txt file.
Create the robots.txt file with a plain text editor such as Windows Notepad. Don't use a word processor, since it may save formatting information along with the text.
Here is the basic syntax of the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
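To make the two-field format concrete, here is a minimal Python sketch that assembles one record from a spider name and a list of paths. The helper name `make_record` is hypothetical, introduced only for illustration; it simply writes one User-Agent line followed by one Disallow line per path.

```python
def make_record(spider_name, paths):
    """Hypothetical helper: build one robots.txt record made of a
    User-Agent line followed by one Disallow line per path."""
    lines = [f"User-Agent: {spider_name}"]
    for path in paths:
        # Paths are written from the server root, e.g. "/myfile1.html".
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


print(make_record("Scooter", ["/myfile1.html"]), end="")
```

Calling it with Scooter and /myfile1.html produces the same two lines as the first example that follows.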
For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html residing in the root directory of the server, you would write
User-Agent: Scooter
Disallow: /myfile1.html
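One way to confirm what a record like this does is to feed it to a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in for a real spider (the actual search engine spiders may have parsed the file differently); it suggests the record blocks only Scooter, and only from /myfile1.html.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-Agent: Scooter
Disallow: /myfile1.html
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Scooter", "/myfile1.html"))          # False: disallowed for Scooter
print(parser.can_fetch("Scooter", "/index.html"))            # True: file not listed
print(parser.can_fetch("ArchitextSpider", "/myfile1.html"))  # True: record names Scooter only
```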
To tell Excite's spider, called ArchitextSpider, not to spider the files myfile2.html and myfile3.html, you would write
User-Agent: ArchitextSpider
Disallow: /myfile2.html
Disallow: /myfile3.html
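Each extra Disallow line adds one more blocked path to the same record. A quick check with Python's `urllib.robotparser`, again only a stand-in for the real spiders, might look like this:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-Agent: ArchitextSpider
Disallow: /myfile2.html
Disallow: /myfile3.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Both listed files are blocked; anything not listed stays allowed.
for path in ["/myfile2.html", "/myfile3.html", "/myfile1.html"]:
    status = "allowed" if parser.can_fetch("ArchitextSpider", path) else "blocked"
    print(path, status)
```

Note that the parser treats each Disallow value as the beginning of a path, which is how the original standard defines it, so Disallow: /myfile2.html would also block a file named, say, /myfile2.html.bak.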
You can, of course, put multiple records, each starting with its own User-Agent line, in the same robots.txt file; separate the records with a blank line. Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Excite not to spider the files myfile2.html and myfile3.html, you would write
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: ArchitextSpider
Disallow: /myfile2.html
Disallow: /myfile3.html
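With two records in the file, each spider should obey only the record that names it. The sketch below checks that with Python's `urllib.robotparser`. That parser happens to tolerate a missing blank line between records, but the original standard separates records with blank lines, so the sketch includes one, which is the safer habit for real spiders.

```python
from urllib.robotparser import RobotFileParser

# Two records, separated by a blank line.
robots_txt = """\
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: ArchitextSpider
Disallow: /myfile2.html
Disallow: /myfile3.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

files = ["/myfile1.html", "/myfile2.html", "/myfile3.html"]
for spider in ["Scooter", "ArchitextSpider"]:
    blocked = [f for f in files if not parser.can_fetch(spider, f)]
    print(spider, "is blocked from", blocked)
```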
If you want to disallow all robots from spidering the file named myfile4.html, you can use the * wildcard character in the User-Agent line. You would write
User-Agent: *
Disallow: /myfile4.html
However, you cannot use the wildcard character in the Disallow line.
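Both halves of this rule can be seen with Python's `urllib.robotparser`, which follows the original standard's prefix matching (the helper `parse` below is just a small convenience defined in the sketch). A * in the User-Agent line matches every spider, while a * in the Disallow line is treated as an ordinary character of the path, so it blocks nothing useful.

```python
from urllib.robotparser import RobotFileParser


def parse(text):
    """Return a RobotFileParser loaded from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# "*" in the User-Agent line: the record applies to every spider.
everyone = parse("User-Agent: *\nDisallow: /myfile4.html\n")
for spider in ["Scooter", "ArchitextSpider", "Slurp", "Gulliver"]:
    print(spider, everyone.can_fetch(spider, "/myfile4.html"))  # False for each spider

# "*" in the Disallow line is not a wildcard here: it is just part of
# the path prefix, so no real .html file matches it.
literal = parse("User-Agent: *\nDisallow: /*.html\n")
print(literal.can_fetch("Scooter", "/myfile4.html"))  # True: not blocked
```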
Once you have created the robots.txt file, upload it to the root directory of your domain. Spiders look for the file only there, so uploading it to a sub-directory won't work.
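The reason a sub-directory won't work is that a spider builds the robots.txt address from the host name alone. This Python sketch, using hypothetical pages on the placeholder domain www.example.com, resolves "/robots.txt" against pages at different depths with the standard `urllib.parse.urljoin`:

```python
from urllib.parse import urljoin

# Hypothetical pages on a hypothetical domain, at different directory depths.
pages = [
    "http://www.example.com/",
    "http://www.example.com/tourism-in-australia-al.html",
    "http://www.example.com/travel/australia/index.html",
]

for page in pages:
    # A leading "/" resolves against the host, not the page's directory.
    print(urljoin(page, "/robots.txt"))
```

Every line prints http://www.example.com/robots.txt, so a copy stored at http://www.example.com/travel/robots.txt would never be requested.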
I won't discuss the syntax and structure of the robots.txt file any further - you can get the complete specifications from here.
Now we come to how the robots.txt file can be used to prevent your site from being penalized for spamming in case you are creating doorway pages. What you need to do is to prevent each search engine from spidering pages which are not meant for it.
For simplicity, let's assume that you are targeting only two keywords: tourism in Australia and travel to Australia. Also, let's assume that you are targeting only four of the major search engines: AltaVista, Excite, HotBot and Northern Light.
If you recall my article on creating hallway pages, I recommended that when naming the doorway pages, you use a suffix to identify the search engine for which each page is created. As a convention, you can use the first two letters of the search engine's name as the suffix.
Hence, the files for AltaVista are
tourism-in-australia-al.html
travel-to-australia-al.html
The files for Excite are
tourism-in-australia-ex.html
travel-to-australia-ex.html
The files for HotBot are
tourism-in-australia-ho.html
travel-to-australia-ho.html
The files for Northern Light are
tourism-in-australia-no.html
travel-to-australia-no.html
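The naming convention is mechanical enough to script. Here is a minimal Python sketch; `doorway_filename` is a hypothetical helper that lower-cases the keyword, joins its words with hyphens and appends the first two letters of the search engine's name.

```python
def doorway_filename(keyword, engine):
    """Hypothetical helper: hyphenate the lower-cased keyword and append
    the first two letters of the search engine's name as a suffix."""
    slug = "-".join(keyword.lower().split())
    suffix = engine.lower()[:2]
    return f"{slug}-{suffix}.html"


keywords = ["tourism in Australia", "travel to Australia"]
engines = ["AltaVista", "Excite", "HotBot", "Northern Light"]

for engine in engines:
    print(engine, [doorway_filename(k, engine) for k in keywords])
```

The loop reproduces the eight file names listed above, two per search engine.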
As I noted earlier, AltaVista's spider is called Scooter and Excite's spider is called ArchitextSpider.
A list of spiders for the major search engines can be found here.
From this list, we find that Northern Light's spider is called Gulliver. HotBot is powered by Inktomi, whose spider is called Slurp.
Using this knowledge, here's what the robots.txt file should contain:
User-Agent: Scooter
Disallow: /tourism-in-australia-ex.html
Disallow: /travel-to-australia-ex.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-no.html
Disallow: /travel-to-australia-no.html

User-Agent: ArchitextSpider
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-no.html
Disallow: /travel-to-australia-no.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ex.html
Disallow: /travel-to-australia-ex.html
Disallow: /tourism-in-australia-no.html
Disallow: /travel-to-australia-no.html

User-Agent: Gulliver
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ex.html
Disallow: /travel-to-australia-ex.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
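Typing 24 Disallow lines by hand invites mistakes, so you may prefer to generate the file. The Python sketch below assumes the spider names given in this article and the two-letter suffix convention; `build_robots_txt` and the two constants are hypothetical names chosen for illustration. Each spider gets one record that disallows every doorway page carrying another engine's suffix.

```python
# Spider names as given in this article, keyed by the doorway-page suffix.
SPIDERS = {
    "al": "Scooter",          # AltaVista
    "ex": "ArchitextSpider",  # Excite
    "ho": "Slurp",            # HotBot (Inktomi)
    "no": "Gulliver",         # Northern Light
}
KEYWORD_SLUGS = ["tourism-in-australia", "travel-to-australia"]


def build_robots_txt(spiders, slugs):
    """Give each spider one record that disallows every doorway page
    whose suffix belongs to a different search engine."""
    records = []
    for own_suffix, spider in spiders.items():
        lines = [f"User-Agent: {spider}"]
        for other_suffix in spiders:
            if other_suffix == own_suffix:
                continue  # never block a spider from its own pages
            for slug in slugs:
                lines.append(f"Disallow: /{slug}-{other_suffix}.html")
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


print(build_robots_txt(SPIDERS, KEYWORD_SLUGS), end="")
```

Adding a keyword or a search engine then means editing one list or dictionary rather than several records.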
When you put the above lines in the robots.txt file, you instruct each search engine not to spider the files meant for the other search engines.
When you have finished creating the robots.txt file, double-check it for errors. A small mistake can have disastrous consequences: a search engine may spider files that are not meant for it, in which case it may penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.
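Part of that double-checking can be automated. The Python sketch below, using `urllib.robotparser` as a stand-in for the real spiders, asks the parser whether each spider may fetch each doorway page and reports any spider that is blocked from its own page or allowed onto another engine's. The names `find_problems` and `record` are hypothetical, and the spider names are the ones given in this article. It then simulates a one-letter typo to show how such a slip would surface.

```python
from urllib.robotparser import RobotFileParser

SPIDERS = {"al": "Scooter", "ex": "ArchitextSpider", "ho": "Slurp", "no": "Gulliver"}
SLUGS = ["tourism-in-australia", "travel-to-australia"]


def find_problems(robots_txt):
    """Return (spider, path, problem) tuples for every spider that is
    blocked from its own doorway page or allowed onto another engine's."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for own_suffix, spider in SPIDERS.items():
        for suffix in SPIDERS:
            for slug in SLUGS:
                path = f"/{slug}-{suffix}.html"
                allowed = parser.can_fetch(spider, path)
                if suffix == own_suffix and not allowed:
                    problems.append((spider, path, "blocked from its own page"))
                elif suffix != own_suffix and allowed:
                    problems.append((spider, path, "allowed onto another engine's page"))
    return problems


def record(own):
    """One record disallowing every other engine's doorway pages."""
    lines = [f"User-Agent: {SPIDERS[own]}"]
    lines += [f"Disallow: /{slug}-{s}.html" for s in SPIDERS if s != own for slug in SLUGS]
    return "\n".join(lines)


good = "\n\n".join(record(s) for s in SPIDERS) + "\n"
# Simulate a typo in the first Disallow line of the Scooter record;
# the parser ignores the unknown "Disalow" field entirely.
bad = good.replace("Disallow: /tourism-in-australia-ex.html",
                   "Disalow: /tourism-in-australia-ex.html", 1)

print(find_problems(good))  # []
print(find_problems(bad))   # reports Scooter's unblocked Excite page
```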
A useful tool to check the syntax of your robots.txt file can be found here.
Copyright © 1998 - 2008 K. Clough, Inc. All Rights Reserved.