Robots.txt files are often described as an important foundation of a search-friendly web site. To site owners and small businesses who are new to search marketing, the robots.txt file can sound daunting. In reality, it's one of the fastest, simplest ways to make your site a little more search-engine friendly.

(SEG Bootcamp articles are no-frills content designed to bring small business owners up to speed on the concepts and techniques needed to market their businesses online.)

What is Robots.txt?

Robots.txt is a simple text file that sits on the server with your web site. It's basically your web site's way of giving search engines instructions about how they index your web site.

Search engines tend to look for the robots.txt file when they first visit a site. They can visit and index your site whether you have a robots.txt file or not; having one simply helps them along the way.

All of the major search engines read and follow the instructions in a robots.txt file. That makes it a fairly effective way to keep content from being crawled, although a blocked page can still turn up in search results if other sites link to it.

A word of warning. While some sites will tell you to use robots.txt to block premium content you don't want people to see, this isn't a good idea. While most search engines will respect your robots.txt file and ignore the content you want to have blocked, a far safer option is to hide that premium content behind a login. Requiring a username and password to access the content you want hidden from the public will do a much more effective job of keeping both search engines and people out.

What Does Robots.txt Look Like?

The average robots.txt file is one of the simplest pieces of code you'll ever write or edit.

If you want to have a robots.txt file for the engines to visit, but don't want to give them any special instructions, simply open up a text editor and type in the following:

User-Agent: *
Disallow:

The "User-Agent" part specifies which search engines you are giving the directions to. Using the asterisk means you are giving directions to ALL search engines.

The "disallow" part specifies what content you don't want the search engines to index. If you don't want to block the search engines from any area of your web site, you simply leave this area blank.

For most small web sites, those two simple lines are all you really need.
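
If you'd like to see how a robot reads those two lines, here is a minimal sketch using Python's built-in urllib.robotparser module. The is_allowed helper and the sample page address are made up for illustration; real search engines use their own parsers, so treat this as an approximation of their behavior rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The two-line "no special instructions" file from above.
ALLOW_ALL = """\
User-Agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: may this robot fetch this URL under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty Disallow line blocks nothing, so any robot may visit any page.
print(is_allowed(ALLOW_ALL, "googlebot", "http://www.yourdomain.com/any/page.html"))  # True
```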

If your web site is a little bit larger, or you have a lot of folders on your server, you may want to use the robots.txt file to give some instructions about which content to avoid.

A good example of this would be a site that has printer-friendly versions of all of its content housed in a folder called "print-ready." There's no reason for the search engines to index both forms of the content, so it's a good idea to go ahead and block the engines from indexing the printer-friendly versions.

In this case, you'd leave the "user-agent" section alone, but would add the print-ready folder to the "disallow" line. That robots.txt file would look like this:

User-Agent: *
Disallow: /print-ready/

It's important to note the forward slashes before and after the folder name. The search engines will tack that folder on to the end of the domain name they are visiting.
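
To see why the trailing slash matters, here is a rough sketch with Python's built-in urllib.robotparser module. The blocked helper and the page names are invented for the example. This parser treats the Disallow value as a simple prefix of the path, which is how the major engines are generally understood to work, but check each engine's own documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_path: str, url_path: str) -> bool:
    """Hypothetical helper: does 'Disallow: <disallow_path>' block url_path?"""
    parser = RobotFileParser()
    parser.parse(["User-Agent: *", f"Disallow: {disallow_path}"])
    # www.yourdomain.com is the placeholder domain used in this article.
    return not parser.can_fetch("*", "http://www.yourdomain.com" + url_path)


# With both slashes, only pages inside the folder are blocked.
print(blocked("/print-ready/", "/print-ready/article.html"))  # True
print(blocked("/print-ready/", "/print-ready-tips.html"))     # False

# Without the trailing slash, anything starting with "/print-ready" is blocked.
print(blocked("/print-ready", "/print-ready-tips.html"))      # True
```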

That means the /print-ready/ folder is found at www.yourdomain.com/print-ready/. If it's actually found at www.yourdomain.com/css/print-ready/ you'll need to format your robots.txt this way:

User-Agent: *
Disallow: /css/print-ready/
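
A quick way to confirm that the path is read from the root of the site is to test a few addresses against the rule. This is only a sketch using Python's built-in urllib.robotparser module; the can_crawl helper and the sample pages are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow: /css/print-ready/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(path: str) -> bool:
    """Hypothetical helper: may a generic robot fetch this path on the site?"""
    return parser.can_fetch("*", "http://www.yourdomain.com" + path)


print(can_crawl("/css/print-ready/page.html"))  # False: inside the blocked folder
print(can_crawl("/print-ready/page.html"))      # True: a different folder at the root
print(can_crawl("/css/style.css"))              # True: the rest of /css/ is untouched
```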

You can also edit the "user-agent" line to refer to specific search engines. To do this, you'll need to look up the name of a search engine's robot. (For instance, Google's robot is called "googlebot" and Yahoo's is called "slurp.")

If you want to set up your robots.txt file to give instructions ONLY to Google, you would format it like this:

User-Agent: googlebot
Disallow: /css/print-ready/
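
Here is a small sketch of how that file treats different robots, again using Python's built-in urllib.robotparser module as a stand-in for a real search engine. The sample page address is made up. Because the file has no "User-Agent: *" section, robots other than googlebot get no instructions at all and are free to visit the folder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: googlebot
Disallow: /css/print-ready/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder page inside the blocked folder.
url = "http://www.yourdomain.com/css/print-ready/page.html"

print(parser.can_fetch("googlebot", url))  # False: the rule names googlebot
print(parser.can_fetch("slurp", url))      # True: no rule applies to slurp
```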

How Do I Put Robots.txt on My Site?

Once you've written your robots.txt file to reflect the directions you want to give the search engines, you simply save the text file as "robots.txt" and upload it to the root folder of your web site.

It's that simple.
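
If you're not sure what "the root folder" means in practice, the file should be reachable at /robots.txt directly after your domain name, no matter which page a robot starts from. The sketch below works out that address with Python's standard urllib.parse module; the robots_url function name and the sample page are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, replace the path, drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.yourdomain.com/css/print-ready/page.html"))
# http://www.yourdomain.com/robots.txt
```

Once the file is uploaded, opening that address in a browser should show the text you saved.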

Want to learn more? Check out these resources:

Official Google Blog: Controlling How Search Engines Access and Index Your Website

The Web Robots Page


January 9, 2008





Jennifer Laycock is the Editor of Search Engine Guide, the Social Media Faculty Chair for MarketMotive and offers small business social media strategy & consulting. Jennifer enjoys the challenge of finding unique and creative ways to connect with consumers without spending a fortune in marketing dollars. Though she now prefers to work with small businesses, Jennifer’s clients have included companies like Verizon, American Greetings and Highlights for Children.





Comments(8)

"While some sites will tell you to use robots.txt to block premium content you don't want people to see, this isn't a good idea."

And for those who don't know, the reason for this is that anyone can read the file by typing robots.txt after the URL to then find this folder, like this: http://www.searchengineguide.com/robots.txt

Hey, great information. I’m adding this link to our team reading list!

hey Jennifer,
about the robots.txt file... I have tried to do that on my site www.patternstudioa.com but it seems that my host officelive will not allow me to take control of my panel, because when I upload code of any kind in the text editor > module > html, and paste, my page freezes up and then crashes.
It really sucks, because I can't do Google AdSense or get it verified.
What can I do about that?
dean

I was always wondering about robots. I heard somewhere before that having robots.txt is a must, but I never knew exactly how important it is.
Thanks for your great article.

At first I thought that robot text files are so hard to figure out, but after reading your entry I may give it a try since it seems to be simple, thanks for this valuable information Jenny, you really helped us a lot..

Thanks for the great info. Robots.txt files are useful in on-page SEO; most people disregard them or don't even know how to use them, so I highly recommend they read this great article!!

that's great resources..

Hi Jennifer,

I may be late to the party, but I would like to know if the 'Noindex' attribute is supported in the robots.txt file?


