Hi Jill,

I have a technical question that I can't answer, so I'm hoping you can help!

We've been asked about implementing mod_rewrite on a database-driven site as a tool to improve search engine rankings, but this is something that I know nothing about and can't find anything helpful via Google. I find sites saying how to use it, and saying it will help, but I can't find any figures to prove it. Do the search engines see it as a 'trick' to boost rankings, or is it seen as a genuine attempt to make the site more friendly and informative to the user, given that it will change query strings into nice understandable URLs?

Any help would be much appreciated!

Emma

Jill's Response

Hi Emma,

Good question! We get a lot of questions at the forum about mod_rewrite. It seems that there's a lot of misunderstanding about it because search engines are constantly evolving (for the better). The improvement in search engine capabilities often means that tools and programs that once were considered imperative may not actually be needed anymore.

First, let's back up a bit and explain what mod_rewrite is (as it relates to search engine optimization and dynamic URLs) and why you might be considering using it.

Search engines send out their spiders to crawl the Web and bring back as many URLs (Website addresses) and resulting pages as they can. The spiders basically travel from link to link gobbling up all the info they find. This has always worked fine with what's known as "static URLs," which generally point to information that is already compiled into a Web page on your site.

The trouble starts when the spiders get to certain dynamic pages found on dynamically generated sites. Dynamically generated pages don't actually exist on the server until someone clicks a link to retrieve the information. When that happens, data that is stored on the server is automatically pulled from the database and compiled into a Web page. People like you and me viewing the page really don't notice the difference between a static HTML page and a dynamically generated one.

However, dynamically generated pages usually leave clues as to their dynamic nature, and these clues lie in the format of their URL. Instead of a simple URL such as www.mysite.com/productpage.htm you may see something like www.mysite.com/mypage?product=1&type=large&color=teal. That type of compilation allows the server to create a new page based on the exact specifications that the user was looking for. In that fictional dynamic-URL example, you can infer that the page is probably going to be showing some product that comes in a "large" and is teal-colored. Most sites that have lots of products use a database to pull the info from, which provides this sort of flexibility.
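To make those "clues" concrete, here is a minimal Python sketch that pulls the fictional URL apart into its page name and its parameters. The function name split_dynamic_url is made up for this illustration, and the http:// prefix is added only so the standard library can parse the address.

```python
from urllib.parse import urlsplit, parse_qsl


def split_dynamic_url(url):
    """Return (path, params) for a URL, where params is a dict of the
    name=value pairs found after the question mark."""
    parts = urlsplit(url)
    # parse_qsl splits "a=1&b=2" into [("a", "1"), ("b", "2")]
    params = dict(parse_qsl(parts.query))
    return parts.path, params


path, params = split_dynamic_url(
    "http://www.mysite.com/mypage?product=1&type=large&color=teal"
)
print(path)    # /mypage
print(params)  # {'product': '1', 'type': 'large', 'color': 'teal'}
```

Everything after the question mark is what tells the server which product, size and color to pull from the database.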

The problem with these types of long URL strings is that they can lead to many different addresses, which in turn lead to pages that are very similar to (and often the same as) other ones on your site. If a search engine spider were to gobble up all the different variations that each product on a site might have, it could end up swallowing millions of similar pages. There's also a chance that the spider would get so busy eating up these pages that it would get stuck on one site for hours or even days, because the number of URLs for it to ingest is effectively infinite. (Kinda like me with my chocolate supply!)
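A rough sketch of how quickly the variations multiply, using invented option lists for the fictional mysite.com page (the values and the variant_urls helper are assumptions for illustration only):

```python
from itertools import product
from urllib.parse import urlencode


def variant_urls(base, options):
    """Yield one URL for every combination of the given parameter values."""
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield base + "?" + urlencode(dict(zip(names, combo)))


# Hypothetical option values, just to show the multiplication.
options = {
    "product": [1, 2, 3],
    "type": ["small", "large"],
    "color": ["teal", "red"],
}

urls = list(variant_urls("http://www.mysite.com/mypage", options))
print(len(urls))  # 3 * 2 * 2 = 12
print(urls[0])    # http://www.mysite.com/mypage?product=1&type=small&color=teal
```

With 1,000 products, 5 sizes and 20 colors, the same arithmetic gives 100,000 addresses, and adding a sort order or session ID parameter multiplies that again.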

Because of those problems, for years the search engines programmed their spiders to stay away from URLs that contained many query-string characters (question marks, equal signs, etc.). Very often the pages behind those URLs didn't get added to the search engines' databases.
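The engines never published their exact rules, so the following is only a guess at the kind of filter a cautious spider might have applied. The function name would_skip and the max_params threshold are both invented for this sketch.

```python
from urllib.parse import urlsplit, parse_qsl


def would_skip(url, max_params=1):
    """Hypothetical crawler rule: skip any URL whose query string
    carries more than max_params name=value pairs."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return len(pairs) > max_params


print(would_skip("http://www.mysite.com/productpage.htm"))
# False: no query string at all
print(would_skip("http://www.mysite.com/mypage?product=1&type=large&color=teal"))
# True: three parameters is over the assumed limit of one
```

Raising max_params is roughly what "the spiders got smarter" looks like in this toy model.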

But, things change. These days, dynamically generated sites have become standard practice for millions of sites. The search engine programmers understand this, and have figured out ways to allow their spiders to crawl these sites, although somewhat more slowly at times.

A look at the search results at Google and Yahoo shows tons of pages that have dynamic-looking URLs. Take my forum pages, for instance. A typical thread there has at least 1 equal sign ( = ) and 1 question mark ( ? ), but these URLs are getting indexed every day without a problem. Even longer URLs from the forum with many query strings are getting spidered and indexed. A quick check at Google shows me that at least 12,500 URLs from my forum are in Google's index!

For example, take a look at this one: http://www.highrankings.com/forum/index.php?showtopic=6335&st=90. We've got a question mark, 2 equal signs and an ampersand, yet it shows in Google's cache without a problem.

Okay, so back to mod_rewrite. The idea behind using mod_rewrite for dynamic sites was to change dynamic-looking URLs into static-looking ones. Since question marks and equal signs were the usual tip-off to the search engine spiders to avoid certain pages/URLs, many smart programmers would use mod_rewrite on their server in a special way that would enable them to change those characters into some other character that was more crawler-friendly -- such as a forward slash ( / ).

For example, my above forum page URL using mod_rewrite might instead look like this:

http://www.highrankings.com/forum/index.php/showtopic/6335/st/90

In this instance, the question mark, equal signs and ampersand were turned into slashes. Now the URL *appears* to be one that would be used for a static page (even though it's not). This should theoretically make it more apt to be indexed by the search engines.
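The mapping itself is simple enough to sketch in Python. This is not mod_rewrite, which is configured on the Apache server with its own rules; it only illustrates the two directions of the translation. The function names and the assumption that the script lives at /forum/index.php are mine, and the sketch assumes parameter values never contain a slash.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def to_static_looking(url):
    """Turn ?name=value&name=value into /name/value/name/value."""
    parts = urlsplit(url)
    segments = [piece for pair in parse_qsl(parts.query) for piece in pair]
    path = "/".join([parts.path.rstrip("/")] + segments)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def to_dynamic(url, script="/forum/index.php"):
    """Reverse the mapping, as the server must do before running the script."""
    parts = urlsplit(url)
    if not parts.path.startswith(script + "/"):
        return url  # not one of our static-looking URLs; leave it alone
    segments = parts.path[len(script) + 1:].split("/")
    pairs = list(zip(segments[0::2], segments[1::2]))
    return urlunsplit((parts.scheme, parts.netloc, script, urlencode(pairs), ""))


dynamic = "http://www.highrankings.com/forum/index.php?showtopic=6335&st=90"
static = to_static_looking(dynamic)
print(static)
# http://www.highrankings.com/forum/index.php/showtopic/6335/st/90
print(to_dynamic(static) == dynamic)  # True
```

Visitors and spiders only ever see the slash version; the server quietly translates it back so the forum script still receives showtopic and st as ordinary parameters.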

Before the engines started indexing URLs with query strings without many problems, using this technique was a good idea to encourage spidering. There's nothing "spammy" about it, and it won't get you into trouble with the engines if you do it right. However, these days, it really doesn't seem to be necessary for most sites.

My theory is always "if it ain't broke, don't fix it," which is why I wouldn't recommend moving towards this solution unless you are very sure it is the only way your URLs will get indexed. According to my programming friends, this solution uses up lots of extra server processing time, which can slow down your pages, and perhaps use up more of your bandwidth. I've also seen instances where the engines index both the dynamic-looking and static-looking URLs, which only serves to cause more problems!

For more information on indexing dynamic sites, I suggest reading my interview with Alan Perkins, and doing a search at the High Rankings forum to check out the various threads we've had on the topic.

Hope this helps!

Jill
June 29, 2004





CEO and founder of High Rankings®, Jill Whalen has been performing search engine optimization since 1995 and is the host of the free High Rankings Advisor search engine marketing newsletter, author of "The Nitty-gritty of Writing for the Search Engines" and founder/administrator of the popular High Rankings Search Engine Optimization Forum. In 2006, Jill co-founded SEMNE, a local search engine marketing networking organization for people and companies in New England.

High Rankings is an internationally recognized search engine optimization firm located in Framingham, MA specializing in search engine optimization, SEO consultations, in-house training, site audit reports, search marketing seminars and workshops. High Rankings has a 100% success rate for substantially improving client rankings and targeted traffic.

Jill speaks at national and international conferences and has been writing about SEO and search marketing since 2000. She's been quoted in such publications as The Wall Street Journal, U.S. News & World Report and The Washington Post. Her articles have appeared in numerous print magazines and online websites including CIO Magazine, CMS Focus, The Internet Marketing Report, ClickZ, WorkZ, Inc.com, Entrepreneur, Lycos Small Business, WebProNews, SiteProNews and others. Jill has also appeared on many online and offline radio programs such as Entrepreneur Magazine's E-Biz Radio Show, SearchEngineRadio and the eMarketing Talkshow.




