Hi Jill,
I have a technical question that I can't answer, so I'm hoping you can
help!
We've been asked about implementing mod_rewrite on a database-driven
site as a tool to improve search engine rankings, but this is
something that I know nothing about and can't find anything helpful
via Google. I find sites saying how to use it, and saying it will
help, but I can't find any figures to prove it. Do the search engines
see it as a 'trick' to boost rankings, or is it seen as a genuine
attempt to make the site more friendly and informative to the user,
given that it will change query strings into nice understandable URLs?
Any help would be much appreciated!
Emma
Jill's Response
Hi Emma,
Good question! We get a lot of questions at the forum about
mod_rewrite. It seems that there's a lot of misunderstanding about it
because search engines are constantly evolving (for the
better). The improvement in search engine capabilities often means
that tools and programs that once were considered imperative may not
actually be needed anymore.
First, let's back up a bit and explain what mod_rewrite is (as it
relates to search engine optimization and dynamic URLs) and why you
might be considering using it.
Search engines send out their spiders to crawl the Web and bring back
as many URLs (Website addresses) and resulting pages as they can. The
spiders basically travel from link to link gobbling up all the info
they find. This has always worked fine with what's known as "static
URLs," which generally contain information that is already compiled
into a Web page on your site.
The trouble starts when the spiders get to certain dynamic pages found
on dynamically generated sites. Dynamically generated pages don't
actually exist on the server until someone clicks a link to retrieve
the information. When that happens, data that is stored on the server
is automatically pulled from the database and compiled into a Web
page. People like you and me viewing the page really don't
notice the difference between a static HTML page and a dynamically
generated one.
However, dynamically generated pages usually leave clues as to their
dynamic nature, and these clues lie in the format of their URL.
Instead of a simple URL such as www.mysite.com/productpage.htm you may
see something like
www.mysite.com/mypage?product=1&type=large&color=teal. That type of
compilation allows the server to create a new page based on the exact
specifications that the user was looking for. In that fictional
dynamic-URL example, you can infer that the page is probably going to
be showing some product that comes in a "large" and is teal-colored.
Most sites that have lots of products use a database to pull the info
from, which provides this sort of flexibility.
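To make the anatomy of that fictional URL a bit more concrete, here is a
minimal Python sketch that splits it into the values a server-side script
would receive. It uses only the standard library; the "http://" scheme is
added as an assumption so the address parses cleanly.

```python
from urllib.parse import urlsplit, parse_qs

# The fictional dynamic URL from above (scheme added so it parses cleanly).
url = "http://www.mysite.com/mypage?product=1&type=large&color=teal"

parts = urlsplit(url)

# parse_qs returns each parameter as a list, because a name may repeat;
# here every name appears once, so we keep just the first value.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.path)  # the script that builds the page
print(params)      # the values it would use to look up the product
```

Everything after the question mark is the query string, and each name=value
pair in it is one instruction to the script about what to pull from the
database.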
The problem with these types of long URL strings is that they can lead
to many different addresses, which in turn lead to pages that are very
similar to (and often the same as) other ones on your site. If a search
engine spider were to gobble up all the different variations that each
product on a site might have, it could end up swallowing millions of
similar pages. There's also a chance that the spider would get so
busy eating up these pages that it would get stuck on one site for
hours or even days, because the number of URLs for it to ingest is
practically infinite. (Kinda like me with my chocolate supply!)
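As a rough illustration of how quickly those variations pile up, the Python
sketch below counts the URLs a single catalog script could produce. The
option lists and the "sort" parameter are made-up assumptions, not figures
from any real site.

```python
from itertools import product

# Hypothetical options for one catalog script; all numbers are invented.
options = {
    "product": range(1, 501),                    # 500 products
    "type": ["small", "medium", "large"],
    "color": ["teal", "red", "black", "white"],
    "sort": ["price", "name", "newest"],         # hypothetical extra parameter
}

def count_urls(options):
    """Multiply the number of choices for each parameter."""
    total = 1
    for values in options.values():
        total *= len(values)
    return total

def sample_urls(options, limit=3):
    """Build the first few URL variations a spider could stumble upon."""
    names = list(options)
    urls = []
    for combo in product(*options.values()):
        query = "&".join(f"{n}={v}" for n, v in zip(names, combo))
        urls.append(f"www.mysite.com/mypage?{query}")
        if len(urls) == limit:
            break
    return urls

print(count_urls(options))
for url in sample_urls(options):
    print(url)
```

With these invented numbers, one script already yields 18,000 distinct
addresses, many of which would show nearly identical pages.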
Because of those problems, for years the search engines programmed
their spiders to stay away from URLs that contained long query strings
(question marks, equal signs, etc.). Very often those sites didn't
get added to the search engines' databases.
But, things change. These days, dynamically generated sites have
become standard practice for millions of sites. The search engine
programmers understand this, and have figured out ways to allow their
spiders to crawl these sites, although somewhat more slowly at times.
A look at the search results at Google and Yahoo shows tons of pages
that have dynamic-looking URLs. Take my forum pages, for instance. A
typical thread there has at least 1 equal sign ( = ) and 1 question
mark ( ? ), but these URLs are getting indexed every day without a
problem. Even longer URLs from the forum with many query parameters
are getting spidered and indexed. A quick check at Google shows me that
at least 12,500 URLs from my forum are in Google's index!
For example, take a look at this one:
http://www.highrankings.com/forum/index.php?showtopic=6335&st=90.
We've got a question mark, 2 equal signs and an ampersand, yet it
shows in Google's cache without a problem.
Okay, so back to mod_rewrite. The idea behind using mod_rewrite for
dynamic sites was to change dynamic-looking URLs into static-looking
ones. Since question marks and equal signs were the usual tip-off to
the search engine spiders to avoid certain pages/URLs, many smart
programmers would use mod_rewrite on their server in a special way
that would enable them to change those characters into some other
character that was more crawler-friendly -- such as a forward slash
( / ).
For example, my forum page URL above might instead look like this
after using mod_rewrite:
http://www.highrankings.com/forum/index.php/showtopic/6335/st/90
In this instance, the question mark, equal signs and ampersand were
turned into slashes. Now the URL *appears* to be one that would be
used for a static page (even though it's not). This should
theoretically make it more apt to be indexed by the search engines.
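Behind the scenes, the server still has to translate the static-looking
address back into the dynamic one before the forum script can answer. The
sketch below shows that translation in Python. It is not Apache
configuration and not how my forum is set up; it only illustrates the kind
of pattern matching and substitution a rewrite rule performs, and the
function name is hypothetical.

```python
import re

# Pattern for the static-looking forum path from the example above.
STATIC_PATH = re.compile(r"^/forum/index\.php/showtopic/(\d+)/st/(\d+)$")

def to_dynamic(path):
    """Map the slash-style path back to its query-string form, or None."""
    match = STATIC_PATH.match(path)
    if match is None:
        return None  # not a path this pattern knows how to translate
    topic, start = match.groups()
    return f"/forum/index.php?showtopic={topic}&st={start}"

print(to_dynamic("/forum/index.php/showtopic/6335/st/90"))
```

The visitor and the spider only ever see the slash version; the script
itself still receives the same showtopic and st values as before.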
Before the engines started indexing URLs with query strings without
many problems, using this technique was a good idea to encourage
spidering. There's nothing "spammy" about it, and it won't get you
into trouble with the engines if you do it right. However, these
days, it really doesn't seem to be necessary for most sites.
My philosophy is always "if it ain't broke, don't fix it," which is why I
wouldn't recommend moving towards this solution unless you are very
sure it is the only way your URLs will get indexed. According to my
programming friends, this solution uses up lots of extra server
processing time, which can slow down your pages, and perhaps use up
more of your bandwidth. I've also seen instances where the engines
index both the dynamic-looking and static-looking URLs, which only
serves to cause more problems!
For more information on indexing dynamic sites, I suggest reading my
interview with Alan Perkins, and doing a search at
the High Rankings forum to check out the various threads we've had on
the topic.
Hope this helps!
Jill