I couldn't find any "meaty" questions this week, so I thought I'd just talk generally about what makes a site "crawler-friendly." I used to call this "search-engine-friendly," but my friend Mike Grehan convinced me that the more accurate phrase was "crawler-friendly," because it's the search engine crawlers (or spiders) that your site needs to buddy up to, as opposed to the search engine itself.
So, how do you make sure your site is on good terms with the crawlers? Well, it always helps to first buy it a few drinks. (grin) But, since that's not usually possible, your next-best bet is to design your site with the crawlers in mind. The search engine spiders are primitive beings, and although they are constantly being improved, for best results you should always choose simplicity over complexity.
What this means is that cutting-edge designs are generally not the best way to go. Interestingly enough, your site visitors may agree. Even though we SEO geeks have cable modems and DSL, our site visitors probably don't. Slow-loading Flash sites, for example, may stop the search engine spiders right in their tracks. There's nothing of interest on the average Flash site to a search engine spider anyway, so they're certainly not going to wait for it to download!
Besides Flash, there are a number of "helpful" features being thrown into site designs these days that can sadly be the kiss of death to a site's overall spiderability. For instance, sites that require a session ID to track visitors may never receive any visitors to begin with -- at least not from the search engines. If your site or shopping cart requires session IDs, check Google right now to see if your pages are indexed. (Do an allinurl:yourdomainhere.com in Google's search box and see what shows up.) If you see that Google only has one or two pages indexed, your session IDs may be the culprit. There are workarounds for this, as I have seen many sites that use session IDs get indexed; however, the average programmer/designer may not even know this is a problem.
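Just to illustrate one common workaround, here's a rough sketch in Python. The function names and the list of spider signatures are mine, not from any particular shopping-cart package -- the idea is simply to leave the session ID off the URL when the visitor looks like a known crawler, so the spider sees one clean, indexable address instead of endless ?sid= variations.

```python
# Illustrative sketch only: skip URL-based session IDs for known crawlers.
# The signatures below are a partial list of spider User-Agent fragments.

CRAWLER_SIGNATURES = ("googlebot", "slurp", "msnbot", "teoma")

def is_crawler(user_agent):
    """Rough check of the User-Agent header against known spiders."""
    ua = user_agent.lower()
    return any(sig in ua for sig in CRAWLER_SIGNATURES)

def build_link(path, session_id, user_agent):
    """Append a session ID only for regular visitors, never for spiders."""
    if is_crawler(user_agent):
        return path  # clean URL -- the spider indexes one copy of the page
    return "%s?sid=%s" % (path, session_id)
```

Used in a page template, build_link("/widgets.html", sid, ua) would hand Googlebot a plain /widgets.html while your human visitors still carry their session along.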
Another source of grief when it comes to getting your pages thoroughly crawled is the use of the exact same Title tag on every page of your site. This sometimes happens because of Webmaster laziness, but often it's done because a default Title tag is automatically pulled up through a content management system (CMS). If you have this problem, it's well worth taking the time to fix it.
Most CMSs have workarounds that let you assign a unique Title tag instead of pulling up the same one for each page. Usually the programmers simply never realized it was important, so it was never done. The cool thing is that with dynamically generated pages you can often set your templates to pull a particular sentence from each page and plug it into your Title field. A nice little "trick" is to make sure each page has a headline at the top that uses your most important keyword phrases. Once you've got that, you can set your CMS to pull it out and use it for your Titles also.
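Here's a little sketch of that headline-to-Title trick. It assumes each page template has exactly one headline marked up with an h1 tag; the regular expression, function names, and fallback text are all illustrative, not pulled from any real CMS.

```python
# Illustrative sketch: reuse each page's <h1> headline as its Title tag.
import re

H1_PATTERN = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)

def title_from_headline(page_html, fallback="My Site"):
    """Pull the first <h1> text out of a page and reuse it as the Title."""
    match = H1_PATTERN.search(page_html)
    if match:
        # Strip any leftover tags inside the headline itself.
        headline = re.sub(r"<[^>]+>", "", match.group(1)).strip()
        if headline:
            return headline
    return fallback  # better than an empty Title, but not unique

page = "<html><h1>Discount Hiking Boots</h1><p>...</p></html>"
print("<title>%s</title>" % title_from_headline(page))
# prints: <title>Discount Hiking Boots</title>
```

Every page then carries a Title built from its own keyword-rich headline instead of one default repeated site-wide.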
Another reason I've seen for pages not being crawled is because they are set to require a cookie when a visitor gets to the page. Well guess what, folks? Spiders don't eat cookies! (Sure, they like beer, but they hate cookies!) No, you don't have to remove your cookies to get crawled. Just don't force-feed them to anyone and everyone. As long as they're not required, your pages should be crawled just fine.
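In other words, read the cookie when it's there, but serve the full page either way. A minimal sketch, with hypothetical names -- a spider sends no Cookie header, so it should land on exactly the same content as everyone else:

```python
# Illustrative sketch: cookie-optional pages. Personalize when a cookie
# exists, but never require one -- spiders don't send cookies at all.

def handle_request(headers, page_body):
    """Return the page for everyone; personalize only when a cookie exists."""
    cookie = headers.get("Cookie")
    if cookie:
        greeting = "Welcome back!"   # visitor accepted our cookie
    else:
        greeting = "Welcome!"        # spider, or a cookie-blocking visitor
    return greeting + "\n" + page_body
```

The point of the design: the cookie changes the trimmings, never the content, so there's nothing for a spider to choke on.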
There are plenty more things you can worry about where your site's crawlability is concerned, but those are the main ones I've been seeing lately. One day, I'm sure that any type of page under the sun will be crawler-friendly, but for now, we've still gotta give our little arachnid friends some help.
One tool I use to help me view any potential crawler problems is the Lynx browser. Generally, if your pages can be viewed and clicked through in Lynx (a text-only browser that predates our graphical browsers of today), then a search engine spider should also be able to make its way around. That isn't written in stone, but it's at least one way of discovering potential problems you may be having. It's not foolproof, however. I just checked my forum in the Lynx browser and it shows a blank page, yet the forum gets spidered and indexed by the search engines without a problem.
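If you don't have Lynx handy, you can get a rough approximation of that text-only view with a few lines of Python. This sketch uses the standard library's HTML parser to keep just the visible text of a page -- roughly what a simple spider has to work with; it's a stand-in for Lynx, not a substitute.

```python
# Illustrative sketch: approximate a text-only (Lynx-like) view of a page
# using Python's standard HTML parser.

from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []
        self.skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def text_view(html):
    """Return just the visible text of a page, one space between chunks."""
    parser = TextOnly()
    parser.feed(html)
    return " ".join(parser.chunks)
```

If text_view() comes back nearly empty for one of your pages, that's a hint the spiders may not be finding much to chew on either.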
This is a good time to remind you that when you think your site isn't getting spidered completely, check out lots of things before jumping to any conclusions.
February 10, 2004
CEO and founder of High Rankings®, Jill Whalen has been performing search engine optimization since 1995 and is the host of the free High Rankings Advisor search engine marketing newsletter, author of "The Nitty-gritty of Writing for the Search Engines" and founder/administrator of the popular High Rankings Search Engine Optimization Forum. In 2006, Jill co-founded SEMNE,
a local search engine marketing networking organization for people and companies in New England.
High Rankings is an internationally recognized search engine optimization firm located in Framingham, MA specializing in search engine optimization, SEO consultations, in-house training, site audit reports, search marketing seminars and workshops. High Rankings has a 100% success rate for substantially improving client rankings and targeted traffic.
Jill speaks at national and international conferences and has been writing
about SEO and search marketing since 2000. She's been quoted in such
publications as The Wall Street Journal, U.S. News & World Report and The
Washington Post. Her articles have appeared in numerous print magazines and
online websites including CIO Magazine, CMS Focus, The Internet Marketing
Report, ClickZ, WorkZ, Inc.com, Entrepreneur, Lycos Small Business,
WebProNews, SiteProNews and others. Jill has also appeared on many online
and offline radio programs such as Entrepreneur Magazine's E-Biz Radio Show,
SearchEngineRadio and the eMarketing Talkshow.