While search engines are pretty good at finding web sites and getting their content indexed, many site owners have no idea that a site can look great to people yet be unreadable to search engines. If your site is built entirely in Flash, relies too heavily on JavaScript, or uses drop-down boxes and forms to let people find your content, the search engines may be missing out on it.

(SEM Bootcamp articles are no-frills content designed to bring small business owners up to speed on the concepts and techniques needed to market their businesses online.)

Search engine spiders do an impressive job of finding and indexing the content on web sites around the net, but there are still a few stumbling blocks that could be keeping them from indexing the content on YOUR site. Thankfully, it only takes a few seconds of your time to make sure an easy-to-fix problem isn't keeping your site content from being indexed.

Search Engines Are Smart, But They Have Limits

While there are several things that can trip up a search engine, most small business sites that aren't being indexed suffer from one of the following problems. They've either mistakenly set up their robots.txt file to block crawlers, built an all-Flash or all-graphic site, or rely on code like frames or drop-down boxes for navigation. Of course, there's no sense worrying about which of those issues is the culprit until you know whether there's a problem to worry about.

The Spider Simulator is Your Friend

The good news is there's a very simple way to find out if there is a problem on your site. Thanks to handy little tools called "spider simulators," you can view your web site the same way a search engine's spider would. The Spider Simulator offered by Webconfs.com is a great, free tool that will help you figure out if your web site is working against you when it comes to search engine rankings.

[Image: simulator.gif — the spider simulator's query box]

To use a spider simulator, you simply type the URL of your web site into the query box and hit enter. The program will run a quick search of your web site and will then show you what information it was able to collect. These programs are designed to read your web site the same way a search engine would, so more often than not if a spider simulator can't read your content, a search engine can't either.
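If you're curious what a spider simulator is actually doing under the hood, here's a minimal sketch in Python. This is my own illustration, not Webconfs' code, and the sample HTML is hypothetical; the point is that a crawler keeps only the plain text and the link targets, and an image with no alt text contributes nothing.

```python
# A minimal sketch of what a spider simulator does: parse a page,
# strip the markup, and report the text and links a crawler can see.
from html.parser import HTMLParser

class SpiderSimulator(HTMLParser):
    """Collects the plain text and the href targets a crawler can read."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        # Images contribute nothing unless they carry alt text.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.text.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

page = """
<html><body>
  <h1>Market Blooms</h1>
  <img src="storefront.jpg">
  <a href="/hours.html">Store hours</a>
</body></html>
"""

sim = SpiderSimulator()
sim.feed(page)
print(sim.text)   # ['Market Blooms', 'Store hours']
print(sim.links)  # ['/hours.html']
```

Notice that the photo disappears entirely from the output: to a spider, a page is nothing but its text and its links.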

With that in mind, let's take a look at two small businesses here in central Ohio and find out how their web sites fare in a spider simulator.

Example #1: Market Blooms Columbus, a florist in Columbus, Ohio

When I enter the URL for this local florist, the spider simulator sends back some pretty disturbing results.

[Image: florist.gif — spider simulator results for the florist's site]

According to the spider simulator, there's only one brief phrase that can be read. No links, no content...pretty much nothing other than a handful of worthless meta keyword tags. With that in mind, let's take a look at the web site.

[Image: marketbloom.jpg — the Market Blooms Columbus home page]

It's not a bad looking site. It tells what the company does, it has a picture of the owners, it has a phone number and email address. It even gives the address of the business and tells you a little bit about their offerings. Unfortunately, it's not sharing any of that information with the search engines.

Why? Their address, phone number and description of their business are all part of a single graphic. Since search engines can't read graphics, the search engines have no idea what the site is about. It would be easy for Market Blooms Columbus to solve this. An afternoon spent making some changes to the site would go a long way toward helping engines like Google have a little more information. Chances are high this site could rank for phrases like "Columbus Ohio florist," "north market florist" and "Columbus flowers" if they'd only spend a little time making sure search engines could read the content on their site.
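What would that afternoon of changes look like? A quick sketch of the idea below, with placeholder filenames, address, and phone number (not the real business details): keep the photo, but move the facts out of the graphic and into real text the engines can read.

```
<!-- Before: everything lives inside one image the engines can't read -->
<img src="storefront-info.jpg">

<!-- After: the photo stays, the facts become crawlable text -->
<img src="storefront.jpg" alt="Market Blooms Columbus storefront">
<p>Market Blooms Columbus — fresh flowers at the North Market.<br>
   123 Example St., Columbus, OH · (614) 555-0123</p>
```

The alt attribute gives the image itself a little readable context, but it's the plain-text paragraph that hands the search engines the address, phone number, and keywords they need.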

In fact, when I run a search for "north market florist columbus," their site doesn't appear. If they'd simply redo the site so they were using some graphics and some text, they'd probably have no problem showing up on the first page for that phrase.

Example #2: Bloomtastic, a florist in Dublin, Ohio

If I enter their URL into the spider simulator, I get quite a bit more data back.

[Image: florist2.gif — spider simulator results for the Bloomtastic site]

For the Bloomtastic site we actually get page content and tons of links. It's not a lot of content, but it is content, and there are some keywords in there. Additionally, seeing all those links tells us that the search engine can find its way deeper into the site to dig up and index more content. Since every page of a web site is another chance to earn search rankings, this is essential to your site. If you have more than one page on your site and a spider check like this one isn't showing those links, you need to have someone look at how your links are coded.

Sure enough, if you run a search on Google for "Columbus Ohio florist," the Bloomtastic site ranks second.

Graphics Aren't The Only Problem

In the first example, the big issue was an all-graphic site. That does happen now and then to small businesses, but these days all-Flash sites tend to be the more common offender. While search engines are getting better at reading the content in Flash, they're not quite there yet. (Run a check in the simulator on the Diet Coke site and the only link it picks up is the "get Flash now" link.)

Another common problem is a robots.txt file that was accidentally set to block indexing instead of allowing it. This one is super easy to fix, especially if you read through the Search Engine Marketing Bootcamp article on creating your robots.txt file.
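For reference, the difference between "blocked" and "wide open" in robots.txt is a single character. A quick sketch of both versions (the directives are standard; the comments are mine):

```
# Blocks every crawler from the entire site -- the mistake described above:
User-agent: *
Disallow: /

# Lets every crawler in -- an empty Disallow line blocks nothing:
User-agent: *
Disallow:
```

The file lives at the root of your domain (yoursite.com/robots.txt), so it only takes a few seconds to pull it up in a browser and check which version you have.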

The last one I see pop up pretty frequently is a site that relies on drop down menus or forms to let people navigate around the site. Generally this doesn't keep the entire site from being indexed, but it can block large amounts of content. If you have any content that can only be reached by a drop down box or by filling out a form and clicking an enter button, then chances are pretty high the search engines can't find it and haven't indexed it. Search engine spiders can't fill out forms like humans do; unless you give them a direct text link, the content might as well not even exist.
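To see why drop-down navigation is invisible, run the same kind of link extraction a crawler does against both styles of navigation. This is a sketch with made-up page names: the option values in a select menu are not links, so the spider finds no way into those pages.

```python
# Sketch: a crawler-style link extractor finds nothing in a <select> menu,
# because <option> values are not anchor tags. Page names are hypothetical.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags -- the only links a spider follows."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

dropdown_nav = """
<select onchange="location=this.value">
  <option value="/roses.html">Roses</option>
  <option value="/tulips.html">Tulips</option>
</select>
"""

text_nav = '<a href="/roses.html">Roses</a> <a href="/tulips.html">Tulips</a>'

menu = LinkExtractor()
menu.feed(dropdown_nav)
links = LinkExtractor()
links.feed(text_nav)

print(menu.links)   # [] -- the spider sees no way into those pages
print(links.links)  # ['/roses.html', '/tulips.html']
```

The fix doesn't mean giving up the drop-down; it means adding plain text links (a footer or sitemap works fine) so the spiders have a path to the same pages.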

There's no doubt there are other things that can cause problems with the spiders, but these are some of the quickest and easiest to check for. The good news is they all have remedies. The first step is finding out you have a problem. If you do, it's time to contact someone you trust so they can take a look at your site and help you figure out what the problem is.


April 3, 2008





Jennifer Laycock is the Editor of Search Engine Guide, the Social Media Faculty Chair for MarketMotive and offers small business social media strategy & consulting. Jennifer enjoys the challenge of finding unique and creative ways to connect with consumers without spending a fortune in marketing dollars. Though she now prefers to work with small businesses, Jennifer’s clients have included companies like Verizon, American Greetings and Highlights for Children.





Comments (13)

Even though Columbus is the capital, it's always a shock when someone mentions my city. And Dublin... that was almost creepy. I typically assume the rest of the world views Ohio as some sort of black hole where nothing actually exists.

Anyways, your article was very informative :)

Great post, but your Spider Simulator link is actually to Market Blooms Columbus.

Hey, it is true that many site owners have no idea their web sites can be created in a way that looks good to people. Search engine spiders do an impressive job of finding and indexing the content on web sites around the net. Be careful about spiders.

@Chadwyck: I typically assume the rest of the world views Ohio as some sort of black hole where nothing actually exists.

If they do they have their head in the sand. For example, the concentration of Internet marketing brainpower in Ohio is really incredible.

Thanks for the heads up, Jim. The link is now fixed.

Adding to the above article, making the inner pages of your site optimized for search engines is as important as the homepage itself. We get most of the traffic from search engines directly to the inner pages where we likely have more content. A good practice would be making them crawlable by search engines.

Thanks for a great and timely post. As an SEM trainer, the biggest thing I tell my clients and students is that pretty does not matter to the search engines; code does, and target-specific content, of which keyword phrases are a huge part, is very important. Also, I have seen websites whose header or main graphic on every page shows their company name and phone number but never repeats that critical information in the text on their pages, where the robots can read it. Lia Barrd, HaikuWebServices.com

This was a good post and I used the free tool to check out our website. Can someone tell me if it is better to have lots of keywords like the example above or only a few?

The keywords that were listed were likely from the keyword meta tag. While some spider simulators will pick them up, the search engines don't really use them anymore. The only meta tag you really need to put any thought into is the meta description tag. That's because some engines will use that tag as the snippet that describes your site in the search results.

The keywords really belong in your title, your content, your headlines and your links. The search engines will pick them up from the content on your page.

Thanks for the explanation Jennifer

I just tweeted this to you, but thought I'd comment here too that this was a really helpful article. I guess Google is now going to fill out forms and crawl (just in). I thought of you when I saw this article: http://searchengineland.com/080411-140000.php

You have a lot of interesting articles on your site. Thanks!

Dear Ms. Laycock, thank you so much for your informative article and the link to the free spider simulator. I am fairly new to website design and I recently spent many frustrating hours trying to figure out how to get my website (www.accidentawardslasvegas.com) built using MicroSoft's Small Business Office Live program "verified" with google so I could check whether it was viewable by spiders. I was never able to figure out how to insert HTML or MetaTags in a way that would satisfy Google, even after I posted questions about that on Yahoo Answers and scanned all the forums on small business live - but thanks to your article and the link, I was able to finally confirm that my website is visible to spiders. - Dan.

This is info I've been looking for a while...I wondered exactly what search engine bots saw when they came to my site and even more importantly, the direction they read it.

My site MSB pulled up boatloads of content, and there are some things getting crawled that are perhaps hurting my PR. I just read a viable solution for instructing bots to NOT follow certain links and pages, but it's obviously a tedious tweaking process.

Thanx for this post!



Search Engine Guide > Jennifer Laycock > Search Engine Marketing Bootcamp: Make Sure the Search Engines Can Read Your Site