Why do the search engines constantly have to evolve into a different type of engine? Why can't they stay the same?
To answer this, let's look at the ultimate goal of a search engine. What do the search engines want to do? They want to provide relevant results to you, the user. Why can't they do that under the current system?
There are several reasons why the current system isn't working. For one thing, the Internet is growing at an unheard-of rate, and so is the number of spammers. In many ways, the engines are fighting a losing battle to provide relevant results while combating spam and duplicate pages.
In essence, the engines need a way to store more pages, combat spam, and still provide (or attempt to provide) pertinent results. So, in an effort to provide relevant results, the engines began factoring in other variables, which is where 1st, 2nd, and 3rd generation search engines come in.
1st, 2nd, and 3rd Generation Engines
Understanding the path we've taken to get where we are in this crazy search engine business might give us some insight into where we're going.
You may have heard of 1st, 2nd, and 3rd generation engines, but what exactly does that mean?
Michael Campbell explains,
In the beginning, search results were very basic and largely depended on what was on the Web page. Important factors included keyword density, title, and where in the document keywords appeared.
First generation added relevancy for META tags, keywords in the domain name, and a few bonus points for having keywords in the URL. Basic spam filters emerged that got rid of keyword stuffing and same-color text. The portals also made their appearance, and engines started looking like giant billboards and overstuffed yellow pages.
All of this is quite familiar, isn't it? Almost too familiar.
But do META tags hold as much importance as they once did? No. Does using keywords in various tags help as much? Generally not.
Instead, the engines took it a step further in their quest for relevant results by bringing in 2nd generation engines.
Second generation, which is in full swing with the themes thing, added much in the way of off page criteria and link analysis. A few of the major components they employ are tracking clicks, page reputation, link popularity, temporal tracking, and link quality. Then they started adding in term vectors, stats analysis, cache data, and context where two-word keyword pairs were extracted from a page to better categorize it.
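To make "context where two-word keyword pairs were extracted" a little more concrete, here is a minimal Python sketch that pulls the most frequent two-word pairs out of a page. It is an illustration under obvious assumptions, not how any engine actually did it: the tokenizer is a simple regular expression, sentence boundaries are ignored, and the function name `keyword_pairs` is hypothetical.

```python
import re
from collections import Counter


def keyword_pairs(text, top_n=5):
    """Return the top_n most frequent two-word pairs in text, with counts."""
    # Naive tokenizer: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Pair each word with the one that follows it (pairs may cross sentences).
    pairs = Counter(zip(words, words[1:]))
    return [(" ".join(pair), count) for pair, count in pairs.most_common(top_n)]


page = ("Banana bread recipe. This banana bread recipe uses ripe bananas. "
        "Bake the banana bread for an hour.")
print(keyword_pairs(page, 3))
```

Here "banana bread" (three occurrences) and "bread recipe" (two) come out on top, which is the kind of signal an engine could use to file the page under a topic.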
We'll cover "term vectors" and other information mentioned in the above paragraph later in this article. For now, let's continue with 2nd generation engines.
We all know how important good, solid link popularity is these days. But does any old link count? Certainly not. The days of huge link exchange programs with no thought for "related" links are over.
Plus, with Google's PageRank system and DirectHit's method of tracking clicks and the length of visits, we're seeing more evidence of 2nd generation engines.
But what is a 3rd generation engine? It's almost mind-boggling to consider.
Third generation is already underway. It adds word stemming and a thesaurus on top of the term vector database to assist in keeping a search in context. Automatic extraction of keyword pairs also helps categorize a page, so searches like 'shop for' or 'find' trigger totally different search results based on the context or intent of the person doing the searching.
G3 adds Web maps which, although not searchable, are a useful filtering tool for getting rid of duplicate sites and the many stand-alone pages that drive traffic to only a few destinations. This means pages like doorways, gateways, entry, splash, or whatever you want to call them, will soon get filtered out.
They will also be extracting as much data as possible about your individual searching habits. All the major engines plan on building personal profiles, little robots that 'come to know you' over a period of time, based on past searching habits.
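Campbell doesn't say which stemmer or thesaurus the engines used. As a rough sketch of the idea, the Python below maps words through a hypothetical three-entry thesaurus and then strips common suffixes with a deliberately crude rule set (much simpler than a real stemmer such as Porter's), so that differently worded searches land on the same normalized terms. The names `THESAURUS`, `stem`, and `normalize` are illustrative only.

```python
# Hypothetical thesaurus: maps a few words onto one canonical term.
THESAURUS = {"cellular": "mobile", "wireless": "mobile", "cell": "mobile"}


def stem(word):
    """Crude suffix stripper for illustration only (not the Porter algorithm)."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # "shopping" -> "shopp" -> "shop": drop a doubled final consonant,
        # but keep doubles like "sell" or "pass".
        if w[-1] == w[-2] and w[-1] not in "aeiouls":
            w = w[:-1]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w


def normalize(phrase):
    """Apply the thesaurus first, then stem, so phrasings converge."""
    terms = []
    for word in phrase.lower().split():
        word = THESAURUS.get(word, word)
        terms.append(stem(word))
    return tuple(terms)


print(normalize("Wireless phones"))      # ('mobile', 'phone')
print(normalize("cellular phone"))       # ('mobile', 'phone')
print(normalize("shopping for phones"))  # ('shop', 'for', 'phone')
```

Looking words up in the thesaurus before stemming keeps words like "wireless" from being mangled by the suffix rules.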
Okay, so we have a good idea of where the search engines are headed, but how can we keep up? The 2nd and 3rd generation engines are theme-based, but what does that mean, and how does it translate to what we need to do with our own sites?
What are "Theme" Engines?
What exactly is a "theme" engine? First, let's hear the scientific definition. This isn't easy reading, so it might help if you have a brown paper bag handy in case you hyperventilate.
Computer scientists working with Campbell define "themes" or "topics" as,
Using a term vector database, they weigh page keyword density to calculate the page vector, which is compared and stored relative to the term vector. They then compute a Web page reputation by graphing interconnectivity and link relevancy, making sure the reputation of the page and the content on the page actually match. The closest matches get the highest search engine positioning.
Uh huh. Kinda hurts the brain cells, doesn't it?
Now, let's look at an easier-to-understand explanation. How does Michael Campbell define a theme engine?
One. The answer is one. What you say about your Web page, how the structure of other people's Web pages compares on the same topic, and what other people say your site is about, must match, be in harmony with each other, be as one.
Or, in the cold hard world of the search engines, where everything is weighted and calculated according to mathematical formulas, whoever is closest to the 1.000000 without going over is the winner, coming up tops in the search engine.
A theme engine looks at all the information on a 'seed set,' or a group of sites and pages that it has already spidered and has in its index. It assigns each page in the index a number, or page vector. This becomes the 'core' of the search engine.
Suppose you just submitted a Web page, so you are now in competition with everything in the core. The engine looks at everything on your page, from one- and two-word keyword phrase densities to page length, compares it to the seed set, and assigns your page a number for each keyword phrase. These numbers assigned to the keyword phrases are known as 'term vectors.'
The closer your term vector is to the page vector, the better chance your page has of being a top ten contender for any particular keyword phrase. You might even be 'folded in' to the core, bumping off some other page and causing it to fall out of the search engine. (Some engines will adopt the 'pay to stay in the core' model in the near future, so paid sites won't get bumped out.)
Then, there is what the rest of the Internet and its users have to say about your page. Link analysis, traffic, stats, and cache data are all taken into consideration and analyzed.
The next step is to factor in the words used in incoming links to your page, making sure they match up with your term vector. In other words, what the search engine has determined your page is about must match what the rest of the Internet says your page is about in its links to you.
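Neither Campbell nor the engines published the actual formula, so treat the following as an assumption-labeled sketch. It turns a page into a vector of one- and two-word phrase densities, averages a hypothetical seed set into a single 'core' vector, and scores pages against that core with cosine similarity, a common vector-space measure chosen here for illustration. With non-negative densities the score falls between 0 and 1, which echoes the 'closest to 1.000000 without going over' description above.

```python
import math
import re
from collections import Counter


def phrase_densities(text):
    """Density of each one- and two-word phrase: occurrences / total words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    total = len(words) or 1  # avoid dividing by zero on an empty page
    return {phrase: n / total for phrase, n in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def core_vector(seed_pages):
    """Average the phrase densities of a (hypothetical) seed set."""
    total = Counter()
    for page in seed_pages:
        total.update(phrase_densities(page))
    return {phrase: v / len(seed_pages) for phrase, v in total.items()}


seed = ["banana bread recipe with ripe bananas",
        "easy banana bread recipe for beginners"]
core = core_vector(seed)
on_topic = phrase_densities("moist banana bread recipe")
off_topic = phrase_densities("growing bananas on a tropical farm")
print(round(cosine(on_topic, core), 3), round(cosine(off_topic, core), 3))
```

The on-topic page should score noticeably closer to 1 than the page about growing bananas, even though both mention bananas.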
So, to review, here is what I would define, in layman's terms, as a theme-based engine:
What you say your page is about, what the search engine calculates your page to be about, and what the rest of the Internet thinks your page is about, must match, according to their mathematical formulas.
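One way to picture "must match" is to build three term vectors (what you say in your title and keywords, what the page body contains, and what inbound link text says) and then compare every pair. The sketch below is hypothetical throughout: it uses plain word counts and cosine similarity, and it takes the weakest pair as the overall agreement score, a rule chosen only to show that one mismatched view drags the whole page down. No engine is known to use exactly this formula.

```python
import math
import re
from collections import Counter
from itertools import combinations


def terms(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two Counters (missing terms count as 0)."""
    dot = sum(n * b[t] for t, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agreement(declared, body, inbound_anchors):
    """Compare three views of a page; the weakest pair sets the overall score."""
    views = {
        "declared": terms(declared),                  # what you say
        "body": terms(body),                          # what the page contains
        "inbound": terms(" ".join(inbound_anchors)),  # what others say
    }
    scores = {f"{x} vs {y}": cosine(views[x], views[y])
              for x, y in combinations(views, 2)}
    return min(scores.values()), scores


body = "a moist banana bread recipe using ripe bananas"
score, detail = agreement("banana bread recipe", body,
                          ["great banana bread recipe", "banana bread"])
mismatched, _ = agreement("banana bread recipe", body,
                          ["cheap computer hardware"])
print(round(score, 2), mismatched)
```

With matching anchor text the weakest pair (body vs. inbound) scores about 0.56; swap in unrelated anchor text and the agreement drops to zero.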
Then, as the whipped cream on top of the theme behavior sundae, come the stats and cache data. If your site is one of a search engine's top exit pages, it must be good, because people don't come back and search some more once they've found your site. You just got a big boost in positioning. And if your site gets searched for and clicked on so often that it sits in the engine's cache for speedy data retrieval, your site must be very good indeed.
All of these factors, both on and off page criteria, help define what a theme-based search engine is looking for. They are looking for unanimous approval that your site is all about a particular topic. And the narrower the focus on that topic, the better your site will do.
Take a deep breath. You probably feel like your mind is burning with information, because this is a lot to digest. Go get a cup of coffee (or a stiff drink), and let's get back to work.
Which Engines are Theme Engines?
In Campbell's opinion, all search engines are moving toward being theme-based.
It's just another way of saying they are implementing 'second generation' search engine strategies. Some engines call it 'in context' searching, while others call it 'rank and reputation' or 'on topic.' These are all different ways of saying the same thing: adding off-page criteria to help determine relevancy.
So, with all of the engines gravitating toward being theme engines, does this mean that we have to scrap our current search engine optimization strategies? Not necessarily.
Let's look at a few of our current optimization strategies to see how effective they'll be with theme engines.
Current Optimization Strategies
1. Cloaking
With the move toward theme engines, will cloaking remain as effective?
John Heard, producer of IP-Delivery, a leading cloaking program, says it will be just as effective and will even allow more flexibility in page content if implemented properly.
According to Heard,
There is no difference between a cloaked or non-cloaked site when it comes to themes for in-bound link popularity in most cases. However, it should be noted that a cloaked site can choose what links it does or does not show to the search engines. This is potentially advantageous.
Say that you want to trade links with someone. You want the advantage of their link popularity but you don't want to send your popularity back to them. A cloaked page will help you do this if you set it up right. By placing the links only on the consumer version page but excluding them from the search engine optimized (cloaked) page, you can 'hide' them from the engine.
So yes, in that way, cloaking can affect link popularity. How it's used is entirely in the hands of the SEO professional. Cloaking is handy if you want to cross-link sites and show those links to humans without letting the engines see them.
A good example is if you own a computer hardware site and a travel site. The two topics are not theme-related, so you don't want the engines to see the links between them. On the other hand, you might want your site visitors to see the links, and a cloaking system would give you the best of both worlds.
2. Keyword Weight
Is keyword weight dying in importance, similar to its optimization buddies the META tags?
Not at all.
Campbell explains that keyword density is a very important foundation upon which everything else is built.
Different types of documents or pages have different characteristic densities. The seed set of Web pages that the theme engine used to populate its database will determine what is a normal keyword density for each keyword, based on the entire collection of pages for any particular topic.
Since the term vector database (TVD) is an open-ended application, other applications can be run on top of it. This gives the search engines the ability to change the target keyword densities from the normal parameters at will, to give the illusion of fresh search results, without needing to recompile the database. Smoke and mirrors mostly, but it keeps the very important keyword density target moving.
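To illustrate the 'normal keyword density' idea, here is a small Python sketch under stated assumptions: density is a single keyword's share of a page's words, the engine's norm is the plain average over a hypothetical seed set, and `shift` is a hypothetical knob standing in for an engine nudging the target without recompiling its database.

```python
import re
from statistics import mean


def density(text, keyword):
    """Share of a page's words that are the (single-word) keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0


def normal_density(seed_pages, keyword):
    """Average keyword density across a hypothetical seed set."""
    return mean(density(page, keyword) for page in seed_pages)


def density_gap(page, seed_pages, keyword, shift=0.0):
    """Distance from the target density; `shift` is a hypothetical knob that
    moves the target without touching the stored seed-set densities."""
    target = normal_density(seed_pages, keyword) + shift
    return abs(density(page, keyword) - target)


seed = ["banana bread recipe with ripe bananas and walnuts",
        "the best banana bread"]
print(normal_density(seed, "banana"))  # 0.1875
print(density_gap("banana banana banana bread", seed, "banana"))
print(density_gap("my favorite banana bread recipe", seed, "banana"))
```

The stuffed page sits far above the seed set's norm, while the naturally written page lands close to it; changing `shift` changes which pages look 'normal' even though the stored densities never change.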
3. Competitive Keywords
One real problem when working with theme engines is getting stuck in the wrong vector if there are already many sites on a particular subject. Being in the wrong vector will mean that your page won't match the term vector, so your site's ranking will suffer.
But what if you're working with a highly competitive keyword phrase?
If there are already 50 documents with 100 percent relevancy associated with a term vector in the database core, you are not likely to get in unless you pay for it. If you are really lucky, you might nail some off-page criteria that make your site more important and bump some other site off. It is doable, but it is a lot of hard work.
If you need instant traffic, just go after the low-hanging fruit: a second or third, yet still popular, way of saying the same thing. For example, competition for the phrase "cellular phones" is fierce, and for "mobile phones" it is tough. "Wireless phones" is a very popular search phrase but has relatively little competition. My advice would be to go after the low-hanging fruit first, and then try playing with the professionals at the top of the tree.
Getting stuck in the wrong vector is nasty. You'll need to change the content on your page to be sure it cannot be taken out of context. For example, on your banana bread recipe page, don't say you're growing fond of the recipe. Otherwise, the vector might determine your site is about growing bananas and not banana bread. The good news is that we can expect TVDs to get more accurate as they add more context intelligence.
4. Stop Words
Another problem in working with theme engines concerns stop words. If an engine considers a word a stop word, it won't get indexed at all. So, if your keyword phrase contains a stop word, you need to work around it. "If the engine is filtering out the word Web in the phrase Web site hosting, it means focusing your efforts on the phrase site hosting or saying the same thing but in a different way, like domain hosting," explains Campbell.
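A quick way to check a target phrase against a stop list is sketched below. The stop list is hypothetical: "web" is in it only to mirror Campbell's example, and no engine's real stop list is assumed here.

```python
# Hypothetical stop list; "web" is included only to mirror Campbell's example.
STOP_WORDS = {"a", "an", "the", "of", "for", "web"}


def indexable(phrase):
    """Words of the phrase that an engine using STOP_WORDS would index."""
    return [w for w in phrase.split() if w.lower() not in STOP_WORDS]


def needs_rework(phrase):
    """True if any word in the phrase would be dropped as a stop word."""
    return len(indexable(phrase)) < len(phrase.split())


print(indexable("Web site hosting"))     # ['site', 'hosting']
print(needs_rework("Web site hosting"))  # True
print(needs_rework("domain hosting"))    # False
```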
5. Redundancy Filters
With the theme engines looking for redundant Web pages, how can you avoid setting off the redundancy filters?
Simply put, the days of having mirror sites are over.
So, to avoid setting off the redundancy filter, don't duplicate, mirror, or copy your pages. Don't use "cookie cutter" templates with the keywords swapped out.
The filters are getting even tighter with Web maps. They can tell if a bunch of pages are doorways or dupes, even if they are stored on different domains, because the page length and byte size are similar and they all point to the same place. They'll all get nuked in the culling process.
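The article doesn't describe how the filters actually compare pages. As a stand-in, the Python below uses a well-known near-duplicate technique, word shingling with Jaccard similarity, and flags pairs of pages that are both near-identical and send visitors to the same destination. The 0.5 threshold, the 3-word shingle size, and the example URLs are all hypothetical.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles."""
    return len(a & b) / len(a | b) if a | b else 0.0


def suspected_doorways(pages, threshold=0.5):
    """pages maps URL -> (text, link target). Flag pairs that are
    near-duplicates AND send visitors to the same destination."""
    sigs = {url: shingles(text) for url, (text, _) in pages.items()}
    flagged = []
    for u, v in combinations(pages, 2):
        same_target = pages[u][1] == pages[v][1]
        if same_target and jaccard(sigs[u], sigs[v]) >= threshold:
            flagged.append((u, v))
    return flagged


template = ("welcome to our store we ship fast and offer low prices "
            "on quality {} widgets order today")
pages = {
    "site-a.example/blue": (template.format("blue"), "shop.example"),
    "site-b.example/red": (template.format("red"), "shop.example"),
    "site-c.example/about": ("our family started this company in a garage",
                             "site-c.example"),
}
print(suspected_doorways(pages))
```

The two cookie-cutter pages differ only in one swapped keyword, so they share 12 of 18 distinct shingles (a Jaccard score of about 0.67) and get flagged; the unrelated page does not.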
Campbell suggests sitting down and writing what the page is about.
Then once the page is complete, look at the target keyword densities you would like to achieve and start working the keywords into the title, headlines, links, and body copy of the page.
Try not to go too crazy with doorway pages for each site. Spread them around on different domains. Set up completely different Web sites to sell related yet different product lines, and create your own mini Internet of linked sites.
6. Lengthy Pages
With theme engines, you'll be walking a fine line between giving the engine what it wants to see (related content) and providing too much information.
If you provide too much information, it's likely that the page pertains to more than one topic, which means you'll have a more difficult time getting a top ranking.
But is there also a problem with TVDs compressing large pages?
"The TVD doesn't actually store the entire page," says Campbell.
It looks at the page, tries automatically to determine what it is about, and reduces it down to only a few words, like a dozen or so possible keywords and phrases.
The more words there are on a page, the more likely you are to talk about several topics, which in turn dilutes the dozen or so possible term vectors that the page can be about. Ideally, you want to focus the page on a single theme or keyword and describe its context with several two- or three-word combos.
If I had to pick a number, I would say to try to keep pages between 100 and 700 words, unless you really know what you are doing.
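Campbell says the TVD reduces a page to a dozen or so possible keywords but doesn't say how. The sketch below assumes a simple frequency count after dropping a hypothetical list of filler words, keeps the top 12 terms, and reports how much of the page the single strongest term accounts for (a rough, hypothetical 'focus' gauge) along with whether the page falls in the suggested 100 to 700 word range.

```python
import re
from collections import Counter

# Hypothetical filler-word list so common words don't crowd out keywords.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "on", "it"}


def summarize(text, keep=12, min_words=100, max_words=700):
    """Reduce a page to its top `keep` terms and report length and focus."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    content = [w for w in words if w not in STOP_WORDS]
    top = Counter(content).most_common(keep)
    # Share of content words taken by the single strongest term.
    focus = top[0][1] / len(content) if content else 0.0
    return {
        "word_count": len(words),
        "in_range": min_words <= len(words) <= max_words,
        "top_terms": [term for term, _ in top],
        "focus": focus,
    }


focused = "banana bread recipe " * 40
diluted = "banana bread recipe car insurance quotes garden tools " * 15
print(summarize(focused)["focus"], summarize(diluted)["focus"])
```

Both sample pages are 120 words long, but the diluted one spreads its weight over eight unrelated terms, so its strongest term accounts for a much smaller share of the page.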
7. Changes in Ranking
With term vector databases, your pages may have been discovered by the engine's spider and given a ranking but have not yet been added to the database. Does this mean that once the pages are actually added to the database, their ranking could go up or down?
Yes, a page may have been discovered by a crawler but not yet folded into the TVD. The temporary positioning in search results is based on the likelihood that your page contains relevant information. It's commonly called page reputation, or what your page is known for. It is largely based on what incoming links say your site is about.
Once the engine recompiles its index, the page reputation will be compared to the term vector using a complicated mathematical formula and weighting scheme. In short, the reputation of the page and the term vector of the page must match to be a top 10 contender. The further away the numbers are from each other, the less relevancy, and the poorer the positioning of the page in search results.
How does Inktomi's 3rd Generation Engine Compare with AltaVista's?
Campbell says that he hasn't seen a lot of difference between the two.
They all seem to be going in the same general direction. But to be sure, the customization or proprietary experience at one engine over another will be their big selling point in the future.
They will definitely want to make their search experience unique -- to give the users a brand, or reason, why they would rather fight than switch. Otherwise, they may fade into the same old bland mediocrity and continue to lose traffic because of it.
Tips on Working with Theme Engines
How can we create Web pages that theme engines will like and boost our odds at getting top rankings?
What Does the Future Hold?
In the future, you might be able to load the engine full of lists of keywords. Your interests, likes and dislikes, geographical info, and favorite Web sites can be entered, from which the engine can create a context engine just for you. Just think, they'll know what your next search is likely to be, even before you do.
It's almost frightening, isn't it?
For More Information
Michael has already published three very good search engine reports and is working on another that discusses:
If you would like to order any of Michael's reports go to http://www.kudosnet.com/sep/?r.reports
More Research Sources
Read the WWW9 research papers by visiting http://www9.org/w9cdrom
Read AltaVista's September 15, 2000 press release, which mentions their 3rd generation engine: http://doc.altavista.com/company_info/press/pr091500.html
Read Inktomi's April 11, 2000 press release announcing their 3rd generation search engine: http://www1.inktomi.com/new/press/2000/gen3.html
June 7, 2001
Robin Nobles is the Co-Director of Training of Search Engine Workshops, where they teach "hands on" search engine marketing workshops in locations across the globe. They also provide a networking community for SEOs called The World Resource Center for Search Engine Marketers and have expanded their workshops to Europe with Search Engine Workshops UK.
Copyright © 1998 - 2017 K. Clough, Inc. All Rights Reserved.