Search Engine Guide



Search Engine Optimization

Article provided by:
Top Site Listings
© 2002 Orbidex.






Google.com - An In-Depth Look at the Search Engine Giant
By Eric Lander - March 28, 2002

Google has established itself as one of the premier search engines available on the Internet, and actually boasts about its size and comprehensiveness. Remaining one step ahead of the other search engines with every technological advancement, Google is poised to remain a leading search engine for years to come. Why does this matter to search engine optimization specialists and search engine marketers? Simply because if you are to succeed, you must understand the largest search engine out there.

It is no surprise that Google has set industry precedent in the past. When Google makes a change, the other engines take notice and often follow suit. That fact alone proves that if there is any one search engine to spend the majority of your time with, it is Google.

The Comprehensiveness of Google

As the leading search engine, Google reliably updates its search engine results pages ("SERPs") on a roughly monthly basis. This lets both optimization specialists and SEO clients know that rankings will change fairly often, and plan for it.

From an SEO and SEM point of view, it becomes increasingly important to understand what allows Google to retain the largest database of web-related documents for searchers. Using information provided on Search Engine Showdown, we know that Google's database is composed of the following document types, with the quantities of each type within the index:

Indexed Web Documents: 1,465,000,000
Unindexed URLs: 500,000,000
Other Types of Files: 35,000,000
Indexed Web Pages / Refreshed Daily: 3,000,000

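That translates into the following percentages of the 2,000,000,000 total (the first three categories combined; the daily-refreshed pages are a subset of the Indexed Web Documents):

Indexed Web Documents: 73.25 %
Unindexed URLs: 25.00 %
Other Types of Files: 1.75 %
Indexed Web Pages / Refreshed Daily: 0.15 %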
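The shares can be recomputed directly from the counts listed earlier. The sketch below is only an illustration: it assumes that the daily-refreshed pages are a subset of the Indexed Web Documents, so only the first three categories are summed into the total. The variable and function names are made up for this example.

```python
# Counts as quoted in the article (attributed there to Search Engine Showdown).
counts = {
    "Indexed Web Documents": 1_465_000_000,
    "Unindexed URLs": 500_000_000,
    "Other Types of Files": 35_000_000,
}

# Assumption: daily-refreshed pages are already counted among the
# Indexed Web Documents, so they are kept out of the total.
refreshed_daily = 3_000_000

total = sum(counts.values())


def share(n, whole):
    """Return n as a percentage of whole, rounded to two decimals."""
    return round(100 * n / whole, 2)


shares = {name: share(n, total) for name, n in counts.items()}
refreshed_share = share(refreshed_daily, total)

for name, pct in shares.items():
    print(f"{name}: {pct:.2f} %")
print(f"Indexed Web Pages / Refreshed Daily: {refreshed_share:.2f} %")
```

Under that assumption the three main categories add up to 100 %, and the daily-refreshed pages come out as a small fraction of one percent of the whole index.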

With an index of this size and stature, it is amazing that they can even maintain the database, let alone update it! Let's take a closer look at what each of those four categories includes.

Indexed Web Documents

These are simply the web pages that we all come across as Internet users. For each of the nearly one and a half billion documents of this type, Google has indexed the text-based content of the page and made it available to those searching on Google.com.

Unindexed URLs

These actually appear fairly often in search results on Google.com. The key is to look at what follows the link itself on the SERP. If there is a small clip of text underneath the link, then that URL has been indexed. If there is no text underneath the link, then the page either offers no indexable text-based content or is an unindexed URL. To clarify further, if you can see the size of the page (for example: "22K"), then the page has been indexed by Google.
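The rule of thumb above can be written down as a small check. This is a minimal sketch, not anything Google provides: the `SerpEntry` record and the `looks_indexed` function are hypothetical names, and the fields are simply what a person would read off a results page by eye.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerpEntry:
    """Hypothetical record of one result as read off the results page."""
    url: str
    snippet: Optional[str] = None  # clip of text under the link, if any
    size: Optional[str] = None     # page size label such as "22K", if shown


def looks_indexed(entry: SerpEntry) -> bool:
    """Apply the rule of thumb: a text snippet or a size label means indexed.

    A False result only means the entry *may* be an unindexed URL; it could
    also be an indexed page with no text-based content.
    """
    has_snippet = bool(entry.snippet and entry.snippet.strip())
    has_size = bool(entry.size and entry.size.strip())
    return has_snippet or has_size


with_text = SerpEntry("http://www.example.com/a", snippet="Some page text", size="22K")
bare_link = SerpEntry("http://www.example.com/b")
print(looks_indexed(with_text), looks_indexed(bare_link))
```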

Other Types of Files

This refers to the growing list of file types that Google is able to read and index. Some of the most popular file types that are indexed but not classified as Indexed Web Documents include Adobe Acrobat PDF files, Microsoft Office documents (Word, Excel, PowerPoint, etc.), WordPerfect, Flash, and others.

Indexed Web Pages / Refreshed Daily

These are the same as the Indexed Web Documents, but Google reviews them daily and updates their listings within the SERPs accordingly. They are listed separately because Google has noted that many of these pages change continually over time. (News-related pages and sites are a good example.)

Optimizing for the Entire Google Search Engine

Optimization is really about getting as much useful content as possible out to your users and target market. Google is a great avenue for doing just this because of its ability to index and refresh so many listings.

As one would expect, the majority of the information that an SEO or an SEM should strive to get into the Google search database consists of web pages and HTML-formatted documents. That is what the vast majority of Google's users are looking for, and what they expect. Of course, there is far more work involved in retaining a market than just getting a document listed, but we have covered such topics in the past.

A word of caution, though: many documents intended only for semi-private audiences online may in fact reach the general searching public. If your company or web site contains any files that fit the "Other Types of Files" category listed above, make sure you are comfortable with those files being seen by the general public. If not, password protect their URLs wherever possible, or disallow all user agents (especially Googlebot!) from accessing the folder in which such files reside.
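Disallowing user agents from a folder is normally done with a robots.txt file at the root of the site. The sketch below uses Python's standard `urllib.robotparser` module to check that a rule of this kind would keep Googlebot out of a folder. The folder name `/private-docs/` and the example.com URLs are assumptions made purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of one folder.
# The folder name /private-docs/ is an assumption for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-docs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A file inside the disallowed folder should be off limits to Googlebot...
blocked = not parser.can_fetch(
    "Googlebot", "http://www.example.com/private-docs/report.pdf"
)
# ...while pages elsewhere on the site remain crawlable.
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print("private folder blocked:", blocked)
print("home page allowed:", allowed)
```

Keep in mind that robots.txt is only a request that well-behaved crawlers honor; it does not stop anyone from opening the URL directly. For files that must stay private, password protection is the safer of the two options.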