Everybody likes Google because it's fast and easy to use. It has a vast database. But the clincher is that it really works — you can find anything. Google launched in 1998. Stanford grad students Sergey Brin and Larry Page were working on a class project to identify meaningful patterns in Web link structure. They were struck by the significance of "backlinks" (pages linking back to a site) and realized these could be used to build a better search engine.

Name of the Game
Originally, the search engine was called "Googol," which refers to the number 10 to the 100th power (1 followed by 100 zeroes), representing the seemingly infinite number of documents on the Web. After presenting their project to an angel investor, Brin and Page received a check made out to "Google." They thought about it for a couple of weeks, then decided to open an account in the name "Google." And the rest is history.

The PageRank Innovation
Google turned the search world on its head with PageRank. It proved to be a real technological breakthrough: most major search engines now use link popularity to judge relevancy. How does PageRank work?

"Google's PageRank search technology works by first identifying the link structure of the entire Web, then ranking individual pages based on the number and importance of pages linked to them," said Google software engineer Matt Cutts. I gather from talking to Cutts that importance (the popularity and relevance of the backlink) is more critical than the number of backlinks.
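To make the idea concrete, here is a minimal sketch of a PageRank-style calculation. It follows the simplified textbook form of the algorithm, not Google's production system. The page names, the tiny link graph, the damping value of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict mapping each page to a score; the scores sum to 1.

    links maps a page to the list of pages it links out to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-Web: "x" has one backlink, from the well-linked "hub";
# "y" has two backlinks, from pages that nobody links to.
links = {
    "a1": ["hub"], "a2": ["hub"], "a3": ["hub"],
    "hub": ["x"],
    "b1": ["y"], "b2": ["y"],
}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:4s} {score:.4f}")
```

In this toy graph, "x" ends up with a higher score than "y" even though it has fewer backlinks, because its single backlink comes from a page that is itself heavily linked. That is the "importance over number" point Cutts makes above.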

Does Google Have a Weakness?
Google works better on searches for specific information (like snowfall in Sweden) than for general information (like dogs). Because its search results aren't categorized, they become unwieldy for broad terms. In case you didn't know, Google has a directory with category listings, but most people use Google.com.

The newer search engines like Wisenut and Teoma have started classifying results by category. For instance, Teoma (in beta) divides the dogs query into folders: Dogs Breeds, Dogs Training, Dogs Cats, German Shepherd Dogs, Animals Shelter, Dogs Lovers, Great Pyrenees, Humane Society. Categories like these help because most users aren't familiar with research procedures and don't know how to narrow their searches.

Will Google follow the trend and start categorizing? "Google is in its second generation of experimenting with category based results," replied Cutts. "Users apparently do not like having too many category options, but presenting clear and concise categories is important to users."

Google's Success Formula
Google has two main sources of revenue: advertising and search services. Cutts reports that the AdWords program currently yields up to five times the average click-through rate of traditional banner ads.

The real cash cow might be Google's search services for major portals and corporate sites. Google has about 130 customers in over 30 countries, including Yahoo! and its international properties, Sony and its global affiliates, AOL/Netscape, Cisco Systems, and many others. Partners pay Google an upfront search service fee plus a CPM rate (a cost per thousand result sets delivered) to power search on their Web sites. In effect, Google receives a fee for every search conducted on its partners' sites.
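The arithmetic behind this pricing model can be sketched in a few lines. Google's actual rates are not given here, so every figure below (the upfront fee, the CPM rate, and the search volume) is a made-up placeholder, and the function name is hypothetical.

```python
def search_service_fee(upfront_fee, cpm_rate, result_sets_delivered):
    """Total partner cost: a flat fee plus a charge per thousand result sets."""
    return upfront_fee + cpm_rate * result_sets_delivered / 1000


# Hypothetical partner: $10,000 upfront, $5.00 CPM, 2 million result sets.
total = search_service_fee(
    upfront_fee=10_000.0,
    cpm_rate=5.0,
    result_sets_delivered=2_000_000,
)
print(f"Total fee: ${total:,.2f}")  # prints "Total fee: $20,000.00"
```

Under these assumed numbers, the usage-based portion matches the upfront fee at 2 million searches, and it keeps growing with every additional search.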

New Google Toolbar
Are you one of the several million people who have downloaded the Google Toolbar? Google just released a beta version that lets you vote on site popularity. This could bring human opinion into the equation, so that rankings no longer rest on link structure alone.

If you download the beta version, you can rate your search results with a voting button. Will Google incorporate this feedback into its algorithm? "Rather than using the votes to tinker with the specific rankings of particular pages or sites, the feature would most likely be used to bolster the relevance of overall results," replied Cutts. He said the data collected thus far looks promising, but it's too soon to draw any conclusions.

How Does Google's Algorithm Work?
Google ranks sites by the words that appear on the page and by the keyword phrases in the title and description. Its robot reads the "keyword" and "description" meta tags, then weighs the page's popularity, which is based on the number and importance of the sites that link to it.
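As a rough illustration of the on-page fields described above, the sketch below pulls the title and meta tags out of an HTML page using Python's standard library. It is not Google's crawler. The class name, the sample page, and the choice of fields are assumptions made for this example.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect the <title> text and named <meta> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                # Store tags such as "keywords" and "description" by name.
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Snowfall in Sweden</title>
<meta name="keywords" content="snow, Sweden, weather">
<meta name="description" content="Monthly snowfall totals for Swedish cities.">
</head><body><p>Snowfall data.</p></body></html>"""

parser = PageSignals()
parser.feed(SAMPLE_PAGE)
parser.close()
print("Title:", parser.title)
print("Keywords:", parser.meta.get("keywords"))
print("Description:", parser.meta.get("description"))
```

A real engine would combine text signals like these with a link-based popularity score. How the two are weighted is not something Google discloses.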

How do you get high rankings? Cutts says the guidelines are pretty simple. "Stay away from hidden text, hidden links, cloaking, sneaky redirects, lots of duplicate content on different domains, and doorway pages. Webmasters should also stay away from programs that send automatic queries to Google. The worst thing you can do is try to cheat: shortcuts to boost PageRank or rankings usually do more harm than good. Even if an SEO does think they've found a shortcut, about 2/3rds of the time it may be a sting operation. Don't bother with link exchanges, signing guestbooks, or other tricks -- the best use of a webmaster's time is building good content and honestly promoting their site. When Google punishes spam like cloaking, we sometimes take out not only the cloaked domain but the SEO's client as well."

A Glimpse into the Future
Google wants a deeper, fresher, and more personalized index. "The future will be less about features and more about the overall usefulness of an engine," noted Cutts. "We believe users want relevancy, but they also want quick, clean results with proven integrity." Do you see XML in the future? "Not anytime soon," replied Cutts. "The main benefit of HTML is that anyone can write it. That's part of why the Web had such meteoric growth. XML is great for machine-to-machine communication, but it's much more difficult for a person to produce by hand."

Google plans to increase its lead across the board in the coming year. "We'll be introducing new ways to search. We don't want to give away any secrets, but Google will provide many helpful surprises in 2002," said Cutts. As usual, Google's focus will be on search and the user experience.

How Deep?
Google provides support for hundreds of file formats found in the deep Web: PDF, RTF, PostScript, Word, Excel, PowerPoint, and more. It crawls and indexes millions of dynamic pages. Every 28 days, Google indexes 3 billion Web documents; it performs a "fresh crawl" of more than 3 million important Web pages every day. Google’s news crawl gives you up-to-the-minute headlines on news queries. A subset of Google's fresh news content is available at http://www.google.com/news/newsheadlines.html.

February 11, 2002

Paul J. Bruemmer has provided search engine marketing expertise and consulting services to prominent American businesses since 1995. As Director of Search Marketing at Red Door Interactive, he is responsible for strategizing and implementing search engine marketing activities within Red Door's Internet Presence Management (IPM) services.
