Verticals and Content Engines: The Next Generation of Searching
GUEST COLUMN by Steve Matthews - May 5, 2000
The lack of "Human Indexing". Technology is great, but
unless the general population is willing to take the time to master "Boolean
Operators", the best methodology available is going to be plain old human
classification. The addition of applied keyword categories (a.k.a. "a
controlled vocabulary") would help this system, but the combination of the
"web page free-for-all" and keyword searching has not been effective.
Directions For Internet Searching: (1) Go
to appropriate portal web site. (2) Enter keyword terms. (3) Begin to drill down
through 47 million web pages.
Overly simplistic? Yes. And a little over
the top? Yes, but the web searching process has gotten no easier over the past
few years. Have a little hope though; things are going in the right direction.
There is little doubt that the number of
indexed pages on the search engines has become unwieldy. Those who laughed at
the 47 million number above can try searching for "business" on AltaVista and
get back to the rest of the group later.
This situation gets even more puzzling when
one considers the continued criticisms of the search engines for only indexing
20% of the available pages on the net, or 25%, or 30%… The search engines should
be indexing fewer pages, not more.
So where are these universal finding tools
going wrong? There’s probably no one answer, but some possibilities that should
be considered are as follows:
- The lack of any formal "Collection Policy".
Search engines will index anything and everything submitted to them,
which has made them the freebie marketing tool of the new millennium. There’s
nothing wrong with free, but if Libraries can have adequate collection
criteria for their collection development, so can the search engines. The
"everything’s accepted" approach has added unnecessary bulk to the search engines.
- No focus on Subject or Audience. Search engines are collecting on all subjects
for all people; hence the global popularity and huge market caps, but also,
hence the searcher’s frustration. Massive reductions in "collection noise"
(read: useless links) can be achieved by collecting for a particular
audience’s needs, and reducing the subject matter of the collection.
Enter The Vertical
If one views Internet Searching as an
evolutionary process, the progress from Search Engine to Searchable Directory
was a step in the right direction. I’m old enough in web years to remember the
word of mouth "buzz" that Yahoo! caused when it first arrived. Lycos was king,
and the "knowledge structure" approach Yahoo! provided seemed so much easier.
The evolution has now continued with the
addition of Subject Specific Collections or "Verticals": sites like About.com,
VerticalNet, and to some degree the Open Directory Project.
These sites have taken the emphasis off being all things to all people and
placed the focus on collecting for a single subject and, in some cases, a
specific audience.
Another important addition for the Verticals is the use of "Editors" and
"Guides." Editors represent the first foray into the area of subject expertise.
Now, there is an obvious critique regarding the use of untrained editors, but
the combination of expertise with "human indexing" represents a marked
improvement in the overall collection technique.
Where Do We Go From Here?
The searchable collection of the future
will inevitably have more requirements for content inclusion. Many of the coming
trends will likely mirror the theory of "Library Science" and how paper based
collections are developed today. Some of the possibilities may include:
- Proactive Collection Development:
Search Sites will actively develop their collections
rather than passively waiting for content submissions.
- A formal "Collection Policy" will be posted and
adhered to in defining submission and selection criteria. Content can still be
used as a marketing tool, but the free-for-all has to stop.
- A Chosen Audience: Picking an
audience and sticking with that audience will enable Web Sites to narrow the
amount and quality of available content. A properly profiled audience may also
be the "marketable" site of the future.
- Format Limitations: Sites may choose to collect content that meets a certain
"structural format" such as short stories of more than 1500 words, or "subject
format" such as a collection of
legal contracts and agreements.
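
As a rough illustration of the possibilities above, here is a minimal Python sketch of how a posted collection policy might screen submissions. Everything in it is an assumption made for the example: the `Submission` and `CollectionPolicy` names, the fields, the rule wording, and the sample URLs are all hypothetical and do not describe how any real search site works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """A page submitted to a hypothetical vertical search site."""
    url: str
    subject: str
    audience: str
    word_count: int


@dataclass(frozen=True)
class CollectionPolicy:
    """Hypothetical posted policy: which subjects, audiences, and lengths are collected."""
    subjects: frozenset
    audiences: frozenset
    min_words: int = 0  # assumed "structural format" rule: minimum length in words

    def reasons_to_reject(self, item):
        """Return every rule the submission fails; an empty list means accept."""
        reasons = []
        if item.subject not in self.subjects:
            reasons.append("subject outside collection")
        if item.audience not in self.audiences:
            reasons.append("audience outside collection")
        if item.word_count < self.min_words:
            reasons.append("below minimum length")
        return reasons

    def accepts(self, item):
        return not self.reasons_to_reject(item)


# Example policy for an imagined business-publications vertical.
policy = CollectionPolicy(
    subjects=frozenset({"business"}),
    audiences=frozenset({"professionals"}),
    min_words=1500,
)

submissions = [
    Submission("http://example.com/market-report", "business", "professionals", 2400),
    Submission("http://example.com/fan-page", "entertainment", "general", 300),
    Submission("http://example.com/press-blurb", "business", "professionals", 120),
]

# Only submissions that satisfy every rule enter the searchable collection.
collection = [s for s in submissions if policy.accepts(s)]
```

The point of the sketch is that rejection reasons are explicit and checkable, so a site could publish the same rules it enforces instead of accepting everything submitted.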
From a broader perspective, I suspect that
the larger Portals will become the "entry level tools" of future searching.
While the Portals become the hubs of commerce and distribution centers of
breaking news, the content searching function may become less important in the
overall business model (if it hasn’t already). Add to that mix - content
distribution deals, meta-searching, and a number of large commerce-centric
portal-plays by even larger multinational corporations, and the future looks
like a lot of fun.
Steve Matthews is the founder of BPubs.com – The
Business Publications Search Engine, a Professional
Librarian, and has held many Internet-related consulting and implementation
positions. On the ‘Net since 1994, Steve is an avid follower of Search Engine
trends and business models. Comments and questions can be forwarded to