The Terrier Project

March 11, 2005

Bill Hartzer





Researchers at the University of Glasgow's Department of Computing Science have spent the past three years working on a project now called Terrier, which stands for TERabyte RetrIEveR. Terrier is software for the rapid development of Web, intranet and desktop search engines. The project is funded by a UK Engineering and Physical Sciences Research Council (EPSRC) grant, and the team currently consists of three researchers, five PhD students, and five programmers.

According to the Terrier project website, the software is written in Java and has been "successfully used for adhoc retrieval and Web search and cross-language retrieval in a centralised or distributed setting...Terrier originated from a framework initially developed by Gianni Amati. Since then, more people became actively involved in extending and optimising this framework. Terrier has been used for conducting fruitful experimentation, with excellent outcome, allowing for a better understanding of theoretical Information Retrieval."

Terrier's features include hyperlink structure analysis, a combination of evidence approaches, automatic query expansion and re-formulation techniques, query performance predictors, and compression techniques. Most current commercial search engines incorporate a link analysis component, such as Google's PageRank or IBM's HITS (Teoma.com), in their document ranking algorithms. Terrier is different because its link analysis component is novel: it is more general than Google's PageRank, does not use parameters such as the damping factor, can be applied in a query-dependent or query-independent way, and could be used in various applications, such as multilingual retrieval. Other features include length normalisation, a retrieval approaches selector, and dynamic selection of Web retrieval approaches. Terrier also includes a set of performance predictors and has very low computational overhead.
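For readers unfamiliar with the damping factor mentioned above, here is a minimal sketch of textbook PageRank in Python. This is not Terrier's link analysis method, which the project describes as not needing this parameter; it only illustrates what the parameter does. The function name `pagerank`, the toy link graph, and the default value of 0.85 are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping is the probability that a surfer follows a link
    rather than jumping to a random page (illustrative default).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {}
        for p in pages:
            # Each page q that links to p passes on an equal share of its rank.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this sketch the damping factor is a value that has to be chosen by hand, and changing it changes the resulting scores. That tuning step is the kind of parameter the Terrier team says its approach avoids.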

Terrier is available as open source software under the Mozilla Public License (MPL). You can download it or find out more at the project's website (http://ir.dcs.gla.ac.uk/terrier/).






Search Engine Optimization Manager, Vizion Interactive
Chairman of the DFW Search Engine Marketing Association