Hundreds of search engines are now available to help you tap the Internet's information, but they all fall into three main categories. Once you know what they are and how they work, you'll have a better understanding of their strengths and weaknesses.
Keyword indexes, such as AltaVista, HotBot and Lycos, produce an index of all the text on the sites they examine. Typically, the engine reads at least the first few hundred words on a page, including the title, the HTML "alt text" coded into Web-page images, and any keywords or descriptions that the author has built into the page structure (see the sidebar "Publicize Your Pages"). The engine tries to ignore raw HTML code, JavaScript commands and the like, and throws out garbage words such as "and," "the," "by" and "for." The engine assumes whatever words are left are valid page content; it then alphabetizes these words (with their associated sites) and places them in an index where they can be searched and retrieved.
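The indexing step can be pictured as building a simple inverted index. The sketch below is only an illustration of that idea, not any engine's actual code; the sample pages, URLs and stop-word list are made up. It strips markup, drops the garbage words, and maps each remaining word to the pages that contain it.

```python
import re
from collections import defaultdict

# Hypothetical sample pages; a real engine would fetch these from the Web.
PAGES = {
    "http://example.com/pentium": "<html><title>Pentium Reviews</title>"
                                  "<body>The Pentium chip and its rivals...</body></html>",
    "http://example.com/games":   "<html><title>Game Picks</title>"
                                  "<body>New games for the Pentium PC...</body></html>",
}

STOP_WORDS = {"and", "the", "by", "for", "a", "an", "of", "its"}

def index_pages(pages):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)          # throw out raw HTML tags
        words = re.findall(r"[a-z]+", text.lower())   # pull out plain words
        for word in words:
            if word not in STOP_WORDS:                # drop garbage words
                index[word].add(url)
    return index

index = index_pages(PAGES)
print(sorted(index))        # alphabetized word list
print(index["pentium"])     # URLs that mention "pentium"
```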
This type of search engine usually does no content analysis per se, but will use word placement and frequency to determine how a page ranks among other pages containing the same or similar words. For example, when someone searches for the word Pentium, a page with "Pentium" in its title will appear higher in the search results than a page that doesn't mention "Pentium" in the title. Likewise, a page with 20 mentions of "Pentium" in the body text will rank higher than a page with one instance of the word.
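A toy scoring function along these lines might look like the following. The weights and sample pages are purely illustrative assumptions, not any engine's real formula; the point is simply that a title hit and extra mentions push a page up the list.

```python
import re

def score_page(html, query_word):
    """Rank a page for one query word: title matches count more than body matches."""
    word = query_word.lower()
    title_match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    title = title_match.group(1).lower() if title_match else ""
    body = re.sub(r"<[^>]+>", " ", html).lower()

    score = 0
    if word in title:
        score += 10                       # a title hit outranks body hits (assumed weight)
    score += body.split().count(word)     # one point per mention in the text
    return score

pages = {
    "pentium-review.html": "<title>Pentium Review</title> The Pentium runs hot...",
    "general-news.html":   "<title>Chip News</title> One mention of Pentium here.",
}
ranked = sorted(pages, key=lambda p: score_page(pages[p], "Pentium"), reverse=True)
print(ranked)   # the page with "Pentium" in its title comes first
```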
Keyword indexes tend to be fast and broad; you'll typically get search results in seconds (faster than other kinds of engines). But unless you're careful about how you construct your query, you're likely to be overwhelmed with data.
Subject directories, such as Galaxy, NetGuide and Yahoo, are the card catalogs of the Web: They assign sites to specific topic categories based on each site's content. Usually, human judgment is involved. Some employ a review staff to categorize sites; others allow site owners to categorize and describe their own pages; still others ask random site visitors to rate sites.
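Structurally, a subject directory is just a hand-built topic tree that visitors drill into one level at a time. Here is a minimal sketch of that idea; the categories and sites are made up for illustration and don't come from any real directory.

```python
# A hypothetical, hand-curated topic hierarchy in the spirit of a subject directory.
DIRECTORY = {
    "Computers": {
        "Hardware": ["http://example.com/pentium-reviews"],
        "Software": ["http://example.com/shareware-picks"],
    },
    "Recreation": {
        "Travel": ["http://example.com/budget-travel"],
    },
}

def browse(directory, *path):
    """Walk down the category tree one level at a time, as a visitor would."""
    node = directory
    for category in path:
        node = node[category]
    return node

print(list(browse(DIRECTORY)))                     # top-level categories
print(browse(DIRECTORY, "Computers", "Hardware"))  # sites filed under a subtopic
```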
The advantage of this approach is that sites are pregrouped and easier to browse than those in a raw keyword index. A human-generated subject directory also allows more nuance and subtlety than machine-generated keyword indexes, and should be able to offer meaningful advice not only on where the content is, but also on how good or bad it is.
However, humans aren't as efficient as machines, and human-generated directories can never be as comprehensive or up-to-date as machine-generated indexes. In addition, human judgments are subjective. If you happen to think the same way as the site reviewers, you'll find great value in these subject directories. But if you and the reviewers are on different wavelengths, the site's categorization might seem arbitrary and hard to understand, and you might find that the reviewers' top picks aren't pertinent to your needs.
Metasearch engines, such as Dogpile, Inference Find and MetaCrawler, allow you to search a number of databases and engines simultaneously; some even deliver your search results in a single, integrated, rank-ordered list.
A metasearch's major strengths are convenience and breadth: It's easier to harness the power of multiple search engines simultaneously than to visit them one at a time. The multiple searches also let you sift through a wider range of pages than you could access on any single-engine search.
The downside is that metasearches often settle for the lowest common denominator of search features. Different engines parse queries differently (more on this later), treat upper- and lower-case letters in queries differently, allow or disallow natural-language queries and so on. To work with the widest possible number of search engines, metasearches tend to use only simple, straightforward search strategies, making it hard (or impossible) to access each search engine's specialized features. If all you need is a general search, great. But if you need a more refined search, a metasearch isn't a good choice.
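In outline, a metasearch engine fans the same simple query out to several engines and merges whatever comes back into one ranked list. The sketch below shows that general pattern with stand-in search functions; the engine names, URLs and relevance scores are invented for illustration, and a real service would issue HTTP queries and parse each engine's result pages instead.

```python
def search_engine_a(query):
    # Stand-in for one engine's results: (url, relevance score) pairs.
    return [("http://a.example/1", 0.9), ("http://shared.example/x", 0.7)]

def search_engine_b(query):
    return [("http://shared.example/x", 0.8), ("http://b.example/2", 0.6)]

def metasearch(query, engines):
    """Send the same simple query to every engine, then merge into one ranked list."""
    merged = {}
    for engine in engines:
        for url, score in engine(query):
            # Keep the best score seen for each URL across engines.
            merged[url] = max(score, merged.get(url, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

for url, score in metasearch("pentium", [search_engine_a, search_engine_b]):
    print(f"{score:.1f}  {url}")
```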