Hundreds of search engines are now available to help you tap the Internet's information, but they all fall into three main categories. Once you know what they are and how they work, you'll have a better understanding of their strengths and weaknesses.
Keyword indexes, such as AltaVista, HotBot and Lycos, produce an index of all the text on the sites they examine. Typically, the engine reads at least the first few hundred words on a page, including the title, the HTML "alt text" coded into Web-page images, and any keywords or descriptions that the author has built into the page structure (see the sidebar "Publicize Your Pages"). The engine tries to ignore raw HTML code, JavaScript commands and the like, and throws out garbage words such as "and," "the," "by" and "for." The engine assumes whatever words are left are valid page content; it then alphabetizes these words (with their associated sites) and places them in an index where they can be searched and retrieved.
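The indexing step can be pictured as building a simple inverted index. The sketch below is only an illustration of that idea, not any engine's actual code; the sample pages, URLs and stop-word list are made up. It strips markup, drops the garbage words, and maps each remaining word to the pages that contain it.

```python
import re
from collections import defaultdict

# Hypothetical sample pages; a real engine would fetch these from the Web.
PAGES = {
    "http://example.com/pentium": "<html><title>Pentium Reviews</title>"
                                  "<body>The Pentium chip and its rivals...</body></html>",
    "http://example.com/games":   "<html><title>Game Picks</title>"
                                  "<body>New games for the Pentium PC...</body></html>",
}

STOP_WORDS = {"and", "the", "by", "for", "a", "an", "of", "its"}

def index_pages(pages):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)          # throw out raw HTML tags
        words = re.findall(r"[a-z]+", text.lower())   # pull out plain words
        for word in words:
            if word not in STOP_WORDS:                # drop garbage words
                index[word].add(url)
    return index

index = index_pages(PAGES)
print(sorted(index))        # alphabetized word list
print(index["pentium"])     # URLs that mention "pentium"
```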
This type of search engine usually does no content analysis per se, but will use word placement and frequency to determine how a page ranks among other pages containing the same or similar words. For example, when someone searches for the word Pentium, a page with "Pentium" in its title will appear higher in the search results than a page that doesn't mention "Pentium" in the title. Likewise, a page with 20 mentions of "Pentium" in the body text will rank higher than a page with one instance of the word.
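A toy scoring function along these lines might look like the following. The weights and sample pages are purely illustrative assumptions, not any engine's real formula; the point is simply that a title hit and extra mentions push a page up the list.

```python
import re

def score_page(html, query_word):
    """Rank a page for one query word: title matches count more than body matches."""
    word = query_word.lower()
    title_match = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    title = title_match.group(1).lower() if title_match else ""
    body = re.sub(r"<[^>]+>", " ", html).lower()

    score = 0
    if word in title:
        score += 10                       # a title hit outranks body hits (assumed weight)
    score += body.split().count(word)     # one point per mention in the text
    return score

pages = {
    "pentium-review.html": "<title>Pentium Review</title> The Pentium runs hot...",
    "general-news.html":   "<title>Chip News</title> One mention of Pentium here.",
}
ranked = sorted(pages, key=lambda p: score_page(pages[p], "Pentium"), reverse=True)
print(ranked)   # the page with "Pentium" in its title comes first
```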
Keyword indexes tend to be fast and broad; you'll typically get search results in seconds (faster than other kinds of engines). But unless you're careful about how you construct your query, you're likely to be overwhelmed with data.
Subject directories, such as Galaxy, NetGuide and Yahoo, are the card catalogs of the Web: They assign sites to specific topic categories based on each site's content. Usually, human judgment is involved. Some employ a review staff to categorize sites; others allow site owners to categorize and describe their own pages; still others ask random site visitors to rate sites.
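Structurally, a subject directory is just a hand-built topic tree that visitors drill into one level at a time. Here is a minimal sketch of that idea; the categories and sites are made up for illustration and don't come from any real directory.

```python
# A hypothetical, hand-curated topic hierarchy in the spirit of a subject directory.
DIRECTORY = {
    "Computers": {
        "Hardware": ["http://example.com/pentium-reviews"],
        "Software": ["http://example.com/shareware-picks"],
    },
    "Recreation": {
        "Travel": ["http://example.com/budget-travel"],
    },
}

def browse(directory, *path):
    """Walk down the category tree one level at a time, as a visitor would."""
    node = directory
    for category in path:
        node = node[category]
    return node

print(list(browse(DIRECTORY)))                     # top-level categories
print(browse(DIRECTORY, "Computers", "Hardware"))  # sites filed under a subtopic
```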
The advantage of this approach is that sites are pregrouped and easier to browse than those in a raw keyword index. A human-generated subject directory also allows more nuance and subtlety than machine-generated keyword indexes, and should be able to offer meaningful advice not only on where the content is, but also on how good or bad it is.
However, humans aren't as efficient as machines, and human-generated directories can never be as comprehensive or up-to-date as machine-generated indexes. In addition, human judgments are subjective. If you happen to think the same way as the site reviewers, you'll find great value in these subject directories. But if you and the reviewers are on different wavelengths, the site's categorization might seem arbitrary and hard to understand, and you might find that the reviewers' top picks aren't pertinent to your needs.
Metasearch engines, such as Dogpile, Inference Find and MetaCrawler, allow you to search a number of databases and engines simultaneously; some even deliver your search results in a single, integrated, rank-ordered list.
A metasearch's major strengths are convenience and breadth: It's easier to harness the power of multiple search engines simultaneously than to visit them one at a time. The multiple searches also let you sift through a wider range of pages than you could access on any single-engine search.
The downside is that metasearches often settle for the lowest common denominator of search features. Different engines parse queries differently (more on this later), treat upper- and lower-case letters in queries differently, allow or disallow natural-language queries and so on. To work with the widest possible number of search engines, metasearches tend to use only simple, straightforward search strategies, making it hard (or impossible) to access each search engine's specialized features. If all you need is a general search, great. But if you need a more refined search, a metasearch isn't a good choice.
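In outline, a metasearch engine fans the same simple query out to several engines and merges whatever comes back into one ranked list. The sketch below shows that general pattern with stand-in search functions; the engine names, URLs and relevance scores are invented for illustration, and a real service would issue HTTP queries and parse each engine's result pages instead.

```python
def search_engine_a(query):
    # Stand-in for one engine's results: (url, relevance score) pairs.
    return [("http://a.example/1", 0.9), ("http://shared.example/x", 0.7)]

def search_engine_b(query):
    return [("http://shared.example/x", 0.8), ("http://b.example/2", 0.6)]

def metasearch(query, engines):
    """Send the same simple query to every engine, then merge into one ranked list."""
    merged = {}
    for engine in engines:
        for url, score in engine(query):
            # Keep the best score seen for each URL across engines.
            merged[url] = max(score, merged.get(url, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

for url, score in metasearch("pentium", [search_engine_a, search_engine_b]):
    print(f"{score:.1f}  {url}")
```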