Testing Six of the Best
IW Labs Test
By David Haskin
Web search engines are smarter and stronger than ever. We test six of the best.
To get gold, you must sift through mounds of raw ore. To find valuable nuggets of information on the Internet, you've got to sift through an almost unfathomable number of Web pages, which explains the popularity of Web search services. These sites are information refineries, helping us quickly distill useful material from the mountains of digital dross that make up the Internet. Search engines are also intensely competitive products, trying to win your loyalty both with fast-and-furious marketing campaigns and by constantly improving their technology. The hot search engine of last year is not necessarily this year's best. Search engines also compete with directories such as Yahoo!, which organize sites by subject category instead of listing them in order of how relevant they are to your search terms.
All search engines use the same basic technology. "Spiders" automatically roam the Web, looking at the contents of every Web site they have the resources to find and copying at least some of those contents. Indexing tools digest this gathered material so that when you conduct a search, relevancy ranking software can prioritize the sites. Although the underlying technology is similar, each search engine implements it differently.
SPIDER TECHNOLOGY
Spider (a.k.a. robot or 'bot) software perpetually crawls from site to site, taking a complete measure of the Web. When it finds a new site, or one that has changed, it sends material back to the search engine. While some spiders grab every site they find, others prioritize their efforts by judging a site's popularity or how frequently it changes, on the assumption that popular and frequently updated sites are more likely to be useful. Deciding whether to be selective or all-inclusive is tricky. Say an undiscriminating spider finds a site that hasn't been updated in two years and sends back 60 pages for indexing. Those pages could clog your search results with time-wasting, out-of-date material. On the other hand, old information isn't necessarily bad; there might be valuable data hidden away on that old site.
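The selective-versus-inclusive trade-off can be sketched as a crawl queue. In this minimal Python sketch, a selective spider ranks candidate sites by a hypothetical score (popularity plus twice the number of changes per month) and fetches only as many as its budget allows, while an all-inclusive spider takes everything in the order found. The scoring formula, the `limit` budget, and the example URLs are assumptions for illustration, not any engine's actual policy.

```python
import heapq

def crawl_order(candidates, selective=True, limit=None):
    """Return the URLs a spider would fetch, in fetch order.

    candidates: list of (url, popularity, changes_per_month) tuples.
    limit: hypothetical fetch budget; None means no limit.
    """
    if not selective:
        # An all-inclusive spider takes every site, in the order it found them.
        urls = [url for url, _, _ in candidates]
        return urls if limit is None else urls[:limit]
    heap = []
    for url, popularity, changes_per_month in candidates:
        # Hypothetical score: favor popular and frequently changing sites.
        score = popularity + 2 * changes_per_month
        # heapq is a min-heap, so store the negated score to pop the best site first.
        heapq.heappush(heap, (-score, url))
    order = []
    while heap and (limit is None or len(order) < limit):
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

candidates = [
    ("http://stale.example/", 1, 0),   # not updated in two years
    ("http://news.example/", 50, 30),  # popular and frequently updated
    ("http://shop.example/", 80, 2),
]
print(crawl_order(candidates, limit=2))
# ['http://news.example/', 'http://shop.example/']
print(crawl_order(candidates, selective=False))
# ['http://stale.example/', 'http://news.example/', 'http://shop.example/']
```

With a budget of two fetches, the selective spider never reaches the stale site: that keeps old pages out of your results, but it also hides whatever useful data the site held.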
INDEXING
It would be impractical for search engines to examine every Web site each time you conduct a search. Instead, search engines use indexes, which list the words contained in as many as 100 million Web pages, along with other information about the sites. Each item in the index points to the resource in which it is located. When you run a search, the engine consults its index and displays a list of all sites that match your query, together with links to those sites. Indexing technology isn't new, but each search engine has its own implementation, and the differences lie primarily at the spider stage. Some engines, such as AltaVista, Excite, and Infoseek, have their spiders send back every word in a Web site for indexing. Others return only a portion of the text in a Web site (just the first half, for example). As with spidering, indexing more data may either waste your time with nonproductive results or unearth nuggets of information hidden deep in obscure sites.
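An index of this kind is usually called an inverted index: it maps each word to the pages containing it, so a query becomes a lookup rather than a scan of every page. The Python sketch below is a minimal illustration, assuming pages are plain text; the optional `max_words` parameter is a hypothetical stand-in for engines that index only part of each page.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages, max_words=None):
    """Map each word to the set of pages containing it.

    pages: dict of url -> page text.
    max_words: if set, index only the first max_words words of each page,
    mimicking engines that index just part of a page.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        words = tokenize(text)
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            index[word].add(url)
    return index

def search(index, word):
    """Look the word up in the index instead of rescanning every page."""
    return sorted(index.get(word.lower(), set()))

pages = {
    "a.html": "Bicycle racing and helmet safety",
    "b.html": "Tire repair tips for your bicycle",
    "c.html": "Baking a cherry torte",
}
full = build_index(pages)
partial = build_index(pages, max_words=2)
print(search(full, "bicycle"))     # ['a.html', 'b.html']
print(search(partial, "bicycle"))  # ['a.html']
```

The partial index misses b.html because "bicycle" falls outside its first two words, which is exactly the risk of indexing less than the whole page.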
RELEVANCE RANKING
Let's say that a broad search for "bicycle" nets 25,000 Web pages. Obviously, that's too many sites to wade through to find the specific information you want. To address this problem, search engines try to determine which sites are most relevant to you. All engines display the most relevant sites at the top of the search results list. They also provide additional cues, such as a numerical ranking. Again, search engines use different techniques for determining relevancy. One common technique is to count the number of times the search term appears. The logic is that a page on which the word "bicycle" appears repeatedly is more likely to be on that topic than a page with just one instance of the word. Some search engines weigh how often the search term appears in the first n words of the document, reasoning that a document is more likely to be relevant if the search terms appear near the top. If your query includes multiple words, the search engine may also consider how close together those terms appear.
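The three signals just described (frequency, position near the top, and proximity of multiple terms) can be combined into a single score. This Python sketch is hypothetical: the equal weights, the `first_n` cutoff of 50 words, and the `1 / (1 + span)` proximity bonus are illustrative choices, not the formula any of the reviewed engines uses.

```python
import re
from itertools import product

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(text, terms, first_n=50):
    """Score one page against the query terms (hypothetical weights)."""
    words = tokenize(text)
    terms = [t.lower() for t in terms]
    # 1. Frequency: each occurrence of a search term adds one point.
    score = float(sum(words.count(t) for t in terms))
    # 2. Position: another point for each occurrence within the first n words.
    score += sum(1 for w in words[:first_n] if w in terms)
    # 3. Proximity: for multi-word queries, reward the smallest span holding every term.
    if len(terms) > 1:
        positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
        if all(positions):
            span = min(max(combo) - min(combo) for combo in product(*positions))
            score += 1 / (1 + span)
    return score

def rank(pages, terms, first_n=50):
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(pages[name], terms, first_n), reverse=True)

near = "bicycle helmet " + "filler " * 10
far = "bicycle " + "filler " * 10 + "helmet"
print(rank({"far": far, "near": near}, ["bicycle", "helmet"]))  # ['near', 'far']
```

Trying every combination of term positions is fine for a sketch but grows quickly on long pages; a sliding-window scan over the merged positions is a common faster alternative.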
FLESHING OUT FINDS
Besides relevancy rankings, all engines provide at least some textual description of found sites. These descriptions are essential for determining whether a site meets your needs. Some engines create the descriptions from the first n characters of the document, which sometimes results in the display of meaningless information such as menu options. Others, such as Excite and Lycos, use their own technology to extract key words and phrases that describe the site; these techniques employ linguistic analysis to ferret out important words. Still others use the HTML description tag, in which the site developer describes the site. Where developers are conscientious, this is useful. If the description is missing or poorly written, however, the citation in the search results is not useful.
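The two simplest description strategies, a developer-supplied description tag with a fallback to the first n characters, can be sketched with Python's standard `html.parser` module. The `describe` function and its 70-character default are assumptions for illustration; the linguistic extraction used by Excite and Lycos is not modeled here.

```python
from html.parser import HTMLParser

class DescriptionParser(HTMLParser):
    """Collect the description meta tag and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip()
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)

def describe(html, n=70):
    """Prefer the developer's description tag; otherwise use the first n characters of text."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    if parser.meta_description:
        return parser.meta_description
    text = " ".join(" ".join(parser.text_parts).split())  # collapse whitespace
    return text[:n]

menu_page = "<html><body><p>Home | Products | Contact</p><p>Acme makes widgets.</p></body></html>"
tagged_page = ('<html><head><meta name="description" content="Acme makes widgets for every budget.">'
               "</head><body><p>Home | Products | Contact</p></body></html>")
print(describe(menu_page, n=24))  # Home | Products | Contac
print(describe(tagged_page))      # Acme makes widgets for every budget.
```

The first call shows the failure mode described above: with no description tag, the summary is mostly menu text.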
ENGINES OF CHOICE
The search engine business moves quickly. For example, WebCrawler (http://www.webcrawler.com) was one of the first search engines to appear, and we included it in our last roundup. It's still around, but it's no longer competitive, so we left it out. Moreover, new search engines seem to appear every week. However, we didn't include new search sites unless they offered something unique and useful or their capabilities compared well with the established search engines. The six sites we review are the top dogs, the ones that give the best overall results on a variety of searches. They're not equal: some find more information than others, and some offer more non-search services. But each has something valuable to offer.
THE REVIEWS
AltaVista
http://www.altavista.digital.com
In a dramatic, last-minute turnaround worthy of The Karate Kid, AltaVista has recaptured its crown as monarch of brute-force searches. Just before we went to press, it released to the public a new index that digests 100 million pages, roughly twice as many as its chief competitors. Test results were commensurate: AltaVista consistently found more Web documents, often many more, than HotBot, its closest rival. There's more good news, too. AltaVista's interface, which formerly was a bit of a garbled mess, is much improved (though still not quite as easy to use as HotBot's). The home page now contains only a search form, an ad, and standard menu options. Similarly, its search results screens are far less cluttered and easier to navigate than before. Its relevancy rankings proved reliable in our tests, though its site descriptions often weren't useful. AltaVista is also trying some innovative approaches to refining searches. The most successful appears when you run a search and then click the Refine button. A series of words gleaned from the found documents appears, and you can choose to exclude or "require" each of these associated terms in a subsequent search. Among the refining words for our search on "bicycle" were "racing" and "helmet." Refining the search with those words helped find pages that discussed issues related to racers wearing helmets. There's also a flow chart that graphically summarizes search results, letting you choose among words associated with the top-level keywords found in the results and include them in subsequent, refining searches. Overall, we found this feature less useful because it was somewhat time-consuming and imprecise. AltaVista has also added an advanced search page on which you can frame complex Boolean queries. You can perform proximity searches as well as search for pages changed within a specific timeframe. Another unique feature lets you tell AltaVista to attach added importance to specific words when creating its relevancy rankings.
However, you still must master its arcane search syntax to create complex searches. AltaVista's usability has improved to the point that it's now on a par with that of most other search engines, although, again, it still doesn't match HotBot's friendly forms-based approach. Nonetheless, AltaVista has clawed its way back to the top spot when measuring raw search power. If AltaVista can't find it, there's a high likelihood that it's not on the Web.
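The general idea behind the Refine button, suggesting words that frequently accompany your query in the found documents and letting you require or exclude them, can be sketched in Python. The review doesn't describe AltaVista's actual method; counting each word once per found page, the small stop-word list, and the function names are assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}  # small sample list

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def suggest_refinements(found_pages, query_terms, k=5):
    """Suggest the k words that appear in the most found pages, besides the query itself."""
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for text in found_pages:
        # A set counts each word once per page, so one long page can't dominate.
        counts.update(set(tokenize(text)) - STOP_WORDS - query)
    return [word for word, _ in counts.most_common(k)]

def refine(found_pages, require=(), exclude=()):
    """Keep pages containing every required word and none of the excluded words."""
    kept = []
    for text in found_pages:
        words = set(tokenize(text))
        if all(w in words for w in require) and not any(w in words for w in exclude):
            kept.append(text)
    return kept

found = [
    "bicycle racing helmet rules",
    "racing bicycle tires",
    "helmet laws for bicycle racing",
]
print(suggest_refinements(found, ["bicycle"], k=2))  # ['racing', 'helmet']
print(refine(found, require=["racing", "helmet"]))
# ['bicycle racing helmet rules', 'helmet laws for bicycle racing']
```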
Excite
http://www.excite.com
Excite's 50 million indexed pages match the numbers found at other leading search engines, but its search results were consistently in the bottom half of the group. Nonetheless, Excite still offers a lot. For instance, we appreciated its ability to search for concepts: a query for "martial arts" finds sites about kick-boxing and karate even if the original search term isn't on the page. After a search, Excite suggests words for narrowing the query. A search for sites about the Zen poet Basho prompted it to suggest adding the term "haiku," an astute suggestion. Unfortunately, the feature didn't always work this well. After a broad search for "bicycle," the words it suggested for refining the search included bicyclists, bicycles, bike, and biking, which are little more than variants of the original term. Power searchers will appreciate the appropriately named Power Search page, a forms-based aid that lets you quickly create Boolean searches. Like Infoseek, Excite does an excellent job of helping you after you conduct a search. Besides suggesting terms for further narrowing the search, it lets you click a link next to any page that appeals to you to find more pages similar to it. We found Excite's relevancy rankings solidly accurate, although we disliked its site summaries. Excite draws its summaries from text within the page, but not necessarily from introductory text. The descriptions often mirrored the second, third, or fourth sentence on a page, which often wasn't descriptive. Excite has a directory of 140,000 Yahoo-style listings, which pales compared to Infoseek's 500,000 directory listings. And it doesn't offer much brute search power. However, it does make searching, and refining searches, simple, and its excellent relevancy ranking helps you home in on useful information.
HotBot
http://www.hotbot.com
In the topsy-turvy world of Web search services, HotBot unseated AltaVista as the champ a few months ago when we evaluated the services for iw.com, only to be bumped by AltaVista in this roundup. Still, HotBot is a powerful and easy-to-use site that consistently came in second behind AltaVista in our test searches. Most notably, HotBot claims to reindex its approximately 54 million pages every two weeks. That claim held up in our tests: sites that had been updated in the previous 10 to 14 days consistently turned up in our HotBot searches but not in searches of the other engines. Moreover, HotBot's advanced querying capabilities are, along with AltaVista's, the strongest in the group. It will return exact matches to your query or near matches. You can limit your search to specific domains (such as .com or .org) and to specific geographic locations. It can even find items embedded in Web pages, such as ActiveX controls, Java applets, and specific types of images. Our favorite HotBot feature is a search modifier that finds only pages that have changed within a specified period of time, ensuring timely results. You also can search a site to a specific depth (three pages deep, for example), and you can save searches for later reuse. HotBot's customizable forms-based interface is, in our view, the easiest to use of any search engine. You type your search term and then select how to modify it from a drop-down list. You can click on icons to display additional forms in which you apply advanced searching techniques such as querying by domain. HotBot's relevancy ranking is generally good but not the best; in our tests, it sometimes wasn't clear why certain pages turned up at the top of the list. Its site descriptions generally were useful, but not always. HotBot pulls its descriptions from the description tag or, when that isn't present, from the top headings, which sometimes produces gibberish.
For now, HotBot claims the silver medal for power searching. Its directories, which are due for some work, proved anemic in our tests. But when it comes to finesse, HotBot, with its easy-to-use forms for framing searches, has no equal.
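HotBot-style domain and date restrictions amount to filters applied to each result. This minimal Python sketch assumes every result carries its URL and a last-modified date; the `filter_results` function, its parameters, and the example URLs are hypothetical, and geographic and media-type filters are omitted.

```python
from datetime import date, timedelta
from urllib.parse import urlparse

def filter_results(results, domain_suffix=None, modified_within_days=None, today=None):
    """Keep results matching a domain suffix and a recency window.

    results: list of (url, last_modified) pairs, last_modified being a datetime.date.
    """
    today = today or date.today()
    kept = []
    for url, last_modified in results:
        host = urlparse(url).hostname or ""
        # Domain restriction, e.g. ".org" keeps only hosts ending in .org.
        if domain_suffix and not host.endswith(domain_suffix):
            continue
        # Date restriction: drop pages unchanged within the last N days.
        if modified_within_days is not None and today - last_modified > timedelta(days=modified_within_days):
            continue
        kept.append(url)
    return kept

results = [
    ("http://www.bikeshop.com/sale.html", date(1998, 1, 20)),
    ("http://www.cycling.org/news.html", date(1998, 1, 25)),
    ("http://www.cycling.org/archive.html", date(1996, 5, 1)),
]
print(filter_results(results, domain_suffix=".org", modified_within_days=14, today=date(1998, 2, 1)))
# ['http://www.cycling.org/news.html']
```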
Infoseek
http://www.infoseek.com
Infoseek is both broad and deep. It was a close third to AltaVista and HotBot in overall search power, and it is chock-full of amenities, such as an extensive directory. Infoseek claims approximately 60 million indexed pages, and our tests confirmed that it typically found almost as many pages as HotBot and AltaVista. You can use its natural-language query capabilities, but for greater precision you must learn its somewhat complex advanced search syntax. If you do take the trouble, you'll be amply rewarded. You can, for example, search not just for words but also for sites with links to specific URLs, which lets you perform tricks like finding all sites that link to your own. Previously cluttered and confusing, Infoseek's interface is now well organized and easy to navigate. Infoseek also provides the best set of post-search options. To the left of the search results is a column with three headings. Best Bets lists interesting sites that closely match your search criteria. Related News displays recent news stories about your search term. Related Topics lists similar topics. This last feature was sometimes useful, sometimes not: a search for "cherry torte" listed "desserts" and "chocolate" as related topics, but a search for "meaning of life" listed "Renting vs. buying" and "Baseball in Japan." In addition, you can easily refine your searches by specifying new criteria and searching again within the initial results set. For instance, our search for "bicycle" turned up more than 126,000 hits. A subsequent search within that group for "tire repair" found more than 3,000 Web sites and Usenet postings about that subject. Infoseek doesn't have quite the search muscle of HotBot or AltaVista, and its descriptions of found sites often were gibberish compared to Northern Light's and HotBot's.
However, Infoseek's myriad other services, such as its generous Web directory, news access, and yellow and white pages directories, make it the strongest choice for those who prefer one-stop shopping.
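A search for "sites with links to specific URLs" implies an index built in the other direction: from each link target back to the pages that point at it. Here is a minimal Python sketch using the standard `html.parser` and `urllib.parse` modules; the function names and example URLs are hypothetical, and a real engine would presumably also normalize URLs (trailing slashes, letter case) before matching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, href))

def build_link_index(pages):
    """pages: url -> HTML. Returns link target -> set of pages linking to it."""
    backlinks = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        for target in collector.links:
            backlinks.setdefault(target, set()).add(url)
    return backlinks

def pages_linking_to(backlinks, target):
    """Return the pages that link to target, sorted."""
    return sorted(backlinks.get(target, set()))

pages = {
    "http://a.example/": '<a href="http://my.example/">my site</a>',
    "http://b.example/docs/page.html": '<a href="/other.html">other</a>',
    "http://c.example/": '<a href="http://my.example/">a friend</a> <a href="http://b.example/">b</a>',
}
backlinks = build_link_index(pages)
print(pages_linking_to(backlinks, "http://my.example/"))
# ['http://a.example/', 'http://c.example/']
print(pages_linking_to(backlinks, "http://b.example/other.html"))
# ['http://b.example/docs/page.html']
```

Relative links such as `/other.html` are resolved against the page's own URL with `urljoin`, so they can be matched like absolute ones.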
Lycos
http://www.lycos.com
Lycos isn't the most powerful search engine, but it is the most highly customizable, and it does a solid job of combining Web search and directory services. Its customization capabilities let you fine-tune how its relevancy ranking works. You access this feature from Lycos Pro, the advanced search interface. You can set it, for instance, to rank documents in which search terms appear in the title more highly than other documents. Besides tweaking the relevancy ranking, you can conduct the actual search from Lycos Pro. Lycos also provides an extensive number of ways to create and modify searches, including proximity searches and specifying the order in which search terms must appear in the documents. You can perform natural-language queries such as "What is the meaning of life" (which, happily, turned up 43,668 responses). However, unlike HotBot, the other highly customizable search engine reviewed here, Lycos doesn't let you search based on when pages were last modified. We found Lycos' relevancy ranking consistently the strongest in the group. However, its method of extracting descriptions from text in the documents often failed to tell us clearly what the site contained. Despite its flexibility in modifying searches and results, Lycos didn't find as many documents as most of the other search engines. Lycos boasts that it has indexed 100 million separate URLs and that it rebuilds its index every two weeks. URLs aren't pages, however, and Lycos failed to match HotBot's ability to find recently updated information. Nonetheless, we like Lycos because of its useful mix of features. Its staff of reviewers has designated a number of Web venues as "Top 5 percent sites." Not only can you read and enjoy the reviews themselves, you can run searches strictly within the reviewed sites if you wish. Lycos also has an extensive site directory, as well as news, weather, sports, and the ability to find businesses and people.
Power searchers will probably prefer other services, but Lycos still is attractive if you like many Web services in one place.
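Lycos Pro's option to favor title matches amounts to user-adjustable weights in the scoring formula. In this Python sketch, the `title_weight` and `body_weight` parameters are hypothetical stand-ins for those settings; the review doesn't describe Lycos's real formula.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def weighted_score(title, body, terms, title_weight=3.0, body_weight=1.0):
    """Count term occurrences, weighting title matches by a user-chosen factor."""
    title_words, body_words = tokenize(title), tokenize(body)
    return sum(title_weight * title_words.count(t.lower()) + body_weight * body_words.count(t.lower())
               for t in terms)

def rank(pages, terms, **weights):
    """pages: name -> (title, body). Returns names, best match first."""
    return sorted(pages, key=lambda name: weighted_score(*pages[name], terms, **weights), reverse=True)

pages = {
    "A": ("Bicycle Helmets", "safety tips"),
    "B": ("Safety tips", "bicycle bicycle"),
}
print(rank(pages, ["bicycle"], title_weight=3.0))  # ['A', 'B']
print(rank(pages, ["bicycle"], title_weight=1.0))  # ['B', 'A']
```

Changing one setting reverses the order: with a heavy title weight the page titled "Bicycle Helmets" wins, while with equal weights the page that merely repeats "bicycle" in its body comes first.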
Northern Light
http://www.northernlight.com
Northern Light is the newest kid on the block and should quickly gain popularity. Its search prowess already is strong, and its unique features make it more useful for serious researchers than the other search sites. Northern Light had been available to the public for only a month when we tested it, yet its results were solidly in the middle of the pack, with particularly strong results on searches for obscure sites. A spokesman said search results would improve over the coming months as the number of indexed pages climbs from 38 million to 50 million. Its relevancy ranking and site descriptions were excellent. Besides serving as a Web search engine, Northern Light also aggregates and serves the content of about 1,800 publications, many of which aren't available elsewhere on the Web. You can search the entire Web, its special collection of publications, or both at once. For items found in the special collection, Northern Light provides a free article synopsis. Viewing an entire article costs as much as four dollars, with a typical charge being a dollar. Northern Light's unique custom search folders are an often-successful attempt to categorize found documents: Northern Light places them into categories and subcategories that it displays as an outline. The feature worked best when the search terms were ambiguous. For instance, we performed a search for "web spider," and Northern Light returned articles about both arachnids and the technology used by Internet search engines. Custom search folders made it obvious which were which and thus saved a lot of time. This system isn't infallible, though: one of the initial subgroups for our search for "bicycle" was "rhythm & blues." Still, Northern Light is onto something significant. We found many of the extra-cost publications to be specialized enough to be truly useful. For example, we found excellent articles about forest management that weren't available elsewhere on the Web.
Combined with its already strong search engine, this makes Northern Light an excellent choice for serious information miners.
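The review doesn't describe how Northern Light builds its folders, so the Python sketch below swaps in a deliberately simple technique: each result goes into the hypothetical folder whose keyword list it overlaps most, ignoring the query words themselves, with an "Other" folder for everything else. The folder names and keywords are assumptions.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def sort_into_folders(pages, query_terms, folders):
    """Put each page in the folder whose keywords it shares most; ties go to the earlier folder."""
    query = {t.lower() for t in query_terms}
    result = {name: [] for name in folders}
    result["Other"] = []
    for url, text in pages.items():
        # Ignore the query words: every result contains them, so they can't separate topics.
        words = set(tokenize(text)) - query
        best, best_hits = "Other", 0
        for name, keywords in folders.items():
            hits = len(words & keywords)
            if hits > best_hits:
                best, best_hits = name, hits
        result[best].append(url)
    return result

FOLDERS = {  # hypothetical folder definitions
    "Arachnids": {"silk", "arachnid", "eight", "legs"},
    "Search engines": {"crawler", "index", "search", "robot"},
}
pages = {
    "a": "The web spider spins silk with eight legs",
    "b": "A web spider is a robot that feeds a search index",
    "c": "Web spider costume party",
}
print(sort_into_folders(pages, ["web", "spider"], FOLDERS))
# {'Arachnids': ['a'], 'Search engines': ['b'], 'Other': ['c']}
```

A fixed keyword table like this is brittle; automatic clustering methods instead derive the categories from the found documents themselves.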
Smarter Searches Ahead
Search engine vendors insist that the future's so bright that shades will be de rigueur. The future indeed may be bright, but search engines will evolve at a modest pace over time, experts agree. The simplest, but potentially most useful, changes will be in the search engines' interfaces. The challenge, according to Don DePalma, an analyst for Forrester Research, is to "create a universally understandable visual metaphor" for displaying results instead of the current text-heavy interfaces. Beyond that, expect greater efficiency in compiling and perusing search results. DePalma and other experts say search engines in the future will remember your interests and use that information to interpret your requests. "If I search for 'bridge,' it could mean a physical bridge, a covered bridge, a piece of computer equipment, or the word bridge can even be a verb," DePalma says. "The search engine has to be smart enough to recognize the context of what I want." One way to attain this efficiency is by using intelligent agents. "We all are experimenting heavily with agents," says an executive for one major search engine site. These agents not only will search based on your previous interests, but also will periodically update those searches and notify you of relevant new pages. Simple forms of this technology already are at work at sites such as Firefly (http://www.firefly.net) and WiseWire (http://www.wisewire.com). Gregory Grefenstette, project leader in multilingual text mining at Xerox Research Center in Grenoble, France, says that future search result lists will be quite different. "You won't get back a list of documents, but an analysis of how the words are used," he says. Specifically, natural-language processing will return results for ambiguous searches in a form resembling a book index. Such results will help you clearly see how search terms relate to other words.
If, for example, you search for "research," you'll quickly be able to discern which pages are about "research grants," "research assistants," or "market research." Grefenstette also looks for greater ability to conduct multilingual searches, which is his area of specialty. "By the year 2000, 50 percent of Web content will be non-English," he says. "Look for cross-language information retrieval, which takes the query, identifies the language, and produces queries in different languages." AltaVista already has a crude version of this capability, which enables you to search for pages in specific languages. Translating actual pages, however, still will have to be done locally because of the processing load.
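The first steps Grefenstette describes, identifying the query's language and then producing queries in other languages, can be sketched naively in Python: count common function words from each language, then translate the remaining terms with a bilingual word list. The stop-word lists and the two-entry dictionaries are tiny hypothetical samples; practical systems use far larger resources or statistical models.

```python
import re

STOPWORDS = {  # tiny hypothetical samples of common function words
    "english": {"the", "and", "of", "to", "is", "in", "what"},
    "french": {"le", "la", "les", "et", "de", "est", "que"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "was"},
}
DICTIONARY = {  # hypothetical bilingual word lists
    ("english", "french"): {"bicycle": "vélo", "helmet": "casque"},
    ("english", "german"): {"bicycle": "fahrrad", "helmet": "helm"},
}

def tokenize(text):
    # \w matches accented letters too, so "vélo" stays one word.
    return re.findall(r"\w+", text.lower())

def identify_language(text):
    """Guess the language whose function words appear most often."""
    words = tokenize(text)
    scores = {lang: sum(w in common for w in words) for lang, common in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def translate_query(terms, source, target):
    """Translate word by word; unknown words pass through unchanged."""
    table = DICTIONARY.get((source, target), {})
    return [table.get(t, t) for t in terms]

def cross_language_queries(query, targets):
    """Identify the query's language, drop its function words, and translate the rest."""
    source = identify_language(query)
    terms = [w for w in tokenize(query) if w not in STOPWORDS.get(source, set())]
    return {target: translate_query(terms, source, target) for target in targets}

print(identify_language("Le sens de la vie"))  # french
print(cross_language_queries("the bicycle and the helmet", ["french", "german"]))
# {'french': ['vélo', 'casque'], 'german': ['fahrrad', 'helm']}
```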
The Right Search Engine
All search engines can uncover a needle in a haystack. But how much work will you have to do?
Imagine being transported to a large city and being asked to find a small object that might be hidden anywhere. That's the challenge you face when you try to find useful information among the staggering number of pages on the World Wide Web.
Web search engines are the solution, but before you even start your search, you face a choice. Hyperbole is common among the search engines: Each claims to be the best.
To help Internet World magazine readers find the site that works best, IW Labs put the search engines through their paces by comparing the results using a long list of search terms.
We tested six of the leading search engines: AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.
We found that each can find an enormous amount of information, but a few are clearly superior in the way they home in on the most relevant information and in the interface they offer.
We did not review sites like Yahoo or Magellan because they don't search the entire Web. Rather, these are directories of the Web, closer in nature to gigantic phone books than to search engines. Directories are useful services, and they often include reviews to guide you, but they don't list as many sites as search engines. In addition, while search engines search through the actual contents of sites, querying a directory only examines descriptive words provided by the directory service. Interestingly, most directory services provide an option to launch full Web searches, but when they do, they actually use one of the six main search engines.
Directories are good if you are willing to sort through menu after menu, hunting for the best site. Search engines are much better if you just want to see what's available on a topic since the engine does the hard work. You type in a word or short phrase, and the search results are displayed in a long list you can review.
Search engines also have the advantage of finding information on a particular topic that the directories didn't include. For example, if you're looking for information about a product, you probably want to see not only information created by the company selling the product, but also offhand references from people voicing their opinions on a page that wouldn't show up in a directory. The technology behind search engines makes them more effective in uncovering these subtle references.
Spiders, sometimes called bots (as in robots), roam the Web, crawling from site to site. Some spiders move from one site to another indiscriminately while others prioritize and focus their attention on the most popular sites. Depending on your needs, one approach isn't necessarily better than another. For instance, it does little good to view a list of 50 irrelevant pages from the same irrelevant site, which can occur when a search engine indexes every page that its spider can reach. Picking popular sites provides more concise results.
Once the spider is at a site, it reports back to the search engine and indexing begins. Indexes were used to speed retrieval long before the World Wide Web existed; they're part of most database programs, and even address book software uses them to help you find information faster.
An index is a list of every word found at every site with a pointer to its precise location (except for unimportant words like "the," "and," and "but"). When you search for the word "widget," the search engine submits that term to the index. The index then finds the word and displays a list of pages containing it.
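An index that records each word's precise location is often called a positional index. This minimal Python sketch stores, for every word except a few hypothetical stop words, the pages and word positions where it occurs; positions still count the skipped words, so the numbers reflect each word's true place on the page.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "and", "but", "a", "an", "of"}  # hypothetical list of unimportant words

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(pages):
    """Map word -> {url: [positions]}, skipping stop words."""
    index = defaultdict(dict)
    for url, text in pages.items():
        # Positions count every word, including skipped stop words.
        for position, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word].setdefault(url, []).append(position)
    return dict(index)

def lookup(index, word):
    """Return {url: [positions]} for one word, or {} if it isn't indexed."""
    return index.get(word.lower(), {})

pages = {
    "a": "The widget catalog",
    "b": "Widget repair and the widget warranty",
}
index = build_positional_index(pages)
print(lookup(index, "Widget"))  # {'a': [1], 'b': [0, 4]}
print(lookup(index, "the"))     # {}
```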
As with spiders, however, indexing varies among the search engines. Some engines index the entire contents of the page. Other engines index only specific parts, such as the top-level heading. And some search engines look at keywords embedded in "meta tags" at the top of the page to categorize its content.
The basic drill for all search sites is the same. You visit the site, enter a term, click a button to start the search, and, a few seconds later, view a list of sites that meet your search request. As with any program, an important question when interacting with a search engine is: How easy is it to use?
Search engines should make it simple to frame complex searches. If you search for "automobiles," expect many responses that will take a long time to sort through. However, what if you only want to know about the Ford Taurus and Chevrolet Lumina? For such a narrow search, you shouldn't have to wade through a list of every document on the Web containing the word automobile.
Boolean (or logical) search expressions can narrow down your search. Here are some sample searches for a car shopper:
These Boolean searches can be complex to construct. The search engine should provide some help, either through interface refinements or through the help system.
Note, too, that we placed the full model name in quotes, such as "Ford Taurus." That instructs the search engine to look for "Ford Taurus" as a phrase, not as separate words. If you don't designate the phrase, the search engine will find documents that have the words "Ford" and "Taurus" anywhere within them. In other words, it will find a Web page about somebody named Arthur Ford whose zodiac sign is Taurus. The interface of the search engine should make it easy to designate phrases.
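The difference between a quoted phrase and loose words, and how phrases combine with Boolean operators, can be sketched in Python. Queries are written as nested tuples such as `("AND", q1, ("NOT", q2))`, a hypothetical representation standing in for each engine's own syntax; a bare string is treated as a quoted phrase.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively, in order, in text."""
    words, terms = tokenize(text), tokenize(phrase)
    n = len(terms)
    return n > 0 and any(words[i:i + n] == terms for i in range(len(words) - n + 1))

def contains_all_words(text, phrase):
    """The unquoted behavior: each word may appear anywhere on the page."""
    words = set(tokenize(text))
    return all(t in words for t in tokenize(phrase))

def matches(text, query):
    """query: a phrase string, or a tuple ("AND", q, ...), ("OR", q, ...), ("NOT", q)."""
    if isinstance(query, str):
        return contains_phrase(text, query)
    op, *args = query
    if op == "AND":
        return all(matches(text, q) for q in args)
    if op == "OR":
        return any(matches(text, q) for q in args)
    if op == "NOT":
        return not matches(text, args[0])
    raise ValueError(f"unknown operator: {op}")

astrology = "Arthur Ford was born under the sign of Taurus"
review = "Road test: the 1997 Ford Taurus and the Chevrolet Lumina"
print(contains_all_words(astrology, "Ford Taurus"))  # True
print(contains_phrase(astrology, "Ford Taurus"))     # False
query = ("AND", ("OR", "Ford Taurus", "Chevrolet Lumina"), ("NOT", "used"))
print(matches(review, query))                        # True
```

The unquoted check happily matches the astrology page, while the phrase check and the Boolean query built from phrases reject it.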
Another interface issue is output: how does the search engine display the items it finds? We've all been frustrated by search results that list, say, the first 70 characters from each Web page. Sometimes that's useful, because Web authors tend to offer site descriptions at the top of the page. Often, however, such a list is meaningless. The search engine should make sense of the results.
Web queries often turn up large numbers of documents, so the search engine should provide some idea of how relevant each page is to your query. The search engines list the most relevant sites at the top. Relevancy ranking is another old technology that typically uses factors like how frequently the search term appears in the page. Some search engines combine word frequency with other factors, such as how often the Web page is visited and how close together multiple search terms appear on the page.
Relevancy ranking is, at best, an imprecise science. If the developers of the search engine implement it poorly, the rankings will be meaningless. It also is impossible to benchmark, so we did searches on topics with which we are very familiar to get a feel for the accuracy of the relevancy ranking at each site.
Our reviews, and especially our selection of a "Best of Test," were based on the accuracy of relevancy ranking, the help offered in constructing Boolean searches, and the comprehensiveness of the site. You may find that, while a few engines are clearly better than the others, no single search engine is best for all your needs. So while we selected a single "Best of Test," we're also printing a detailed report on each site, so you can be sure you're using the search engine that offers what you need.
In our tests, HotBot's search results were unmatched. It also provides what is arguably the simplest-to-use and most customizable user interface in the group. Its skills at refining searches also are the strongest; we particularly liked its ability to search based on when pages were last modified. Its search results are pleasingly displayed, and we thought its relevancy rankings were reliable.
AltaVista, the co-winner of the Best of Test last time (Internet World, May 1996), still is a force to be reckoned with. It was only slightly behind HotBot in our retrieval tests, although it retains its busy interface and difficult-to-understand search results pages.
Slightly behind AltaVista was Infoseek (also a co-winner last time), which is a powerful search engine and provides many ways to refine searches. While several other search engines also provide Yahoo-like directories, Infoseek is easily the best combination of Web searching and directory. As a result, Infoseek is an excellent all-around choice.
The other search engines all have something to offer for occasional use. Lycos is comfortable to use and offers a lot of flexibility for finding additional information after you've searched. Excite makes it easy to search through a variety of sources, including news stories. It also combines its competent search engine with a directory service. WebCrawler is the least powerful search engine but offers niceties such as a listing of sites that are most popular among its users.
Here is our report on each of the six search engines.
AltaVista
AltaVista no longer comes out on top in our search tests, although it was among the best on most searches. It did ace our tests for finding obscure references on obscure pages. For example, when we searched for a telephone number included on a back page of a law firm's Web site, it correctly found the page.
You can create extremely precise searches with AltaVista. Like some of its competitors, it can search through the source code of each page so that, for instance, you can find pages with specifically named image files. Or, you can execute a search based on the URLs to which a page links. In other words, you can find all pages with links to your home page.
AltaVista's advanced search syntax, however, is complicated, and its help system doesn't help much because it's mired in technical jargon. Still, in a unique, techie-centric way, AltaVista is trying to help. It offers a new interface for framing complex searches, although some may find this new interface too complex as well.
It works like this: If you perform a simple search that finds many pages, AltaVista optionally displays a "topic map." This map summarizes the search results as a flow chart. Your search term is in the middle of the chart and boxes shoot off from it containing keywords located within the found documents. If you click on one of those boxes, another box appears with common terms found within documents containing both terms.
For instance, our search for "bicycling" found more than 31,000 pages. Radiating out from the box representing "bicycling" were boxes with bicycle-related keywords like "touring," "rides," and "helmet." We clicked on the "touring" box, and one of the keywords that appeared within it was "racing," indicating that "racing" was a common word within pages that included both "bicycling" and "touring." We clicked on "racing" to add it to the search box, making our search terms "bicycling" and "racing."
AltaVista also is known for its dense search results screens. You can choose between standard and detailed forms, but we found both results screens equally dense. It also can display results with only URLs and a few keywords from the top of the document. AltaVista's relevancy ranking was, in general, useful. However, it wasn't as consistently useful as the ranking at the HotBot site.
In the last year, serious competitors have emerged. However, AltaVista still is an excellent site if you need to cast your net widely for specific information hidden somewhere on the Web.
The most interesting of Excite's features is ICE (Intelligent Concept Extraction). It examines your search request, identifies synonyms and related meanings, and searches for those, too. For example, if you search for "youth," it also searches for "teenager."
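Excite didn't describe ICE's internals. To show only the effect of this kind of query expansion, the sketch below swaps in a much simpler technique: a hand-written table of related terms. The table contents and function names are hypothetical.

```python
import re

# Hypothetical, hand-written table of related terms. ICE reportedly derived
# related concepts automatically; a fixed lookup table is a simpler substitute.
RELATED_TERMS = {
    "youth": ["teenager", "adolescent"],
}

def expand_query(terms, related=RELATED_TERMS):
    """Return each query term followed by any related terms from the table."""
    expanded = []
    for term in terms:
        term = term.lower()
        expanded.append(term)
        expanded.extend(related.get(term, []))
    return expanded

def page_matches(page, terms):
    """True if the page contains at least one of the (expanded) terms."""
    words = set(re.findall(r"[a-z]+", page.lower()))
    return any(term in words for term in terms)

query = expand_query(["youth"])
print(query)                                                 # ['youth', 'teenager', 'adolescent']
print(page_matches("Advice for every teenager", query))      # True
print(page_matches("Advice for every teenager", ["youth"]))  # False
```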
Despite this technology, in all our simple and complex test searches, the number of pages Excite found was solidly in the middle of the pack. In addition, it found none of our obscure sites and had the worst performance in the group in those particular tests. Nor does Excite have the level of search options found at sites like HotBot or AltaVista. For instance, you can't search based on page modification date or the name of an embedded GIF file.
While these shortcomings mean that serious Web mavens probably won't be excited, average users will like its simple-to-use interface. For example, besides searching the Web, you can choose to search through news articles, city guides, or Excite's directory listings--you need only make your selection by clicking on a radio button.
Next to each found document is a "More Like This" button. Click it, and Excite finds sites similar to that one. This makes Excite an excellent choice for serendipitous searching, in which you follow one lead to the next.
We also liked being able to sort the results list by Web site. Search engines typically return multiple pages from the same site, so sorting by site is a good way to see which pages at each site answer your query. If you wish, you can jump from Excite's listing to a site's home page instead of to the specific page the search found.
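Sorting results by site amounts to grouping result URLs by host name. The sketch below assumes results arrive as a ranked list of URLs; the helper names and example.com-style addresses are hypothetical, and Excite's own implementation wasn't described.

```python
from urllib.parse import urlsplit

def group_by_site(result_urls):
    """Group result URLs by host name, preserving the engine's ranking order in each group."""
    groups = {}
    for url in result_urls:
        host = urlsplit(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups

def site_home_page(url):
    """Return the root URL of the site hosting `url`."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

# Hypothetical ranked results.
results = [
    "http://www.example.com/bikes/touring.html",
    "http://www.example.org/racing.html",
    "http://www.example.com/bikes/helmets.html",
]
for host, urls in group_by_site(results).items():
    print(host, urls)
print(site_home_page(results[1]))  # http://www.example.org/
```

Because Python dicts keep insertion order, sites appear in the order of their best-ranked page.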
We found the relevancy rankings to be above average in reliability, and the presentation of found sites is attractive and easy to read. At the bottom of each search results page are icons for applying the search term to other resources, including WebCrawler, which Excite now operates. On the downside, Excite extracts keywords and phrases to summarize found sites, and we often found those summaries difficult to understand.
In the last year, Excite has added extensive directory listings. That, combined with its easy-on-the-eyes and flexible search result screens, makes Excite an excellent choice for day-in, day-out Web browsing.
In the more-is-better world of the Web, HotBot claims to have indexed the full text of more than 50 million documents, which ties it for first place with Infoseek. But these are marketing claims; the proof is in the searching. When it came down to cases, HotBot found more documents in our searches than any of the other search engines. It also aced our test searches for obscure sites, finding, for instance, misspelled references on one obscure site to the name of another lightly trafficked site.
It isn't just its power that makes HotBot the best bet for searching. Its interface is a delight to use. It doesn't force you to learn Boolean syntax, for instance. Instead, you can create Boolean queries by selecting operators from drop-down lists and typing your terms.
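Form-based query builders like HotBot's let you pick an operator per row instead of typing Boolean syntax. The sketch below shows how such choices can combine into one Boolean filter. The operator names `must`, `must_not`, and `any_of` are hypothetical labels, not HotBot's actual menu wording.

```python
import re

def build_filter(must=(), must_not=(), any_of=()):
    """Turn form-style choices into a page filter.

    Equivalent Boolean: (all `must` terms) AND NOT (any `must_not` term)
    AND (at least one `any_of` term, if any are given).
    """
    must = {w.lower() for w in must}
    must_not = {w.lower() for w in must_not}
    any_of = {w.lower() for w in any_of}

    def accept(page):
        words = set(re.findall(r"[a-z]+", page.lower()))
        return (
            must <= words
            and not (must_not & words)
            and (not any_of or bool(any_of & words))
        )

    return accept

# Two hypothetical drop-down rows: "must contain bicycling", "must not contain racing".
wanted = build_filter(must=["bicycling"], must_not=["racing"])
print(wanted("Bicycling and touring in Vermont"))  # True
print(wanted("Bicycling and racing results"))      # False
```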
Its advanced querying capabilities are quite strong. You can ask it to return either exact matches to your request or near matches. You can limit your search to specific domains (such as .com or .org) or to geographic locations, and you can search for embedded items such as ActiveX controls, Java applets, images, or videos.
Our favorite search narrower, though, was HotBot's ability to search by the date a page was last modified. Searching for recently modified sites is a good way to avoid a long list of dead-end pages. You can also search within the search results. Although the self-evident interface is unlikely to baffle you, HotBot's help system is well written and eschews technical jargon.
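Both narrowing steps are filters applied to an existing result list. The sketch below assumes each result record carries a last-modified date and the page text; the record layout, URLs, and helper names are hypothetical, not HotBot's.

```python
from datetime import date

def modified_since(results, cutoff):
    """Keep results whose last-modified date is on or after `cutoff`."""
    return [r for r in results if r["modified"] >= cutoff]

def search_within(results, term):
    """Narrow an existing result list to pages whose text contains `term` as a word."""
    term = term.lower()
    return [r for r in results if term in r["text"].lower().split()]

# Hypothetical result records.
results = [
    {"url": "http://www.example.com/old.html", "modified": date(1995, 3, 1), "text": "bicycle racing results"},
    {"url": "http://www.example.com/new.html", "modified": date(1997, 5, 1), "text": "bicycle touring guide"},
    {"url": "http://www.example.org/race.html", "modified": date(1997, 6, 1), "text": "bicycle racing calendar"},
]
recent = modified_since(results, date(1997, 1, 1))
print([r["url"] for r in search_within(recent, "racing")])  # ['http://www.example.org/race.html']
```

Chaining the two filters mirrors the workflow: first drop stale pages, then search within what remains.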
We also liked HotBot's on-screen layout. You can add modules for advanced capabilities, such as limiting searches to specific domains, to the basic search screen. After you create a search page that suits you, you can save it so that it appears automatically the next time you visit. HotBot does this by saving a cookie to your hard drive.
HotBot's readable output and generally accurate relevancy ranking are further pluses. The results are attractively laid out, with the page title at the top of each listing. HotBot does a good job of summarizing each page without resorting to gibberish. You can also ask HotBot to display terser descriptions.
The only thing HotBot lacks is directory-like services. It does link to the Wired Source, developed by its corporate sibling, Wired magazine, which provides links to a handful of useful sites. Unlike multipurpose sites such as Infoseek and Lycos, however, HotBot is solely for searching.
HotBot is attractive, powerful, and easy to use. That makes it an excellent choice for both experienced searchers and relative newcomers.
Whether Infoseek is the best search engine, as it claims, is questionable. However, in the last year Infoseek underwent a significant remake, and its search engine has greatly improved; it claims an index of 50 million pages.
Infoseek's Web search results were quite good overall. Our basic searches typically turned up about a third fewer sites than HotBot's, but that still put Infoseek ahead of most of the other search sites.
Our tests also confirmed Infoseek's claim that it does a better job than its competitors of removing defunct Web pages. For instance, most other search engines still listed a site that had last been updated three months before our test and was taken offline a month after that update. Infoseek had already removed that site from its index. This diligence undoubtedly explains, in part, why it didn't display as many found pages as other search engines.
Infoseek's advanced querying capabilities give you a good shot at finding precisely the pages you want. As with HotBot, you can search within previous search results. Its advanced search capabilities also let you search for words within URLs. For example, if you know a company's name includes "Johnson," searching the full index for that name yields an overwhelming number of responses, while searching only for URLs that contain "Johnson" yields a more manageable number. In addition, as with several other sites, you can search for pages containing specific links, such as pages that link to your home page.
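The difference between a full-text search and a URL-only or link search is simply which field of each indexed page gets matched. The sketch below assumes hypothetical index records with `url` and `links` fields; Infoseek's actual query syntax for these searches isn't shown here.

```python
def url_contains(records, word):
    """URL-only search: match `word` against each page's address, not its text."""
    word = word.lower()
    return [r["url"] for r in records if word in r["url"].lower()]

def links_to(records, target):
    """Link search: pages whose outbound links include `target` (exact string match)."""
    return [r["url"] for r in records if target in r["links"]]

# Hypothetical index records.
index = [
    {"url": "http://www.johnsonbikes.example/", "links": ["http://www.example.com/"]},
    {"url": "http://www.example.net/johnson-family.html", "links": []},
    {"url": "http://www.example.org/news.html", "links": ["http://www.example.com/"]},
]
print(url_contains(index, "Johnson"))
print(links_to(index, "http://www.example.com/"))
```

Matching the URL field alone keeps out the many pages that merely mention the name in their text.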
Infoseek's searches are case sensitive and, notably, it accepts plain-language queries. For some queries, such as "what are the lyrics of 'My Funny Valentine'," it worked. For others, such as "how many home runs did Ted Williams hit in 1955?," it didn't.
After searches, particularly general searches that find a lot of pages, Infoseek's Related Topics lists can be extremely handy. For instance, after we searched for "bicycling," the related topics included "Cycling associations," "Bike racing," and "Street biking."
We were less satisfied, however, with the interface and output. We found most of Infoseek's screens over-busy, and its search result screens are no exception: it was often hard to read the results with links to other Infoseek items crowding the screen. We also found that its relevancy ranking often wasn't very useful. Nor does Infoseek offer many display options. You can show summaries along with found documents, or you can hide them and see only the URLs of found pages. Infoseek would benefit from an option to show brief summaries.
Strictly as a search engine, Infoseek is a strong contender. And by combining its search engine with a thorough directory service, Infoseek makes a strong case for frequent visits as you navigate the Web.
Lycos' test results consistently placed it toward the back of the pack. On our simple search for "bicycling," it turned up fewer than a quarter of the sites found by HotBot. Lycos also found only a handful of our obscure sites.
While Lycos can't match the brute strength of HotBot, AltaVista, and Infoseek, it provides laudable searching finesse. Its flexible, easy-to-use custom search syntax lets you find only exact matches to your search term or use stemming, so that a search for "bicycle" also finds "bicycles" and "bicycling." Uniquely, you can also have Lycos retrieve pages only if they contain the search term a specified number of times.
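To illustrate stemming plus a minimum-occurrence threshold, the sketch below uses a deliberately crude suffix-stripping stemmer. It is a hypothetical toy, not the stemming rules Lycos used, and real stemmers handle far more cases.

```python
import re

# Toy suffix-stripping stemmer (hypothetical); not Lycos's actual rules.
SUFFIXES = ("ing", "es", "ed", "s")

def crude_stem(word):
    """Strip one common suffix, then a trailing 'e', so related word forms share a stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def term_count(text, term, stem=True):
    """Count occurrences of `term` in `text`, either by shared stem or as an exact word."""
    words = re.findall(r"[a-z]+", text.lower())
    if stem:
        target = crude_stem(term)
        return sum(1 for w in words if crude_stem(w) == target)
    return sum(1 for w in words if w == term.lower())

def page_matches(text, term, min_count=1, stem=True):
    """Accept a page only if the term appears at least `min_count` times."""
    return term_count(text, term, stem) >= min_count

page = "Bicycles for sale. Bicycling clubs meet weekly; bring your bicycle."
print(term_count(page, "bicycle", stem=False))     # 1 (exact match only)
print(term_count(page, "bicycle"))                 # 3 (bicycles, bicycling, bicycle)
print(page_matches(page, "bicycle", min_count=4))  # False
```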
Another likable aspect of Lycos is its flexibility after you complete a search. Clicking a button in its "Get more on" box has Lycos find images, audio clips, or other multimedia items relevant to the search topic. Another button finds Lycos' "Top 5%" sites related to your search request; these sites, reviewed by Lycos' editorial staff, are included in its directory.
Lycos did a consistently good job of placing the most relevant pages near the top of the list. The help system is thorough, well written, and quite personable--it's arguably the handiest search engine help system. Besides being lucid, it provides many examples. On the downside, the search results screen provides too little information to accurately describe found sites.
Along the left side of the screen is a list of broad topics offered by Lycos--clicking on one of the topics provides a Yahoo-like listing of sites. It also has links for services such as stock prices and guides to many large cities.
If AltaVista is like a nitro-powered dragster, Lycos is more like a basic sedan: comfortable and appealing to a wide range of users. Put differently, HotBot, AltaVista, and Infoseek are better tools for hardcore researchers, but Lycos is a reasonable choice for those who simply want to get the most out of the Web.
WebCrawler also offers less when it comes to customization. You can set it to display either the page titles of found sites or summaries, and you can choose whether it shows 10, 25, or 100 results at a time. You can also choose whether it indicates relevancy with a small icon or with a percentage. Beyond that, though, there are few tweaking options.
After the jam-packed pages of sites like AltaVista, a little sparseness would be welcome--if WebCrawler provided more power. However, it consistently came in last in our search tests, often finding only a fraction of the sites found by high-end search engines like HotBot and AltaVista.
Among its more appealing features, however, is natural language querying. As with Infoseek's, WebCrawler's natural language querying worked well in some of our tests and poorly in others.
One useful but uncommon querying capability that WebCrawler does support is proximity searching, which finds one word within a specified proximity of another. This enables you to, say, find Web pages in which "insurance" is located within three words of "fraud." This is a good way to find pages about specific topics that don't lend themselves to phrase searching.
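A proximity test checks word positions rather than exact phrases. The sketch below assumes "within three words" means the two words' positions differ by at most three, in either order; WebCrawler's exact counting rule and query syntax aren't given, and the function name is hypothetical.

```python
import re

def within_n_words(text, first, second, n):
    """True if `first` and `second` occur at most `n` word positions apart, in either order."""
    words = re.findall(r"[a-z]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n for i in first_positions for j in second_positions)

print(within_n_words("Insurance companies fight fraud every year", "insurance", "fraud", 3))  # True
print(within_n_words("Fraud is rising, experts say, and car insurance costs follow",
                     "insurance", "fraud", 3))  # False
```

Unlike a phrase search, this matches "insurance fraud," "fraud in insurance," and "insurance companies fight fraud" alike.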
WebCrawler's search result pages have some pleasant surprises. If you search for, say, "restaurants in Chicago," the first item in the list asks if you want to see a map of the city. Like Infoseek, WebCrawler provides a "Find Similar Pages" link for finding pages that are like the one you selected. However, we found it difficult to learn much about the contents of pages because the summaries often were garbled. Also, we didn't have a high level of confidence in WebCrawler's relevancy rankings.
Over the years, WebCrawler has added a modest directory that some may find useful. A fun aspect of the directory service is its list of the sites people most often jump to from WebCrawler.
Perhaps it shouldn't be surprising that WebCrawler hasn't kept up with the times. Once a university research project, it now is maintained by a competitor--Excite. WebCrawler is like a personable pioneer, rich in history. But it's a bit out of date.
URLs for the Search Engines
David Haskin is a contributing editor of Internet World.
Robot Technology
Search engines rely on two tools to gather the information from the World Wide Web: spiders and indexes.
Interfaces Can Help or Hinder
Whichever technology a search engine uses, it eventually must serve the user--that's us. Differences in user interfaces can be just as pronounced as differences in the underlying technology.
The first search would be useful if you are researching both cars and don't care whether a site talks about one, the other, or both. The second search would be helpful if, say, you are looking for comparisons between the two autos. The third search would be useful if the previous one turns up a lot of reviews that include the Honda, which you don't want to read.
Best of Test
We're still looking for the Holy Grail of the Web--the search engine that can find absolutely everything but is simple enough for even newcomers to use. Until that day comes, if we could use only one search engine, it would be HotBot.
Excite
HotBot
Infoseek
Lycos
WebCrawler
AltaVista
AltaVista is a powerful search tool. However, it's like a nitro-fueled dragster: very powerful, but you wouldn't drive it to the grocery store. Similarly, AltaVista is a bit much for quick, simple searches.
Excite
Excite won't live up to its name if you're a hard-nosed Web researcher, but it does offer a lot to less demanding users. It has some interesting features for finding information, and it provides a lot of flexibility when viewing the information you've found.
HotBot
HotBot is a relative newcomer to the search engine field and it is, indeed, a hot site. It was the most powerful searcher in our tests, it has a rich set of search capabilities that are easy to use, and it sports an attractive interface.
Infoseek
Infoseek tries to be the best of both worlds. It's a darned good search engine, and it's a directory service that includes useful features such as personalized news and links to tools for finding phone numbers and businesses.
Lycos
Like other Web search engines, Lycos claims it is "the most complete catalog of Web site addresses available." In reality, Lycos won't win any contests of Web searching strength. However, over the years Lycos has combined its competent search engine with a decent Web directory to make itself a helpful tool for finding what you want.
WebCrawler
WebCrawler was the first large-scale Web search engine, a university project when the Web was in its infancy. Today, WebCrawler takes a less-is-more approach to searching the Web. The pages at this site, starting with the home page, have plenty of white space, making WebCrawler easy on the eyes.
AltaVista - http://www.altavista.digital.com
Excite - http://www.excite.com
HotBot - http://www.hotbot.com
Infoseek - http://www.infoseek.com
Lycos - http://www.lycos.com
WebCrawler - http://www.webcrawler.com
Copies of articles in Internet World magazine are available at http://www.iw.com
(c)1997 Mecklermedia Corporation. All rights reserved.