Searching the Web for General and Scientific Information
C. Lee Giles, Steve Lawrence, Kurt Bollacker
NEC Research Institute
Princeton, NJ
Monday, August 31, 2:00 PM, ENS 602
Although the World Wide Web was originally created as a collaboration tool for scientists, it has come to be viewed as an extremely large and diverse but poorly organized database. The Web contains a practically endless supply of relevant scientific information for researchers and other users, but finding the answer to even a simple question is often difficult if not impossible. First-generation search engines have made great strides by providing keyword search on Web documents, but the services they provide tend to have several severe shortcomings. The precision and ranking of the Web pages they return are often poor, and results frequently include dead links and pages with "spamming" keywords. It has been shown that any single search engine covers only a small part of the Web, and different search engines have differing interfaces and capabilities, which makes using multiple engines tedious. Documents that are not stored as HTML (e.g., images, PostScript files) are completely invisible to Web search engines, and the explosive growth of the Web only exacerbates all of these problems.
I will discuss some of the general difficulties in using the Web as a
scientific tool and some of the challenges that must be met in order
to overcome them. Two projects at NEC attempt to meet some of these
challenges.