CLSC 555 Information Systems Webliography: Web Information Retrieval
by
Becky Jones



| General | History | Aims and Objectives | Case Studies | Social Aspects | Teaching | Evaluation |

 


  • General

    • In Computers in Libraries, Marshall Breeding emphasizes the need for librarians to integrate their many sources of information and complex systems of services into a "seamless," unified library Web site in order to compete with popular, user-friendly competitors such as Google. Breeding offers practical advice, such as single-point authentication. The article is philosophical rather than technical, but it strongly suggests following the development of Web Services.

    • Mark Hall (Computerworld) raises the ugly but imminent problem of storage capacity “doubling in the last fifteen years while physical access rates have improved by a mere 10% annually during the same period.” Hall clarifies many of the related difficulties affecting security, data management, and replication. He also reminds us that the government is forcing companies and individuals to be able to produce stored “data” on demand.

    • Katherine Adams (Online feature) presents a clear, concise introduction to information extraction, emphasizing how natural language processing and wrapper induction offer the user the benefit of “breaking up the Web into small, more manageable pieces” (a rough extraction sketch follows this section's list). Although the article includes a helpful glossary and references, it is slightly outdated.

    • In a paper for ACM Computing Surveys, Kobayashi and Takeda discuss the growth of the Internet, tools used for Web-based information retrieval and ranking, and future directions for Web Services, Internet Shopping, and Multimedia Retrieval. There is some interesting debate about whether the cynics or the optimists are correct in their vision of how problems such as slow retrieval time and unsuccessful retrievals will be solved. Because of the volume of data and statistics presented, the in-text citations become distracting; endnotes or a different format would have helped.

    • Although this is a book review of Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto, it provides a number of interesting links. The introduction and one other chapter are available online, along with some other resources. Unfortunately, the site is not yet complete. A reminder from the authors: perfection does not exist.

    • Fenella Saunders in Discover briefly discusses how companies such as Fast Search and Transfer (FAST) and Pacific Northwest National Laboratory are creating “cybertapestries,” or solutions for retrieving information from the “exponential growth of the Net.” Saunders presents several approaches, such as IBM’s CLEVER, a prototype search engine with “humanlike” intelligence, to spark the reader’s curiosity. The article leans too heavily on clichés.

    • G.A. Andy Marken appeals to corporate America to protect itself against the injustices of a Web information retrieval system that allows “for negative and malicious information about their products, services, or colleagues.” Marken, an experienced advertising and public relations professional, suggests companies visit www.dejanews.com for a “rude awakening.” The article offers a look at the misuse of “information” on the Internet and Web from a capitalist viewpoint.

    • Although this article was written in 1998 for Records Management Quarterly, it is included here because it is well written and extremely helpful if you do not know where to begin searching the Web. Konnie Kustron, a lawyer, professor, and former branch manager for LEXIS-NEXIS, provides excellent resources, though some of the sites have changed or disappeared.

     
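    A note on the extraction techniques Adams surveys: the toy sketch below pulls structured fields out of a small HTML fragment with a single hand-written "wrapper" pattern. The markup, field names, and pattern are invented for illustration only; real wrapper-induction systems learn such patterns from labeled example pages rather than having them hard-coded.

        import re

        # A toy product listing; the markup and field names are invented for this example.
        html = """
        <ul class="results">
          <li><span class="title">Modern Information Retrieval</span> <span class="price">$54.00</span></li>
          <li><span class="title">Managing Gigabytes</span> <span class="price">$49.95</span></li>
        </ul>
        """

        # A hand-written "wrapper": one pattern that captures the fields we care about.
        # Wrapper-induction systems learn patterns like this from example pages.
        pattern = re.compile(
            r'<span class="title">(?P<title>.*?)</span>\s*'
            r'<span class="price">\$(?P<price>[\d.]+)</span>',
            re.S,
        )

        for match in pattern.finditer(html):
            record = match.groupdict()
            print(record["title"], "->", record["price"])
        # Output:
        # Modern Information Retrieval -> 54.00
        # Managing Gigabytes -> 49.95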

  • History

    • Time writers struggle to make this a true technology article in discussing the possible demise of Google as the "reigning sultan of search" and in asking us "what's at stake culturally and socially in the search wars." There is a concise but useful diagram at the end of the article showing how Google crawls the Web (a bare-bones crawl sketch follows this section's list). Their concern for the FTC's taking "baby steps" to "keep search engines honest" lacks sincerity.

    • Marcia Bates, a professor at UCLA, contributes specific improvements to Web information retrieval design, arguing that creating a "good" system requires the combined efforts of the information specialist and the content expert and/or the programmer. The article is filled with useful examples, reasonable solutions, and excellent references. The author should write more; she deserves her "Best Journal of the American Society for Information Science Paper of the Year" awards.

    • Chris Sherman in ONLINE assesses the predictions for Web search in 2004 that he had made in a previous article in 1999. He discusses convergence (e.g., Time Warner and AOL), information underload, the Open Directory Project (ODP), browser-free searching, whole-document queries, and the future. As he states himself, he "didn't touch on other areas where notable progress is being made."

    • Mark Weiser gives a concise one-page history of the technology revolution while forecasting that computer use will become widespread on university campuses! He explains how technology can make student life easier in scheduling, dining, health, and transportation, though he worries about privacy issues. In 1998, the author understood the "ubiquity" of the computer but not the impact and complexity of its tremendous growth and development.

     
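    The crawl cycle the Time diagram depicts (fetch a page, extract its links, queue the ones not yet seen) can be sketched in a few lines. This is not Google's actual crawler: the seed URL, page limit, and crude regex link extraction below are placeholder assumptions, and a production crawler adds politeness rules, deduplication, and ranking far beyond this.

        import re
        from collections import deque
        from urllib.parse import urljoin
        from urllib.request import urlopen

        def crawl(seed, max_pages=10):
            """Breadth-first crawl: fetch a page, pull out its links, queue unseen ones."""
            seen, queue, pages = {seed}, deque([seed]), {}
            while queue and len(pages) < max_pages:
                url = queue.popleft()
                try:
                    html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
                except OSError:
                    continue  # skip pages that fail to load
                pages[url] = html
                # Very crude link extraction; a real crawler parses HTML properly.
                for href in re.findall(r'href="([^"#]+)"', html):
                    link = urljoin(url, href)
                    if link.startswith("http") and link not in seen:
                        seen.add(link)
                        queue.append(link)
            return pages

        # Example (placeholder seed URL):
        # pages = crawl("https://example.com", max_pages=5)
        # print(len(pages), "pages fetched")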

  • Aims and Objectives

    • In an article in The Washington Post, Rick Weiss discusses ways, such as the Internet Archive (www.archive.org), to preserve Web sites. Weiss believes that the Digital Object Identifier (DOI) is an example of a workable system; it gives a virtual but permanent "bar code" to participating Web pages, so a page can always be found by its unique DOI even if the site moves or changes (a small resolution sketch follows this section's list). The writer draws no conclusions.

    • This is a brief article in The Chronicle of Higher Education that shows how universities that register with Google can track what search terms students use, in order to find out what type of information their patrons are seeking. This enables the school or institution to provide better service on its Web site as well as in the library. Of course, this custom service from Google requires a fee.

    • Michael Banks recently provides in ONLINE four categories of ways to search successfully at pay sites. He lists ways to access online stock photo agencies, newspaper archives, magazine article archives, and eBay and other auctions. Banks is not in favor of hacking but offers some realistic solutions to budget cuts that may affect Web information retrieval at the library or at home. Although useful, the article is neither well organized nor well formatted.

    • There is a short article in Science News that simply announces the arrival of what was new in 2002 at the NEC Research Institute in Princeton, New Jersey: an algorithm developed by Gary Flake that could make searching on the Internet more precise. This "community-identification algorithm" should interest academic searchers or Internet snobs who generally want to target specific types or content-area sites. Flake and colleagues unfortunately have to remind us that the database can also be used by companies for marketing.

    • Maryellen Allen explains Bluetooth (www.bluetooth.com), an example of a new wireless data transmission protocol. Allen manages to be objective in her analysis and clear in her explanation of the technology. For this particular product, Allen says you need two devices, but she does not see this as an "unsurmountable" problem. The cost, however, may be.

    • Steve Silberman presents a clever account of the "world's smartest search engine," created after two hundred fifty years by Autonomy, a language-centered company in Cambridge, United Kingdom. One should know something about Autonomy because, one way or the other, it counts as clients the Associated Press, News Corporation, Lucent Technologies, Merrill Lynch, and the United States Department of Defense. "Lynch [the founder] had zeroed in on the Achilles' heel of search engines." Silberman warns about "a new, more mature phase of technology - an era in which humanity will no longer believe it's standing at the center of the universe."

     
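    Weiss's point about DOIs can be seen in practice: a DOI is resolved through the central resolver at doi.org, which redirects to wherever the object currently lives, so the identifier keeps working even when the hosting site changes. Below is a minimal sketch, assuming Python 3 and network access; the DOI shown is only an example (10.1000/182, the DOI Handbook's own identifier).

        from urllib.request import urlopen

        def resolve_doi(doi):
            """Ask the central DOI resolver where the object currently lives."""
            # The resolver redirects to the object's current location; that final URL
            # may change over time even though the DOI itself never does.
            with urlopen(f"https://doi.org/{doi}", timeout=10) as response:
                return response.geturl()

        # Example with an illustrative DOI (any registered DOI would work):
        # print(resolve_doi("10.1000/182"))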

  • Case Studies

    • In Information Technology and Libraries, Linda Bills, Rachel Cheng, and Alan Nathanson compare two separate studies at Wesleyan University Library and the Tri-College Consortium (TCC - Bryn Mawr, Haverford, and Swarthmore College), which were using two successful approaches to Web page management; "this report compares their different approaches, contrasting in-house versus outsourcing approaches, an independent database versus one built from OPAC, and open source versus proprietary software." This is an excellent resource for a librarian who wants to find a system that standardizes guide formats, assuring user benefits. There was some "loss of individual writing style for the presentation of the library resources"; however, individual libraries found solutions to preserve librarians' flexibility.

    • Susan Feldman, the president of Datasearch, briefly comments on the results of the NEC Research Institute study of search engines' indexing of the Web in 1999, "using basic Boolean queries that required exact matches" (a toy Boolean-retrieval sketch follows this section's list). The study shows the inequities and bias of most Web indexing. Google and others rely too often on popularity or number of links to "improve" retrieval; the user may find it difficult to choose between quality and quantity. Feldman draws no significant conclusions.

     
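    The "basic Boolean queries that required exact matches" Feldman mentions can be modeled simply: build an inverted index from terms to documents, then intersect (AND) or union (OR) the posting sets. The documents and queries below are invented for illustration.

        from collections import defaultdict

        # Tiny invented document collection.
        docs = {
            1: "web information retrieval and search engines",
            2: "boolean queries require exact term matches",
            3: "search engines rank web pages by popularity",
        }

        # Build an inverted index: term -> set of document ids containing it.
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.split():
                index[term].add(doc_id)

        def boolean_and(*terms):
            """Exact-match AND query: documents containing every term."""
            postings = [index[t] for t in terms]
            return set.intersection(*postings) if postings else set()

        def boolean_or(*terms):
            """Exact-match OR query: documents containing any term."""
            return set.union(*(index[t] for t in terms)) if terms else set()

        print(boolean_and("web", "search"))      # {1, 3}
        print(boolean_or("boolean", "engines"))  # {1, 2, 3}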

  • Social Aspects

    • Brian Kenney, in the March 1, 2004, Library Journal, writes about OCLC's 2003 Environmental Scan, organizing it into five sections: social, economic, technology, research and learning, and library. Kenney "reinterprets" Ranganathan's old rule as "for every reader, huge amounts of free-floating content, anywhere, anytime." There is reason for concern: the author "dubs" the "intersection of the orderly world of libraries and the 'free-associating, unrestricted, and disorderly' web" the "twilight zone."

    • In a brief article in Physician Executive James Hawkins questions doctors about their strategy to help patients deal with health information on the Internet. It is surprising that a recent study conducted by the RAND Corporation think tank showed that much of the information was "generally accurate" but "often incomplete and hard for consumers to understand." Hawkins offers no suggestions or solutions, just poses an obvious question.

    • An article in Government Computer News helps one understand why the public complains about ineffective, inefficient government bureaucracy. It announces the Environmental Protection Agency's transfer of its air pollution data retrieval system to browser access at $1,200 per hour. Apparently, one of the motivations for this move was competition from CNN's Interactive Environment Web site.

     

  • Teaching

    • Jeff McLaughlin's article in WebNet Journal was originally written as a response to another professor who was asking him to delete some student essays from the Web in order to curb plagiarism. McLaughlin understands the reality and immensity of the problems the Internet creates and offers Web sites and Maxwell Smart for students, teachers, and librarians alike. The real issue remains the need for both human integrity and Internet integrity.

    • In the March 2004 Technology & Learning there is a one-page article about Grokker, a data visualization tool that "groups search results pages according to content, then maps them into concentric spheres based on their relative importance" (a loose grouping sketch follows this section's list). This could prove an extremely useful and effective way for teachers or librarians to help users, because the maps can be edited, saved, and shared by e-mail. The piece appears in the Trend Watch column and does not offer much information.

    • The User Access to Services Committee of the RUSA Machine-Assisted Reference Section (MARS) in 2001 provides an excellent and lengthy bibliography for a number of other academic case studies as well as users' information-seeking behavior. Many of these organized Web sites are still current and useful; the bibliography also reflects the successful teamwork of an ALA committee. A few of the sites have changed.

     
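    Grokker's clustering and mapping algorithms are proprietary; as a loose illustration of the "group results by content" idea only, the sketch below buckets invented result titles by the most common content word they share. The titles, stopword list, and heuristic are all assumptions made for the example.

        from collections import Counter, defaultdict

        # Invented search-result titles for the ambiguous query "jaguar".
        results = [
            "Jaguar cars official site",
            "Jaguar cars dealer listings",
            "Jaguar habitat and conservation",
            "Jaguar habitat in South America",
        ]

        stopwords = {"and", "in", "the", "of"}

        # Count how often each content word appears across all results.
        term_counts = Counter(
            word for title in results
            for word in title.lower().split() if word not in stopwords
        )

        # Bucket each result under its most frequent content word,
        # ignoring the query word itself, which appears everywhere.
        groups = defaultdict(list)
        for title in results:
            words = [w for w in title.lower().split()
                     if w not in stopwords and w != "jaguar"]
            key = max(words, key=lambda w: term_counts[w])
            groups[key].append(title)

        for key, titles in groups.items():
            print(key, "->", titles)
        # "cars" -> the two automobile results; "habitat" -> the two wildlife results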

  • Evaluation

    • In the October 1, 2003, Library Journal, Judy Luther believes that metasearching technology is going to create a portal that "could allow the library to become the one-stop shop their users and potential users find so attractive" in Google (a bare-bones metasearch sketch follows this section's list). Some predict this could happen by 2005, raising three major issues for librarians: "understanding metasearch's potential role in serving their users, rethinking how the library's resources are presented, and developing realistic expectations of this evolving technology." The references at the end need to be formatted in a more user-friendly way to be useful.

    • Created by students at the School of Library and Information Sciences, this site at the University of North Texas "connects users to current Information retrieval research." The site provides successful and efficient access to a variety of resources and includes many of its own, such as an annotated bibliography of online papers and text-processing technology. There are no bells and whistles; it concentrates on the topic and access.

    • In MIT's Magazine of Innovation, Technology Review, Wade Roush discusses the new search technologies from companies that are challenging Google to give users what they want, or think they want, faster. This is an extensive academic article written from many points of view, and it includes a "stamp collecting" comparison table covering AskJeeves/Teoma, Google, and Mooter. Case studies and research indicate that users show little loyalty to logos if switching means faster, more efficient service.

    • "Evaluating Web Search Results Rankings" is a well-constructed visual and content explanation of current comparison results. The authors, Demetrios Elipoulos and Calvin Gotlieb, draw thoughtful, organized conclusions from data, especially about relevancy. There were no particularly creative conclusions drawn.

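    Luther's "metasearching" means broadcasting one query to several sources and merging the results into a single list. The sketch below does that with two invented, in-memory sources, deduplicating by URL and interleaving each source's own ranking; a real portal would call live catalog and database APIs instead, and the source names and example records here are assumptions.

        def metasearch(query, sources):
            """Send one query to every source, then merge results by interleaving
            each source's ranked list and dropping duplicate URLs."""
            ranked_lists = [source(query) for source in sources]
            merged, seen = [], set()
            # Round-robin across sources so no single source dominates the top.
            for rank in range(max(len(r) for r in ranked_lists)):
                for results in ranked_lists:
                    if rank < len(results) and results[rank]["url"] not in seen:
                        seen.add(results[rank]["url"])
                        merged.append(results[rank])
            return merged

        # Two invented sources standing in for, say, a catalog and an article database.
        def catalog(query):
            return [{"title": "Catalog record for " + query, "url": "http://example.org/cat/1"},
                    {"title": "Related subject heading", "url": "http://example.org/cat/2"}]

        def article_db(query):
            return [{"title": "Journal article on " + query, "url": "http://example.org/art/1"},
                    {"title": "Catalog record for " + query, "url": "http://example.org/cat/1"}]

        for hit in metasearch("metadata", [catalog, article_db]):
            print(hit["title"], "-", hit["url"])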
Created by Becky Jones 2004 for CLSC555 Mini_Project4
© Becky Jones 2004