PROVIDING INTERNET REFERENCE SERVICE FOR THE NEW ZEALAND DIGITAL LIBRARY: Gaining Insight into the User Base for A Digital Library

Sally Jo Cunningham

Department of Computer Science
University of Waikato
Hamilton, New Zealand
email: sallyjo@cs.waikato.ac.nz

As the concept of the "digital library" matures, these systems are evolving from simple WWW-accessible indexes for unorganized documents, to applications more closely duplicating the services of a physical library. These expanded services generally require considerable human intervention, for example, to organize documents into semantic categories or to formally catalog documents. The New Zealand digital library project has experimented with the provision of formal reference services for its computer science collection, by having a human subject expert use email to respond to client queries submitted via a form-based web page. This technique is relatively straightforward and simple to implement; however, its asynchronous nature does not support the intensive conversational interaction usually found in face-to-face reference interviews. In addition to providing a useful service for puzzled clients, an online reference service can also give digital library developers an insight into the problems experienced by their users, information that is otherwise difficult to obtain. This information can be useful in tailoring the collection and the digital library services to the target user group.

1. INTRODUCTION

The New Zealand Digital Library (http://www.nzdl.org) is a set of full text indexes that are searchable over the WWW. The flagship collection for the project, the Computer Science technical reports collection, provides an index to over 40,000 computing-related research papers scattered in over 300 ftp sites worldwide. The system has been available over the Internet since 1995, and sees a significant amount of use in the international CS research community; it is becoming known as a useful interface for locating high-quality, cutting-edge work that is not yet formally published. A major goal of our project is to create a generic collection creation toolkit, so that digital library organizers can quickly and easily construct WWW-accessible interfaces and indexes to their own document sets. To investigate the impact of including reference service support, we have set up an experimental "digital reference librarian" service for the computer science technical reports collection.

We also have been concerned with investigating the information needs of our target audience (in this case, computer science researchers), so that we might use this information in tailoring the general library framework for specific user communities. The reference queries posed by collection users give us an insight into the difficulties they experience in using the digital library and the types of information that they cannot locate online. This type of "negative"- or "failure"-based user profiling data is very difficult to obtain, and can be very useful in fleshing out the user profiles obtained by other means (such as transaction log analysis).

This paper is organized as follows: Section 2 describes the implementation of the NZ digital reference service, and discusses previous Internet-based reference systems. Section 3 discusses the types of queries that we have received, and Section 4 details the ways that this information can affect the design of our computer science digital library. Section 5 presents our conclusions.

2. APPROACHES TO IMPLEMENTING AN INTERNET-BASED REFERENCE SERVICE

Internet-based reference services have been trialed by a few conventional libraries, primarily as an aid for distance education or otherwise off-campus students (for example, see the University of California at Irvine page at http://suns.lib.uci.edu/~slriweb). At present the only "digital library" to include a formal reference service is the Internet Public Library (http://www.ipl.org/), an experimental digital library created to explore the extent to which Internet-based free resources can serve as the basis for a free public library. These reference systems all follow the same implementation pattern: to access the reference service, users fill out a brief WWW page form, describing their problem and also providing contextual details for the query (such as the intended use for the information, the required level of detail, the expected format of the response documents, etc.).

The form is automatically emailed to the person serving as a reference librarian, who also replies to the client by email. While this implementation technique is relatively simple, librarians participating in these Internet-based reference services recognize that the asynchronous communication supported by email is not ideal for an ordinarily communication-intensive task such as reference work (Koyama et al, 1998). Face-to-face reference services, such as those provided by conventional physical libraries, rely heavily on the "reference interview" to clarify client queries. Patrons typically approach the reference librarian with a query that only approximates their information need: for example, the client may ask an overly general question ("where are the books about the United States?", when the client needs to know the US GNP for 1988) or an inappropriately explicit question ("where is the Reader's Guide?", when that resource may not be the most pertinent to answer their query). The reference librarian then engages the user in a structured conversation to elicit the user's true information need (Katz, 1982). Unfortunately, the relatively slow rate at which email "conversations" can be carried out effectively precludes the use of email for a formal reference interview; instead, the digital reference librarian must base the response entirely on the patron's initial (and nearly always sub-optimal) query.

Experiments in providing reference service via synchronous communication software such as videoconferencing (Lessick, et al., 1997; Pagell, 1996) or a MOO (Meyers, 1997) have yielded mixed results. On the one hand, conversational reference interviews are made possible, which can greatly improve the quality of the reference response. On the other hand, these systems introduce problems of their own: users find it difficult to master the MOO interface, the videoconferencing systems require awkward-to-use hardware on both the user and librarian sites, and the efficacy of both may be hampered by slow connect times.

We implemented a Web-form and email based reference service for our computer science technical reports collection, with the author of this paper serving as the reference librarian (the author has a Ph.D. in computer science, and has received postgraduate training in library and information science). While synchronous service was considered, this possibility was largely precluded by the time zone differences between our New Zealand site and our largely North American and European users. The WWW form asks users for four pieces of information: a brief statement of their query or information need, their email address, their occupation or current status as a student (to gauge the level at which the response information should be presented), and the purpose for which the information is required (again, to aid in tailoring the reference response).
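
As an illustration only, the form-to-email flow described above might be sketched as follows. The four fields come from the description of the WWW form; the class name, function name, service address, and example.org addresses are hypothetical and are not taken from the actual NZDL implementation.

```python
from dataclasses import dataclass
from email.message import EmailMessage

# Hypothetical address used as the sender of the automatically generated message.
SERVICE_ADDRESS = "reference-form@example.org"


@dataclass
class ReferenceRequest:
    """The four pieces of information requested by the reference form."""
    query: str       # brief statement of the query or information need
    email: str       # patron's email address, used for the reply
    occupation: str  # occupation or current status as a student
    purpose: str     # purpose for which the information is required


def build_librarian_email(request: ReferenceRequest, librarian_address: str) -> EmailMessage:
    """Turn a submitted form into the message sent to the reference librarian."""
    # Reject incomplete forms and obviously malformed reply addresses.
    for name in ("query", "email", "occupation", "purpose"):
        if not getattr(request, name).strip():
            raise ValueError(f"missing form field: {name}")
    if "@" not in request.email or any(ch.isspace() for ch in request.email):
        raise ValueError("email address looks malformed")

    # Collapse whitespace so a multi-line query cannot break the Subject header.
    summary = " ".join(request.query.split())[:60]

    msg = EmailMessage()
    msg["From"] = SERVICE_ADDRESS
    msg["To"] = librarian_address
    msg["Reply-To"] = request.email  # the librarian replies directly to the patron
    msg["Subject"] = f"Reference query: {summary}"
    msg.set_content(
        f"Query: {request.query}\n"
        f"Occupation or student status: {request.occupation}\n"
        f"Purpose: {request.purpose}\n"
    )
    return msg
```

Because the whole exchange is a single message and a single reply, nothing in this flow supports the back-and-forth of a reference interview; the librarian sees only what the patron typed into the form.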

3. QUERIES SUBMITTED TO THE SERVICE

During the six months that the reference service has been offered, we have received 33 queries. This relatively low rate of requests for help, in contrast to the relatively high rate of use for the technical reports collection (averaging 250 queries per day), confirms earlier work indicating that computer science researchers are reluctant to seek help from search intermediaries (Cunningham and Connaway, 1996). Other WWW-based reference services catering to a less specialized group of users report a far higher rate of reference queries - to the extent that they are forced to reduce the number of queries received, for example, by setting an upper limit on the queries that are accepted each day, or by accepting queries only from patrons of the physical library supporting the digital reference service (Koyama et al, 1998).
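
To put the two figures above side by side, the following back-of-the-envelope calculation compares the 33 reference queries with the searches made on the collection over the same period. The length of the period in days (182) is our assumption for "six months" and is not stated above, so the result is only approximate.

```python
REFERENCE_QUERIES = 33            # reference requests received in six months
COLLECTION_QUERIES_PER_DAY = 250  # average searches per day on the collection
DAYS_IN_SIX_MONTHS = 182          # assumption: roughly half a year


def collection_queries(days: int = DAYS_IN_SIX_MONTHS,
                       per_day: int = COLLECTION_QUERIES_PER_DAY) -> int:
    """Estimated number of searches on the collection over the period."""
    return days * per_day


def reference_share(reference_queries: int = REFERENCE_QUERIES) -> float:
    """Reference requests as a fraction of all collection searches."""
    return reference_queries / collection_queries()


if __name__ == "__main__":
    total = collection_queries()
    print(f"Estimated collection searches: {total}")
    print(f"Reference requests per search: {reference_share():.2%}")
    print(f"Searches per reference request: {round(total / REFERENCE_QUERIES)}")
```

Under this assumption there was roughly one reference request for every 1,400 searches of the collection.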

Patrons requesting assistance from our online reference service ranged from high school students to Ph.D. candidates, from academics to commercial programmers. This wide variety of users was surprising, as the collection had been intended as a research tool for computing academics and professionals. One of the primary tenets of our digital library framework is that an effective library will be composed of a set of collections, each geared to a particular user group (in contrast to the current trend towards a single collection of highly heterogeneous documents that attempts to service all potential user interests). Given that our current computing collection sometimes attracts users outside its intended audience, there is a need to develop an effective mechanism to direct digital library patrons to the suitable sub-collection within the library.

Queries received can be categorized as: requests for a copy of a specific, known document (although generally the user does not know the full bibliographic details of the document); requests for a factual answer to a question; and requests for aid in conducting a general topic search. These query types are consistent with the classes of patron questions encountered in conventional libraries (Katz, 1982). No queries were received that requested help in using the digital library system software, indicating that the user interface and query construction technique are not causing difficulties for clients.

In replying to queries, both online and paper resources were consulted. On average, approximately an hour was spent constructing each reply - far in excess of the five to ten minutes that are reported in physical libraries. This longer reference research time is consistent with the results reported by other libraries offering online reference service (Koyama et al, 1998). We conjecture that a number of factors may influence the increase: the lack of a formal reference interview prevents the reference librarian from simplifying the reference search process by narrowing the query; the librarian may take a more leisurely approach to answering the query in the absence of a patron clamoring anxiously for an immediate response; and without feedback from the patron, the reference librarian may literally not know when the query has been satisfactorily answered and the reference search can be terminated.

4. IMPLICATIONS FOR THE FURTHER DEVELOPMENT OF A CS DIGITAL LIBRARY

While the level of use for this service has been low, the types of queries posed have shown enough consistency to provide the following insights into the needs of our user base and the ways that these needs can guide the further development of the collection and the digital library framework.

ACCESS TO OLDER MATERIAL

Our previous research indicated that computer researchers prefer, naturally enough, to use computers whenever possible in accessing information. They are generally not aware of paper resources that may better serve their needs than the currently available online indexes and collections, or may settle for a less than ideal (but readily accessible) online document rather than go to the effort of using print-only indexes to locate a physical document (Cunningham and Connaway, 1996). This strong user preference for immediately available, electronic documents raises troublesome ethical issues for digital collection developers such as ourselves, since in the computing field few technical reports and published papers produced before 1991 are available online in full text form; by making only the most recently produced research documents so easy to locate and retrieve, we may inadvertently abet the disuse of earlier research documents. This problem was first illustrated with the advent of the Medline electronic index to medical documents, which primarily references documents published after 1966. Its ease of use, in comparison to searching print indexes for earlier materials, has led to the virtual disuse of medical research published prior to 1966, leading to sometimes comical and sometimes tragic situations for patients, as old research results are slowly rediscovered through "new" experiments.

To provide access to computing research that was produced before approximately 1991, when providing Internet-accessible copies of papers became common practice in computer science departments worldwide, we are attempting to augment the technical reports collection with entries from the Karlsruhe bibliography (http://ftp.nus.sg/docs/csbiblio/index.html). Karlsruhe is currently the largest computer science/information systems subject bibliography available online (approximately 750,000 entries). Merging the two collections is not a trivial task, as it involves amalgamating fielded bibliographic entries with the uncataloged full text documents currently indexed by the NZ digital library, and providing an interface that can gracefully present these two very different types of search results. While this is not an ideal solution - only the bibliographic details for these earlier papers will be searchable, and the full text will not be readily available online - it is likely that increasing awareness of the existence of relevant older papers will encourage their use, particularly since document delivery services such as UnCover support online ordering and faxing of document copies.
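
The difficulty of presenting the two kinds of search result together can be illustrated with a minimal sketch. It assumes a naive substring match and invented record types; the field names, class names, and matching rule are hypothetical and do not describe the NZ digital library software or the Karlsruhe bibliography format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BibliographicEntry:
    """A fielded bibliography record (hypothetical field names)."""
    title: str
    authors: List[str]
    year: int


@dataclass
class FullTextDocument:
    """An uncataloged report, known only by its location and extracted text."""
    url: str
    text: str


@dataclass
class SearchResult:
    """One display format shared by both kinds of hit."""
    label: str
    source: str         # "full text" or "bibliography"
    url: Optional[str]  # None when only bibliographic details are available


def search(term: str,
           entries: List[BibliographicEntry],
           documents: List[FullTextDocument]) -> List[SearchResult]:
    """Search both collections and return hits in a single, uniform list."""
    needle = term.strip().lower()
    if not needle:
        return []
    results = []
    for doc in documents:
        if needle in doc.text.lower():
            # No catalog record exists, so the first line of text stands in for a title.
            first_line = doc.text.strip().splitlines()[0]
            results.append(SearchResult(first_line, "full text", doc.url))
    for entry in entries:
        if needle in entry.title.lower():
            label = f"{', '.join(entry.authors)} ({entry.year}). {entry.title}"
            results.append(SearchResult(label, "bibliography", None))
    return results
```

Even in this toy form the asymmetry is visible: full-text hits can be retrieved immediately but have no reliable title or author, while bibliography hits are well described but carry no link to the document itself.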

ADEQUACY OF ONLINE RESOURCES

When we began to offer this reference service, it was unclear how much research-level computing material was available over the WWW. To explore this question, online resources were consulted first, and paper sources second, when answering reference queries. With the exception of requests for specific pre-1990 documents, online resources were generally found to be adequate to answer queries posed by computer science researchers. This appears to be partially an artifact of the high obsolescence rate for computing documents, as the average age of a document referenced in a computer science or information systems research article is 5 years (Cunningham and Bocock, 1995; Cunningham and Dillon, 1997); users appear to prefer the recently produced documents that are more likely to be found online.

Additionally, a surprisingly large amount of very high quality research material is available online for most, if not all, fields of computing: research institutions often store their working papers in anonymous ftp repositories, individual researchers maintain cv pages with links to copies of their published work, and groups of researchers with a common interest build subfield-specific web pages pointing to source files for both published and unpublished work. These documents appear in a variety of formats - the most common of which are html, PostScript, and TeX. Currently our computing collection can only handle PostScript documents and is geared to harvesting large numbers of documents from a relatively small number of sites - thus missing out on the many sites containing a few non-PostScript documents, such as the cv pages and subfield web pages. The documents currently unavailable through our technical reports collection are numerous enough, and of a high enough quality, to indicate a need to modify our collection development software to accommodate a more heterogeneous set of source documents.

Further, it became clear that certain search techniques and bibliographic repositories were particularly useful for answering queries: for example, the UnCover service is useful in locating journal articles published after about 1985, a request for recent documents by a particular author is often most easily answered by locating that author's home page, factual queries in some fields can be answered by consulting the appropriate online textbook, etc.
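
To make the format limitation concrete, the sketch below sorts candidate document URLs into the three common formats named above and reports which ones a PostScript-only harvester would skip. Detecting the format from the file extension is our simplifying assumption, as are the function names and the example URLs; the actual collection-building software may work quite differently.

```python
import posixpath
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlparse

# Extension-to-format table covering the formats named in the text.
FORMAT_BY_EXTENSION = {
    ".html": "html",
    ".htm": "html",
    ".ps": "PostScript",
    ".tex": "TeX",
}

# Compression suffixes commonly appended to files in ftp repositories.
COMPRESSION_SUFFIXES = (".gz", ".z")


def document_format(url: str) -> str:
    """Guess a document's format from the extension in its URL path."""
    path = urlparse(url).path.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    extension = posixpath.splitext(path)[1]
    return FORMAT_BY_EXTENSION.get(extension, "other")


def skipped_by_postscript_harvester(urls: Iterable[str]) -> List[str]:
    """URLs that a harvester handling only PostScript would miss."""
    return [url for url in urls if document_format(url) != "PostScript"]


def format_counts(urls: Iterable[str]) -> Counter:
    """Tally of documents per guessed format."""
    return Counter(document_format(url) for url in urls)
```

A tally of this kind over a sample of cv pages and subfield pages would give a rough indication of how much material falls outside a PostScript-only collection.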

We are compiling a list of common search techniques or useful resources that can be stored in FAQs or "pathfinder" text files (for example, see the search hints files created by the Internet Public Library reference group at http://www.ipl.org/ref/QUE/PF/ and http://www.ipl.org/ref/FAQ/PF/). These will be listed on the same WWW page as the link to the reference request form, for consultation by patrons before they submit a reference query.

COMMUNICATING THE SEARCH PROCESS TO PATRONS

When providing reference service, the reference librarian can rarely point the client to precisely the piece of information he/she requires; instead, the librarian provides a sampling of the documents likely to be of use, and explains to the user the steps used to obtain these and similar documents. In essence, the reference librarian performs a preliminary search and then describes this preliminary search to the client. Currently, reference librarians for this and other online reference services create these search descriptions manually, which is fairly tedious. Software that can support the creation and easy annotation of searches over the WWW would greatly enhance the process of creating these reference query replies; we are currently designing a system to record search histories so that they can be used as a basis for reference query replies.
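
One possible shape for such a search-history record is sketched below. It is a hypothetical illustration of the idea, not the system under design: the class names, fields, and reply layout are all our assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class SearchStep:
    """One step of the librarian's preliminary search."""
    resource: str  # index, bibliography, or web page that was consulted
    query: str     # what was searched for
    results: List[str] = field(default_factory=list)  # sample documents found
    note: str = ""  # the librarian's annotation for the patron


@dataclass
class SearchHistory:
    """An ordered record of search steps that can be rendered as a reply."""
    steps: List[SearchStep] = field(default_factory=list)

    def record(self, resource: str, query: str,
               results: Iterable[str] = (), note: str = "") -> None:
        """Append one step to the history."""
        self.steps.append(SearchStep(resource, query, list(results), note))

    def as_reply(self) -> str:
        """Render the recorded steps as plain text suitable for an email reply."""
        lines = []
        for number, step in enumerate(self.steps, start=1):
            lines.append(f'{number}. Searched {step.resource} for "{step.query}".')
            for result in step.results:
                lines.append(f"   - {result}")
            if step.note:
                lines.append(f"   Note: {step.note}")
        return "\n".join(lines)
```

With a record of this kind, the librarian annotates steps as the search proceeds instead of reconstructing them from memory afterwards, and the patron receives both the sample documents and the steps needed to find similar ones.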

5. CONCLUSIONS

It is straightforward to implement a forms-based reference service for a digital library, using email to convey reference answers to the patron. A major drawback in relying on email is that it does not readily support the highly interactive query clarification process typical of a face-to-face reference interview. However, the synchronous online communication techniques previously trialed for online reference services (MOOs and videoconferencing) have significant problems as well, requiring supporting hardware (videoconferencing) and forcing the user to grapple with an awkward, unfamiliar interface (MOO). Additionally, synchronous communication may be difficult to provide when large time differences exist between the patrons and the reference service provider. From the point of view of a digital library developer, the provision of a reference service can be extremely useful for gathering details on the information needs and searching problems of the library's target user group. We are using the insight gained while providing this experimental service to tailor our computing research collection by adding supporting FAQs and search hints, developing a tool to support the communication of successful search patterns and results between the reference service provider and the patron, and augmenting the collection itself with additional documents and document types.
 
 

REFERENCES

Cunningham, Sally Jo and Stuart Dillon. (1997). "Authorship patterns in information systems research," Scientometrics, 39 (1): 19-27.

Cunningham, Sally Jo and David Bocock. (1995). "Obsolescence of computing literature," Scientometrics, 34 (2): 255-261.

Cunningham, Sally Jo and Lynn Silipigni Connaway. (1996). "Information searching preferences and practices of computer science researchers," in Proceedings of the 6th Australian Conference on Computer-Human Interaction (OZCHI '96), Hamilton, New Zealand.

Katz, William. (1983). Introduction to Reference Work. volume II: Reference Services and Reference Processes. New York: McGraw-Hill.

Koyama, Janice, Janice Simmons-Welburn, Schelle Simcox, and Susan Lessick. (1998). "Reference service in a digital age," panel discussion session at the American Library Association Midwinter Conference, Jan. 11, 1998, New Orleans, LA, USA.

Lessick, Susan, Kathryn Kjaer, and Steve Clancy. (1996). "Interactive reference service (IRS) at UC Irvine: Expanding reference service beyond the reference desk," ACRL '97 National Conference, http://www.ala.org/acrl/paperhtm/a10.html

Meyer, Judy. (1993). "Reference services in the virtual library," 9th Texas Conference on Library Automation, April 1993. http://info.lib.uh:70/00/articles/uhlibrary/myers/virtref

Meyer, Judy. (1997). "Serving reference users," http://www.ala.org/editions/cyberlib.net/4jmyer01.html

Pagell, Ruth A. (1996, February). "The virtual reference librarian: Using desktop videoconferencing for distance reference," Electronic Library, 14: 21-26.