A LARGE SEMANTIC NETWORK FOR ASSOCIATIVE SEARCHING

Tamas E. Doszkocs

National Library of Medicine
Bethesda, MD 20894, USA
E-Mail: DOSZKOCS@NLM.NIH.GOV

Keywords: Semantic Network, Network, Searching, Associative Searching, Retrieval, Information Retrieval, AURA, Associative User Retrieval Aid, MEDLINE, CATLINE, Natural Language Retrieval, National Library of Medicine, MeSH, Medical Subject Headings, UMLS, Unified Medical Language System, Database, Information Processing, Associative Information Processing, Hypertext, Hypermedia, Artificial Intelligence, Connectionist Models, Visualization, CITE, OPAC.

Abstract: Topical search failure rates of 25%-50% are a fact of life in contemporary information retrieval systems. The two main reasons for such "zero hit" retrievals are search vocabulary problems, i.e. the user not knowing what search terms to use, and search logic problems, i.e. the user or the system not employing an appropriate search logic against the given database. The latter problem is particularly common in "user friendly" search interfaces, where the system automatically "AND"s together all query terms input by the user, resulting in no retrieval.

This paper discusses prior work and reports on research in progress aimed at creating and utilizing a global semantic associative database, AURA (Associative User Retrieval Aid), to deal with both the semantic and search logic aspects of information retrieval.

AURA is a prototype semantic network of over two million natural language phrases derived from more than a million MEDLINE titles. These natural language phrases are associatively linked to the National Library of Medicine (NLM) MeSH (Medical Subject Headings) and UMLS Metathesaurus (Unified Medical Language System) controlled vocabulary and classification resources.

The associative semantic network allows users to choose appropriate medical subject headings in response to free form natural language query input. Once user selections are made, the system automatically formulates a weighted logic search, or an optimal Boolean search strategy, utilizing class membership information about the selected subject headings.

The prototype system has been implemented to assist in searching NLM's CATLINE (book catalog) and MEDLINE (journal articles) databases.

1. INTRODUCTION

R & D in information retrieval has been characterized by the use of statistical and probabilistic approaches, and to a lesser extent, linguistic and artificial intelligence methods and techniques, in all important aspects of the retrieval process, namely query and document analysis, document classification, query-to-document matching and query and database modification and restructuring (Sparck Jones, 1973; Salton, 1983; Salton, 1989; Larson, 1991). In essence, these approaches involve associative information processing via the identification and explicit use of term-term associations (automatic thesaurus construction), document-document associations (automatic classification), term-document associations (automatic indexing) and query-document associations (automatic retrieval).

From the user's point of view, such systems offer a great deal of flexibility, including free-form natural language query input, ranked display of closest matching items, and automatic query refinement and search strategy modification by the system in response to user feedback concerning the relevance of retrieved items.

By contrast, commercial retrieval systems and services have, until recently, limited themselves to rudimentary automatic keyword indexing (often supplemented by labor intensive human controlled vocabulary indexing), manual construction of machine-readable thesauri and classifications and the manual assignment of such thesaurus and controlled vocabulary terms to records to describe document content. The explicit matching of search queries (formulated as Boolean expressions) to documents and/or document surrogates is typically accomplished via the exact match processing of inverted lists corresponding to the search keys and the matching logic specified in the query. Most operational systems do not perform any automatic associative information processing, beyond the passive use of the inherently associative aspects of the human intellectual effort involved in indexing and classification, and the explicit network of associations between cited and citing references, footnotes and similar traditional referential devices.

2. PRIOR WORK AND RECENT DEVELOPMENTS

2.1. Associative Information Processing

The human record is a reflection of mankind's collective memory and as such, it is a mirror of reality associations.

From the earliest days of computing, visionaries, self-avowed computopians and IR pioneers have explored a wide variety of associative approaches to information processing (Bush, 1945; Doyle, 1961; Stevens, 1965; Stiles, 1961; Lesk, 1969; Nelson, 1970; Sparck Jones, 1971; Salton, 1975; van Rijsbergen, 1977; Doszkocs, 1978).

At their very best, information retrieval systems are hit-and-miss due to the fundamental mismatch between the language of authors, indexers-catalogers and searchers, the wonderful richness, versatility and ambiguity of natural language and the consistent inconsistency of people in their writing, indexing, classification and searching (Doszkocs, 1983; Borgman, 1989; Belkin, 1987; Egan, 1988; Marchionini, 1989; Allen, 1991; Cleverdon, 1991; Saracevic, 1991; Vizine-Goetz, 1991). In looking at the ample evidence provided by the literature in cognitive psychology, ergonomics and information retrieval concerning the fundamental limitations of both short-term and long-term memory in humans, and their limitless capacity for serendipity, imagination and creativity, one can reasonably expect that by providing associative displays of search terms and related documents warranted by the global database, an associative information retrieval system can ease the perennial problems of no retrieval, too little retrieval or too much retrieval from the user's subjective relevancy point of view.

The limitations of associative term co-occurrence data for automatic query expansion in document retrieval systems (Salton, 1986; Peat, 1991) suggest that machine-assisted methods might be more appropriate and productive, as evidenced by research on associative interactive dictionary displays for user selection and human controlled query modification in a very large scale operational retrieval environment (Doszkocs, 1979; Doszkocs, 1983). Such findings are none too surprising, given that humans remain the most intelligent components of our systems, including our expert systems for retrieval, indexing and classification (Alberico, 1989; Bates, 1990; Belkin, 1990; Brooks, 1987; Croft, 1987; Deerwester, 1990; Fox, 1987; Gauch, 1989; Gibb, 1988; Hawkins, 1988; Hjerppe, 1989; Humphrey, 1989; Krulee, 1989; Marcus, 1991; Meadows, 1982; Micco, 1987; Milstead, 1992; Pejtersen, 1987; Pejtersen, 1989; Pollitt, 1987; Richardson, 1989; Roysdon, 1989; Sanchez, 1987; Sharif, 1988; Smith, 1987; Sparck Jones, 1988; Vickery, 1987).

Associative information processing techniques have become an integral part of new and complementary information technologies, such as hypertext and hypermedia, a variety of artificial intelligence applications (e.g. expert systems and connectionist models) and scientific information visualization.

2.2. Hypertext and Hypermedia Retrieval

Much as indexing and document clustering result in an implicit network of related records and inherently facilitate nonlinear retrieval, hypertext and hypermedia (Nelson, 1981; Nielsen, 1990; Chen, 1991) support a network of explicit linkages among information objects and intrinsically facilitate nonlinear associative browsing (Egan, 1989; Agosti, 1991; Lesk, 1991). The explicit linkages in hypertext and hypermedia systems are typically created via human intellectual assignment, be it the classical cited and citing references in bibliographic records or logical pointers to images, data tables or text elsewhere in the database. The automatic generation of "hyper-paths" in databases using a neural network algorithm has been proposed (Lelu, 1991).

2.3. Artificial Intelligence

AI knowledge representation and knowledge processing have had a profound influence on the development of "intelligent information retrieval systems" (Croft, 1987; Fox, 1987; Smith, 1987; Turtle, 1990). It is important to recognize that the (by and large handcrafted) AI knowledge structures in essence represent rich associative relationships among the data, and the various AI processing methods, whether rule-based and procedural in nature, or ad hoc and intuitive, have a fundamental associative characteristic (Partridge, 1990). This observation also applies to practical natural language processing (Doszkocs, 1986), and particularly to the variety of applied "linguistic engineering" tools and resources, such as spell-checkers, computer dictionaries and thesauri, that are increasingly incorporated as integral components of information retrieval systems.

2.4. Connectionist Models

Connectionist models (artificial neural networks, spreading activation models, associative maps, parallel distributed processing) represent information as a network of weighted, interconnected nodes. In contrast to more conventional information processing methods, connectionist systems are "self processing", in that no external algorithm operates on the data in the network. The network literally processes itself, with "intelligent behavior" emerging from the local interactions that occur concurrently between the numerous network components (Reggia, 1988). Such models can explicitly and dynamically capture the rich variety of implicit associations in a database. Connectionist models in information retrieval were surveyed by Doszkocs (1990). Due to their inherent abstraction, generalization and adaptive learning capabilities, such systems promise to be very useful in emulating intuitive, human-like associative information processing. Nash (1989) implemented an expert system for thesaurus management using neural network principles; Biennier (1990) employed a connectionist method to retrieve information in hyperdocuments; Sutcliffe (1991) described the use of vector space models in conjunction with connectionist ideas and natural language understanding; Kwok (1991) reported on associative query modification using an adaptive artificial neural network architecture; Wilkinson (1991) explored the use of the cosine similarity measure in a document retrieval system based on a neural network model; Lin (1991) implemented a self-organizing semantic association map using Kohonen's feature map; Waltz (1991) discussed parallel distributed processing R & D at Thinking Machines Corporation; Doszkocs (1991) assessed the potential of neural networks in libraries; Okaya (1991) presented the use of a neural network for terminology associations; and Lewis (1991) summarized research on adaptive intelligent information retrieval.

2.5. User Interfaces and Information Visualization

In topical searching, natural language is the least and the most common denominator among users and, in a very real sense, a few words are worth a thousand pictures in terms of user query input. At the same time, a picture is worth a thousand words on the system output side, when it comes to conveying information to the user about associative database content and relationships, and pathways to follow in vast and growing information spaces. Information retrieval systems today offer a dazzling variety of user interfaces, including natural language, command language, direct manipulation WIMP (windows, icons, menus, pointers), graphical, visual and exotic interfaces to suit a wide variety of problem and task domains, different types of databases and idiosyncratic users. Hybrid approaches include natural language query input, graphical user interfaces to restricted Boolean query formulation based on generalization/aggregation hierarchies, filter/flow metaphors for complete Boolean expressions, dynamic query methods with continual visual representation and integrated navigational visual network structures for browsing associative query, document content and term relations (Fowler, 1991; Henry, 1991; Newby, 1991; Schneiderman, 1991).

2.6. The State-of-the-Art in Information Retrieval Systems

During the 1960's and 70's, commercial search systems essentially ignored the achievements and recommendations of the IR research community. The usual reasons given were the small scale and lack of generalizability of experimental R & D systems. By the late 1980's, however, many of the established operational systems began to include capabilities pioneered by the research community.

We would like to illustrate the gradual incorporation of IR research into pragmatic applied development by examples from the National Library of Medicine:

The first large scale implementation of dynamic online associative index term displays, the Associative Interactive Dictionary, was implemented for the MEDLINE, TOXLINE and CHEMLINE databases at NLM in 1978 (Doszkocs, 1978).

The following year saw the first operational implementation of CITE (Computerized Information Transfer in English), an inverted file based system with natural language query input, lexical-morphological and partial syntactic query analysis, limited AI knowledge structures and retrieval heuristics, closest match, weighted logic search strategy, ranked output and dynamic associative query and search strategy modification based on user feedback and relevance judgements in searching the MEDLINE medical literature database (Doszkocs, 1979). It is important to point out that CITE used the same pre-existing inverted file data structures utilized by ELHILL, NLM's conventional, inverted file, Boolean search system.

A flexible CITE Online Public Access Catalog was made available for public use in the NLM Reading Room in 1983 (Doszkocs, 1983).

IRX, a state-of-the-art information retrieval system for experimentation was subsequently designed and implemented by another group of researchers at NLM (Harman, 1988).

The UMLS (Unified Medical Language System) Knowledge Sources, Experimental Edition, on CD-ROM, with Metathesaurus and Semantic Network, was released by NLM in September 1990 (Lindberg, 1990). The Metathesaurus (Meta-1) contains 28,816 core concepts and related concepts, and 35,307 supplementary chemicals, for a total of 64,123 entries. Domain experts reviewed and enhanced the core and related concepts. The Meta-1 entries contain 208,559 lexically unique terms that include approximately 16,000 Medical Subject Headings from 1990 MeSH and also 1990 MeSH Supplementary Chemicals and terms from other major biomedical vocabulary sources. Organized as a relational database (and also as a HyperCard stack for the Macintosh version), Meta-1 contains fields for the main concept and related information, synonyms, lexical variants, syntactic categories, semantic types and a variety of other data elements. Importantly, Meta-1 includes approximately 4.3 million unique pairs of medical subject heading co-occurrences from approximately 3 million MEDLINE journal citations. These co-occurrence records represent MeSH heading pairs that were assigned to the same citation. As such, these subject heading co-occurrences form a static network of topical associations that can potentially serve as an invaluable semantic user aid in formulating search strategies and either broadening or narrowing unsatisfactory searches.
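As a minimal sketch of how such a co-occurrence network could be tabulated, the following counts unordered pairs of headings assigned to the same citation and lists the headings associated with a given heading. The function names and the toy citations are illustrative assumptions, not part of Meta-1 or its actual data format.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(citations):
    """Count unordered pairs of headings assigned to the same citation."""
    counts = Counter()
    for headings in citations:
        # Sort and de-duplicate so that (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(headings)), 2):
            counts[pair] += 1
    return counts


def associated_headings(counts, heading):
    """Return (heading, count) pairs co-occurring with `heading`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == heading:
            related[b] += n
        elif b == heading:
            related[a] += n
    return related.most_common()


# Toy citations; these heading assignments are invented for illustration.
citations = [
    ["Pedophilia", "Child Abuse, Sexual", "Sex Offenses"],
    ["Pedophilia", "Child Abuse, Sexual"],
    ["Incest", "Child Abuse, Sexual"],
]
counts = cooccurrence_counts(citations)
print(associated_headings(counts, "Pedophilia"))
```

A table of this kind is static: it is computed once over the citation file and then consulted at search time to suggest broader or narrower related headings.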

The National Center for Biotechnology Information (NCBI) at the NLM released its experimental Entrez: Sequences CD-ROM database in 1991 (NCBI, 1991). Entrez: Sequences is a retrieval system developed at the NCBI which provides an integrated approach for gaining access to nucleotide and protein sequence databases (GenBank, PIR and GenInfo) and to the approximately 86,000 MEDLINE citations in which the sequences were published. A key feature of the system is the concept of 'neighboring', which permits a user to locate related references or sequences by asking Entrez to 'Find all papers that are like this one' or 'Find all sequences that are like this sequence.' In effect, for each of these records a list of its nearest neighbors has been computed by treating the record as a query against the database using a cosine coefficient vector retrieval method (NCBI, 1991). The relevant vectors are based on key terms coming from the titles, abstracts and MeSH headings associated with documents and weighted in a manner to take advantage of the relative importance of the different kinds of terms used. The top twenty documents retrieved become the neighborhood list for that document, and this information is also stored on the CD-ROM. While not all neighbors of a document found this way are guaranteed to be relevant, in initial testing, the first neighbor found is relevant to the query about 80% of the time. Figure 1 shows an example of the nearest neighbors list from Entrez MEDLINE.
 
 


 
 

Figure 1. NLM NCBI MEDLINE Hypertext Links Among Preclustered Closest Matching Items
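The neighboring computation described above can be sketched roughly as follows: each record becomes a weighted term vector, and its neighbors are the other records ranked by cosine coefficient. The field weights, function names and toy records are assumptions made for illustration; the actual Entrez weighting scheme is not given here.

```python
import math

# Assumed relative importance of term sources; the real weights are not stated.
FIELD_WEIGHTS = {"mesh": 3.0, "title": 2.0, "abstract": 1.0}


def build_vector(record):
    """Turn a record ({field: [terms]}) into a sparse weighted term vector."""
    vec = {}
    for field, terms in record.items():
        for term in terms:
            vec[term] = vec.get(term, 0.0) + FIELD_WEIGHTS[field]
    return vec


def cosine(u, v):
    """Cosine coefficient between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbors(doc_id, vectors, k=20):
    """Treat one record as the query and return its k closest other records."""
    query = vectors[doc_id]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != doc_id]
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Toy records, invented for illustration.
records = {
    "d1": {"title": ["protein", "sequence"], "mesh": ["Amino Acid Sequence"]},
    "d2": {"title": ["protein", "folding"], "mesh": ["Amino Acid Sequence"]},
    "d3": {"title": ["library", "catalog"], "mesh": ["Cataloging"]},
}
vectors = {doc_id: build_vector(rec) for doc_id, rec in records.items()}
print(neighbors("d1", vectors))
```

Because the neighbor lists depend only on the database contents, they can be computed in advance and stored with the records, which is what makes them usable as static hypertext links.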

By the late 1980's, the unprecedented explosion in the mass utilization of personal computers and workstations, office automation, word processing, desktop and CD-ROM publishing, database management, E-mail, local area and wide area networking and public and personal information utilities had made information retrieval one of the "hottest" areas of software development and one of the best selling product categories in the marketplace.

The latest generation of information retrieval systems has overcome most of the problem-of-scale limitations of early experimental IR, and representative systems, such as

DOW-QUEST (on the Connection Machine),

ConQuest,

EXCALIBUR/SAVVY,

GESCAN,

INTELEQ/INFOTRACT,

Knowledge Finder,

LMDS/LOGICON,

PL/Personal Librarian,

SEARCH EXPRESS,

TRW/FAST DATA FINDER,

VERITY/TOPIC,

WAIS (The Wide Area Information Server of Thinking Machines Corporation)

and many others, incorporate most of the desirable features long ago advanced and championed by the IR research community (Levine, 1991). Figure 2 summarizes the features of the most advanced operational systems:

• Hybrid Interfaces (NL, WIMP, GUI)

(natural language, windows-icons-menus-pointers, graphical user interfaces)

• Linguistic Engineering

• Knowledge Structures

• Boolean, Fuzzy and Pattern Logic

• Ranked Output in Context

• Hypertext and Hypermedia

• Associative Query Modification

• Support for Standards

• Client Server Architecture

• Parallel Distributed Processing "Non von Neumann," "Connectionist," "Neural Network" search engines

Figure 2. Features of the Most Advanced IR Systems

3. A PRAGMATIC ASSOCIATIVE RETRIEVAL SYSTEM

During the current year, we undertook the prototype development of a large-scale, associative retrieval system at NLM, based on the following assumptions:

• Large textual databases can be regarded as inherently associative knowledge bases.

• The implicit associations in such databases can be made explicit by the use of well established automatic text processing methods, including pragmatic linguistic, statistical-probabilistic, knowledge-based, heuristic and neural network indexing approaches as applied to individual records, as well as globally to subsets of the database and the entire database itself (Humphrey, 1989; Croft, 1991; Fuhr, 1991; Hersh, 1991; Rau, 1991; Warner, 1991). Commercial products, such as AIDA, are now available that combine statistical, linguistic and heuristic techniques to perform intelligent text analysis and indexing (Jones, 1991).

• Where available, controlled vocabularies (e.g. MeSH), classification systems (e.g. NLM Classification) and semantic knowledge bases (e.g. NLM UMLS METATHESAURUS), and their associative linkages with the subject databases (e.g. META-1 MeSH co-occurrences in MEDLINE), can be utilized as powerful additional learning and training resources in machine-assisted indexing, classification and searching.

• Search queries and incoming textual records, for instance new citations for MEDLINE or books to be cataloged and added to CATLINE, can be regarded as queries and can be subjected to the same automatic text analysis procedures as the database records themselves (NCBI, 1991).

• Existing inverted file data structures, e.g. the many variants of B-trees used in a wide variety of operational retrieval systems, can support efficient weighted logic, closest match search strategies and ranked output of records, index terms and classification codes most similar to the query (Noreault, 1977; Doszkocs, 1980; Motzkin, 1991). Both static and dynamic associative query and database processing can be supported in such a system.

• Machine-assisted, rather than fully automatic procedures enable the system to maximally utilize the world knowledge and intelligence of human beings in recognizing and choosing appropriate items of interest that are "intelligently" filtered and suggested to them by the system. Such an approach is expected to synergistically combine the best abilities of the user with the prodigious associative memory and processing power of the system, and promote both consistency and cost effectiveness in indexing, classification and searching in a unified manner.
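The assumption that ordinary inverted files can support weighted logic, closest match searching with ranked output can be illustrated with a small sketch. A plain dictionary of posting sets stands in for the B-tree structures mentioned above, and the query term weights are hypothetical inputs; how an operational system would derive them is not specified here. A strict Boolean AND over the same index is included for contrast.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, terms in records.items():
        for term in terms:
            index[term].add(rec_id)
    return index


def boolean_and(index, terms):
    """Exact match: records containing every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def ranked_search(index, query_weights):
    """Closest match: score each record by the summed weights of matched terms."""
    scores = defaultdict(float)
    for term, weight in query_weights.items():
        for rec_id in index.get(term, ()):
            scores[rec_id] += weight
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy records and weights, invented for illustration.
records = {
    "r1": ["child", "abuse"],
    "r2": ["abuse", "sexual"],
    "r3": ["pedophilia"],
}
index = build_inverted_index(records)
query = {"molestation": 2.0, "abuse": 1.0, "pedophilia": 1.5}

print(boolean_and(index, list(query)))   # no record contains all three terms
print(ranked_search(index, query))       # partial matches are still ranked
```

The same posting lists serve both search modes; only the way they are combined differs, which is why no new file structures are needed to add ranked, closest match retrieval to an existing Boolean system.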

Fig. 3 presents the main components of the system.


 

Figure 3. The Associative Indexing, Classification and Search System

The most important associative data and information resources of the system are the UMLS METATHESAURUS knowledge base and the AURA Phrase Association Database. These resources are indexed as typical inverted file databases.

The AURA phrase association database contains approximately two million natural language phrases derived from one million MEDLINE citations. These phrases occur in at least two journal article titles in the sample database. There are 12,000 topical MeSH (medical subject headings) used in indexing the MEDLINE and CATLINE databases. Each natural language phrase is linked to all the medical subject headings (in the respective citations) which have been assigned by a human indexer as a "central concept" to describe the content of the given journal article or book.
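A rough sketch of how such a phrase association table might be assembled is given below. The phrase extraction rule (lower-cased word n-grams of up to three words, with no stopword filtering) is purely an assumption for illustration, since the method used to derive AURA phrases is not detailed here; only the "at least two titles" threshold and the linkage to central-concept headings come from the description above.

```python
import re
from collections import Counter, defaultdict


def phrases(title, max_len=3):
    """Assumed phrase rule: all word n-grams of 1..max_len words in the title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def build_phrase_associations(citations, min_titles=2):
    """Link each phrase found in at least `min_titles` titles to the
    central-concept headings of the citations it came from."""
    title_counts = Counter()
    links = defaultdict(Counter)
    for title, central_headings in citations:
        for phrase in phrases(title):
            title_counts[phrase] += 1
            links[phrase].update(central_headings)
    return {p: links[p] for p, n in title_counts.items() if n >= min_titles}


# Toy citations (title, central-concept headings), invented for illustration.
citations = [
    ("Molestation of children by strangers", ["Child Abuse, Sexual"]),
    ("Family factors in child molestation", ["Child Abuse, Sexual", "Incest"]),
    ("Treatment for paraphilias", ["Paraphilias"]),
]
associations = build_phrase_associations(citations)
print(associations["molestation"].most_common())
```

Looking up a user's free-form phrase in such a table yields candidate subject headings ordered by strength of association, which the user can then accept or reject.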

Figures 4 through 9 illustrate the use of the AURA associative phrase dictionary for computer assisted search modification.

Figure 4 shows a failed CATLINE search using the Grateful Med end user interface.

Figure 4. "Zero-hit" CATLINE search.



Figures 5(a) and 5(b) illustrate medical-subject-heading-to-medical-subject-heading associations (co-occurrences in MEDLINE) from the UMLS METATHESAURUS.
 
 

Figure 5 (a). MeSH headings associated with PEDOPHILIA in the Diseases and Pathologic Processes category (i.e. disease related MeSH Trees)



Figure 5 (b). MeSH headings associated with PEDOPHILIA in the
Behaviors category (i.e. behavior related MeSH Trees)

Fig. 6 shows medical subject headings and their MeSH classification (tree) numbers, strongly associated with the natural language phrase "molestation" in the AURA phrase association database.
 
 

Child Abuse, Sexual     F3.126.842.89.110.80
Incest                  I1.880.735.442
Paraphilias             F3.709.597.608.700
Pedophilia              F3.709.597.608.700.580
Sex Offenses            I1.198.240.748
 

Figure 6. Medical Subject Headings (and their classification [tree] numbers)
associated with the natural language phrase "Molestation"

Fig. 7 illustrates a modified search strategy based on the MeSH headings suggested by the METATHESAURUS and the AURA Phrase Association Database.
 
 

Figure 7. Modified Search Strategy (using automatically suggested
semantic associations and search logic)
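One way class membership information could drive the search logic is sketched below: headings the user selected are grouped by the top-level component of their MeSH tree numbers, OR-ed within a group and AND-ed across groups. This grouping rule is a hypothetical illustration and should not be read as the strategy actually shown in Figure 7; the tree numbers are those listed in Figure 6.

```python
from collections import defaultdict


def boolean_strategy(selected):
    """Build a Boolean expression from {heading: tree number}.

    Assumed rule: OR together headings from the same top-level tree
    (e.g. "F3"), then AND the resulting groups.
    """
    groups = defaultdict(list)
    for heading, tree_number in selected.items():
        groups[tree_number.split(".")[0]].append(heading)
    clauses = []
    for category in sorted(groups):
        terms = " OR ".join(f'"{h}"' for h in sorted(groups[category]))
        clauses.append(f"({terms})")
    return " AND ".join(clauses)


# Headings a user might select, with tree numbers taken from Figure 6.
selected = {
    "Child Abuse, Sexual": "F3.126.842.89.110.80",
    "Incest": "I1.880.735.442",
    "Pedophilia": "F3.709.597.608.700.580",
    "Sex Offenses": "I1.198.240.748",
}
print(boolean_strategy(selected))
```

The design intent is that headings from the same class act as alternatives that broaden the search, while headings from different classes act as facets that narrow it, so the user is spared from writing the Boolean expression by hand.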

Figures 8 and 9 show a successful retrieval.


Figure 8. Successful associative retrieval: from "zero-hits" to 21 relevant items.


Figure 9. Books Retrieved by the Associative Search Strategy on
"Molestation and Pedophilia"

4. CONCLUSION

Research and development in information retrieval has provided the theoretical basis for a variety of associative information processing models and techniques over the past three decades.

The operational feasibility and pragmatic utility of a number of statistical, linguistic and artificial intelligence approaches to fully automatic indexing, classification and searching have been demonstrated in recent years in actual, scaled-up, real-life applications. Nevertheless, many established, highly visible and heavily utilized systems of indispensable value continue to rely on human intellectual effort in indexing, classification and mediated searching of large interdisciplinary textual databases.

As demonstrated above, the proposed associative retrieval approach is expected to capitalize on the considerable intellectual investment represented by the human indexing and classification effort traditionally applied to many existing textual databases, such as book catalogs and bibliographic citation databases. The experimental system presented above promises to facilitate successful information retrieval by its exploitation of the rich network of implicit semantic associations and classificatory information in such databases, and by its adaptation to the user's intelligence, desire for control, sense of serendipity and the holism and creativity of user-system synergy (Rice, 1988).

REFERENCES

Agosti, M.; Colotti, R.; & Gradenigo, G. (1991). "A Two-Level Hypertext Retrieval Model for Legal Data," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al. Chicago, Illinois, 1991. pp. 316-325.

Alberico, R. & Micco, M. (1989). Expert Systems for Reference and Information Retrieval. Westport, CT: Meckler, 1989.

Allen, B. (1991). "Topic Knowledge and Online Catalog Search Formulation," Library Quarterly, 61 (2): 188-213.

Bates, M.J. (1990). "Where Should the Person Stop and the Information Search Interface Start?" Information Processing & Management, 26 (5): 575-591.

Belkin, N.J. & Croft, W.B. (1987). "Retrieval Techniques," In: Annual Review of Information Science and Technology, 22: 109-145.

Belkin, N.J. & Marchetti, P.G. (1990). "Determining the Functionality and Features of an Intelligent Interface to an Information Retrieval System," In: ACM SIGIR 90: 151-177.

Biennier, F.; Pinon, J.M. & Guivarch, M. (1990). "A Connectionist Method to Retrieve Information in Hyperdocuments," In: INNC 90 Paris, International Neural Network Conference, 1: 444-448, Kluwer, Netherlands.

Borgman, C.L. (1989). "All Users of Information retrieval Systems are not Created Equal: an Exploration into Individual Differences," Information Processing and Management, 26: 237-252.

Brooks, H.M. (1987). "Expert Systems and Intelligent Information Retrieval," Information Processing & Management, 23 (4): 367-382.

Bush, V. (1945). "As We May Think," Atlantic Monthly, 176(1): 101-108.

Chen, C. C. (1991). "The Coming of Digital Visual Information Age: Implications for Information Access," In: NIT '91: 4th International Conference on new Information Technology, edited by C.C. Chen. Newton, MA: MicroUse Information, 1991. pp. 5-18.

Cleverdon, C.W. (1991). "The Significance of the Cranfield Tests of Index Languages," In: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 3-12.

Croft, W.B. (1987). "Approaches to Intelligent Information Retrieval," Information Processing & Management, 23 (4): 249-254.

Croft, W.B.; Turtle, H. R. & Lewis, D. D. (1991). "The Use of Phrases and Structured Queries in Information Retrieval," In: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 32-45.

Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K. & Harshman, R. (1990). "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, 41 (6): 391-407.

Doszkocs, T.E. (1978). "AID, An associative Interactive Dictionary for Online Searching," Online Review, 2 (2): 163-173.

Doszkocs, T.E. (1979). "AID, An associative Interactive Dictionary for Online Bibliographic Searching," (Doctoral Dissertation), University of Maryland, 124 pp, 1979. [University Microfilms order: 79-25741]

Doszkocs, T.E. & Rapp, B.A. (1979). "Searching MEDLINE in English: a Prototype User Interface with Natural Language Query, Ranked Output, and Relevance Feedback," In: Proceedings of the ASIS Annual Meeting, 16: 131-139.

Doszkocs, T.E.; Rapp, B.A. & Schoolman, H.M. (1980). "Automated Information Retrieval in Science and Technology," Science, 208: 25-30.

Doszkocs, T.E. (1978). "Implementing an Associative Search Interface in a Large Online Bibliographic Data Base Environment," In: New Trends in Documentation and Information. Proceedings of the 39th FID Congress, Sept. 25-28, 1978, Edinburgh, UK, pp. 295-297, Aslib, London, 1980.

Doszkocs, T.E. (1983). "CITE NLM: Natural-Language Searching in an Online Catalog," Information Technology and Libraries, 2 (4): 364-380.

Doszkocs, T.E. (1986). "Natural Language Processing in Information Retrieval," Journal of the American Society for Information Science, 37 (4): 191-196.

Doszkocs, T.E.; Reggia, J. & Lin, X. (1990). "Connectionist Models and Information Retrieval," In: Annual Review of Information Science and Technology (ARIST), Volume 25, 209-260.

Doszkocs, T.E. (1991). "Neural Networks in Libraries: the Potential of a New Information Technology," In: NIT '91: 4th International Conference on new Information Technology, edited by C.C. Chen. Newton, MA: MicroUse Information, 1991. pp. 27-33.

Doyle, L. B. (1961). "Semantic Road Maps for Literature Searchers," Journal of ACM, 8, 553-578.

Egan, D.E. (1988). "Individual Differences in Human-Computer Interaction," In: M. Helander (ed), Handbook of Computer-Human Interaction, The Netherlands, Elsevier, pp. 543-568.

Egan, D.E.; Remde, J.R.; Gomez, L.M.; Landauer, T.K.; Eberhardt, J. & Lochbaum, C.C. (1989). "Formative Design Evaluation of SuperBook," ACM Transactions on Information Systems, 7: 30-57.

Fowler, R.H.; Fowler, W.A.L. & Wilson, B. A. (1991). "Integrating Query, Thesaurus, and Documents through a Common Visual Representation," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 142-151.

Fox, E.A. (1987). "Development of the CODER System: A testbed for Artificial Intelligence methods in Information Retrieval," Information Processing and Management, 23 (4): 341-366.

Fuhr, N. & Pfeifer, U. (1991). "Combining Model-Oriented and Description-Oriented Approaches for Probabilistic Indexing," In: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 46-56.

Gauch, S. & Smith, J.B. (1989). "An Expert System for Searching in Full-text," Information Processing & Management, 25 (3): 253-263.

Gibb, F. & Sharif, C. (1988). "CATALYST: An Expert Assistant for Cataloging," Program, 22: 62-71.

Harman, D.; Benson, D.; Fitzpatrick, L.; Huntzinger, R. & Goldstein, C. (1988). "IRX: An Information Retrieval System for Experimentation and User Applications," Proceedings of RIAO 88 Conference on User-Oriented Content-Based Text and Image Handling. pp. 840-848.

Hawkins, D.T. (1988). "Applications of Artificial Intelligence (AI) and Expert Systems for Online Searching," Online, 12 (1): 31-43.

Henry, H.K. (1991). "Human-Computer Interfaces and OPACS: Introductory Thoughts Related to INNOPAC," Library Hi Tech, 9 (2): 63-68.

Hersh, W.R. & Hickam, D. H. (1991). "A Comparative Analysis of Retrieval Effectiveness for Three Methods of Indexing AIDS-Related Abstracts," In: Proceedings of the ASIS Annual Meeting, 28: 211-217.

Hjerppe, R. & Olander, B. (1989). "Cataloging and Expert Systems: AACR2 as a Knowledge Base," Journal of the American Society for Information Science, 40 (1): 27-44.

Humphrey, S.M. (1989). "MedInDex System: Medical Indexing Expert System," Information Processing & Management, 25 (1): 73-88.

Jones, R. L. (1992?). "Automatic Document Content Analysis - the AIDA Project," to be published in Library Hi Tech.

Kwok, K.L. (1991). "Query Modification and Expansion in a Network with Adaptive Architecture," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 192-199.

Krulee, G.K. & Vrenios, A. (1989). "An Expert System Model of a Reference Librarian," Library Software Review, 8: 13-15.

Larson, R. (1991). "Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog," Library Quarterly, 61 (2): 133-173.

Lelu, A. (1991). "Automatic Generation of 'Hyper-Paths' in Information Retrieval Systems: A Stochastic and an Incremental Algorithms," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 326-335.

Lesk, M.E. (1969). "Word-Word Associations in Document Retrieval Systems," American Documentation, 20: 27-38.

Lesk, M. (1991). "The CORE Electronic Chemistry Library," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 93-101.

Levine, E. (moderator). (1991). "Full Text: from Tutorial to Innovations," In: Proceedings of the ASIS Annual Meeting, 28: 363-376 (misc. papers and abstracts).

Lewis, D.D. (1991). "Learning in Intelligent Information Retrieval," Machine Learning, Proceedings of the 8th International Workshop, pp. 235-239. Santa Monica, CA: Morgan Kaufmann.

Lin, X. (1991). "A Self-Organizing Semantic Map for Information Retrieval," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 262-269.

Lindberg, D. & Humphreys, B. (1990). "The UMLS Knowledge Sources: Tools for Building User Interfaces," Proceedings of the Fourteenth Annual Symposium on Computer Applications in Medical Care, 14: 121-125.

Marchionini, G. (1989). "Information Seeking Strategies of Novices in a Full Text Electronic Encyclopedia," Journal of the American Society for Information Science, 29 (3): 165-176.

Marcus, R. S. (1991). "Computer and Human Understanding in Intelligent Retrieval Assistance," Proceedings of the ASIS Annual Meeting, 28: 349-359.

Meadow, C.T.; Hewett, T.T. & Aversa, E.S. (1982). "A Computer Intermediary for Interactive Database Searching: 1. Design," Journal of the American Society for Information Science, 33 (5): 325-332.

Meadow, C.T.; Hewett, T.T. & Aversa, E.S. (1982). "A Computer Intermediary for Interactive Database Searching: 2. Evaluation," Journal of the American Society for Information Science, 33 (6): 357-364.

Micco, H.M. & Smith, I. (1987). "Expert Systems in Libraries: Do They Have a Place?," Library Software Review, 6: 125-128.

Milstead, J.L. (1992). "Methodologies for Subject Analysis in Bibliographic Databases," Information Processing & Management, 28 (3): 407-431.

Motzkin, D. (1991). "An Efficient Directory System for Document Retrieval," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 291-304.

Nash, Allen & Walkington, Toni. (1989). "Records Management Using Neural Networks," In: AI in Industry & Government International Conference, November 23-25, 1989, Hyderabad, India. pp. 575-585.

NCBI (National Center for Biotechnology Information), National Library of Medicine, Entrez: Sequences, PIR, GenBank, MEDLINE, Compact Disc, October 1991, Pre-release 4 and Entrez User's Guide.

Nelson, T. (1970). "No More Teacher's Dirty Looks," Computer Decisions, pp. 16-23.

Nelson, T. (1981). Literary Machines. Swarthmore, PA. ISBN: 0-89347-052-X.

Newby, G.B. (1991). "Navigation: A Fundamental Concept for Information Systems with Implications for Information Retrieval," In: Proceedings of the ASIS Annual Meeting, 28: 111-117.

Nielsen, J. (1990). Hypertext and Hypermedia. San Diego: Academic Press.

Noreault, T.; Koll, M. & McGill, M.J. (1977). "Automatic Ranked Output from Boolean Searches in SIRE," Journal of the ASIS, 28 (6): 333-339.

Okaya, Y. (1991). "Terminology and Neural Network System: the Experience of the BRAIN and NLRI Project," In: NIT '91: 4th International Conference on New Information Technology, edited by C.C. Chen. Newton, MA: MicroUse Information. pp. 133-143.

Partridge, D. & Wilks, Y., eds. (1990). The Foundations of Artificial Intelligence. Cambridge: Cambridge University Press.

Peat, H. J. & Willett, P. (1991). "The Limitations of Term Co-Occurrence Data for Query Expansion in Document Retrieval Systems," Journal of the American Society for Information Science, 42 (5): 378-383.

Pejtersen, A.M.; Olsen, S.E. & Zunde, P. (1987). "The Term Association Thesaurus: an Information Processing Aid Based on Associative Semantics," In: Empirical Foundations of Information and Software Science III. Proceedings of the Third Symposium. New York: Plenum. pp. 175-186.

Pejtersen, A.M. (1989). "A Library System for Information Retrieval Based on a Cognitive Task Analysis and Supported by an Icon-Based Interface," Proceedings of the Twelfth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by N.J. Belkin & C.J. van Rijsbergen, SIGIR Forum 23 (1): 40-47.

Pollitt, A.S. (1987). "CANSEARCH: An Expert Systems Approach to Document Retrieval," Information Processing & Management, 23 (2): 119-138.

Rau, L.F. & Jacobs, P.S. (1991). "Creating Segmented Databases from Free Text for Text Retrieval," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 337-346.

Reggia, J. A. (1988). "Self-Processing Networks and Their Biomedical Implications," Proceedings of the IEEE, 76: 680-692.

Rice, J. (February 15, 1988). "Serendipity and Holism: the Beauty of OPACs," Library Journal, pp. 138-140.

Richardson, J. (1989). "Toward an Expert System in Reference Service: A Research Agenda for the 1990s," College and Research Libraries, 50 (2): 231-248.

Roysdon, C.; White, H.D., eds. (1989). Expert Systems in Reference Service. New York: Haworth Press.

Salton, G. (1975). A Theory of Indexing. Philadelphia: Society for Industrial and Applied Mathematics.

Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. New York: McGraw-Hill.

Salton, G. (1986). "On the Use of Term Associations in Automatic Information Retrieval," In: 11th International Conference on Computational Linguistics. Proceedings of COLING '86. Bonn, Germany: AKS. pp. 380-386.

Salton, G. (1989). Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Reading, Massachusetts: Addison-Wesley. 530 p.

Sanchez, E. & Zadeh, L.A., eds. (1987). Approximate Reasoning in Intelligent Systems, Decision, and Control. Oxford: Pergamon Press.

Saracevic, T. (1991). "Individual Differences in Organizing, Searching and Retrieving Information," In: Proceedings of the ASIS Annual Meeting, 28: 82-86.

Shneiderman, B. (1991). "Visual User Interfaces for Information Exploration," In: Proceedings of the ASIS Annual Meeting, 28: 379-384.

Sharif, C.A.Y. (1988). "Developing an Expert System for Classification of Books Using Micro-based Expert System Shells," British Library Research Report 32, London: British Library Research and Development Department.

Smith, L.C. (1987). "Artificial Intelligence in Information Retrieval," Annual Review of Information Science and Technology, 22: 41-77.

Sparck Jones, K. (1971). Automatic Keyword Classification for Information Retrieval. London: Butterworths.

Sparck Jones, K. & Kay, M. (1973). Linguistics and Information Science. New York: Academic Press.

Sparck Jones, K. (1988). "Intelligent Interfaces for Information Retrieval Systems: Architecture Problems in the Construction of Expert Systems for Document Retrieval," In: Future Trends in Information Science and Technology, pp. 47-73, P.A. Yates, ed., Los Angeles: Taylor Graham.

Stevens, M.E.; Giuliano, V.E. & Heilprin, L.B. (1965). Statistical Association Methods for Mechanized Documentation. Washington, DC: National Bureau of Standards (Occasional Publication no. 269).

Stiles, H. E. (1961). "The Association Factor in Information Retrieval," Journal of the Association for Computing Machinery, 8 (2): 271-279.

Sutcliffe, R.F.E. (1991). "Distributed Representations in a Text Based Information Retrieval System: a New Way of Using the Vector Space Model," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 123-132.

Turtle, H. (1990). Inference Networks for Document Retrieval, Ph.D. Thesis. Computer and Information Science Department, University of Massachusetts at Amherst.

van Rijsbergen, C.J. (1977). "A Theoretical Basis for the Use of Co-Occurrence Data in Information Retrieval," Journal of Documentation, 33: 103-119.

Vickery, A.; Brooks, H.; Robinson, B. & Vickery, B.C. (1987). "A Reference and Referral System Using Expert Systems Techniques," Journal of Documentation, 43 (1): 1-23.

Vizine-Goetz, D. & Drabenstott, K.M. (1991). "Computer and Manual Analysis of Subject Terms Entered by Online Catalog Users," In: Proceedings of the ASIS Annual Meeting, 28: 156-161.

Waltz, D. L. (1991). "IR Research and Development at Thinking Machines Corporation," In: Proceedings of the ASIS Annual Meeting, 28: 371-372.

Warner, A.J. & Wenzel, P.H. (1991). "A Linguistic Analysis and Categorization of Nominal Expressions," In: Proceedings of the ASIS Annual Meeting, 28: 186-191.

Wilkinson, R. & Hingston, P. (1991). "Using the Cosine Measure in a Neural Network for Document Retrieval," Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, edited by A. Bookstein et al, Chicago, IL. pp. 202-210.