INDEXING WITH MICROCOMPUTERS:
PAST, PRESENT AND FUTURE

INDIZACION MEDIANTE MICROCOMPUTADORA:
Pasado, Presente y Futuro

Ana D. Cleveland and Donald B. Cleveland

School of Library and Information Sciences
University of North Texas
Denton, TX 76203, USA


 
 
  Keywords: Indexing, Microcomputers, Indexing Software, Hypertext, Expert Systems.

Abstract: Indexing has been foremost from the beginning of computer applications to the problems of information retrieval. Although meaningful progress has been made in automatic indexing, the major success lies in computer-aided indexing. This approach provides an economic balance between unskilled labor, skilled labor and a computer. At first, indexers used word processors, database management programs and general programming languages to index with microcomputers. Dedicated indexing software became available in the early 1980s. Today there are two converging technological streams important to indexing: 1) expert systems and 2) hypertext. The new technology offers a vast potential for indexing software in the 21st century.

Resumen: Desde el inicio del uso de computadoras para la recuperación de información, la indización ha sido la aplicación más solicitada. Se han obtenido adelantos prometedores en el área de la indización automática; sin embargo, los mayores logros se han alcanzado en la indización asistida por computadora. Este enfoque provee un balance económico entre la mano de obra no diestra, la diestra y la computadora. Al comienzo, los indizadores usaban procesadores de palabras, programas para el manejo de bases de datos y lenguajes de programación generales para indizar mediante microcomputadoras. Al comienzo de la década de 1980 aparecieron programas dedicados a la indización. Actualmente hay dos vertientes tecnológicas convergentes importantes para la indización: 1) sistemas expertos y 2) hipertexto.
 

 
1. IN THE BEGINNING

Indexing has drawn attention from the very beginning of computer applications to the problems of librarianship, documentation and information science. When scholars write a definitive history of information retrieval, most likely it will be a chronicle of computers and the problems of indexing.

The reason for using a computer in indexing is simple: it minimizes the tedious, time-consuming and error-prone task of indexing. However, exactly which tasks should be automated has not always been clear. An early problem was the lack of defined goals in the use of computers in indexing. Applications ranged from a simple use of the computer to compile and print human-assigned terms all the way to the compilation and printing of terms that were entirely assigned by the computer.

Should the computer simply automate the procedures of the expert indexer or should the computer find innovative ways to do the job? Should a computer be assigned the total task of indexing (that is, "automatic indexing") or should it be used as a supportive tool to a human indexer (that is, "computer-assisted indexing")? These were some of the early questions posed.

The use of computers for indexing went through several stages, and these stages are now beginning to repeat themselves. At first information retrieval practitioners tried to use the computer to perform intellectual tasks, such as selecting effective index terms from text, without much success. Throughout the fifties, sixties and seventies, attempts at automatic indexing passed through a parade of schemes, some clever and innovative, others naive and unworkable.

The first successful applications involved the formatting and printing of indexes. Computers have been used as printing aids for producing manual indexes for several decades.

The next stage was the development of computer-assisted indexing. At this stage the computer took over more and more of the mundane tasks of indexing, such as the manipulation of tedious lists in string indexing. Computer-assisted indexing uses the computer to do the clerical work while a human performs the intellectual task of indexing. Indexing has both an intellectual aspect and a mechanical aspect, and computer-assisted indexing has attempted to find a practical ground between the two. This approach has provided a balance between unskilled labor, skilled labor and a computer, and it is economically feasible.

Today the rapidly evolving technology has indexers looking once again with great interest at new opportunities to use the computer as an aid to the intellectual aspects of indexing. Thus the cycle is complete because this is where we began.

Microcomputers have introduced a new approach to the application of computers to indexing tasks. As soon as micros became available indexers put them to work on indexing problems. The earlier methods included the use of word processors or database management programs, or programs written in a general programming language. Indexers often rewrote mainframe computer programs and meshed them with word-processing programs. The general approach in these methods was to use word-processing programs for data input and editing and then to add routines for sorting, paging, formatting and printing.

Dedicated indexing software for microcomputers entered the marketplace in the early 1980s. Year by year the quality of these programs has improved, and some of the current ones are very useful. Two types of indexing programs evolved in the last decade: the first operated on files created by word processors, while the second worked like digitized 3 x 5 cards. In the first case, the indexing programs "marked up" the word-processed files and then printed the resultant formatted files. In the second case, the program replicated the antiquated manipulation of 3 x 5 cards, except that the sorting and printing were done by a computer.
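The first, mark-up style of program can be sketched in a few lines. The sketch below assumes a hypothetical embedded-tag syntax (the {XE "..."} tags are purely illustrative, not the convention of any particular product): the program scans word-processed page texts for tagged terms and compiles a sorted index with merged page references.

```python
import re
from collections import defaultdict

def compile_index(pages):
    """Collect {XE "term"} tags from a list of page texts and
    build an alphabetized term -> page-number index."""
    entries = defaultdict(list)
    for page_no, text in enumerate(pages, start=1):
        for term in re.findall(r'\{XE "(.*?)"\}', text):
            if page_no not in entries[term]:  # merge duplicate references
                entries[term].append(page_no)
    # Case-insensitive alphabetical order for the printed index
    return {t: entries[t] for t in sorted(entries, key=str.lower)}

pages = [
    'Indexing began early. {XE "indexing"}',
    'Microcomputers arrived. {XE "microcomputers"} {XE "indexing"}',
]
print(compile_index(pages))
# {'indexing': [1, 2], 'microcomputers': [2]}
```

The second, card-style program would instead store each entry as a discrete record, but the sorting and merging steps are essentially the same.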

But the immediate future offers much more than just refined and debugged traditional software packages. Indexing is gaining respectability, and most indexers already understand that computers are vital tools and are visionary about the future.

2. THE PRESENT

The 1980s can be characterized as the age of pedantic indexing with microcomputers; indexers fussily insisted that the machine follow conventional procedures. One paper published in England blasted an indexing software package because it violated the standards of the Indexing Society for book indexes by doing such unforgivable things as putting a comma between the entry term and the first page number.

However, some pedantic measures are necessary in indexing, and during the 1980s practical people, like Linda K. Fetters (Fetters, 1987), proposed guidelines for evaluating microcomputer indexing software. These guidelines are listed here because they summarize the 1980s approach to indexing software and certainly serve as useful points for evaluating indexing software. Her guidelines are:

• Formatting

- Automatic formatting of the final index in a commonly recognized style, i.e., run-in or indented.

- Automatic creation of an acceptable number of subentries (3-7).

- Suppression of repeated main entries.

- Automatic combining of page references for identical entries.

• Entering and editing entries

- A reasonable length for each entry.

- Easy recall of previously entered data and on-screen editing.

- Method for displaying and printing entries at any point in the indexing process.

- Method for storing or copying previously used headings or subheadings.

- Capability of storing the final index as a word processable disk file while preserving the original records for future use.

• Sorting

- A sort order that treats upper and lower case letters the same.

- Method of marking characters or words that are not a part of the sort order, such as articles and prepositions.

- Capability to sort letter-by-letter or word-by-word.

- Capability to sort by main entry and each level of subentries.

• Printing effects

- Provision for underlining, bolding, subscripts and superscripts.

- Provision for changing or inserting the codes for printing or typesetting as needed for each publisher.

• Cumulation or merging of indexes

- Capability to handle enough entries for large or multivolume projects.

- Capability of cumulating or merging separately created indexes into one large index.
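The two sort orders named in the guidelines, letter-by-letter and word-by-word, can be made concrete with a short sketch. This is a minimal illustration under the assumption of plain ASCII entries, not a full implementation of either published standard: letter-by-letter ignores spaces, while word-by-word compares one word at a time, so a space sorts before any letter.

```python
def letter_by_letter_key(entry):
    # Drop spaces and case, then compare character by character
    return entry.replace(" ", "").lower()

def word_by_word_key(entry):
    # Compare word by word; a shorter word list sorts first on a tie
    return entry.lower().split()

entries = ["Newark", "New York", "New Mexico"]

print(sorted(entries, key=letter_by_letter_key))
# ['Newark', 'New Mexico', 'New York']   ("newark" < "newmexico" < "newyork")

print(sorted(entries, key=word_by_word_key))
# ['New Mexico', 'New York', 'Newark']   ("new" sorts before "newark")
```

The two orders disagree on exactly the cases that make this guideline worth checking in any indexing package.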

After reading these guidelines, the nature of current indexing software should be clear. The guidelines also illustrate what indexers expect from the software.

The 1980s have gone and a new decade of opportunity has begun. Current writers on indexing software usually point out that microcomputer-generated indexes use techniques "similar to card-based procedures." This means that most of the software has simply automated the infamous 3 x 5 card. These writers are correct in saying that the basic mechanical procedures have been transferred to a computer screen. But if that is all we want from software, then our outlook is terribly limited. There is very little difference between tapping a keyboard and quickly jotting notes on cards, which can be alphabetized in fifteen or twenty minutes. In fact, pencils probably cost less than the electricity to run a computer workstation.

But, there is another side to the coin. Computers and people may be better than people alone. Don R. Swanson (Swanson, 1960) concluded that "even though machines may never enjoy more than partial success in library indexing, a small suspicion might justifiably be entertained that people are even less promising." Perhaps the answer lies in natural, creative intelligence being complemented by computing machines.

3. THE NEW PROMISES

The newly evolving microcomputer technology has potential applications in indexing. Of particular interest are expert systems and hypertext, both of which have captured the attention of librarians and other information personnel.

The good news is that the convergence of expert systems concepts and the growing power of microcomputers is accelerating and the productive intersection is imminent. The possibilities for developing expert systems for microcomputers may very well be the key to the acceptance of expert systems by the general user. Commercial systems for microcomputers are appearing at a rapid rate and these systems are more than toys for academic experimentation. They have a workable, real-world potential.

In recent years a major marketing emphasis of expert systems software has been on tools called "shells." In the beginning such tools required specially designed software and were expensive, but in the past few years this situation has changed drastically. Shells are appearing on the market for microcomputers, and these shells are both inexpensive and viable application tools.

Shells leave the intricacies of programming to the software and free the expert systems designer to concentrate on extracting knowledge from experts and building this expertise into the system. The system designer needs to have a minimum of programming skills.

The standard expert system shell is a package containing a compiler that translates a rule-based language into an internal representation that a computer can deal with, and a run-time system that will allow a user to interact with the compiled knowledge base. Typically, the run-time facility will allow the expansion of the meaning of a question, will provide the user the option of changing previous answers to a question, and will allow the user to volunteer answers to questions before they are asked.
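The inference core of such a shell can be suggested with a minimal forward-chaining sketch. The rules below are hypothetical indexing rules invented for illustration, not those of any commercial shell: each rule maps a set of established facts to a new fact, and the engine fires rules until no new facts emerge.

```python
def infer(facts, rules):
    """Forward-chaining over rules of the form (conditions, conclusion):
    keep firing any rule whose conditions are all established until
    no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical rules: if an abstract mentions both "drilling" and
# "offshore", assign the descriptor "offshore drilling", and so on.
rules = [
    (["drilling", "offshore"], "offshore drilling"),
    (["offshore drilling"], "petroleum engineering"),
]
print(infer(["drilling", "offshore"], rules))
```

A real shell adds exactly the run-time amenities described above: question expansion, answer revision and volunteered answers, all layered over an engine like this one.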

Vendors are developing and marketing these programs for personal computers. They were cautious at first, but they have discovered that the real market is for shells that help users design their expert systems without depending on sophisticated inhouse programmers. The marketing focus lies here and the public is ready to respond. Perhaps indexers, too.

Some prominent researchers have doubts about the application of artificial intelligence (AI) to the indexing problem. For example, Gerard Salton (Salton, 1986) has expressed concerns that AI methods may be even more difficult to use than the automated text analysis methods of the past three decades. He particularly points to the problems of generating meaning from texts, the problem of such systems in having a consensus on the nature of domain and world knowledge and the problems of diverse users. However, it should be noted that the concept of AI is basically concerned with exactly these types of issues and the problems may not be insurmountable.

Indexing is an absolute "natural" for an expert system because 1) there are recognized experts in the field, 2) these experts can be shown to be better than nonexperts, 3) the task is definable in time (a few minutes to a few hours at most) and 4) the task is a combination of cognitive and mechanical processes.

Expert systems for indexing are already in operation. For example, since 1985 the American Petroleum Institute has been using an expert system to help select index terms from abstracts of articles appearing in the technical literature (Martinez and others, 1987). They use the API thesaurus as a base, and a rule-based expert system selects the index terms, which are subsequently reviewed by human indexers. There are other working systems and a number of experimental prototypes.

The other new technology that has caught our attention is hypertext. In the past few years hypertext has been of major interest in the information processing world. Hypertext is related to the concept of associative memory, and that concept is as old as recorded history. When a medieval scholar sat down to write a history of the world he used associative memory techniques. He began with the creation of the universe, as recorded by the scriptures, but then his mind jumped to the classical writers and, at first, he was disturbed by the apparent inconsistencies of the two viewpoints. But then, he associated the basic ideas of one to the other, saw the differences in light of his own understanding and beliefs, and so forth. This is associative memory. All thinking is associative and mankind has recognized this simple idea for a long time.

Although the idea of associative memory existed before electronic computers, the modern concept of hypertext is often credited to Vannevar Bush (Bush, 1945) and the ideas in his classic paper "As We May Think." When the technology became available, hypertext became a popular tool.

There are many examples of possible applications to indexing, but an obvious one is thesaurus construction. Clearly, one of the most basic intellectual aspects of indexing is thesaurus construction, use and maintenance. A thesaurus, even a simple one, is a complex and intricate tool. To keep it alive and appropriate is a continuing effort. If it is not maintained, updated and carefully monitored, it will become useless. Maintenance is more than just adding new terms. It involves replacement of and changes in the structural relationships of old terms and new terms, changes in rules, corrections in spelling, etc. Future indexing software, using associative techniques, could aid in minimizing the costs and increasing the effectiveness of thesaurus control.
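What such maintenance support might look like can be sketched with a simple record of broader/narrower/related (BT/NT/RT) links. The class and method names here are invented for illustration; the point is that the software, not the indexer, keeps reciprocal relationships consistent when a descriptor is replaced.

```python
class Thesaurus:
    """Toy thesaurus: each term carries BT/NT/RT link sets."""
    def __init__(self):
        self.terms = {}

    def add_term(self, term):
        self.terms.setdefault(term, {"BT": set(), "NT": set(), "RT": set()})

    def add_bt(self, narrow, broad):
        # Adding a broader term automatically adds the reciprocal NT link
        self.add_term(narrow)
        self.add_term(broad)
        self.terms[narrow]["BT"].add(broad)
        self.terms[broad]["NT"].add(narrow)

    def replace_term(self, old, new):
        # Rename a descriptor and repair every link that pointed at it
        self.add_term(new)
        for rel in ("BT", "NT", "RT"):
            self.terms[new][rel] |= self.terms[old][rel]
        del self.terms[old]
        for record in self.terms.values():
            for rel in ("BT", "NT", "RT"):
                if old in record[rel]:
                    record[rel].discard(old)
                    record[rel].add(new)

t = Thesaurus()
t.add_bt("microcomputers", "computers")
t.replace_term("microcomputers", "personal computers")
print(t.terms["computers"]["NT"])
# {'personal computers'}
```

Associative techniques would extend this bookkeeping with weighted and user-defined links, but even the bookkeeping alone removes a large part of the maintenance burden.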

Hypertext has been touted for its capability to tie together disparate informational units. The connections are not dependent on apparent "logical" ordering or sequencing, but allow the user to define his own logic. This concept fits quite nicely with user-friendly indexes and searching systems.

Future indexing software and searching software must be integrated as a single entity, not as disjoint tools. Separating indexing from searching is illogical. Information retrieval searching techniques are widely taught in library schools. Unfortunately, students are not well taught to see the relationship between searching and the indexing process itself. In a recent Database article, Lucinda D. Conger (Conger, 1989) points out that "One of the things new searchers are unsure of is how they got the results they did from a simple search...what a searcher needs to know is what the computer is looking at when a simple search request is processed." Perhaps systems can be designed that will actually talk to the user about indexing failure. Such systems could also collect and analyze terms that fail and terms that succeed, and redirect associative links in terminology.

4. THE INDEXING PROCESS AND SOFTWARE

The indexing process can be codified into a step-by-step procedure, although these steps may vary according to who is doing the indexing and the type of indexing done. However, it might be useful to take a commonly accepted procedure and draw parallels between the steps and the use of microcomputers. The following ideas include both descriptions of the abilities of current software and some speculation on the future.

Cleveland and Cleveland, in their indexing text Introduction to Indexing and Abstracting, 2nd edition, have outlined the following steps for indexing:

STEP NUMBER ONE: The first step is a series of preliminary decisions about 1) what documents will be indexed? 2) what parts will be indexed and what parts passed over? 3) how exhaustive and specific will the indexing be?

At the present time the decision of what documents are to be indexed is made by a human, based on policy. It is not generally a part of a computer decision-making system. But why not include a selective dissemination of information (SDI) segment in the system itself? We generally think of SDI systems as being user defined. Why couldn't an indexing system scan all the SDI profiles and help the indexer make a decision of whether or not that particular document is relevant to the system?
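The SDI idea above could be sketched as a simple profile-scoring routine. The profiles, terms and weights below are hypothetical: the system scores an incoming document against each stored interest profile, and a high aggregate score suggests the document is relevant to the system and worth indexing.

```python
def sdi_relevance(document_terms, profiles):
    """Score a document's terms against weighted user interest profiles."""
    doc = set(document_terms)
    return {
        user: sum(weight for term, weight in profile.items() if term in doc)
        for user, profile in profiles.items()
    }

# Illustrative profiles: term -> interest weight
profiles = {
    "user_a": {"indexing": 3, "hypertext": 2},
    "user_b": {"cataloging": 3},
}

print(sdi_relevance(["indexing", "microcomputers"], profiles))
# {'user_a': 3, 'user_b': 0}
```

An indexing system could sum or threshold these per-user scores to advise the indexer on whether a document belongs in the collection at all.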

By the same token, the general guidelines for indexing could be built into the indexing software, and a series of simple questions could adjust the system to the subsequent indexing process for a particular document.

STEP NUMBER TWO: The second step is to record the bibliographic data. The clerical error of incorrect bibliographic information is deadly for an index. Most indexing software simply allows a slot for the information. What about the possibilities of interfacing with bibliographic services for correct entries?

STEP NUMBER THREE: The next step is content analysis. Most of the software for microcomputers has ignored the general questions of automatic indexing. The micro-software attempts to be practical and only on occasion brings in the offerings of automatic indexing. There is a great deal that could be done here, using successful research results and operational experiences of the past.

STEP NUMBER FOUR: The fourth step is content determination. Content analysis (step three) is primarily a surface scan of the document. But the major problem in indexing is determining what the paper is actually about and expressing that "aboutness" in terms of potential users of that information. This is the place where expert systems and hypertext have great potential. Expert indexers can work through the maze of what a document appears to be about and determine what it actually is about. Many-faceted associative memory devices can be used to identify appropriate chains of thought for both the indexer and the user.

STEP NUMBER FIVE: Step number five is the conversion to the indexing language. Here is where associative memory devices are at their finest. Related ideas, sub-terms and generic switching devices, linked by association, could all serve the indexer well.

STEP NUMBER SIX: The next step is to review and reexamine what has been done. Soft-ware, obviously, already exists to check spelling. But why not develop software that would do statistical analysis of terms assigned, show their relationships, and then offer suggestions to the indexer in the revision stage?
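Such a statistical review pass might begin with simple frequency and co-assignment counts, as in this sketch (the term lists are illustrative): software could surface the most common descriptors and the pairs that are habitually assigned together as suggestions during revision.

```python
from collections import Counter
from itertools import combinations

def term_statistics(assignments):
    """From per-document term assignments, count how often each term
    is used and how often each pair of terms is assigned together."""
    freq = Counter()
    pairs = Counter()
    for terms in assignments:
        freq.update(terms)
        pairs.update(combinations(sorted(terms), 2))
    return freq, pairs

assignments = [
    ["indexing", "microcomputers"],
    ["indexing", "hypertext"],
    ["indexing", "microcomputers", "software"],
]
freq, pairs = term_statistics(assignments)

print(freq.most_common(1))                    # [('indexing', 3)]
print(pairs[("indexing", "microcomputers")])  # 2
```

A revision aid could flag terms assigned only once (possible typos or overly specific headings) and strongly co-assigned pairs (candidates for new thesaurus links).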

STEP NUMBER SEVEN: The final step is to format and display the index. Why couldn't graphics and icons be a part of the display of indexes?

The future for microcomputer indexing lies in the exploration and development of interactive systems. These systems will not be discrete, independent steps in the information retrieval process, but will be integrated systems.

Linda Smith, in a review article summarizing trends (Smith, 1987), concluded: "The emphasis in interactive systems is likely to continue to be on machine-aided intelligence rather than machine intelligence alone. ... The emerging disciplines of cognitive science and knowledge engineering suggest challenging new roles for information scientists in the investigation of human information processing and in the construction of knowledge-based systems."

These systems should include serious graphics. Computer users can move a mouse to an icon, squeeze an "ear" and recall a stored information item. Most of the software engineers who design these systems do not realize that they have created an "index" very close to our professional concept of the term. The challenge to us: why shouldn't we develop graphic icons for traditional indexes? Why not use animation and allow a bibliophile PacMan to lead a user through the complexities of MEDLINE? For example, a computer screen might show a human being; then a flip of the mouse could explode the image up into a stomach, and then present the entrance to the stomach. When the user reaches the entity wanted, other icons would allow related concepts to be explored. When the icons finally represent the specific type of information wanted, a flip of the mouse will retrieve both text and graphic information on the subject of interest.

Other useful technology might be applied, such as pattern recognition for the indexing and retrieval of pictorial information. Data compression devices might help save storage space and speed up searching.

There is an entire frontier out there concerning the use of graphics and signs in the retrieval of information. In fact, we have only begun to explore the vast potential of indexing software for the 21st Century.

REFERENCES

Bush, V., "As we may think," Atlantic Monthly, pp. 101-108 (July 1945).

Cleveland, D. and A. Cleveland. Introduction to Indexing and Abstracting. 2nd ed. Englewood, CO: Libraries Unlimited, 1990.

Conger, L. D., "Back to basics: Basic indexes, the space, the hyphen and all that," Database, 12: 119-123 (October 1989).

Fetters, L. K. A Guide to Indexing Software. 2nd ed. Washington, D.C.: American Society of Indexers, 1987.

Martinez, Clara, et al., "An expert system for machine-aided indexing," Journal of Chemical Information and Computer Sciences, 27: 158-162 (May 1987).

Salton, Gerard, "Another look at automatic text-retrieval systems," Communications of the ACM, 29 (7): 648-656 (July 1986).

Smith, Linda C., "Artificial intelligence and information retrieval," In: Annual Review of Information Science and Technology, v. 22, 1987, pp. 41-73.

Swanson, D.R., "Searching natural language text by computer," Science, 132: 1099-1104 (1960).