Kari Marklund
Bra Böcker Publishing Group
Höganäs, Sweden
Abstract: The paper briefly reviews the history of encyclopedias and visions of new forms of encyclopedias. The main part draws attention to the philosophical, technical, and project issues of developing and using a computerized editing and management system, LOGOS, for producing a new, large encyclopedia. The system is used to edit and administer all texts and also to keep records of time schedules for more than 2,000 contributors. The key problem in applying a large text-handling system is briefly addressed, namely the need for relational technology and full-text retrieval at the same time.
The printed book has been with us for only 500 years, a very short time within the whole history of human communication. But the book has become an essential part of life. No wonder that most of the developments of electronic devices for publications are used to produce conventional print on paper, or to display on a screen a publication that closely resembles print on paper. Although written language today lives in various forms, it is still associated with paper.
From the beginning of recorded history to the present, the concept of an encyclopedia has fascinated man. The desire to authoritatively compile, condense, summarize, and make readily available to the public and succeeding generations the cumulative knowledge of humanity can be traced back to the ancient Greeks, Romans, and even the Chinese, and it has endured and matured to the present time.
2. A SHORT HISTORY OF ENCYCLOPEDIAS
When the term encyclopedia came into common use is still debated. The first person to use the word was the German scholar Paul Scalic in his work Encyclopaediae, seu orbis disciplinarum, tam sacrarum quam prophanarum, epistemon, printed in Basel in 1559. In the 18th century Ephraim Chambers (1728) and Denis Diderot (1751) used the word in their epoch-making compilations. They were also the first to organize their information in a format that we today would accept as encyclopedic.
In the 1760s in the city of Edinburgh three men, Andrew Bell, Colin Macfarquhar, and William Smellie, began their work on a book that was to establish the word encyclopedia solidly. Between 1768 and 1771 the three-volume Encyclopaedia Britannica was published. Without knowing it, the three men launched one of the great institutions of the English language. The Britannica is a compilation with relatively few but long articles written at a scholarly level. Since then the Britannica has established a tradition of having most of its articles written by famous scientists, poets, etc.
The Chambers-Diderot-Bell line has been regarded as the strongest encyclopedic tradition, but it is not the only one. A parallel tradition of lexicons developed in the German states. At the beginning of the 19th century the Konversationslexikon of Renatus Löbel was taken over by the innovative publisher Friedrich Arnold Brockhaus. He founded a family dynasty of encyclopedias that is still prospering. Since then the German tradition has been to publish encyclopedias whose content lies somewhere between a scholarly encyclopedia and a dictionary. The information is split into many smaller articles.
A parallel to Brockhaus was Pierre A. Larousse, who in France published his "Great Encyclopedia" (1864–76) in 15 volumes and 2 supplements.
Whereas the Britannica model has prevailed throughout the English-speaking world, the German Brockhaus became the model for encyclopedias prepared in countries in which English was not widely spoken. This includes Sweden, which in the 19th century received most of its influences from Germany.
By 1900, the general principles of the form which an encyclopedia should take were universally accepted:
• Contents arranged in alphabetical order.
• Articles of any substance written by specialists.
• Subject specialists employed as subeditors.
• Inclusion of living people's biographies.
• Inclusion of illustrations, maps etc.
• Bibliographies appended to the longer articles.
• Provision for the publication of supplements to bring the main work up-to-date.
• Provision of adequate cross-references in the text.
3. DATABASE DESIGN ISSUES
The Swedish National Encyclopedia, NE for short, is the largest publishing project in Sweden in many years. The product is a brand new general-purpose encyclopedia with the following characteristics:
• 640 pages per volume
• 170,000 articles
• 100 million characters
• 500 different characters in the character set
• 7 years' production time
• It should conform to the standard model described earlier.
These characteristics were laid down in a contract between the Swedish Government (which provided a start-up loan) and Bra Böcker Publishing Group. Together with the NE Scientific Advisory Board we have laid down the rules for style, content, objectivity, subject matter, etc.
In the early stages of the project it was realized that the only way to achieve the goals was to develop electronic tools. From the beginning we adopted an integrated publishing concept. The electronic toolbox system we developed was named LOGOS.
In producing NE we work with more than 2,000 authors. NE is composed of articles written by scholars from all over Sweden (and even abroad) using a variety of typesetting tools, and delivered by mail, electronically through a network, or on diskettes.
The manuscripts normally undergo two finishing operations before they reach typesetting: editing and design. Editing is concerned with the content of the material; design is concerned with the format and the typographical style. Both of these functions leave their mark on the manuscripts, the latter in the form of codes. It is from this tradition that we get the term copymarks: the inclusion of codes within a text file to specify the typographical characteristics of some portion of text. A copymark may be a symbol or a string of symbols, and this had to be taken into account in the design of our system LOGOS.
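As a purely illustrative sketch, copymarks can be separated from the running text with a small scanner, so that content and typographical codes are stored apart. The angle-bracket delimiters and the code names `b` and `/b` are hypothetical; the actual LOGOS copymark notation is not described here.

```python
import re

# Hypothetical copymark syntax: a code name in angle brackets, e.g. <b> ... </b>.
# This is an assumption for illustration, not the LOGOS notation.
COPYMARK = re.compile(r"<(/?[a-z]+)>")

def split_copymarks(manuscript):
    """Return (plain_text, marks); marks is a list of (offset, code) pairs.

    Offsets are positions in the plain text, so the typographical codes
    can be kept separately from the content they apply to.
    """
    plain_parts = []
    marks = []
    pos = 0        # read position in the marked-up manuscript
    plain_len = 0  # length of plain text emitted so far
    for match in COPYMARK.finditer(manuscript):
        chunk = manuscript[pos:match.start()]
        plain_parts.append(chunk)
        plain_len += len(chunk)
        marks.append((plain_len, match.group(1)))
        pos = match.end()
    plain_parts.append(manuscript[pos:])
    return "".join(plain_parts), marks

text, marks = split_copymarks("<b>Logos</b> is Greek for word.")
# text  -> "Logos is Greek for word."
# marks -> [(0, "b"), (5, "/b")]
```

Keeping the marks as offsets rather than inline codes is only one possible design; it makes the plain text directly searchable while the codes remain available for typesetting.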
From the publisher's point of view a machine-readable database is a major benefit, as data manipulation (sorting, error checking, indexing operations, etc.) is greatly facilitated. Easy access to information is the most important design feature. Another important aspect is that after the printed publication has been generated the machine-readable database is still in existence and can be used to generate other information products and services such as new media, an important option for further markets. The most valuable part of a book like an encyclopedia is the content: the information or data. An important decision in the design phase was to commit ourselves to the principle of separating content from form wherever possible.
The key problem for many computer applications that handle text is the need for both relational technology and full-text retrieval at the same time. Why is this essential? As an editor you want to reach the text through different forms of queries or combinations of queries. For example:
• Select all entries in the field of Computer science that have not been delivered on time by the authors. This is a classical relational query that can easily be handled by a database management system.
• Find the word injury in the articles within the subject Sport. Such a question requires combined retrieval on structural data and text elements.
• Scalable technology. The system should support large as well as small installations of LOGOS.
• Simple interface. The system should not require extensive computer experience of the editors.
• Standard text-handling software. The system should allow upgrading to new text-handling developments.
• Spellchecking. The system should support proofreading. Spellchecking is clearly a help, but I must point out its limitations: a skilled proofreader checks much else besides spelling. In the future we will need more sophisticated forms of proofreading incorporated into our document preparation system.
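The two example queries given earlier (overdue entries in a subject field, and a word within the articles of a subject) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the LOGOS implementation: the table layout, column names, and sample rows are invented, and a plain `LIKE` substring match stands in for a real full-text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: structural data (subject, dates) next to the article text.
conn.execute(
    "CREATE TABLE articles ("
    " headword TEXT PRIMARY KEY, subject TEXT,"
    " due_date TEXT, delivered INTEGER, body TEXT)"
)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
    [
        ("compiler", "Computer science", "1990-03-01", 0, ""),
        ("database", "Computer science", "1990-09-01", 1, "Stores structured records."),
        ("football", "Sport", "1990-02-01", 1, "A knee injury is common in football."),
        ("chess", "Sport", "1990-02-01", 1, "A board game for two players."),
    ],
)

def overdue(subject, today):
    """Purely relational query: undelivered entries past their due date."""
    rows = conn.execute(
        "SELECT headword FROM articles"
        " WHERE subject = ? AND delivered = 0 AND due_date < ?"
        " ORDER BY headword",
        (subject, today),
    )
    return [row[0] for row in rows]

def containing(word, subject):
    """Combined query: structural filter (subject) plus a search in the text."""
    rows = conn.execute(
        "SELECT headword FROM articles"
        " WHERE subject = ? AND body LIKE ?"
        " ORDER BY headword",
        (subject, f"%{word}%"),
    )
    return [row[0] for row in rows]

late = overdue("Computer science", "1990-06-01")  # -> ["compiler"]
hits = containing("injury", "Sport")              # -> ["football"]
```

The second function shows why both technologies are needed at once: the subject filter is an ordinary relational predicate, while the word search has to look inside the text itself, which on 100 million characters calls for a proper full-text index rather than a substring scan.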
In general, before implementing, testing, and introducing the system we expected to have several technical problems. Many components of in-house developed software and standard IBM software had to be integrated. Ironically, most of the problems occurred in areas where we had not expected any, and vice versa.
For example, we first believed that activities such as formatting and page makeup were very common tasks that must have been solved already in the IBM world, so we did not expect many problems in this area. Yet it turned out that we had underestimated the problems of handling all the special character sets we needed. We made a complete list of all the special characters occurring in the encyclopedia. For every character we had to create a mapping between the internal representation in LOGOS and CCI, the commercial system bought for typesetting and layout. Although not much programming was needed, because the only thing we had to create was the conversion tables, the testing turned out to be very error-prone and time-consuming.
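A conversion table of this kind, together with the sort of consistency check that makes testing less error-prone, might look as follows. This is a sketch under stated assumptions: the bracketed typesetter codes are invented, and neither the internal LOGOS representation nor the CCI encoding is given in this paper.

```python
# Hypothetical conversion table: internal character -> typesetter code.
# The code strings are invented for illustration.
TO_TYPESETTER = {
    "\u00e5": "[aring]",   # a with ring above
    "\u00e4": "[auml]",    # a with diaeresis
    "\u00f6": "[ouml]",    # o with diaeresis
    "\u00e9": "[eacute]",  # e with acute accent
}

def check_table(table, required_chars):
    """Return (missing, clashes).

    missing: required characters that have no mapping.
    clashes: codes that more than one character maps to.
    """
    missing = sorted(ch for ch in required_chars if ch not in table)
    by_code = {}
    for ch, code in table.items():
        by_code.setdefault(code, []).append(ch)
    clashes = {code: chars for code, chars in by_code.items() if len(chars) > 1}
    return missing, clashes

def convert(text, table):
    """Replace every mapped character; leave all other characters unchanged."""
    return "".join(table.get(ch, ch) for ch in text)

# The required list includes u with diaeresis, which the table lacks.
missing, clashes = check_table(TO_TYPESETTER, "\u00e5\u00e4\u00f6\u00e9\u00fc")
converted = convert("H\u00f6gan\u00e4s", TO_TYPESETTER)
# missing   -> ["\u00fc"]
# clashes   -> {}
# converted -> "H[ouml]gan[auml]s"
```

Checking the table against the complete character list before any text is converted catches missing and duplicated entries mechanically, instead of leaving them to be found in proofs.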
LOGOS has been in operation since early 1988. Six volumes of NE have been edited and delivered to our customers, and the seventh volume is in production. Two other encyclopedias that we produce have also been implemented in LOGOS. The philosophy is that the common database makes it easy for the editors to check the text in the various other products. They may, however, only alter the text in the product they are assigned to work with.
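The rule that editors may read every product in the common database but alter only their own can be expressed in a few lines. The class, the editor names, and the product names below are hypothetical and serve only to illustrate the rule; they do not describe how LOGOS enforces it.

```python
from dataclasses import dataclass, field

@dataclass
class SharedDatabase:
    """Toy model of one text database shared by several products."""
    texts: dict = field(default_factory=dict)        # (product, headword) -> text
    assignments: dict = field(default_factory=dict)  # editor -> assigned product

    def read(self, editor, product, headword):
        # Any editor may read the text of any product.
        return self.texts[(product, headword)]

    def write(self, editor, product, headword, text):
        # Only an editor assigned to the product may change its text.
        if self.assignments.get(editor) != product:
            raise PermissionError(f"{editor} is not assigned to {product}")
        self.texts[(product, headword)] = text

db = SharedDatabase(assignments={"anna": "NE", "bo": "Regional"})
db.write("anna", "NE", "Edinburgh", "Capital of Scotland.")
ne_text = db.read("bo", "NE", "Edinburgh")  # reading across products is allowed

try:
    db.write("bo", "NE", "Edinburgh", "changed")  # bo is not assigned to NE
    denied = False
except PermissionError:
    denied = True
```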
A small LOGOS version has been installed at a remote location for the production of a regional encyclopedia. Three spin-off products have been produced and several others are under preparation, books as well as CD-ROMs. The LOGOS system has proved to be very efficient and reliable. We have only had one minor stop in the production and one disk crash.
5. NEW ENCYCLOPEDIAS
The computer-aided way of publishing is focused on more effective ways to collect and process information, up to and including pre-press production. The results are cheaper, better, and faster ways to get information into print. The marketplace, however, continually demands more for less. There is an increasing demand for even more efficient and cost-effective ways to handle and deliver information. Having a database in machine-readable form also makes "on-demand publishing" possible. That is, a publication can be generated and distributed as and when ordered by the reader. It has been suggested that bookstores might keep only one display copy of each item and generate copies for sale when requested. It is also possible to search the whole text of several items in a specific way and create "a book" that contains only information tailor-made to fit the interests of the individual.
Delivery mechanisms such as on-line knowledge bases, CD-ROM, and CD-I are examples of products gaining wider and wider acceptance in the marketplace. Distributed systems and multimedia are the buzzwords of today. The defining features of the encyclopedia of the 20th century, as mentioned earlier in this paper, include language, sequence, authorship, illustrations, bibliographies, supplements, and cross-references. Multimedia systems are suggestive of how a number of these features can be redefined in the near future. Sequence need not be only alphabetical, as multiple paths can be defined. Illustrations can incorporate animation. Text parts and illustrations can incorporate sound. Bibliographies may be replaced, at least in part, by direct access to the source documents. The cross-reference structure can become much richer. To realize this potential, however, much research and particularly development need to be undertaken.
The mission statement of our company includes the following, in free translation from Swedish: