TWO MONTHS TO A CD-ROM: WE DID IT AND SO CAN YOU!

Amanda Siewling Lo, Yu-Huei Lainer & Ching-chih Chen

Graduate School of Library & Information Science
Simmons College, Boston, MA 02115, USA

Keywords: CD-ROM, Desktop CD-ROM Publishing, Electronic Publishing, Conference Proceedings, NIT, New Information Technology, KAware, Knowledge Access International, DMI, Disc Manufacturing Inc.

Abstract: Since CD-ROM was first introduced in libraries in January 1985, the technology has had a substantial impact on library and information services. Until recent months, librarians have been mainly consumers of CD products, almost all of which have been produced by profit-making companies and organizations. Technology has developed so rapidly that desktop CD-ROM publishing is now possible. Well motivated to provide a real demonstration at NIT '91, two master's students at the Graduate School of Library and Information Science, Simmons College, elected an independent study course under Ching-chih Chen on CD-ROM desktop publishing in the fall of 1991. In a very short two-month period, from mid-September to mid-November 1991, we were able to produce a CD-ROM containing the full text and images of all four conference Proceedings in the NIT series. With the contribution of Knowledge Access International, KAware™ Disk Publisher was used to analyze, prepare, and index both the text and image files, and KAware2™ Retrieval System was used to provide full-text and image retrieval capabilities. The replication of the CD-ROMs was made possible through the courtesy of Disc Manufacturing, Inc. (DMI).

This paper briefly outlines the process of our CD-ROM desktop publishing activity, discusses both our excitement and our frustration with this involvement, and offers some practical tips for those who are contemplating producing CD-ROMs themselves.

 
1. INTRODUCTION

During the last few years, we have witnessed incredibly fast growth of electronic CD-ROM publishing. Currently there are over 1,500 CD-ROM titles commercially available. Accordingly, the use of CD-ROM titles in libraries and information centers has also grown rapidly (Chen, 1991). Yet, until recently, cost has prohibited individual organizations from considering CD-ROM publishing themselves. Thus few would be interested in creating a CD-ROM.

Recent advances in both PC hardware and software, particularly the availability of low-cost desktop disk publishers for the microcomputers generally available in offices, have made it possible for librarians and information professionals to consider desktop CD-ROM publishing themselves. The potential of this development for libraries and information centers around the world can be easily imagined. CD-ROM has many advantages that address the major problems of cost, access, and distribution faced by librarians and information professionals. Its large storage capacity, its transportability, and its readiness for information access make it an ideal medium for information sharing and distribution. Likewise, CD-ROM's possibilities for enhancing information access and sharing in libraries and information centers in developing and less developed countries are indeed great!

Two students in the Master's Program in Library and Information Science at Simmons College took an intensive course with Professor Chen in the summer of 1991, and had been eager to learn the ins and outs of creating a CD-ROM themselves. With NIT '91 being organized by their professor, and knowing that most of the previous Proceedings of the series of conferences on new information technology had been produced by desktop publishing, the two students considered this a perfect opportunity to embark on such an ambitious and useful project. Instead of learning the process more or less in theory, they chose to gain hands-on experience by producing this CD-ROM, which combines the contents of all four conferences organized by Chen:

The First Pacific Conference on New Information Technology, Bangkok, June 16-18, 1987;

The Second Pacific Conference on New Information Technology, Singapore, May 29-31, 1989;

The Third International Conference on New Information Technology, Guadalajara, Mexico, November 26-28, 1990; and

The Fourth International Conference on New Information Technology, Budapest, December 2-4, 1991.

Although the time line of only two months seemed impossible, the project was nonetheless considered feasible, since more than half of the previous proceedings were already available in machine-readable form.

The idea was conceived in mid-September 1991, and the actual process started immediately the following week.

2. THE PLANNING PROCESS

The planning phase of any project is the most critical phase. Chen and Wiggins (1990), in their multimedia application product introducing CD-ROM technology and products (Fig. 1), illustrate rather clearly the production chain of CD publishing (Figs. 2-3). Clearly, from start to finish, CD-ROM publishing, whether desktop or otherwise, involves the following steps:





• Data generation, or database development

• Data preparation

- Data conversion

- File preparation

- Indexing

• Assembly and production

- Assembling processed files

- Testing

- Premastering and mastering

- Replication

• Distribution and sales

In order to ensure the success of our project, we needed to have every step of the above under close control. In addition, it was also important to consider numerous logistic matters during the planning process. These include:

• How to budget our project?

• Where to go for possible help and/or contribution?

• What are the necessary and desirable hardware and software for CD-ROM desktop publishing?

• How to choose a Publisher and retrieval software?

• What are the environmental limitations?

Since NIT is a series of non-profit conferences organized by Chen as her contribution to international library and information work, financial limitations were potentially one of the biggest problems facing us. Fortunately, several software companies contributed to our effort. Since time did not permit us to compare the available CD desktop publishing software, Knowledge Access International's KAware™ Disk Publisher was chosen. Furthermore, our financial worry was totally eliminated in mid-process when Disc Manufacturing, Inc. (DMI), which had acquired the mastering and replication facilities of Philip Dupont Optical Inc. (PDO), graciously agreed to produce our CD-ROM free of charge as a contribution to international development.

As expected, almost all currently available CD desktop "Publishers" run on an IBM PC or 100% compatible with at least 640 KB of RAM.

2.1. Data Generation and Database Development

This is the first step in the process of electronic publishing, in which we need to determine what kind of information is to be published. Information sources may be textual, fielded, or image. The activities involved during this phase are basically information gathering and electronic management. The former determines the formats to be chosen for inclusion, and the latter considers how to turn all the desired information into electronic form and how to manage it.

Potentially, our CD-ROM could include all three types of information. But, aside from information on participants, which can come in fielded database format, the contents are predominantly full-text files, with each Proceedings containing an average of about 45 images. When an article includes images, it is very unsatisfactory to leave them out. Thus, the decision was made from the beginning to include both textual and image information.

A complete inventory was taken of the number of articles in each Proceedings, and also of the number of images in the three published Proceedings. A running list of the articles of each Conference Proceedings was created, and the type of source, such as printed copy or electronic file, was clearly marked. When electronic files were available, the size of each file was recorded, and the files were carefully checked for system compatibility (i.e., IBM PC or Mac files) and for the word processing software used. The word processing software commonly used included WordStar Professional, WordPerfect 5.0, and Microsoft Word on the IBM PC, and Word 4.0, MacWrite, and WordPerfect on the Macintosh. This inventory should not be overlooked, as it is a very important step towards consistent conversion of data, since all word-processed files have to be converted to ASCII. The papers for the Proceedings of the 4th Conference reached the editor at random intervals, some as late as a week before the meeting dates, so we processed each one as it arrived.
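The running list described above can be thought of as a small table with one record per paper. The following Python sketch is only an illustration of that idea, not the form the authors actually used; the field names and the `summarize` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperRecord:
    """One line of the inventory (all field names are illustrative)."""
    conference: str                       # e.g. "Bangkok 1987"
    title: str
    source: str                           # "print" or "electronic"
    platform: Optional[str] = None        # "IBM PC" or "Macintosh"
    word_processor: Optional[str] = None  # e.g. "WordPerfect 5.0"
    size_kb: Optional[int] = None         # recorded for electronic files only
    image_count: int = 0


def summarize(records):
    """Count papers, print-only papers, and images per conference."""
    summary = {}
    for rec in records:
        entry = summary.setdefault(
            rec.conference, {"papers": 0, "print_only": 0, "images": 0}
        )
        entry["papers"] += 1
        if rec.source == "print":
            # Print-only papers still need keyboarding, scanning, or OCR.
            entry["print_only"] += 1
        entry["images"] += rec.image_count
    return summary
```

A summary of this kind shows at a glance how much material still has to be keyed or scanned and which files need conversion from a Macintosh word processor before they can be saved as ASCII.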

2.2. Data Preparation

2.2.1. Data Conversion

The existing information for the first three Proceedings was available in both printed and electronic forms. For the printed information, all three popular ways of conversion to an electronic format (keyboarding, imaging (scanning), and optical character recognition) were used. These activities applied heavily to the materials of the First Conference and also to the new materials of the Budapest Conference, as those papers were facsimiles rather than electronic files on floppy disks.

Since most of our data files available in electronic format were Macintosh files, we encountered difficulties in converting both text and images from the Macintosh disks.

• Word processing - A good amount of time was spent exploring several PC-based word processing programs to ensure that the files saved as ASCII truly had all the control character codes removed. This seemingly easy task turned out to be very tedious, and at times frustrating and boring. But the process is a must!

• Images - The difficulties are even more considerable in image file conversion, as the KAware™ Disk Publisher recognizes only image files in Tag Image File Format (TIFF), including the TIFF1 or TIFF2 formats. Although many Macintosh and IBM scanning and paint/draw programs save files as TIFF, these files may not be supported by KAware™ Disk Publisher.
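Both checks in the bullets above can be automated. The Python sketch below is a modern illustration under stated assumptions, not a tool used in the project: the first function reports any byte in a supposedly plain ASCII file that is neither printable nor a tab, line feed, or carriage return; the second reads the Compression tag (tag 259 in the TIFF specification) from the first image directory of a TIFF file. Which compression values KAware™ Disk Publisher accepts is not stated in this paper, so the sketch only reports the value.

```python
import struct

# Control characters tolerated in a plain text file: tab, LF, CR.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_non_ascii_text_bytes(data: bytes):
    """Return (offset, byte) pairs for bytes that are not plain ASCII text."""
    problems = []
    for offset, byte in enumerate(data):
        printable = 0x20 <= byte <= 0x7E
        if not printable and byte not in ALLOWED_CONTROL:
            problems.append((offset, byte))
    return problems


def tiff_compression(data: bytes):
    """Return the Compression tag value from the first IFD of a TIFF file.

    Returns None if the tag is absent; raises ValueError if the data does
    not start with a valid TIFF header.
    """
    if data[:2] == b"II":        # little-endian ("Intel") byte order
        endian = "<"
    elif data[:2] == b"MM":      # big-endian ("Motorola") byte order
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i          # each IFD entry is 12 bytes
        tag, field_type = struct.unpack(endian + "HH", data[start:start + 4])
        if tag == 259 and field_type == 3:       # Compression, type SHORT
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return None
```

In the TIFF specification, a Compression value of 1 means uncompressed data, 2 is CCITT one-dimensional run-length coding, 3 and 4 are CCITT Group 3 and Group 4, and 5 is LZW. Two files that both carry a TIFF extension can therefore differ in exactly the way that makes one of them unreadable to a given program.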

2.2.2. File Preparation

This is one of the most essential steps in CD-ROM data preparation. Once the available information and/or data are assembled and converted to electronic format, we have to index, format, and organize the information files in logical formats that describe the disc layout. To be ready for indexing, all files have to be prepared and analyzed, first individually and then in merged form.

Before the KAware™ Disk Publisher is run to index the source file so that it becomes fully retrievable, the source files and images have to go through a very thorough process that includes tagging and the preparation of a special word list, a phrase list, stop words, etc. In our project, each conference proceedings makes up one single file. Each was analyzed, prepared, and indexed separately before we merged the four conference proceedings (in WordPerfect) and ran the merged file through the disk publisher in one go.




The preliminary file preparation using the KAware™ Disk Publisher involved TAGGING and MARKING the files. Figure 4 shows a sample tagged file with markers. The markers were essentially there so that headings such as AUTHOR, TITLE, and ABSTRACT would show up in the TABLE OF CONTENTS in the KAware2™ Retrieval System. Once a file is tagged, it can be run through the TEXT ANALYZER of KAware™ Disk Publisher to check for any irregularities or problem pages and lines that could cripple the final stage, TEXT INDEXING. This second stage prepares the file for indexing. Figure 5 shows how problems, if any, are specified in the text analyzer report.







2.2.3. Image File Preparation

Image files have to be compressed using F3 (Other Utility) in the KAware™ Disk Publisher. The compressed images are saved as *.CPR files. They become retrievable in KAware2™ after the text file has been tagged with an image call. Figure 6 shows that the images are tagged with file names, such as IM=91CHEN05.CPR (this tag is not visible to the user), and that an image can be viewed by pressing ALT-I. As mentioned in Section 2.2.1, not all image files with a "TIFF" header or extension are automatically compatible with and compressible in KADI (KAware™ Disk Publisher).
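With well over a hundred images, it is easy for an image call in the text to point at a file that was never compressed. The Python sketch below shows one way such a cross-check could be done today. It assumes that every image call has the form seen in the example above (IM= followed by a DOS 8.3 file name ending in .CPR); the full tag syntax accepted by KAware™ is not documented in this paper, so the pattern is an assumption.

```python
import re

# Assumed form of an image call, modelled on "IM=91CHEN05.CPR".
IMAGE_CALL = re.compile(r"IM=([A-Za-z0-9_]{1,8}\.CPR)", re.IGNORECASE)


def image_calls(text):
    """Return the image file names referenced in the text, in order."""
    return [name.upper() for name in IMAGE_CALL.findall(text)]


def missing_images(text, available):
    """Return referenced image names that are not in the available list."""
    have = {name.upper() for name in available}
    return sorted(set(image_calls(text)) - have)
```

Running `missing_images` over the merged source file, with a directory listing of the *.CPR files as the second argument, would list every call that has no matching compressed image.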

Our project required us to manage 184 image files in both line art and gray tones, in sizes ranging from 80 KB to 300 KB. An IBM-compatible scan kit was the solution to these problems of TIFF compression. The low-cost Marstek ScanKit comes with a 4.5" by 4.5" hand-held scanner and scanning software. The scanning software allows us to scan images of 4.5" by 4.5" and to save them in various formats (.TIFF, .PIX, TIF 1D, etc.). In our case, we finally chose "TIF 1D" after having experimented with several others. This format seemed to work with KADI. After the scanned images are saved as TIF 1D with the Marstek ScanKit, the images (*.TIF) are compressed by the IMAGE COMPRESSOR, resulting in the corresponding *.CPR files. The next step is to use the File Mover to MOVE the *.CPR files to a subdirectory, AA, under the root directory C:\. The *.CPR files must reside in a subdirectory of the directory in which the source file is kept. This MOVE is similar to the COPY command in DOS. Figure 7 shows how KAware™ Disk Publisher prompts you to move your compressed image files.
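The effect of the MOVE step can be sketched in a few lines of Python. This is only an illustration of the directory layout described above, not a reproduction of the KAware™ File Mover: the function name is hypothetical, and the subdirectory name AA is taken from the text. Because the paper says the MOVE behaves like the DOS COPY command, the sketch copies the files and leaves the originals in place.

```python
import shutil
from pathlib import Path


def stage_compressed_images(work_dir, source_dir, subdir="AA"):
    """Copy every *.CPR file from work_dir into source_dir/subdir.

    Returns the names of the files copied, in sorted order.
    """
    target = Path(source_dir) / subdir
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(work_dir).iterdir()):
        # Match the extension case-insensitively, as DOS would.
        if path.is_file() and path.suffix.upper() == ".CPR":
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```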

It is worth noting that since our scanner is a low-cost one, the resulting scanned images are not of the best quality. Furthermore, time limitations prevented us from processing the images further. Nevertheless, we are happy to have been able to provide the users of our disc with the capability of image retrieval, which also includes zooming at 4 levels.

2.2.4. Indexing

Once the file is ready and the necessary "helper" files, such as file definition, stopword, and phrase list, have been prepared, we are ready to index the file with the Publisher. Figure 8 shows part of the steps involved in indexing in KAware™:

• Parsing each field to find all words except stopwords

• Sorting these words

• Finding all unique words

• Indexing the words

• Creating the display file

• Creating a template file that supplies the KAware2™ Retrieval System with information about the file.
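The first four steps in the list above (parsing, sorting, finding unique words, and indexing) are the classic way of building an inverted index. The Python sketch below illustrates that general technique only; it is not the KAware™ implementation, whose internals are not described in this paper, and it does not model the display file or the template file. The record layout and the function name are assumptions made for the example.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[A-Za-z0-9]+")


def build_index(records, stopwords):
    """Build an inverted index from {record_id: {field_name: text}}.

    Returns {word: [(record_id, field_name), ...]} with sorted postings.
    """
    stop = {w.lower() for w in stopwords}

    # Step 1: parse each field and keep every word except stopwords.
    occurrences = []
    for rec_id, fields in records.items():
        for field, text in fields.items():
            for word in WORD.findall(text.lower()):
                if word not in stop:
                    occurrences.append((word, rec_id, field))

    # Step 2: sort the occurrences so equal words become adjacent.
    occurrences.sort()

    # Steps 3 and 4: collapse duplicates and record where each word occurs.
    index = defaultdict(list)
    for word, rec_id, field in occurrences:
        posting = (rec_id, field)
        if not index[word] or index[word][-1] != posting:
            index[word].append(posting)
    return dict(index)
```

The number of keys in the returned dictionary corresponds to the count of unique words reported by the indexer, and searching a field such as TITLE amounts to filtering a word's postings by field name.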

As shown in Figure 8, the complete NIT file of four conference proceedings consists of 84,466 text lines, which yield 44,561 unique words. The total size of the NIT file is about 4 MB, and the indexing time was 48 minutes and 43 seconds.

In addition to the text file, the NIT CD-ROM also contains about 180 image files. Although we used a rather low-cost hand-held scanner, the resolution could be set as high as 400 dpi. However, the originals of many images, specifically those of the Bangkok Conference, were of poor quality, so the scanned images cannot be as satisfactory as we would like. The file size of each varies from about 80 KB to 350 KB, and the size of the image files before compression totals over 40 MB. After compression, however, the size was drastically reduced to about 9 MB.
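The figures reported in this section can be turned into a few rough rates. The short Python calculation below uses only the numbers given above; since the sizes are stated approximately ("about 4 MB", "over 40 MB", "about 9 MB"), the results are estimates, and the variable names are simply labels chosen for this example.

```python
# Indexing throughput, from the figures reported for the merged NIT file.
text_lines = 84_466
unique_words = 44_561
index_seconds = 48 * 60 + 43           # 48 minutes 43 seconds
lines_per_second = text_lines / index_seconds

# Image compression, using the approximate totals given in the text.
images_before_mb = 40                  # "over 40 MB" before compression
images_after_mb = 9                    # "about 9 MB" after compression
compression_ratio = images_before_mb / images_after_mb

# Approximate data payload (text plus compressed images), index files excluded.
text_mb = 4
payload_mb = text_mb + images_after_mb

print(f"{lines_per_second:.1f} lines indexed per second")
print(f"compression ratio of roughly {compression_ratio:.1f} to 1")
print(f"about {payload_mb} MB of text and compressed images")
```

In other words, the indexer handled roughly 29 lines per second, the images shrank to less than a quarter of their scanned size, and the text and compressed images together come to about 13 MB before index files are counted.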

2.2.5. Full-text Retrieval

Once we have gone through the analysis, preparation, and indexing processes accurately, we are ready to conduct full-text retrieval using the KAware2™ Retrieval System. Figure 9 illustrates how the user can view the indexed file. When F3 [Select] is pressed, a list of searchable fields, such as TABLE OF CONTENTS, CONFERENCE, TITLE, AUTHOR, etc., is displayed. By choosing TABLE OF CONTENTS, the user can scroll through a new menu that contains a list of articles by TITLE, AUTHOR, and ABSTRACT. The full text of the desired article is displayed when any selectable entry is chosen.




2.3. Assembly and Production

2.3.1. Assembling Processed Files

When all the files are ready to be sent for premastering, we have to consider alternative storage media to download the files from the hard disk of the IBM PS/2.

The Pipeline System loaned to us by Knowledge Access was not configured to be used on our PS/2. We attempted to install it on a 386, but there were still numerous boot-up problems, which could not be solved quickly even by KAI support staff. Due to a lack of time, we decided to download to floppies. Having become resigned to some of the technical difficulties of installation, and having overcome others, we had the files created by the KAware™ Disk Publisher ready to be shipped to DMI for premastering/mastering and replication. Prior to shipping, we also needed to do the following:

• Design the CD-ROM label artwork (this was quickly done using a Macintosh-based graphics program, FreeHand 2.0);

• Prepare a complete list of the files necessary for the distribution of the product and for the manufacturing facility, DMI;

• Test the backup and restore process (downloading from hard disk to 1.2 MB floppies and vice versa);

• Test the application in the KAware2™ Retrieval System with the complete merged file.
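Downloading to 1.2 MB floppies raises a practical question: how many disks are needed, and which files go on which disk? The Python sketch below is a hypothetical planning aid, not a description of how the files were actually backed up. It uses a simple first-fit-decreasing packing and assumes about 1,213,952 usable bytes on a DOS-formatted 1.2 MB floppy. Any file larger than one floppy, as the roughly 4 MB merged text file would be, is only flagged, since it needs a tool that can split a file across disks.

```python
# Assumed usable capacity of a DOS-formatted 1.2 MB floppy, in bytes.
FLOPPY_BYTES = 1_213_952


def plan_floppies(file_sizes, capacity=FLOPPY_BYTES):
    """Assign files to floppies using first-fit decreasing.

    file_sizes maps file name -> size in bytes. Returns a tuple
    (disks, oversized): disks is a list of lists of file names, and
    oversized lists the files too large to fit on a single floppy.
    """
    disks = []       # each entry: {"free": bytes left, "files": [names]}
    oversized = []
    # Largest files first; ties are broken by name for a stable result.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            oversized.append(name)
            continue
        for disk in disks:
            if disk["free"] >= size:
                disk["files"].append(name)
                disk["free"] -= size
                break
        else:
            # No existing disk has room, so start a new one.
            disks.append({"free": capacity - size, "files": [name]})
    return [disk["files"] for disk in disks], oversized
```

A plan of this kind also gives the manufacturing facility a checklist: the number of floppies shipped and the files expected on each one.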

2.3.2. Premastering/Mastering and Replication

After many days of long and hard work, we finally shipped our processed files via Federal Express to DMI in Wilmington, Delaware, on November 14, 1991. The files were loaded at DMI and simulated on their production hardware. A test disc was also prepared by the DMI staff on November 18, 1991, and tested by us the next day (see Figure 10). Once we had verified that there were no errors, the 9-track tape was forwarded to DMI's plant in Huntsville, Alabama, on the same day. Our hard work finally paid off when the newly pressed CD-ROMs, in their proper casing, arrived as scheduled, in time for us to bring to NIT '91 in Budapest on November 28, 1991!

2.4. Distribution

We are proud to say that our desktop-published CD-ROM, completed within two months of the beginning of planning, is ready for free distribution to participants at this conference.

3. CONCLUSION

This has not been an easy project, since both the technology and the process are so new, and few people have expertise in testing this technology on a low-end desktop platform. Although the software vendor was most generous in providing technical support whenever needed, most of the time the difficulties and problems, whether in image processing, hardware configuration, or file merging, were new to the vendor's technical support staff as well. Thus, we managed to solve the difficulties by persistence and trial and error. Along the way, we tested and abandoned many alternative solutions to our problems. We have learned a great deal, and have marvelled at the potential this application can bring to libraries, information centers, and archives. With our limited resources, we have accomplished "the impossible," and so can you! Take advantage of the tools available to you, and let them work for you!
 
 

ACKNOWLEDGMENTS

A project of this intensive nature cannot be completed without the help and contribution of many people and organizations. We wish to thank Knowledge Access International and its President, Dr. Matilda Butler, for their permission to use KAware™ Disk Publisher and KAware2™ Retrieval System with our product. We would like to thank Fred Cash and Sue Davy of Knowledge Access International for spending many hours on the phone discussing technical problems with us. The generous donation from DMI and its marketing director, Mr. Rusky Caper, was essential for the free distribution of our CD-ROMs to NIT '91 participants. J. Philips Busk of DMI provided us with invaluable technical support, and Wendy Buyea and Calyton Summer sacrificed part of their Saturday for us. We are indeed fortunate to have worked with these professionals at DMI. We appreciate the support given by Hugh Blackmer of Simmons College. Finally, we would like to thank Lora Lopes for supervising the manuscript process, not only for NIT '91 but also for part of the other three conferences. Her work and that of two student assistants, Jenn Desormeau and Magi Rami, are very much appreciated.
 
 

REFERENCES

Butler, Matilda, & William Paisley. How to Make Your Own CD-ROM. Mountain View, CA: Knowledge Access International, Inc. 1990.

Chen, Ching-chih. Optical Discs in Libraries: Use & Trends. Medford, NJ: Learned Information, Inc., 1991.

Chen, Ching-chih and Rae Jean N. Wiggins. HyperIntro to CD ROM Technology and Products. Newton, MA: MicroUse Information, Inc., 1990. [An electronic product on floppy discs for Macintosh using HyperCard front-end.]