Han Ximin
Shanghai Medical University Library
Shanghai 200032
Keywords: Data conversion, DBASE II/III, Micro CDS/ISIS, File structures
Abstract: A program that converts DBASE II/III databases into Micro CDS/ISIS is described. It uses a new technique in which data fields are read directly from the DBF file. It automatically identifies whether a database file was created with DBASE II or DBASE III, and it can also process duplicate fields, subfields and extra fields. The software lets users easily select the range of records to convert, and it is convenient and clear to operate. It encourages the use of standard software for medical information in developing countries.
1. INTRODUCTION
Micro CDS/ISIS is general-purpose information retrieval software developed by UNESCO. Its main features are the ability to create an unlimited number of databases, each consisting of entirely different data elements, with duplicate (repeatable) fields and subfields, together with several search methods and enhanced logic capabilities. Its data structure is designed for variable-length fields, and searching is fast. CDS/ISIS can also import and export records in ISO 2709 format. This not only solves the problem of exchanging data with host computers, but also provides a standard interface for sharing data resources with other kinds of information retrieval systems. CDS/ISIS is now used in over 80 countries and international organizations, and it is one of the key products for microcomputer-based information retrieval in China's Seventh Five-Year Plan period (1986-1990). DBASE III is very popular software for microcomputer information retrieval, especially in the medical sciences in China. With this conversion software, users who have built databases in their own systems can use CDS/ISIS conveniently and share data resources with the many other Micro CDS/ISIS users. It can also promote the standardization of information retrieval in China.
1.1. Special Features
• The conversion software is written in BASIC, compiled with a BASIC compiler, and runs on the IBM PC and compatible computers. The subfield delimiters are predefined in the software in the sequence ^a, ^b, ^c, ^d, ^e.
• It obtains data fields by reading the DBF file directly, and automatically separates the related data items held in one field.
• Taking account of the features of DBASE II/III, the software also supports extra fields and lets users flexibly select the range of records to be converted.
• A record can be up to 1275 bytes long, which meets the needs of information retrieval databases created with DBASE II/III.
1.2. Design Principle
The conversion software uses a new technique in which data fields are obtained by reading the DBF file directly. It works from an input description: each selected data field is turned into effective data by clearing away the meaningless blank characters that pad it, and the result is then converted to ISO 2709 format.
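The "effective data" step above can be sketched as follows. This is a minimal illustration, not the author's BASIC code: the field layout tuple `(name, offset, width)`, the function `extract_fields`, and the default `gbk` encoding for Chinese text are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical field layout entry: (name, offset, width). Offsets count
# from byte 0 of the record, which is the one-byte record marker.
FieldSpec = Tuple[str, int, int]


def extract_fields(record: bytes, layout: List[FieldSpec],
                   selected: List[str], encoding: str = "gbk") -> Dict[str, str]:
    """Cut the selected fixed-width fields out of one DBF record and
    clear the padding blanks that DBASE stores around the data."""
    by_name = {name: (offset, width) for name, offset, width in layout}
    result: Dict[str, str] = {}
    for name in selected:
        offset, width = by_name[name]
        raw = record[offset:offset + width]
        # Character fields are padded with trailing blanks and numeric
        # fields with leading blanks; neither carries information.
        result[name] = raw.decode(encoding).strip()
    return result
```

For example, with the layout `[("TITLE", 1, 10), ("YEAR", 11, 4)]`, the 15-byte record `b" Heart     1990"` yields `{"TITLE": "Heart", "YEAR": "1990"}`.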
2. HOW TO GET DATA FIELDS BY READING THE DBF FILE
2.1. DBF File Structure of DBASE II
The DBF file of DBASE II/III consists of two blocks: a control section, which holds the file description and the field descriptions, followed by the data section. In a DBASE II DBF file, the control section has a fixed length of 512 bytes. The first 8 bytes hold the file description and the remaining 504 bytes hold the field descriptions. Each field description takes 16 bytes, and a record can have at most 32 fields. The field descriptions end with a carriage return character. Records in the data section are stored one after another, with no separator character between two records.
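The DBASE II control section described above can be read with a sketch like the one below. It follows the text's 8-byte file description, 16-byte field descriptions and carriage-return terminator; the byte positions inside those blocks (record count at bytes 1-2, record length at bytes 6-7, name/type/width at bytes 0-10/11/12 of each description) are assumptions, and `parse_dbase2_header` is a hypothetical name.

```python
import struct
from typing import List, Tuple

CONTROL_SECTION_SIZE = 512   # fixed DBASE II control-section size, as stated above
FILE_DESC_SIZE = 8           # leading file-description bytes
FIELD_DESC_SIZE = 16         # bytes per field description
END_OF_FIELDS = 0x0D         # carriage return that ends the field descriptions


def parse_dbase2_header(header: bytes) -> Tuple[int, int, List[Tuple[str, str, int]]]:
    """Return (record_count, record_length, [(name, type, width), ...]).

    Assumed byte positions: record count in bytes 1-2 and record length
    in bytes 6-7 (little-endian); in each field description the name is
    in bytes 0-10 (NUL-padded), the type in byte 11, the width in byte 12.
    """
    record_count, = struct.unpack_from("<H", header, 1)
    record_length, = struct.unpack_from("<H", header, 6)
    fields = []
    pos = FILE_DESC_SIZE
    limit = min(len(header), CONTROL_SECTION_SIZE)
    while pos + FIELD_DESC_SIZE <= limit and header[pos] != END_OF_FIELDS:
        desc = header[pos:pos + FIELD_DESC_SIZE]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, chr(desc[11]), desc[12]))
        pos += FIELD_DESC_SIZE
    return record_count, record_length, fields
```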
2.2. DBF File Structure of DBASE III
In a DBASE III DBF file, the control section is of variable length and can describe up to 128 fields. Unlike DBASE II, it does not reserve space for all 128 field descriptions; only the fields actually defined in the file take up space. In this respect DBASE III is better than DBASE II.
The first 32 bytes of the DBF file hold the file description; the rest of the file holds the field descriptions and the data section. Each field description occupies 32 bytes, and up to 128 fields can be defined. The field descriptions end with two characters: a carriage return followed by a second end character. Each record in the data section begins with a one-character separator, a space.
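The automatic identification of DBASE II and DBASE III files, and the reading of a DBASE III control section, might look like the following sketch. The version bytes (0x02 for DBASE II, 0x03 or 0x83 for DBASE III) and the positions of the record count, header length, record length, field type and field width follow the commonly documented DBF layout; the function names are hypothetical. Field offsets start at 1 because the first byte of each record is the separator character described above.

```python
import struct
from typing import List, Tuple

FIELD_DESC_SIZE = 32
END_OF_FIELDS = 0x0D


def dbf_version(data: bytes) -> str:
    """Identify the DBASE version from the first byte of a DBF file."""
    if data[0] == 0x02:
        return "DBASE II"
    if data[0] in (0x03, 0x83):      # 0x83: DBASE III file with a memo (.DBT) file
        return "DBASE III"
    raise ValueError(f"unrecognized DBF version byte 0x{data[0]:02X}")


def parse_dbase3_header(data: bytes) -> Tuple[int, int, int, List[Tuple[str, str, int, int]]]:
    """Return (record_count, header_length, record_length, fields),
    where each field is (name, type, offset, width)."""
    # Bytes 4-7: record count; 8-9: header length; 10-11: record length.
    record_count, header_length, record_length = struct.unpack_from("<IHH", data, 4)
    fields = []
    offset = 1                                 # byte 0 of a record is the separator
    pos = 32                                   # skip the 32-byte file description
    while pos < header_length and data[pos] != END_OF_FIELDS:
        desc = data[pos:pos + FIELD_DESC_SIZE]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        width = desc[16]
        fields.append((name, chr(desc[11]), offset, width))
        offset += width
        pos += FIELD_DESC_SIZE
    return record_count, header_length, record_length, fields
```

Because only the defined fields are stored, the loop simply walks 32-byte descriptions until it meets the carriage return, rather than skipping a fixed reserved area as in DBASE II.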
2.3. Program Design
The DBF file structure of DBASE II/III consists of the two blocks described above. The DBF file is read by random file access, with the length of the file buffer set to one DBF record. Records in a typical information retrieval database contain several items, sometimes including an abstract of more than 200 Chinese characters (400 bytes), but a character variable in BASIC holds at most 255 bytes. To meet this need, the file buffer is divided evenly among five character variables, which allows records of up to 5 × 255 = 1275 bytes. The flow chart is shown in figure 3.
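The buffering scheme can be imitated in Python to show the arithmetic involved. In this sketch (hypothetical names `read_record` and `split_buffer`), record *i* is located at `header_length + i * record_length`, and "divided evenly" is read as splitting the record into five parts of equal size, rounded up; that reading is an assumption about how the BASIC buffer was laid out.

```python
import io
from typing import BinaryIO, List

MAX_STRING = 255      # longest string a BASIC character variable can hold
BUFFER_PARTS = 5      # five character variables share the record buffer


def read_record(f: BinaryIO, header_length: int, record_length: int, index: int) -> bytes:
    """Random access: jump straight to record `index` (0-based)."""
    f.seek(header_length + index * record_length)
    record = f.read(record_length)
    if len(record) != record_length:
        raise EOFError(f"record {index} lies beyond the end of the file")
    return record


def split_buffer(record: bytes) -> List[bytes]:
    """Share one record evenly among five string-sized parts."""
    if len(record) > MAX_STRING * BUFFER_PARTS:    # 5 x 255 = 1275 bytes
        raise ValueError("record is longer than 1275 bytes")
    size = -(-len(record) // BUFFER_PARTS)         # ceiling division
    return [record[i * size:(i + 1) * size] for i in range(BUFFER_PARTS)]


if __name__ == "__main__":
    # A toy file: a 10-byte header followed by three 4-byte records.
    demo = io.BytesIO(b"H" * 10 + b"AAAA" + b"BBBB" + b"CCCC")
    print(read_record(demo, 10, 4, 1))                       # b'BBBB'
    print([len(p) for p in split_buffer(bytes(308))])        # [62, 62, 62, 62, 60]
```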
3. ISO 2709 FORMAT CONVERSION
A record in ISO 2709 format consists of three logical sections: the header section, the directory section and the data section. Records are accessed sequentially and the sections are of variable length; records are written in segments of 80 bytes, and each record ends with an end mark.
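Writing a record in 80-byte segments can be sketched as below. The function names are hypothetical, and the choice to end each segment with CR/LF and to start every record on a new segment are assumptions for illustration.

```python
from typing import List

SEGMENT_LENGTH = 80


def segment_record(record: bytes, width: int = SEGMENT_LENGTH) -> List[bytes]:
    """Cut one ISO 2709 record into fixed-length segments; the last
    segment holds whatever is left and may be shorter."""
    return [record[i:i + width] for i in range(0, len(record), width)]


def write_file(records: List[bytes]) -> bytes:
    """Write each record as a run of segments, one segment per line
    (CR/LF line endings are an assumption)."""
    out = bytearray()
    for record in records:
        for segment in segment_record(record):
            out += segment + b"\r\n"
    return bytes(out)
```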
3.1. Header Section
The header section of an ISO 2709 record has a fixed length of 24 bytes (figure 4).
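A 24-byte header can be assembled as in the sketch below. The position layout (record length, status, implementation codes, indicator and identifier lengths, base address of data, user-system positions, entry map) follows ISO 2709; the placeholder values for status, implementation codes and user positions, and the function name `make_leader`, are assumptions. The entry map "4500" declares a 4-digit field length and a 5-digit starting position in each directory entry.

```python
def make_leader(record_length: int, base_address: int,
                indicator_length: int = 0, identifier_length: int = 0) -> str:
    """Build the fixed 24-character ISO 2709 header (leader)."""
    if not (0 <= record_length <= 99999 and 0 <= base_address <= 99999):
        raise ValueError("lengths must fit in five digits")
    if not (0 <= indicator_length <= 9 and 0 <= identifier_length <= 9):
        raise ValueError("indicator and identifier lengths are single digits")
    return (
        f"{record_length:05d}"      # positions 0-4: record length
        + "0"                       # 5: record status (placeholder)
        + "0000"                    # 6-9: implementation codes (placeholder)
        + f"{indicator_length:d}"   # 10: indicator length
        + f"{identifier_length:d}"  # 11: subfield identifier length
        + f"{base_address:05d}"     # 12-16: base address of data
        + "000"                     # 17-19: for user systems (placeholder)
        + "4500"                    # 20-23: entry map
    )
```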
3.2. Directory Section
The directory section consists of a variable number of fixed-length entries. Each entry gives the field tag, the length and the starting position of one variable-length data field.
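With the "4500" entry map, each directory entry is 12 characters: a 3-digit tag, a 4-digit length and a 5-digit starting position relative to the data section. The sketch below (hypothetical `build_directory`) assumes the ISO 2709 field separator 0x1E; it is kept as a constant so another separator character can be substituted.

```python
from typing import List, Tuple

FIELD_SEPARATOR = b"\x1e"   # ISO 2709 field separator (IS2), assumed here


def build_directory(fields: List[Tuple[int, bytes]]) -> bytes:
    """Build one 12-character entry per field: tag (3 digits), length (4)
    and start position (5). Each length counts the field's own trailing
    separator, and positions are relative to the start of the data section."""
    entries = []
    start = 0
    for tag, data in fields:
        length = len(data) + len(FIELD_SEPARATOR)
        entries.append(f"{tag:03d}{length:04d}{start:05d}")
        start += length
    # The directory itself ends with a field separator.
    return "".join(entries).encode("ascii") + FIELD_SEPARATOR
```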
3.3. Variable Length Data Field
In the data section of an ISO 2709 format file, all data fields are of variable length. Each data field has a corresponding entry in the directory section, so the location of a data field named in the input field description is found by looking up its directory entry. Each data field ends with a separator character, and the record ends with a terminator character.
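Looking up a field through the directory can be sketched as follows. The function `find_fields` is hypothetical; it reads the base address of data from header positions 12-16, walks the 12-character directory entries until the directory's separator, and returns every occurrence of the requested tag. The separator value 0x1E is the ISO 2709 field separator and is an assumption about the files being read.

```python
from typing import List

FIELD_SEPARATOR = 0x1E     # ends the directory and each data field
LEADER_LENGTH = 24
ENTRY_LENGTH = 12          # 3-digit tag + 4-digit length + 5-digit start


def find_fields(record: bytes, tag: int) -> List[bytes]:
    """Return the data of every field with `tag`, in directory order."""
    base = int(record[12:17])          # base address of data (header positions 12-16)
    found = []
    pos = LEADER_LENGTH
    while pos < base and record[pos] != FIELD_SEPARATOR:
        entry = record[pos:pos + ENTRY_LENGTH].decode("ascii")
        entry_tag, length, start = int(entry[:3]), int(entry[3:7]), int(entry[7:12])
        if entry_tag == tag:
            # The stored length includes the field separator; leave it out.
            found.append(record[base + start:base + start + length - 1])
        pos += ENTRY_LENGTH
    return found
```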
3.4. Conversion Principle
In the ISO 2709 format file, duplicate (repeated) fields are produced by entering the same field description more than once in the input description. In the conversion program, each field can have up to five subfields, whose delimiters are predefined in the sequence ^a, ^b, ^c, ^d, ^e. Each data item is given the delimiter that matches its subfield sequence number in the current field, and is then joined to the corresponding field and subfield. The flow chart is shown in figure 6.
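One possible reading of this conversion principle is sketched below; it is an illustration, not the author's algorithm. The input-description format `(DBASE field name, tag, subfield number)` and the function `map_record` are hypothetical. A subfield number of 0 makes the value its own field occurrence, so repeating a tag yields a duplicate field, while numbers 1-5 append the value with the delimiter ^a-^e to a single occurrence of that tag.

```python
from typing import Dict, List, Tuple

DELIMITERS = ["^a", "^b", "^c", "^d", "^e"]   # subfield delimiters, in order

# Hypothetical input description entry: (DBASE field name, ISIS tag, subfield).
# Subfield 0 stores the whole value as its own field occurrence; 1-5 select ^a-^e.
Mapping = List[Tuple[str, int, int]]


def map_record(values: Dict[str, str], mapping: Mapping) -> List[Tuple[int, str]]:
    """Turn one record's trimmed DBASE values into (tag, content) fields."""
    fields: List[Tuple[int, str]] = []
    subfield_owner: Dict[int, int] = {}      # tag -> index of the field collecting its subfields
    for name, tag, sub in mapping:
        if not 0 <= sub <= len(DELIMITERS):
            raise ValueError(f"subfield number {sub} out of range for {name}")
        value = values.get(name, "")
        if not value:
            continue                         # nothing to convert
        if sub == 0:
            fields.append((tag, value))      # same tag again -> repeated field
            continue
        text = DELIMITERS[sub - 1] + value
        if tag in subfield_owner:
            i = subfield_owner[tag]
            fields[i] = (tag, fields[i][1] + text)
        else:
            subfield_owner[tag] = len(fields)
            fields.append((tag, text))
    return fields
```

With two author fields mapped to tag 70 (subfield 0) and a city and publisher mapped to tag 62 as subfields 1 and 2, the result is two occurrences of field 70 and a single field 62 of the form `^aShanghai^bSMU`.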
4. EXAMPLE
The software has successfully converted OHSW.DBF (Occupational Health and Safety), a database created with DBASE III, into an ISO 2709 format file. OHSW.DBF has a record length of 308 bytes and contains 2052 records, a total of more than 620 KB. The conversion took only half an hour, and the converted data file can be loaded into Micro CDS/ISIS.
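As a simple check on these figures (taking 1 KB as 1000 bytes and half an hour as 30 minutes), the file size and the average conversion rate work out as:

$$308 \times 2052 = 632{,}016\ \text{bytes} \approx 632\ \text{KB}, \qquad \frac{2052\ \text{records}}{30\ \text{min}} \approx 68\ \text{records/min}$$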
Information about the DBASE II/III database is displayed on the screen automatically, so users can operate the conversion software conveniently and simply.
With the spread and diversification of computer applications in medical informatics, it is becoming even more important for library and information workers to adopt standard software to build their own databases. The conversion software described here has been well received as a valuable tool for promoting the use of standard software.