QUICK-METRIC: AN INTEGRATED SYSTEM FOR THE METRIC ANALYSIS OF BIBLIOGRAPHIC INFORMATION
 

Alberto Castro Thompson & Salvador Gorbea Portal

University Center for Library Science Research
National Autonomous University of Mexico
04510 Mexico City, Mexico

E-mail: acastro@redvax1.dgsca.unam.mx

Keywords: Bibliometrics, Software, Bibliometric Models, Bibliometric Laws, CD-ROM, Bradford's Law, Information Quality, Metric Studies, Quantitative Analysis, Online Databases, QUICK-METRIC, CLIPPER, Mexico, Integrated System.

Abstract: This paper describes the main functions and structure of a software package specially designed for the quantitative analysis of bibliographic information stored in both online and CD-ROM databases. QUICK-METRIC allows the user to apply bibliometrics easily, in an integrated and automatic way, through three principal modules. It was written in CLIPPER, using the DGE Graphics Library and Graphics Server SDK. The first module is used to design databases and permits entering data directly; it also creates databases from data downloaded from CD-ROMs, converting their structure to DBF format. The second module is used for frequency and statistical analysis, for printed output, and for displaying several variables in cross-tables together with their graphic representation. The third module calculates mathematical models, such as Bradford's law on scattering, and presents the theoretical and methodological content of the bibliometric models used in the program, to introduce librarians to their study. Finally, the paper emphasizes the importance of these studies for the quality control of such databases, as well as the advantages offered by QUICK-METRIC in the quantitative analysis of bibliographic information.

1. INTRODUCTION

Each year there is growing interest in knowing the scientific production generated in the several branches of knowledge. Consequently, studies that identify the properties and regularities of technical and scientific information have increased considerably, based on the analysis of documental information flows, which is a fundamental tool for decision making in information and science policy and an essential part of the social system of scientific communication.

Zakutina and Prijanikova (1983) pointed out that in the study of documental information flows, three kinds of analysis can be done:

• A quantitative analysis,

• A scientific communication regularities analysis, and

• A quantitative structural systemic analysis.

The third kind of analysis addresses the structural behavior of a documental information flow. Studying this behavior requires not only the analysis of each of the variables involved in the study, but also an analysis of the degree of relationship among the manifestations of these variables in the information flow studied. This can only be obtained by processing large volumes of information and by correlating or matching variables, which, without the help of computer techniques, would be practically impossible to achieve with the precision and rigor that such research requires.
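
As a minimal illustration of this kind of variable matching (a sketch in Python, not QUICK-METRIC code; the field names and sample records are hypothetical), the following fragment cross-tabulates two bibliographic variables across a set of records:

```python
# Cross-tabulate two bibliographic variables (here: year and language)
# over a documental information flow. Field names are hypothetical.
from collections import Counter

records = [
    {"year": "1991", "language": "Spanish"},
    {"year": "1991", "language": "English"},
    {"year": "1992", "language": "Spanish"},
    # ... thousands more records in a real study
]

# Count co-occurrences of (year, language) value pairs.
cross = Counter((r["year"], r["language"]) for r in records)

for (year, lang), n in sorted(cross.items()):
    print(f"{year}  {lang:10s} {n}")
```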

However, a review of the relevant subject literature turned up no microcomputer software that facilitates, in a didactic and integral way, the statistical and quantitative analysis of variables and the development of mathematical models, while at the same time producing the charts and graphics that describe the behavior of those models and variables.

There are, nevertheless, some statistical and evaluative studies, including several conducted in the Latin American region. For example, Morales García and Díaz García (1992) evaluated periodical publications, and Araujo Ruíz and Gra Ríos (1991) studied the frequency analysis of bibliographic variables from CD-ROM databases. E. R. Hjerppe (1970) published an early paper on a computer program for identifying significant changes in citation frequency. Other background studies include Brand's work (1980), carried out in partial fulfillment of his master's thesis, and O'Connor's study (1980), all designed for mainframe computers.

Given this lack of microcomputer software, the University Center for Library Science Research of the National Autonomous University of Mexico put forward a research proposal entitled "Theoretical and Methodological Principles of Information Metric Studies," to design software that permits, in an integral way, the quantitative analysis required in the types of studies mentioned above. The objectives of this research proposal are:

• To develop the QUICK-METRIC software and to present its structure, as well as the results obtained through the use of this tool in information metric studies.

• To stress the importance of microcomputers in assessing the use of scientific information. These microcomputer-based techniques can provide the rigor and precision required for any quantitative analysis.

• To lay the foundations of the relationship between information metric studies and the quality control of electronic databases on optical or magnetic media.

To achieve these objectives, the methodology and the tools used in the design of the QUICK-METRIC software are presented, and its structure and the results obtained through its use are described. Finally, the problems encountered in the study are discussed.

2. METHODOLOGY AND TOOLS USED

Generally speaking, the sources for current information metric studies are compiled from databases on either optical or magnetic media. Thus, from a methodological point of view, those who undertake this kind of study face three problems:

• The management of the databases.

• The analysis of the variables selected for the study, through the calculation of the occurrence frequencies of each variable, the matching or correlation of two or more variables, and the graphic output representing their behavior.

• The calculation of the mathematical models that represent the classical laws of bibliometrics, through which the regularities of scientific information are identified. These laws include Bradford's law on the concentration-scattering of information and Lotka's law on author productivity, among others.

With these methodological considerations, it was necessary to set up a system design that would allow one, in an integral way, to carry out studies of information behavior on any desired subject.

To this end, a feasibility study was carried out to determine the programming language best suited to such a solution. In this study, the three languages with the widest market acceptance -- C++, Clipper 5.01 and Pascal -- were considered, with the help of the information included in the reference manuals of the respective languages (Borland, 1991; Nantucket, 1991; Borland, 1992).

In evaluating the selected languages, the following criteria were used:

• Management functions for the creation and modification of databases,

• Compatibility with other languages,

• Language stability in the market,

• Capacity for working in a network environment,

• Graphic environment management,

• Capacity and resolution in the generation of graphics on screen and on printer.

The results of this evaluation are summarized below in matrix form:

[Evaluation matrix: the six criteria above rated for C++, Clipper 5.01, and Pascal]

Guide:

X = Satisfies the mentioned aspect.

A = Requires additional routines to be written for database management.

B = It lacks market standardization.

C = It is not compatible with languages from other producers.

Based on the results of this analysis, Clipper version 5.01 was selected for the following characteristics:

• Maximum number of elements in an array: 4,096.

• It requires a minimum of only 400 KB of memory to run.

• It can manage one billion records per file.

• It can manage up to one thousand fields per record.

• It can manage up to 15 open indexes per database.

• Its programming technique is object oriented.

• It allows unlimited functions and procedures per program.

These characteristics permit programming, with the needed efficiency, of the tasks inherent in database management, as well as the calculation of the selected variables in each database.

To obtain the graphic output, it was necessary to turn to libraries for CLIPPER that provide these functions. Among them are:

• DGE GRAPHICS LIBRARY

• GRAPHICS SERVER SDK (WINDOWS GRAPHICS DLL)

• CHART BUILDER (for Visual Basic)

For generating the graphic environment, the DGE GRAPHICS LIBRARY (Pinnacle Publishing, 1993) was selected, because this library works in the MS-DOS environment and produces graphics according to the requirements of the methods used in the program (log X, log Y, and log XY plots). For future versions, the second option could be selected should the MS-DOS environment be changed to a WINDOWS one.

3. SYSTEM STRUCTURE AND DESCRIPTION

The program structure consists of three modules, which can be accessed through a menu system presented on the computer screen in windows. These commands can be executed by typing the first letter of an option, by moving a highlighted bar onto the option, or by selection with the mouse.

3.1. Structure

• Database manager:

- Creates database structures of its own for entering citation or reference records.

- Enters data into, and modifies data in, an already created database.

- Modifies database structures.

- Converts databases to DBF format (the system's working format) from the output produced by partial searches of databases on optical or magnetic media.

• Variable frequency and statistical calculation:

- Selection of database variables.

- Frequency calculation for one or more variables.

- Statistical calculation on variables.

- Chart output for one or more variables.

- Graphic representation of the output variables. (Both outputs can be generated on the computer screen, in a computer file, or on a printer.)

• Bibliometric model calculation (the standard formulations are sketched after this list):

- Bradford's model (output of the chart of the resulting zones, calculation of the mathematical model, and its graphic representation).

- Lotka's model (output of the author productivity chart, calculation of the mathematical model, its verification with the Kolmogorov-Smirnov statistic, and its graphic representation).

- Price's index (obsolescence analysis).
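
For reference, the usual textbook formulations of these three models are given below; the exact parameterizations implemented in QUICK-METRIC are not spelled out in this paper, so this is only a reminder of the standard forms:

```latex
% Bradford (verbal form): ranking journals by productivity and dividing
% them into three zones that each contain about one third of the
% articles, the numbers of journals per zone grow as
1 : n : n^2

% Bradford (Leimkuhler form): cumulative articles R(r) in the r most
% productive journals grow logarithmically,
R(r) = a \log(1 + b\,r)

% Lotka: fraction of authors with x publications,
f(x) = \frac{C}{x^{\alpha}}, \qquad \alpha \approx 2

% Kolmogorov-Smirnov verification of the Lotka fit: largest gap between
% the observed and theoretical cumulative distributions,
D = \max_{x} \bigl| F_{\mathrm{obs}}(x) - F_{\mathrm{theo}}(x) \bigr|

% Price's index: percentage of references at most five years old,
P = 100 \cdot \frac{\text{references aged} \le 5 \text{ years}}{\text{total references}}
```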

3.2. Functions of the modules that make up the system

• Database management module.

As already pointed out, information metric studies require a large number of bibliographic references, generally compiled from electronic databases or from printed bibliographic repertoires. Yet these databases cannot always be obtained in the format desired for carrying out the calculations. It is therefore necessary to convert the source format, generally ASCII, to the format selected for managing and calculating on the new database.

This system uses the DBF format for managing and calculating on the variables. Because of the universality and acceptance of this format in data management, it can be processed by languages such as C++ or CLIPPER and by such widespread managers as all versions of dBASE and FOXBASE.

Besides converting the output of optical systems such as CD-ROM databases to the format used by the program, this first module can create databases for specific applications, such as citation analysis, obsolescence studies of collections and documents, and the analysis of a particular information flow, for example a library collection.
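
As an illustration of such a conversion (a hedged sketch, not the program's own routine: the two-letter tags, the blank-line record separator, and the field widths are all assumptions about a generic CD-ROM download), the following Python fragment parses a tagged ASCII export and writes a minimal dBASE III .dbf file, the working format described above:

```python
# Convert a tagged ASCII export into a minimal dBASE III (.dbf) file.
# Tag names, record separator, and field widths are hypothetical.
import struct
import datetime

FIELDS = [("AUTHOR", 60), ("TITLE", 80), ("YEAR", 4), ("SOURCE", 60)]
TAGS = {"AU": "AUTHOR", "TI": "TITLE", "PY": "YEAR", "SO": "SOURCE"}

def parse_tagged(text):
    """Split an export into blank-line-separated records of 'XX - value' lines."""
    records = []
    for chunk in text.strip().split("\n\n"):
        rec = {}
        for line in chunk.splitlines():
            tag, _, value = line.partition(" - ")
            if tag.strip() in TAGS:
                rec[TAGS[tag.strip()]] = value.strip()
        if rec:
            records.append(rec)
    return records

def write_dbf(path, records):
    """Write the records as fixed-width character fields in a dBASE III file."""
    today = datetime.date.today()
    reclen = 1 + sum(w for _, w in FIELDS)      # one deletion-flag byte per record
    hdrlen = 32 + 32 * len(FIELDS) + 1          # header + descriptors + 0x0D
    with open(path, "wb") as f:
        # 32-byte file header: version byte, date, record count, sizes.
        f.write(struct.pack("<4BIHH20x", 0x03, today.year - 1900,
                            today.month, today.day, len(records), hdrlen, reclen))
        for name, width in FIELDS:              # one 32-byte descriptor per field
            f.write(struct.pack("<11sc4xBB14x", name.encode(), b"C", width, 0))
        f.write(b"\x0d")                        # end-of-header marker
        for rec in records:
            f.write(b" ")                       # record-active flag
            for name, width in FIELDS:
                value = rec.get(name, "").encode("ascii", "replace")
                f.write(value[:width].ljust(width))
        f.write(b"\x1a")                        # end-of-file marker

write_dbf("study.dbf", parse_tagged("AU - Gorbea Portal, S.\nTI - Ejemplo\nPY - 1993\nSO - Revista X"))
```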

• Module for variable frequency and statistical calculation.

Once the source database for the study is obtained, the second module permits one to calculate the frequencies with which values appear in one or more variables, as well as the statistics required for those variables, such as the mode, the mean, and the median, among others.

In the analysis of documental information flows, the results of greatest interest concern the quantitative behavior of the documental variables attached to the references or citations. These variables are the basic units of measurement in this type of analysis: date, place of publication, publication language, type and number of authors, subject area, and geographical scope of the document contents, among others.

For a better appreciation of the results that define the behavior of the variables, this module provides the calculated output as simple frequency charts. It also shows the graphic representation of these results, which can be displayed on the computer screen, printed, or written to disk.
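
A minimal sketch of these calculations (illustrative Python, not the module's code; the field names and sample values are hypothetical) might look as follows:

```python
# Frequency chart for one variable plus the basic statistics mentioned
# in the text (mode, mean, median). Field names are hypothetical.
from collections import Counter
from statistics import mean, median, mode

records = [
    {"year": 1988, "language": "English"},
    {"year": 1990, "language": "Spanish"},
    {"year": 1990, "language": "English"},
    {"year": 1991, "language": "Spanish"},
]

# Simple frequency chart for one variable.
freq = Counter(r["language"] for r in records)
for value, n in freq.most_common():
    print(f"{value:10s} {n}")

# Basic statistics on a numeric variable (publication year).
years = [r["year"] for r in records]
print("mean:", mean(years), "median:", median(years), "mode:", mode(years))
```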

As a by-product of using this module, one can identify the irregularities present in most databases, which work against the quality of the information these databases contain.

• Module for using the bibliometric models.

The previous studies cited in the Introduction did not contemplate the calculation of the classical bibliometric models in a form that could be reused in successive applications. However, some partial solutions do exist that use electronic spreadsheets, such as SuperCalc 5 (Gorbea-Portal, 1993), for the graphic and chart output.

Through this module, one can obtain output charts and graphics derived from the theoretical postulates, as well as the calculation of the mathematical models postulated by Bradford, Lotka and Price.
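
As an illustration of the first of these calculations (a sketch under the usual three-zone reading of Bradford's law, not the program's own procedure; the productivity counts are invented), the following Python fragment ranks journals by productivity, splits them into zones holding roughly equal numbers of articles, and estimates the Bradford multiplier from the zone sizes:

```python
# Divide ranked journals into Bradford zones of roughly equal article
# counts, then estimate the multiplier n from consecutive zone sizes.
def bradford_zones(articles_per_journal, zones=3):
    """articles_per_journal: one article count per journal."""
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / zones            # articles each zone should hold
    sizes, count, acc = [], 0, 0.0
    for a in ranked:
        count += 1
        acc += a
        if acc >= target and len(sizes) < zones - 1:
            sizes.append(count)             # close this zone
            count, acc = 0, 0.0
    sizes.append(count)                     # remaining journals form the last zone
    return sizes

# Hypothetical productivity counts for 18 journals.
sizes = bradford_zones([28, 19, 12, 8, 6, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1])
print("journals per zone:", sizes)          # e.g. [2, 5, 11], roughly 1 : n : n^2
print("multipliers:", [round(sizes[i + 1] / sizes[i], 2) for i in range(len(sizes) - 1)])
```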

This module also presents the content of the theoretical postulates of each of the models used, through online help.

4. GENERAL CONSIDERATIONS

There is a marked and growing interest in knowing more about the behavior of scientific information production. Knowledge in this area has contributed to an increase in information metric studies, and thus to the development of the theoretical and methodological principles used in this field.

The growing production of optical and other types of databases has facilitated information metric studies.

Analyses of documental information flows compiled from CD-ROM databases are more time consuming because of the density and volume of information that optical technology can store.

For software designed for the quantitative analysis of documental information flows to be efficient, it must satisfy the following three requirements, which, from a methodological point of view, correspond to the methodology of this study:

• Facility for managing databases, whether created by the system itself or imported and converted from another system or from a CD-ROM database.

• Capability for selecting variables and for statistical calculation, as well as for graphic representation.

• Capacity for calculating the mathematical models that represent the theoretical postulates of the classical laws of bibliometrics, and presenting graphic results.

The software described in this study satisfies the above requirements and also constitutes a didactic model for beginners in this kind of study, because it includes a definition of the theoretical postulates of each of the models used by the system, which the user can consult whenever necessary.

Information metric studies facilitate the quality control of databases. Errors introduced during data entry, such as lack of uniformity in author entries, source titles, places of publication, publishers, and descriptors, among others, can be identified.
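
As a small illustration of this kind of quality control (a sketch, not part of QUICK-METRIC; the sample entries are invented), the following Python fragment groups author entries that are probably the same heading keyed inconsistently:

```python
# Detect variant spellings of what is probably the same author entry by
# comparing accent-, case-, and punctuation-insensitive keys.
from collections import defaultdict
import unicodedata

def norm(name):
    """Normalize an entry: strip accents, case, punctuation, and spacing."""
    s = unicodedata.normalize("NFKD", name.replace("-", " "))
    s = "".join(c for c in s if not unicodedata.combining(c))
    return " ".join(s.upper().replace(".", " ").replace(",", " ").split())

entries = ["Gorbea Portal, S.", "GORBEA PORTAL S", "Gorbea-Portal, S.",
           "Castro Thompson, A."]

groups = defaultdict(list)
for e in entries:
    groups[norm(e)].append(e)

for key, variants in groups.items():
    if len(variants) > 1:                   # an inconsistency to review
        print(key, "->", variants)
```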
 
 

REFERENCES

Araujo Ruíz, J. A. and R. Gra Ríos. "Paquete de programas para el análisis informétrico automatizado." Investigación Bibliotecológica (México) 5 (11): 50-52 (julio-diciembre 1991).

Borland. C++ Ver. 2.0: Programmer's Guide. California: Borland, 1991. 444 p.

Borland. Pascal Ver. 7.0: With Objects Language Guide. California: Borland, 1992. 371 p.

Brand, S. Computer Programs for Analysis of Bibliometric Distributions. MSc thesis, City University, London, 1980. 123 p.

CLIPPER Ver. 5.0: Reference. Los Angeles: Nantucket, 1993.

Gorbea Portal, S. Concentración-dispersión de la información sobre bibliotecología, archivología y ciencia de la información relativa a América Latina. Paper presented at the XXIV Jornadas Mexicanas de Biblioteconomía, September 13-15, 1993, Guadalajara, Jalisco, México. 34 p.

Hjerppe, E. R. A Computer Program for the Identification of Significant Changes in Citation Frequency. [s.l.]: Inst. Tech. Dept. Inf. Proc., May 1970. 35 p.

Morales García, Ana María and Díaz García, Alberto. "EVASOF: Sistema automatizado para determinar la idoneidad de las publicaciones seriadas sobre la base del método de RSM." Ciencias de la Información (La Habana) 23 (4): 273-277 (diciembre 1992).

O'Connor, J. "Citing statements: recognition by computer and use to improve retrieval." Proceedings of the American Society for Information Science 17: 177-179 (1980).

Pinnacle Publishing. dge: A Graphics Library, Ver. 5.0. Technical Reference. Washington, DC: Pinnacle Publishing, 1993. 397 p.

Zakutina, G. P. and Prijanikova, V. K. Característica y análisis del flujo de los documentos primarios. La Habana: IDICT, [1983]. 83 p.