TERMINOLOGY AND NEURAL NETWORK SYSTEM: THE EXPERIENCE OF THE BRAIN AND NLRI PROJECTS

Yutaka Okaya

Library, Tokyo University of Agriculture and Technology, Japan

Norikazu Ohtake

Bio-oriented Technology Research Advancement Institution
Ohmiyashi, Saitama prefecture, Japan

Toshihiro Arai

Nikkei Information Systems Co., Ltd., Tokyo, Japan

Keywords: Terminology, Neural Network System, Neural Network, SAVVY, National Language Research Institute, NLRI, Pattern Processing, APRP, Indexing Language, Artificial Intelligence, AI, Adaptive Pattern Recognition Processor, Japan, Oxford Concordance Program, OCP, INTERCONCEPT.

Abstract: This paper briefly describes the current major activities and products of two projects (BRAIN and NLRI) from the viewpoint of Terminology and neural network systems: 1) Terminology needs to be applied to indexing, especially in specialized fields, for better retrieval of information; 2) conversely, indexing languages are necessary for Terminology, especially in the case of general vocabularies; 3) a neural network system is effective for fast retrieval of information and also for inductive searching; 4) a neural network system that operates with low-level intelligence, such as SAVVY, does not have the capacity for semantic searching; 5) for flexible searching it is better to combine Terminology with a neural network system.

1 INTRODUCTION

Recently, library and information science has faced drastic change due to the following developments:

• An explosion of information,

• The expansion of interdisciplinary and new scientific fields,

• Internationalization,

• Advances in technology, especially computer facilities.

AI (Artificial Intelligence), which originated in the 1960s, has since broken down.

To recover the initial promise of AI, i.e. fast, rigorous and inductive retrieval of information, it is necessary to construct proper relationships between concepts and also between concepts and terms.

Against this background, Terminology has recently attracted attention. Terminology was originated by Dr. Eugen Wuster (Austria) as an interdisciplinary or boundary (Grenzgebiet) science built on logic, ontology, linguistics, information science and other sciences. Terminology has now entered its next stage, i.e. the construction of Terminology Databanks, a Terminology Network (TermNet) or AI-oriented systems. Computers are powerful not only in fast processing of large amounts of information but also in inductive, i.e. AI-oriented, processing. For this purpose, logic programming such as Prolog and neural network systems are being promoted; the latter are built on learning processes, self-organizing networks and non-linear mathematics.

We have recently been involved in two projects, BRAIN (Bio-oriented Technology Research Advancement Institution) and NLRI (National Language Research Institute), to update their classification or vocabulary systems. Theoretically we apply Terminology, and technically we use a neural network system, to analyze the concepts or terms and to test the results by computer.

2 BRIEF SURVEY OF LITERATURE

Basic problems of traditional classification systems include, for example, non-rigorous or out-of-date ordering of concepts (Mills, 1960). Several attempts have been made to resolve these problems, including faceted classification (Cleverdon, 1967) and switching between classification tables, i.e. BSO (Dahlberg, 1982). It is well known that classification systems and keyword systems are complementary to each other. Basic problems of the thesaurus are the criteria for BT (Broader Term), NT (Narrower Term) and especially RT (Related Term). In general a thesaurus is oriented to a specialized field of science. To resolve these problems, some research has been conducted: the Macro thesaurus (Soergel, 1974) and the Root thesaurus (Aitchison, 1969).
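The BT/NT/RT relations mentioned above can be pictured as a small data structure. The following is a minimal sketch, not taken from any of the cited thesauri: the class name `Thesaurus`, its methods and the sample agricultural terms are hypothetical and chosen only for illustration. It keeps BT and NT reciprocal and RT symmetric, and shows how NT links can broaden a search.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding BT, NT and RT links (hypothetical structure)."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> broader terms
        self.nt = defaultdict(set)  # term -> narrower terms
        self.rt = defaultdict(set)  # term -> related terms

    def add_broader(self, term, broader):
        # BT and NT are reciprocal: recording one implies the other.
        self.bt[term].add(broader)
        self.nt[broader].add(term)

    def add_related(self, a, b):
        # RT is symmetric.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def expand(self, term):
        """Return the term plus all of its narrower terms, at any depth."""
        found, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(self.nt[current])
        return found


# Hypothetical sample entries.
thesaurus = Thesaurus()
thesaurus.add_broader("combine harvester", "harvesting machinery")
thesaurus.add_broader("harvesting machinery", "agricultural machinery")
thesaurus.add_related("combine harvester", "threshing")
```

A query on "agricultural machinery" expanded through `expand` would also cover its narrower terms; how RT links should be used in searching is exactly the kind of criterion the text identifies as problematic, so the sketch leaves them passive.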

To overcome the problems of indexing languages, Wuster originated Terminology (Wuster, 1972). Its main models are the Four Field Word Model and Concept Tables, based on logic, ontology, information science, linguistics and other sciences. Terminology work is conducted on the basis of these theories (Felber, 1984). Concepts and terms are now increasing explosively, so it is necessary to construct Terminology Databanks or a Terminology Network (TermNet) worldwide (CEC, 1984; Madsen, 1981; Nedobity, 1984). Neural network systems grew out of research in brain and neuron science (Minsky, 1969). Their basic theories are learning processes, self-organization of neural networks and non-linear mathematical formulas.

Hopfield and Rumelhart developed network models of this kind, including the back-propagation type (Hopfield, 1982; Rumelhart, 1986). On the other hand, the Excaliber Technologies system, SAVVY, adopts statistical or Boolean pattern matching. Each type has advantages as well as disadvantages. For example, the SAVVY type retrieves information very quickly but has no capacity for semantic searching (Excaliber Technologies, 1991).

3 MAJOR PROJECTS' RESULTS

3.1. Framework

3.1.1. Term Model

Based on Wuster's Four Field Word Model, the concept-word relationship can be established as shown in Figure 1. Based on a Concept Table such as that shown in Figure 2, a concept can be analyzed from the viewpoint of logical and ontological relationships (Wuster, 1971, 1974; Ozeki, 1989). Using these models, some indexing languages such as UDC and the OECD thesaurus have been criticized as inadequate (Felber, 1982; Nedobity, 1983).






3.1.2. SAVVY

SAVVY is a pattern-matching type of neural network system developed by Excaliber Technologies. Its main processor is the APRP (Adaptive Pattern Recognition Processor), as shown in Figure 3. Its main characteristics are that it is based on Boolean pattern matching, non-semantic and pragmatic (high speed and able to interface with conventional computers). SAVVY runs on the VAX series, NEWS and the SUN series. On a MicroVAX, searching 13 Mbytes of data takes only a few seconds, and learning takes 12 minutes per Mbyte.
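The internals of the APRP are not described here, so the following is only an illustrative stand-in for Boolean pattern matching, not SAVVY's actual algorithm. It assumes that each word is reduced to the set of its adjacent letter pairs (a binary pattern: a pair is either present or absent) and that two patterns are compared by their overlap. The function names `letter_pairs` and `pattern_overlap` are hypothetical.

```python
def letter_pairs(word):
    """Reduce a word to the set of its adjacent letter pairs (a binary pattern)."""
    word = word.lower()
    return {word[i:i + 2] for i in range(len(word) - 1)}


def pattern_overlap(a, b):
    """Share of letter pairs common to both words, between 0.0 and 1.0."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    if not pairs_a or not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


# Spelling variants overlap heavily; unrelated spellings do not.
print(pattern_overlap("tractor", "tractors"))
print(pattern_overlap("tractor", "plough"))
```

Under this scheme "tractor" and "tractors" share most of their pattern, while "tractor" and "plough" share none even though both name farm equipment. Set operations of this kind are cheap, which is consistent with the speed claimed for pattern matching, and they carry no information about meaning.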

3.2. Major Results

We use BRAIN and NLRI data on floppy disks, first to analyze the concepts or words of each classification table or index and then to test the results against the original data. The original data can include original text such as catalogues, books or newspapers.





3.2.1. BRAIN

BRAIN is a sub-governmental institute for developing, collecting and testing agricultural machinery, and is also an OECD testing station. For this purpose, BRAIN must store information from broad fields of science and technology, such as agricultural science, mechanical engineering and related fields, and then supply this information to many users (BRAIN, 1990).

a) Objectives. BRAIN collects Japanese and foreign catalogues and books. Using a PC and Rbase 5 (a relational database), information is compiled and transmitted to users via a personal computer network. For this purpose, BRAIN is now supplementing its own special classification table.

b) Analysis by Terminology. In the classification system of BRAIN, concepts are constructed from pairs of an object (such as a crop or domestic animal) and an action (such as harvesting or tilling). As agricultural work is conducted systematically in relation to the working environment, the classification table must order words (concepts) systematically.
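One way to picture the object-action pairing is as a two-facet table. The sketch below is hypothetical: the facet values, their order and the notation (`A1`, `B2` and so on) are invented for illustration and are not BRAIN's actual classification table. It assumes that actions are listed in the order in which the work is carried out, so that the generated classes follow the working sequence.

```python
from itertools import product

# Hypothetical facets; actions are listed in working order.
OBJECTS = ["rice", "wheat", "soybean"]
ACTIONS = ["tilling", "planting", "harvesting"]


def build_classification(objects, actions):
    """Map a notation such as 'A1' to an (object, action) concept."""
    table = {}
    for (i, obj), (j, act) in product(enumerate(objects), enumerate(actions, start=1)):
        notation = f"{chr(ord('A') + i)}{j}"  # letter = object, digit = action
        table[notation] = (obj, act)
    return table


classification = build_classification(OBJECTS, ACTIONS)
for notation, (obj, act) in classification.items():
    print(notation, obj, act)
```

Because the table is generated from the two facets, adding a new object or action extends every row or column consistently, which is one way of keeping the ordering systematic.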

We supplement the classification table from logical and ontological viewpoints, as shown in Figure 4. A PC network user has informed us that terminological control is necessary for better information retrieval. As to orthography, we also control synonyms and loan words across their kanji, hiragana and katakana spellings.
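Orthographic control of this kind can be sketched as a normalization step applied before indexing and searching. The example below is an illustration under stated assumptions, not the procedure used in the project: the table `PREFERRED` and its two entries are hypothetical. It relies only on the fact that katakana and hiragana letters occupy parallel Unicode ranges offset by 0x60, so a katakana spelling can be folded to hiragana and then mapped to a preferred form.

```python
# Hypothetical preferred-term table: hiragana reading -> preferred spelling.
PREFERRED = {
    "いね": "稲",                # rice plant, kanji preferred
    "とらくたー": "トラクター",  # tractor, loan word kept in katakana
}


def katakana_to_hiragana(text):
    """Fold katakana letters (U+30A1..U+30F6) to hiragana; leave other characters."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def normalize(term):
    """Return the preferred spelling of a term, or the term itself if unknown."""
    return PREFERRED.get(katakana_to_hiragana(term), term)


for spelling in ["イネ", "いね", "稲", "トラクター"]:
    print(spelling, "->", normalize(spelling))
```

With such a table, a search for any of the variant spellings reaches the same index entry. Kanji variants and true synonyms would need explicit entries of their own, since they cannot be derived from code points.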






c) Testing by SAVVY. As the total volume of current data (files of Japanese and foreign books and catalogues) amounts to 600 KB, learning times are very short and searching is practically instantaneous, as shown in Figure 5. Using the similarity scale, synonymous words are retrieved at once (see Fig. 6). Even though SAVVY matches an English word letter by letter, noise still occurs in information retrieval (see Fig. 7).
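The noise effect can be reproduced with any purely letter-based similarity measure. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in (it is not SAVVY's similarity scale), with a hypothetical vocabulary and an arbitrary cutoff of 0.6.

```python
from difflib import get_close_matches

# Hypothetical index vocabulary.
VOCABULARY = ["harvester", "harvesting", "harness", "tillage"]


def similar_terms(query, vocabulary, cutoff=0.6):
    """Return vocabulary words whose letter similarity to the query is >= cutoff."""
    return get_close_matches(query, vocabulary, n=len(vocabulary), cutoff=cutoff)


hits = similar_terms("harvest", VOCABULARY)
print(hits)
```

Here "harness" passes the cutoff alongside "harvester" and "harvesting" although it is semantically unrelated, while "tillage" is rejected. Raising the cutoff trades such noise against missed variants, which is the limitation that motivates combining letter-level matching with terminological control.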






3.2.2. NLRI

NLRI is an organization affiliated with Bunkachoo (the Agency for Cultural Affairs), established in 1948. It has many research departments, such as language system, language behavior, language education and computer system.

a) Objectives. The project to supplement the Bunrui Goihyoo [Word List by Semantic Principles] started in 1948, assisted by a government subsidy for scientific research (NLRI, 1964). It is monitored by one of the authors, Okaya. After the addition of multi-meaning words, antonyms, synonyms and others, the total number of words amounts to over 60,000. This Word List is controlled by NLRI.

b) Analysis by Terminology. This Word List is uniquely constructed from groups classified by noun, verb, adjective, conjunction and so on, with the aspects of abstract relations, human activities, products and nature, and includes an index, as shown in Figure 8. Compared with other classification tables or thesauri, the ordering and grouping of concepts in the list are considered more desirable. The list includes not only technical terms but also general words and usages. Thus considerable research into the linguistic aspects of the list is necessary.

This list is supplemented partly from the viewpoints of logic, ontology, linguistics and indexing.
 
 

Figure 8. Part of the Word List by Semantic Principles of NLRI (subject is "agriculture")

c) Testing by SAVVY. Learning time for the list is about half an hour for 1.5 MB of data. Searching time is only 2 seconds. It is possible to make an index at high speed by BRAIN. Other statistical data on the use of BRAIN and NLRI are given in Table 1.
 
 

Table 1. Statistical data of BRAIN and NLRI

_______________________________________________________________________________

                BRAIN                             NLRI
_______________________________________________________________________________

Data Items      4,933 items (Japanese catalog)    60,784 words (running 1991.6)
                2,200 items (foreign books)       36,559 words (original)

Data Volume     261 KB (Japanese catalog)         1.5 MB
                214 KB (foreign books)
                176 KB (Japanese books)

Learning Time   10 minutes                        30 minutes

Search Time     0.5 second                        2 seconds
_______________________________________________________________________________





4 CONCLUSIONS

In these two projects we first analyzed words (vocabularies) using Terminology and then tested the results with a neural network system. The major results are as follows:

• It is necessary to apply Terminology to indexing languages in specialized fields.

• In the case of general words, Terminology, indexing languages and linguistics are complementary.

• A neural network system such as SAVVY is effective for fast retrieval of information and for inductive searching.

• The SAVVY type, i.e. the pattern-matching type of neural network, has little power for semantic retrieval.

• For more flexible retrieval, Terminology and a neural network system should be combined.

Our next step is to construct a total information retrieval system using both Terminology and a neural network system. As to Terminology, we plan to relate it to citation analysis and full-text database analysis using the OCP software, i.e. the Oxford Concordance Program (Okaya, 1990), and INTERCONCEPT of Unesco. As to neural network systems, we plan to research the back-propagation type, Prolog and, further, PDP (Parallel Distributed Processing).
 
 

ACKNOWLEDGMENTS

We would like to thank BRAIN, NLRI, NIS Co. and Mr. T. J. Fotos.
 
 

REFERENCES

Aitchison, J. Thesaurofacet: A Thesaurus and Faceted Classification for Engineering and Related Subjects. English Electric Co., 1969.

Bio-oriented Technology Research Advancement Institution. Prospects of the Institute of Agricultural Machinery. Ohmiyashi, Japan: BRAIN, 1990.

Commission of the European Communities. ESPRIT. Brussels: EEC, 1984.

Cleverdon, C., "The Cranfield Tests on Index Language Devices," Aslib Proceedings, 19: 173-194 (1967).

Dahlberg, I., "The Broad System of Ordering (BSO) as a Basis for an Integrated Social Science Thesaurus?" International Classification, 7: 66-72 (1982).

Excaliber Technologies, "Excaliber's PixTex: A Retrieval Alternative," Seybold, 20 (13): 1-3 (1991).

Felber, H., "UDC and Terminology: A Comparison of Their Classification," in Proceedings of the 41st FID Congress, Hong Kong, 1982.

Felber, H. Terminology Manual. Paris: Unesco/Infoterm, 1984.

Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proceedings of the National Academy of Sciences, USA, 79: 2554-2558 (1982).

Madsen, B. N., "Danterm," Sprint, 12: 29-37 (1981).

Minsky, M. and Papert, S. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.

Mills, J. A Modern Outline of Library Classification. Chapman & Hall, 1960.

National Language Research Institute. Bunrui Goihyoo [Word List by Semantic Principles]. Shuuei Shuppan, Tokyo: NLRI, 1964.

Nedobity, W., "Terminology and Its Application to Classification, Indexing and Abstracting," Unesco Journal of Information Science, Librarianship and Archives Administration, 5 (4): 227-234 (1983).

Nedobity, W. Die Bedeutung der Systematischen Terminologiearbeit fur den Aufbau von Wissenschaften und Anderen Expertensystemen. Wien: Infoterm, 1984.

Okaya, Y., "Computer and Reading," presented at the 19th World Congress of Reading, Stockholm, 1990.

Ozeki, S., "Was ist der Begriff? (What is Concept?)" in Terminology and Knowledge Engineering, ed. by H. Czap and C. Galinski. Frankfurt: Index Verlag, 1989.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J., "Learning Representations by Back-Propagating Errors," Nature, 323: 533-536 (1986).

Soergel, D. Indexing Languages and Thesauri: Construction and Maintenance. 1974.

Wuster, E., "Begriffs- und Themaklassifikation. Unterschiede in ihrem Wesen und in ihrer Anwendung," Nachrichten fur Dokumentation, 22 (3): 98-104 (1971).

Wuster, E., "Die Allgemeine Terminologielehre. Ein Grenzgebiet zwischen Sprachwissenschaft," in Proceedings of the 3rd Congress of AILA, Copenhagen, 1972.

Wuster, E., "Die Umkehrung einer Begriffsbeziehung und ihre Kennzeichnung in Worterbuchern," Nachrichten fur Dokumentation, 25 (6): 256-262 (1974).