Johann Gamper and Wolfgang Nejdl and Martin
European Academy Bolzano/Bozen
Scientific Area ``Language and Law''
Weggensteinstr. 12a, 39100 Bozen, Italy
phone: +39 0471 306114, fax: +39 0471 306199
Institut für Rechnergestützte Wissensverarbeitung
University of Hannover
Lange Laube 3, 30159 Hannover, Germany
phone: +49 511 762 9710, fax: +49 511 762 9712
April 7, 1999
Humans usually communicate and exchange knowledge in natural language. To reduce ambiguity, the field of terminology has been concerned with communication in specific domains using specialized and/or formal vocabularies.
In computer science, and especially in artificial intelligence, knowledge management and transfer based on formal models/ontologies have been studied in various contexts. To communicate about (a part of) the real world, the knowledge about this part has to be represented by some kind of knowledge model on which all participating agents agree. The main purpose of these models is to facilitate effective communication between agents by providing a concise and unambiguous representation of the knowledge to be transferred. Two agents, whether humans or artifacts such as computer systems, are thus able to communicate and possibly argue about the part of the world described by the model. Such knowledge models for a specific domain are often called domain models.
Two types of knowledge models (or domain models) are favored today: ontologies and terminologies [Guarino and Giaretta, 1995]. While ontologies range from simple taxonomies based on inheritance and part-whole knowledge to complex concept systems including common sense knowledge, the scope of terminologies is restricted to natural language definitions for specific domains.
This paper explores the common interests as well as the differences between ontologies and terminologies and their corresponding research fields. The next two sections briefly introduce each discipline. In section 4 we first compare ontologies and terminologies, and then use an example to show and discuss the advantages of representing terminological knowledge as an ontology in a hyperbook system.
Ontology in the philosophical sense is a systematic account of existence as perceived by humans. In the last decade the term has been borrowed by artificial intelligence researchers, who use ontology in the sense of an engineering artifact that specifies what exists for an artificial agent. While several slightly different definitions exist, we refer to the most frequently cited definition of an AI ontology, provided by [Gruber, 1993]:
An ontology is an explicit specification of a conceptualization.
A few remarks are worthwhile. First, the basis for an ontology is a conceptualization. A conceptualization, along with a vocabulary to refer to the entities in it, is the result of an ontological analysis of a particular domain. The conceptualization consists of the identified concepts (objects, events, states of affairs, beliefs, etc.) and conceptual relationships that are assumed to exist and to be relevant. For example, an ontological analysis in the medical field yields concepts like ``disease'', ``symptom'', ``therapy'' and relationships like ``disease causes symptom'' and ``therapy treats disease''. Second, an ontology specifies a conceptualization explicitly in a formal language, usually as a first-order theory. Hence, the intended meaning of the vocabulary is formally defined. An explicit and formal definition enables artificial agents to reason and to infer new knowledge about the specified part of the world, and it facilitates the construction and maintenance of ontologies and knowledge bases.
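To make the medical example concrete, the following Python sketch encodes the three concepts and two relationships named above as plain data, checks facts against the relation signatures, and derives a new fact with one rule. It is only an illustration under stated assumptions: the instances (`influenza`, `fever`, `bed_rest`) and the rule deriving `relieves` are hypothetical, and a real ontology would state such axioms in a logical language rather than in Python.

```python
# Concepts of the medical conceptualization named in the text.
CONCEPTS = {"Disease", "Symptom", "Therapy"}

# Relationships from the text as signatures: name -> (domain, range).
RELATIONS = {
    "causes": ("Disease", "Symptom"),
    "treats": ("Therapy", "Disease"),
}

# Hypothetical instances, each classified under one concept.
INSTANCES = {"influenza": "Disease", "fever": "Symptom", "bed_rest": "Therapy"}

# The vocabulary may only use declared concepts.
assert all(d in CONCEPTS and r in CONCEPTS for d, r in RELATIONS.values())
assert set(INSTANCES.values()) <= CONCEPTS


def well_typed(relation, subject, obj):
    """True if relation(subject, obj) respects the declared signature."""
    if relation not in RELATIONS:
        return False
    domain, range_ = RELATIONS[relation]
    return INSTANCES.get(subject) == domain and INSTANCES.get(obj) == range_


def infer_relieves(facts):
    """Hypothetical axiom: treats(t, d) and causes(d, s) imply relieves(t, s)."""
    derived = set()
    for rel1, therapy, disease in facts:
        if rel1 != "treats":
            continue
        for rel2, cause, symptom in facts:
            if rel2 == "causes" and cause == disease:
                derived.add(("relieves", therapy, symptom))
    return derived


facts = {("causes", "influenza", "fever"), ("treats", "bed_rest", "influenza")}
assert all(well_typed(*fact) for fact in facts)
print(infer_relieves(facts))  # {('relieves', 'bed_rest', 'fever')}
```

Because the signatures are explicit, an ill-typed fact such as `causes(fever, influenza)` is rejected mechanically rather than left to a human reader.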
In [van Heijst et al., 1997] ontologies are classified along two dimensions: the subject and the structure of a conceptualization. With respect to the subject of the conceptualization, application ontologies, domain ontologies, generic ontologies and representation ontologies are distinguished. Domain ontologies are the most notable examples and provide a view of a particular domain such as medicine or law. Generic ontologies specify very general concepts such as time and space. Representation ontologies specify knowledge representation formalisms, e.g. the frame ontology used in Ontolingua [Gruber, 1993]. Other categories along this dimension are possible. An ontology of problem solving tasks and methods [Chandrasekaran et al., 1998] specifies entities relevant to problem solving such as abduction, deduction, goal, observation, etc. Yet another example is a presentation ontology which specifies entities for the presentation of knowledge/data.
Along the second dimension, the structure of the conceptualization, ontologies range from simple taxonomies to highly tangled networks including axioms associated with concepts and relations. In [van Heijst et al., 1997] three categories of increasing complexity are distinguished: terminological ontologies such as lexicons and taxonomies, information ontologies which specify the record structure of databases, and knowledge modeling ontologies which specify conceptualizations of knowledge. One indicator of the complexity of an ontology is its set of conceptual relationships. Specialization/generalization is the most basic relation and should be included in any ontology. This relation orders concepts hierarchically in a taxonomy and allows inheritance mechanisms to be applied for a concise and efficient representation. Another standard relation is part-of. Most ontologies, such as WordNet [Fellbaum, 1998], have a predefined set of relations, while others define the set of relations explicitly in the ontology itself [Mahesh and Nirenburg, 1995]. A second indicator of complexity is the granularity of the concepts. Some ontologies are language-dependent and use words as conceptual primitives, while others choose more complex situations/events as their basic building blocks.
Other dimensions for a classification of ontologies have been proposed. In [Fox and Gruninger, 1998] an ontology is defined as a vocabulary plus a specification of the meaning of this vocabulary. This view makes it possible to distinguish ontologies by the degree of formality with which the meaning is specified. Informal ontologies use natural language, semiformal ontologies provide weak axiomatizations such as taxonomies, and formal ontologies define the semantics of the vocabulary by a complete and sound axiomatization.
There is general agreement that every natural language processing application that seeks to represent and manipulate the meaning of texts needs an ontology that serves as a semantic lexicon. Such ontologies are used to represent text meaning in a language-independent form as well as to resolve ambiguities. Important applications in this area are machine translation and natural language query formulation. Examples of ontologies developed mainly for natural language processing include WordNet [Fellbaum, 1998], Mikrokosmos [Mahesh and Nirenburg, 1995], and SENSUS [Knight and Luk, 1994].
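The specialization/generalization relation and its inheritance mechanism can be sketched in a few lines of Python. In this sketch each concept stores only the attributes it declares itself and inherits the rest from its more general concepts, with more specific declarations overriding inherited ones. The concept and attribute names are hypothetical examples, and overriding is only one possible inheritance policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None  # is-a link to the more general concept
    attributes: dict = field(default_factory=dict)  # locally declared attributes

    def ancestors(self):
        """Yield the more general concepts, nearest first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def all_attributes(self):
        """Merge attributes from the root down, so specific ones override general ones."""
        merged = {}
        for concept in reversed([self, *self.ancestors()]):
            merged.update(concept.attributes)
        return merged


# Hypothetical three-level taxonomy.
disease = Concept("Disease", attributes={"has_symptom": "Symptom"})
infectious = Concept("InfectiousDisease", disease, {"has_pathogen": "Pathogen"})
influenza = Concept("Influenza", infectious, {"has_pathogen": "InfluenzaVirus"})

print([c.name for c in influenza.ancestors()])  # ['InfectiousDisease', 'Disease']
print(influenza.all_attributes())
# {'has_symptom': 'Symptom', 'has_pathogen': 'InfluenzaVirus'}
```

`Influenza` declares only one attribute yet answers for two, which is the concise representation the taxonomy buys.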
There are several ways to enhance information retrieval with a lexical ontology such as WordNet. Most simply, an ontology makes it possible to expand a query with synonyms, hyponyms, and/or semantically related concepts. Another approach is to match the content of queries and documents, which requires indexing documents and queries with concepts rather than words. A combination of both approaches indexes queries and documents by words but measures the similarity between them in terms of the semantic similarity between the underlying concepts. An example in this field is OntoSeek [Guarino et al., 1998], a system designed especially for retrieving resource descriptions (descriptions of products, capabilities, etc.) from the WWW. OntoSeek uses the SENSUS ontology to represent both user queries and resource descriptions. FindUR [McGuinness, 1998] at AT&T is another system for knowledge-enhanced search, which explicitly uses background knowledge organized in ontologies. It has been shown that FindUR considerably increases recall and precision, and that it makes query formulation easier. As a final example we mention the (KA)² initiative [Benjamins and Fensel, 1998], which focuses on the distributed development of an ontology that models the knowledge acquisition community (its researchers, topics, products, etc.). The resulting ontology is mainly intended to improve the retrieval of relevant information from the WWW.
There are many different approaches that use ontologies in database and information systems, e.g. [Benjamins et al., 1998,Bergamaschi et al., 1998,Wiederhold and Genesereth, 1997]. The main issue in this field is the technology of so-called mediation services [Wiederhold and Genesereth, 1997]. Mediators link data resources and application programs by accessing and retrieving relevant data from multiple heterogeneous resources, transforming these data into a common representation, and passing them on to application programs. This application area also includes knowledge management in enterprises [Fox and Gruninger, 1998] and the emerging technology of organizational memories, where ontologies play a crucial role in structuring, sharing, and reusing knowledge [Abecker et al., 1998].
In knowledge engineering, ontologies make it possible to share and reuse knowledge across different applications and domains, which can reduce development costs. Moreover, knowledge-based systems built on ontologies are easier to integrate into distributed environments, e.g. when integrating decision support systems into medical record systems. As an example we refer to [Swartout et al., 1996], where the SENSUS ontology was used to build a new domain ontology. SENSUS serves as a broad-coverage skeletal ontology to which domain-specific terms are linked. The advantages are that building the new ontology is less expensive than building it from scratch, and that ontologies built in this way share the same structure.
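The first and third strategies can be sketched with a toy lexical ontology. The Python sketch below expands a query with hyponyms and synonyms and computes a simple path-length similarity between concepts in the hypernym hierarchy. The vocabulary is a hypothetical stand-in for WordNet data, and the similarity measure is a generic choice, not the measure used by OntoSeek or FindUR.

```python
# Hypothetical miniature lexical ontology (a stand-in for WordNet-like data).
SYNONYMS = {"car": {"automobile", "motorcar"}, "vehicle": {"conveyance"}}
HYPONYMS = {"vehicle": {"car", "truck"}, "car": {"cab", "convertible"}}
# Invert the hyponym links to get each term's direct hypernym.
HYPERNYM = {child: parent for parent, kids in HYPONYMS.items() for child in kids}


def expand_query(terms, depth=1):
    """Add hyponyms up to `depth` levels below each term, then all synonyms."""
    expanded, frontier = set(terms), set(terms)
    for _ in range(depth):
        frontier = {h for t in frontier for h in HYPONYMS.get(t, set())}
        expanded |= frontier
    for term in list(expanded):
        expanded |= SYNONYMS.get(term, set())
    return expanded


def path_to_root(term):
    """The term followed by its chain of hypernyms."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def similarity(a, b):
    """1 / (1 + number of edges between a and b via their lowest common hypernym)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    common = [t for t in path_a if t in path_b]
    if not common:
        return 0.0
    lca = common[0]
    return 1.0 / (1 + path_a.index(lca) + path_b.index(lca))


print(sorted(expand_query({"vehicle"})))
# ['automobile', 'car', 'conveyance', 'motorcar', 'truck', 'vehicle']
print(similarity("cab", "convertible"), similarity("cab", "truck"))
# 0.3333333333333333 0.25
```

Two kinds of cab are judged closer than a cab and a truck because their common hypernym lies lower in the hierarchy.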
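The mediation idea can be illustrated with a minimal Python sketch: two wrappers translate differently structured sources into a common representation keyed by a shared ontology concept, and a mediator answers a query by concept over all sources. The sources, field names, and the concept label `LimitedCompany` are hypothetical; real mediators involve considerably more (for example query planning and conflict resolution), which is omitted here.

```python
# Two hypothetical heterogeneous sources describing companies.
SOURCE_A = [{"firm": "Alpha GmbH", "kind": "GmbH", "seat": "Bozen"}]  # records as dicts
SOURCE_B = [("Beta s.r.l.", "s.r.l.", "Meran")]                       # records as tuples

# Hypothetical mapping from local type codes to a shared ontology concept.
TYPE_TO_CONCEPT = {"GmbH": "LimitedCompany", "s.r.l.": "LimitedCompany"}


def wrap_a(rows):
    """Translate source A records into the common representation."""
    for row in rows:
        yield {"name": row["firm"], "concept": TYPE_TO_CONCEPT[row["kind"]],
               "location": row["seat"]}


def wrap_b(rows):
    """Translate source B records into the common representation."""
    for name, code, town in rows:
        yield {"name": name, "concept": TYPE_TO_CONCEPT[code], "location": town}


def mediate(concept, sources):
    """Answer a query for one shared concept across all wrapped sources."""
    return [record
            for wrapper, rows in sources
            for record in wrapper(rows)
            if record["concept"] == concept]


results = mediate("LimitedCompany", [(wrap_a, SOURCE_A), (wrap_b, SOURCE_B)])
print([r["name"] for r in results])  # ['Alpha GmbH', 'Beta s.r.l.']
```

The application program only sees the shared vocabulary; the differences between the two record layouts are confined to the wrappers.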
Terminology is a relatively young, interdisciplinary research field which has its roots in linguistics and cognitive science. A tentative definition is provided by [Sager, 1994]:
A theory concerned with those aspects of the nature and the functions of language which permit the efficient representation and transmission of items of knowledge in all their complexity of concepts and conceptual relationships.
Terminology science studies the representation of knowledge mainly through linguistic sign systems. The core objects underlying terminology research are terms and concepts. Concepts are pieces of knowledge which help humans in cognitive processes to model the world. The importance of concepts is stressed by Wüster [Wüster, 1991], the father of modern terminology, who states that concepts and concept systems are the starting point for any terminological work. Terms are lexical units which denote concepts. An important characteristic of terms is their usage in special languages tailored for specific domains, where terms are used to express highly specialized knowledge in a concise and unambiguous form.
Precise and appropriate terminologies provide important facilities for human communication and are becoming more and more important for several reasons. First, there is an overall increase of knowledge in all fields as well as a growing need for information dissemination which is not limited to specialists but involves non-experts. The overall dissemination of knowledge involves the translation of texts into various languages, where terminological considerations play a crucial role. In fact, bilingual glossaries are an extremely valuable aid for technical translators. Second, the computer as a powerful information processing tool is becoming an important communication partner. Computers are involved in more and more systems we interact with and perform complex and responsible tasks such as monitoring or decision making.
Descriptive terminology is the predominant form of terminological work and aims to collect and describe all terms in a specific subject field. Acquiring terminological data from written and/or spoken language material is a very time-consuming, difficult, and error-prone task. Traditionally, descriptive terminology starts from a collection of printed documents which humans scan manually for relevant information. Recent advances in text processing have spurred interest in automatic methods for corpus exploration. Modern corpus-based terminology extends the traditional approach in at least two directions. First, the corpus is a large collection of language material in machine-readable form. Second, sophisticated computer programs are used to explore the corpus for terminologically relevant information.
Corpus-based terminology acquisition improves the quality of terminological research and its output by opening new possibilities for empirical investigation: exhaustive search for new terms, reduced risk of errors and omissions, the possibility of giving users more contextual information through direct links between the terms in a term bank and the corpus, etc. While completely automatic term extraction is not realistic with today's technology, tools produce candidate lists for post-editing by human experts, which is a clear improvement over manually scanning the entire corpus.
A terminology management system should provide tools that help humans manage the life cycle of terms, i.e. acquire, maintain, modify, and disseminate terminological information [Ahmad, 1994]. A central part of any terminology management system is the representation of terminological information in a so-called term bank.
The information stored in terminological databases can roughly be categorized into concept-related information (e.g. definition and classification), administrative data (e.g. responsibility and date), terms and term-related information (e.g. grammatical and lexical information), and concept-related descriptive elements (e.g. notes, cross-references, and bibliographical information). Most term banks in use today adopt a relatively simple data model and store the information in a record structure or tabular form, which is not adequate for representing all of the above information. In particular, term banks do not provide sufficient mechanisms to represent, maintain, and reason about conceptual knowledge explicitly; instead, such information is represented as free text in natural language. This situation reflects the traditional view of terminology science, where conceptual knowledge is described by means of a definition or an explanation in natural language, possibly extended with graphics or other nonverbal forms of representation [Galinski and Picht, 1997]. A terminology represented in this way can only be used by humans, and providing computational support for maintaining and navigating the database becomes difficult. A typical example is TRADOS' terminology management system MultiTerm'95 Plus, which is by far the most frequently used system in practical terminology work.
In this section we first compare ontologies and terminologies. We then show how the KBS hyperbook [Fröhlich et al., 1997,Henze and Nejdl, 1999,Nejdl and Wolpers, 1999] can be used to represent terminological information and to provide a more user-friendly interface to a traditional term bank.
The brief introduction to ontologies and terminologies in the previous two sections reveals both meeting points and differences between the two fields. Beginning with their common interests, ontologies and terminologies serve the same purpose: to provide different users with a shared conceptualization of a specific part of the world in order to facilitate the efficient communication of complex knowledge. Both disciplines are based on concept systems that represent highly complex knowledge independently of any particular language.
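A crude version of such a candidate list can be produced by counting word n-grams in a corpus and discarding those that begin or end with a stopword. The Python sketch below is a hypothetical frequency baseline, not a method described in this paper; its output deliberately includes noise such as single general words, which is what the human post-editing step is for. The stopword list, the sample corpus, and the thresholds are arbitrary assumptions.

```python
import re
from collections import Counter

# Arbitrary, assumed stopword list for English text.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "has", "to", "by", "with", "for"}


def candidate_terms(corpus, max_len=3, min_freq=2):
    """Rank word n-grams that neither start nor end with a stopword by frequency."""
    counts = Counter()
    for sentence in re.split(r"[.!?]", corpus.lower()):
        tokens = re.findall(r"[a-zäöüß]+", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


corpus = ("The limited company is a capital company. "
          "A limited company has a managing director. "
          "The managing director represents the limited company.")
for term, freq in candidate_terms(corpus):
    print(freq, term)
```

For this three-sentence sample the list contains ``limited company'' (3) and ``managing director'' (2), alongside noise such as ``company'' and ``limited'' that a terminologist would strike out.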
Concerning the differences, we found at least four aspects worth discussing in more detail:
For several years now, the European Academy Bolzano/Bozen has been working on an Italian/German legal and administrative terminology for South Tyrol. The acquired terminological information is stored in the term bank Bluterm at http://www2.eurac.edu/.
Bluterm, which uses TRADOS' terminology management system MultiTerm'95 Plus, applies the concept-oriented approach to terminology management and represents one concept and all corresponding information in a single database entry (see the Bluterm Database on the Web, which shows the result of a search for the concept ``Gesellschaft mit beschränkter Haftung'' (limited company, Ltd)). In the multilingual case, as in our example, the entry contains all terms and synonyms in the relevant languages; here there are three German and two Italian terms (``Gesellschaft mit beschränkter Haftung'', ``GmbH'', ``GesmbH'', ``società a responsabilità limitata'', and ``s.r.l.''). One of the most important pieces of information is the term definition, which is given in natural language. The definition relates the concept to other concepts and determines its specific characteristics. For example, the first definition specifies ``Gesellschaft mit beschränkter Haftung'' as a specific type of ``Kapitalgesellschaft''. The attribute ``Fachgebiet'' specifies the subject field in which the term has the defined meaning. Context examples and grammatical information are very useful for translators, e.g. the German term ``GmbH'' is a feminine noun and an abbreviation. Another important piece of information is the source of the term, definition, or context.
This type of terminological database has several shortcomings, which are mainly a consequence of the simple data model and of storing most information, including conceptual knowledge, implicitly in natural language. First, the usage of such terminologies is limited to humans. Second, conceptual information cannot be accessed to support browsing through the term bank. The only way to retrieve terminological information is sequentially in alphabetical order or by searching for specific terms; there is no way to access conceptually related terms. Finally, there are no mechanisms to perform consistency checks when new entries are added to the database.
The KBS Hyperbook system structures hypertext collections based on ontologies. These ontologies are expressed in the modeling language O-Telos [Mylopoulos et al., 1990], an object-oriented design language with additional deductive rules and constraints. They are used as metadata, which structure and connect external data (such as text referenced by file ids, URLs, etc.). This approach is related to semantic modeling approaches for hypertext (such as RMM [Isakowitz et al., 1995]), but generalizes them by decoupling metadata/ontologies from the data/document units that reference actual data (comparable to the idea of indexing discussed in [Niederée et al., 1998]).
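The concept-oriented entry described above maps naturally onto a nested record. The Python sketch below is a hypothetical model of such an entry, not MultiTerm's actual data model; it fills in only the five terms and the grammatical note on ``GmbH'' given in the text. The definition, and with it the relation to ``Kapitalgesellschaft'', is just a string here, so a program cannot query it.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    form: str
    language: str               # "de" or "it"
    abbreviation: bool = False
    grammar: str = ""           # free-text grammatical information


@dataclass
class ConceptEntry:
    """One concept with all its terms and information in a single record."""
    terms: list
    definition: str = ""        # natural-language definition, opaque to programs
    subject_field: str = ""     # the "Fachgebiet" attribute
    sources: list = field(default_factory=list)

    def terms_in(self, language):
        """All term forms recorded for one language, in entry order."""
        return [t.form for t in self.terms if t.language == language]


entry = ConceptEntry(terms=[
    Term("Gesellschaft mit beschränkter Haftung", "de"),
    Term("GmbH", "de", abbreviation=True, grammar="noun, feminine"),
    Term("GesmbH", "de", abbreviation=True),
    Term("società a responsabilità limitata", "it"),
    Term("s.r.l.", "it", abbreviation=True),
])
print(entry.terms_in("it"))  # ['società a responsabilità limitata', 's.r.l.']
```

Everything about the terms themselves is structured, while the conceptual knowledge stays in free text, which is exactly the imbalance discussed next.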
A very general representation ontology is used for displaying the units and their relationships described in the hyperbook system. This ontology views the book only as a set of concepts and relations between them, where each concept is described by one or more attributes. Each concept is visualized as shown at http://www.kbs.uni-hannover.de/hyperbook/, where the data associated with each concept attribute are displayed in one browser frame as the right page of the book and the relations are displayed in another frame as the left page of the book.
The basic hyperbook units (figure 1) consist of modeling abstractions similar to ER modeling notations (concepts, relations, and attributes), presentation abstractions (link, index, trail, choice, view) comparable to the ones used in RMM, and a taxonomy of data objects (related to the notion of index entries described in [Niederée et al., 1998]). All of these abstractions are stored in the KBS Hyperbook metadata repository, while the actual data (referenced by the data objects) are stored as files, URLs, etc.
In the following we take about two dozen concepts from the Bluterm database that concern companies, their functions, and their constituent organs. We use these concepts to show how the KBS Hyperbook system explicitly represents conceptual knowledge and provides an interface for content-based navigation through Bluterm.
In a first step the various concept classes and conceptual relationships are analyzed. The result of this analysis is represented in a meta-model, which specifies the structure of the domain knowledge and imposes appropriate constraints between the various classes of concepts (see figure 2). The top node of the model stands for the class of all concepts. Concepts may be related to other concepts by the relation ``Oberbegriff'' (broader concept), which is the basis for building concept taxonomies at the instance level. The class of all concepts is divided into three more specific concept classes: concepts belonging to ``Gesellschaft'' (company), ``Organ'' (organ), and ``Funktion'' (function), respectively. This specialization is indicated by the is-a relationship at the meta-level. The three specific concept classes are related as follows: companies have functions, and companies consist of organs.
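A minimal sketch of the meta-model and the constraint it imposes on instance-level links follows, written in Python rather than O-Telos. The class names and ``Oberbegriff'' come from the text; the relation names `hat_Funktion` and `besteht_aus` are hypothetical labels for ``companies have functions'' and ``companies consist of organs''. The check rejects any link whose source or target does not belong to the class the meta-model prescribes.

```python
# Meta-level is-a links: concept class -> more general class.
META_CLASSES = {"Gesellschaft": "Begriff", "Organ": "Begriff", "Funktion": "Begriff"}

# Meta-level relation signatures: relation -> (domain class, range class).
# "Oberbegriff" is from the text; the other two names are hypothetical.
META_RELATIONS = {
    "Oberbegriff": ("Begriff", "Begriff"),
    "hat_Funktion": ("Gesellschaft", "Funktion"),
    "besteht_aus": ("Gesellschaft", "Organ"),
}


def is_a(cls, target):
    """True if concept class `cls` equals or specializes `target`."""
    while cls is not None:
        if cls == target:
            return True
        cls = META_CLASSES.get(cls)
    return False


def check_link(instances, relation, source, target):
    """Return an error message for an invalid instance-level link, else None."""
    if relation not in META_RELATIONS:
        return f"unknown relation {relation!r}"
    domain, range_ = META_RELATIONS[relation]
    if not is_a(instances[source], domain):
        return f"{source!r} is not a {domain}"
    if not is_a(instances[target], range_):
        return f"{target!r} is not a {range_}"
    return None


# Instance level: each concept is assigned to one concept class.
instances = {
    "Gesellschaft mit beschränkter Haftung": "Gesellschaft",
    "Kapitalgesellschaft": "Gesellschaft",
    "Gewinnzweck": "Funktion",
}
print(check_link(instances, "hat_Funktion",
                 "Gesellschaft mit beschränkter Haftung", "Gewinnzweck"))  # None
print(check_link(instances, "hat_Funktion", "Gewinnzweck", "Kapitalgesellschaft"))
# 'Gewinnzweck' is not a Gesellschaft
```

Since ``Oberbegriff'' is declared on the top class, it is accepted between concepts of any of the three subclasses, while the two company relations are restricted to their declared classes.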
Now we can take the Bluterm concepts and instantiate the meta-model. The result of this step is an explicit representation of conceptual knowledge as shown in figure 3. There are three separate taxonomies which are indicated by differently colored nodes.
The conceptual information in figure 3 can be represented directly in the KBS Hyperbook system and used to provide a user-friendly interface to Bluterm. Our Bluterm Prototype on the WWW shows how the concept ``Gesellschaft mit beschränkter Haftung'' is accessed. The left-hand side presents conceptual knowledge which is explicitly stored in the hyperbook: ``Gesellschaft mit beschränkter Haftung'' has the more specific concept ``Einmann GmbH'', is a specialization of ``Kapitalgesellschaft'', has four related concepts at the same specialization level, has the function ``Gewinnzweck'', etc. The right-hand side shows the original Bluterm entry, which is retrieved directly from the term bank at the European Academy Bolzano/Bozen.
For terminology science, the main lesson to be learned from ontologies is the use of powerful knowledge representation systems to formalize conceptual knowledge explicitly. The above example shows two advantages of this approach. First, the KBS hyperbook system as a metadata repository allows content-based navigation through the term bank Bluterm, which is clearly more user-friendly than sequential data access or string search. Second, the explicit visualization of conceptual relationships on the left-hand side of the hyperbook allows faster comprehension of the subject field, which is of particular interest for non-experts.
Another benefit of a formal knowledge representation approach is the possibility to perform various types of consistency checks, a fundamental task in any large knowledge base, including term banks. In our example, the meta-model in figure 2 makes it possible to detect invalid relationships at the instance level. Another test concerns the completeness of the natural-language term definitions, which should reflect the explicitly formalized relations. Finally, circular definitions and references to unknown terms can be detected and thus avoided.
While knowledge represented in natural language is limited to human use, formally represented knowledge can be used by artificial agents as well. Integrating conceptual knowledge is a promising approach to developing smart information systems. Exploiting the vast amount of knowledge concentrated in term banks by artificial agents is both desirable and feasible. Hence, representing terminological information formally makes term banks a valuable resource for a new group of users.
In the other direction, ontologies mainly benefit from the way terminology science uses natural language to refer to conceptual information. While unique labels suffice to identify objects in computer systems, meaningful natural language terms are valuable for providing user-friendly interfaces and, more generally, for communicating information between humans and computers. This is a very important factor in the acceptance of computer systems. The importance of enhancing ontologies with language-specific terminological information is also evident in knowledge-enhanced information retrieval, where user queries are expanded with synonyms, hypernyms, hyponyms and/or coordinate terms.
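The left-hand page can be thought of as the result of a few queries over the explicitly stored relations. The Python sketch below computes the broader, narrower, and coordinate (same-level) concepts and the functions of a concept from hypothetical link tables. Only ``Einmann GmbH'', ``Kapitalgesellschaft'', and ``Gewinnzweck'' are taken from the example; ``Aktiengesellschaft'' merely stands in for the four related concepts that the text does not name, and `hat_Funktion` is an assumed relation name.

```python
# Instance-level "Oberbegriff" links: narrower concept -> broader concept.
OBERBEGRIFF = {
    "Einmann GmbH": "Gesellschaft mit beschränkter Haftung",
    "Gesellschaft mit beschränkter Haftung": "Kapitalgesellschaft",
    "Aktiengesellschaft": "Kapitalgesellschaft",  # illustrative sibling (assumed)
}
# Function links; the relation name is hypothetical.
HAT_FUNKTION = {"Gesellschaft mit beschränkter Haftung": ["Gewinnzweck"]}


def left_page(concept):
    """Collect the conceptually related entries shown next to a concept."""
    broader = OBERBEGRIFF.get(concept)
    narrower = sorted(c for c, b in OBERBEGRIFF.items() if b == concept)
    siblings = sorted(c for c, b in OBERBEGRIFF.items()
                      if broader is not None and b == broader and c != concept)
    return {"broader": broader, "narrower": narrower,
            "siblings": siblings, "functions": HAT_FUNKTION.get(concept, [])}


page = left_page("Gesellschaft mit beschränkter Haftung")
print(page["broader"], page["narrower"], page["siblings"], page["functions"])
# Kapitalgesellschaft ['Einmann GmbH'] ['Aktiengesellschaft'] ['Gewinnzweck']
```

In a sketch like this, each returned name would be rendered as a link whose right-hand page is the corresponding Bluterm entry, so the user moves through the term bank by meaning rather than by spelling.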
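Two of these checks, detecting circular definitions and references to unknown terms, reduce to simple graph operations once the concepts that each definition relies on are recorded explicitly. The Python sketch below assumes such a mapping from each concept to the concepts its definition uses; the data are hypothetical, and the cycle search is an ordinary depth-first search.

```python
def unknown_references(uses):
    """Concepts used in some definition but missing from the term bank."""
    return {ref for refs in uses.values() for ref in refs if ref not in uses}


def find_cycle(uses):
    """Return one circular chain of definitions as a list, or None (depth-first search)."""
    state = {}  # concept -> "active" while on the DFS stack, "done" afterwards
    stack = []

    def visit(concept):
        state[concept] = "active"
        stack.append(concept)
        for ref in uses[concept]:
            if ref not in uses:
                continue  # unknown term, reported by unknown_references
            if state.get(ref) == "active":
                return stack[stack.index(ref):] + [ref]
            if ref not in state:
                cycle = visit(ref)
                if cycle:
                    return cycle
        stack.pop()
        state[concept] = "done"
        return None

    for concept in uses:
        if concept not in state:
            cycle = visit(concept)
            if cycle:
                return cycle
    return None


# Hypothetical definition dependencies: concept -> concepts used in its definition.
uses = {
    "Gesellschaft mit beschränkter Haftung": ["Kapitalgesellschaft", "Stammkapital"],
    "Kapitalgesellschaft": ["Gesellschaft"],
    "Gesellschaft": [],
}
print(unknown_references(uses))  # {'Stammkapital'}
print(find_cycle(uses))          # None
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```

Both checks run whenever an entry is added, which addresses the missing consistency checks noted for the traditional term bank.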
The use of knowledge representation techniques in terminology management systems has been stressed by other researchers in the past, e.g. [Fischer et al., 1996,Gillam and Ahmad, 1996,Meyer et al., 1992]. The main criticism of traditional term banks concerns their lack of expressive mechanisms to represent, maintain, and reason about complex knowledge in an explicit form.
The main aim of this paper was to analyze ontologies and terminologies in order to identify the common interests as well as the differences between these two disciplines. Both ontologies and terminologies serve the same purpose, namely to provide a shared conceptualization of a part of the world in order to support efficient and economical communication of knowledge. Ontologies define the semantics of the vocabulary used to refer to the concepts formally, while terminologies do so in natural language. On the other hand, terminologies cover the linguistic side and provide all synonymous terms for the concepts, including abbreviations, grammatical information, and context examples, which ontologies do not.
We argued that an integration of ideas and technology from terminologies and ontologies would lead to benefits for both communities. The terminology community benefits from the use of formal methods for knowledge representation, which facilitates the representation, maintenance, and dissemination of terminological data and makes these data reusable by computer systems in various applications. Some of these benefits have been shown in a practical example, where a hyperbook system has been adopted to represent terminological information and to provide an interface for a content-based navigation through a traditional term bank. Ontologies benefit from the natural language component in terminologies, which is crucial for user-interfaces and for information retrieval in the WWW.
To conclude, the main message of this paper is that combining ontologies and terminologies yields a powerful resource for flexible, concise, and efficient knowledge transfer in a communication network involving both humans and computers.