This project is part of the LIKE / LICS umbrella project that pursues effective application of the existing knowledge of natural language analysis to the analysis and design of information systems.
Recent developments in information science indicate that structured, well-defined NL feedback, in addition to the usual graphical representations of (sub)domains, significantly enhances the mutual understanding between users and analysts. Especially the NIAM specification method, developed by Nijssen and Halpin, makes extensive use of NL during the first stages of the analysis. However, as soon as the graphical representation of the (sub)domain has been created, this NL feedback is largely lost, since NIAM does not provide for the capture of all "hidden" semantics in NL.
My project aims specifically at capturing the structured NL sentences that were used for the NIAM schematics, storing them in a Lexicon together with pre-existing words and sentences, and keeping this information during the course of the whole project. The Lexicon is not a simple store for NL utterances. It contains all the information of the equivalent NIAM diagrams, so that it can be used to create NL representations of these diagrams. Additionally, the Lexicon defines every word used in terms of other, more common words, which facilitates mutual understanding between different semantic communities. As such, it subsumes the data dictionary often found in larger organisations. Because the Lexicon is formal, it can also be queried, and used to detect inconsistencies and overlaps between concepts at an early stage of the analysis.
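The idea of defining terms in more common words and then querying for overlap can be illustrated with a minimal sketch. All names and structures below are my own illustrative assumptions, not the Lexicon's actual internal model:

```python
# Hypothetical sketch: each Lexicon entry defines a domain term in
# terms of more common words; two terms whose definitions share words
# may denote overlapping concepts and deserve the analyst's attention.

class Entry:
    def __init__(self, term, definition_terms):
        self.term = term
        self.definition_terms = set(definition_terms)

class Lexicon:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.term] = entry

    def overlap(self, term_a, term_b):
        """Return the defining words shared by two terms."""
        a = self.entries[term_a].definition_terms
        b = self.entries[term_b].definition_terms
        return a & b

lex = Lexicon()
lex.add(Entry("client", {"person", "buys", "product"}))
lex.add(Entry("customer", {"person", "orders", "product"}))
print(lex.overlap("client", "customer"))  # shared defining words
```

A real overlap check would of course be more refined than bare word intersection, but the point is that a formal store makes such queries possible at all.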
I try not to tie myself to any formal representation language that needs to be learned in order to work with the Lexicon. This rules out direct use of any of the currently available linguistic models, since they are all too complicated and often too expressive. I do not try to capture all underlying semantics of an utterance, since in my opinion much of the knowledge in language is already implicitly available in human minds. I only look for definitions of knowledge that differs between humans. The Lexicon therefore aims at a clear structure that can be readily understood, although not necessarily produced, by the domain specialists, so that they can directly help the analyst build their domain model.
The basic building blocks of the Lexicon are concepts, facts, lexicals, and names, all of which can be found in NIAM diagrams. Additionally, the Lexicon provides a feature inheritance mechanism, which cuts out overlap between concepts and facts. Standard NIAM constraints are included, as well as non-standard constraints that NIAM cannot represent graphically. For the storage of constraints and triggers, the Lexicon uses a subset of the linguistic theory of Functional Grammar, called CPL. CPL itself is too complex for direct use, but since it is based on an NL formalism, it can easily be mapped to NL for direct feedback.
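The feature inheritance mechanism can be sketched as follows. This is a minimal illustration of the general technique, assuming hypothetical concept and feature names; the Lexicon's real model is richer:

```python
# Illustrative sketch of feature inheritance between concepts:
# a concept inherits the features of its supertype, so shared
# features are stored only once and overlap is cut out.

class Concept:
    def __init__(self, name, supertype=None, features=None):
        self.name = name
        self.supertype = supertype
        self.own_features = dict(features or {})

    def features(self):
        """Collect inherited features; own features override them."""
        inherited = self.supertype.features() if self.supertype else {}
        return {**inherited, **self.own_features}

person = Concept("Person", features={"has_name": True})
employee = Concept("Employee", supertype=person,
                   features={"has_salary": True})
print(employee.features())
# → {'has_name': True, 'has_salary': True}
```

Here `Employee` need not restate that it has a name; it inherits that feature from `Person`.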
The Lexicon interface will provide predefined dialog-box-like property menus that give a clear overview of all the features certain words can have. Manipulating this interface immediately produces NL feedback, so that users can view the effect of their actions on the Lexicon as plain NL sentences, in their own terminology. This interface shields every user from the formal internals of the Lexicon, while allowing rapid entry of new concepts and facts, tuning of their properties, and consistency checks on the contents of the Lexicon.
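The immediate NL feedback loop can be sketched in miniature: changing a property of a fact type regenerates a plain-language verbalisation. The template, names, and the single uniqueness property below are illustrative assumptions, not the project's actual generation mechanism:

```python
# Hedged sketch: toggling a NIAM-style uniqueness constraint on a
# fact type immediately changes its plain-NL verbalisation, so the
# user sees the effect of the action in ordinary sentences.

class FactType:
    def __init__(self, subject, verb, obj):
        self.subject, self.verb, self.obj = subject, verb, obj
        self.unique = False  # uniqueness constraint on the subject role

    def verbalise(self):
        quantifier = "at most one" if self.unique else "zero or more"
        return f"Each {self.subject} {self.verb} {quantifier} {self.obj}."

works_for = FactType("Employee", "works for", "Company")
print(works_for.verbalise())
# → Each Employee works for zero or more Company.
works_for.unique = True
print(works_for.verbalise())
# → Each Employee works for at most one Company.
```

A full interface would verbalise every constraint type this way, but even this toy example shows how a formal change surfaces as an NL sentence the domain specialist can confirm or reject.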
More advanced users can take full advantage of the formal internal representations and use the Lexicon to generate precise information about the structure of the users' domain.
We have developed a formal model for the storage of the necessary entities in the Lexicon, and are currently working on the NL feedback mechanisms. After this, we will include a predefined case in the Lexicon to review its expressiveness and to compare it with other projects going on within LIKE/LICS. A thorough comparison of the model used in the Lexicon with standard ER diagrams, NIAM, and NORM (a recent development that extends NIAM to include subtyping) is scheduled for later this year. A side project, currently conducted at Tilburg University, aims at a prototype of the interface to evaluate its usefulness, especially in group meetings, where domain specialists and information analysts meet and exchange their knowledge.
Jeroen Hoppenbrouwers