nomination for ICT

Project Name: WebKB-2

Synopsis
Better access to knowledge within the enterprise and on the WWW would increase productivity and sales. Typically, employees or clients would like to know what kinds of products, services or methods would answer a certain problem or need, what their respective advantages, drawbacks, prices and locations/URLs are, and what the experiences and feedback of their users have been. Such information must be semantically "organized" to permit its retrieval, by navigation or precise queries.

Automatically extracting and organizing precise/conceptual information from free-text documents is not yet possible. Hence, writing documents may be an easy way to store information, but it generally does not make that information easy to retrieve.

Storing information in databases (or structured documents) permits some precise queries on it. However, a database can only store a small number of predefined kinds of information and the users must know the database schema to query or add information. Furthermore, the inter-connection of databases is a manual process.

On the other hand, a knowledge base (KB) permits the organized storage of all sorts of facts, rules and categories, and supports queries "by the content". However, representing knowledge is a difficult manual task that requires training. (And the inter-connection of KBs is also mainly a manual process.)

To permit many people to enter and share knowledge within a KB, the solution is first to initialize the KB with default knowledge and constraints, and then to exploit this KB to guide and ease the entry of new knowledge and to cross-check the representations for correctness, precision, completeness or cooperation purposes.

Advantages/uniqueness
WebKB-2, our KB server, has a number of unique advantages: (i) it can efficiently exploit very large KBs (it is implemented on top of the OODBMS FastDB), (ii) its default general KB is the biggest after CYC (e.g. the natural language ontology WordNet 1.7, the general KB TAP and many top-level ontologies have been corrected and integrated; the KB currently has more than 93,000 categories associated with more than 120,000 nouns or nominal expressions), (iii) it offers high-level, intuitive and expressive input/output notations that also encourage the adoption of our lexical/structural/semantic conventions, which lead to easier-to-compare and more precise knowledge representations, (iv) it can exploit the whole KB to generate forms (menus) that guide and ease knowledge entry, (v) it can be used on-line by any person or automated agent (e.g. you can use it at www.webkb.org), (vi) its category naming scheme and its update/cooperation protocols make it the only system that lets its users update the same KB without lexical/semantic conflicts or redundancies and without forcing the users to agree with each other, (vii) it has various query and navigation mechanisms that permit people to explore the KB easily and, if necessary, filter out the knowledge of certain users or kinds of users, (viii) it permits the use of knowledge representations to "index" any Web-accessible document element (e.g. a word, paragraph, section or image of a document, or the whole document itself), and the results of queries may be knowledge representations and/or the document elements they index, (ix) it has a procedural command language that permits combining assertions and queries, and writing scripts to solve problems and/or generate documents.

Innovation
WebKB-2 is only 2.5 years old but builds on our research during the development of WebKB-1 (3 years) and, before that, CGKAT (3 years). Our main research subjects have been: (i) the development of a top-level ontology of concept types to organize the categories of WordNet and check their use, accompanied by an ontology of basic relation types sufficient for representing most natural language sentences in a normalized and explicit way, (ii) the design of lexical/structural/semantic conventions and of high-level, intuitive and expressive notations for representing most natural language sentences in a normalized and explicit way, (iii) the development and implementation of algorithms to check, retrieve and present knowledge representations (and possibly, document elements associated with them), (iv) the above-cited cooperation protocols.

As a KB server, WebKB-2 has the triple originality of being able to accommodate and benefit from large amounts of knowledge, a large number of users, and very expressive languages.

All the users share the same KB but every object in it (category, link between categories, definition, fact, ...) has an associated creator and can be semantically related to other objects by other users (the protocols prevent inconsistencies, redundancies and unauthorised removals). This "local" approach to the handling of multiple users encourages and eases knowledge re-use/cross-checking/cross-linking/combination (both from the user's and the developer's viewpoints). The other KB servers use a "global" approach, that is, the users either create different KBs (which are therefore loosely connected with each other and hence difficult to exploit together), or update the same KB as if they were the same user. It should also be noted that very few KB servers can handle large KBs (one of the exceptions is Parka-DB).
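
As an illustration only, the "local" approach described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the WebKB-2 implementation: the names Link and SharedKB, the exact-duplicate test for redundancy and the creator-only removal rule are all hypothetical simplifications, and inconsistency checking is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One KB object (here, a link between two categories) with its creator."""
    source: str
    relation: str
    target: str
    creator: str


class SharedKB:
    """Toy single KB shared by all users (hypothetical, for illustration)."""

    def __init__(self):
        self._links = []

    def add(self, source, relation, target, creator):
        # Assumed redundancy rule: the same statement cannot be entered twice,
        # whoever its first creator was.
        for link in self._links:
            if (link.source, link.relation, link.target) == (source, relation, target):
                raise ValueError(f"redundant: already asserted by {link.creator}")
        link = Link(source, relation, target, creator)
        self._links.append(link)
        return link

    def remove(self, link, user):
        # Assumed removal rule: only the creator of an object may remove it.
        if link.creator != user:
            raise PermissionError(f"{user} did not create this link")
        self._links.remove(link)

    def links(self, exclude_creators=()):
        # Any user can filter out the knowledge entered by certain users.
        return [link for link in self._links if link.creator not in exclude_creators]


kb = SharedKB()
cat = kb.add("cat", "subtype_of", "feline", creator="alice")
kb.add("cat", "can_be", "pet", creator="bob")  # bob extends alice's category
print(len(kb.links()))                            # 2
print(len(kb.links(exclude_creators={"bob"})))    # 1
```

In this sketch two users build on each other's objects inside one KB, yet each statement stays attributable to its creator, which is what makes per-user filtering and removal control possible.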

Quality
Unlike other work, especially work on less knowledge-oriented approaches, we have focused on the quality (consistency, precision and coverage) of the KB and of the knowledge representation notations. For example, one of the KB export options generates a very modular and readable text file which, if no update has been made via the interface, is exactly the file that we used to initialize the KB. Any discrepancy between the two files reveals an error in the input parser, the export procedure or the KB. This very fruitful test does not seem to be widely used in other projects.
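
The round-trip test described above can be sketched as follows. This is a minimal illustration, assuming a toy one-line notation ("subtype < supertype") that is not the WebKB-2 notation; parse_kb, export_kb and round_trip_diff are hypothetical names introduced for the example.

```python
import difflib


def parse_kb(text):
    """Parse toy statements of the form 'subtype < supertype', one per line."""
    links = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no knowledge
        child, sep, parent = line.partition("<")
        if not sep or not child.strip() or not parent.strip():
            raise ValueError(f"line {number}: expected 'subtype < supertype'")
        links.append((child.strip(), parent.strip()))
    return links


def export_kb(links):
    """Write the links back in one canonical layout, preserving their order."""
    return "".join(f"{child} < {parent}\n" for child, parent in links)


def round_trip_diff(source_text):
    """Return the unified diff between the source file and its re-export."""
    exported = export_kb(parse_kb(source_text))
    return list(difflib.unified_diff(
        source_text.splitlines(), exported.splitlines(),
        fromfile="source", tofile="export", lineterm=""))


canonical = "cat < feline\nfeline < mammal\n"
print(round_trip_diff(canonical))           # [] : the export equals the source
for line in round_trip_diff("cat<feline\n"):
    print(line)                             # the diff pinpoints the discrepancy
```

An empty diff means the parser and the exporter agree on the source file; any non-empty diff points at the exact lines where the parser, the exporter or the stored content diverges.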

RDF and RDF/XML, i.e. respectively the model and linear notation designed by the W3C for the "Semantic Web", are a good example of a model and notation that are inadequate for knowledge representation and exchange: (i) they are low-level and complex (hence difficult to write and exploit, and leading different users to represent similar knowledge in many incomparable ways), (ii) poorly expressive (hence they arbitrarily limit what can be expressed and lead users to write ad-hoc knowledge representations), and (iii) permissive (e.g. link cycles and forward references are permitted, thus seriously reducing the possibilities of semantic checks). WebKB-2 can use RDF/XML as an input/output format but imposes minimal semantic constraints.
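
To make the point about permissiveness concrete, here is a small sketch of the kind of semantic check that becomes possible once forward references and link cycles are disallowed. It is an assumption-laden illustration, not the checks WebKB-2 actually performs: the function check_subtype_links, the pair-based input and the root category "thing" are hypothetical.

```python
def check_subtype_links(links, roots=("thing",)):
    """Check (subtype, supertype) pairs in the order in which they are entered.

    Reports a forward reference when a supertype is used before it has been
    declared, and a cycle when a new link would make a category its own
    (direct or indirect) supertype. Rejected links are not stored.
    """
    declared = set(roots)
    parents = {}  # category -> set of its direct supertypes
    errors = []

    def is_supertype_of(candidate, node):
        # Walk upwards from node and look for candidate among its supertypes.
        stack, seen = [node], set()
        while stack:
            current = stack.pop()
            if current == candidate:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(parents.get(current, ()))
        return False

    for child, parent in links:
        if parent not in declared:
            errors.append(f"forward reference: {parent!r} used before declaration")
            continue
        if is_supertype_of(child, parent):
            errors.append(f"cycle: {child!r} is already a supertype of {parent!r}")
            continue
        parents.setdefault(child, set()).add(parent)
        declared.add(child)
    return errors


print(check_subtype_links([("a", "thing"), ("b", "a"), ("a", "b")]))
print(check_subtype_links([("x", "y")]))
```

The first call reports a cycle for the third link, and the second call reports a forward reference to the undeclared category 'y'; a permissive notation would silently accept both inputs.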

Recognition
We have detailed WebKB-2, and the knowledge-oriented approach it supports, in three conference articles, one book chapter and one journal article that is still under review. We have detailed WebKB-1, and its more restricted but more document-related approach, in five conference articles and one journal article. During the last three years, we also published four other articles to propose and compare knowledge representation conventions and notations. Our earlier work on CGKAT, WordNet and top-level ontologies has also been published in seven articles and a PhD thesis. All these documents are accessible from http://www.phmartin.info/cv/cv.html#Publications. WebKB-1 and WebKB-2 are well known in two communities, the "Conceptual Structures" community (nine publications) and the "Semantic Web" community (four publications), and are referenced by several sites in these communities.

Company/Organisation/Personal Profile
The DSTC (Distributed Systems Technology Centre; www.dstc.edu.au) is a National IT Research and Development Centre in Australia that focuses on the needs of the Government, Defence, Health, Telecommunications, Finance and Education Sectors.

Created in 1992, the DSTC now has over 100 experienced scientists and an international standing as a centre of excellence in distributed systems: (i) four IT&T Awards for research excellence; (ii) over 300 international conference papers, journal articles and books; (iii) high standing in international standards bodies such as OMG, ISO, MPEG, IETF, and W3C (it is the Australian Office of the World Wide Web Consortium; see http://www.w3c.dstc.edu.au); (iv) over 12,000 customers have attended DSTC seminars, workshops and conferences; (v) its technology is used worldwide and its commercial revenue flowing from research and development has already exceeded $10 million.

The DSTC has launched two spin-off companies: (i) Wedgetail Pty Ltd (2001), a leading provider of security products for mobile and embedded devices, with offices in Brisbane, Sydney and San Francisco, (ii) ELVIN Pty Ltd (2002), a potential leading provider of content-based messaging and notification, which has signed significant licensing deals with major international companies.

The DSTC currently focuses its research in the following areas: Knowledge and Resource Management, Organisational Policies and Security, Enterprise Processes and Work Practice Support, Enterprise Modelling, Component System Engineering.

Examples of products are: CosNotification (OMG CORBA service), MOF and XMI (OMG CORBA service), MetaSuite (metadata tools), FlowMake (workflow modelling tool), Breeze (component-based workflow engine) and WebKB-2 (large-scale knowledge-base server).

Potential
WebKB is particularly useful when information of various kinds needs to be interlinked or compared, and/or easily retrieved with a question-answering system. This is the case for some corporate memories and structured catalogs or, more generally, for "knowledge" repositories. WebKB-2 is suited as a support for brokering systems that must permit people to provide information, comment on it or refine it. As a support for national/domain Yellow-Pages where people and firms would represent their products or services, or comment on products they have used, WebKB-2 would be a particularly useful tool to store and organize information, and to permit people to find and compare products relevant to their needs.