General documentation of WebKB-2


Just a warning before you use WebKB-2: please avoid aborting queries (for example by using the buttons "stop" or "back" of your browser or by exiting it, or by double-clicking the reload button) as this might not kill the called process and might keep the knowledge base locked (for 8 seconds if the lock can be automatically removed), thus temporarily preventing further requests. If that happens, use the following hyperlink:  click here if WebKB-2 does not answer anymore, even to simple queries.

To get an idea of how you can enter and exploit knowledge with WebKB-2 (i.e. to know what you can do apart from simply using the search/update interfaces), read Section 1 then read and test
- syntactic examples: adding, removing or searching categories and statements;
- ontological examples: representing states, processes, descriptions, characteristics, time, space, ...;
- more real-life examples.


Table of contents

  1. Structures and terminology
  2. How the KB has been initialized
    1. Reuse of WordNet
    2. Top-level ontology
  3. How to enter/update knowledge
    1. Use files to test/enter/store your knowledge
    2. Lexical recommendations: use English singular nouns
    3. Logical/semantic recommendations: be precise, contextualize, re-use and complement


Separate documents complement this introduction to WebKB-2:

  1. A long article/documentation on WebKB-2   (its sections 3, 4 and 5 are referred to below)
  2. Grammars of the commands or languages of WebKB-2       (underlying data model here)
  3. Various example files
  4. Rationales of the lexical and semantic recommendations
  5. Description and rationales of the top-level ontology  (rather outdated now)
  6. My publications on WebKB-2 (from 2001) and WebKB-1.


1.  Structures and terminology

User. Person or program using WebKB-2. To be allowed to add or remove knowledge in the shared Knowledge Base of WebKB-2, a user must provide a user identifier, a password and, the first time, a URL or e-mail address.

Knowledge Base (KB). A KB is composed of a set of categories and a set of statements (e.g. facts or hypotheses) expressed using the category identifiers. WebKB-2 permits its users to have their personal KB or to share a single KB. Statements may, for example, be used to specify links between categories, e.g. specialization links. The shared KB has been initialized with the content of the lexical database WordNet 1.7: 108,000 nouns and 74,500 categories referred to by nouns (in accordance with our lexical conventions, we ignored information regarding verbs, adverbs and adjectives).

Ontology. The "ontology" part of a KB is the list of its categories plus the statements giving some formal meaning to these categories (i.e. some of their characteristics or inter-relations, rules or constraints of use). These statements permit some semantic checking on the manual introduction of new statements and some logical inferencing to generate new statements, e.g. for information retrieval purposes. These statements may be partial/complete definitions for these categories (i.e. necessary and/or sufficient conditions for being an instance of these categories) or statements that use them with a universal quantifier. Links between categories are partial/complete definitions. As opposed to a database schema, the ontology part of a KB is accessible and updatable by the users, like any other part.
WordNet is a general lexical ontology (more details here). To help users represent knowledge, i.e. enter new categories and statements, and to permit some automatic checking of these statements, we also complemented the top-level categories of WordNet by a top-level ontology of about 100 concept types and 140 basic relation types (see next section).

Category. A category (or "formal term") is a reference to a particular kind of object (e.g. Animal, Justice, Eating), or a particular kind of relation between objects (e.g. Or, In, Until), or a particular object (e.g. USA, CIA, Clinton). In the first case, it is called a concept type, in the second a relation type, and in the third, an individual. Each category is associated with a unique identifier that permits WebKB-2 to distinguish it from other categories, and with one or several names.

Category name. A name is a word, or composition of words, that may have one or several meanings. Therefore, it can be associated with various categories: one for each meaning. Conversely, the names of a category share one meaning: they are synonyms in a certain context. Thanks to WordNet, we have initialized the KB with categories representing meanings of English nouns. Conforming to our recommendations for knowledge representation, retrieval and sharability, we have not included verbs, adverbs and categories for them, and we ask you to use only English nouns for naming categories (and, whenever possible, singular nouns).

Category identifier. Try a search (using an English noun) if you have not done so:
   
A category identifier may have the form of an e-mail address (e.g. spamOnly@phmartin.info) or an absolute URL. However, more generally, it is composed of an identifier for the user that has created the category, and a "key name" given by the user to distinguish it from other categories s/he has created. For instance, in the identifier pm#IR_system, "pm" is the identifier of the creator and "IR_system" the key name chosen by the creator to refer to this category. "IR_system" may be used by another user, say "joe", as a key name for another category: joe#IR_system. However, these two categories are distinct. Assuming that pm#IR_system has been created first and that "joe" is a responsible user following our guidelines about sharability and reuse, we can conclude that the two categories represent different kinds of objects.
A category identifier may include not just the "key name" but other names that the creator considers synonyms in a certain context. Examples: the concept type pm#IR_system__information_retrieval_system__document_retrieval_system, the relation type pm#implication__then__therefore, and the individual pm#Venus__morning_star__evening_star. Adding these other names into the category identifier is a way to specify the association between the category and the names, in addition to referring to the category. These other names are/should be ordered by frequency of use. The key name is therefore repeated if it is not the most common name for referring to what the category represents, e.g. wn#domestic_dog__dog__domestic_dog__Canis_familiaris. "wn" means that the category comes from WordNet. Given that more than 95% of categories in the shared KB come from WordNet, "wn" has been made optional. Thus, #domestic_dog represents the same category.
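
The identifier structure described above can be sketched in Python. This is only an illustration: parse_category_id and CategoryId are hypothetical names, not part of WebKB-2, and the sketch only handles the creator#key_name form (not e-mail addresses or URLs used as identifiers).

```python
from typing import NamedTuple, Tuple


class CategoryId(NamedTuple):
    creator: str            # e.g. "pm", or "wn" for WordNet categories
    key_name: str           # the name that makes the identifier unique
    names: Tuple[str, ...]  # names in the order given (most frequent first)


def parse_category_id(identifier: str) -> CategoryId:
    """Split 'creator#key__name2__...' into its parts (hypothetical helper)."""
    creator, sep, rest = identifier.partition("#")
    if not sep or not rest:
        raise ValueError(f"not of the form creator#key_name: {identifier!r}")
    # An empty creator ("#domestic_dog") is read as the optional "wn" prefix.
    creator = creator or "wn"
    parts = rest.split("__")
    key_name, others = parts[0], parts[1:]
    # When the key name is repeated later, it is not the most common name:
    # the names then start after the key name.
    names = tuple(others) if key_name in others else tuple(parts)
    return CategoryId(creator, key_name, names)


if __name__ == "__main__":
    print(parse_category_id("pm#IR_system__information_retrieval_system"))
    print(parse_category_id("#domestic_dog"))
```

With this reading, "#domestic_dog" and "wn#domestic_dog" yield the same creator and key name, which mirrors the optional "wn" prefix.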

Statement. A statement is entered by a user to represent a fact, an opinion, a rule, etc. It is expressed using a specialized language or notation in order to reduce ambiguities and support automatic treatment. To permit the writing and combination of statements or queries, WebKB-2 proposes the FS language of commands which includes several sub-languages.
To highlight the fact that a representation is (or can be converted into) a graph (i.e. quantified typed concept nodes connected by typed relations), using a graph-based notation such as FCG or CGIF, a statement can also, by extension, be called a "graph". Similarly, to highlight the fact that a representation is (or can be converted into) a logic formula, a statement can also, by extension, be called a "formula" (or a "sentence").

Query. The FS language permits various kinds of search or removal on categories or statements. For example, a user may remove a category or statement s/he has created, or search for the specializations of any graph or statement. The interfaces are intended to ease the use of the FS language or make this use transparent.

Command. Statement, query or control command. A group of commands (or "script") is also considered a command. Commands may be manually entered via the provided interfaces. They may also be directly and programmatically sent to the WebKB-2 CGI servers used by the interfaces. Both the GET and POST methods of HTTP may be used to send commands and other parameters to these servers. The interfaces show both ways: the text fields next to the "submit" buttons show how parameters are encoded with the GET method but, when a separate window is generated for showing the results, the POST method is actually used (see the HTML/Javascript sources of the interfaces for more details).
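
As a rough illustration of the two encodings, here is a minimal Python sketch. The server URL and the parameter name are placeholders invented for this example: the real values are not given in this document and should be taken from the HTML/Javascript sources of the interfaces. The sketch only builds the request data; it does not contact any server.

```python
from urllib.parse import urlencode

# Placeholders, NOT the real WebKB-2 values (assumptions for illustration).
SERVER_URL = "http://example.org/cgi-bin/webkb2"
COMMAND_PARAM = "command"


def build_get_url(script: str) -> str:
    """Return a URL carrying the script in its query string (GET)."""
    return SERVER_URL + "?" + urlencode({COMMAND_PARAM: script})


def build_post_body(script: str) -> bytes:
    """Return the same parameter encoded as a form body (POST)."""
    return urlencode({COMMAND_PARAM: script}).encode("ascii")


if __name__ == "__main__":
    # Characters such as '#', ':' and ';' must be percent-encoded.
    print(build_get_url("#domestic_cat : pm#Tom;"))
```

The percent-encoding matters here because '#' would otherwise be read by browsers and HTTP libraries as the start of a URL fragment.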

The FS language of commands. In FS ("For Structuration"), sequential commands are separated by semicolons. FS is composed of several kinds of commands (or sub-languages): FC ("For Control") provides parsing control commands (e.g. "if", "while", "load", "load mode", "use names", "default creators:"), FL ("For Links") provides a simple notation to declare, interlink and search categories, and FCG ("Frame-CGs" or "For Conceptual Graphs") provides an expressive and concise notation to express statements or queries. Other sub-languages will be added: FE ("Formalized English" or "For English") which is an English-like version of FCG. (These sub-languages are described in a separate document).
Finally, we may add other alternatives to FCG that are less readable or expressive but more common, e.g. CGLF (Conceptual Graph Linear Format), CGIF (Conceptual Graph Interchange Format) and RDF (Resource Description Framework).

Concept node (or "concept"). A graph (statement) is composed of 1 or more concepts connected by relations. A concept is mainly composed of a category and a quantifier. See the FCG language for more details. An existentially quantified concept asserts the existence of (an individual of) the category. Examples in FCG: [a #cat] (there exists at least one individual of type #cat) and [the #cat pm#Tom] (there exists a particular individual of type #cat identified by pm#Tom).

Relation. To normalize knowledge representations, most relations in the ontology of WebKB-2 are binary (i.e. they connect two concept nodes). All relations are oriented. Here are some FCG examples using the relation pm#part (we use names instead of identifiers in these examples): [a cat, part: 3 legs] (there is at least 1 cat which has exactly 3 legs), [most cats, part: 4 legs] (most cats have exactly 4 legs), [any rectangle, part: 4 sides] (any rectangle has exactly 4 sides). These last two examples are universally quantified graphs/statements: they use the universal quantifiers most and any. (Note: in traditional logic terminology, "universal quantifier" only refers to "any").

Link. In matters related to WebKB, the word "link" is NEVER used to refer to a hyperlink; it is used to refer to a relation connecting two categories or a category and a string. In WebKB-2, links are oriented and a special notation is proposed to describe them, the FL ("For Links") notation. In FL, a link L from a category X to a category/string Y may be represented under the form "X L Y;", L being restricted to 1 character. The main links with their reserved characters in FL are: subtype (<), instance (:), exclusion (!), inverse (-), equal (=), location (l), member (m), substance (s), spatial_part/subprocess (p), object (o), url (u) and tool/technique (t). The characters for their inverse links are respectively  >, ^, !, -, =, L, M, S, P, O, U and T.  More details are given in the following paragraphs.
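
The correspondence between link characters and their inverses can be sketched as a small Python table. invert_link is a hypothetical helper written for this illustration; it only handles simple "X L Y;" statements between two categories, without creator or community annotations. The characters P and O are assumed from the convention that uppercase letters denote the inverses of the content-oriented links.

```python
# Link character -> character of the inverse link.
# Exclusion (!), inverse (-) and equal (=) links are their own inverses.
INVERSE_CHAR = {
    "<": ">", ":": "^", "!": "!", "-": "-", "=": "=",
    "l": "L", "m": "M", "s": "S", "p": "P", "o": "O", "u": "U", "t": "T",
}
# Make the table usable in both directions (e.g. ">" -> "<").
INVERSE_CHAR.update({inv: char for char, inv in list(INVERSE_CHAR.items())})


def invert_link(statement: str) -> str:
    """Rewrite 'X L Y;' as 'Y L2 X;' where L2 is the inverse of L."""
    source, link, target = statement.strip().rstrip(";").split()
    return f"{target} {INVERSE_CHAR[link]} {source};"


if __name__ == "__main__":
    print(invert_link("#Siamese_cat < #mammal;"))
    print(invert_link("#domestic_cat : pm#Tom;"))
```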

Name link. As explained above, the creator of a category may specify several names for it using the "__" separator. However, if this user wants to specify a linguistic context (or "community") for the link, or if another user wants to add a name, the name link ('_') has to be used. Example: #object  _ objet (pm French); ("pm" asserts that "objet" is a name for #object for the French community).
Note that pm#object__objet is equivalent to pm#object _ objet (pm).
The FL grammar permits any link to be contextualized by a creator, a community and a creation date. However, the WebKB-2 CGI scripts automatically specify the creator and creation date, and forbid the users (from the Web) to specify a creation date and a creator identifier that is not their user identifier. (There is no such restriction for WebKB-2 applications called from the command line on the server machine).

Instance link. If an instance link goes from a category A to a category B, then A is a "type" (or a "class"), B is an "instance of A" (or an object "of type A") and must have all the characteristics associated to A via the use of the quantifier "any". For example, if you have asserted that "Tom is an instance of cat" and that "any cat has 4 legs", then Tom cannot happen to have 3 legs. If this is the case, at least one of the two assertions is false. Instance links are not transitive, e.g. "Tom is a cat", "Cat is a class" but "Tom is not a class". In WebKB-2, instance links and their associated inverses (the "type" links) may be represented:
- using the character ':' as in  #domestic_cat : pm#Tom (oc);
- using the character '^' as in  pm#Tom ^ #domestic_cat (oc);
- perhaps in the future, using a relation of type pm#kind  as in  [pm#Tom, pm#kind: #true_cat](oc).
In these examples, (oc) means that oc is the creator of the link. If pm had created these links, he could have simply written:  pm#Tom ^ #domestic_cat;  and  #domestic_cat : pm#Tom; because the creator of the link does not need to be specified if it is the same as the creator of the source category, or if the source category is a WordNet category. For readability reasons, WebKB-2 always uses these two rules when presenting links.

Subtype link. If a subtype link goes from a category A to a category B, then any instance of B has to be an instance of A. For example, if "a cat is a mammal" and "Tom is a cat", then "Tom is a mammal". Subtype links are transitive, e.g. "Siamese cats are cats" and "cats are mammals", therefore "Siamese cats are mammals". In WebKB-2, subtype links and their associated inverses (the "supertype" links) may be represented:
- using the character '>' as in  #mammal > #Siamese_cat;
- using the character '<' as in  #Siamese_cat < #mammal;
- using a relation of type  pm#kind  as in  [any #Siamese_cat, pm#kind: #mammal];
- perhaps in the future, using the relation of type  pm#supertype as in  [#Siamese_cat, pm#supertype: #mammal].
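
The difference between transitive subtype links and non-transitive instance links can be sketched in Python as follows. This is an illustration only, not how WebKB-2 is implemented; the function names and the small example data (including "#cat" and "pm#class") are made up for the purpose.

```python
def supertypes(category, direct_supertypes):
    """All supertypes of a category, following subtype links transitively."""
    seen = set()
    stack = list(direct_supertypes.get(category, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(direct_supertypes.get(current, ()))
    return seen


def types_of(individual, direct_types, direct_supertypes):
    """Direct types of an individual plus all the supertypes of these types.

    Instance links are followed only once: the types of a type are NOT
    types of the individual, since instance links are not transitive.
    """
    result = set()
    for direct_type in direct_types.get(individual, ()):
        result.add(direct_type)
        result |= supertypes(direct_type, direct_supertypes)
    return result


# Made-up example data: subtype links (type -> direct supertypes)
# and instance links (object -> direct types).
SUPERTYPES = {"#Siamese_cat": {"#cat"}, "#cat": {"#mammal"}}
TYPES = {"pm#Tom": {"#cat"}, "#cat": {"pm#class"}}

if __name__ == "__main__":
    print(sorted(supertypes("#Siamese_cat", SUPERTYPES)))
    print(sorted(types_of("pm#Tom", TYPES, SUPERTYPES)))
```

In this sketch, Tom is found to be a mammal (through the subtype link) but not a class (the instance link from the class to "#cat" is not followed).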

Exclusion link. If an exclusion link goes from a category A to a category B (or B to A since exclusion links are symmetric), then A and B cannot have common subtypes or instances, they are called "exclusive types". For example, here are pairs of exclusive types: pm#state and pm#process, pm#spatial_entity and pm#nonspatial_entity, pm#thing and pm#relation. In WebKB-2, exclusion links may be represented:
- using the character '!' as in   pm#state ! pm#process;
- perhaps in the future, using a relation of type  pm#exclusive_class  as in [pm#state, pm#exclusive_class: pm#process].
Exclusion links between types that have a same direct supertype may also be represented with a "subtype partition". If the partition is "closed", i.e. if no other type can belong to the partition, then the construct {(...)} must be used, otherwise the construct {...} must be used; example: #color > {(#chromatic_color #achromatic_color)} (pm)  {#red #yellow pm#blue (oc)} (pm);
The (pm) after the partitions means that their creator is pm and therefore that the exclusion links have this creator. In this example, the subtype links are from WordNet, except the subtype link from #color to pm#blue which has been created by oc.
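
To make the meaning of a subtype partition concrete, the following Python sketch expands a partition into the individual FL links it stands for. The function name is hypothetical and creator annotations are ignored, so this is only an approximation of what the partition notation conveys.

```python
from itertools import combinations


def expand_partition(supertype, members):
    """Return the subtype and exclusion links implied by a partition.

    Every member is a subtype of `supertype`, and every pair of
    members is connected by an exclusion link.
    """
    subtype_links = [f"{supertype} > {member};" for member in members]
    exclusion_links = [f"{a} ! {b};" for a, b in combinations(members, 2)]
    return subtype_links + exclusion_links


if __name__ == "__main__":
    for link in expand_partition("#color", ["#red", "#yellow", "pm#blue"]):
        print(link)
```

A partition of n types thus replaces n subtype links and n(n-1)/2 exclusion links, which is why the partition notation is more concise.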

Closed exclusion link. This link is a specialization of the exclusion link and means that either the two linked types form a closed (complete) subtype partition for some type, or they have no supertype and are respectively identical to pm#thing and pm#nothing (in this last case, this link is a complementOf link as described in DAML+OIL: if a complementOf link goes from a category A to a category B (or B to A), then A and B are exclusive and any object is either of type A or of type B). In WebKB-2, closed exclusion links may be represented:
- using the character '/' as in   pm#entity / pm#situation;
- perhaps in the future, using a relation of type  pm#closed_exclusion  as in [pm#entity, pm#closed_exclusion: pm#situation].

Inverse (or reverse) link. If an inverse link goes from a relation type R1 to a relation type R2, x R1 y implies y R2 x and conversely. For example, pm#instance - rdf#type. (However, the search and consistency mechanisms implemented in WebKB-2 do not yet take inverse links into account). An inverse link may only join relation types. In WebKB-2, inverse links may be represented:
- using the character '-' as in  pm#instance - rdf#type;
- perhaps in the future, using a relation of type  pm#inverse  as in [pm#instance, pm#inverse: rdf#type].

Equal link. If an equal link goes from a category A to a category B, then the two categories are identical. This is useful to represent that categories coming from different ontologies actually represent the same thing(s). However, whenever possible, it is preferable to avoid using this link and instead re-use the identifiers of the categories already existing in the KB and add new names to them. In WebKB-2, equivalent categories are implemented in the following way: all but one (the first entered) are marked as "artificial" and have no other link than equal links (if they had links, these links were transferred to the first category). Then, when they must be handled, the WebKB-2 procedures follow their equal links and instead use the first category. In WebKB-2, equal links may be represented:
- using the character '=' as in  pm#inverse = owl#inverse_of;
- perhaps in the future, using a relation of type  pm#equal  as in [pm#inverse, pm#equal: owl#inverse_of].
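
The handling of equivalent categories described above (artificial categories that only point to the first-entered one) can be approximated by the following Python sketch. The class and method names are invented for this illustration and the transfer of links is not modelled; the actual WebKB-2 data structures are not documented here.

```python
class EqualLinks:
    """Redirect 'artificial' categories to the category they are equal to."""

    def __init__(self):
        # artificial category -> category it is equal to
        self._redirect = {}

    def add_equal(self, first, other):
        """Record 'first = other'; `other` becomes an artificial category."""
        kept, replaced = self.resolve(first), self.resolve(other)
        if kept != replaced:  # avoid a category redirecting to itself
            self._redirect[replaced] = kept

    def resolve(self, category):
        """Follow equal links until a non-artificial category is reached."""
        while category in self._redirect:
            category = self._redirect[category]
        return category


if __name__ == "__main__":
    kb = EqualLinks()
    kb.add_equal("pm#inverse", "owl#inverse_of")
    print(kb.resolve("owl#inverse_of"))
```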

Closely similar link. WordNet has lots of such links between categories representing the meaning of adjectives. However, they are not currently included in the ontology of WebKB-2. Currently, '~' links (which may be described as "closely similar from an Information Retrieval viewpoint") are only used between categories for Greek gods and their Roman counterparts (e.g. #Clotho and #Nona), and between some types from the 3D (endurantist) approach and their counterparts from the 4D (perdurantist) approach or the ?D (vague/unspecified) approach.

Content-oriented links. The above links are often included in language ontologies (e.g. OWL). Lexical ontologies such as WordNet also use other links such as member, part and object. We have split the WordNet part links into more precise ones and added other content-oriented links. Here are their abbreviations in FL: m (member), l (location), s (substance), p (spatial/physical part, subprocess or subdomain depending on the connected categories), u (url), o (state/process object/experiencer or domain object, depending on the connected categories), t (tool/technique/instrument), a (state/process agent, i.e. its do-er), i (process input), r (process result/consequence). Uppercase letters are used to refer to the inverses of these links. To specify the semantics of these content-oriented links, cardinalities should be used. For example, #body p #arm [1,*]; means that an animal body may have any number of arms (possibly none), and each arm belongs to only one body. Instead of [1,*], we could have written [1..1,0..*] (as an equivalent) or more restrictively, [1,0..2]. In the absence of cardinalities, we have chosen the following interpretation (since it often works in WordNet): a link of the form "X L Y", with L being a lowercase character in FL, means that X (or actually, if X is a type, most of its instances) has a relation of type L to Y (or actually, if Y is a type, some individual of type Y). For example, #bird p #wing means that "a bird may have for part a wing". In FCG, this can be written as [a #bird, may have for pm#spatial_part: a #wing].
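
The cardinality notation can be made explicit with a small Python sketch. The parsing rules are inferred only from the stated equivalence between [1,*] and [1..1,0..*]: a single number n is read as n..n and * as 0..*. The function names are hypothetical and the full FL grammar may allow other forms.

```python
def parse_bound(text):
    """Parse 'n', '*', 'n..m' or 'n..*' into (minimum, maximum).

    A maximum of None means "no upper limit".
    """
    text = text.strip()
    if text == "*":
        return (0, None)              # any number, possibly none
    low, sep, high = text.partition("..")
    if not sep:
        return (int(low), int(low))   # a single number n means exactly n
    return (int(low), None if high == "*" else int(high))


def parse_cardinality(text):
    """Parse '[a,b]' into a pair of (minimum, maximum) bounds."""
    first, second = text.strip().strip("[]").split(",")
    return parse_bound(first), parse_bound(second)


if __name__ == "__main__":
    print(parse_cardinality("[1,*]"))
    print(parse_cardinality("[1,0..2]"))
```

Under these rules, [1,*] and [1..1,0..*] parse to the same pair of bounds, while [1,0..2] is strictly narrower on its second bound.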



2.  How the KB has been initialized

2.1.  Reuse of WordNet

Click here for details on this integration of WordNet.


2.2.  Top-level ontology

Below is the description in FCG of important top-level concept types with some relations that can be associated to their instances. (Since all the categories used have been created by the user "pm", the prefix "pm#" has been left implicit here). This description provides a synthetic view of the ontological model we propose. The ontological recommendations given in Section 3.4 come from this model. Click here if you want a more detailed explanation of this model (note: RDF lexical conventions have been used in that explanation).
An indented list of all our top-level concept types can be obtained by asking for the "pm" subtypes of pm#thing in the "Category search" tool (or simply click here) and an indented list of all our relation types can be obtained by asking for the subtypes of pm#relation (or simply click here).
Click here for examples using this model.


[any situation,                 //any situation (state or process)
   place     : a spatial_entity,//  happens at a place (even an imaginary one)
   time      : a time_measure,  //  happens at a time
   duration <= a time_measure,  //  may have a duration (events are processes
                                //                   considered instantaneous)
   from_time : a time_measure,  //  has a beginning
   until_time: a time_measure,  //  has an end
   later_situation: a situation,//  follows (at least) another situation
   result     <= a thing,       //  may have a result, ...
   experiencer<= a conscious_agent,   recipient  <= an agent,
   agent      <= an entity,           initiator  <= a goal_directed_agent,
   instrument <= an entity,           object     <= a thing
]
[any process, //(e.g. an action, a problem solving process, an event)
   triggering_event<= an event,       ending_event  <= an event,
   ending          <= a state,        ending of     <= a state,
   precondition    <= a state,        postcondition <= a state,
   sub_process<= a process,           purpose    <= a situation,
   method     <= a description,       to_place   <= a spatial_entity,
   via_place  <= a spatial_entity,    from_place <= a spatial_entity
]
[any description,                     //any statement ("proposition" in logic)
   description_object of : a thing,   //  may be connected to what it describes
   description_instrument: a description_medium, //(e.g. a symbol, a language)
   description_container : a container_of_description, //(e.g. a file, a video)
   author        : 1 causal_entity,   //  has a unique author
   believer     <= a cognitive_agent, //  may have one or several believers
   modality     <= a modality,        //  may be contextualized
   logical_relation      <=a description,//(e.g. "implication", "or")
   rhetorical_relation   <=a description,//(e.g. "opposition")
   argumentation_relation<=a description //(e.g. "proof", "contradiction")
]
[any spatial_entity, //(e.g. a point, an area, a volume, a physical_entity)
   on_location  <= a spatial_entity,  above_location   <= a spatial_entity,
   in_location  <= a spatial_entity,  interior_location<= a spatial_entity,
   out_location <= a spatial_entity,  exterior_location<= a spatial_entity,
   near_location<= a spatial_entity,  before_location  <= a spatial_entity   
]
[any collection, //(e.g. a bag, a set, a sequence, a social_group)
   size: a number,            member <= a thing, 
   minimal_size <= a number,  subcollection <= a collection,
   maximal_size <= a number,  overlapping_collection <= a collection,
   average <= a number,       collection_complement  <= a collection
]



3.  How to enter/update knowledge

We now assume that the documentation on FC, FL and FCG has been read.

3.1.  Use files to test/enter/store your knowledge

Although you may enter knowledge directly through the interfaces, it is easier to edit, test, re-organize or re-use your knowledge representations or scripts of commands if you put them into one or several files that you run/load into WebKB-2 via the commands run or load (with a URL as parameter). Entering knowledge is like programming. Text editors and textual notations offer many more possibilities and much more ease of use than any graphical editor can offer. They permit synthetic/concise views, modularity/structuration/versioning, easy update, search and replace, etc. They also provide backups and templates. File inclusions may be done via the command include. When testing, put the command  no storage; at the beginning of your files. WebKB-2 will execute the commands but the assertions and modifications will not be committed when the interpretation of the file is finished. Thus, you won't have to remove categories, links or graphs from the shared KB if you want to make modifications. It also helps to reduce the pollution of the KB with incorrect or temporary knowledge. When you are satisfied with your representations, remove the  no storage; command, upload the file(s) one last time, and add it to your list of "committed" files. WebKB-2 will issue a warning message whenever you try to re-enter the same category/link/graph.
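
If the files are generated or managed by a program, the test/commit workflow above can be automated with something like the following Python sketch. The two helper functions are hypothetical; they only add or remove the leading "no storage;" command and make no assumption about the rest of the file.

```python
NO_STORAGE = "no storage;"


def for_testing(script: str) -> str:
    """Return the script with 'no storage;' as its first command."""
    if script.lstrip().startswith(NO_STORAGE):
        return script  # already a test version
    return NO_STORAGE + "\n" + script


def for_commit(script: str) -> str:
    """Return the script without a leading 'no storage;' command."""
    stripped = script.lstrip()
    if stripped.startswith(NO_STORAGE):
        return stripped[len(NO_STORAGE):].lstrip()
    return script


if __name__ == "__main__":
    draft = "#Siamese_cat < #mammal;"
    print(for_testing(draft))              # version to load while testing
    print(for_commit(for_testing(draft)))  # version to load for the final upload
```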


3.2.  Lexical recommendations: use English singular nouns

Please click here to read Section 3 of an article on WebKB-2.


3.3.  Logical/semantic recommendations: be precise, contextualize, re-use and complement

Please click here and read Section 4 and Section 5 of an article on WebKB-2.

Before adding/re-using a new category, make sure you connect to/re-use the most adequate and precise category possible. To do so, look for categories via various names, look at the supertypes and the subtypes to check the meaning of the listed categories, e.g. check if it is a state, a process, a description, an attribute or a physical object. If you are in doubt, prefer a process, state or physical object to the other categories since then you are more likely to find an adequate relation type in the current ontology to represent the object you have in mind. Also prefer a category which has already been used by other users in their statements (this is indicated when you browse categories).



Philippe A. MARTIN