1 Introduction
2 Language syntax
2.1 The clause list
2.2 The head
2.3 The tail
2.4 Rule files
3 Query evaluation
3.1 Evaluating references
3.1.1 References to information items
3.1.2 Predicate references
3.2 Clauselist evaluation
3.2.1 Operator clauses
3.2.2 Or clauses
3.2.3 Not clauses
3.2.4 Predicate clauses
3.2.4.1 Rule predicates
3.2.4.2 Association predicates
3.2.5 Evaluating expressions
3.2.6 Evaluating topic references
4 Topic map representation
5 Built-in predicates
5.1 Generative predicates
5.2 association
5.3 association-role
5.4 base-locator
5.5 occurrence
5.6 reifies
5.7 resource
5.8 role-player
5.9 scope
5.10 source-locator
5.11 subject-identifier
5.12 subject-locator
5.13 topic
5.14 topic-name
5.15 topicmap
5.16 type
5.17 value
5.18 variant
5.19 direct-instance-of
5.20 instance-of
5.21 value-like
This document provides a specification for tolog 1.0 in the form of a precise and rigorous description of the language syntax and semantics based on the new ISO 13250.
Note:
This document is not finished! At the moment it is only an incomplete draft. Comments are very much wanted.
$Revision: 1.3 $
The formal syntax of the language is given below. Note that keywords
are case-insensitive. So 'SELECT' will match all of
select
, SELECT
, and SeLeCt
.
Whitespace is allowed between tokens everywhere, and not required
anywhere.
Ed. Note:
Character encoding handling and character set issues.
[1] | query | ::= | head? clauselist tail? '?' |
This defines the overall structure of queries.
The clause list is the main part of the query, and is defined as follows.
[2] | clauselist | ::= | clause (',' clause)* |
A clause list consists of individual clauses.
[3] | clause | ::= | predclause | opclause | orclause | notclause |
Clauses come in four kinds, each of which is defined below.
[4] | predclause | ::= | ref '(' pair (',' pair)* ')' |
[5] | pair | ::= | expr (':' ref)? |
[6] | opclause | ::= | expr operator expr |
[7] | operator | ::= | '/=' |
[8] | orclause | ::= | '{' clauselist ('|' clauselist)* '}' |
[9] | notclause | ::= | 'not' '(' clauselist ')' |
Note how double quotes inside strings are escaped by repeating them; two double quotes parse into a single double quote.
Ed. Note:
URI resolution. Scope rules. Duplicate rules. Identifier presedence rules.
Ed. Note:
Allow rule defs here.
The tail is defined as follows.
Tolog predicates may be defined in rule files, which have the syntax defined below. How these rule files are made available to the processor is not defined in this version of tolog.
[38] | ruleset | ::= | (usingpart | importpart)* rule+ |
[39] | rule | ::= | IDENT '(' varlist ')' ':-' clauselist '.' |
[40] | varlist | ::= | variable (',' variable)* |
Input to query evaluation is
a topic map (an instance of the Topic Maps Data Model),
a set of predicate bindings ([[[what form?]]]),
a set of parameter value bindings (in the form of a mapping from parameter identifiers to the values), and
a query.
Query results consist of a list of sets of (variable, value) bindings.
Query evaluation proceeds as follows:
evaluate all references in the query,
produce a query result from the clauselist,
process the selectpart (if given), and
process the offsetpart and limitpart (if given).
If an error occurs query evaluation stops and no query result is produced.
References can be evaluated in two ways: as references to TMDM information items or as references to predicates.
An INDICATOR token is evaluated by finding
a topic item in the topic map whose [subject identifiers] property
contains a locator item with [notation] set to "URI"
and
[reference] set to the same string as is found between the quotes of
the INDICATOR token. It is an error if no
such topic item is found.
An ADDRESS token is evaluated by finding a
topic item in the topic map whose [subject locator] property contains
a locator item with [notation] set to "URI"
and
[reference] set to the same string as is found between the quotes of
the ADDRESS token. It is an error if no such
topic item is found.
A SRCLOC token is evaluated by finding an
information item in the topic map whose [source locators] property
contains a locator item with [notation] set to "URI"
and
[reference] set to the same string as is found between the quotes of
the SRCLOC token. It is an error if no such
information item is found.
A QNAME token is evaluated as follows:
if there is no ':' character in the QNAME a URI is created by concatenating the value of the topic map item's [base locator] property, a '#' character, and the QNAME token. The resulting string is then looked up as though it had come from a SRCLOC token.
if there is a ':' character the QNAME is split into a prefix (the part before the colon) and a localname (the part after).
Ed. Note:
Finish QNAME definition.
Ed. Note:
OBJID
The clauselist is evaluated by processing each clause in order. The input to the first clause is a query result consisting of a single empty variable binding set. All the following clauses receive the query result produced by the previous clause as input, and the result of the entire clauselist is that produced by the last clause.
Operator clauses produce no new bindings and so are evaluated by scanning the input query result and removing variable binding sets which do not evaluate as true according to the semantics of the operator. Evaluation is done by applying the operator to the values produced by the expressions on either side.
The /=
operator evaluates to true when the topics
produced by the two expressions are different topics, and false when
the topics are the same.
Operator clauses require all variables used in the expressions to be bound already. If they are not this constitutes an error.
Issue (tolog-predicate-order):
Are implementations allowed to change the order of predicates to avoid the error that would occur if, say, an operator clause were placed before the clauses that bind the variables used?
Or clauses are evaluated by evaluating each clauselist in the or clause separately, with the input query result as input to each one. The output is produced by concatenating the output result from each clauselist.
Not clauses are evaluated by evaluating the clauselist in the not clause with the input query result as input. The output query result is then produced by removing every variable binding set in the input query result for which there is a matching variable binding set in the output from the clauselist.
A variable binding set matches another if, for every tuple (variable, value) in the first an equal tuple can be found in the latter.
Ed. Note:
This is too hard to read. Clarify.
The first step is to evaluate the ref before the parentheses in the predicate clause. If this is an IDENT it is matched against the names of rule definitions, and if a match is found, the predicate is evaluated as a rule predicate. If no match is found, the predicate is evaluated as an association predicate.
Rule predicates are evaluated in three steps:
First, the input result set is rewritten to the variables used internally in the query,
Then, the rewritten result set is used as input to the clauselist that is the rule body, and that clauselist evaluated, and
Finally, the result set produced by the clauselist is rewritten back to the external variables used outside the query, and this is returned as the output from the predicate.
Ed. Note:
Specify rewriting.
This section describes the topic map representation used in this specification, based on [TMDM].
Let I
be the set of all identifiers, where identifiers
are tokens which have no other property than being distinguishable
from one another. Let S
be the set of all strings, and
U
the set of all URIs. Then L = S ∪
U
and A = L ∪ I
.
A quadset is a subset of I x I x I x A
.
The following are elements of I
, used here to represent
TMDM instances.
TOPICMAP
BASE_LOCATOR
SOURCE_LOCATOR
TOPIC
CHARACTERISTIC_TYPE
TOPIC_NAME
OCCURRENCE
ROLE
SUBJECT_IDENTIFIER
SUBJECT_LOCATOR
SCOPE
VARIANT
TYPE
Let id(i)
be a function which from a TMDM information
item produces an element of I
.
A quadset can be produced from a TMDM instance by following the procedure described below.
Let the topic map item be tm
, and assert the following:
(id(tm), TOPICMAP, *, id(tm))
(id(tm), BASE_LOCATOR, *, tm.[base locator].[reference])
For each source locator s
in the [source
locators] property of tm
, assert
(id(tm), SOURCE_LOCATOR, *, s.[reference])
For each topic item t
in tm.[topics]
, assert
the following:
(id(tm), TOPIC, *, id(t))
This section defines the predicates built in to tolog 1.0.
A number of the built-in predicates are defined in terms of a rule that generates all possible values that make the predicate true.
Ed. Note:
Add text explaining how to evaluate this.
The predicate association($ASSOC)
generates the matches
produced by traversing the set of association items in the
[associations] property of the topic map item and binding
$ASSOC to each one in turn.
The predicate association-role($ASSOC, $ROLE)
generates
the matches produced by traversing the set of association items in the
[associations] property of the topic map item and then for each
association role item in the [roles] property of each association item
producing a match where $ASSOC is bound to the association
item, and $ROLE to the association role item.
The predicate base-locator($LOC)
generates no matches if
the [base locator] property of the topic map item is null. If it is
not a single match is produced where $LOC is bound to the
locator item in that property.
The predicate occurrence($TOPIC, $OCC)
generates the
matches produced by traversing the set of topic items in the [topics]
property of the topic map item and then for each occurrence item in
the [occurrences] property of each topic item producing a match where
$TOPIC is bound to the topic item, and $OCC to
the occurrence item.
The predicate reifies($REIFIER, $REIFIED)
generates the
matches produced by traversing the set of topic items in the [topics]
property of the topic map item and then for each information item in
the [reified] property of each topic item producing a match where
$REIFIER is bound to the topic item, and
$REIFIED to the information item.
The direct-instance-of predicate can be defined using the following rule.
direct-instance-of($INSTANCE, $CLASS) :- i"http://psi.topicmaps.org/sam/1.0/#type-instance"( $INSTANCE : i"http://psi.topicmaps.org/sam/1.0/#instance", $CLASS : i"http://psi.topicmaps.org/sam/1.0/#class"). |
In implementations where the type-instance relationship is represented directly implementations must evaluate this as if the relationship were always represented using associations.
The instance-of predicate can be defined in terms of the direct-instance-of predicate using the following rules file.
instance-of($INSTANCE, $CLASS) :- { direct-instance-of($INSTANCE, $CLASS) | direct-instance-of($INSTANCE, $DCLASS), descendant-of($DCLASS, $CLASS) }. descendant-of($DESC, $ANC) :- { super-sub($DESC, $ANC) | super-sub($DESC, $INT), descendant-of($INT, $ANC) }. super-sub($SUB, $SUPER) :- i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"( $SUB : i"http://www.topicmaps.org/xtm/1.0/core.xtm#subclass", $SUPER : i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass"). |
Of course, the descendant-of
and super-sub
predicates are not visible to tolog queries; they are just included in
order to make it possible to define the instance-of predicate.