tolog 1.0

1 Introduction

This document provides a specification for tolog 1.0 in the form of a precise and rigorous description of the language syntax and semantics based on the new ISO 13250.

Note:

This document is not finished! At the moment it is only an incomplete draft. Comments are very much wanted.

$Revision: 1.3 $

2 Language syntax

The formal syntax of the language is given below. Note that keywords are case-insensitive. So 'SELECT' will match all of select, SELECT, and SeLeCt. Whitespace is allowed between tokens everywhere, and not required anywhere.

Ed. Note:
Character encoding handling and character set issues.

Queries

[1] query ::= head? clauselist tail? '?'

This defines the overall structure of queries.

2.1 The clause list

The clause list is the main part of the query, and is defined as follows.

Clause lists

[2] clauselist ::= clause (',' clause)*

A clause list consists of individual clauses.

Clauses

[3] clause ::= predclause | opclause | orclause | notclause

Clauses come in four kinds, each of which is defined below.

Predicate clauses

[4]	`predclause`	::=	`ref '(' pair (',' pair)* ')'`
[5]	`pair`	::=	`expr (':' ref)?`

Operator clauses

[6]	`opclause`	::=	`expr operator expr`
[7]	`operator`	::=	`'/='`

Or clauses

[8] orclause ::= '{' clauselist ('|' clauselist)* '}'

Not clauses

[9] notclause ::= 'not' '(' clauselist ')'

Language particles

[10]	`expr`	::=	`variable \| ref \| value \| parameter`
[11]	`variable`	::=	`'$' IDENT`
[12]	`parameter`	::=	`'%' IDENT '%'`
[13]	`ref`	::=	`OBJID \| QNAME \| uriref`
[14]	`uriref`	::=	`INDICATOR \| ADDRESS \| SRCLOC`
[15]	`value`	::=	`STRING`
[16]	`OBJID`	::=	`'@' [A-Za-z0-9]+`
[17]	`QNAME`	::=	`[A-Za-z_] [A-Za-z0-9_.-]* (':' [A-Za-z0-9_.-]+)?`
[18]	`IDENT`	::=	`[A-Za-z_] [A-Za-z0-9_.-]*`
[19]	`INDICATOR`	::=	`'i' URL`
[20]	`ADDRESS`	::=	`'a' URL`
[21]	`SRCLOC`	::=	`'s' URL`
[22]	`URL`	::=	`'"' [^"]* '"'`
[23]	`STRING`	::=	`'"' ([^"] \| '"' '"')* '"'`

Note how double quotes inside strings are escaped by repeating them; two double quotes parse into a single double quote.

Ed. Note:
URI resolution. Scope rules. Duplicate rules. Identifier presedence rules.

2.2 The head

Ed. Note:
Allow rule defs here.

Heads

[24]	`head`	::=	`(usingpart \| importpart)* selectpart?`
[25]	`usingpart`	::=	`'using' IDENT 'for' uriref`
[26]	`importpart`	::=	`'import' URI 'as' IDENT`
[27]	`selectpart`	::=	`'SELECT' selectlist 'FROM'`
[28]	`selectlist`	::=	`selpart (',' selpart)*`
[29]	`selpart`	::=	`variable \| aggfun '(' variable ')`
[30]	`aggfun`	::=	`'count'`

2.3 The tail

The tail is defined as follows.

Tails

[31]	`tail`	::=	`orderpart? limitpart? offsetpart?`
[32]	`orderpart`	::=	`'ORDER' 'BY' orderlist`
[33]	`orderlist`	::=	`ordpart (',' ordpart)*`
[34]	`ordpart`	::=	`variable ('ASC' \| 'DESC')?`
[35]	`limitpart`	::=	`'limit' INTEGER`
[36]	`offsetpart`	::=	`'offset' INTEGER`
[37]	`INTEGER`	::=	`[0-9]+`

2.4 Rule files

Tolog predicates may be defined in rule files, which have the syntax defined below. How these rule files are made available to the processor is not defined in this version of tolog.

Rule sets

[38]	`ruleset`	::=	`(usingpart \| importpart)* rule+`
[39]	`rule`	::=	`IDENT '(' varlist ')' ':-' clauselist '.'`
[40]	`varlist`	::=	`variable (',' variable)*`

3 Query evaluation

Input to query evaluation is

a topic map (an instance of the Topic Maps Data Model),
a set of predicate bindings ([[[what form?]]]),
a set of parameter value bindings (in the form of a mapping from parameter identifiers to the values), and
a query.

Query results consist of a list of sets of (variable, value) bindings.

Query evaluation proceeds as follows:

evaluate all references in the query,
produce a query result from the clauselist,
process the selectpart (if given), and
process the offsetpart and limitpart (if given).

If an error occurs query evaluation stops and no query result is produced.

3.1 Evaluating references

References can be evaluated in two ways: as references to TMDM information items or as references to predicates.

3.1.1 References to information items

An INDICATOR token is evaluated by finding a topic item in the topic map whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to the same string as is found between the quotes of the INDICATOR token. It is an error if no such topic item is found.

An ADDRESS token is evaluated by finding a topic item in the topic map whose [subject locator] property contains a locator item with [notation] set to "URI" and [reference] set to the same string as is found between the quotes of the ADDRESS token. It is an error if no such topic item is found.

A SRCLOC token is evaluated by finding an information item in the topic map whose [source locators] property contains a locator item with [notation] set to "URI" and [reference] set to the same string as is found between the quotes of the SRCLOC token. It is an error if no such information item is found.

A QNAME token is evaluated as follows:

if there is no ':' character in the QNAME a URI is created by concatenating the value of the topic map item's [base locator] property, a '#' character, and the QNAME token. The resulting string is then looked up as though it had come from a SRCLOC token.
if there is a ':' character the QNAME is split into a prefix (the part before the colon) and a localname (the part after).

Ed. Note:
Finish QNAME definition.

Ed. Note:
OBJID

3.1.2 Predicate references

Ed. Note:
Fill in this, making sure to get precedence right.

3.2 Clauselist evaluation

The clauselist is evaluated by processing each clause in order. The input to the first clause is a query result consisting of a single empty variable binding set. All the following clauses receive the query result produced by the previous clause as input, and the result of the entire clauselist is that produced by the last clause.

3.2.1 Operator clauses

Operator clauses produce no new bindings and so are evaluated by scanning the input query result and removing variable binding sets which do not evaluate as true according to the semantics of the operator. Evaluation is done by applying the operator to the values produced by the expressions on either side.

The /= operator evaluates to true when the topics produced by the two expressions are different topics, and false when the topics are the same.

Operator clauses require all variables used in the expressions to be bound already. If they are not this constitutes an error.

Issue (tolog-predicate-order):

Are implementations allowed to change the order of predicates to avoid the error that would occur if, say, an operator clause were placed before the clauses that bind the variables used?

3.2.2 Or clauses

Or clauses are evaluated by evaluating each clauselist in the or clause separately, with the input query result as input to each one. The output is produced by concatenating the output result from each clauselist.

3.2.3 Not clauses

Not clauses are evaluated by evaluating the clauselist in the not clause with the input query result as input. The output query result is then produced by removing every variable binding set in the input query result for which there is a matching variable binding set in the output from the clauselist.

A variable binding set matches another if, for every tuple (variable, value) in the first an equal tuple can be found in the latter.

Ed. Note:
This is too hard to read. Clarify.

3.2.4 Predicate clauses

The first step is to evaluate the ref before the parentheses in the predicate clause. If this is an IDENT it is matched against the names of rule definitions, and if a match is found, the predicate is evaluated as a rule predicate. If no match is found, the predicate is evaluated as an association predicate.

3.2.4.1 Rule predicates

Rule predicates are evaluated in three steps:

First, the input result set is rewritten to the variables used internally in the query,
Then, the rewritten result set is used as input to the clauselist that is the rule body, and that clauselist evaluated, and
Finally, the result set produced by the clauselist is rewritten back to the external variables used outside the query, and this is returned as the output from the predicate.

Ed. Note:
Specify rewriting.

3.2.4.2 Association predicates

Ed. Note:
Not looking forward to this bit.

3.2.5 Evaluating expressions

Ed. Note:
Yes, please.

3.2.6 Evaluating topic references

Ed. Note:
Uh, yeah.

4 Topic map representation

This section describes the topic map representation used in this specification, based on [TMDM].

Let I be the set of all identifiers, where identifiers are tokens which have no other property than being distinguishable from one another. Let S be the set of all strings, and U the set of all URIs. Then L = S ∪ U and A = L ∪ I. A quadset is a subset of I x I x I x A.

The following are elements of I, used here to represent TMDM instances.

TOPICMAP
BASE_LOCATOR
SOURCE_LOCATOR
TOPIC
CHARACTERISTIC_TYPE
TOPIC_NAME
OCCURRENCE
ROLE
SUBJECT_IDENTIFIER
SUBJECT_LOCATOR
SCOPE
VARIANT
TYPE

Let id(i) be a function which from a TMDM information item produces an element of I.

A quadset can be produced from a TMDM instance by following the procedure described below.

Let the topic map item be tm, and assert the following:

(id(tm), TOPICMAP, *, id(tm))
(id(tm), BASE_LOCATOR, *, tm.[base locator].[reference])
For each source locator s in the [source locators] property of tm, assert (id(tm), SOURCE_LOCATOR, *, s.[reference])

For each topic item t in tm.[topics], assert the following:

(id(tm), TOPIC, *, id(t))

5 Built-in predicates

This section defines the predicates built in to tolog 1.0.

5.1 Generative predicates

A number of the built-in predicates are defined in terms of a rule that generates all possible values that make the predicate true.

Ed. Note:
Add text explaining how to evaluate this.

5.2 association

The predicate association($ASSOC) generates the matches produced by traversing the set of association items in the [associations] property of the topic map item and binding $ASSOC to each one in turn.

5.3 association-role

The predicate association-role($ASSOC, $ROLE) generates the matches produced by traversing the set of association items in the [associations] property of the topic map item and then for each association role item in the [roles] property of each association item producing a match where $ASSOC is bound to the association item, and $ROLE to the association role item.

5.4 base-locator

The predicate base-locator($LOC) generates no matches if the [base locator] property of the topic map item is null. If it is not a single match is produced where $LOC is bound to the locator item in that property.

5.5 occurrence

The predicate occurrence($TOPIC, $OCC) generates the matches produced by traversing the set of topic items in the [topics] property of the topic map item and then for each occurrence item in the [occurrences] property of each topic item producing a match where $TOPIC is bound to the topic item, and $OCC to the occurrence item.

5.6 reifies

The predicate reifies($REIFIER, $REIFIED) generates the matches produced by traversing the set of topic items in the [topics] property of the topic map item and then for each information item in the [reified] property of each topic item producing a match where $REIFIER is bound to the topic item, and $REIFIED to the information item.

5.7 resource

5.8 role-player

5.9 scope

5.10 source-locator

5.11 subject-identifier

5.12 subject-locator

5.13 topic

5.14 topic-name

5.15 topicmap

5.16 type

5.17 value

5.18 variant

5.19 direct-instance-of

The direct-instance-of predicate can be defined using the following rule.

direct-instance-of($INSTANCE, $CLASS) :-
  i"http://psi.topicmaps.org/sam/1.0/#type-instance"(
    $INSTANCE : i"http://psi.topicmaps.org/sam/1.0/#instance",
    $CLASS    : i"http://psi.topicmaps.org/sam/1.0/#class").

In implementations where the type-instance relationship is represented directly implementations must evaluate this as if the relationship were always represented using associations.

5.20 instance-of

The instance-of predicate can be defined in terms of the direct-instance-of predicate using the following rules file.

instance-of($INSTANCE, $CLASS) :- {
  direct-instance-of($INSTANCE, $CLASS) |
  direct-instance-of($INSTANCE, $DCLASS), descendant-of($DCLASS, $CLASS)
}.

descendant-of($DESC, $ANC) :- {
  super-sub($DESC, $ANC) |
  super-sub($DESC, $INT), descendant-of($INT, $ANC)
}.

super-sub($SUB, $SUPER) :-
  i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass"(
    $SUB   : i"http://www.topicmaps.org/xtm/1.0/core.xtm#subclass",
    $SUPER : i"http://www.topicmaps.org/xtm/1.0/core.xtm#superclass").

Of course, the descendant-of and super-sub predicates are not visible to tolog queries; they are just included in order to make it possible to define the instance-of predicate.