Steve Pepper <pepper@ontopia.net>
Ontopia, August 2002
(This document updates and supersedes Topic maps and RDF: A first cut, first published in June 2000.)
For a more in-depth examination of the relationship between RDF and Topic Maps, see Living with Topic Maps and RDF by Lars Marius Garshol, to whom I am indebted for many of the ideas that are summarized here.
Topic maps and RDF originate from two standards organizations, ISO and the W3C respectively, that have traditionally been regarded as competitors. This accounts to some extent for the tendency among the uninformed to regard topic maps and RDF as competitors. Our position is that it makes more sense to regard topic maps and RDF as complementary, and to look for ways of realizing the potential synergies between the two. Ontopia has clearly demonstrated this potential through its use of RDF (under the covers, as it were) in the automated generation of topic maps.
Topic maps and RDF have a number of similarities. They both attempt to alleviate the same general problem of infoglut by applying knowledge representation techniques to information management. They both define abstract models and interchange syntaxes based on XML and both have models that are simple and elegant at one level but extremely powerful at another: In topic maps, most things are topics (not just the topics themselves); in RDF, the value of a resource's property may itself be a resource which in turn has properties of its own.
Topic mapping has its roots in traditional finding aids such as back-of-book indexes, glossaries and thesauri. RDF has its roots in formal logic and mathematical graph theory. Topic mapping is knowledge representation applied to information management from the perspective of humans. RDF is knowledge representation applied to information management from the perspective of machines. This accounts for some of the critical differences between the two.
RDF is resource-centric, whereas topic maps are subject-centric. In RDF one starts with information resources and attaches metadata structures to them; in topic maps, the primary focus is the subjects that the information is "about". So in one sense RDF and topic maps have diametrically opposed points of view. (To some extent, this difference in focus parallels that between document languages, such as the Anglo-American Cataloguing Rules, AACR, and subject languages, such as the Library of Congress Subject Headings, LCSH, in the domain of library science.) However, "resource" in RDF and "subject" in topic maps can be regarded as synonyms, since information resources can (also) be "subjects" in topic maps and "resources" in RDF do not have to be addressable information resources – so the difference is dialectical rather than diametrical.
RDF is more "low-level" than topic maps. In RDF, resources have properties that have values (which may be other resources); end of story. In topic maps, topics have characteristics of various kinds: names, occurrences and roles played in associations with other topics. The essential semantic distinction between these different kinds of characteristic is absent in RDF. Experience has shown that humans find this level of semantics very intuitive – which is not surprising considering that topic mapping is essentially a formalization (and generalization) of the age-old concepts of back-of-book indexes. (This difference in level of semantics also accounts for the fact that a useful generic browser – like Ontopia's Omnigator – can be built for topic maps, but not for RDF.)
The models of topic maps and RDF are sufficiently similar that it is possible to define generic mappings between the two in either direction. However, doing so does not yield useful results in terms of the target paradigm. An RDF triple can in theory be mapped to at least six different topic map constructs, but without knowledge of the semantics of the predicate, an optimal choice cannot be made. Likewise, topic characteristics can be mapped generically to RDF triples, but without an RDF schema for topic maps the higher level of semantics is lost; and even with such a schema, the results are totally inadequate from the point of view of RDF processing.
At the level of the schema, on the other hand, it is possible to describe two-way mappings that are extremely useful. Once the semantics of a particular RDF predicate are known, the choice of what kind of topic map construct to map it to becomes easy. Similarly, semantics that might otherwise be lost when mapping from topic maps to RDF can be expressed in an RDF schema. This suggests that the chances of unifying the two models in the short term are very slight. The immediate goal should rather be interoperability.
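A schema-level mapping of this kind can be sketched as a lookup table keyed on the predicate. The Python below is a minimal illustration, not a description of any existing tool: the `http://example.org/schema#` predicates, the role names and the dictionary layout of the resulting constructs are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical schema-level knowledge: which topic map construct each
# predicate should become. For associations, the role types played by the
# triple's subject and object are part of that knowledge.
SCHEMA = "http://example.org/schema#"
PREDICATE_MAPPING = {
    SCHEMA + "name": {"construct": "name"},
    SCHEMA + "homepage": {"construct": "occurrence"},
    SCHEMA + "employed-by": {
        "construct": "association",
        "subject_role": "employee",
        "object_role": "employer",
    },
}


def map_triple(triple: Triple) -> Optional[dict]:
    """Map one triple to a topic map construct, or return None when the
    predicate's semantics are unknown and no sensible choice can be made."""
    rule = PREDICATE_MAPPING.get(triple.predicate)
    if rule is None:
        return None
    if rule["construct"] == "name":
        return {"construct": "name", "topic": triple.subject, "value": triple.obj}
    if rule["construct"] == "occurrence":
        return {
            "construct": "occurrence",
            "topic": triple.subject,
            "type": triple.predicate,
            "locator": triple.obj,
        }
    # Association: the subject and object become players of typed roles.
    return {
        "construct": "association",
        "type": triple.predicate,
        "roles": {
            rule["subject_role"]: triple.subject,
            rule["object_role"]: triple.obj,
        },
    }
```

A triple whose predicate is missing from the table comes back as `None`, which mirrors the point made above: without schema-level knowledge there is no basis for choosing among the candidate constructs.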
The subject of every assertion (or statement) in an RDF model is a resource, identified by a URI. The subject of every assertion in a topic map is a topic, representing a subject, which may be addressable or non-addressable. Addressable subjects are identified by their URIs (as in RDF); non-addressable subjects are identified by the URIs of (one or more) subject indicators. This important distinction is not present in RDF.
In RDF, assertions have a direction. The statement buttered(Tom, the bread) (1) is different from the statement buttered(the bread, Tom) (2), which in turn is different from was-buttered-by(the bread, Tom) (3). This leads to the tendency to create redundant, inverse relationships (of which (1) and (3) are examples). The ability (provided by DAML+OIL) to state explicitly that buttered and was-buttered-by are inverse relationships does not solve the redundancy problem. In topic maps, it is not possible to assert that "Tom buttered the bread" without also asserting that "The bread was buttered by Tom" – they are one and the same association. This additional expressivity is made possible by the notion of association roles, which make clear the kind of role played by each participant in a relationship.
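The contrast can be made concrete with a small Python sketch. It models an RDF statement as an ordered tuple and an association as a type plus an unordered set of (role, player) pairs. The association type `buttering` and the role names `agent` and `patient` are invented for the illustration; neither model prescribes them.

```python
# RDF-style statements: ordered (subject, predicate, object) tuples.
forward = ("Tom", "buttered", "the bread")          # statement (1)
swapped = ("the bread", "buttered", "Tom")          # statement (2)
inverse = ("the bread", "was-buttered-by", "Tom")   # statement (3)


def association(assoc_type, **roles):
    """Topic-map-style association: a type plus an unordered set of
    (role, player) pairs. There is no subject/object ordering."""
    return (assoc_type, frozenset(roles.items()))


# The same association, written from either participant's point of view.
tom_first = association("buttering", agent="Tom", patient="the bread")
bread_first = association("buttering", patient="the bread", agent="Tom")

# The three triples are distinct statements; the two associations are equal.
triples_all_distinct = len({forward, swapped, inverse}) == 3
associations_identical = tom_first == bread_first
```

Because the roles carry the meaning, there is nothing to invert: the order in which the participants are listed has no effect on the assertion.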
Association roles also make it possible to go beyond binary relationships. In RDF, assertions are always binary. An RDF statement, consisting of a subject, a predicate, and an object, expresses a relationship between subject and object and corresponds to the subject-verb-object construct in natural language – e.g. "Tom buttered the bread", or buttered(Tom, the bread). In topic maps, assertions are n-ary. An association may have any number of roles and can thus easily express more complex relationships – e.g. "Tom buttered the bread with a knife", or buttered(Tom, the bread, a knife) – and even simple statements like "the bread was buttered", buttered(the bread).
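As a rough illustration in Python, an association can be modelled as a type plus a mapping from roles to players, which accommodates one, two or three participants equally well. The role names (`agent`, `patient`, `instrument`) and the `to_triples` helper are assumptions made for this sketch. The helper shows one common workaround for expressing such a relationship with binary statements only, namely an intermediate node with one statement per role; it is not a construct defined by either specification.

```python
def association(assoc_type, **roles):
    """An n-ary association: a type plus any number of role -> player pairs."""
    return {"type": assoc_type, "roles": dict(roles)}


def to_triples(assoc, node_id):
    """Encode an association using binary statements only, by hanging one
    statement per role off an intermediate node (illustrative workaround)."""
    triples = [(node_id, "type", assoc["type"])]
    for role, player in sorted(assoc["roles"].items()):
        triples.append((node_id, role, player))
    return triples


# "the bread was buttered"
unary = association("buttering", patient="the bread")
# "Tom buttered the bread"
binary = association("buttering", agent="Tom", patient="the bread")
# "Tom buttered the bread with a knife"
ternary = association(
    "buttering", agent="Tom", patient="the bread", instrument="a knife"
)

# The ternary association needs four binary statements in this encoding.
ternary_as_triples = to_triples(ternary, "_:b1")
```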
Topic maps solve the problem of capturing contextual validity through the concept of scope. The context within which a name, occurrence or association role is considered valid can be easily stated by scoping the relevant assertion. In RDF, the same semantics can only be expressed through a cumbersome process of reifying the statement and then making it the subject of a new statement whose significance has to be described in a schema. Multilingual support, through the ability to give topics multiple names in different languages, is one important application of scope.
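The difference in effort can be sketched in Python. The first half attaches a scope directly to each name. The second half spells out the reification route using the RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The topic identifier, the themes, and the `http://example.org/` predicates are hypothetical; in particular, `valid-in-context` stands in for the schema-defined property that the text says would be needed.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Topic map style: each name carries its scope (a set of themes) directly.
names = [
    {"topic": "norway", "value": "Norway", "scope": frozenset({"english"})},
    {"topic": "norway", "value": "Norge", "scope": frozenset({"norwegian"})},
]


def names_in_scope(all_names, topic, theme):
    """Return the names of a topic that are valid in the given theme."""
    return [
        n["value"]
        for n in all_names
        if n["topic"] == topic and theme in n["scope"]
    ]


def reify(statement_id, triple, context_predicate, context):
    """RDF style: describe the statement as a resource (four statements),
    then add a fifth statement that attaches the context to it."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
        (statement_id, context_predicate, context),
    ]


norwegian_names = names_in_scope(names, "norway", "norwegian")

# "valid-in-context" is a hypothetical predicate a schema would have to define.
reified = reify(
    "_:s1",
    ("http://example.org/norway", "http://example.org/name", "Norge"),
    "http://example.org/valid-in-context",
    "http://example.org/norwegian",
)
```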
Topic maps were designed from the start for ease of merging. The duality of "subject" and "topic", the concept of subject identity and the ability to establish a topic's identity through a subject address and/or multiple subject indicators are key to this capability. In particular, the notion of published subject indicators (PSIs) promotes interoperability across applications. RDF has none of this machinery. However, since PSIs are based on URIs, they are general enough to solve the interoperability problem for both topic maps and RDF – and make it easier to exploit the synergies between the two.
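A simplified Python sketch of identity-based merging follows. It treats two topics as representing the same subject when they share a subject address or at least one subject indicator, and combines their names and indicators. The dictionary layout, the function names and the `http://psi.example.org/` URIs are assumptions for illustration; real topic map processors handle many more details (scope, associations, name-based merging) that are left out here.

```python
def same_subject(a, b):
    """Two topics represent the same subject if they share a subject
    indicator or have the same (non-empty) subject address."""
    shared_indicator = bool(a["subject_indicators"] & b["subject_indicators"])
    same_address = (
        a["subject_address"] is not None
        and a["subject_address"] == b["subject_address"]
    )
    return shared_indicator or same_address


def combine(a, b):
    """Merge two topics into one carrying the union of their characteristics.
    Simplification: if both have an address, the first one is kept."""
    return {
        "names": a["names"] | b["names"],
        "subject_address": a["subject_address"] or b["subject_address"],
        "subject_indicators": a["subject_indicators"] | b["subject_indicators"],
    }


def merge_topics(topics):
    """Merge every group of topics that represent the same subject."""
    merged = []
    for topic in topics:
        current = {
            "names": set(topic.get("names", ())),
            "subject_address": topic.get("subject_address"),
            "subject_indicators": set(topic.get("subject_indicators", ())),
        }
        remaining = []
        for other in merged:
            if same_subject(current, other):
                current = combine(current, other)  # absorb the match
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


# Two maps describe the same person under a shared (hypothetical) PSI.
TOM_PSI = "http://psi.example.org/people/tom"
result = merge_topics([
    {"names": {"Tom"}, "subject_indicators": {TOM_PSI}},
    {"names": {"Thomas"}, "subject_indicators": {TOM_PSI}},
    {"names": {"Bread"}, "subject_indicators": {"http://psi.example.org/food/bread"}},
])
```

The same comparison would work for RDF resources identified by those URIs, which is the sense in which PSIs can serve both models.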
October 2002: Refined some arguments and fixed the numbering. (The notion that it is possible to have Ten Theses, numbered 1 to 12, is indirectly attributable to Daniel Rivers-Moore, who otherwise bears no responsibility for the content of this document :-)