Steve Pepper <pepper@ontopia.net>
Ontopia, August 2002
(This document updates and supersedes Topic maps and RDF: A first cut, first published in June 2000.)
Topic maps and RDF originate from two standards organizations, ISO and the W3C respectively, that have traditionally been regarded as competitors. This accounts to some extent for the tendency among the uninformed to regard topic maps and RDF as competitors. Our position is that it makes more sense to regard topic maps and RDF as complementary, and to look for ways of realizing the potential synergies between the two. Ontopia has clearly demonstrated this potential through its use of RDF (under the covers, as it were) in the automated generation of topic maps.
Topic maps and RDF have a number of similarities. They both attempt to alleviate the same general problem of infoglut by applying knowledge representation techniques to information management. They both define abstract models and interchange syntaxes based on XML and both have models that are simple and elegant at one level but extremely powerful at another: In topic maps, most things are topics (not just the topics themselves); in RDF, the value of a resource's property may itself be a resource which in turn has properties of its own.
Topic mapping has its roots in traditional finding aids such as back-of-book indexes, glossaries and thesauri. RDF has its roots in formal logic and mathematical graph theory. Topic mapping is knowledge representation applied to information management from the perspective of humans. RDF is knowledge representation applied to information management from the perspective of machines. This accounts for some of the critical differences between the two.
Topic maps are subject-centric, whereas RDF is resource-centric, so in one sense they have diametrically opposed points of view. (To some extent, this difference in focus parallels that between subject languages, such as the Library of Congress Subject Headings, LCSH, and document languages, such as the Anglo-American Cataloguing Rules, AACR, in the domain of library science. For many years there existed a rivalry between these two kinds of bibliographic language; today there is a consensus that both have their place.) However, "resource" in RDF and "subject" in topic maps can be regarded as synonyms, since information resources can (also) be "subjects" in topic maps and "resources" in RDF do not have to be addressable information resources, so the difference is dialectical rather than diametrical.
RDF is more "low-level" than topic maps. In RDF, resources have properties that have values (which may be other resources); end of story. In topic maps, topics have characteristics of various kinds: names, occurrences and roles played in associations with other topics. The essential semantic distinction between these different kinds of characteristic is absent in RDF. Experience has shown that humans find this level of semantics very intuitive – which is not surprising considering that topic mapping is essentially a formalization (and generalization) of the age-old concepts of back-of-book indexes. (This difference in level of semantics also accounts for the fact that a useful generic browser – like Ontopia's Omnigator – can be built for topic maps, but not for RDF.)
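The difference in level can be made concrete with a small sketch. The Python below is purely illustrative: the resource URIs, the property names and the Topic class are assumptions made for this example and are not taken from either standard or from any Ontopia product.

```python
from dataclasses import dataclass, field

# RDF view: every statement has the same shape (resource, property, value).
# The URIs and property names here are hypothetical.
rdf_statements = [
    ("http://example.org/puccini", "name", "Giacomo Puccini"),
    ("http://example.org/puccini", "homepage", "http://example.org/puccini.html"),
    ("http://example.org/puccini", "composed", "http://example.org/tosca"),
]


@dataclass
class Topic:
    """Topic map view: characteristics are kept apart by kind."""
    names: list = field(default_factory=list)        # what the subject is called
    occurrences: list = field(default_factory=list)  # (type, resource) pairs
    roles: list = field(default_factory=list)        # (role, association type) pairs


puccini = Topic(
    names=["Giacomo Puccini"],
    occurrences=[("homepage", "http://example.org/puccini.html")],
    roles=[("composer", "composed-by")],
)


def characteristic_kinds(topic):
    """Return the kinds of characteristic that the topic actually has."""
    return [kind for kind in ("names", "occurrences", "roles") if getattr(topic, kind)]
```

In the list of triples nothing distinguishes a name from an occurrence or an association; in the Topic structure that distinction is part of the model itself.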
The models of topic maps and RDF are sufficiently similar that it is possible to define generic mappings between the two in either direction. However, doing so does not yield useful results in terms of the target paradigm. An RDF triple can in theory be mapped to at least six different topic map constructs, but without knowledge of the semantics of the predicate, an optimal choice cannot be made. Likewise, topic characteristics can be mapped generically to RDF triples but without an RDF schema for topic maps the higher level of semantics is lost; and even with such a schema, the results are totally inadequate from the point of view of RDF processing.
At the level of the schema, on the other hand, it is possible to describe two-way mappings that are extremely useful. Once the semantics of a particular RDF predicate are known, the choice of what kind of topic map construct to map it to becomes easy. Similarly, semantics that might otherwise be lost when mapping from topic maps to RDF can be expressed in an RDF schema. This suggests that the chances of unifying the two models in the short term are very slight. The immediate goal should rather be interoperability.
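A schema-level mapping of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any existing tool works: the predicate URIs, the PREDICATE_KINDS table and the map_triple function are hypothetical, and only three target constructs are shown for brevity.

```python
# Hypothetical schema-level declarations: for each RDF predicate, the kind of
# topic map construct it should become. The URIs are invented for illustration.
PREDICATE_KINDS = {
    "http://example.org/schema#name": "name",
    "http://example.org/schema#homepage": "occurrence",
    "http://example.org/schema#composed": "association",
}


def map_triple(triple, predicate_kinds):
    """Map one RDF triple to a topic map construct, guided by the schema.

    Without a declaration for the predicate there is no basis for choosing a
    construct, so the function refuses rather than guessing.
    """
    subject, predicate, obj = triple
    kind = predicate_kinds.get(predicate)
    if kind is None:
        raise LookupError(f"no mapping declared for predicate {predicate!r}")
    return {"topic": subject, "kind": kind, "type": predicate, "value": obj}
```

For example, map_triple(("http://example.org/puccini", "http://example.org/schema#homepage", "http://example.org/puccini.html"), PREDICATE_KINDS) yields an occurrence, whereas a triple with an undeclared predicate raises LookupError.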
The subject of every assertion (or statement) in an RDF model is a resource, identified by a URI. The subject of every assertion in a topic map is a topic, representing a subject, which may be addressable or non-addressable. Addressable subjects are identified by their URIs (as in RDF); non-addressable subjects are identified by the URIs of (one or more) subject indicators. This important distinction is not present in RDF.
In RDF, assertions are always binary. An RDF statement, consisting of a subject, a predicate, and an object, expresses a relationship between subject and object and corresponds to the subject-verb-object construct in natural language – e.g. "Tom buttered the bread", or buttered(Tom, the bread). In topic maps, assertions are n-ary. An association may have any number of roles and can thus easily express more complex relationships – e.g. "Tom buttered the bread with a knife", or buttered(Tom, the bread, a knife) – and even simple statements like "the bread was buttered", buttered(the bread).
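The contrast in arity can be shown with the same example. In the sketch below, the dictionary layout and the role names (agent, patient, instrument) are assumptions chosen for illustration; neither model prescribes them.

```python
# A binary, RDF-style statement: exactly one subject and one object.
triple = ("Tom", "buttered", "the bread")

# An n-ary, topic-map-style association: a type plus any number of roles.
# The role names are hypothetical.
with_knife = {
    "type": "buttering",
    "roles": {"agent": "Tom", "patient": "the bread", "instrument": "a knife"},
}

# The same structure also covers "the bread was buttered", with one role only.
agentless = {
    "type": "buttering",
    "roles": {"patient": "the bread"},
}


def arity(association):
    """Number of participants in an association."""
    return len(association["roles"])
```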
In RDF, assertions have direction. Given the statement buttered(Tom, the bread), it is possible to answer the question "What did Tom butter?", but not the question "Who buttered the bread?". In topic maps, associations do not have a direction (or else are multidirectional). In topic maps, it is not possible to assert that "Tom buttered the bread" without also asserting that "The bread was buttered by Tom". This additional expressivity is possible because topic maps have the concept of association roles, which make clear the kind of role played by each participant in a relationship.
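How association roles allow a relationship to be traversed from either participant can be sketched as follows. The data layout, the role names and the find_players function are assumptions made for this example only.

```python
# Associations with explicit roles; role names are hypothetical.
associations = [
    {"type": "buttering", "roles": {"agent": "Tom", "patient": "the bread"}},
    {"type": "buttering", "roles": {"agent": "Ann", "patient": "the toast"}},
]


def find_players(associations, assoc_type, known_role, known_player, wanted_role):
    """Return the players of wanted_role in every association of the given
    type in which known_player plays known_role."""
    return [
        a["roles"][wanted_role]
        for a in associations
        if a["type"] == assoc_type
        and a["roles"].get(known_role) == known_player
        and wanted_role in a["roles"]
    ]
```

The same function answers "What did Tom butter?" (known role agent, wanted role patient) and "Who buttered the bread?" (known role patient, wanted role agent), because no participant is privileged as the starting point.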
Topic maps solve the problem of capturing contextual validity through the concept of scope. The context within which a name, occurrence or association role is considered valid can be easily stated by scoping the relevant assertion. In RDF, the same semantic can only be expressed through a cumbersome process of reifying the statement and then making it the subject of a new statement whose significance has to be described in a schema.
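The difference in effort can be illustrated with a sketch. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms belong to the RDF reification vocabulary; everything else below, including the example.org URIs, the validIn predicate and the reify_with_context function, is an assumption made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/schema#"  # hypothetical namespace

# Topic map style: the context is stated directly on the characteristic.
scoped_name = {
    "topic": "http://example.org/norway",
    "name": "Norge",
    "scope": {"http://example.org/norwegian"},
}


def reify_with_context(statement_id, triple, context):
    """RDF style: reify the statement, then make a new statement about it.

    The meaning of the validIn predicate would still have to be described in
    a schema; it is invented here.
    """
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
        (statement_id, EX + "validIn", context),
    ]
```

One scoped name in the topic map corresponds to five triples in this RDF rendering, four for the reification and one for the context.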
Topic maps were designed from the start for ease of merging. The duality of "subject" and "topic", the concept of subject identity and the ability to establish a topic's identity through a subject address and/or multiple subject indicators are key to this capability. In particular, the notion of published subject indicators (PSIs) promotes interoperability across applications. RDF has none of this machinery. However, since PSIs are based on URIs, they are general enough to solve the interoperability problem for both topic maps and RDF.
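Identity-based merging might be sketched as follows. This is a simplified illustration and not the merging procedure defined by the topic maps standard: the dictionary layout, the function names and the example.org URIs are assumptions, and the case of two merged topics having different subject addresses is not handled.

```python
def same_subject(a, b):
    """Two topics represent the same subject if they share a subject address
    or at least one subject indicator."""
    if a["address"] is not None and a["address"] == b["address"]:
        return True
    return bool(a["indicators"] & b["indicators"])


def merge_topics(topics):
    """Merge topics that represent the same subject, pooling their names and
    subject indicators."""
    merged = []
    for topic in topics:
        current = {
            "address": topic["address"],
            "indicators": set(topic["indicators"]),
            "names": set(topic["names"]),
        }
        remaining = []
        for existing in merged:
            if same_subject(existing, current):
                # Absorb the existing topic into the current one.
                current["address"] = current["address"] or existing["address"]
                current["indicators"] |= existing["indicators"]
                current["names"] |= existing["names"]
            else:
                remaining.append(existing)
        merged = remaining + [current]
    return merged


# Hypothetical input: two applications describe the same subject and use the
# same published subject indicator; a third topic is unrelated.
PSI_NORWAY = "http://example.org/psi/norway"
topics = [
    {"address": None, "indicators": {PSI_NORWAY}, "names": {"Norway"}},
    {"address": None, "indicators": {PSI_NORWAY}, "names": {"Norge"}},
    {"address": None, "indicators": {"http://example.org/psi/sweden"}, "names": {"Sweden"}},
]
result = merge_topics(topics)
```

Because identity is carried by URIs rather than by names, the first two topics merge even though their names differ.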