RDF and Linked Data

The Resource Description Framework (RDF) is a data model for exchanging knowledge representations. Linked Data uses the same model: statements are triples of (subject, predicate, object), in which the subject and predicate are URIs (Uniform Resource Identifiers) and the object is either a URI or a literal value such as a string or number (setting aside blank nodes). If two triples use the same URI as their subject, they describe the same resource, so the properties asserted by both apply to it; the same holds wherever a URI is reused.
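
A minimal sketch of the triple model, assuming nothing beyond plain Python: each triple is a tuple, a graph is a set of triples, and merging two sources is set union. The example.org URIs and the properties_of helper are hypothetical, chosen only to show that two sources sharing a subject URI end up describing one resource.

```python
# A minimal sketch: each triple is a (subject, predicate, object) tuple.
# The example.org URIs are hypothetical placeholders.
EX = "http://example.org/"

source_a = {
    (EX + "alice", EX + "bornAt", EX + "Springfield"),
}
source_b = {
    (EX + "alice", EX + "loves", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),  # object is a literal, not a URI
}

# Merging two sources is set union; because both use the same subject URI,
# every property they assert now describes the same resource.
graph = source_a | source_b

def properties_of(graph, subject):
    """Return the sorted (predicate, object) pairs asserted about subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

for p, o in properties_of(graph, EX + "alice"):
    print(p.removeprefix(EX), "->", o.removeprefix(EX))
# bornAt -> Springfield
# loves -> bob
# name -> Alice
```

Real RDF libraries add typed literals, blank nodes and indexing, but the identity rule is the same: resources are merged by URI, not by position or file.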

RDF is not a model for documents. One can extract triples from documents, but to be useful the extraction is normally lossy. A common example is a list of the characters and places mentioned in a story such as Star Wars, combined with externally derived triples describing relationships such as "was born at", "loves" or "caused to become frozen". The list does not record every occurrence of every word, and the story cannot be reconstructed from it; this lossy relationship between the extracted data and the original text is typical. Query languages such as SPARQL (and, in some systems, GraphQL) can then be used to explore the data and to support applications. Linked Data may in many cases represent a complete data set, but even then the data is normally unordered, unlike the words and paragraphs of a piece of writing.
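
The sketch below illustrates that lossiness under simple assumptions: a hypothetical text fragment, a hand-written name-to-URI table standing in for real entity recognition, and hand-written relationship triples. The match function is a toy pattern matcher in the spirit of a SPARQL basic graph pattern (None plays the role of a variable); it is not SPARQL itself.

```python
EX = "http://example.org/"  # hypothetical URIs throughout

# Hypothetical input text; "Han Solo" occurs three times.
text = "Han Solo flew to Bespin. Darth Vader froze Han Solo. Leia loves Han Solo."

# Hypothetical name-to-URI table; a real extractor would use entity recognition.
KNOWN = {"Han Solo": EX + "HanSolo", "Darth Vader": EX + "DarthVader",
         "Leia": EX + "Leia", "Bespin": EX + "Bespin"}

# Extraction keeps only *which* entities are mentioned: one triple each,
# with no count, position or word order.
mentions = {(EX + "story1", EX + "mentions", uri)
            for name, uri in KNOWN.items() if name in text}

# Externally derived relationships, written by hand for illustration.
facts = {(EX + "Leia", EX + "loves", EX + "HanSolo"),
         (EX + "DarthVader", EX + "causedToBecomeFrozen", EX + "HanSolo")}
graph = mentions | facts

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# What is connected to Han Solo? (subject and predicate left open)
for s, p, o in sorted(match(graph, o=EX + "HanSolo")):
    print(s.removeprefix(EX), p.removeprefix(EX))
# DarthVader causedToBecomeFrozen
# Leia loves
# story1 mentions
```

The graph answers questions about entities and relationships, but nothing in it could regenerate the three sentences it came from.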

A weakness of RDF is that triples do not natively record their source. RDF 1.1 datasets allow triples to be grouped into named graphs, which SPARQL can address with its GRAPH keyword, but finer-grained information such as permissions, or who added each triple, is usually a store-specific extension: a triple store (a database for RDF) will normally add it, but it is not normally accessible to the standard query languages. RDF stores may also support federated queries, in which case this additional information is not passed between systems, potentially leading to trust and security issues.
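
One common way to keep track of where triples came from is to store quads, a triple plus a fourth term naming its graph or source, in the spirit of RDF 1.1 named graphs. The sketch below uses a plain Python namedtuple with hypothetical example.org URIs and graph names; it is not any particular triple store's API. It shows that once quads are projected down to triples, as happens when data crosses an interface that carries only triples, the source information is gone.

```python
from collections import namedtuple

# A quad is a triple plus a fourth term naming the graph (source) it came from.
Quad = namedtuple("Quad", ["s", "p", "o", "source"])

EX = "http://example.org/"  # hypothetical URIs throughout
store = [
    Quad(EX + "alice", EX + "worksFor", EX + "acme", EX + "graphs/hr-database"),
    Quad(EX + "alice", EX + "worksFor", EX + "globex", EX + "graphs/scraped-page"),
]

def as_triples(store):
    """Drop the source term: this is all a triple-only consumer sees."""
    return {(q.s, q.p, q.o) for q in store}

def sources_for(store, triple):
    """Return the graphs that asserted the triple (requires the quad view)."""
    return {q.source for q in store if (q.s, q.p, q.o) == triple}

# As plain triples, both employment claims look equally authoritative.
print(len(as_triples(store)))  # 2
print(sources_for(store, (EX + "alice", EX + "worksFor", EX + "acme")))
```

Only a consumer that receives the quads can decide to trust the HR database over a scraped page; after projection that choice is no longer possible.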

The RDF model is also used by RSS 1.0 (RDF Site Summary), whose files are XML serializations (RDF/XML) of an RDF data model. RSS 2.0 (Really Simple Syndication), by contrast, is not based on RDF.
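
As a sketch of how an RSS 1.0 (RDF Site Summary) feed maps onto triples, the code below reads a trimmed, hand-written feed fragment with Python's standard xml.etree.ElementTree and turns each item into (subject, predicate, object) triples, using the item's rdf:about attribute as the subject. The feed content and example.org URLs are hypothetical, a valid feed would also need a channel element, and item_triples handles only this simple item shape; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Trimmed, hypothetical RSS 1.0 fragment (a real feed also has a <channel>).
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/posts/1">
    <title>First post</title>
    <link>http://example.org/posts/1</link>
  </item>
</rdf:RDF>"""

def item_triples(xml_text):
    """Turn each rss:item into (subject, predicate, object) triples.

    Handles only flat items with text-valued children, not general RDF/XML.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for item in root.findall(f"{{{RSS}}}item"):
        subject = item.get(f"{{{RDF}}}about")
        for child in item:
            # ElementTree tags look like "{namespace}local"; namespace + local
            # name gives the predicate URI, as RDF/XML defines it.
            ns, local = child.tag[1:].split("}")
            triples.append((subject, ns + local, child.text))
    return triples

for triple in item_triples(FEED):
    print(triple)
```

The element names in the RSS 1.0 namespace become predicate URIs, which is what lets generic RDF tools consume such a feed.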

RDF literal values are atomic: there is no direct support for mixed content such as running text in HTML. Text fragments containing element markup, such as the item descriptions found in RSS, have to be escaped and are treated by RDF systems as simple strings (the rdf:XMLLiteral and rdf:HTML datatypes can label a literal as markup, but it remains a single literal). RDF query languages tend to be weak in string processing, which makes this worse: SPARQL offers substring and regular-expression functions but cannot parse markup, so you can't generally use it to find all RSS feed items containing an HTML link to a given resource, for example.
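
To illustrate why string matching falls short, the sketch below holds two hypothetical RSS item descriptions as plain literal values (already unescaped). A substring test, roughly what SPARQL's CONTAINS or REGEX can do, matches both items, while parsing the markup with Python's standard html.parser shows that only one actually links to the target URL. The URLs and the LinkCollector class are illustrative assumptions, not part of any RDF toolkit.

```python
from html.parser import HTMLParser

TARGET = "http://example.org/target"  # hypothetical resource

# Description literals keyed by item URI; both mention TARGET, only one links to it.
descriptions = {
    "http://example.org/posts/1": 'See <a href="http://example.org/target">this page</a>.',
    "http://example.org/posts/2": "The page http://example.org/target is down.",
}

class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")

def links_in(fragment):
    parser = LinkCollector()
    parser.feed(fragment)
    parser.close()
    return parser.hrefs

# String matching sees only characters, so it cannot tell a link from a mention.
by_substring = sorted(s for s, d in descriptions.items() if TARGET in d)
# Parsing the markup distinguishes an actual <a href> from plain text.
by_link = sorted(s for s, d in descriptions.items() if TARGET in links_in(d))

print(by_substring)  # both posts
print(by_link)       # only posts/1
```

The parsing step has to happen outside the RDF layer, because to the store the description is just one opaque string.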

Linked Data in all of its forms provides ways to think about property-based reasoning, and can be very useful when working with metadata, especially in conjunction with other document formats.