1. Introduction

In this tutorial, we’ll introduce the Resource Description Framework (RDF): its main characteristics, how to use it, and what it’s for. RDF is a model for representing data about physical objects and abstract concepts, and it expresses relations between entities as a graph. In the following sections, we’ll explain the RDF model in more detail: how to represent it, how to query it, and where it’s used.

2. Resource Description Framework (RDF) Representation

RDF can describe anything: people, animals, objects, and concepts of any kind. All of these are considered resources. RDF is designed primarily to represent information in a form that software applications can process, although humans can read and use RDF data too.

We represent information by statements in the following format:

<subject> <predicate> <object>

Each statement expresses a relation between the subject and the object. The subject is a resource, and the object is either another resource or a plain value such as a date.

Let’s see some RDF examples in pseudocode:

<John> <is a> <person>

<John> <is a friend of> <Jane>

<John> <is born on> <the 10th of May 2000>

<John> <is interested in> <the Rosetta Stone>

<the Rosetta Stone> <is in> <British Museum>

Here, we can see several statements referencing the same resource. The same resource can play different roles in different statements. This way, we can relate those statements and find connections between the resources they mention.
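As a minimal sketch of this idea, we can hold the statements as plain Python tuples and follow a resource from one statement to the next. The strings stand in for real identifiers, and the helper names `statements_about` and `follow` are hypothetical. The last tuple places the Rosetta Stone in the British Museum so that one resource appears both as an object and as a subject:

```python
# Each statement is a (subject, predicate, object) tuple. Plain strings stand
# in for real identifiers in this sketch.
triples = [
    ("John", "is a", "person"),
    ("John", "is a friend of", "Jane"),
    ("John", "is born on", "the 10th of May 2000"),
    ("John", "is interested in", "the Rosetta Stone"),
    ("the Rosetta Stone", "is in", "British Museum"),
]


def statements_about(resource, triples):
    """Return every statement where the resource is the subject or the object."""
    return [t for t in triples if resource in (t[0], t[2])]


def follow(start, first_predicate, second_predicate, triples):
    """Chain two statements: start -first-> x -second-> result."""
    results = []
    for s1, p1, o1 in triples:
        if s1 == start and p1 == first_predicate:
            for s2, p2, o2 in triples:
                if s2 == o1 and p2 == second_predicate:
                    results.append(o2)
    return results


# The Rosetta Stone is the object of one statement and the subject of another.
print(statements_about("the Rosetta Stone", triples))

# Where is the thing John is interested in?
print(follow("John", "is interested in", "is in", triples))
```

The second call connects two statements through their shared resource, which is the kind of link that the graph view makes visible.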

RDF statements are usually visualized as directed graphs, where subjects and objects are nodes and predicates are labeled edges. This is a simple and clear way to present them.

RDF statements are also known as triples. In the next section, we explain the data types used in RDF.

3. Resource Description Framework (RDF) Data Types

The components of an RDF triple can be IRIs, literals, or blank nodes.

An IRI (Internationalized Resource Identifier) is an identifier standard that extends the Uniform Resource Identifier (URI). The URI standard only allows a subset of the US-ASCII character set, whereas an IRI may contain characters from the Unicode character set, such as Chinese, Japanese, Korean, and Cyrillic characters. IRIs can appear in all three positions of an RDF triple.

For example, the IRI for the Rosetta Stone is:

https://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FRosetta_stone&sid=4560

and the IRI for the British Museum is:

https://dbpedia.org/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ABritish_Museum&sid=4560
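To illustrate how an IRI relates to a URI, the following sketch percent-encodes the non-ASCII characters of an IRI as UTF-8 bytes using Python’s standard `urllib.parse.quote`. The function name `iri_to_uri` and the `example.org` IRI are hypothetical, and the sketch is simplified: it doesn’t treat internationalized domain names specially.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri):
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 bytes."""
    # ASCII characters that carry meaning in a URI stay untouched, including
    # '%' so that existing escapes are not encoded a second time.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%-._~")


# Hypothetical IRI with a Cyrillic path segment.
iri = "http://example.org/ресурс"
uri = iri_to_uri(iri)

print(uri)           # ASCII-only form of the same identifier
print(unquote(uri))  # decoding gives back the original IRI
```

An identifier that is already pure ASCII passes through unchanged, so every URI is also a valid input here.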

RDF allows combining information from different datasets like Wikidata, DBpedia, and WordNet.

Literals are basic values such as strings, dates, and numbers. They cannot appear in the subject or predicate positions, only in the object position.

The basic building blocks of RDF are IRIs and literals. Sometimes, though, it is convenient to refer to resources without a global identifier. These anonymous resources are called blank nodes. They indicate that something exists without identifying it globally, and they can only be used in the subject or object positions.
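The position rules above can be sketched as a small type check. This is an illustration only, not the API of any RDF library: the class names, the `make_triple` helper, and the `example.org` IRIs are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    # Datatype IRI; xsd:string is used here as the default.
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class BlankNode:
    label: str


def make_triple(subject, predicate, obj):
    """Build a triple, enforcing which term types each position accepts."""
    if not isinstance(subject, (IRI, BlankNode)):
        raise TypeError("subject must be an IRI or a blank node")
    if not isinstance(predicate, IRI):
        raise TypeError("predicate must be an IRI")
    if not isinstance(obj, (IRI, Literal, BlankNode)):
        raise TypeError("object must be an IRI, a literal, or a blank node")
    return (subject, predicate, obj)


# Hypothetical identifiers under example.org.
john = IRI("http://example.org/John")
born_on = IRI("http://example.org/bornOn")
knows = IRI("http://example.org/knows")

# A literal in the object position.
birthday = make_triple(
    john, born_on, Literal("2000-05-10", "http://www.w3.org/2001/XMLSchema#date")
)

# A blank node: John knows someone we don't identify.
someone = make_triple(john, knows, BlankNode("b0"))

print(birthday)
print(someone)
```

Passing a `Literal` as the subject, or a `BlankNode` as the predicate, raises a `TypeError` in this sketch.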

4. A Query Language for RDF

SPARQL is a query language for retrieving data stored in the RDF format. The database is thus a set of subject-predicate-object triples, as explained previously. SPARQL supports query operations such as joins, sorting, and aggregation, among others.

The following example retrieves the names and emails of all the persons described with the FOAF (Friend Of A Friend) vocabulary:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name 
       ?email
WHERE
  {
    ?person  a          foaf:Person .
    ?person  foaf:name  ?name .
    ?person  foaf:mbox  ?email .
  }

The PREFIX clause declares the label foaf as an abbreviation for the IRI given between angle brackets. The SELECT clause lists the variables to return, ?name and ?email. The WHERE clause contains three triple patterns: the first uses the keyword a, shorthand for the predicate rdf:type, to match every resource that is a foaf:Person, and the other two match that person’s name and mailbox. Because the patterns share the variable ?person, SPARQL joins their matches.

The result is a set of rows with the name and email of each person in the data. Since a person may have multiple names and mailboxes, the result can contain multiple rows for the same person.
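To make the join concrete, here is a minimal sketch of how triple patterns can be matched against data in plain Python. It isn’t how a real SPARQL engine is implemented. The sample data, the `ex:` identifiers, and the function names are assumptions for illustration, and `rdf:type` is written out where the query uses the shorthand a.

```python
def is_var(term):
    """Variables start with '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend a binding so the pattern equals the triple, or return None."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new_binding


def evaluate(patterns, triples):
    """Join all patterns: keep only bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data: Alice has two mailboxes, Bob has none.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "foaf:mbox", "mailto:alice@work.example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]

patterns = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:mbox", "?email"),
]

rows = [(b["?name"], b["?email"]) for b in evaluate(patterns, triples)]
for row in rows:
    print(row)
```

Alice appears in two rows, one per mailbox. Bob doesn’t appear at all, because no triple matches the third pattern for him.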

5. RDF Datasets

RDF graphs can be organized into datasets. An RDF dataset contains exactly one default graph and zero or more named graphs. The RDF specification doesn’t define formal semantics for datasets. SPARQL uses RDF datasets for querying. Each named graph pairs a graph with a name, so its statements can be written as quads, where three components correspond to the RDF triple and the fourth is the name of the graph. The default graph doesn’t need a name.
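The structure of a dataset can be sketched with one set of triples for the default graph and a dictionary of sets for the named graphs. The `Dataset` class, its methods, and the `ex:` identifiers are hypothetical and only serve to show how triples in named graphs become quads:

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Exactly one default graph, which has no name.
    default_graph: set = field(default_factory=set)
    # Zero or more named graphs, keyed by graph name.
    named_graphs: dict = field(default_factory=dict)

    def add(self, triple, graph_name=None):
        """Add a triple to the default graph or to a named graph."""
        if graph_name is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph_name, set()).add(triple)

    def quads(self):
        """List all statements as (s, p, o, graph_name); None marks the default graph."""
        rows = [(s, p, o, None) for s, p, o in sorted(self.default_graph)]
        for name in sorted(self.named_graphs):
            rows.extend((s, p, o, name) for s, p, o in sorted(self.named_graphs[name]))
        return rows


dataset = Dataset()
dataset.add(("ex:museumGraph", "ex:source", "ex:DBpedia"))
dataset.add(("ex:RosettaStone", "ex:isIn", "ex:BritishMuseum"), graph_name="ex:museumGraph")
dataset.add(("ex:John", "ex:interestedIn", "ex:RosettaStone"), graph_name="ex:peopleGraph")

for quad in dataset.quads():
    print(quad)
```

Here the default graph holds a statement about one of the named graphs, which is a common use for it.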

The DBpedia project extracts structured content from Wikipedia in RDF format. The DBpedia dataset is interlinked with various other Open Data datasets on the Web. There are more than 45 million links between DBpedia and other datasets. We can use SPARQL to access DBpedia data.
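As a sketch of what such access can look like, the following code builds an HTTP request for DBpedia’s public SPARQL endpoint using only the Python standard library. It follows the SPARQL protocol convention of passing the query in a `query` parameter. The resource IRI is the one embedded in the Rosetta Stone URL shown earlier, the function name `build_sparql_request` is hypothetical, and the request is only constructed here, not sent:

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Ask for up to ten properties and values of one resource.
QUERY = """
SELECT ?property ?value
WHERE { <http://dbpedia.org/resource/Rosetta_stone> ?property ?value . }
LIMIT 10
""".strip()


def build_sparql_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(DBPEDIA_ENDPOINT, QUERY)
print(request.full_url)

# Decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(decoded == QUERY)

# Sending is left out of this sketch; urllib.request.urlopen(request)
# would perform the actual network call.
```

Keeping construction and sending separate makes it easy to inspect the request before it goes over the network.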

6. Applications Using RDF

Applications that use RDF obtain several benefits. RDF provides a standard framework for exchanging data and metadata, and this framework is open and interoperable. Its standard syntax lets software use metadata and exchange information more efficiently. Because relations are stated explicitly in an RDF graph, relationships between entities are often easier to see and follow than in a typical relational schema.

6.1. Applications Using RDF

Next, we introduce some real-life applications that use RDF triples. IBM DB2 Enterprise Server Edition allows storing and querying RDF graphs in databases. DB2 databases let applications use SPARQL to retrieve RDF data. DB2 supports the Apache Jena framework APIs for loading RDF data into user tables.

Amazon Neptune is a graph database that supports RDF and SPARQL. It is available in 22 AWS regions. Some of the customers of Amazon Neptune are Samsung Electronics, Pearson, Siemens, AstraZeneca, and Amazon Alexa.

Apache Jena is a Java framework with an API for working with RDF graphs. Those graphs can contain data from files, databases, and URLs. Jena allows using SPARQL to query RDF triples. Jena also provides support for the Web Ontology Language (OWL), a family of languages for knowledge representation using ontologies. Jena allows converting RDF graphs to relational databases, among other formats.

6.2. Services Using RDF

Let’s present some services that use RDF graphs. Open Calais, a Thomson Reuters service, uses links to DBpedia. It extracts RDF triples that identify entities, facts, and events from unstructured text. Calais is available through Refinitiv, a subsidiary of the London Stock Exchange Group that provides financial market data. Calais is also used to tag blog articles and to organize museum collections.

The BBC Learning-Open Lab uses DBpedia for tagging content and avoiding confusion between those tags. Using DBpedia allows them to tag text referring to ‘aeroplane’, ‘airplane’, and ‘aircraft’ under the same tag. It also lets them easily differentiate Turkey (country) from turkey (bird).

7. Conclusion

In this tutorial, we explained the concept of Resource Description Framework (RDF) and its format. We also described how to represent the components of an RDF triple, how to query an RDF dataset, and mentioned some important RDF datasets.

