SPARQL stands for SPARQL Protocol and RDF Query Language. It is the standard query language and protocol for Linked Open Data on the web and for RDF databases. Standardized by the W3C, SPARQL allows users to query information from any database or data source that can be mapped to RDF, so they can focus on what information they need rather than on how the data is internally organized.
SPARQL vs SQL
Similar to how SQL provides query and modification capabilities for relational databases, SPARQL serves the same function for RDF graph databases like GraphDB. A key advantage of SPARQL lies in its versatility: it can be executed on any database that can be viewed as RDF, often through middleware like Relational Database to RDF (RDB2RDF) mapping software.
SPARQL is also a powerful language in its own right, with robust support for computation, filtering, aggregation and subqueries.
In contrast to SQL, SPARQL queries are not constrained to working within one database: federated queries can access multiple data stores (endpoints). This is technically possible because SPARQL is more than just a query language. It is also an HTTP-based transport protocol, where any SPARQL endpoint can be accessed via a standardized transport layer. Results can be returned in several data-interchange formats, and RDF entities are identified by Uniform Resource Identifiers (URIs).
Identifying data with URIs allows it to be unambiguously referenced across applications and overcomes the constraints posed by local search. Consequently, additional application-specific APIs can be developed and can refer to that data.
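To make the protocol side concrete, here is a minimal sketch of how a query can be sent to an endpoint over HTTP GET. The endpoint URL is a hypothetical placeholder, and the request is only built, not sent. The `query` parameter name and the `application/sparql-results+json` media type come from the SPARQL 1.1 Protocol and the SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "https://example.org/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_get(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    # The query text travels URL-encoded in the `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of SELECT/ASK results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_get(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Because every endpoint accepts the same kind of request, the same helper would work against any SPARQL service. Passing the request to `urllib.request.urlopen` would actually execute it.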
These design choices – enabling queries over distributed sources on non-uniform data – are not accidental. SPARQL is designed to enable Linked Data for the Semantic Web. Its goal is to enrich data by linking it to other global semantic resources, thus sharing, merging and reusing data in a more meaningful way.
As a result, the power of SPARQL together with the flexibility of RDF can lower development costs by making it easier to merge results from multiple data sources.
SPARQL from within
SPARQL sees your data as a directed, labeled graph that is internally expressed as triples, each consisting of a subject, a predicate and an object.
Correspondingly, a SPARQL query consists of a set of triple patterns in which each element (the subject, predicate and object) can be a variable (wildcard). Solutions to the variables are then found by matching the patterns in the query to triples in the dataset.
SPARQL has four types of queries. It can be used to:
- ASK whether there is at least one match of the query pattern in the RDF graph data;
- SELECT all or some of those matches in a tabular form (including aggregation, sampling and pagination through OFFSET and LIMIT);
- CONSTRUCT an RDF graph by substituting the variables in these matches in a set of triple templates;
- DESCRIBE the matches found by constructing a relevant RDF graph.
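The four query forms can be illustrated with a small in-memory sketch. This is not a SPARQL engine: the data, the `?`-prefixed variable convention and the `match` helper are assumptions made for illustration. The DESCRIBE line shows just one plausible reading, since what DESCRIBE returns is left to the implementation.

```python
# Tiny in-memory dataset: each triple is (subject, predicate, object).
# All names here are illustrative.
TRIPLES = {
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
    ("John", "hasDaughter", "Anna"),
}


def match(pattern, triples):
    """Yield one {variable: value} binding per triple matching the pattern.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


pattern = ("John", "hasSon", "?x")

# ASK: is there at least one match?
ask = any(True for _ in match(pattern, TRIPLES))

# SELECT: the matches as rows, here sorted and paginated (OFFSET 0, LIMIT 1).
select = sorted(b["?x"] for b in match(pattern, TRIPLES))
first_page = select[0:1]

# CONSTRUCT: substitute each match into a triple template.
construct = {(b["?x"], "hasFather", "John") for b in match(pattern, TRIPLES)}

# DESCRIBE (one possible reading): every triple with the resource as subject.
describe = {t for t in TRIPLES if t[0] == "John"}

print(ask, select, first_page)
```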
The leading semantic graph databases that support SPARQL have intuitive SPARQL editors with autocomplete, explorer and many other features that facilitate building powerful SPARQL queries.
Query pattern examples
The greatest strength of SPARQL is navigating relationships in RDF graph data through graph pattern matching. In this process, simple patterns can be combined into more complex ones, which explore more elaborate relationships in the data.
The relationships can be explored by using basic patterns, pattern joins, unions, by adding optional patterns that may extend the information about the found solutions, etc. In addition, property paths allow sequential composition (sequencing), parallel composition (alternatives), iterations (Kleene star), inversion, and more.
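The property path operators mentioned above can be sketched as set operations over edges. The data and the helper names (`step`, `inverse`, `star`) are assumptions made for illustration. The comments show the corresponding SPARQL path syntax (`/`, `|`, `^`, `*`), with a hypothetical `:` prefix.

```python
# Illustrative edge set: (subject, predicate, object).
EDGES = {
    ("John", "hasSon", "Bob"),
    ("John", "hasDaughter", "Anna"),
    ("Anna", "hasSon", "Timmy"),
    ("Timmy", "hasSon", "Leo"),
}


def step(nodes, predicate):
    """Follow one `predicate` edge forward from every node in `nodes`."""
    return {o for (s, p, o) in EDGES if p == predicate and s in nodes}


def inverse(nodes, predicate):
    """Follow one `predicate` edge backward (object to subject)."""
    return {s for (s, p, o) in EDGES if p == predicate and o in nodes}


def star(nodes, predicate):
    """Follow zero or more `predicate` edges (Kleene star)."""
    reached = set(nodes)
    frontier = set(nodes)
    while frontier:
        # Only expand nodes that have not been seen yet, so cycles terminate.
        frontier = step(frontier, predicate) - reached
        reached |= frontier
    return reached


# Sequence:     :John :hasDaughter/:hasSon ?x
sequence = step(step({"John"}, "hasDaughter"), "hasSon")
# Alternative:  :John :hasSon|:hasDaughter ?x
alternative = step({"John"}, "hasSon") | step({"John"}, "hasDaughter")
# Inverse:      :Timmy ^:hasSon ?x
parents = inverse({"Timmy"}, "hasSon")
# Iteration:    :Anna :hasSon* ?x  (zero steps includes Anna herself)
descendants = star({"Anna"}, "hasSon")

print(sequence, alternative, parents, descendants)
```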
As already mentioned, the basic graph pattern consists of a triple in which each element (subject, predicate and object) can be a variable (wildcard).
Let’s see an example.
The pattern ‘John’ (a subject)->‘has son’ (a predicate)->X (a wildcard object) will have as a solution each triple in the RDF graph that matches the subject and the predicate, and has any object.
So if John has two sons – Bob and Michael, the triples ‘John’->‘has son’->‘Bob’ and ‘John’->‘has son’->‘Michael’ will be the results of the SPARQL query.
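As a rough sketch, the same match can be reproduced over a small list of triples. `None` stands in for the wildcard, and the `solutions` helper and the extra triples are assumptions made for illustration. The comment shows how the query might look in SPARQL under a hypothetical `http://example.org/` prefix.

```python
# Equivalent SPARQL, assuming a hypothetical prefix:
#   PREFIX : <http://example.org/>
#   SELECT ?x WHERE { :John :hasSon ?x }

TRIPLES = [
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
    ("John", "hasDaughter", "Anna"),
    ("Anna", "hasSon", "Timmy"),
]


def solutions(subject, predicate, obj):
    """Return every triple matching the pattern; None marks a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in TRIPLES
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


sons = solutions("John", "hasSon", None)
print(sons)
```

Any position can be the wildcard: `solutions(None, "hasSon", "Timmy")` would find whose son Timmy is.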
Patterns can also be combined. For example, the union of the patterns ‘John’->‘has son’->X and ‘John’->‘has daughter’->X will have as solutions all of John’s sons and all of John’s daughters.
Patterns that share a variable are joined: ‘John’->‘has son’->Y together with Y->‘has son’->Z will find John’s grandsons through his sons. The sons of John’s daughters, however, will not be returned, because the first basic pattern in the query, namely ‘John’->‘has son’->Y, will not be matched by a triple in the data such as ‘John’->‘has daughter’->‘Anna’.
So even if the data contains ‘Anna’->‘has son’->‘Timmy’, Timmy will not show up as a solution of the above join. Luckily, an alternative graph pattern and a group graph pattern can easily be combined: a union of ‘John’->‘has son’->Y and ‘John’->‘has daughter’->Y grouped with Y->‘has son’->Z will find all of John’s grandsons.
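The difference between the plain join and the union grouped with a join can be sketched as follows. The `objects` helper and the extra triple about Bob's son Sam are assumptions added for illustration. The comment shows one way the combined query might be written in SPARQL with a hypothetical `:` prefix.

```python
# Combined query in SPARQL (hypothetical prefix):
#   SELECT ?z WHERE {
#     { :John :hasSon ?y } UNION { :John :hasDaughter ?y }
#     ?y :hasSon ?z .
#   }

TRIPLES = {
    ("John", "hasSon", "Bob"),
    ("John", "hasSon", "Michael"),
    ("John", "hasDaughter", "Anna"),
    ("Anna", "hasSon", "Timmy"),
    ("Bob", "hasSon", "Sam"),  # illustrative addition, not from the text
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


# Union: sons and daughters together.
children = objects("John", "hasSon") | objects("John", "hasDaughter")

# Join on Y through sons only: misses Timmy, the son of a daughter.
grandsons_via_sons = {
    z for y in objects("John", "hasSon") for z in objects(y, "hasSon")
}

# Union grouped with the join: finds all grandsons.
grandsons = {z for y in children for z in objects(y, "hasSon")}

print(sorted(grandsons_via_sons), sorted(grandsons))
```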
Extensions of SPARQL
SPARQL is not just a query language, but a comprehensive set of specifications. SPARQL Update defines operations to delete data, insert data and manipulate graphs. The SPARQL Protocol defines how to access SPARQL endpoints, companion specifications define the result formats, and the language can be further extended to take advantage of the specifics of various data types.
Standardized extensions include GeoSPARQL for querying geospatial data. Custom extensions supported by GraphDB include full-text search, queries against external full-text and faceting engines (Lucene, Solr, Elasticsearch), RDFRank for ordering, SPARQL-MM for multimedia, and more.
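The effect of the two simplest update operations can be sketched on an in-memory set of triples. The helper names and data are assumptions made for illustration. The comments show the corresponding SPARQL Update requests with a hypothetical `:` prefix.

```python
graph = {("John", "hasSon", "Bob")}


def insert_data(graph, triples):
    """Add ground triples; triples already present are left unchanged."""
    graph.update(triples)


def delete_data(graph, triples):
    """Remove ground triples; triples not present are silently ignored."""
    graph.difference_update(triples)


# INSERT DATA { :John :hasDaughter :Anna }
insert_data(graph, [("John", "hasDaughter", "Anna")])

# DELETE DATA { :John :hasSon :Bob }
delete_data(graph, [("John", "hasSon", "Bob")])

print(graph)
```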
SPARQL-Star
When a set of triples is joined together, the triples form a natural graph in which the predicates are interpreted as edges and the subjects and objects are the nodes. RDF databases have often been criticized because they don’t allow descriptions or properties to be attached to these edges, which some have perceived as a disadvantage compared to Labeled Property Graphs. This concern has been addressed with RDF-Star, which allows one to make statements about other statements and in this way attach metadata to the edges in the graph.
SPARQL has been extended accordingly: SPARQL-Star accommodates the RDF-Star additions to the RDF model and allows querying metadata about edges in the graph.
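The idea of a statement about a statement can be sketched by letting a whole triple appear as the subject of another triple. The data, the predicates `statedBy` and `certainty`, and the `annotations` helper are assumptions made for illustration. The comment shows how such a lookup might be written in the community-group SPARQL-star syntax, with a hypothetical `:` prefix.

```python
# SPARQL-star style lookup (hypothetical prefix and predicates):
#   SELECT ?p ?v WHERE { << :John :hasSon :Bob >> ?p ?v }

EDGE = ("John", "hasSon", "Bob")

STATEMENTS = {
    EDGE,                       # the edge itself
    (EDGE, "statedBy", "Anna"),  # metadata attached to the edge
    (EDGE, "certainty", 0.9),
}


def annotations(edge):
    """Return {predicate: value} for statements whose subject is `edge`."""
    return {p: o for (s, p, o) in STATEMENTS if s == edge}


edge_metadata = annotations(EDGE)
print(edge_metadata)
```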
Why use SPARQL?
As you can see, a wide variety of graph patterns can be matched through SPARQL queries, which reflects the variety of the data that SPARQL was designed to query. As a result, it can efficiently extract information hidden in non-uniform data that is stored in various formats and sources.
As the inventor of the World Wide Web, the creator and advocate of the Semantic Web and W3C Director, Sir Tim Berners-Lee, puts it:
“Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL. SPARQL makes it possible to query information from databases and other diverse sources in the wild, across the Web.”