SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions
SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions | |
---|---|
Bibliographical Metadata | |
Subject: | Querying Distributed RDF Data Sources |
Year: | 2011 |
Authors: | Olaf Görlitz, Steffen Staab |
Venue: | COLD |
Content Metadata | |
Problem: | SPLENDID |
Abstract
In order to leverage the full potential of the Semantic Web it is necessary to transparently query distributed RDF data sources in the same way as it has been possible with federated databases for ages. However, there are significant differences between the Web of (linked) Data and the traditional database approaches. Hence, it is not straightforward to adapt successful database techniques for RDF federation. Reasons are the missing cooperation between SPARQL endpoints and the need for detailed data statistics for estimating the costs of query execution plans. We have implemented SPLENDID, a query optimization strategy for federating SPARQL endpoints based on statistical data obtained from voiD descriptions.
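The idea of using voiD statistics for source selection and cost estimation can be illustrated with a minimal sketch. It assumes that each endpoint's voiD description has already been reduced to a per-predicate triple count (in the spirit of `void:propertyPartition` with `void:property` and `void:triples`). The endpoint URLs, the counts, and the function names `select_sources` and `estimate_cardinality` are hypothetical and are not taken from the SPLENDID implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical statistics, as they might be extracted from voiD property
# partitions (void:property -> void:triples). URLs and counts are invented.
VOID_STATS: Dict[str, Dict[str, int]] = {
    "http://example.org/sparql/a": {"foaf:name": 1000, "foaf:knows": 5000},
    "http://example.org/sparql/b": {"foaf:name": 200, "dc:title": 800},
}

# (subject, predicate, object); a predicate of None stands for a variable.
TriplePattern = Tuple[str, Optional[str], str]


def select_sources(pattern: TriplePattern,
                   stats: Dict[str, Dict[str, int]] = VOID_STATS) -> List[str]:
    """Return the endpoints whose statistics mention the pattern's predicate."""
    _, predicate, _ = pattern
    if predicate is None:
        # An unbound predicate cannot be pruned with predicate statistics.
        return sorted(stats)
    return sorted(ep for ep, preds in stats.items() if predicate in preds)


def estimate_cardinality(pattern: TriplePattern,
                         stats: Dict[str, Dict[str, int]] = VOID_STATS) -> int:
    """Upper-bound estimate: sum of triple counts over all matching endpoints."""
    _, predicate, _ = pattern
    if predicate is None:
        return sum(sum(preds.values()) for preds in stats.values())
    return sum(preds.get(predicate, 0) for preds in stats.values())
```

For example, `select_sources(("?s", "dc:title", "?o"))` would only consider the second endpoint, so the first endpoint never receives a request for that pattern.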
Conclusion
SPLENDID allows for transparent query federation over distributed SPARQL endpoints. To achieve good query execution performance, data source selection and query optimization are based on basic statistical information obtained from VOID descriptions. The use of open Semantic Web standards, such as VOID and SPARQL endpoints, allows for flexible integration of various distributed and linked RDF data sources. We have described in detail the implementation of the data source selection and the join order optimization. The evaluation shows that our approach can achieve good query performance and is competitive with other state-of-the-art federation implementations. In our analysis of the source selection we came to the conclusion that at least predicate and type statistics should be included in VOID descriptions for RDF datasets. The use of third-party sameAs links, however, can significantly increase the number of requests and thus hamper the efficiency of query execution plans. The comparison of the two employed physical join implementations has shown that network overhead plays an important role. Both hash join and bind join can significantly reduce the query processing time for certain types of queries. With SPLENDID we would also like to advocate the adoption of VOID statistics for Linked Data.
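The trade-off between the two physical join strategies mentioned in the conclusion can be sketched as follows. This is a simplified illustration, not the SPLENDID code: `FakeEndpoint` is a hypothetical in-memory stand-in for a remote SPARQL endpoint that only counts requests, and the bind join shown here sends one request per left-hand binding, which is the most basic variant.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Binding = Dict[str, str]


class FakeEndpoint:
    """Hypothetical stand-in for a remote endpoint; counts incoming requests."""

    def __init__(self, rows: List[Binding]) -> None:
        self.rows = rows
        self.requests = 0

    def query(self, bound: Optional[Binding] = None) -> List[Binding]:
        """Return all rows compatible with the given variable bindings."""
        self.requests += 1
        bound = bound or {}
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in bound.items())]


def hash_join(left: FakeEndpoint, right: FakeEndpoint, var: str) -> List[Binding]:
    """Fetch both sides completely (one request each) and join locally."""
    index = defaultdict(list)
    for row in left.query():
        index[row[var]].append(row)
    return [{**l, **r} for r in right.query() for l in index.get(r[var], [])]


def bind_join(left: FakeEndpoint, right: FakeEndpoint, var: str) -> List[Binding]:
    """Fetch the left side, then probe the right side once per left binding."""
    results = []
    for l in left.query():
        for r in right.query({var: l[var]}):
            results.append({**l, **r})
    return results
```

In this sketch the hash join issues two requests in total but transfers every row of both sides, whereas the bind join transfers only matching rows from the right side at the cost of one request per left binding. Which one is cheaper depends on result sizes and network latency.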
Future work
As next steps, we plan to investigate whether VOID descriptions can easily be extended with more detailed statistics in order to allow for more accurate cardinality estimates and, thus, better query execution plans. On the other hand, the actual query execution has not yet been optimized in SPLENDID. Therefore, we plan to integrate optimization techniques as used in FedX. Moreover, the adoption of the SPARQL 1.1 federation extension will also allow for more efficient query execution.
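To make the reference to the SPARQL 1.1 federation extension concrete, the sketch below assembles a query that uses the `SERVICE` keyword defined by SPARQL 1.1 Federated Query. The helper `build_federated_query`, the endpoint URLs, and the triple patterns are hypothetical, and `PREFIX` declarations are omitted for brevity, so the generated string is illustrative rather than directly executable.

```python
from typing import Dict, List


def build_federated_query(variables: List[str],
                          patterns_by_endpoint: Dict[str, List[str]]) -> str:
    """Wrap each endpoint's triple patterns in a SPARQL 1.1 SERVICE block."""
    blocks = []
    for endpoint, patterns in patterns_by_endpoint.items():
        body = " . ".join(patterns)
        blocks.append("  SERVICE <" + endpoint + "> { " + body + " }")
    return ("SELECT " + " ".join(variables) + " WHERE {\n"
            + "\n".join(blocks) + "\n}")


if __name__ == "__main__":
    # Invented example: names from one endpoint, titles from another.
    print(build_federated_query(
        ["?person", "?name", "?title"],
        {
            "http://example.org/sparql/a": ["?person foaf:name ?name"],
            "http://example.org/sparql/b": ["?doc dc:creator ?person",
                                            "?doc dc:title ?title"],
        },
    ))
```

With `SERVICE`, the assignment of patterns to endpoints is stated in the query itself, so a federator's source selection result can be passed to a SPARQL 1.1 engine in a standard form.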
Implementations
GUI: No