Difference between revisions of "SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions"

Latest revision as of 21:31, 11 July 2018

SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions
Bibliographical Metadata
Subject:	Querying Distributed RDF Data Sources
Year:	2011
Authors:	Olaf Görlitz, Steffen Staab
Venue:	COLD
Content Metadata
Problem:	SPARQL Query Federation
Approach:	query optimization strategy for federating SPARQL endpoints based on statistical data obtained from voiD descriptions
Implementation:	SPLENDID
Evaluation:	query execution performance evaluation

Abstract

In order to leverage the full potential of the Semantic Web it is necessary to transparently query distributed RDF data sources in the same way as it has been possible with federated databases for ages. However, there are significant differences between the Web of (linked) Data and the traditional database approaches. Hence, it is not straightforward to adapt successful database techniques for RDF federation. Reasons are the missing cooperation between SPARQL endpoints and the need for detailed data statistics for estimating the costs of query execution plans. We have implemented SPLENDID, a query optimization strategy for federating SPARQL endpoints based on statistical data obtained from voiD descriptions.

Conclusion

SPLENDID allows for transparent query federation over distributed SPARQL endpoints. To achieve good query execution performance, data source selection and query optimization are based on basic statistical information obtained from VOID descriptions. The use of open Semantic Web standards, such as VOID and SPARQL endpoints, allows for flexible integration of various distributed and linked RDF data sources. We have described in detail the implementation of the data source selection and the join order optimization. The evaluation shows that our approach can achieve good query performance and is competitive with other state-of-the-art federation implementations. In our analysis of the source selection we came to the conclusion that at least predicate and type statistics should be included in VOID descriptions for RDF datasets. The use of third-party sameAs links, however, can significantly increase the number of requests and thus hamper the efficiency of query execution plans. The comparison of the two employed physical join implementations has shown that the network overhead plays an important role. Both hash join and bind join can significantly reduce the query processing time for certain types of queries. With SPLENDID we would also like to advocate the adoption of VOID statistics for Linked Data.
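The role of network overhead in the two join strategies can be illustrated with a minimal sketch. It is not SPLENDID's code: FakeEndpoint, its methods, and the row layout are hypothetical stand-ins for remote SPARQL endpoints, and each method call is counted as one network request. A hash join fetches the right-hand side once and joins locally, while a bind join sends one bound request per left-hand binding.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


class FakeEndpoint:
    """Hypothetical stand-in for a remote endpoint; counts requests."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)
        self.requests = 0

    def fetch_all(self) -> List[Row]:
        # One request that ships the complete intermediate result.
        self.requests += 1
        return list(self.rows)

    def fetch_bound(self, key: str, value: str) -> List[Row]:
        # One request per binding; only matching rows are shipped.
        self.requests += 1
        return [r for r in self.rows if r[key] == value]


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Join two fully retrieved inputs locally using a hash table on `key`."""
    table = defaultdict(list)
    for l in left:
        table[l[key]].append(l)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def bind_join(left: Iterable[Row],
              fetch_right: Callable[[str], List[Row]],
              key: str) -> List[Row]:
    """Probe the right side once for every binding produced by the left side."""
    out: List[Row] = []
    for l in left:
        for r in fetch_right(l[key]):
            out.append({**l, **r})
    return out
```

In this toy model the hash join costs one request but transfers every right-hand row, whereas the bind join costs one request per left-hand row but transfers only matching rows. Which one is cheaper therefore depends on the sizes of the intermediate results.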

Future work

As next steps, we plan to investigate whether VOID descriptions can easily be extended with more detailed statistics in order to allow for more accurate cardinality estimates and, thus, better query execution plans. On the other hand, the actual query execution has not yet been optimized in SPLENDID. Therefore, we plan to integrate optimization techniques as used in FedX. Moreover, the adoption of the SPARQL 1.1 federation extension will also allow for more efficient query execution.
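To make the link between statistics and cardinality estimates concrete, here is a minimal sketch of a textbook estimate under a uniform-distribution assumption. It is not the formula used by SPLENDID. The class PredicateStats and the function estimate_cardinality are hypothetical. Their fields mirror the kind of per-predicate counts that a VOID property partition can carry (void:triples, void:distinctSubjects, void:distinctObjects).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateStats:
    """Hypothetical per-predicate counts taken from a dataset description."""
    triples: int
    distinct_subjects: int
    distinct_objects: int


def estimate_cardinality(stats: PredicateStats,
                         subject_bound: bool,
                         object_bound: bool) -> float:
    """Estimate the result size of a triple pattern with a bound predicate.

    Assumes values are uniformly distributed and that subject and object
    are independent, which real datasets often violate.
    """
    card = float(stats.triples)
    if subject_bound:
        card /= max(stats.distinct_subjects, 1)
    if object_bound:
        card /= max(stats.distinct_objects, 1)
    return card
```

More detailed statistics would replace the uniformity assumption with observed distributions, which is where the accuracy gain would come from.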

Implementations

Download-page: https://github.com/semagrow/fork-splendid-server

Access API: No data available now.

Information Representation: RDF

Data Catalogue: VoID

Runs on OS: OS independent

Vendor: Open source

Uses Framework: -

Has Documentation URL: No data available now.

Programming Language: Java

Version: 1.0

Platform: Sesame

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: Querying Distributed RDF Data Sources

Related Problem: Retrieve and join the result tuples

Evaluation

Experiment Setup: Due to the unpredictable availability and latency of the original SPARQL endpoints of the benchmark dataset, we used local copies of them, hosted on five 64-bit Intel(R) Xeon(TM) CPU 3.60GHz server instances running Sesame 2.4.2, with each instance providing the SPARQL endpoint for one life science and one cross domain dataset. The evaluation was performed on a separate server instance with a 64-bit Intel(R) Xeon(TM) CPU 3.60GHz and a 100 Mbit network connection.

Evaluation Method: The goal of the evaluation is to show that SPLENDID is able to achieve good query execution performance for real-world federation scenarios.

Hypothesis: -

Description: We investigated how the information from the VOID descriptions affects the accuracy of the source selection. For each query, we look at the number of sources selected and the resulting number of requests to the SPARQL endpoints. We tested three different source selection approaches, based on 1) predicate index only (no type information), 2) predicate and type index, and 3) predicate and type index and grouping of sameAs patterns as described in Section 4.2.
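The first two source selection variants can be illustrated with a minimal sketch. It is not SPLENDID's implementation. The index contents, the source names, and the function select_sources are hypothetical, and keeping every source for an unbound predicate is an assumed conservative fallback. The sameAs grouping of the third variant is not shown.

```python
from typing import Dict, Optional, Set, Tuple

RDF_TYPE = "rdf:type"

# Hypothetical indexes, as they might be built from VOID statistics.
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "rdf:type": {"dbpedia", "drugbank", "geonames"},
    "foaf:name": {"dbpedia"},
    "drugbank:target": {"drugbank"},
}
TYPE_INDEX: Dict[str, Set[str]] = {
    "drugbank:drugs": {"drugbank"},
    "dbo:City": {"dbpedia"},
}
ALL_SOURCES: Set[str] = {"dbpedia", "drugbank", "geonames"}

# (subject, predicate, object); None marks a variable.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def select_sources(pattern: Pattern, use_type_index: bool) -> Set[str]:
    """Return the sources that may contribute results for a triple pattern."""
    _, predicate, obj = pattern
    if predicate is None:
        # Assumed fallback: an unbound predicate cannot be pruned by the index.
        return set(ALL_SOURCES)
    if use_type_index and predicate == RDF_TYPE and obj is not None:
        # Variant 2: rdf:type patterns are resolved through the type index.
        return set(TYPE_INDEX.get(obj, set()))
    # Variant 1: only the predicate decides which sources are selected.
    return set(PREDICATE_INDEX.get(predicate, set()))
```

With these toy indexes, the pattern (?x, rdf:type, drugbank:drugs) selects all three sources when only the predicate index is used, but only drugbank when the type index is also consulted. Fewer selected sources mean fewer requests to the endpoints.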

Dimensions: Performance

Benchmark used: FedBench

Results: AliBaba and DARQ fail to return results for six out of the 14 queries for different reasons. AliBaba generates malformed subqueries for CD3, CD5, LS6, and LS7. DARQ cannot handle the unbound predicate in CD1 and LS2. For CD3 and CD5, DARQ opens too many connections to GeoNames. All other unsuccessful queries take longer than the time limit of five minutes. Overall, FedX has the best query evaluation performance. The reason is its novel and efficient query execution based on block transmission of result tuples and parallelization of joins. However, the difference between FedX and SPLENDID is significant only for CD6, CD7, LS3, and LS5-7. For the other queries SPLENDID is close to FedX, and for CD3 and CD4 it is even slightly faster, which indicates that SPLENDID does generate better query execution plans.


