Querying Distributed RDF Data Sources with SPARQL

Bibliographical Metadata
Subject:	Querying Distributed RDF Data Sources
Year:	2008
Authors:	Bastian Quilitz, Ulf Leser
Venue:	ESWC
Content Metadata
Problem:	SPARQL Query Federation
Approach:	Decompose a query into sub-queries, each of which can be answered by an individual service.
Implementation:	DARQ
Evaluation:	Evaluate the performance of the DARQ query engine.

Abstract

DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying one single RDF graph although the real data is distributed on the web. A service description language enables the query engine to decompose a query into sub-queries, each of which can be answered by an individual service. DARQ also uses query rewriting and cost-based query optimization to speed up query execution.
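The decomposition step can be illustrated with a minimal sketch. It assumes that a service description simply lists the predicates an endpoint can answer and that every triple pattern has a bound predicate; the endpoint URLs, the `SERVICES` table, and the `decompose` function are hypothetical names for illustration, not DARQ's actual API or description format.

```python
from collections import defaultdict

# Hypothetical service descriptions: endpoint URL -> predicates it can answer.
SERVICES = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/pubs/sparql": {"dc:creator", "dc:title"},
}


def decompose(triple_patterns, services):
    """Group triple patterns into one sub-query per capable service.

    Simplification: a pattern that several services can answer is added
    to the sub-query of each of them.
    """
    sub_queries = defaultdict(list)
    for pattern in triple_patterns:
        _subject, predicate, _object = pattern
        matches = [url for url, preds in services.items() if predicate in preds]
        if not matches:
            raise ValueError(f"no service can answer {pattern}")
        for url in matches:
            sub_queries[url].append(pattern)
    return dict(sub_queries)


# A query over both sources: names of people and titles of their papers.
QUERY = [
    ("?person", "foaf:name", "?name"),
    ("?paper", "dc:creator", "?person"),
    ("?paper", "dc:title", "?title"),
]

plan = decompose(QUERY, SERVICES)
for endpoint, patterns in plan.items():
    print(endpoint, patterns)
```

The client only writes `QUERY`; which endpoint receives which patterns is decided entirely from the service descriptions, which is what makes the federation transparent.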

Conclusion

DARQ offers a single interface for querying multiple distributed SPARQL endpoints and makes query federation transparent to the client. One key feature of DARQ is that it relies solely on the SPARQL standard and is therefore compatible with any SPARQL endpoint implementing this standard. Service descriptions provide a powerful way to dynamically add endpoints to the query engine and remove them from it in a manner that is completely transparent to the user. To reduce execution costs we introduced basic query optimization for SPARQL queries. Our experiments show that the optimization algorithm can drastically improve query performance and allows SPARQL queries over distributed sources to be answered in reasonable time. Because the algorithm only relies on a very small amount of statistical information we expect that further improvements are possible using techniques. An important issue when dealing with data from multiple data sources is the difference in the vocabularies used and in the representation of information. In further work, we plan to work on mapping and translation rules between the vocabularies used by different SPARQL endpoints. Also, we will investigate generalizing the query patterns that can be handled and blank nodes and identity relationships across graphs.

Future work

In further work, we plan to work on mapping and translation rules between the vocabularies used by different SPARQL endpoints. Also, we will investigate generalizing the query patterns that can be handled and blank nodes and identity relationships across graphs.

Approach

Positive Aspects: Query rewriting and cost-based query optimization to speed up query execution.

Implementations

Download-page: http://darq.sf.net/

Information Representation: RDF

Data Catalogue: Service Description

Runs on OS: Linux, SunOS 5.10

Vendor: Open Source

Uses Framework: ARQ

Has Documentation URL: http://darq.sf.net/

Programming Language: Java

Version: 1.0

Platform: Jena

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: Querying Distributed RDF Data Sources

RelatedProblem: transparent query federation

Evaluation

Experiment Setup: We split all data over two Sun-Fire-880 machines (8x sparcv9 CPU, 1050 MHz, 16 GB RAM) running SunOS 5.10. The SPARQL endpoints were provided using Virtuoso Server 5.0.37 with an allowed memory usage of 8 GB. Note that, although we used only two physical servers, there were five logical SPARQL endpoints. DARQ was running on Sun Java 1.6.0 on a Linux system with Intel Core Duo CPUs, 2.13 GHz and 4 GB RAM. The machines were connected over a standard 100 Mbit network connection.

Evaluation Method: Evaluate the performance of the DARQ query engine.

Hypothesis: -

Description: In this section we evaluate the performance of the DARQ query engine. The prototype was implemented in Java as an extension to ARQ. We used a subset of DBpedia. DBpedia contains RDF information extracted from Wikipedia. The dataset is offered in different parts.

Dimensions: Performance

Benchmark used: Subset of DBpedia.

Results: The experiments show that our optimizations significantly improve query evaluation performance. For query Q1 the execution times of optimized and unoptimized execution are almost the same. This is because the query plans for both cases are the same, and using bind joins of all sub-queries in order of appearance is exactly the right strategy. For queries Q2 and Q4 the unoptimized queries took longer than 10 min to answer and timed out, whereas the execution time of the optimized queries is quite reasonable. The optimized execution of Q1 and Q2 takes almost the same time because Q2 is rewritten into Q1.
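The bind join strategy mentioned in the results can be sketched as follows. This is an illustrative in-memory model, not DARQ's implementation: each "endpoint" is a plain list of triples, and the helper names `match`, `evaluate`, and `bind_join` as well as the sample data are assumptions made for the example. In a real federation the bindings from earlier sub-queries would be substituted into the query sent to the next remote endpoint.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against the existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, triples, binding):
    """Answer one sub-query against one endpoint, starting from `binding`."""
    solutions = [binding]
    for pattern in patterns:
        solutions = [
            extended
            for solution in solutions
            for triple in triples
            if (extended := match(pattern, triple, solution)) is not None
        ]
    return solutions


def bind_join(sub_queries):
    """Run sub-queries in order of appearance, passing bindings forward."""
    solutions = [{}]
    for patterns, triples in sub_queries:
        solutions = [
            extended
            for solution in solutions
            for extended in evaluate(patterns, triples, solution)
        ]
    return solutions


# Hypothetical data held by two endpoints.
PEOPLE = [("ex:alice", "foaf:name", "Alice"), ("ex:bob", "foaf:name", "Bob")]
PUBS = [
    ("ex:p1", "dc:creator", "ex:alice"),
    ("ex:p1", "dc:title", "Federated Queries"),
]

results = bind_join([
    ([("?person", "foaf:name", "?name")], PEOPLE),
    ([("?paper", "dc:creator", "?person"), ("?paper", "dc:title", "?title")], PUBS),
])
print(results)
```

Because each sub-query only sees solutions that survived the previous one, the order of sub-queries determines how many intermediate bindings are shipped around, which is why a plan that already has a good order gains little from further optimization.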
