Querying Distributed RDF Data Sources with SPARQL
Bibliographical Metadata | |
---|---|
Subject | Querying Distributed RDF Data Sources |
Year | 2008 |
Authors | Bastian Quilitz, Ulf Leser |
Venue | ESWC |

Content Metadata | |
---|---|
Problem | SPARQL Query Federation |
Approach | Decompose a query into sub-queries, each of which can be answered by an individual service. |
Implementation | DARQ |
Evaluation | Evaluate the performance of the DARQ query engine. |
Abstract
DARQ provides transparent query access to multiple SPARQL services, i.e., it gives the user the impression of querying one single RDF graph despite the real data being distributed on the web. A service description language enables the query engine to decompose a query into sub-queries, each of which can be answered by an individual service. DARQ also uses query rewriting and cost-based query optimization to speed up query execution.
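The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes a much simplified form of service description, namely a set of predicates per endpoint, and assigns each triple pattern to every endpoint that lists its predicate. The endpoint URLs, the `SERVICES` dictionary, and the `decompose` function are hypothetical names chosen for illustration; they are not DARQ's actual service description language or API.

```python
from collections import defaultdict

# Hypothetical service descriptions: endpoint URL -> predicates it can answer.
SERVICES = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/pubs/sparql": {"dc:creator", "dc:title"},
}


def decompose(triple_patterns, services):
    """Group triple patterns into one sub-query per service, keyed by endpoint.

    A pattern is assigned to every service whose description lists the
    pattern's predicate. Patterns with an unknown predicate are rejected.
    """
    sub_queries = defaultdict(list)
    for subject, predicate, obj in triple_patterns:
        matches = [url for url, preds in services.items() if predicate in preds]
        if not matches:
            raise ValueError(f"no service can answer predicate {predicate}")
        for url in matches:
            sub_queries[url].append((subject, predicate, obj))
    return dict(sub_queries)


# A query that spans both hypothetical endpoints.
query = [
    ("?person", "foaf:name", "?name"),
    ("?paper", "dc:creator", "?person"),
    ("?paper", "dc:title", "?title"),
]
plan = decompose(query, SERVICES)
for endpoint, patterns in plan.items():
    print(endpoint, patterns)
```

In this sketch the two `dc:` patterns end up in one sub-query for the publications endpoint, while the `foaf:name` pattern forms a separate sub-query; the shared variable `?person` is what the federating engine would later join on.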
Conclusion
DARQ offers a single interface for querying multiple, distributed SPARQL endpoints and makes query federation transparent to the client. One key feature of DARQ is that it relies solely on the SPARQL standard and is therefore compatible with any SPARQL endpoint implementing this standard. Using service descriptions provides a powerful way to dynamically add endpoints to, and remove them from, the query engine in a manner that is completely transparent to the user. To reduce execution costs we introduced basic query optimization for SPARQL queries. Our experiments show that the optimization algorithm can drastically improve query performance and allow distributed answering of SPARQL queries over distributed sources in reasonable time. Because the algorithm only relies on a very small amount of statistical information we expect that further improvements are possible using techniques. An important issue when dealing with data from multiple data sources is the difference in the vocabularies used and in the representation of information. In further work, we plan to work on mapping and translation rules between the vocabularies used by different SPARQL endpoints. Also, we will investigate generalizing the query patterns that can be handled, as well as blank nodes and identity relationships across graphs.
Future work
In further work, we plan to work on mapping and translation rules between the vocabularies used by different SPARQL endpoints. Also, we will investigate generalizing the query patterns that can be handled, as well as blank nodes and identity relationships across graphs.
Approach
Positive Aspects: Query rewriting and cost-based query optimization to speed up query execution.
Implementations
Download-page: http://darq.sf.net/
Information Representation: RDF
Data Catalogue: Service Description
Runs on OS: Linux, SunOS 5.10
Vendor: Open Source
Uses Framework: ARQ
Has Documentation URL: http://darq.sf.net/
Programming Language: Java
Version: 1.0
Platform: Jena
Toolbox: No data available now.
GUI: No
Research Problem
Subproblem of: Querying Distributed RDF Data Sources
RelatedProblem: transparent query federation
Evaluation
Experiment Setup: We split all data over two Sun-Fire-880 machines (8x sparcv9 CPU, 1050 MHz, 16 GB RAM) running SunOS 5.10. The SPARQL endpoints were provided using Virtuoso Server 5.0.37 with an allowed memory usage of 8 GB. Note that, although we use only two physical servers, there were five logical SPARQL endpoints. DARQ was running on Sun Java 1.6.0 on a Linux system with Intel Core Duo CPUs, 2.13 GHz and 4 GB RAM. The machines were connected over a standard 100 Mbit network connection.
Evaluation Method: Evaluate the performance of the DARQ query engine.
Hypothesis: -
Description: In this section we evaluate the performance of the DARQ query engine. The prototype was implemented in Java as an extension to ARQ. We used a subset of DBpedia. DBpedia contains RDF information extracted from Wikipedia. The dataset is offered in different parts.
Dimensions: Performance
Benchmark used: Subset of DBpedia.
Results: The experiments show that our optimizations significantly improve query evaluation performance. For query Q1 the execution times of optimized and unoptimized execution are almost the same. This is because the query plans for both cases are the same and bind joins of all sub-queries in order of appearance are exactly the right strategy. For queries Q2 and Q4 the unoptimized queries took longer than 10 min to answer and timed out, whereas the execution time of the optimized queries is quite reasonable. The optimized execution of Q1 and Q2 takes almost the same time because Q2 is rewritten into Q1.
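The results refer to bind joins between sub-queries. As a rough illustration of the general technique (not DARQ's implementation), the sketch below evaluates a left triple pattern and, for each resulting binding, evaluates the right pattern with the already bound variables fixed. The in-memory triple lists stand in for remote SPARQL endpoints, and all names (`match`, `evaluate`, `bind_join`, the `ex:` data) are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(pattern, triples, binding=None):
    """Yield every extension of `binding` under which `pattern` matches a triple."""
    binding = binding or {}
    for triple in triples:
        extended = match(pattern, triple, binding)
        if extended is not None:
            yield extended


def bind_join(left_pattern, left_source, right_pattern, right_source):
    """Bind join: feed each left binding into the evaluation of the right pattern."""
    for left in evaluate(left_pattern, left_source):
        yield from evaluate(right_pattern, right_source, left)


# In-memory stand-ins for two endpoints (assumed data, for illustration only).
people = [("ex:alice", "foaf:name", "Alice"), ("ex:bob", "foaf:name", "Bob")]
pubs = [("ex:p1", "dc:creator", "ex:alice"), ("ex:p2", "dc:creator", "ex:carol")]

results = list(
    bind_join(
        ("?person", "foaf:name", "?name"), people,
        ("?paper", "dc:creator", "?person"), pubs,
    )
)
print(results)
```

In a federated setting, the benefit of this strategy is that the right-hand endpoint only has to answer for values that already occur on the left, which is why the order in which sub-queries are joined matters for execution time.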