Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-joins
Bibliographical Metadata
Subject: Querying Distributed RDF Data Sources
Keywords: SPARQL, RDF, Distributed Querying
Year: 2008
Authors: Jan Zemánek, Simon Schenk, Vojtěch Svátek, Abraham Bernstein
Venue: ISWC
Content Metadata
Problem: SPARQL Query Federation
Approach: No data available now.
Implementation: Distributed SPARQL
Evaluation: Performance Evaluation
Abstract
With the ever-increasing amount of data on the Web available at SPARQL endpoints, the need for an integrated and transparent way of accessing the data has arisen. It is highly desirable to be able to pose SPARQL queries that draw on data residing in disparate data sources served by multiple SPARQL endpoints. We aim to provide such a capability and thus enable an integrated way of querying the whole Semantic Web at once.
Conclusion
We briefly presented our Sesame extension, Distributed SPARQL, which aims to provide an integrated way of querying data sources scattered across multiple SPARQL endpoints. We briefly described its implementation and the optimization used so far, and outlined the direction of its future development. Distributed SPARQL is part of the Networked Graphs project and is publicly available at https://launchpad.net/networkedgraphs.
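The optimization named in the title, a distributed semi-join, can be illustrated in general terms. The following is a minimal in-memory sketch of the textbook idea, not the Distributed SPARQL implementation: the function names, the row format (one dict of variable bindings per solution) and the sample data are all assumptions made for illustration. The point is that only remote rows whose join value also occurs locally need to be shipped back before the final join.

```python
def semi_join_reduce(remote_rows, local_rows, var):
    """Keep only the remote rows whose binding for `var` also occurs locally.

    In a real federation this filtering would happen at the remote endpoint,
    so that non-matching rows never cross the network (assumption: here both
    sides are plain Python lists).
    """
    keys = {row[var] for row in local_rows}
    return [row for row in remote_rows if row[var] in keys]


def join(left, right, var):
    """Plain nested-loop join on a single shared variable."""
    return [{**l, **r} for l in left for r in right if l[var] == r[var]]


# Hypothetical bindings: `local` from a local store, `remote` from an endpoint.
local = [{"s": "a", "name": "Ann"}, {"s": "b", "name": "Bob"}]
remote = [{"s": "a", "mbox": "a@x"}, {"s": "c", "mbox": "c@x"}]

reduced = semi_join_reduce(remote, local, "s")  # the row for "c" is dropped
result = join(local, reduced, "s")
```

The reduction step is what saves bandwidth: the row for `"c"` can never contribute to the result, so it is filtered out before the join.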
Future work
We would like to further improve query evaluation performance by introducing join reordering that is aware of distributed joins. We will make use of the current Sesame optimization techniques for local queries and add our own component, which will reorder joins according to their relative costs. The costs will be based on statistics that take into account sub-query selectivity combined with whether a triple pattern is to be evaluated locally or at a remote SPARQL endpoint. In addition to join reordering, we would like to use statistics about SPARQL endpoints to optimize queries even further. Hopefully the recent initiative called Vocabulary of Interlinked Datasets (http://community.linkeddata.org/MediaWiki/index.php?VoiD) will reach a point where it can be used for this purpose.
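The cost-based reordering described above might look roughly as follows. This is only a sketch under stated assumptions: the `TriplePattern` class, the `REMOTE_PENALTY` weight and the cost formula are hypothetical, and the selectivity values are assumed to come from statistics that the text does not specify. It shows one simple way to combine selectivity with the local/remote distinction into a single ordering key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    text: str           # the pattern as written in the query
    selectivity: float  # estimated fraction of triples matched, 0..1 (assumed known)
    remote: bool        # True if evaluated at a remote SPARQL endpoint


# Hypothetical weight standing in for the extra cost of a network round trip.
REMOTE_PENALTY = 10.0


def estimated_cost(pattern):
    """Lower is cheaper: selective patterns score low, remote ones are penalised."""
    return pattern.selectivity * (REMOTE_PENALTY if pattern.remote else 1.0)


def reorder_joins(patterns):
    """Return the patterns sorted by ascending estimated cost (stable sort)."""
    return sorted(patterns, key=estimated_cost)


patterns = [
    TriplePattern("?s ?p ?o", 1.0, False),
    TriplePattern("?s foaf:name ?n", 0.05, True),
    TriplePattern("?s rdf:type foaf:Person", 0.1, False),
]
ordered = reorder_joins(patterns)
```

With these made-up numbers the local `rdf:type` pattern is evaluated first, the selective but remote `foaf:name` pattern second, and the unselective catch-all pattern last. A real optimizer would also account for how earlier joins shrink the intermediate result.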
Approach
Positive Aspects: No data available now.
Negative Aspects: No data available now.
Limitations: No data available now.
Challenges: No data available now.
Proposes Algorithm: No data available now.
Methodology: No data available now.
Requirements: No data available now.
Implementations
Download-page: No data available now.
Access API: No data available now.
Information Representation: No data available now.
Data Catalogue: -
Runs on OS: No data available now.
Vendor: No data available now.
Uses Framework: No data available now.
Has Documentation URL: No data available now.
Programming Language: Java
Version: No data available now.
Platform: Sesame
Toolbox: No data available now.
GUI: No
Research Problem
Subproblem of: No data available now.
RelatedProblem: No data available now.
Motivation: No data available now.
Evaluation
Experiment Setup: No data available now.
Evaluation Method: No data available now.
Hypothesis: No data available now.
Description: No data available now.
Dimensions: No data available now.
Benchmark used: No data available now.
Results: No data available now.