Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-joins

Bibliographical Metadata
Subject: Querying Distributed RDF Data Sources
Keywords: SPARQL, RDF, Distributed Querying
Year: 2008
Authors: Jan Zemánek, Simon Schenk, Vojtěch Svátek, Abraham Bernstein
Venue: ISWC
Content Metadata
Problem: SPARQL Query Federation
Approach: No data available now.
Implementation: Distributed SPARQL
Evaluation: Performance Evaluation

Abstract

With the ever-increasing amount of data on the Web available at SPARQL endpoints, the need for an integrated and transparent way of accessing the data has arisen. It is highly desirable to be able to ask SPARQL queries that use data residing in disparate data sources served by multiple SPARQL endpoints. We aim to provide such a capability, thereby enabling an integrated way of querying the whole Semantic Web at once.

Conclusion

We briefly presented our Sesame extension, Distributed SPARQL, which aims to provide an integrated way of querying data sources scattered across multiple SPARQL endpoints. We briefly described its implementation and the optimizations used so far, and outlined the direction of its future development. Distributed SPARQL is part of the Networked Graphs project and is publicly available at https://launchpad.net/networkedgraphs.

Future work

We would like to further improve query evaluation performance by introducing join reordering that is aware of distributed joins. We will make use of the existing Sesame optimization techniques for local queries and add our own component, which will reorder joins according to their relative costs. The costs will be based on statistics that take into account sub-query selectivity combined with whether a triple pattern is to be evaluated locally or at a remote SPARQL endpoint. In addition to join reordering, we would like to use statistics about SPARQL endpoints to optimize queries even further. Hopefully, the recent initiative called Vocabulary of Interlinked Datasets (http://community.linkeddata.org/MediaWiki/index.php?VoiD) will reach a point where it can be used for this purpose.
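
As an illustration of the cost-based reordering idea described above, here is a minimal Java sketch. It is not the Distributed SPARQL or Sesame implementation: the class JoinReorderSketch, the Pattern type, the REMOTE_PENALTY constant and the cost formula are all hypothetical and chosen only to show how a selectivity estimate could be combined with the local/remote distinction. It also ignores shared variables between patterns, which a real optimizer would need to consider to avoid Cartesian products.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: orders triple patterns by an assumed relative cost.
final class JoinReorderSketch {

    /** One triple pattern plus the statistics the assumed cost model needs. */
    static final class Pattern {
        final String text;         // the triple pattern as written in the query
        final double selectivity;  // estimated fraction of triples matched, in (0, 1]
        final boolean remote;      // true if evaluated at a remote SPARQL endpoint

        Pattern(String text, double selectivity, boolean remote) {
            this.text = text;
            this.selectivity = selectivity;
            this.remote = remote;
        }
    }

    // Assumed multiplier for remote evaluation; a real system would estimate it.
    static final double REMOTE_PENALTY = 10.0;

    /** Relative cost: the selectivity, scaled up when the pattern is remote. */
    static double cost(Pattern p) {
        return p.remote ? p.selectivity * REMOTE_PENALTY : p.selectivity;
    }

    /** Returns a new list with the cheapest patterns first; the input is unchanged. */
    static List<Pattern> reorder(List<Pattern> patterns) {
        List<Pattern> ordered = new ArrayList<>(patterns);
        ordered.sort(Comparator.comparingDouble(JoinReorderSketch::cost));
        return ordered;
    }
}
```

Under these assumptions, a highly selective local pattern is evaluated first and a remote pattern is pushed later unless its selectivity outweighs the penalty. Endpoint statistics, such as those a vocabulary like VoiD might eventually provide, could replace the hard-coded selectivity values.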

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: -

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: Java

Version: No data available now.

Platform: Sesame

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.
