Optimizing SPARQL Queries over Disparate RDF Data Sources through Distributed Semi-joins

Bibliographical Metadata
Subject:	Querying Distributed RDF Data Sources
Keywords:	SPARQL, RDF, Distributed Querying
Year:	2008
Authors:	Jan Zemánek, Simon Schenk, Vojtěch Svátek, Abraham Bernstein
Venue:	ISWC
Content Metadata
Problem:	SPARQL Query Federation
Approach:	No data available now.
Implementation:	Distributed SPARQL
Evaluation:	Performance Evaluation

Abstract

With the ever-increasing amount of data on the Web available at SPARQL endpoints, the need for an integrated and transparent way of accessing that data has arisen. It is highly desirable to be able to pose SPARQL queries that use data residing in disparate data sources served by multiple SPARQL endpoints. We aim to provide such a capability and thus enable an integrated way of querying the whole Semantic Web at once.

Conclusion

We briefly presented Distributed SPARQL, our Sesame extension, which aims to provide an integrated way of querying data sources scattered across multiple SPARQL endpoints. We briefly described its implementation and the optimization used so far, and outlined the direction of its future development. Distributed SPARQL is part of the Networked Graphs project and is publicly available at https://launchpad.net/networkedgraphs.
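
The title names distributed semi-joins as the optimization, but this page does not describe how Distributed SPARQL implements them. The following is only a generic sketch of the semi-join idea, not the authors' code: the values of the join variable found locally are shipped to the remote source, so that only matching rows travel back. `FakeEndpoint`, `semi_join`, and the sample data are hypothetical names invented for this illustration, and the endpoint is an in-memory stand-in rather than a real SPARQL service.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class FakeEndpoint:
    """In-memory stand-in for a remote SPARQL endpoint (hypothetical)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples = list(triples)
        self.rows_shipped = 0  # rows sent back over the simulated network

    def match(self, subj_var: str, predicate: str, obj_var: str,
              subjects: Optional[Set[str]] = None) -> List[Binding]:
        """Evaluate the pattern (?subj_var predicate ?obj_var).

        When `subjects` is given, only triples whose subject is in that
        set are returned. This restriction is the semi-join step.
        """
        rows = [
            {subj_var: s, obj_var: o}
            for (s, p, o) in self.triples
            if p == predicate and (subjects is None or s in subjects)
        ]
        self.rows_shipped += len(rows)
        return rows


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join on the variables both sides share (nested loop)."""
    out: List[Binding] = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[v] == right_row[v] for v in shared):
                out.append({**left_row, **right_row})
    return out


def semi_join(local_rows: List[Binding], endpoint: FakeEndpoint,
              subj_var: str, predicate: str, obj_var: str) -> List[Binding]:
    """Send the distinct local join values to the endpoint, then join."""
    keys = {row[subj_var] for row in local_rows}
    remote_rows = endpoint.match(subj_var, predicate, obj_var, subjects=keys)
    return join(local_rows, remote_rows)


# Example data (invented for illustration).
local_rows = [{"person": "ex:alice", "name": "Alice"}]
endpoint = FakeEndpoint([
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
])
result = semi_join(local_rows, endpoint, "person", "foaf:mbox", "mbox")
```

In this toy setup the endpoint returns one row instead of three, because the subject filter is applied remotely before any data is transferred. A real deployment would have to express that filter in the SPARQL query sent to the endpoint.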

Future work

We would like to further improve query evaluation performance by introducing join reordering that is aware of distributed joins. We will use the current Sesame optimization techniques for local queries and add our own component that reorders joins according to their relative costs. The costs will be based on statistics that take into account sub-query selectivity, combined with whether a triple pattern is to be evaluated locally or at a remote SPARQL endpoint. In addition to join reordering, we would like to use statistics about SPARQL endpoints to optimize queries even further. We hope that the recent Vocabulary of Interlinked Datasets initiative (http://community.linkeddata.org/MediaWiki/index.php?VoiD) will reach a point where it can be used for this purpose.
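
As a rough illustration of such a cost model, the sketch below orders triple patterns by an estimated cost that combines a selectivity estimate with a penalty for remote evaluation. Everything in it is an assumption made for the example: the `Pattern` class, the `REMOTE_PENALTY` factor, the row estimates, and the simple multiply-and-sort rule are not taken from Distributed SPARQL or from Sesame.

```python
from dataclasses import dataclass
from typing import List

# Assumed multiplier for the extra cost of a remote round trip.
REMOTE_PENALTY = 10.0


@dataclass(frozen=True)
class Pattern:
    """A triple pattern with the statistics the cost model needs."""
    text: str
    estimated_rows: int  # selectivity estimate: fewer rows is more selective
    remote: bool         # True if evaluated at a remote SPARQL endpoint


def cost(pattern: Pattern) -> float:
    """Estimated cost: row count, scaled up when the pattern is remote."""
    factor = REMOTE_PENALTY if pattern.remote else 1.0
    return pattern.estimated_rows * factor


def reorder(patterns: List[Pattern]) -> List[Pattern]:
    """Return the patterns sorted from cheapest to most expensive."""
    return sorted(patterns, key=cost)


# Example patterns with invented statistics.
patterns = [
    Pattern("?p foaf:name ?name", estimated_rows=1000, remote=False),
    Pattern("?p ex:worksAt ex:uzh", estimated_rows=50, remote=False),
    Pattern("?p foaf:mbox ?mbox", estimated_rows=200, remote=True),
    Pattern("?p ex:id 'x'", estimated_rows=3, remote=True),
]
ordered = [p.text for p in reorder(patterns)]
```

With these numbers the very selective remote pattern is evaluated first and the unselective remote pattern last. The sketch ignores which variables the patterns share, so a practical optimizer would also need to avoid orderings that produce Cartesian products.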

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: -

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: Java

Version: No data available now.

Platform: Sesame

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.

