ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints

From Openresearch
Revision as of 09:59, 22 April 2017 by Sahar (talk | contribs)
Bibliographical Metadata
Authors: Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo
Content Metadata
Problem: SPARQL Query Federation, Query Execution, Source Selection
Approach: Querying Distributed RDF Data Sources
Implementation: Linux

Abstract

Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm may time out as a consequence of endpoint availability. Second, because blocking operators are usually implemented, endpoint query engines are not able to incrementally produce results, and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and the operators opportunistically produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory, avoiding duplicates. We compared ANAPSID performance with respect to RDF stores and endpoints, and observed that ANAPSID speeds up execution time, in some cases by more than one order of magnitude.

Technology

Supports SPARQL 1.1


ANAPSID Query Processing Engine

The ANAPSID query engine provides a set of operators able to gather data from different endpoints. Opportunistically, these operators produce results by joining tuples previously received, even when endpoints become blocked. Additionally, the physical operators implement main memory replacement policies to move previously computed matches to secondary memory, ensuring that no duplicates are generated. Each join operator maintains a data structure called Resource Join Tuple (RJT), which records, for each instantiation of the join variable(s), the tuples that have already matched. Suppose that for the instantiation of the variable ?X with the resource r, the tuples {T1, ..., Tn} have matched; then the RJT will be the pair (r, {T1, ..., Tn}), where the first argument, the head of the RJT, corresponds to the resource and the second, the tail of the RJT, is the list of tuples.
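
The RJT pair (r, {T1, ..., Tn}) can be illustrated with a small Python sketch. This is not ANAPSID's actual operator: the names `RJT`, `SymmetricJoin`, `insert`, `tables` and `rjts` are hypothetical, the join is a plain in-memory symmetric hash join on a single variable, and the memory replacement policy that moves matches to secondary memory is left out. It only shows how a non-blocking operator might emit results as tuples arrive while recording, per resource, which tuples have already matched.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RJT:
    """Resource Join Tuple: the pair (head, tail) described in the text."""
    head: str                                 # resource bound to the join variable
    tail: list = field(default_factory=list)  # tuples that already matched on head


class SymmetricJoin:
    """Toy non-blocking join on one variable (an assumption, not ANAPSID's code)."""

    def __init__(self, join_var):
        self.join_var = join_var
        # One hash table per input: resource -> tuples received so far.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}
        self.rjts = {}  # resource -> RJT

    def insert(self, side, tup):
        """Consume one tuple from `side`; return the join results it produces."""
        other = "right" if side == "left" else "left"
        r = tup[self.join_var]
        self.tables[side][r].append(tup)

        # Probe the opposite table; .get avoids creating empty entries.
        matches = self.tables[other].get(r, [])
        if matches:
            # Record every tuple that has matched on r in the RJT tail.
            rjt = self.rjts.setdefault(r, RJT(head=r))
            for t in matches + [tup]:
                if t not in rjt.tail:
                    rjt.tail.append(t)

        # Results are produced immediately, without waiting for either input to end.
        return [{**m, **tup} for m in matches]


join = SymmetricJoin("x")
first = join.insert("left", {"x": "r", "a": 1})    # nothing to join with yet
second = join.insert("right", {"x": "r", "b": 2})  # joins with the earlier left tuple
```

Because each call to `insert` probes only the tuples already received, one input can stall without preventing results from the tuples that have arrived, which is the behaviour the paragraph above attributes to the ANAPSID operators.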

Conclusion

We have defined ANAPSID, an adaptive query processing engine for RDF Linked Data accessible through SPARQL endpoints. ANAPSID provides a set of physical operators and an execution engine able to adapt query execution to the availability of the endpoints and to hide delays from users. Reported experimental results suggest that our proposed techniques reduce execution times and are able to produce answers when other engines fail. Also, depending on the selectivity of the join operator and the data transfer delays, ANAPSID operators may outperform state-of-the-art Symmetric Hash Join operators. In the future we plan to extend ANAPSID with more powerful and lightweight operators like Eddy and MJoin, which are able to route received responses through different operators and adapt the execution to unpredictable delays by changing the order in which each data item is routed.
