ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints
ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints | |
---|---|
Bibliographical Metadata | |
Keywords: | Adaptive Query Processing, ANAPSID, Linked Data |
Year: | 2011 |
Authors: | Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo |
Venue: | ISWC |
Content Metadata | |
Problem: | SPARQL Query Federation, Query Execution, Source Selection |
Approach: | Querying Distributed RDF Data Sources |
Implementation: | ANAPSID |
Abstract
Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm may time out as a consequence of endpoint availability. Second, because blocking operators are usually implemented, endpoint query engines are not able to incrementally produce results, and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and opportunistically, the operators produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory avoiding duplicates. We compared ANAPSID performance with respect to RDF stores and endpoints, and observed that ANAPSID speeds up execution time, in some cases, by more than one order of magnitude.
Conclusion
We have defined ANAPSID, an adaptive query processing engine for RDF Linked Data accessible through SPARQL endpoints. ANAPSID provides a set of physical operators and an execution engine able to adapt the query execution to the availability of the endpoints and to hide delays from users. Reported experimental results suggest that our proposed techniques reduce execution times and are able to produce answers when other engines fail. Also, depending on the selectivity of the join operator and the data transfer delays, ANAPSID operators may overcome state-of-the-art Symmetric Hash Join operators. In the future, we plan to extend ANAPSID with more powerful and lightweight operators like Eddy and MJoin, which are able to route received responses through different operators and adapt the execution to unpredictable delays by changing the order in which each data item is routed.
Future work
In the future we plan to extend ANAPSID with more powerful and lightweight operators like Eddy and MJoin, which are able to route received responses through different operators, and adapt the execution to unpredictable delays by changing the order in which each data item is routed.
Approach
Positive Aspects:
- Decomposes the query into simple sub-plans that can be executed by the remote endpoints.
- Proposes a set of physical operators that gather data generated by the endpoints and quickly produce responses.
- Provides an execution engine able to adapt the query execution to the availability of the endpoints and to hide delays from users.
Negative Aspects: -
Limitations: -
Challenges: Query Decomposition, Query Optimization, and Query Adaptation.
Proposes Algorithm: -
Methodology: Lightweight wrappers translate SPARQL sub-queries into calls to endpoints and convert endpoint answers into ANAPSID internal structures. Mediators maintain information about endpoint capabilities, statistics that describe their content and performance, and the ontology used to describe the data accessible through the endpoint. The Local As View (LAV) approach is used to describe endpoints in terms of the ontology used in the endpoint dataset. Further, mediators implement query rewriting techniques, decompose queries into sub-queries against the endpoints, and gather data retrieved from the contacted endpoints. Currently, only SPARQL queries composed of joins are considered. The rewriting techniques have been extended to cover all SPARQL operators, but that work is outside the scope of this paper. Finally, mediators hide delays and produce answers as quickly as data arrives.
Requirements: -
Limitations: -
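The idea of producing answers as quickly as data arrives, described in the methodology above, can be illustrated with a generic symmetric hash join. This is only a minimal sketch under stated assumptions: it is a textbook non-blocking join, not ANAPSID's own operators, and the function name `symmetric_hash_join`, the `"left"`/`"right"` tags, and the dictionary-based bindings are hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# One solution mapping: variable name -> RDF term (assumed representation).
Binding = Dict[str, str]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Binding]], join_var: str
) -> Iterator[Binding]:
    """Yield joined bindings as soon as a match becomes possible.

    `arrivals` interleaves tuples from two sources, tagged "left" or
    "right", in whatever order the endpoints happen to deliver them.
    """
    # One hash table per source, keyed by the value of the join variable.
    tables: Dict[str, Dict[str, List[Binding]]] = {
        "left": defaultdict(list),
        "right": defaultdict(list),
    }
    for side, binding in arrivals:
        other = "right" if side == "left" else "left"
        key = binding[join_var]
        # Insert into this side's table, then probe the opposite table.
        tables[side][key].append(binding)
        for match in tables[other].get(key, []):
            yield {**match, **binding}


# Hypothetical interleaved arrivals from two endpoints.
stream = [
    ("left", {"?drug": "d1", "?name": "Aspirin"}),
    ("right", {"?drug": "d2", "?target": "t9"}),
    ("right", {"?drug": "d1", "?target": "t1"}),
    ("left", {"?drug": "d2", "?name": "Ibuprofen"}),
]
results = list(symmetric_hash_join(stream, "?drug"))
```

Because neither input has to be read to completion before the first result is emitted, a slow or stalled source delays only the matches that depend on it. The sketch keeps everything in main memory; moving older matches to secondary memory, as the abstract describes, is not modelled here.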
Implementations
Download-page: https://github.com/anapsid/anapsid
Access API: -
Information Representation: RDF
Data Catalogue: -
Runs on OS: Linux CentOS
Vendor: -
Uses Framework: Twisted Network framework
Has Documentation URL: https://github.com/anapsid/anapsid
Programming Language: Python 2.6.5
Version: 1
Platform: -
Toolbox: -
GUI: No
Research Problem
Subproblem of: query processing on Linked Data
RelatedProblem: decompose queries into sub-queries that can be executed by the selected endpoints
Motivation: distribution of RDF datastores
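The related problem above, decomposing a query into sub-queries that the selected endpoints can execute, can be sketched as follows. This is a simplified illustration, not the paper's rewriting technique: it assumes endpoint capabilities are recorded as the set of predicates each endpoint can answer, and the names `decompose`, `to_sparql`, and the `example.org` endpoint URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A triple pattern as (subject, predicate, object) strings.
TriplePattern = Tuple[str, str, str]


def decompose(
    patterns: List[TriplePattern], catalog: Dict[str, Set[str]]
) -> Dict[str, List[TriplePattern]]:
    """Group triple patterns into one sub-query per endpoint.

    `catalog` maps an endpoint URL to the predicates it can answer
    (an assumed, simplified capability description). Each pattern goes
    to the first endpoint, in sorted URL order, that lists its predicate.
    """
    subqueries: Dict[str, List[TriplePattern]] = defaultdict(list)
    for pattern in patterns:
        predicate = pattern[1]
        candidates = sorted(
            url for url, preds in catalog.items() if predicate in preds
        )
        if not candidates:
            raise ValueError(f"no endpoint answers predicate {predicate}")
        subqueries[candidates[0]].append(pattern)
    return dict(subqueries)


def to_sparql(patterns: List[TriplePattern]) -> str:
    """Serialize a group of triple patterns as a SELECT query string."""
    body = " .\n  ".join(" ".join(p) for p in patterns)
    return "SELECT * WHERE {\n  " + body + " .\n}"


# Hypothetical catalog and query (prefix declarations omitted).
catalog = {
    "http://example.org/drugbank/sparql": {"db:name", "db:target"},
    "http://example.org/dbpedia/sparql": {"rdfs:label"},
}
patterns = [
    ("?d", "db:name", "?n"),
    ("?d", "db:target", "?t"),
    ("?t", "rdfs:label", "?l"),
]
plan = decompose(patterns, catalog)
```

Each entry of `plan` can then be serialized with `to_sparql` and sent to its endpoint, with the partial results joined on the shared variable `?t`. A real mediator would also use the statistics mentioned in the methodology to choose among several candidate endpoints.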
Evaluation
Experiment Setup: -
Evaluation Method: -
Hypothesis: -
Description: -
Dimensions: -
Benchmark used: FedBench
Results: -