Querying the Web of Interlinked Datasets using VOID Descriptions
Bibliographical Metadata | |
---|---|
Year | 2012 |
Authors | Ziya Akar, Tayfun Gökmen Halaç, Erdem Eser Ekinci, Oguz Dikenelli |
Venue | LDOW |
Content Metadata | |
Problem | SPARQL Query Federation |
Approach | analyzing query structure with respect to the metadata of datasets |
Implementation | WoDQA |
Evaluation | No evaluation exists. |
Abstract
Query processing is an important way of accessing data on the Semantic Web. Today, the Semantic Web is characterized as a web of interlinked datasets, and thus querying the web can be seen as dataset integration on the web. This dataset integration must also be transparent to the data consumer, as if she were querying the whole web. To decide which datasets should be selected and integrated for a query, one requires metadata about the web of data. In this paper, to enable this transparency, we introduce a federated query engine called WoDQA (Web of Data Query Analyzer), which discovers datasets relevant to a query in an automated manner using VOID documents as metadata. WoDQA focuses on powerful dataset elimination by analyzing query structure with respect to the metadata of datasets. Dataset and linkset descriptions in VOID documents are analyzed for a SPARQL query and a federated query is constructed. By means of the linkset concept of VOID, links between datasets are incorporated into the selection of federated data sources. The current version of WoDQA is available as a SPARQL endpoint.
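The dataset elimination idea described in the abstract can be illustrated with a minimal sketch. It is not WoDQA's actual rule set: the `DatasetDescription` class, the two filtering rules, and the `example.org` datasets are assumptions made for illustration, loosely modelled on the VOID properties `void:sparqlEndpoint`, `void:vocabulary`, and `void:uriSpace`. Linkset analysis is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class DatasetDescription:
    """Hypothetical, simplified stand-in for a VOID dataset description."""
    name: str
    endpoint: str            # value of void:sparqlEndpoint
    vocabularies: List[str]  # namespaces listed via void:vocabulary
    uri_space: str           # value of void:uriSpace


def is_variable(term: str) -> bool:
    """SPARQL variables are written with a leading '?' in this sketch."""
    return term.startswith("?")


def relevant_datasets(pattern: TriplePattern,
                      datasets: List[DatasetDescription]) -> List[str]:
    """Eliminate datasets that cannot match the pattern (assumed rules)."""
    subject, predicate, _ = pattern
    candidates = list(datasets)
    # Assumed rule 1: a bound predicate must belong to a declared vocabulary.
    if not is_variable(predicate):
        candidates = [d for d in candidates
                      if any(predicate.startswith(ns) for ns in d.vocabularies)]
    # Assumed rule 2: a bound subject URI must fall inside the dataset's URI space.
    if not is_variable(subject):
        candidates = [d for d in candidates if subject.startswith(d.uri_space)]
    return [d.name for d in candidates]


# Illustrative descriptions; all names and URLs are made up.
DATASETS = [
    DatasetDescription("drugs", "http://example.org/drugs/sparql",
                       ["http://example.org/drugvocab/"],
                       "http://example.org/drug/"),
    DatasetDescription("diseases", "http://example.org/diseases/sparql",
                       ["http://example.org/diseasevocab/"],
                       "http://example.org/disease/"),
]

print(relevant_datasets(
    ("?drug", "http://example.org/drugvocab/treats", "?disease"), DATASETS))
```

A pattern with only variables keeps every dataset as a candidate, which is why the paper stresses combining per-pattern analysis with the relations between patterns and the links between datasets.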
Conclusion
In this paper, we have introduced a query federation engine called WoDQA that discovers related datasets in a VOID store for a query and distributes the query over these datasets. The novelty of our approach is an exhaustive dataset selection mechanism which, besides analyzing datasets for each triple pattern, also analyzes the relations between triple patterns and the links between datasets. WoDQA focuses on discovering relevant datasets and eliminating irrelevant ones using a rule-based approach introduced in this paper. To select datasets effectively, our approach requires VOID descriptions that allow the dataset to be queried, reflect the actual content of the dataset completely and accurately, and include linksets between datasets. WoDQA allows users to construct raw queries without needing to know how the query will be divided into sub-queries or where the sub-queries are executed. Query results are complete under the assumption that available, accurate and complete VOID descriptions of the datasets exist.
The initial version of WoDQA introduced in this paper has some disadvantages arising from the query federation approach on which it builds. As mentioned previously, follow-your-nose has problems such as missing results and large document retrieval. Similar problems may occur for query federation. Firstly, to find complete results for queries, the metadata of all datasets must be well-defined and accurate. Providing such accurate dataset metadata requires an automated mechanism that continuously updates the metadata. However, even if a tool implementing this requirement existed, providing accurate dataset metadata via such a tool would remain the responsibility of dataset publishers. Further problems of query federation are high latency and low selectivity of datasets, which are similar to the retrieval of large documents in follow-your-nose.
Query optimization can be a solution for these problems of query federation. Grouping triple patterns to filter more triples on an endpoint can prevent high latency (required processing time), and changing the query evaluation order according to dataset selectivity statistics can prevent retrieving large result sets. To make WoDQA function in the wild, the optimization step of query federation needs to be implemented. We plan to incorporate triple pattern selectivity into query reorganization using VOID properties about statistics.
On the other hand, we could not evaluate our approach in this paper, since VOID documents in current VOID stores are not well-defined. Since SPARQL endpoint definitions, linkset descriptions or vocabularies are missing in most VOID documents, we had no chance to execute comprehensive scenarios. Developing a tool that extracts well-defined VOID descriptions of datasets, and using it to evaluate our approach, is required future work to confirm the applicability of WoDQA to linked open data. Evaluating the analysis cost of WoDQA for a large VOID store will also become possible once well-defined VOID descriptions are constructed.
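The two optimizations named in the conclusion, grouping triple patterns per endpoint and reordering evaluation by dataset statistics, can be sketched as follows. This is an illustration under stated assumptions, not the planned WoDQA implementation: it assumes each triple pattern has already been assigned exactly one endpoint, that sub-queries are expressed as SPARQL 1.1 `SERVICE` blocks, and that a per-dataset triple count (as `void:triples` could supply) is an acceptable crude stand-in for selectivity. The function names and the `example.org` endpoints are hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_endpoint(selected: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect patterns that go to the same endpoint into one group.

    `selected` holds (triple_pattern, endpoint) pairs in query order.
    """
    groups: Dict[str, List[str]] = {}
    for pattern, endpoint in selected:
        groups.setdefault(endpoint, []).append(pattern)
    return groups


def order_groups(groups: Dict[str, List[str]],
                 triple_counts: Dict[str, int]) -> List[Tuple[str, List[str]]]:
    """Evaluate smaller datasets first; endpoints without statistics go last."""
    return sorted(groups.items(),
                  key=lambda item: triple_counts.get(item[0], float("inf")))


def build_federated_query(selected: List[Tuple[str, str]],
                          triple_counts: Dict[str, int]) -> str:
    """Emit one SERVICE block per endpoint (prefix declarations omitted)."""
    blocks = []
    for endpoint, patterns in order_groups(group_by_endpoint(selected),
                                           triple_counts):
        body = " ".join(p + " ." for p in patterns)
        blocks.append("  SERVICE <%s> { %s }" % (endpoint, body))
    return "SELECT * WHERE {\n" + "\n".join(blocks) + "\n}"


# Illustrative input; endpoints, prefixes, and counts are made up.
SELECTED = [
    ("?drug ex:treats ?disease", "http://example.org/drugs/sparql"),
    ("?disease ex:name ?name", "http://example.org/diseases/sparql"),
    ("?drug ex:label ?label", "http://example.org/drugs/sparql"),
]
TRIPLE_COUNTS = {
    "http://example.org/drugs/sparql": 500000,
    "http://example.org/diseases/sparql": 20000,
}

print(build_federated_query(SELECTED, TRIPLE_COUNTS))
```

Grouping sends the two drug patterns in a single request instead of two, and the ordering places the smaller diseases dataset first so that fewer intermediate bindings reach the larger one.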
Future work
Developing a tool that extracts well-defined VOID descriptions of datasets, and using it to evaluate our approach, is required future work to confirm the applicability of WoDQA to linked open data. Evaluating the analysis cost of WoDQA for a large VOID store will also become possible once well-defined VOID descriptions are constructed.
Implementations
Download-page: https://sourceforge.net/projects/wodqa/
Access API: -
Information Representation: RDF
Runs on OS: OS independent
Vendor: Open source
Has Documentation URL: https://sourceforge.net/projects/wodqa/
Version: 1.0
Platform: -
Toolbox: -
GUI: No
Research Problem
Subproblem of: Query processing on Linked Data
Related Problem: Missing results and large document retrieval.
Motivation: No data available now.
Evaluation
Experiment Setup: -
Evaluation Method: -
Hypothesis: -
Description: -
Dimensions: -
Benchmark used: -
Results: -