SERIMI – Resource Description Similarity, RDF Instance Matching and Interlinking

Bibliographical Metadata
Subject:	Ontology matching
Keywords:	data integration, RDF interlinking, instance matching, linked data, entity recognition, entity search.
Year:	2011
Authors:	Samur Araujo, Jan Hidders, Daniel Schwabe, Arjen P. de Vries, Abraham Bernstein
Venue:	ArXiv
Content Metadata
Problem:	Link Discovery
Approach:	No data available now.
Implementation:	SERIMI
Evaluation:	Accuracy Evaluation

Abstract

The interlinking of datasets published in the Linked Data Cloud is a challenging problem and a key factor for the success of the Semantic Web. Manual rule-based methods are the most effective solution, but they require skilled data publishers to go through a laborious, error-prone and time-consuming process of manually writing rules that map instances between two datasets. An automatic approach to this problem is therefore highly desirable. In this paper, we propose a novel interlinking method, SERIMI, that solves this problem automatically. SERIMI matches instances between a source and a target dataset without prior knowledge of the data, domain or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms state-of-the-art automatic approaches for solving the interlinking problem on the Linked Data Cloud.

Conclusion

RDF instance matching, in the context of interlinking RDF datasets published in the Linked Data Cloud, is the task of determining whether two resources refer to the same real-world entity. This is a challenging task in high demand among data publishers who wish to interlink their datasets in the cloud. In this work, we propose a novel approach, called SERIMI, for solving the RDF instance-matching problem automatically. SERIMI matches instances between a source and a target dataset without prior knowledge of the data, domain or schema of these datasets. It does so by approximating the notion of similarity, pairing instances based on entity labels as well as structural (ontological) context. As part of the SERIMI approach, we proposed the CRDS function to approximate that judgment of similarity. We used two collections proposed by the OAEI 2010 initiative to evaluate SERIMI. On average, SERIMI outperforms two representative systems, RiMOM and ObjectCoref, which tried to solve the same problem using the same collections and reference alignment, in 70% of the cases.
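The label-based pairing mentioned above can be illustrated with a minimal sketch. It is not the CRDS function, which this page does not define, and it is not taken from the SERIMI code. It assumes whole-label matching after lowercasing and whitespace normalization. The names `normalize` and `candidate_pairs` and the URIs are hypothetical.

```python
from collections import defaultdict


def normalize(label):
    """Lowercase a label and collapse runs of whitespace (an assumed normalization)."""
    return " ".join(label.lower().split())


def candidate_pairs(source_labels, target_labels):
    """Map each source URI to the target URIs whose normalized label is identical.

    Both arguments are dicts of the form {uri: label}.
    """
    index = defaultdict(list)
    for uri, label in target_labels.items():
        index[normalize(label)].append(uri)
    return {
        uri: sorted(index.get(normalize(label), []))
        for uri, label in source_labels.items()
    }


# Hypothetical toy data; the URIs are placeholders, not real resources.
source = {"src:1": "Aspirin", "src:2": "Ibuprofen"}
target = {"tgt:a": "aspirin", "tgt:b": "Aspirin ", "tgt:c": "Paracetamol"}
pairs = candidate_pairs(source, target)
print(pairs)
```

In this toy data, `src:1` ends up with two candidates that share a label. Label matching alone cannot choose between them, which is the kind of ambiguity the structural context described above is meant to resolve.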

Future work

As future work, we intend to investigate how our model can be adjusted to consider partial string matching in the similarity function that we proposed, and to accommodate different score distribution metrics as the threshold for the parameter. We also intend to evaluate this approach on different collections that may provide a more accurate reference alignment than the ones we used in this work.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: https://github.com/samuraraujo/SERIMI-RDF-Interlinking

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: Mac OS X

Vendor: Open Source

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: Ruby

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

RelatedProblem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: We loaded all these datasets, around 2 GB of data in total, into an open-source instance of Virtuoso Universal Server. The exception was the DBpedia dataset, which we accessed online via its SPARQL endpoint. The Virtuoso server was installed on a machine running Mac OS X version 10.5.8, with a 2.4 GHz Intel Core 2 Duo processor and 4 GB of 1067 MHz DDR3 memory. We ran the script that implements the SERIMI approach directly against the local SPARQL endpoints and the online DBpedia endpoint.

Evaluation Method: To evaluate the effectiveness of the proposed interlinking method, we used the precision, recall and F1 metrics.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: Accuracy

Benchmark used: DBpedia, Sider, DrugBank, LinkedCT, Dailymed, TCM, Diseasome

Results: No data available now.
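The precision, recall and F1 metrics named under Evaluation Method can be computed as in the sketch below. It assumes that both the links produced by a matcher and the reference alignment are sets of (source URI, target URI) pairs. The function name and the toy links are hypothetical and are not results from the paper.

```python
def precision_recall_f1(found, reference):
    """Score a set of discovered links against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)  # links that also appear in the reference
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy links for illustration only.
reference = {
    ("sider:1", "dbpedia:A"),
    ("sider:2", "dbpedia:B"),
    ("sider:3", "dbpedia:C"),
    ("sider:4", "dbpedia:D"),
}
found = {
    ("sider:1", "dbpedia:A"),
    ("sider:2", "dbpedia:B"),
    ("sider:3", "dbpedia:X"),
}
p, r, f = precision_recall_f1(found, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the three proposed links are correct and two of the four reference links are recovered, giving a precision of 2/3, a recall of 1/2 and an F1 of 4/7.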

