Integration of Scholarly Communication Metadata using Knowledge Graphs

Bibliographical Metadata
Year: 2017
Authors: Afshin Sadeghi, Christoph Lange, Maria-Esther Vidal, Sören Auer
Venue: TPDL
Content Metadata
Problem: No data available now.
Approach: No data available now.
Implementation: No data available now.
Evaluation: No data available now.

Abstract

Important questions about the scientific community, e.g., which authors are the experts in a certain field or are actively engaged in international collaborations, can be answered using publicly available datasets. However, the data required to answer such questions is often scattered over multiple isolated datasets. Recently, the Knowledge Graph (KG) concept has been identified as a means for interweaving heterogeneous datasets and enhancing answer completeness and soundness. We present a pipeline for creating high-quality knowledge graphs that comprise data collected from multiple isolated structured datasets. As a proof of concept, we illustrate the different steps in the construction of a knowledge graph in the domain of scholarly communication metadata (SCM-KG). In particular, we demonstrate the benefits of exploiting Semantic Web technology to reconcile data about authors, papers, and conferences. We conducted an experimental study on an SCM-KG that merges scientific research metadata from the DBLP bibliographic source and the Microsoft Academic Graph. The observed results provide evidence that queries are processed more effectively on top of the SCM-KG than over the isolated datasets, while execution time is not negatively affected.
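The completeness claim in the abstract can be illustrated with a minimal sketch. It is not the SCM-KG pipeline itself: the triples, the identifiers (`dblp:a1`, `mag:77`, and so on), the `SAME_AS` table, and the helper names are all hypothetical, and the linking step is assumed to have already identified the two author records as the same person.

```python
# Toy triples (subject, predicate, object). All identifiers are made up
# for illustration and do not come from DBLP or the Microsoft Academic Graph.
DBLP = {
    ("dblp:a1", "name", "A. Example"),
    ("dblp:p1", "author", "dblp:a1"),
}
MAG = {
    ("mag:77", "name", "A. Example"),
    ("mag:p9", "author", "mag:77"),
}

# Assumed output of an instance-matching step: mag:77 and dblp:a1
# denote the same author, so mag:77 is rewritten to dblp:a1.
SAME_AS = {"mag:77": "dblp:a1"}


def canonical(term, same_as):
    """Map a term to its canonical identifier, if one is known."""
    return same_as.get(term, term)


def merge(datasets, same_as):
    """Union the datasets, rewriting linked identifiers to one canonical form."""
    graph = set()
    for dataset in datasets:
        for s, p, o in dataset:
            graph.add((canonical(s, same_as), p, canonical(o, same_as)))
    return graph


def papers_of(graph, author):
    """Return the sorted identifiers of all papers with the given author."""
    return sorted(s for s, p, o in graph if p == "author" and o == author)


kg = merge([DBLP, MAG], SAME_AS)
print(papers_of(DBLP, "dblp:a1"))  # query over one isolated dataset
print(papers_of(kg, "dblp:a1"))    # same query over the merged graph
```

Over the isolated toy dataset the query finds one paper; over the merged graph it finds both, because the two author identifiers have been reconciled into one node.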

Conclusion

In this paper, we presented the concept of a Scholarly Communication Metadata Knowledge Graph (SCM-KG), which integrates heterogeneous, distributed schemas, data, and metadata from a variety of scholarly communication data sources. As a proof of concept, we developed an SCM-KG pipeline to create a knowledge graph by integrating data collected from heterogeneous data sources. We showed that rule-based data mappings can be parallelized, and we presented how semantic similarity measures are applied to determine the relatedness of concepts in two resources in terms of the relatedness of their RDF interlinking structure. Results of the empirical evaluation suggest that the integration approach pursued by the SCM-KG pipeline is able to effectively integrate pieces of information spread across different data sources. The experiments suggest that rule-based mapping, together with the semantic-structure-based instance matching technique implemented in the SCM-KG pipeline, integrates data into a knowledge graph with high accuracy. Although our initial use case addresses the scientific metadata domain, we generated billions of triples with high accuracy in mapping and linking, and we regard the approach as capable of operating at industrial scale and in use cases demanding high precision.
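The idea of judging relatedness from the RDF interlinking structure can be sketched as follows. The record does not say which similarity measure the paper uses, so Jaccard similarity over each resource's incoming and outgoing links is an assumption made here for illustration; the triples and the names `links`, `jaccard`, and `structural_similarity` are hypothetical as well.

```python
def links(triples, resource):
    """Return the set of (direction, predicate, neighbour) links of a resource."""
    out_links = {("out", p, o) for s, p, o in triples if s == resource}
    in_links = {("in", p, s) for s, p, o in triples if o == resource}
    return out_links | in_links


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def structural_similarity(triples, r1, r2):
    """Compare two resources by the overlap of their link structure."""
    return jaccard(links(triples, r1), links(triples, r2))


# Toy graph with made-up identifiers.
TRIPLES = {
    ("ex:author1", "affiliation", "ex:uniA"),
    ("ex:author1", "field", "ex:semanticWeb"),
    ("ex:author2", "affiliation", "ex:uniA"),
    ("ex:author2", "field", "ex:semanticWeb"),
    ("ex:author3", "affiliation", "ex:uniB"),
    ("ex:author3", "field", "ex:semanticWeb"),
}

print(structural_similarity(TRIPLES, "ex:author1", "ex:author2"))
print(structural_similarity(TRIPLES, "ex:author1", "ex:author3"))
```

In this toy graph, ex:author1 and ex:author2 share every link and score 1.0, while ex:author1 and ex:author3 share one of three distinct links. A matching step would then compare such scores against a threshold, whose value would have to be tuned on real data.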

Future work

In the context of the OSCOSS project on Opening Scholarly Communication in the Social Sciences, the SCM-KG approach will be used for providing authors with precise and complete lists of references during the article writing process.

Approach

Positive Aspects: No data available now.

Negative Aspects: No data available now.

Limitations: No data available now.

Challenges: No data available now.

Proposes Algorithm: No data available now.

Methodology: No data available now.

Requirements: No data available now.

Implementations

Download-page: No data available now.

Access API: No data available now.

Information Representation: No data available now.

Data Catalogue: No data available now.

Runs on OS: No data available now.

Vendor: No data available now.

Uses Framework: No data available now.

Has Documentation URL: No data available now.

Programming Language: No data available now.

Version: No data available now.

Platform: No data available now.

Toolbox: No data available now.

GUI: No

Research Problem

Subproblem of: No data available now.

Related Problem: No data available now.

Motivation: No data available now.

Evaluation

Experiment Setup: No data available now.

Evaluation Method: No data available now.

Hypothesis: No data available now.

Description: No data available now.

Dimensions: No data available now.

Benchmark used: No data available now.

Results: No data available now.
