Crowdsourced Semantic Annotation of Scientific Publications and Tabular Data in PDF

Significant amounts of knowledge in science and technology have not yet been published as Linked Open Data but remain contained in the text and tables of legacy PDF publications. Making such information available as RDF would, for example, provide direct access to claims and facilitate surveys of related work. Much valuable tabular information that until now has existed only in PDF documents would also finally become machine-understandable. Instead of spending months studying scientific literature or engineering patents, researchers could collect such input with simple SPARQL queries. The SemAnn approach enables collaborative annotation of text and tables in PDF documents, a format that is still the common denominator of publishing, thus maximising the potential user base. The resulting annotations in RDF format are available for querying through a SPARQL endpoint. To give users an immediate benefit in return for the effort of annotation, SemAnn recommends related papers, taking into account the hierarchical context of annotations in a novel way. We evaluated the usability of SemAnn and the usefulness of its recommendations by analysing annotations resulting from tasks assigned to test users and by interviewing them. While the evaluation shows that even a few annotations lead to good recall, we also observed unexpected, serendipitous recommendations, which confirms the merit of our low-threshold annotation support for the crowd.

Speakers:

Jaana Takis

National Chung Hsing University, Taiwan, University of Bonn & Fraunhofer IAIS
www.cs.uni-bonn.de

Aqm Saiful Islam

National Chung Hsing University, Taiwan, University of Bonn & Fraunhofer IAIS
www.cs.uni-bonn.de

Christoph Lange

National Chung Hsing University, Taiwan, University of Bonn & Fraunhofer IAIS
www.cs.uni-bonn.de

Sören Auer

Director

Leibniz Information Center for Science & Technology
https://www.tib.eu/en/

Following positions at the universities of Dresden, Ekaterinburg, Leipzig, Pennsylvania and Bonn and at the Fraunhofer Society, Prof. Auer was appointed Professor of Data Science and Digital Libraries at Leibniz Universität Hannover and Director of the TIB in 2017.
