Harnessing Disagreement in Crowdsourcing a Relation Extraction Gold Standard

One of the first steps in most web data analytics is creating a human-annotated gold standard. Such gold standards are built on the assumption that each annotated instance has a single right answer, and from this assumption it follows that gold standard quality can be measured by inter-annotator agreement. We challenge this assumption by demonstrating that for certain annotation tasks, disagreement reflects semantic ambiguity in the target instances. Based on this observation we hypothesize that disagreement is not noise but signal. We provide the first results validating this hypothesis in the context of creating a gold standard for relation extraction from text. In this paper, we present a framework for analyzing and understanding annotation disagreement in gold standards and show how it can be harnessed for relation extraction in medical texts. We also show that crowdsourcing relation annotation tasks can achieve results comparable to those of experts on the same task.
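The agreement measure the abstract refers to is typically a chance-corrected statistic such as Cohen's kappa. As a minimal sketch of what is being challenged, the following computes Cohen's kappa for two annotators over a shared set of instances; the relation labels and annotations are purely illustrative, not data from the paper.

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(ann1) == len(ann2) and len(ann1) > 0
    n = len(ann1)
    # Proportion of instances where the two annotators agree.
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Chance agreement from each annotator's marginal label distribution.
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[label] * c2[label] for label in set(ann1) | set(ann2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical relation labels for six sentences from two annotators.
a = ["treats", "causes", "treats", "none", "causes", "treats"]
b = ["treats", "causes", "none",   "none", "treats", "treats"]
print(round(cohens_kappa(a, b), 3))  # → 0.478
```

A kappa this low is conventionally read as poor annotation quality; the paper's argument is that such disagreement may instead indicate genuinely ambiguous instances.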

By: Lora Aroyo, Chris Welty

Published as: IBM Research Report RC25371, 2013


This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).


Questions about this service can be mailed to reports@us.ibm.com.