Crowdsourcing & Human Computation for Data Labeling & Building Hybrid Systems
Matthew Lease, University of Texas at Austin, USA
This tutorial is aimed at those with little to intermediate experience in crowdsourcing and human computation who want to learn more about the capabilities and limitations of crowdsourcing for: 1) collecting labeled data; and 2) integrating human computation with automation to build more effective, hybrid intelligent systems.
In addition to highlighting opportunities and challenges, the tutorial will provide practical “how to” knowledge on using the Mechanical Turk platform (with discussion of alternative platforms), review statistical approaches to aggregating noisy responses in order to minimize cost and maximize label accuracy, and discuss best practices for achieving efficient, inexpensive, and accurate results with crowdsourcing. Topics covered will provide a foundation for applying crowdsourcing in the context of one's own data mining research.
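As a rough illustration of what "aggregating noisy responses" can mean in its simplest form, the sketch below applies plain majority voting to redundant worker judgments. This is only a baseline, not necessarily one of the statistical approaches the tutorial reviews. The function name `majority_vote`, the `(item_id, worker_id, label)` input format, the sample data, and the alphabetical tie-break rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (item_id, worker_id, label) triples into one label per item.

    Hypothetical helper: the input format and tie-break rule are assumptions.
    """
    # Count how many workers chose each label for each item.
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in judgments:
        votes[item_id][label] += 1

    aggregated = {}
    for item_id, counts in votes.items():
        top = max(counts.values())
        # Break ties deterministically: take the alphabetically first winner.
        winners = sorted(lbl for lbl, n in counts.items() if n == top)
        aggregated[item_id] = winners[0]
    return aggregated


# Made-up example data: three workers each judge two documents.
judgments = [
    ("doc1", "w1", "relevant"),
    ("doc1", "w2", "relevant"),
    ("doc1", "w3", "not_relevant"),
    ("doc2", "w1", "not_relevant"),
    ("doc2", "w2", "not_relevant"),
    ("doc2", "w3", "relevant"),
]
labels = majority_vote(judgments)
print(labels)
```

Majority voting treats every worker as equally reliable. Collecting more judgments per item raises cost, which is the cost versus accuracy trade-off the abstract refers to; methods that weight workers by estimated reliability aim to reach the same accuracy with fewer judgments.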
Matthew Lease is an Assistant Professor in the School of Information at the University of Texas at Austin. He holds a PhD in Computer Science from Brown University. His primary research expertise lies at the intersection of information retrieval (IR) and crowdsourcing. Lease recently spent a mini-sabbatical at CrowdFlower studying “real-world” crowdsourcing problems at industrial scale.
Lease received a 2012 DARPA Young Faculty Award for his ongoing crowdsourcing research, and he has previously given crowdsourcing tutorials at SIGIR 2011, WSDM 2011, CrowdConf 2011, and SIGIR 2012. Lease has also organized crowdsourcing workshops at SIGIR 2010, WSDM 2011, and SIGIR 2011. Since 2011, Lease has organized an annual shared task evaluation of crowdsourcing at the National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC).