The Data Science Workshop on Computational Social Science

The workshop was held on October 20, 2017. Featured in the Yale Daily News.


Dragomir Radev, Daniel Spielman, Harry Zhou
Yale University
The Human Components of Machine Learning
Jenn Wortman Vaughan
Microsoft Research
Machine learning is often viewed as an automated process. Data is fed to a learning algorithm that outputs a trained model, which then makes predictions. In practice, however, it is common for every step of this process to rely on humans in the loop. Training data is often generated through human activity, either as passive observations of social processes or actively crowdsourced annotations. Humans prepare this data for use by the algorithm, engineer features, and tweak the algorithm's parameters to fit their needs. And in many applications to medicine, criminal justice, and other critical domains, humans must interpret the learned model's predictions to determine how to best make use of these predictions in their own decision-making process. In this talk, I'll argue for the importance of understanding the humans in the loop. As one example, I'll describe some of my own research on crowdsourcing aimed at understanding who the crowd is, what motivates them, and how they communicate. I'll also touch on a new direction of research that I'm particularly excited about: studying how to make machine learning human-interpretable, and what human interpretability even means.
Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City. She studies algorithmic economics, machine learning, and social computing, often in the context of prediction markets, crowdsourcing, and other human-in-the-loop systems. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn's 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and a handful of best paper or best student paper awards. In her spare time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which has been held each year since 2006.

Coffee Break

Social Event Extraction: Inferring International Relations and Police Killings from the News
Brendan O'Connor
University of Massachusetts Amherst
What can text analysis tell us about society? Enormous corpora of news, social media, and historical documents record events, beliefs, and culture. Automated text analysis scales to large data sets, and can assist in discovering patterns and themes. I will discuss projects to extract event databases from the news, in the domains of international relations and police killings in the U.S. First, we analyze the raw text of 15 years of news articles to extract temporal trends of diplomacy, conflict, and military actions between pairs of countries, including, for example, the recent history of Israeli-Palestinian relations. Our model combines syntactic parsing and latent-variable probabilistic modeling to induce event classes, and we validate against predefined ontologies and databases of interstate conflict. Second, we tackle the surprising lack of systematic records on police killings of civilians in the U.S., by helping automate the extraction of these fatality events from news articles, in order to assist manual curation efforts. Our methods make use of distant supervision and outperform extractors used in previous NLP research. In addition to using natural language processing to advance social understanding, findings from the social sciences can better inform the design of artificial intelligence. Given time and interest, I will briefly overview our efforts to identify and fix dialectal and racial disparity in language technologies.
Brendan O'Connor is an assistant professor in the College of Information and Computer Sciences at the University of Massachusetts, Amherst. Prof. O'Connor works in computational social science, developing natural language processing, machine learning, and interface tools to help scientific investigation about political and social trends; for example, analyzing opinions and slang in Twitter, censorship in Chinese microblogs, and political events reported in the news. His work has been featured in the New York Times and the Wall Street Journal. He received his PhD in 2014 from Carnegie Mellon University's Machine Learning Department, advised by Noah A. Smith, and has previously been a Visiting Fellow at the Harvard Institute for Quantitative Social Science, and an intern with the Facebook Data Science team. Before graduate school, he worked on crowdsourcing at CrowdFlower / Dolores Labs, and natural language search at Powerset. He holds a BS and MS in Symbolic Systems from Stanford University.
Using Bayesian Methods to Infer Language Spread
Claire Bowern
Yale University
In this talk I present results of work that uses computational phylogenetics to infer the dates and trajectories of spread of the Pama-Nyungan family of Australian languages. I show how evolutionary approaches to language change allow us to model language spread and gain insights into both specific language histories and patterns of change more generally.
Claire Bowern is Professor of Linguistics. Her area of research is language documentation and historical linguistics, with a focus on the languages of Australia. Following a PhD at Harvard in historical linguistics, she spent four years at Rice University before moving to Yale in 2008. She runs the HistLing/Pama-Nyungan lab in Yale's Linguistics Department, where researchers combine computational (particularly phylogenetic) and theoretical approaches to language change with ethnographic practices. She is also the author of four books: two reference grammars and two textbooks, and the editor (with Bethwyn Evans) of the Routledge Handbook of Historical Linguistics.


Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech
Jesse Shapiro
Brown University
We study trends in the partisanship of congressional speech from 1873 to 2016. We define partisanship to be the ease with which an observer could infer a congressperson’s party from a fixed amount of speech, and we estimate it using a structural choice model and methods from machine learning. Our method corrects a severe finite-sample bias that we show arises with standard estimators. The results reveal that partisanship is far greater in recent years than in the past, and that it increased sharply in the early 1990s after remaining low and relatively constant over the preceding century. Our method is applicable to the study of high-dimensional choices in many domains, and we illustrate its broader utility with an application to residential segregation.
Jesse Shapiro is the George S. and Nancy B. Parker Professor of Economics at Brown University. Prior to joining Brown University in 2015 he was the Chookaszian Family Professor of Economics at the University of Chicago Booth School of Business. Shapiro received his BA in economics in 2001 and his PhD in economics in 2005 from Harvard University. He is a Research Associate at the National Bureau of Economic Research and a former editor of the Journal of Political Economy. He was a 2011-12 Alfred P. Sloan Research Fellow.

Probabilistic Typology: Deep Generative Models of Vowel Inventories
Ryan Cotterell
Johns Hopkins University
Linguistic typology studies the range of structures present in human language. The main goal of the field is to discover which sets of possible phenomena are universal, and which are merely frequent. For example, all languages have vowels, while most—but not all—languages have an [u] sound. In this paper we present the first probabilistic treatment of a basic question in phonological typology: What makes a natural vowel inventory? We introduce a series of deep stochastic point processes, and contrast them with previous computational, simulation-based approaches. We provide a comprehensive suite of experiments on over 200 distinct languages.
Ryan is a fourth-year Ph.D. student in the Johns Hopkins Computer Science department affiliated with the Center for Language and Speech Processing, where he is co-advised by Jason Eisner and David Yarowsky. He specializes in natural language processing, computational linguistics, and machine learning, focusing on deep learning and statistical approaches to phonology, morphology, linguistic typology, and low-resource languages. He has received best paper awards at ACL 2017 and EACL 2017 and two honorable mentions for best paper at EMNLP 2015 and NAACL 2016. Previously, he was a visiting Ph.D. student at the Center for Information and Language Processing at LMU Munich supported by a Fulbright Fellowship and a DAAD Research Grant under the supervision of Hinrich Schütze. Since Fall 2016 he has been supported by an NDSEG graduate fellowship and since 2017 by the Frederick Jelinek Fellowship.

Robots for Autism
Brian Scassellati
Yale University
In the last decade, there has been a slowly growing interaction between robotics researchers and clinicians to look at the viability of using robots as a tool for enhancing therapeutic and diagnostic options for individuals with autism spectrum disorder. While much of the early work in using robots for autism therapy lacked clinical rigor, new research is beginning to demonstrate that robots improve engagement and elicit novel social behaviors from people (particularly children and teenagers) with autism. However, why robots in particular show this capability, when similar interactions with other technology or with adults or peers fail to show this response, remains unknown. This talk will present some of the most recent evidence showing robots eliciting social behavior from individuals with autism and discuss some of the mechanisms by which these effects may be generated. As a diagnostic tool, robots offer a social press that is repeatable and controllable to allow for standardization of interactive stimuli across individuals and across time. Because robots can provide consistent, reliable actions, clinicians can ensure that identical stimuli are presented at each diagnostic session. Furthermore, the component systems in socially aware robots may offer non-interactive methods for tracking human-human social behaviors. The perceptual systems of these robots are designed to measure and quantify social behavior—that is, exactly the skills that must be identified during diagnosis.
Brian Scassellati is a Professor of Computer Science, Cognitive Science, and Mechanical Engineering at Yale University and Director of the NSF Expedition on Socially Assistive Robotics. His research focuses on building embodied computational models of human social behavior, especially the developmental progression of early social skills. Using computational modeling and socially interactive robots, his research evaluates models of how infants acquire social skills and assists in the diagnosis and quantification of disorders of social development (such as autism). His other interests include humanoid robots, human-robot interaction, artificial intelligence, machine perception, and social learning. Dr. Scassellati received his Ph.D. in Computer Science from the Massachusetts Institute of Technology in 2001. His dissertation work (Foundations for a Theory of Mind for a Humanoid Robot) with Rodney Brooks used models drawn from developmental psychology to build a primitive system for allowing robots to understand people. His work at MIT focused mainly on two well-known humanoid robots named Cog and Kismet. He also holds a Master of Engineering in Computer Science and Electrical Engineering (1995), and Bachelor's degrees in Computer Science and Electrical Engineering (1995) and Brain and Cognitive Science (1995), all from MIT. Dr. Scassellati's research in social robotics and assistive robotics has been recognized within the robotics community, the cognitive science community, and the broader scientific community. He was named an Alfred P. Sloan Fellow in 2007 and received an NSF CAREER award in 2003. His work has been awarded five best-paper awards. He was the chairman of the IEEE Autonomous Mental Development Technical Committee from 2006 to 2007, the program chair of the IEEE International Conference on Development and Learning (ICDL) in both 2007 and 2008, and the program chair for the IEEE/ACM International Conference on Human-Robot Interaction (HRI) in 2009.

Coffee Break

Does Restricting Information Make the Crowd Smarter?
Vineet Kumar
Yale University
Crowdsourcing has been widely used for fairly accurate information gathering, and digital technologies enlarge its scope. We examine how herding in crowdsourcing can arise based on how much information is made available to participants in such a crowdsourcing setting. Using a randomized experiment with an aggregator of crowdsourced financial information, we investigate the causal impact of differential informational environments on the overall quality and quantity of information. We find significant user heterogeneity in how they respond to different designs, and evaluate strategies for how the value of crowdsourcing can be improved.
Vineet studies how firms should create and deliver value in digital products and services, and the strategies firms should adopt in designing digital products. His research focuses on developing economic models using microfoundations-based structural economic and game-theoretic models to understand consumer and firm behavior in markets for technology products and services. At Yale, he teaches an MBA elective on Digital Strategy and a doctoral-level course on Networks. He also directs industry collaborations and projects with companies through the Yale Center for Customer Insights. Vineet received his PhD from Carnegie Mellon University and holds an undergraduate degree in engineering from the Indian Institute of Technology, Madras.

Using phylogenetic methods to understand cultural variation in Australian Aboriginal societies
Duncan Learmouth
Durham University
Australian Aboriginal societies have a diverse and complex cultural heritage expressed, in particular, through differences in ritual, myth, song, and art. This variation can be analyzed using phylogenetic tree-modeling methods that may help us answer such questions as: (1) Can changes in ritual complexity be explained by variation in other elements of Australian social life? (2) Can some cultural elements, such as rock art motifs, be used as 'population markers' to trace historic movements of people within Australia? Preliminary analyses associated with answering both these questions will be presented at the workshop.
Duncan Learmouth is a PhD student in Anthropology at Durham University in the UK, studying the application of phylogenetic methods to the study of cultural variation in Australian societies. He completed his Master's degree in Evolutionary Anthropology at University College London (UCL) in 2016. He is currently collaborating with Claire Bowern at Yale on applying the Australian (Pama-Nyungan) language phylogeny to answer questions related to cultural variation.

Short Presentations