Data Science Colloquia Spring 2014


Unraveling the Mechanism of Chaperone-Mediated Protein Folding

Jian Peng

Tuesday, February 11, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Sina Ghaemmaghami

Chaperones are special proteins that aid the folding, unfolding, assembly and disassembly of other proteins. Chaperones rely on a large and diverse set of co-chaperones that regulate their specificity and function. How these co-chaperones regulate protein folding and whether they have chaperone-independent biological functions is largely unknown. In this talk, I will first present novel experimental and computational approaches to study the chaperone/co-chaperone/client interaction network in a systematic way. We delineated the relationship between the Hsp70 and Hsp90 chaperone systems, uncovered novel co-chaperones and clients, and established a surprisingly distinct network of protein-protein interactions for co-chaperones. Our results provided a rich resource for exploring how this network changes in the context of development and disease. Finally, I will discuss a case study on the interactions between Hsp90 and kinases, both being important drug targets for cancer therapy. Using graphical modeling and robust sparse regression methods, we identified striking associations between the binding specificity and a structural motif that includes deeply-buried hydrophobic residues in the kinase core region. Computation-guided mutagenesis validated the role of this motif in binding and suggested that Hsp90 recognizes intermediate kinase conformations by sensing the thermostability of the kinase core region. We anticipate our new results will advance the understanding of the role of Hsp90 in cancer drug development.

Jian Peng is a postdoctoral researcher in the Department of Mathematics and the Computer Science and Artificial Intelligence Lab at the Massachusetts Institute of Technology. Working with Bonnie Berger, Jian is developing computational and statistical approaches for analyzing massive datasets in genomics, systems biology and molecular biology. Jian obtained his PhD in Computer Science from Toyota Technological Institute at Chicago in 2013. His doctoral research is on statistical inference for protein structure prediction. His prediction program RaptorX was ranked at the very top in the recent community-wide protein structure prediction competitions (CASP). Jian received a Microsoft PhD Research Fellowship in 2010 and a Young Investigator Award at the Conference on Retroviruses and Opportunistic Infections in 2011.

Machine Learning for Collective Intelligence

Qiang Liu

Friday, February 14, 2014
10:45 AM - 11:45 AM

Computer Studies Building, Room 209

Host: Dan Gildea

In recent years, intelligent systems have become much more powerful by exploiting "big data", incorporating massive volumes of data to improve their predictions. However, many of these data require some human intervention: labeling, rating, or otherwise curating or annotating the raw values. To accomplish this, crowdsourcing approaches outsource these human judgment tasks through the Internet. However, the (usually anonymous) crowd members are diverse in their quality and often unreliable or biased. This gives rise to a computational challenge of how to properly aggregate the results of the diverse crowd, and how to correct for bias by injecting a small amount of expert knowledge.

Probabilistic graphical models provide a powerful framework for aggregating multiple sources of information and reasoning over large numbers of variables. In this talk, I show how to approach the crowdsourcing problem using graphical model tools, which make it possible to leverage powerful inference algorithms such as belief propagation (BP) for crowd aggregation. When estimating continuous quantities such as event probabilities, point spreads and economic indicators, human judgments are often systematically biased, which can be corrected only with extra ground truth information (e.g., qualification tests or control questions). We study the problem of how many control questions to use: using more control questions evaluates the workers better but leaves fewer questions for the unknown targets, and vice versa. We present theoretical results for this problem under different scenarios, and provide a simple rule of thumb for practice.

Qiang Liu is a Ph.D. candidate in the Bren School of Information and Computer Science at UC Irvine. His research focuses on machine learning and probabilistic graphical models, with applications to areas such as sensor networks, computational biology and crowdsourcing. He received a Microsoft Research Fellowship in 2011, and a notable paper award at the 2011 AI and Statistics conference.

Individual Genomes Reveal Deep Population Histories and Uncover the Evolutionary Roles of Non Coding DNA

Ilan Gronau

Tuesday, February 18, 2014 and Wednesday, February 19, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Jack Werren

High throughput DNA sequencing has led to a surge of genomic data, which is expected to revolutionize our knowledge of evolution and genomic function. In this talk, I will introduce some of the tough computational challenges we face when trying to make use of these rich data sets to resolve open questions in evolution. The talk will focus on methods I developed to reconstruct population histories and quantify the effect of recent natural selection using complete individual genome sequences. I will present work I did in utilizing these methods to discover deep splits in human population history, investigate the origins of domestic dogs, and examine the contribution of non coding regulatory elements to recent evolution of the human genome. I will conclude with a short survey of my ongoing research, and a map of the opportunities and challenges we face in the study of evolution in a world of rapidly evolving genomic data sets.

Ilan Gronau is a computational biologist studying evolution and population genetics. He received his PhD from the Computer Science department at the Technion and has a Master's degree in Bioinformatics from the Weizmann Institute. Since 2009, he has been a postdoctoral fellow in Adam Siepel's computational genomics lab at Cornell. Ilan develops computational methods for solving a wide range of fundamental evolutionary inference problems, such as phylogenetic reconstruction, demography inference, and detection of recent natural selection. His work combines innovative computational approaches and cutting edge genomic data sets to examine central open questions in evolution.

At the Intersection of Robot Planning and Natural Language Understanding

Thomas Howard

Thursday, February 27, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Wendi Heinzelman

Contemporary examples of autonomous robots exhibit enough intelligence to drive cars in human environments, manipulate objects on assembly lines, and explore distant planets. This performance is however often the result of significant engineering and algorithms that rely on strict simplifying assumptions that fail to extrapolate to more difficult scenarios. Adaptability of planning algorithms to novel tasks and environments is necessary for robots to meet or exceed human performance in domains such as manufacturing, agriculture, and exploration. Two key factors that influence the performance of planning algorithms are the representation of the decision space and the methods for searching it. In this talk, I will discuss my research towards improving the feasibility, optimality, and efficiency of robot decision spaces and present a new probabilistic model for inferring the formulation of robot planning problems from natural language instructions. Throughout the seminar I will highlight applications of my research on planetary rovers, field robots, autonomous automobiles, mobile manipulators, and robotic torsos. The talk will conclude with a presentation of my vision for how the amalgamation of robot planning, natural language understanding, and machine learning will improve the scalability of intelligent cyber-physical systems.

Thomas Howard is a Research Scientist in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. Dr. Howard's research centers on robot intelligence in complex, unstructured environments with a specific focus on motion planning and natural language understanding. Previously, he was a Research Technologist II at the Jet Propulsion Laboratory and a Lecturer in Mechanical Engineering at the California Institute of Technology. He earned his Ph.D. in Robotics from Carnegie Mellon University in 2009 and his B.S. degrees in Mechanical Engineering and Electrical and Computer Engineering from the University of Rochester in 2004.

Ecohydrology and Climatic Variability in the Present and Future

Stephen Good

Thursday, March 6, 2014 and Friday, March 7, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Carmie Garzione

Global climate regimes are characterized by complex patterns in both intra-annual and inter-annual variability. Understanding the consequences of present and future climate variability on the global distribution of ecosystems and the services they provide is a critical issue in contemporary earth sciences. Using data from field studies, satellite observations, and large-scale models, I explore how climate is linked to ecosystem structure and function. My research is aimed at quantifying these linkages and how ecosystems, in turn, influence the global energy, water, and carbon cycles.

Stephen studied Mechanical Engineering as an undergraduate at Carnegie Mellon University, and then served as a Peace Corps Volunteer in the Dominican Republic working on rural water and sanitation problems. After the Peace Corps, Stephen completed his PhD in Environmental Engineering at Princeton University, where his dissertation research focused on the linkages between climate dynamics, biogeography, and ecosystem functionality in Africa. Currently, Stephen is a post-doctoral fellow in the Geology and Geophysics department at the University of Utah investigating how stable isotope tracers can inform our knowledge of the global hydrologic cycle.

Design Techniques for Crowdsourcing Complex Tasks

Edith Law

Thursday, March 20, 2014 and Friday, March 21, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Henry Kautz

Human computation (a.k.a. crowdsourcing) systems are theoretically interesting because they challenge the way we currently think about and build intelligent systems. We now have to design the system to take into account factors that affect how people compute, including their motivation, cognitive limitations and expertise. Having access to both automated algorithms and many human computers also means that, as system designers, we must explicitly reason about the division of labor -- between novices, experts, and machines -- that will lead to the best computational outcomes.

There are numerous examples of human computation systems achieving remarkable feats -- massively and rapidly labeling images (e.g., the ESP Game), digitizing books (e.g., reCAPTCHA), folding proteins (e.g., FoldIt), translating text (e.g., Duolingo). Yet, many of the problems tackled through crowdsourcing are simple, in that they require only basic perceptual abilities and common-sense knowledge, or that they can be handled by independent workers each having only a local view of the solution. In this talk, I will describe several general design techniques for crowdsourcing complex tasks and specific examples of their use in developing a variety of human computation systems, including games with a purpose, and social computing platforms for planning and text summarization.

Can we extend existing crowdsourcing models to handle tasks that require substantially more expertise, such as research tasks involving the collection, annotation and analysis of scientific data? How can we lower the barrier of entry for scientists, who are domain experts but not necessarily technically savvy or familiar with crowdsourcing, to use crowdsourcing as a tool for their research? I will conclude by describing my research agenda on mixed-expertise crowdsourcing in the scientific domain, and a citizen science platform and research infrastructure, called Curio, for exploring an entirely new space of complex problems that can benefit from leveraging contributions from expert communities and non-expert crowds.

Edith Law is a CRCS postdoctoral fellow at the School of Engineering and Applied Sciences at Harvard University. She graduated from Carnegie Mellon University in 2012 with a Ph.D. in Machine Learning, where she studied human computation systems that harness the joint efforts of machines and humans. She is a Microsoft Graduate Research Fellow, co-authored the book "Human Computation" in the Morgan & Claypool Synthesis Lectures on Artificial Intelligence and Machine Learning, co-organized the Human Computation Workshops, and helped create the first AAAI Conference on Human Computation and Crowdsourcing. Her work on games with a purpose and large-scale collaborative planning has received best paper honorable mentions at CHI.

The Physics of Complex Systems Through the Lens of Networks

Gourab Ghoshal

Thursday, March 27, 2014 and Friday, March 28, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Steve Teitel

The dynamics of systems in the real world are often the result of the interplay between their myriad structural constituents. Their bulk properties emerge as a manifestation of collective behavior and multiple subsystems where very frequently "the whole is greater than the (simple) sum of its parts." Systems with these properties are broadly classified as emergent or complex systems. Very frequently, it turns out that they consist of many interconnected components and consequently, the science of networks is an important aspect of studying complex systems.

I am a theoretical physicist with a strong interest both in analytic and computational modeling as well as their experimental and observational tests. My current research interests fall within the purview of the Physics of Living Systems, from the macroscopic level involving the study of societal entities such as cities and social media, all the way down to the microscopic level where I'm studying biologically inspired chemical models for cells.

In this talk, I will focus on some of my work that falls at the intersection of Big Data and Complex Networks. Specifically, I'll discuss some of my past and current work related to the efficacy of search in knowledge databases and the emergence and characterization of trends in Social Media and other societally generated systems. Time permitting, I'll touch upon some other aspects of my research program and outline directions for the future.

Gourab Ghoshal is currently a Research Scientist in the Department of Earth and Planetary Sciences at Harvard University. He received his PhD in 2009 in Statistical Physics at the University of Michigan and then was a Postdoctoral Associate at the Center for the Study of Complex Systems at Harvard Medical School/Northeastern University and a Visiting Research Scholar at the Media Lab, MIT. His research interests are in the theory and application of Complex Networks, as well as Non-equilibrium Statistical Physics, Game theory, Econophysics, Dynamical Systems and the Origins of Life. He is the editor of a book on Complex Networks (published by Springer) and his work has been published in Nature, Science and Physical Review Letters.

Compressive Imaging and Display Systems

Gordon Wetzstein

Tuesday, April 1, 2014 and Wednesday, April 2, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Henry Kautz

Compressive image acquisition and display is an emerging platform for consumer electronics and other applications that explores the co-design of optics, electronics, applied mathematics, and real-time computing. Together, such hardware/software systems exploit compressibility of the recorded or presented data to facilitate new device form factors and relax requirements on electronics and optics. For instance, light field or glasses-free 3D displays usually show different perspectives of the same 3D scene to a range of different viewpoints. All these images are very similar and therefore highly compressible. By combining multilayer hardware architectures and directional backlighting with real-time implementations of light field tensor factorization, limitations of existing displays, for instance in resolution, contrast, depth of field, and field of view, can be overcome. A similar design paradigm also applies to light field and multi-spectral image acquisition, super-resolution and high dynamic range display, glasses-free 3D projection, computational lithography, microscopy, and many other applications. In this talk, we review the fundamentals of compressive camera and display systems and discuss their impact on future consumer electronics, remote sensing, scientific imaging, and human-computer interaction.

Gordon Wetzstein is a Research Scientist in the Camera Culture Group at the Massachusetts Institute of Technology. His research focuses on computational imaging and display systems as well as computational light transport. At the intersection of computer graphics, machine vision, optics, scientific computing, and perception, this research has a wide range of applications in next-generation consumer electronics, scientific imaging, human-computer interaction, remote sensing, and many other areas. Gordon's cross-disciplinary approach to research has been funded by DARPA, NSF, Samsung, and other grants from industry sponsors and research councils. In 2006, Gordon graduated with Honors from the Bauhaus in Weimar, Germany, and he received a Ph.D. in Computer Science from the University of British Columbia in 2011. His doctoral dissertation focuses on computational light modulation for image acquisition and display and won the Alain Fournier Ph.D. Dissertation Annual Award. He organized the IEEE 2012 and 2013 International Workshops on Computational Cameras and Displays, founded as a forum for sharing computational display design instructions with the DIY community, and presented a number of courses on Computational Displays and Computational Photography at ACM SIGGRAPH. Gordon won the best paper award for "Hand-Held Schlieren Photography with Light Field Probes" at ICCP 2011 and a Laval Virtual Award in 2005.

Modeling Microbial Ecosystems

Jack Gilbert

Tuesday, April 8, 2014 and Wednesday, April 9, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Jack Werren

The understanding of Earth’s climate and ecology requires multi‐scale observations of the biosphere, of which microbial life is a major component. However, to acquire and process physical samples of soil, water and air that comprise the appropriate spatial and temporal resolution to capture the immense variation in microbial dynamics would require a herculean effort and immense financial resources dwarfing even the most ambitious projects to date. To overcome this hurdle we created the Earth Microbiome Project, a crowd-sourced effort to acquire physical samples from researchers around the world that are, importantly, contextualized with physical, chemical and biological data detailing the environmental properties of that sample in the location and time it was acquired. The EMP leverages these existing efforts in a systematic analysis of microbial taxonomic and functional dynamics across a vast array of environmental gradients. The EMP uses the data standards format to capture the environmental gradients, location, time and sampling protocol information about every sample donated by our valued collaborators. Physical samples are then processed using a standardized DNA extraction, PCR, and shotgun sequencing protocol to generate comparable data regarding the microbial community structure and function in each sample. To date we have processed >20,000 samples, and have >20,000 in the process of being analyzed. One of the key goals of the EMP is to map the spatiotemporal variability of microbial communities to capture the changes in important processes that need to be appropriately expressed in models to provide reliable forecasts of ecosystem phenotype across our changing planet.

Dr. Jack A. Gilbert earned his Ph.D. from Nottingham University, UK in 2002, and received his postdoctoral training in Canada at Queens University. He subsequently returned to the UK in 2005 and worked for Plymouth Marine Laboratory as a senior scientist until his move to Argonne National Laboratory and the University of Chicago in 2010. Dr. Gilbert is an Environmental Microbiologist at Argonne National Laboratory, Associate Professor in the Department of Ecology and Evolution at University of Chicago, and senior fellow of the Institute of Genomic and Systems Biology. Dr. Gilbert is currently applying next-generation sequencing technologies to microbial metagenomics and metatranscriptomics to test fundamental hypotheses in microbial ecology. He has authored >100 publications and book chapters on metagenomics and approaches to ecosystem ecology. He has focused on analyzing microbial function and diversity, with a specific focus on nitrogen and phosphorus cycling, with an aim of predicting the metabolic output from a community. He is currently working on generating observational and mechanistic models of microbial communities associated with aquatic and terrestrial ecosystems. He is on the board of the Genomic Standards Consortium, is a senior section editor for PLoS ONE and senior editor for the ISME Journal and Environmental Microbiology, and is PI for the Earth Microbiome Project, Home Microbiome Project, Gulf Microbial Modeling Project, and Hospital Microbiome Project.

Computational Methods for Data-Driven Study of Protein Structure and Function

Jinbo Xu

Thursday, April 10, 2014 and Friday, April 11, 2014
12:30 PM - 1:30 PM

Goergen Hall, Room 101

Host: Jack Werren

High-throughput sequencing has been producing a large number of protein sequences, but many of them are missing solved structures and functional annotations, which are essential to the understanding of life processes and diseases and also have tremendous implications for drug discovery and design. This talk will focus on protein homology detection and knowledge-based structure prediction, which are widely used for the elucidation of protein structure and function as well as protein evolutionary relationships. In particular, this talk will demonstrate how statistical machine learning (e.g., probabilistic graphical models) and optimization methods can be applied to address some fundamental challenges facing protein homology detection and protein folding by taking advantage of high-throughput sequencing.

Dr. Jinbo Xu is an associate professor at the Toyota Technological Institute at Chicago, a computer science research and educational institute located at the University of Chicago, and a research affiliate of the MIT Computer Science and Artificial Intelligence Laboratory. Dr. Xu’s research lies in machine learning, optimization and computational biology (especially protein bioinformatics and biological network analysis). He has developed several popular bioinformatics programs such as the CASP-winning RaptorX for protein structure prediction and IsoRank for comparative analysis of protein interaction networks. Dr. Xu is the recipient of an Alfred P. Sloan Research Fellowship and an NSF CAREER award.