Biological computation is the information-processing that cells carry out to make the myriad of decisions required to grow and sustain life. Uncovering what this computation is remains far from trivial: cells are infuriatingly complex, they are noisy, they run multiple operations in parallel, and there's a blurred line between what we might consider to be biological software and hardware. This motivates the need for interdisciplinary approaches to extract knowledge from data, and to formulate predictive, explanatory models of cellular decision-making that can be used in the future to guide experiments, and ultimately for the development of novel therapies in medicine.
The Biological Computation group at Microsoft Research focuses on developing theory, methods and software for understanding and programming information-processing in biology. Our research currently centres on three areas: Molecular Programming, Synthetic Biology and Stem Cell Biology. We tackle key questions in these fields through the development of mathematical models and domain-specific computational tools, first by seeking the right level of abstraction to model the system under study, and second by designing tools that allow us to extract knowledge from data that can be used to parametrise or constrain such models.
As one example, in collaboration with experimental researchers at both the Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, and the University of Padua, we are investigating the information-processing that governs growth and development. Together we are studying the pluripotent nature of embryonic stem cells. Pluripotency is the unique capacity of these cells to differentiate into all cell types of the adult body – from skin cells, to gut cells, to blood cells, to brain cells. This potency marks them as a potentially invaluable tool for medicine. Even more remarkable, perhaps, is the discovery that the pluripotent state can be induced from fate-specified cells using only a handful of factors, which could allow us to bypass the embryo altogether. This paints a picture of a sort of stem cell utopia: imagine being able to generate patient-specific pools of cells for those suffering from heart disease, Alzheimer's or Parkinson's disease, or even insulin-producing cells for those with diabetes.
While embryonic stem cells hold significant promise for cell therapies and regenerative medicine, we still lack a fundamental understanding of the molecular processes that determine how differentiation proceeds, and what directs an embryonic stem cell towards a specific lineage. 'Reprogramming' cells back to the naïve state is also poorly understood, and it remains an inefficient process. The goal of our interdisciplinary collaboration has been to derive the biological program governing installation and maintenance of the pluripotent state. To this end, we've borrowed techniques from the field of formal verification, which are traditionally used in computer science to verify the correctness of computer programs, or check for bugs in software. The aim here, however, is to verify that a potential biological program is consistent with what is known experimentally by translating experimental observations into formal specifications that must be satisfied by the model.
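To make the idea concrete, here is a minimal, purely illustrative sketch of this way of working: a candidate biological program is modelled as a tiny Boolean gene network, and an experimental observation is encoded as a formal specification that the model must satisfy. The gene names, update rules and observation below are hypothetical toys, not the actual pluripotency network or the group's real tools.

```python
# Hypothetical sketch: checking a toy Boolean gene network against a
# "program specification" derived from an experimental observation.
# Genes A, B, C and their rules are illustrative assumptions only.

def step(state):
    """One synchronous update of a toy 3-gene network."""
    a, b, c = state["A"], state["B"], state["C"]
    return {
        "A": a or b,        # A sustains itself, or is activated by B
        "B": a and not c,   # A activates B; C represses B
        "C": not a,         # A represses C
    }

def satisfies_spec(initial, target_gene, expected, max_steps=10):
    """Specification: from the `initial` condition, `target_gene`
    eventually settles to the value `expected` (a simple
    'eventually' property checked by simulation)."""
    state = dict(initial)
    for _ in range(max_steps):
        state = step(state)
    return state[target_gene] == expected

# Observation encoded as a spec: with A initially ON, C is eventually OFF.
print(satisfies_spec({"A": True, "B": False, "C": True}, "C", False))
```

A candidate network that fails such a check is inconsistent with the experimental data and can be discarded; in practice, tools of this kind search over many candidate networks at once rather than testing one by hand.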
Models that simply explain what is already known are limited in their usefulness. Certainly, reconciling a large number of experimental results into a single model is a significant step, allowing one to capture the present state of understanding in the field, and even resolve counterintuitive results. However, if your model can be used to predict some as-yet untested behaviour, which is subsequently found to hold experimentally, then you gain confidence in this current explanation of biological function and have learned something new biologically to boot. Beyond this, an often-overlooked benefit of modelling is when your model fails to predict some untested behaviour accurately. Incorrect predictions can be extremely informative in that they expose a flaw in the prevailing understanding of the system, forcing you to reconsider the assumptions that you have made. Ultimately, the approach is iterative, and models will be refined as they are constrained against new experimental data – to paraphrase George Box*, no model will ever be perfect, but some models will be useful.
Following this approach, we have developed models of the information-processing at work at the transcriptional level in embryonic stem cells. First, by encoding previous experimental results as 'program specifications', we sought to capture an understanding of pluripotency that accounted for changes in gene expression due to changes in the cell's environment, and as the result of molecular perturbations. Importantly, we could then generate predictions of untested behaviour that were subsequently supported by experimental tests, underscoring the usefulness of this modelling approach. More recently, we have sought to apply this understanding to explain how the pluripotent state is established during 'reprogramming' of somatic cells to the naïve state, which has also allowed us to predict accurately how to accelerate and enhance the efficiency of this process.
Ultimately the tools that we are designing as a group, such as those we have applied to better understand stem cell decision-making, will be combined into a platform for programming biology. Such a platform will enable users to uncover the biological computation that governs cellular decision-making, and then to use this understanding to reprogram, design and engineer biological behaviour.
Sara-Jane Dunn will be giving a Keynote Address at ON Helix 2017. Come and find out more about the work Microsoft is doing in the life science sector. More information can be found here.
*A 20th Century British statistician
Written by Sara-Jane Dunn, Scientist, Microsoft.
The One Nucleus blog is written by individuals and is not necessarily a reflection of the views held by One Nucleus.