Our work focuses on the development and application of statistical and machine learning approaches that can exploit molecular and genomic data to assist in directing therapies to patients likely to benefit. Our efforts encompass both (i) direct prediction of therapeutic response and (ii) scalable estimation of molecular networks and dynamics that can shed light on disease mechanisms and heterogeneity, inform prediction of response and help in identifying promising therapeutic opportunities. We work on specific biomedical questions, addressed in collaboration with experimental groups, as well as methodological research in statistics and machine learning motivated by such questions.

The potential of computational approaches in medicine is increasingly clear, but the challenges posed by noisy and incomplete data, biological and clinical heterogeneity and complex underlying processes and dynamics remain substantial. Our work is aimed at developing and exploiting statistical methods that can help to surmount some of these challenges. High-dimensional approaches, networks and graphical models and inference for dynamical systems are key methodological themes in much of our work.

Two key ongoing projects are:

Data-driven characterization of biological networks in cancer.
How is the genomic heterogeneity of cancer manifested at the level of biological networks, such as those involved in cell signalling? Do cancers show altered “wiring” due to genomic aberrations? And if so, how? In close collaboration with experimental partners, we are working on both theoretical and applied aspects of these questions. We are also investigating whether protein signalling networks differ by cancer type and how networks can be used to help discover and define cancer subtypes. Finally, we are developing scalable methodologies by which to systematically assess causal network estimation approaches using interventional data.

Statistical methods for personalized medicine.
We are addressing statistical challenges that arise in the prediction of drug response from multiple high-throughput data types. These challenges include the large number of potential predictors (high dimensionality), heterogeneity arising from known and unknown disease subtypes, limited numbers of samples and the need to integrate multiple data types.