The purpose of this K01 proposal is to develop innovative Big Data methodologies to improve cancer outcomes. I am a board-certified hematologist-oncologist completing a PhD in biomedical informatics at Stanford University. This proposal builds on my background and research in developing integrative analysis methods for multi-scale data. It also leverages the exceptional environment at Stanford for advanced training in machine learning, distributed computing, and longitudinal study analysis. Under the mentorship of my team of experts, I will enhance my methods for knowledge discovery in cancer. Cancer research abounds with multi-scale data, from imaging to multi-modal molecular data such as genomic, epigenomic, transcriptomic, and proteomic profiles. Models that predict clinical outcomes, including survival and therapeutic response, could capitalize on the wealth of information these data embody. In practice, however, the lack of effective methods for integrative data analysis leaves much of this latent knowledge untapped. For example, imaging data are routinely obtained for diagnostic purposes but are often underutilized in integrative analyses of cancer outcomes. If inter-data correlations can be established, imaging data could serve as noninvasive proxies for biopsy-acquired molecular data. Furthermore, traditional methods of data analysis have limited ability to extract knowledge from multi-scale data, which are large, heterogeneous, and characterized by complex inter-data interactions. This project outlines specific approaches to enhance knowledge extraction through integrative analyses that (1) directly relate imaging data to molecular data and (2) provide biomedical decision support (prediction of clinical outcomes) from multi-scale data. It applies these approaches to the analysis of brain and colorectal cancers.
The training aims of this proposal are designed to advance the research objectives by (1) acquiring advanced machine learning skills to enhance information capture from each data source, (2) improving the computational efficiency and overall performance of the developed methodologies to ensure scalability, and (3) adapting these methodologies to a longitudinal clinical study. The proposed project has the capacity to make a significant clinical impact by establishing the role of imaging data as a surrogate for molecular data, delineating potential therapeutic targets, and generating predictive markers of clinical outcomes. Importantly, these methodologies have high potential to generalize to other cancers. Data from this project will form the basis for an R01 proposal examining the optimal analysis of longitudinal multi-scale data, with the goal of determining the minimum set of data needed to achieve maximum knowledge. The proposed work, designed for completion within the award period, will strengthen my research skills, generate preliminary data, forge productive collaborative relationships, and enable me to compete for R01 funding. In summary, this K01 award will accelerate my career development and support my transition to an independent physician-scientist in cancer data science research.