Our proposal for a Sage CCSB, "Integrating cancer datasets for predictive model development and training," has as its central scientific theme the generation of a set of probabilistic causal models for a series of tumor types, built from data contributed by numerous collaborators. By selecting sample sets with different clinical outcomes, the resultant Sage models will have applications impacting cancer biology, early intervention, and cancer treatments. The Sage CCSB leverages the extensive work done at Rosetta/Merck on predictive models in numerous disease areas, which has been gifted to a new nonprofit medical research organization, "Sage Bionetworks." The Sage CCSB operational model comprises a core platform of curated data, mathematical models, and experienced investigators mentoring postdoctoral trainees/fellows. The data come from collaborators and consist of DNA variation data, RNA expression data, and clinical outcomes. The trainees will collate and annotate the genotypic, intermediate molecular phenotype, and clinical end point data from at least five different tumor-type cohorts and develop models that can predict potential new cancer targets, markers for early detection, and clinical outcomes. They will complete externships at other sites (CCSBs), where they will build additional models of their data and facilitate a reciprocal exchange of ideas. The trainees will also delineate specifications for tools that will make access to these models more scalable. Validation of their hypotheses will be performed at the Fred Hutchinson Cancer Research Center and the Netherlands Cancer Institute. This postdoctoral program will provide a unique training and mentorship environment in cancer systems biology and facilitate interactions between the CCSBs and the NCI.