This renewal application proposes a Program Project of statistical methods research to address gaps and barriers arising in the analysis of large and complex data from observational studies in cancer research. The ultimate goal of the Program is to use rich data sources to develop effective strategies for reducing cancer burden in the U.S. and improving longevity and quality of life. This Program Project comprises three research projects and two cores. The three integrated projects jointly address the statistical needs of three research priority areas identified by the Division of Cancer Control and Population Sciences of the National Cancer Institute: Health Disparities; Comparative Effectiveness Research; and Public Health Genomics. In Project 1, we will develop statistical methods to overcome common data limitations in the investigation of social and racial disparities spanning the cancer continuum. We will analyze data from the SEER database linked with data from the National Longitudinal Mortality Study (NLMS). In Project 2, we will develop methods for comparative effectiveness research (CER) in cancer using large observational data. We will use the SEER-Medicare data and the CaPSURE cohort to emulate complex randomized trials that compare the effectiveness of personalized strategies for cancer diagnosis and dynamic strategies for cancer treatment. In Project 3, we will develop statistical methods for the analysis of next-generation sequencing data in genetic cancer epidemiological studies. The proposed research in Project 3 is motivated by and applied to the Harvard lung cancer and breast cancer exome and targeted sequencing studies as well as the affiliated Genome-Wide Association Studies. 
The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, which will include regular P01 meetings, seminars, the annual retreat, the external advisory committee meeting, short courses, a visitor program, and dissemination of research results. The Statistical Computing Core will provide access to Harvard's largest high-performance computing cluster, perform data management, and ensure the development and dissemination of open-access, high-quality software. The Program PIs, Professors Xihong Lin and Francesca Dominici, are renowned biostatisticians with strong track records of methodological and collaborative research and academic administration.