Colorectal cancer (CRC) is the second leading cause of cancer death in the US. Linkage studies and genome-wide association studies (GWAS) have successfully identified high-penetrance mutations, such as those that occur in APC or DNA mismatch-repair genes, as well as low-penetrance variants, such as those at 8q24 and in SMAD7. However, these variants explain only a fraction of the heritability of CRC. This is not surprising, as contributions from large classes of genetic variation, specifically less frequent and rare single nucleotide variants (SNVs) with allele frequencies of 0.1-5%, insertions/deletions (indels), and copy number variants (CNVs), have not been systematically investigated across the genome. These genetic variants are predicted to have stronger effect sizes than common low-penetrance variants and are postulated to explain a substantial proportion of the heritability of CRC. To comprehensively identify these variants across the genome, we propose to use next-generation sequencing to sequence the whole genome at 12x coverage in 2,123 high-risk CRC cases and 2,123 controls (Aim 1.1). These cases and controls will be selected from our existing Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO; U01CA137088, PI: Peters) of 15 well-characterized prospective cohorts and case-control studies. We demonstrate that combining whole genome sequence data with imputation using existing GWAS data in large sets of case-control studies allows a powerful and efficient screen for CRC susceptibility loci. This method is particularly well suited to identifying less frequent and rare SNVs, indels, and CNVs. Accordingly, in Aim 1.2 we will use the sequencing data from Aim 1.1 to impute ~20M variants in an additional 8,958 CRC cases and 10,212 controls with existing GWAS data. We will test the associations between CRC risk and variants (sequenced and imputed) in a total of 11,081 cases and 12,335 controls.
In Aim 1.3, we will replicate the most promising loci by genotyping 3,000 variants in 8,827 independent CRC cases and 8,595 controls. In Aim 2, we will investigate gene-environment interactions for directly sequenced and imputed variants, using the GECCO studies, whose detailed clinical and epidemiologic data have already been harmonized across studies. To improve the power for Aims 1 and 2, we will apply novel statistical methods. This project brings together a highly qualified, multidisciplinary team of investigators with expertise in CRC research, biostatistics, population and statistical genetics, epidemiology, and next-generation sequencing. We expect to identify several novel CRC susceptibility variants with effect sizes larger than previous GWAS findings. These results will improve our understanding of which genes influence CRC risk. Such knowledge about the underlying biology could have long-term impacts on screening, treatment, and disease prevention.