Consistent Model Selection in the p>>n Setting

Valen Earl Johnson

2 Collaborator(s)

Funding source

National Cancer Institute (NIH)

Among the most fundamental and commonly encountered statistical problems in medical research is the problem of model selection. Model selection is the process by which researchers identify the relationships between measured quantities; thus it plays a central role in the analysis of essentially all high-throughput screening data. Model selection procedures represent the primary analytical mechanism through which the associations between diseases and large numbers of biochemical, genetic and pharmacological variables are discovered. The fundamental hypothesis tested in this application is that a new class of model selection procedures can be used to effectively identify associations between biological variables and disease outcomes, even in settings where there are many more potential biological correlates than there are observations on each variable. The goals of this project are to develop these variable selection procedures so that they can be applied to high-throughput screening data, and to apply the resulting methodology in three important application areas. To achieve these goals, the following specific aims will be addressed. Known theoretical properties of the proposed model selection procedures will be extended to cases in which there are many more biological measurements available than there are observations on each measurement (i.e., the p>>n setting). Constraints on the number of variables that can be included in final models for outcome variables will be determined, and efficient numerical algorithms will be developed so that these methods can be applied to actual high-throughput screening data. The new model selection procedures will be used to define binary classification algorithms that can predict clinical outcomes from high-dimensional gene expression data sets.
The new model selection procedures will be used to identify and analyze interactions between genes that are associated with cancer and other diseases in genome-wide association studies using single-nucleotide polymorphism data. The new model selection procedures will be used to analyze biological pathways as informed by high-throughput molecular interrogation data. The algorithms developed during this project constitute a major innovation in the field of model selection and will provide medical researchers with a new and unique set of tools for effectively identifying biological associations among biomarkers, disease attributes, and patient outcomes from high-throughput screening data.

Related projects

Scott Kopetz

AMPK Directs the Balance Between Mutant Kras and PI3K Signaling and Defines Molecular Subsets in Colorectal Cancer

Matthew P Goetz

Mayo Clinic Breast Cancer SPORE

Ramesh K Ganju

Role of S100A7 in Breast Cancer Progression and Metastasis

Wei Zhang

Characterization of Colorectal Cancer Genome by Deep Sequencing and Functional Validation of Selected Candidates

William G Kaelin

The RBP2/JARID1A/KDM5A Histone Demethylase as a Potential Drug Target in Cancer

Li Tang

Nutrigenetics of cruciferous vegetable intake and breast cancer prognosis

Lauren E Burkard

Mechanisms of Macrophage-Mediated Tumor Metastasis

Robert J Coffey

Role of EGFR Ligand-Containing Exosomes in Colorectal Cancer

Qing-Bai She

Targeting Translation Dependence in Colorectal Cancer Progression