I will generate genome-wide datasets in the laboratory that will provide the basis for developing novel computational tools to detect, and later predict, the function and regulatory targets of endogenous siRNAs (endo-siRNAs). The initial experiments will combine three types of data: dsRNA-enriched sequencing data prepared using the J2 antibody, p19-enriched double-stranded siRNA sequencing libraries, and PEG-enriched small RNA (sRNA) sequencing libraries, prepared from wild-type and Dicer knockdown backgrounds. These data will then be integrated with information from public resources such as Ensembl and ENCODE. Machine learning will be used to identify a set of features associated with Dicer-dependent endo-siRNA loci, and clustering techniques will be used to identify features that distinguish different classes of sRNA-producing loci. These analyses will form the basis of new methods for interpreting sRNA sequencing data. I will apply these methods to publicly available sRNA sequencing data from Dicer knockout systems and from other tissues and cell lines, both to assess their performance and to generate more general expression profiles. I will use endo-siRNA perturbations, ChIP-seq and public Ago-CLIP datasets to identify potential targets of transcriptional gene silencing (TGS), and will examine the attributes of these target sites for features that would allow the development of a predictive method. There is growing evidence that Dicer processing of dsRNAs, endo-siRNAs and TGS may play a role in a range of developmental and medical contexts, including learning, geographic atrophy and breast cancer. This proposal addresses a pressing need for tools that enable research in this rapidly expanding area of cell biology.
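As an illustration only, the Dicer-dependence labels that the machine learning step needs could be derived by comparing each locus between the wild-type and Dicer knockdown sRNA libraries. The sketch below assumes reads-per-million (RPM) normalisation, a pseudocount of 1, a 2-fold loss threshold and a minimum wild-type expression of 5 RPM; none of these values come from the proposal. The `Locus` class, the helper functions and the `inverted_repeat` feature are hypothetical names chosen for the example, and the counts are made up.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Locus:
    """One candidate sRNA-producing locus (all fields are illustrative)."""
    name: str
    wt_reads: int   # reads mapped to the locus in the wild-type library
    kd_reads: int   # reads mapped to the locus in the Dicer knockdown library
    features: dict = field(default_factory=dict)  # hypothetical annotation features


def rpm(count: int, library_size: int) -> float:
    """Normalise a raw read count to reads per million mapped reads."""
    return count * 1e6 / library_size


def log2_fold_change(locus: Locus, wt_size: int, kd_size: int,
                     pseudocount: float = 1.0) -> float:
    """log2(wild type / knockdown) of RPM values; positive means reads are lost on knockdown."""
    wt = rpm(locus.wt_reads, wt_size) + pseudocount
    kd = rpm(locus.kd_reads, kd_size) + pseudocount
    return math.log2(wt / kd)


def label_dicer_dependent(loci, wt_size, kd_size, min_lfc=1.0, min_wt_rpm=5.0):
    """Label a locus Dicer-dependent if it is expressed in wild type (>= min_wt_rpm)
    and its log2 fold change (wild type / knockdown) is at least min_lfc."""
    labels = {}
    for locus in loci:
        expressed = rpm(locus.wt_reads, wt_size) >= min_wt_rpm
        labels[locus.name] = expressed and log2_fold_change(locus, wt_size, kd_size) >= min_lfc
    return labels


def feature_means(loci, labels, feature):
    """Mean value of one feature among Dicer-dependent (True) and other (False) loci."""
    groups = {True: [], False: []}
    for locus in loci:
        groups[labels[locus.name]].append(locus.features[feature])
    return {k: (sum(v) / len(v) if v else float("nan")) for k, v in groups.items()}


# Toy counts for illustration only (not real data); both libraries hold 1,000,000 mapped reads.
loci = [
    Locus("locusA", wt_reads=200, kd_reads=10, features={"inverted_repeat": 1}),
    Locus("locusB", wt_reads=150, kd_reads=140, features={"inverted_repeat": 0}),
    Locus("locusC", wt_reads=2, kd_reads=0, features={"inverted_repeat": 0}),
]
labels = label_dicer_dependent(loci, wt_size=1_000_000, kd_size=1_000_000)
print(labels)                                          # {'locusA': True, 'locusB': False, 'locusC': False}
print(feature_means(loci, labels, "inverted_repeat"))  # {True: 1.0, False: 0.0}
```

Here `locusC` shows a large relative drop but is filtered out as too weakly expressed to call. A single per-locus threshold is the simplest possible call; with biological replicates, a count-based differential expression model such as DESeq2 or edgeR would be a more defensible way to assign these labels, and the per-class feature means would give way to a trained classifier.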