While some cancer deaths are attributable to progression of the primary disease, many, if not the majority, are due to recurrent metastatic cancer that develops after successful definitive therapy for earlier stage disease. Most tumor registries, including SEER, do not capture recurrence. Therefore, remarkably little is known about the population-based incidence or patterns and outcomes of care for advanced recurrent cancer. A valid and reliable algorithm for identifying such recurrences in administrative data would enable a substantial expansion of comparative effectiveness research on this common, costly, and lethal condition. In particular, such an algorithm would make it possible to (1) conduct studies using disease-free survival as an outcome and (2) identify inception cohorts in which to study patterns and outcomes of care for advanced recurrent disease. Through an existing multidisciplinary collaboration between Dana-Farber Cancer Institute and Cancer Research Network investigators, we have made considerable progress on the development of a recurrence algorithm, working in two unique data sets that contain complete claims linked to gold standard data on recurrence. To date, we have shown that published recurrence identification strategies have unacceptably low sensitivity and specificity in our recent, population-based data sets, and we have developed a highly promising two-phase probabilistic model that first determines the probability of recurrence and then estimates the date on which it occurred. We now propose to build on this work by conducting further development and validation of the algorithm and then applying it to generate policy-relevant data on the public health burden imposed by recurrent advanced cancer.
Specifically, we will: (1) complete the development of a candidate algorithm for detecting recurrence after definitive therapy of non-metastatic lung, colorectal, breast, and prostate cancer by incorporating cross-validation estimates and rigorously assessing algorithm performance; (2) employ novel methods to directly and indirectly validate the algorithm in several entirely new data sets; and (3) apply the validated algorithm to estimate the proportion of all-cause mortality attributable to recurrence and the total annualized costs of care for patients with recurrent disease, compared to patients presenting with advanced disease at diagnosis. As more cancer patients survive and survive longer, the population at risk for recurrence increases. Our algorithm will enable a new generation of research on the effectiveness, quality, and outcomes of cancer care that takes into account this sentinel event in the cancer trajectory. In our applied studies, we will begin to capitalize on this opportunity by measuring the impact of this condition on public health and the consumption of societal resources.