Recent efforts to promote and support replication assume that well-established methodological guidance exists for designing and implementing these studies. However, no such consensus exists in the methodology literature. This article addresses these challenges by describing design-based approaches for planning systematic replication studies. Our general approach is derived from the Causal Replication Framework (CRF), which formalizes the assumptions under which replication success can be expected. The assumptions may be understood broadly as replication design requirements and individual study design requirements. Replication failure occurs when one or more CRF assumptions are violated. In design-based approaches to replication, CRF assumptions are systematically tested to evaluate the replicability of effects, as well as to identify sources of effect variation when replication failure is observed. In direct replication designs, replication failure is evidence of bias or incorrect reporting in individual study estimates, while in conceptual replication designs, replication failure occurs because of effect variation due to differences in treatments, outcomes, settings, and participant characteristics. The paper demonstrates how multiple research designs may be combined in systematic replication studies, as well as how diagnostic measures may be used to assess the extent to which CRF assumptions are met in field settings.
Researchers are rarely satisfied to learn only whether an intervention works; they also want to understand why and under what circumstances interventions produce their intended effects. These questions have led to increasing calls for implementation research to be included in evaluations. When an intervention protocol is highly standardized and delivered through verbal interactions with participants, a set of natural language processing techniques termed semantic similarity can be used to provide quantitative measures of how closely intervention sessions adhere to a standardized protocol, as well as how consistently the protocol is replicated across sessions. Given the intense methodological, budgetary, and logistical challenges in conducting implementation research, semantic similarity approaches have the benefit of being low-cost, scalable, and context-agnostic. In this paper, we demonstrate the application of semantic similarity approaches in an experiment and discuss strengths and limitations, as well as the most appropriate contexts for applying this method.
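The core idea of scoring session-to-protocol adherence can be illustrated with a minimal sketch. This example uses a simple bag-of-words cosine similarity built from the standard library; applied work of the kind the abstract describes would typically use richer semantic representations (e.g., sentence embeddings), and the protocol and session texts below are invented for illustration.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts (0 to 1)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical standardized protocol text and two session transcripts
protocol = "welcome the student and review the goals for the session"
sessions = [
    "welcome the student and review the session goals together",
    "talk about the weather and then collect homework",
]

# One adherence score per session; higher means closer to the protocol
adherence = [cosine_similarity(protocol, s) for s in sessions]
```

In this sketch, the first session scores higher than the second because it shares more vocabulary with the protocol; consistency across sessions could be summarized by the spread of the adherence scores.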
Education researchers have traditionally faced severe data limitations in studying local policy variation; administrative data sets capture only a fraction of districts’ policy decisions, and it can be expensive to collect more nuanced implementation data from teachers and leaders. Natural language processing and web-scraping techniques can help address these challenges by assisting researchers in locating and processing policy documents posted online. School district policies and practices are commonly documented in student and staff manuals, school improvement plans, and meeting minutes that are posted for the public. This article introduces an end-to-end framework for collecting these sorts of policy documents and extracting structured policy data: The researcher gathers all potentially relevant documents from district websites, narrows the text corpus to spans of interest using a text classifier, and then extracts specific policy data using additional natural language processing techniques. Through this framework, a researcher can describe variation in policy implementation at the local level, aggregated across state- or nationwide populations, even as policies evolve over time.
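The narrow-then-extract steps of the framework can be sketched as follows. This is a toy illustration using keyword rules in place of a trained text classifier, and the document spans and policy topic (cell phone rules) are hypothetical; the abstract's framework would use supervised classifiers and more sophisticated extraction in practice.

```python
import re

def is_relevant(span):
    """Stand-in for a trained text classifier: flag spans on the policy topic."""
    return bool(re.search(r"\b(cell ?phones?|mobile devices?)\b", span, re.I))

def extract_policy(span):
    """Map a relevant span to a coarse structured policy label."""
    if re.search(r"\b(prohibited|banned|not permitted)\b", span, re.I):
        return "prohibited"
    if re.search(r"\b(permitted|allowed)\b", span, re.I):
        return "permitted"
    return "unclear"

# Hypothetical spans scraped from a district's student manual
manual_spans = [
    "Students must arrive by 8:00 a.m. each school day.",
    "Cell phones are not permitted during instructional time.",
    "The library is open until 4:00 p.m. on weekdays.",
]

# Step 1: narrow the corpus to spans of interest
relevant = [s for s in manual_spans if is_relevant(s)]
# Step 2: extract structured policy data from the relevant spans
policies = [extract_policy(s) for s in relevant]
```

Run over many districts' documents, the resulting labels could be aggregated to describe local policy variation across a state or the nation.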
Since the start of the War on Poverty in the 1960s, social scientists have developed and refined experimental and quasi-experimental methods for evaluating and understanding the ways in which public policies, programs, and interventions affect people’s lives. The overarching mission of many social scientists is to understand “what works” in education and social policy. These are causal questions about whether an intervention, practice, program, or policy affects some outcome of interest. Although causal questions are not the only relevant questions in program evaluation, they are assumed by many in the fields of public health, economics, social policy, and now education to be the scientific foundation for evidence-based decision making. Fortunately, over the last half-century, two methodological advances have improved the rigor of social science approaches for making causal inferences. The first was acknowledging the primacy of research designs over statistical adjustment procedures. Donald Campbell and colleagues showed how research designs could be used to address many plausible threats to validity. The second advance was the use of potential outcomes to specify exact causal quantities of interest. This allowed researchers to think systematically about research design assumptions and to develop diagnostic measures for assessing when these assumptions are met. This article reviews important statistical methods for estimating the impact of interventions on outcomes in education settings, particularly programs that are implemented in field, rather than laboratory, settings.
Given the widespread use of nonexperimental (NE) methods for assessing program impacts, there is a strong need to know whether NE approaches yield causally valid results in field settings. In within-study comparison (WSC) designs, the researcher compares treatment effects from an NE with those obtained from a randomized experiment that shares the same target population. The goal is to assess whether the stringent assumptions required for NE methods are likely to be met in practice. This essay provides an overview of recent efforts to empirically evaluate NE method performance in field settings. We discuss a brief history of the design, highlighting methodological innovations along the way. We also describe papers that are included in this two-volume special issue on WSC approaches and suggest future areas for consideration in the design, implementation, and analysis of WSCs.
Replication has long been a cornerstone for establishing trustworthy scientific results, but there remains considerable disagreement about what constitutes a replication, how results from these studies should be interpreted, and whether direct replication of results is even possible. This article addresses these concerns by presenting the methodological foundations for a replication science. It provides an introduction to the causal replication framework, which defines “replication” as a research design that tests whether two (or more) studies produce the same causal effect within the limits of sampling error. The framework formalizes the conditions under which replication success can be expected, and allows for the causal interpretation of replication failures. Through two applied examples, the article demonstrates how the causal replication framework may be utilized to plan prospective replication designs, as well as to interpret results from existing replication efforts.
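The framework's definition of replication success, two studies producing the same causal effect within the limits of sampling error, can be operationalized as a simple difference test. The sketch below assumes two independent effect estimates with known standard errors and a two-sided test at alpha = .05; the numbers are invented for illustration and are not drawn from the article's applied examples.

```python
import math

def replication_test(est1, se1, est2, se2):
    """Test whether two independent effect estimates differ beyond sampling error.

    The difference est1 - est2 has standard error sqrt(se1^2 + se2^2);
    |z| at or below 1.96 (two-sided, alpha = .05) is consistent with
    replication success under this definition.
    """
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, abs(z) <= 1.96

# Hypothetical estimates: 0.30 (SE 0.08) in study 1, 0.25 (SE 0.10) in study 2
z, replicated = replication_test(0.30, 0.08, 0.25, 0.10)
```

Here the difference of 0.05 is small relative to its standard error, so the studies are judged to replicate; a rejected test would instead prompt examination of which framework assumptions were violated.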