A Glossary of Terms
Below are some core terms frequently used throughout the book and more broadly in discussions of randomized field experiments.
A.1 Key Concepts
See the modules on causal inference and on estimands and estimators.
- Potential outcome \(Y_i(T)\) The outcome \(Y\) that unit \(i\) would have under treatment condition \(T\). We think of these as fixed quantities for a specific point in time. \(T\) can be 0 for control or 1 for treatment if there is only one type of treatment. See the module on causal inference.
- Treatment effect \(\tau_i\) for unit \(i\) The contrast between potential outcomes under two treatment conditions for unit \(i\). We typically define the treatment effect as the difference in potential outcomes under treatment and control, \(Y_i(1)-Y_i(0)\). See the module on causal inference.
- Fundamental problem of causal inference In the counterfactual framework, we can’t observe both \(Y_i(1)\) and \(Y_i(0)\) for a given unit, so we can’t calculate \(\tau_i\) directly. See the module on causal inference.
- Estimand The thing you want to estimate. An example of an estimand is the average treatment effect. In counterfactual causal inference, this is a function of potential outcomes, not fully observed outcomes. See the module on estimands and estimators.
- Estimator How you make a guess about the value of your estimand from the data you have (i.e., the observed data). An example of an estimator is the difference-in-means. See the module on estimands and estimators.
- Average treatment effect, ATE The average of the treatment effect for all individuals in your subject pool. This is a type of estimand. If we define \(\tau_i\) to be \(Y_i(1)-Y_i(0)\), then the ATE is \(\overline{Y_i(1)-Y_i(0)}\), which is also equivalent to \(\overline{Y_i(1)}-\overline{Y_i(0)}\). Notice that we do not use the \(E[Y_i(1)]\) style of notation here because \(E[]\) means “average over repeated operations,” but \(\overline{Y}\) means “average over a set of observations”. See the module on causal inference and the module on estimands and estimators. The simulation sketch at the end of this list illustrates the ATE as an estimand and the difference-in-means as an estimator.
- Random sampling Selecting subjects from a population with known probabilities strictly between 0 and 1.
- \(k\)-arm experiment An experiment that has \(k\) treatment conditions (including control). See the module on randomization.
- Random assignment Assigning subjects to experimental conditions with known probabilities strictly between 0 and 1. This is equivalent to random sampling without replacement from the potential outcomes. There are several strategies for random assignment: simple, complete, cluster, block, blocked-cluster. See the module on randomization.
- External validity The degree to which findings from your study apply to contexts outside of your sample, such as other locations or other interventions.
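
To make the distinction between estimand and estimator concrete, here is a minimal Python sketch, not taken from the book: it simulates a fixed set of potential outcomes, computes the ATE directly from them, and then estimates it with the difference-in-means after one complete random assignment. The variable names and simulated numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed potential outcomes for N units (in a real experiment we never see both).
N = 100
Y0 = rng.normal(50, 10, size=N)      # Y_i(0): outcome under control
Y1 = Y0 + rng.normal(5, 2, size=N)   # Y_i(1): outcome under treatment

# Estimand: the average treatment effect, a function of potential outcomes.
ATE = np.mean(Y1 - Y0)

# One complete random assignment: m of N units treated.
m = N // 2
T = np.zeros(N, dtype=int)
T[rng.choice(N, size=m, replace=False)] = 1

# Observed outcomes: Y_i(1) for treated units, Y_i(0) for control units.
Y_obs = np.where(T == 1, Y1, Y0)

# Estimator: difference-in-means between treated and control groups.
diff_in_means = Y_obs[T == 1].mean() - Y_obs[T == 0].mean()

print(f"ATE (estimand, known only in simulation): {ATE:.2f}")
print(f"Difference-in-means (estimate from one assignment): {diff_in_means:.2f}")
```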
A.2 Statistical Inference
See the modules on hypothesis testing and statistical power.
- Hypothesis A simple, clear, falsifiable claim about the world. In counterfactual causal inference, this is a statement about a relationship among potential outcomes, like \(H_0: Y_i(T_i=1) = Y_i(T_i=0) + \tau_i\) for the hypothesis that the potential outcome under treatment is the potential outcome under control plus some effect for each unit \(i\). See the module on hypothesis testing.
- Null hypothesis A conjecture about the world that you may reject after seeing the data. See the module on hypothesis testing.
- Sharp null hypothesis of no effect The null hypothesis that there is no treatment effect for any subject. This means \(Y_i(1)=Y_i(0)\) for all \(i\). We might write this as \(H_0: Y_i(T_i=0) = Y_i(T_i=1)\). See the module on hypothesis testing.
- \(p\)-value The probability, under the null hypothesis, of seeing a test statistic as large as (in absolute value) or larger than the test statistic calculated from the observed data. See the module on hypothesis testing.
- One-sided vs. two-sided test When you have a strong expectation that the effect is either positive or negative, you can conduct a one-sided test. When you do not have such a strong expectation, conduct a two-sided test. A one-sided test has more power than a two-sided test for the same experiment. See the module on hypothesis testing.
- Standard deviation Square root of the mean-square deviation from the average of a variable. It is a measure of the dispersion or spread of a variable. \(SD_x=\sqrt{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2}\)
- False Positive Rate/Type I Error of a Test A well-operating hypothesis test rejects a true hypothesis about a causal effect no more than \(100\alpha\)% of the time. The false positive rate is the rate at which a test casts doubt on a true hypothesis, that is, the rate at which the test encourages the analyst to say “statistically significant” when, in fact, there is no causal relationship. See the module on hypothesis testing.
- Sampling distribution The distribution of estimates (e.g., estimates of the ATE) across all possible treatment assignments. In design-based statistical inference for randomized experiments, the distribution of estimates from an estimator is generated by repeating the random assignment. Many call this a “sampling distribution” because textbooks often use the idea of repeated samples from a population, rather than repeated randomizations, to describe this kind of variation. The first sketch at the end of this list simulates such a distribution for the difference-in-means and uses re-randomization to compute a \(p\)-value under the sharp null.
- Standard error The standard deviation of the sampling distribution. A bigger standard error means that our estimates are more susceptible to sampling variation. See the module on estimands and estimators.
- Coverage of a confidence interval A well-operating confidence interval contains the true causal effect \(100(1-\alpha)\)% of the time. A confidence interval has incorrect coverage when it excludes the true parameter more than \(100\alpha\)% of the time. For example, a 95% confidence interval is supposed to exclude the true parameter no more than 5% of the time.
- Statistical power of a test The probability that a test of causal effects will detect a statistically significant treatment effect if the effect exists. See the module on statistical power and the power simulation sketch at the end of this list. Power depends on:
- The number of observations in each arm of the experiment
- Effect size (usually measured in standardized units)
- Noisiness of the outcome variable
- Significance level (\(\alpha\), which is fixed by convention)
- Other factors, including the proportion of your units assigned to each treatment condition.
- Intra-cluster correlation How correlated the potential outcomes of units are within clusters compared to across clusters. Higher intra-cluster correlation hurts power.
- Unbiased An estimator is unbiased if you expect that it will return the right answer. That means that if you were to run the experiment many times, the estimate might be too high or too low sometimes, but it will be right on average. See the module on estimands and estimators.
- Bias Bias is the difference between the average value of the estimator across its sampling distribution and the single, fixed value of the estimand. See the module on estimands and estimators.
- Consistency of an estimator An estimator that produces answers that become ever nearer to the true value of the estimand as the sample size increases is a consistent estimator of that estimand. A consistent estimator may or may not be unbiased. See the module on estimands and estimators.
- Precision/Efficiency of an estimator The variation in or width of the sampling distribution of an estimator. See the module on estimands and estimators.
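
The following sketch, an illustration with assumed simulated data rather than code from the book, shows two of the ideas above: the sampling distribution and standard error of the difference-in-means, generated by repeating the random assignment over the same fixed potential outcomes, and a two-sided \(p\)-value computed by re-randomizing under the sharp null hypothesis of no effect.

```python
import numpy as np

rng = np.random.default_rng(7)
N, m, sims = 100, 50, 5000

# Simulated fixed potential outcomes with a true unit-level effect of 2.
Y0 = rng.normal(0, 5, size=N)
Y1 = Y0 + 2

def complete_assignment(rng, N, m):
    """Complete random assignment: exactly m of N units treated."""
    T = np.zeros(N, dtype=int)
    T[rng.choice(N, size=m, replace=False)] = 1
    return T

def diff_in_means(Y, T):
    return Y[T == 1].mean() - Y[T == 0].mean()

# Sampling distribution of the estimator across repeated randomizations.
estimates = []
for _ in range(sims):
    T = complete_assignment(rng, N, m)
    Y_obs = np.where(T == 1, Y1, Y0)
    estimates.append(diff_in_means(Y_obs, T))
estimates = np.array(estimates)
print(f"Mean of estimates: {estimates.mean():.2f} (true ATE = 2)")
print(f"Standard error (SD of the sampling distribution): {estimates.std():.2f}")

# Randomization inference: p-value under the sharp null Y_i(1) = Y_i(0).
T = complete_assignment(rng, N, m)            # the 'observed' assignment
Y_obs = np.where(T == 1, Y1, Y0)
observed_stat = diff_in_means(Y_obs, T)
null_stats = np.array([
    diff_in_means(Y_obs, complete_assignment(rng, N, m))  # outcomes held fixed under the null
    for _ in range(sims)
])
p_value = np.mean(np.abs(null_stats) >= abs(observed_stat))
print(f"Observed difference-in-means: {observed_stat:.2f}, two-sided p-value: {p_value:.3f}")
```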
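
Statistical power can be approximated by simulation: fix an assumed effect size, noise level, and sample size, repeat the experiment many times, and record how often the test rejects at level \(\alpha\). The sketch below does this with a standard two-sample \(t\)-test from scipy; the parameter values are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def simulated_power(n_per_arm, effect, sd, alpha=0.05, sims=2000):
    """Fraction of simulated experiments in which a two-sided t-test rejects."""
    rejections = 0
    for _ in range(sims):
        control = rng.normal(0, sd, size=n_per_arm)
        treated = rng.normal(effect, sd, size=n_per_arm)
        _, p = stats.ttest_ind(treated, control)
        if p < alpha:
            rejections += 1
    return rejections / sims

# Power rises with sample size for a fixed effect size and noise level.
for n in (25, 50, 100, 200):
    print(f"n per arm = {n:4d}: simulated power = {simulated_power(n, effect=0.5, sd=1.0):.2f}")
```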
A.3 Randomization Strategies
See the module on randomization.
- Simple An independent coin flip for each unit. You are not guaranteed that your experiment will have a specific number of treated units.
- Complete Assign \(m\) out of \(N\) units to treatment. You know how many units will be treated in your experiment, and each unit has an \(m/N\) probability of being treated. The number of ways treatment can be assigned (the number of permutations of treatment assignment) is \(\frac{N!}{m!(N-m)!}\). A short simulation sketch of these assignment strategies follows this list.
- Block First divide the sample into blocks, then do complete randomization separately in each block. A block is a set of units within which you conduct random assignment.
- Cluster Clusters of units are randomly assigned to treatment conditions. A cluster is a set of units that will always be assigned to the same treatment status.
- Blocked-Cluster First form blocks of clusters. Then in each block, randomly assign the clusters to treatment conditions using complete randomization.
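
As a concrete illustration, here is a brief Python sketch, not from the book, of the basic assignment strategies applied to the same set of units. The block labels and cluster sizes are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
blocks = np.repeat(["urban", "rural"], N // 2)   # hypothetical block labels
clusters = np.repeat(np.arange(4), N // 4)       # hypothetical cluster ids, 3 units each

# Simple: an independent coin flip for each unit; the number treated varies.
simple = rng.binomial(1, 0.5, size=N)

# Complete: exactly m of N units treated.
def complete(rng, n, m):
    T = np.zeros(n, dtype=int)
    T[rng.choice(n, size=m, replace=False)] = 1
    return T

complete_assign = complete(rng, N, m=6)

# Block: complete random assignment carried out separately within each block.
block_assign = np.zeros(N, dtype=int)
for b in np.unique(blocks):
    idx = np.where(blocks == b)[0]
    block_assign[idx] = complete(rng, len(idx), m=len(idx) // 2)

# Cluster: whole clusters are assigned together; every unit in a cluster shares a status.
treated_clusters = rng.choice(np.unique(clusters), size=2, replace=False)
cluster_assign = np.isin(clusters, treated_clusters).astype(int)

print("simple:  ", simple)
print("complete:", complete_assign)
print("block:   ", block_assign)
print("cluster: ", cluster_assign)
```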
A.4 Factorial Designs
See the module on randomization.
- Factorial design A design with more than one treatment, with each treatment assigned independently. The simplest factorial design is a 2 by 2.
- Conditional marginal effect The effect of one treatment, conditional on the other being held at a fixed value. For example: \(Y_i(T_1=1|T_2=0)-Y_i(T_1=0|T_2=0)\) is the marginal effect of \(T_1\) conditional on \(T_2=0\).
- Average marginal effect Main effect of each treatment in a factorial design. It is the average of the conditional marginal effects for all the conditions of the other treatment, weighted by the proportion of the sample that was assigned to each condition.
- Interaction effect In a factorial design, we may also estimate
interaction effects.
- No interaction effect: one treatment does not amplify or reduce the effect of the other treatment.
- Multiplicative interaction effect: the effect of one treatment depends on which condition a unit was assigned to for the other treatment. This means one treatment does amplify or reduce the effect of the other: the effect of the two treatments together is not the sum of the effects of each treatment. The sketch after this list illustrates these quantities in a simulated 2 by 2 design.
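
For a 2 by 2 factorial design, the sketch below (a simulated example with assumed effect sizes, not code from the book) estimates the conditional marginal effects of \(T_1\) at each level of \(T_2\), the average marginal effect of \(T_1\), and the interaction as the difference between the two conditional marginal effects.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 400

# Independently assign two binary treatments (a 2 by 2 factorial design).
T1 = rng.binomial(1, 0.5, size=N)
T2 = rng.binomial(1, 0.5, size=N)

# Simulated outcomes with an assumed interaction: T1 adds 1, T2 adds 2,
# and the two treatments together add an extra 1.5.
Y = 1.0 * T1 + 2.0 * T2 + 1.5 * T1 * T2 + rng.normal(0, 1, size=N)

def mean_outcome(t1, t2):
    return Y[(T1 == t1) & (T2 == t2)].mean()

# Conditional marginal effects of T1, holding T2 fixed at 0 or at 1.
cme_t1_given_t2_0 = mean_outcome(1, 0) - mean_outcome(0, 0)
cme_t1_given_t2_1 = mean_outcome(1, 1) - mean_outcome(0, 1)

# Average marginal effect of T1: conditional effects weighted by the share
# of the sample assigned to each condition of T2.
weights = np.array([np.mean(T2 == 0), np.mean(T2 == 1)])
ame_t1 = weights @ np.array([cme_t1_given_t2_0, cme_t1_given_t2_1])

# Interaction: how much the effect of T1 changes with the level of T2.
interaction = cme_t1_given_t2_1 - cme_t1_given_t2_0

print(f"CME of T1 | T2=0: {cme_t1_given_t2_0:.2f} (true 1.0)")
print(f"CME of T1 | T2=1: {cme_t1_given_t2_1:.2f} (true 2.5)")
print(f"AME of T1:        {ame_t1:.2f}")
print(f"Interaction:      {interaction:.2f} (true 1.5)")
```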
A.5 Threats
See the module on threats.
- Hawthorne effect When a subject responds to being observed.
- Spillover When a subject responds to another subject’s treatment status. Example: my health depends on whether my neighbor is vaccinated, as well as whether I am vaccinated.
- Attrition When outcomes for some subjects are not measured. This might be caused, for example, by people migrating, refusing to respond to endline surveys, or dying. This is especially problematic for inference when it is correlated with treatment status.
- Compliance A unit’s treatment status matches its assigned treatment condition. Example of non-compliance: a unit assigned to treatment doesn’t take it. Example of compliance: a unit assigned to control does not take treatment.
- Compliance types There are four types of units in terms of compliance:
- Compliers Units that would take treatment if assigned to treatment and would be untreated if assigned to control.
- Always-takers Units that would take treatment if assigned to treatment and if assigned to control.
- Never-takers Units that would be untreated if assigned to treatment and if assigned to control.
- Defiers Units that would be untreated if assigned to treatment and would take treatment if assigned to control.
- One-sided non-compliance The experiment has only compliers and either always-takers or never-takers. Usually, we think of one-sided non-compliance as having only never-takers and compliers, meaning that the local average treatment effect is the effect of treatment on the treated.
- Two-sided non-compliance The experiment may have all four latent groups.
- Encouragement design An experiment that randomizes \(T\) (treatment
assignment), and we measure \(D\) (whether the unit takes treatment) and \(Y\)
(outcome). We can estimate the ITT and the LATE (local average treatment
effect, aka CACE—complier average causal effect). It requires three
assumptions.
- Monotonicity Assumption of either no defiers or no compliers. Usually we assume no defiers, which means that the effect of assignment on take-up of treatment is either positive or zero, but not negative.
- First stage Assumption that there is an effect of \(T\) on \(D\).
- Exclusion restriction Assumption that \(T\) affects \(Y\) only through \(D\). This is usually the most problematic assumption.
- Intention-to-treat effect (ITT) The effect of \(T\) (treatment assignment) on \(Y\).
- Local average treatment effect (LATE) The effect of \(D\) (taking treatment) on \(Y\) for compliers. Also known as the complier average causal effect (CACE). Under the exclusion restriction and monotonicity, the LATE is equal to the ITT divided by the proportion of your sample who are compliers (see the sketch at the end of this list).
- Downstream experiment An encouragement design study that takes advantage of the randomization of \(T\) by a previous study. The outcome from that previous study is the \(D\) in the downstream experiment.
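
To illustrate the ITT and the LATE in an encouragement design, here is a simulated sketch with assumed numbers (not from the book) under one-sided non-compliance, with compliers and never-takers only. The LATE is estimated as the ITT on \(Y\) divided by the ITT on \(D\), which estimates the share of compliers.

```python
import numpy as np

rng = np.random.default_rng(11)
N = 10_000

# Latent compliance types: 60% compliers, 40% never-takers (assumed shares).
complier = rng.binomial(1, 0.6, size=N)

# Random assignment T; take-up D depends on assignment and compliance type.
T = rng.binomial(1, 0.5, size=N)
D = T * complier              # compliers take treatment only if assigned; never-takers never do

# Outcomes: taking treatment (D) raises Y by 3 for those who take it.
Y = 3.0 * D + rng.normal(0, 5, size=N)

def diff_in_means(v, T):
    return v[T == 1].mean() - v[T == 0].mean()

ITT = diff_in_means(Y, T)     # effect of assignment on the outcome
ITT_D = diff_in_means(D, T)   # effect of assignment on take-up (estimated complier share)
LATE = ITT / ITT_D            # Wald / instrumental-variables estimate of the CACE

print(f"ITT:   {ITT:.2f}")
print(f"ITT_D (estimated complier share): {ITT_D:.2f}")
print(f"LATE:  {LATE:.2f} (true effect among compliers = 3.0)")
```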