---
title: "Estimation and Hypothesis Testing 2 | *Les estimateurs et les tests d'hypothèses 2*"
author: "Adikath + Macartan; Vin + Yannick"
date: today
bibliography: assets/learningdays-book.bib
format:
  revealjs:
    embed-resources: true
---

## A Quick Reminder | *Un petit rappel*


```{r, include = FALSE}
library(DeclareDesign)
library(estimatr)
library(knitr)
set.seed(1)
df <- fabricate(N = 100, 
               Coffee = rep(0:1, N/2), 
               Sport = rnorm(N), 
               Energy = .2 + .4*Coffee  + 
                 .3* Sport   + 
                 .2* Sport*Coffee + rnorm(N))
```

::: {.columns}
::: {.column .lang-en width="50%"}

- Remember: Analyze as you randomize
- We prefer estimators that are unbiased and have greater precision
- Hypothesis testing can be simple with linear regression

:::
::: {.column .lang-fr width="50%"}

- N'oubliez pas : analysez comme vous randomisez
- Nous préférons les estimateurs non biaisés et plus précis
- Les tests d'hypothèse peuvent être simples avec la régression linéaire

:::
:::

# Multiple Arm Experiments | *Les expériences avec plusieurs bras*

## Estimator 1: Difference-in-Means | *Estimateur 1 : La différence en moyennes*

| | | |
|:--:|:--:|:--:|
| $Z_A$ only | $Z_B$ only | Neither (control) |

::: {.columns}
::: {.column .lang-en width="50%"}

- We can always take the difference-in-means between any two groups.

:::
::: {.column .lang-fr width="50%"}

- Nous pouvons toujours calculer la différence de moyennes entre deux groupes quelconques.

:::
:::

## Estimator 2: Linear regression | *Estimateur 2 : La régression linéaire*

$$Y_i = {\alpha} + {\beta_A} Z_{Ai} + {\beta_B} Z_{Bi} + e_i$$
$$Y_i = {\alpha} + {\beta_A} Z_{Ai} + {\beta_B} Z_{Bi} + {\gamma} X_i + e_i$$

::: {.columns}
::: {.column .lang-en width="50%"}

- Regression with an indicator variable for each of the two treatment arms.
    - $Z_{Ai}=1$ if unit $i$ has treatment $Z_A$, 0 otherwise
    - $Z_{Bi}=1$ if unit $i$ has treatment $Z_B$, 0 otherwise
- We can also do covariate adjustment at the same time.

:::
::: {.column .lang-fr width="50%"}

- Régression avec une variable indicatrice pour chacun des deux bras de traitement.
    - $Z_{Ai}=1$ si l'unité $i$ a le traitement $Z_A$, sinon 0
    - $Z_{Bi}=1$ si l'unité $i$ a le traitement $Z_B$, sinon 0
- Nous pouvons également effectuer un ajustement par covariables en même temps.

:::
:::

## Estimator 2: Linear regression | *Estimateur 2 : La régression linéaire*

$$Y_i = {\alpha} + {\beta_A} Z_{Ai} + {\beta_B} Z_{Bi} + e_i$$

::: {.columns}
::: {.column .lang-en width="50%"}

- $\hat{\beta_A}$ is the $\widehat{ATE}$ of $Z_A$ (compared with control).
- $\hat{\beta_B}$ is the $\widehat{ATE}$ of $Z_B$ (compared with control).
- How do we estimate the effect of $Z_B$ compared to $Z_A$?

:::
::: {.column .lang-fr width="50%"}

- $\hat{\beta_A}$ est $\widehat{ATE}$ de $Z_A$ (par rapport au contrôle).
- $\hat{\beta_B}$ est $\widehat{ATE}$ de $Z_B$ (par rapport au contrôle).
- Comment estimer l'effet de $Z_B$ par rapport à $Z_A$ ?

:::
:::

## Estimator 2: Linear regression | *Estimateur 2 : La régression linéaire*

::: {.columns}
::: {.column width="55%"}

![](images/regression2.png){width="220px" fig-align="center"}

:::
::: {.column width="45%"}

$Z_A$ only

$Z_B$ only

Neither (control)

$$Y_i = {\alpha} + {\beta_A} Z_{Ai} + {\beta_B} Z_{Bi} + e_i$$

:::
:::

## Estimators for Multi-arm Designs | *Les estimateurs pour les expériences avec plusieurs bras*
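
::: {.columns}
::: {.column .lang-en width="50%"}

A minimal sketch in base R of the comparison between two treatment arms, on simulated data. The variable names (`arm`, `Z_A`, `Z_B`, `Y`) and effect sizes are illustrative assumptions, and `lm()` stands in for `lm_robust()` so the example has no dependencies. The effect of $Z_B$ relative to $Z_A$ is $\hat{\beta}_B - \hat{\beta}_A$.

:::
::: {.column .lang-fr width="50%"}

Une esquisse minimale en R de base de la comparaison entre deux bras de traitement, sur des données simulées. Les noms de variables (`arm`, `Z_A`, `Z_B`, `Y`) et les tailles d'effet sont des hypothèses illustratives, et `lm()` remplace `lm_robust()` pour éviter toute dépendance. L'effet de $Z_B$ par rapport à $Z_A$ est $\hat{\beta}_B - \hat{\beta}_A$.

:::
:::

```r
# Hypothetical three-arm experiment: all names and effect sizes are illustrative.
set.seed(42)
N   <- 300
arm <- sample(rep(c("control", "A", "B"), each = N / 3))  # complete randomization
Z_A <- as.numeric(arm == "A")
Z_B <- as.numeric(arm == "B")
Y   <- 1 + 0.5 * Z_A + 0.8 * Z_B + rnorm(N)

# Estimator 2: one indicator per treatment arm; control is the reference group.
fit <- lm(Y ~ Z_A + Z_B)
b   <- coef(fit)
V   <- vcov(fit)  # classical OLS variance; lm_robust() would report robust SEs

# Effect of Z_B compared with Z_A: difference between the two coefficients.
b_minus_a    <- unname(b["Z_B"] - b["Z_A"])
se_b_minus_a <- sqrt(V["Z_B", "Z_B"] + V["Z_A", "Z_A"] - 2 * V["Z_A", "Z_B"])

# Estimator 1 gives the same point estimate: difference in means, arm B vs arm A.
dim_b_vs_a <- mean(Y[arm == "B"]) - mean(Y[arm == "A"])

c(estimate = b_minus_a, se = se_b_minus_a, diff_in_means = dim_b_vs_a)
```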


# Block Randomization | *Randomisation par bloc (ou stratifiée)*

## Block Randomization | *Randomisation par bloc*

::: {.columns}
::: {.column .lang-en width="50%"}

- Block randomization is like doing a separate experiment in each block.
- We present 2 estimators for block randomization. Others are also available.

:::
::: {.column .lang-fr width="50%"}

- La randomisation par bloc est comme faire une expérience distincte dans chaque bloc.
- Nous présentons 2 estimateurs pour la randomisation par bloc. D'autres sont également disponibles.

:::
:::

## Estimator 1: Blocked Difference-in-Means | *Estimateur 1 : La différence des moyennes par bloc*

::: {.columns}
::: {.column .lang-en width="50%"}

- Calculate the $\widehat{ATE_j}$ for each block using difference in means. $j$ indicates which block.
- The $\widehat{ATE}$ is the average of the block-level $\widehat{ATE_j}$, each weighted by its block's share of units, $N_j / N$.
- You can use this estimator even when the probability of treatment assignment differs across blocks.

:::
::: {.column .lang-fr width="50%"}

- Calculez $\widehat{ATE_j}$ pour chaque bloc en utilisant la différence des moyennes. $j$ indique le bloc.
- $\widehat{ATE}$ est la moyenne des $\widehat{ATE_j}$, pondérée par la taille relative de chaque bloc, $N_j / N$.
- Nous pouvons utiliser cet estimateur même si la probabilité d'assignation du traitement diffère selon les blocs.

:::
:::

## Estimator 1: Blocked Difference-in-Means | *Estimateur 1 : La différence des moyennes par bloc* {.compact-table}

::: {.columns}
::: {.column .lang-en width="50%"}

| Unit | Block | $Z_i$ | $Y_i$ |
|:----:|:-----:|:-----:|:-----:|
| a | Q | 0 | 4 |
| b | Q | 1 | 3 |
| c | Q | 0 | 2 |
| d | R | 1 | 3 |
| e | R | 0 | 0 |
| f | R | 0 | 2 |
| g | S | 1 | 4 |
| h | S | 0 | 0 |
| i | S | 0 | 2 |
| j | S | 1 | 4 |

:::
::: {.column .lang-fr width="50%"}

$$\widehat{ATE}_Q = \frac{3}{1}-\frac{4+2}{2}= 0$$
$$\widehat{ATE}_R = \frac{3}{1}-\frac{0+2}{2}= 2$$
$$\widehat{ATE}_S = \frac{4+4}{2}-\frac{0+2}{2}= 3$$

$$\widehat{ATE} = \frac{N_Q}{N}\widehat{ATE}_Q + \frac{N_R}{N}\widehat{ATE}_R + \frac{N_S}{N}\widehat{ATE}_S$$
$$= \frac{3}{10}\times 0 + \frac{3}{10}\times 2 + \frac{4}{10}\times 3 = \frac{9}{5}$$

:::
:::
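
## Estimator 1: Blocked Difference-in-Means in R | *Estimateur 1 : La différence des moyennes par bloc en R*

::: {.columns}
::: {.column .lang-en width="50%"}

The worked example above can be reproduced step by step in base R. This is a sketch of the arithmetic, not the `estimatr` implementation; the object names (`dat`, `ate_j`, `w_j`, `ate_hat`) are illustrative.

:::
::: {.column .lang-fr width="50%"}

L'exemple ci-dessus peut être reproduit pas à pas en R de base. Il s'agit d'une esquisse du calcul, et non de l'implémentation de `estimatr` ; les noms d'objets (`dat`, `ate_j`, `w_j`, `ate_hat`) sont illustratifs.

:::
:::

```r
# The ten units from the table above.
dat <- data.frame(
  unit  = letters[1:10],
  block = c("Q", "Q", "Q", "R", "R", "R", "S", "S", "S", "S"),
  Z     = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 1),
  Y     = c(4, 3, 2, 3, 0, 2, 4, 0, 2, 4)
)

# Step 1: difference in means within each block.
ate_j <- sapply(split(dat, dat$block), function(b) {
  mean(b$Y[b$Z == 1]) - mean(b$Y[b$Z == 0])
})

# Step 2: block weights N_j / N, ordered like ate_j.
w_j <- (table(dat$block) / nrow(dat))[names(ate_j)]

# Step 3: weighted average of the block-level estimates.
ate_hat <- sum(as.numeric(w_j) * ate_j)

ate_j    # Q = 0, R = 2, S = 3
ate_hat  # 1.8, i.e. 9/5
```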

## Estimator 2: Linear Regression with Block Fixed Effects | *Estimateur 2 : La régression linéaire avec effets fixes par bloc*

$$Y_{ij} = \alpha_0 + \beta_1 Z_{ij} + \gamma_A \text{BlockA}_{ij} + \gamma_B \text{BlockB}_{ij} + \dots + \epsilon_{ij}$$

::: {.columns}
::: {.column .lang-en width="50%"}

- You can use linear regression with block fixed effects, applying weights to each observation.
- The weight is the inverse of the proportion of subjects in the same block who were assigned to the same condition.

:::
::: {.column .lang-fr width="50%"}

- Nous pouvons utiliser la régression linéaire avec des effets fixes par bloc, en appliquant des pondérations à chaque observation.
- Le poids est l'inverse de la proportion de sujets d'un même bloc assignés à la même condition.

:::
:::

$$w_{ij} = \frac{Z_{ij}}{p_{ij}} + \frac{1-Z_{ij}}{1-p_{ij}} \text{, where } p_{ij}\equiv\frac{m_j}{N_j} \text{ and } m_j \text{ = number of treated units in block } j$$

## Block randomization in R | *Randomisation par bloc en R*

::: {.columns}
::: {.column .lang-en width="50%"}

```r
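library(estimatr)

# Estimator 1: blocked difference-in-means
difference_in_means(Y ~ treatment, blocks = block_variable)

# Estimator 2: block fixed effects with inverse-probability weights
lm_robust(Y ~ treatment + as.factor(block_variable),
          weights = weight_variable)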
```

:::
::: {.column .lang-fr width="50%"}

```r
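library(estimatr)

# Estimateur 1 : différence des moyennes par bloc
difference_in_means(Y ~ treatment, blocks = block_variable)

# Estimateur 2 : effets fixes par bloc avec pondération par l'inverse de la probabilité
lm_robust(Y ~ treatment + as.factor(block_variable),
          weights = weight_variable)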
```

:::
:::
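
## Estimator 2: Weights and Block Fixed Effects in R | *Estimateur 2 : Pondérations et effets fixes par bloc en R*

::: {.columns}
::: {.column .lang-en width="50%"}

A sketch of the weighted regression, using the ten-unit example from the blocked difference-in-means slide. Base `lm()` stands in for `lm_robust()` here (same point estimate, different standard errors), and the names `dat`, `p`, and `w` are illustrative. With these weights the coefficient on `Z` matches the blocked difference-in-means, $9/5$.

:::
::: {.column .lang-fr width="50%"}

Une esquisse de la régression pondérée, avec l'exemple à dix unités de la diapositive sur la différence des moyennes par bloc. `lm()` remplace ici `lm_robust()` (même estimation ponctuelle, erreurs types différentes), et les noms `dat`, `p` et `w` sont illustratifs. Avec ces pondérations, le coefficient de `Z` coïncide avec la différence des moyennes par bloc, $9/5$.

:::
:::

```r
# Same ten units as in the blocked difference-in-means example.
dat <- data.frame(
  block = c("Q", "Q", "Q", "R", "R", "R", "S", "S", "S", "S"),
  Z     = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 1),
  Y     = c(4, 3, 2, 3, 0, 2, 4, 0, 2, 4)
)

# p = m_j / N_j: share of each unit's block that is assigned to treatment.
p <- ave(dat$Z, dat$block)  # ave() applies mean() within blocks by default

# Weight = inverse of the share of the block assigned to the unit's own condition.
dat$w <- dat$Z / p + (1 - dat$Z) / (1 - p)

# Weighted least squares with block fixed effects.
fit      <- lm(Y ~ Z + factor(block), data = dat, weights = w)
beta_hat <- unname(coef(fit)["Z"])

beta_hat  # 1.8, the same as the blocked difference-in-means
```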

# Cluster Randomization | *Randomisation par grappe*

## Estimator: Regression with cluster-robust standard errors | *Estimateur : La régression avec des erreurs types robustes au niveau du cluster*

$$Y_{ic} = {\alpha_0} + {\beta_1} Z_{c} + e_{ic}$$
$$Y_{ic} = {\alpha_0} + {\beta_1} Z_{c} + {\gamma} X_{ic} + e_{ic}$$

::: {.columns}
::: {.column .lang-en width="50%"}

- Because treatment is assigned at the cluster level, our analysis must use *cluster-robust standard errors*.
- $\hat{\beta_1}$ is the $\widehat{ATE}$ of the treatment on individual units.
- We can also do covariate adjustment at the same time.

:::
::: {.column .lang-fr width="50%"}

- Comme le traitement est attribué au niveau du cluster, notre analyse doit utiliser des *erreurs types robustes au niveau du cluster*.
- $\hat{\beta_1}$ est $\widehat{ATE}$ du traitement sur les unités individuelles.
- Nous pouvons également effectuer un ajustement par covariables en même temps.

:::
:::

## Cluster randomization | *Randomisation par grappe*

![](images/reg_cluster.png){width="400px" fig-align="center"}

::: {.columns}
::: {.column .lang-en width="50%"}

```r
library(estimatr)
lm_robust(Y ~ treatment, clusters = cluster_variable)
lm_robust(Y ~ treatment + covariate, clusters = cluster_variable)
```

:::
::: {.column .lang-fr width="50%"}

```r
library(estimatr)
lm_robust(Y ~ treatment, clusters = cluster_variable)
lm_robust(Y ~ treatment + covariate, clusters = cluster_variable)
```

:::
:::
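
## Cluster-Robust Standard Errors by Hand | *Les erreurs types robustes au niveau du cluster, à la main*

::: {.columns}
::: {.column .lang-en width="50%"}

A base-R sketch of what clustering does to the standard error, on simulated data with a shock shared within clusters. It computes the simplest sandwich variant (CR0, no small-sample correction). `lm_robust()` uses CR2 by default when `clusters` is supplied, so its numbers will differ somewhat. All names and parameter values are illustrative.

:::
::: {.column .lang-fr width="50%"}

Une esquisse en R de base de l'effet du regroupement sur l'erreur type, avec des données simulées comportant un choc commun à chaque cluster. Elle calcule la variante sandwich la plus simple (CR0, sans correction pour petits échantillons). `lm_robust()` utilise CR2 par défaut lorsque `clusters` est fourni, donc ses résultats différeront quelque peu. Tous les noms et valeurs sont illustratifs.

:::
:::

```r
# Hypothetical clustered experiment: 40 clusters of 10 units each.
set.seed(7)
n_clusters   <- 40
cluster_size <- 10
cluster <- rep(seq_len(n_clusters), each = cluster_size)
Z_c <- sample(rep(0:1, n_clusters / 2))  # treatment assigned at the cluster level
Z   <- Z_c[cluster]
u_c <- rnorm(n_clusters)                 # shock shared by all units in a cluster
Y   <- 0.5 * Z + u_c[cluster] + rnorm(n_clusters * cluster_size)

fit   <- lm(Y ~ Z)
X     <- model.matrix(fit)
e     <- resid(fit)
bread <- solve(crossprod(X))

# Sandwich standard errors where residuals may be correlated within `groups`.
sandwich_se <- function(groups) {
  scores <- rowsum(X * e, groups)  # one row per group: X_g' e_g
  meat   <- crossprod(scores)
  setNames(sqrt(diag(bread %*% meat %*% bread)), colnames(X))
}

se_cluster <- sandwich_se(cluster)        # CR0: clusters as assigned
se_unit    <- sandwich_se(seq_along(Y))   # every unit its own cluster (HC0)

rbind(clustered = se_cluster, unclustered = se_unit)
```

::: {.columns}
::: {.column .lang-en width="50%"}

Ignoring the clustering treats the 400 units as independent and understates the uncertainty about $\hat{\beta}_1$.

:::
::: {.column .lang-fr width="50%"}

Ignorer le regroupement revient à traiter les 400 unités comme indépendantes et sous-estime l'incertitude sur $\hat{\beta}_1$.

:::
:::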

# Factorial Design | *La conception factorielle*

## Estimator 1: Difference-in-Means | *Estimateur 1 : La différence en moyennes* {.compact-table}

| | $Z_2 = 0$ | $Z_2 = 1$ | Effect of $Z_2$ |
|:--|:--|:--|:--|
| $Z_1 = 0$ | Neither | $Z_2$ only | $\beta_2$ |
| $Z_1 = 1$ | $Z_1$ only | Both $Z_1$ and $Z_2$ | $\beta_2 + \beta_3$ |
| Effect of $Z_1$ | $\beta_1$ | $\beta_1 + \beta_3$ | $\beta_3$ (diff-in-diff) |

::: {.columns}
::: {.column .lang-en width="50%"}

- We use a factorial design when we are interested in interaction effects.
- If we have a 2×2 factorial design, we have four groups.
- We can always take the difference-in-means between any two groups.

:::
::: {.column .lang-fr width="50%"}

- Nous utilisons une conception factorielle quand nous nous intéressons aux effets d'interaction.
- Si nous avons une conception factorielle 2×2, nous avons 4 groupes.
- Nous pouvons toujours calculer la différence de moyennes entre deux groupes quelconques.

:::
:::

## Estimator 2: Linear Regression with an Interaction Term | *Estimateur 2 : La régression linéaire avec un terme d'interaction*

$$Y_i = {\alpha_0} + {\beta_1} Z_{1i} + {\beta_2} Z_{2i} + {\beta_3} Z_{1i} \times Z_{2i} + e_i$$
$$Y_i = {\alpha_0} + {\beta_1} Z_{1i} + {\beta_2} Z_{2i} + {\beta_3} Z_{1i} \times Z_{2i} + {\gamma} X_i + e_i$$

::: {.columns}
::: {.column .lang-en width="50%"}

- Indicator variables for $Z_1$ and $Z_2$.
- We can also do covariate adjustment at the same time.

:::
::: {.column .lang-fr width="50%"}

- Variables indicatrices pour $Z_1$ et $Z_2$.
- Nous pouvons également effectuer un ajustement par covariables en même temps.

:::
:::

## Estimator 2: Linear Regression with an Interaction Term | *Estimateur 2 : La régression linéaire avec un terme d'interaction* {.compact-table}

| | $Z_2 = 1$ | $Z_2 = 0$ |
|:--|:--|:--|
| $Z_1 = 1$ | Both $Z_1$ and $Z_2$ | **$Z_1$ only** |
| $Z_1 = 0$ | $Z_2$ only | **Neither** |

$$Y_i = {\alpha_0} + {\beta_1} Z_{1i} + {\beta_2} Z_{2i} + {\beta_3} Z_{1i} \times Z_{2i} + e_i$$

::: {.columns}
::: {.column .lang-en width="50%"}

- Estimand: $E[Y(Z_1=1)| Z_2=0] - E[Y(Z_1=0) | Z_2=0]$
- $\hat{\beta_1}$ is the $\widehat{ATE}$ of $Z_{1}$ conditional on $Z_{2}=0$.

:::
::: {.column .lang-fr width="50%"}

- Paramètre : $E[Y(Z_1=1)| Z_2=0] - E[Y(Z_1=0) | Z_2=0]$
- $\hat{\beta_1}$ est $\widehat{ATE}$ de $Z_{1}$ conditionnel à $Z_{2}=0$.

:::
:::

## Estimator 2: Linear Regression with an Interaction Term | *Estimateur 2 : La régression linéaire avec un terme d'interaction* {.compact-table}

| | $Z_2 = 1$ | $Z_2 = 0$ |
|:--|:--|:--|
| $Z_1 = 1$ | **Both $Z_1$ and $Z_2$** | $Z_1$ only |
| $Z_1 = 0$ | **$Z_2$ only** | Neither |

$$Y_i = {\alpha_0} + {\beta_1} Z_{1i} + {\beta_2} Z_{2i} + {\beta_3} Z_{1i} \times Z_{2i} + e_i$$

::: {.columns}
::: {.column .lang-en width="50%"}

- Estimand: $E[Y(Z_1=1) | Z_2=1] - E[Y(Z_1=0) | Z_2=1]$
- $\hat{\beta_1} + \hat{\beta_3}$ is the $\widehat{ATE}$ of $Z_{1}$ conditional on $Z_{2}=1$.
- $\beta_3$ is called the interaction effect.

:::
::: {.column .lang-fr width="50%"}

- Paramètre : $E[Y(Z_1=1) | Z_2=1] - E[Y(Z_1=0) | Z_2=1]$
- $\hat{\beta_1} + \hat{\beta_3}$ est $\widehat{ATE}$ de $Z_{1}$ conditionnel à $Z_{2}=1$.
- $\beta_3$ est appelé l'effet d'interaction.

:::
:::
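
## Interaction Term and Group Means in R | *Terme d'interaction et moyennes de groupe en R*

::: {.columns}
::: {.column .lang-en width="50%"}

With two binary treatments and no covariates, the interaction regression re-expresses the four group means. A sketch on simulated data, where the names and effect sizes are illustrative and base `lm()` gives the same coefficients as `lm_robust()`:

:::
::: {.column .lang-fr width="50%"}

Avec deux traitements binaires et sans covariables, la régression avec interaction ré-exprime les quatre moyennes de groupe. Une esquisse sur des données simulées, où les noms et les tailles d'effet sont illustratifs et où `lm()` donne les mêmes coefficients que `lm_robust()` :

:::
:::

```r
# Hypothetical 2x2 factorial experiment.
set.seed(3)
N  <- 400
Z1 <- sample(rep(0:1, N / 2))
Z2 <- sample(rep(0:1, N / 2))
Y  <- 1 + 0.4 * Z1 + 0.3 * Z2 + 0.2 * Z1 * Z2 + rnorm(N)

# Estimator 1: the four group means (rows index Z1, columns index Z2).
mu <- tapply(Y, list(Z1 = Z1, Z2 = Z2), mean)

effect_Z1_given_Z2_0 <- mu["1", "0"] - mu["0", "0"]
effect_Z1_given_Z2_1 <- mu["1", "1"] - mu["0", "1"]
diff_in_diff         <- effect_Z1_given_Z2_1 - effect_Z1_given_Z2_0

# Estimator 2: regression with an interaction term.
b <- coef(lm(Y ~ Z1 * Z2))

# The two columns agree row by row.
cbind(
  group_means = c(effect_Z1_given_Z2_0, effect_Z1_given_Z2_1, diff_in_diff),
  regression  = c(b["Z1"], b["Z1"] + b["Z1:Z2"], b["Z1:Z2"])
)
```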


## Estimator 2: Linear Regression with an Interaction Term | *Estimateur 2 : La régression linéaire avec un terme d'interaction* {.smaller}

```{r, echo = TRUE, eval = FALSE}
lm_robust(Energy ~ Coffee * Sport, data = df)
```

```{r, echo = FALSE, results = 'asis'}
fit_coffee_sport <- lm_robust(Energy ~ Coffee * Sport, data = df)
b0 <- unname(coef(fit_coffee_sport)["(Intercept)"])
b1 <- unname(coef(fit_coffee_sport)["Coffee"])
b2 <- unname(coef(fit_coffee_sport)["Sport"])
b3 <- unname(coef(fit_coffee_sport)["Coffee:Sport"])
fmt <- function(x) format(round(x, 3), nsmall = 3)
coef_term <- function(b, name) {
  op <- if (b >= 0) " + " else " - "
  paste0(op, fmt(abs(b)), " \\times ", name)
}
fit_coffee_sport |> texreg::htmlreg(include.ci = FALSE)
```


```{r, echo = FALSE, results = 'asis'}
cat(
  "$$\\hat{Y} = ", fmt(b0),
  coef_term(b1, "Coffee"),
  coef_term(b2, "Sport"),
  coef_term(b3, "Coffee \\times Sport"),
  "$$\n"
)
```

## Estimator 2: Linear Regression with an Interaction Term | *Estimateur 2 : La régression linéaire avec un terme d'interaction* {.compact-table}

::: {.columns}
::: {.column .lang-en width="50%"}

| | |
|:--|:--|
| $\hat{Y} \mid$ Coffee = 0, Sport = 0 | $\hat{\alpha}_0 =$ `r fmt(b0)` |
| $\hat{Y} \mid$ Coffee = 0, Sport = 1 | $\hat{\alpha}_0 + \hat{\beta}_2 =$ `r fmt(b0 + b2)` |
| $\hat{Y} \mid$ Coffee = 1, Sport = 0 | $\hat{\alpha}_0 + \hat{\beta}_1 =$ `r fmt(b0 + b1)` |
| $\hat{Y} \mid$ Coffee = 1, Sport = 1 | $\hat{\alpha}_0 + \hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3 =$ `r fmt(b0 + b1 + b2 + b3)` |
| $\widehat{ATE}$ of Coffee \| Sport = 0 | $\hat{\beta}_1 =$ `r fmt(b1)` |
| $\widehat{ATE}$ of Coffee \| Sport = 1 | $\hat{\beta}_1 + \hat{\beta}_3 =$ `r fmt(b1 + b3)` |
| Interaction effect | $\hat{\beta}_3 =$ `r fmt(b3)` |

:::
::: {.column .lang-fr width="50%"}

| | |
|:--|:--|
| $\hat{Y} \mid$ Coffee = 0, Sport = 0 | $\hat{\alpha}_0 =$ `r fmt(b0)` |
| $\hat{Y} \mid$ Coffee = 0, Sport = 1 | $\hat{\alpha}_0 + \hat{\beta}_2 =$ `r fmt(b0 + b2)` |
| $\hat{Y} \mid$ Coffee = 1, Sport = 0 | $\hat{\alpha}_0 + \hat{\beta}_1 =$ `r fmt(b0 + b1)` |
| $\hat{Y} \mid$ Coffee = 1, Sport = 1 | $\hat{\alpha}_0 + \hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3 =$ `r fmt(b0 + b1 + b2 + b3)` |
| $\widehat{ATE}$ de Coffee \| Sport = 0 | $\hat{\beta}_1 =$ `r fmt(b1)` |
| $\widehat{ATE}$ de Coffee \| Sport = 1 | $\hat{\beta}_1 + \hat{\beta}_3 =$ `r fmt(b1 + b3)` |
| Effet d'interaction | $\hat{\beta}_3 =$ `r fmt(b3)` |

:::
:::


```{r, echo = FALSE, results = 'asis'}
cat(
  "$$\\hat{Y} = ", fmt(b0),
  coef_term(b1, "Coffee"),
  coef_term(b2, "Sport"),
  coef_term(b3, "Coffee \\times Sport"),
  "$$\n"
)
```


# Encouragement design | *Conception incitative*

## Encouragement design | *Conception incitative*

::: {.columns}
::: {.column .lang-en width="50%"}

- Situation: You can't force people to take (receive) your treatment. Treatment assigned is not the same as treatment received.
- We can randomize **encouragement** $Z$ to take the treatment, such as a request to drink coffee or offering a subsidy to participate in a program.
- We measure the encouragement $Z$, treatment take-up $D$, and the outcome $Y$.
- We analyze using instrumental variables techniques (two-stage least squares).

:::
::: {.column .lang-fr width="50%"}

- Situation : vous ne pouvez pas forcer les gens à prendre (recevoir) le traitement. Le traitement assigné n'est pas le même que le traitement reçu.
- Nous pouvons randomiser l'**incitation** $Z$ à suivre le traitement, par exemple en demandant de boire un café ou en offrant une subvention.
- On mesure l'incitation $Z$, le traitement reçu $D$, et le résultat $Y$.
- Nous analysons à l'aide de techniques de variables instrumentales (moindres carrés en deux étapes).

:::
:::


## Encouragement design: Code | *Conception incitative : Code*

```{r, echo = TRUE}
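# Z: randomized encouragement; complier: takes the treatment only if encouraged
df <- fabricate(N = 1000,
                Z = complete_ra(N),
                complier = complete_ra(N),
                D = Z * complier,                    # treatment actually received
                Y = complier + D + rnorm(N) / 100)   # true effect of D on Y is 1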

df |> head() |> kable()

```

## Encouragement design: Code | *Conception incitative : Code* {.smaller}

```{r, results='asis', echo = TRUE}
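list(
  ITT   = lm_robust(Y ~ Z, data = df),      # effect of the encouragement on Y
  WRONG = lm_robust(Y ~ D, data = df),      # naive: take-up D is not randomized
  CACE  = iv_robust(Y ~ D | Z, data = df)   # 2SLS: Z instruments for D
) |>
  texreg::htmlreg(include.ci = FALSE)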

```
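
## Encouragement design: CACE by hand | *Conception incitative : le CACE à la main*

::: {.columns}
::: {.column .lang-en width="50%"}

With one binary instrument and no covariates, the two-stage least squares estimate reduces to a ratio: the ITT effect on $Y$ divided by the ITT effect on take-up $D$. A base-R sketch on data simulated like the slides above, with `sample()` standing in for `complete_ra()`. Object names are illustrative.

:::
::: {.column .lang-fr width="50%"}

Avec un seul instrument binaire et sans covariables, l'estimation par moindres carrés en deux étapes se réduit à un rapport : l'effet ITT sur $Y$ divisé par l'effet ITT sur le traitement reçu $D$. Une esquisse en R de base sur des données simulées comme dans les diapositives précédentes, où `sample()` remplace `complete_ra()`. Les noms d'objets sont illustratifs.

:::
:::

```r
# Simulated encouragement design; the true effect of D on Y is 1.
set.seed(11)
N        <- 1000
Z        <- sample(rep(0:1, N / 2))  # randomized encouragement
complier <- sample(rep(0:1, N / 2))  # takes the treatment only if encouraged
D        <- Z * complier             # treatment actually received
Y        <- complier + D + rnorm(N) / 100

# Intention-to-treat effects of Z on the outcome and on take-up.
itt_y <- mean(Y[Z == 1]) - mean(Y[Z == 0])
itt_d <- mean(D[Z == 1]) - mean(D[Z == 0])

# Ratio (Wald) estimator of the complier average causal effect.
cace_wald <- itt_y / itt_d

# Same quantity written as the instrumental-variables ratio.
cace_iv <- cov(Y, Z) / cov(D, Z)

# Naive comparison by treatment received: confounded by complier status.
naive <- mean(Y[D == 1]) - mean(Y[D == 0])

c(ITT = itt_y, CACE = cace_wald, naive = naive)
```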