Inverse probability weighting

class: center, middle, inverse, title-slide

.title[
# Inverse probability weighting
]
.subtitle[
## with R
]
.author[
### Lin Yu
]
.date[
### 2022-06-07
]

---

background-image: url(data:image/png;base64,#https://images.ctfassets.net/mrop88jh71hl/2KW9pmaCJANvvkLmpe2GiV/143b1b0b5e8ab03de46ae3f1f7bcee0d/kid-computational-thinking-light.jpeg)
background-position: 50% 50%
class: center, bottom, inverse

# Rationale???

---
class: center, middle

# ![](data:image/png;base64,#testimage.PNG)
---
class: center, middle

.pull-left[
# That is...

![](data:image/png;base64,#confounder.PNG)
]

.pull-right[

# Our goal
![](data:image/png;base64,#removearrow.PNG)
]

---
background-image: url(data:image/png;base64,#https://images.saymedia-content.com/.image/ar_16:9%2Cc_fill%2Ccs_srgb%2Cfl_progressive%2Cq_auto:eco%2Cw_1200/MTc0MTgxMzc5NzA4Mjk4NzQ4/thinking-thoughts-without-language.jpg)
background-position: 50% 50%
class: center, bottom, inverse
# How?

---
class: center,inverse, middle

# Strategy 1: Regression
---
class: inverse, middle

# Strategy 2: Propensity sore

### is a **probability**

### is an individual’s probability of receiving treatment(or being exposed) given **measured** characteristics

### `\(P(exposed| covariates)=P(exposed|X_1,X_2,...X_n,X_1*X_2,X_1^2)\)`

---
class: inverse, middle, center
# Application of propensity score

---
class: center,middle

.pull-left[
## Stratification
![](data:image/png;base64,#stratification.PNG)
]

.pull-right[
## Matching
![](data:image/png;base64,#Matching.PNG)
### Unmatched, loss information,  potential bias
]

---
class: center,middle

## Weighting (topic today)
![](data:image/png;base64,#weighting.PNG)

---

.pull-left[
![](data:image/png;base64,#exampledata.PNG)

### Goal:

#### create pseudo-population

#### achieve exchangeability

]
--

.pull-right[
![](data:image/png;base64,#exampledata2.PNG)

![](data:image/png;base64,#exampledata3.PNG)

![](data:image/png;base64,#exampledata4.PNG)
]

---
class: center, middle
![](data:image/png;base64,#exampledata5.PNG)

### Did we achieve exchangeability?

#### i.e., `\(P(A=1|Z=1)=P(A=1|Z=0)=P(A=1)\)`?

`\(P(A=1|Z=1)=(54+6)/(54+6+24+36)=0.5\)`

`\(P(A=1|Z=0)=(28+12)/(28+12+8+32)=0.5\)`

`\(P(A=1)=0.5\)`
---
### Further understanding on IPW

suppose Z, A, and Y are binary variables, the original sample size is shown below:

|     | treatment(A=1) | control(A=0) |
|-----|----------------|--------------|
| Z=1 | a              | b            |
| Z=0 | c              | d            |

#### When Z=1:
In treatment group: `\(PS=P(A=1|Z=1)= a/(a+b)\)`, `\(weight_1 =1/PS= (a+b)/a\)`

In control group: `\(1-PS=P(A=0|Z=1)=b/(a+b)\)`, `\(weight_2=1/(1-PS)=(a+b)/b\)`

Therefore,

weighted sample size with Z=1 in treatment(A=1) group is `\(a*weight_1=a*(a+b)/a=a+b\)`;

weighted sample size with Z=1 in control(A=0) group is `\(b*weight_2=b*(a+b)/b=a+b\)`

---
#### Similarly, when Z = 0:

in treatment group: `\(PS = P(A=1|Z=0)=c/(c+d)\)`, `\(weight_3 = 1/PS = (c+d)/c\)`

in control group: `\(1-PS = P(A=0|Z=0)=d/(c+d)\)`, `\(weight_4 = 1/(1-PS)=(c+d)/d\)`

sample size with Z=0 in treatment(A=1) group = `\(c*weight_3 = c*(c+d)/c=c+d\)`

sample size with Z=0 in the control(A=0) group = `\(d*weight_4 = d*(c+d)/d=c+d\)`

to sum up, the weighted sample size is:

|     | treatment(A=1) | control(A=0) |
|-----|----------------|--------------|
| Z=1 | a+b            | a+b          |
| Z=0 | c+d            | c+d          |

that is, exchangeability achieved!
---
### Notes

### Calculation

`\(IPW=T_i/PS+(1−T_i)/(1-PS)\)`

### Assumptions:

#### 1. Sufficient measured confounders;

#### 2. All participants have a non-zero probability of receiving treatment.

---
class: inverse, center, middle

# An example (with R codes)

---

# The Data

### Objective

Our goal in this example is to estimate the causal effect of **bed net usage** on **malaria risk** using only observational data.

### Variables

The **mosquito_net** data set contains the following variables:

- **Malaria risk (malaria_risk)**: The likelihood that someone in the household will be infected with malaria, range 0-1.

- **Mosquito net (net and net_num)**: A binary variable indicating if the household used mosquito nets.
Eligible for program (eligible): A binary variable indicating if the household is eligible for the free net program.

-  **Income (income)**: The household’s monthly income.

- **Nighttime temperatures (temperature)**: The average temperature at night.

---
# The data (continued)

### Variables

- **Health (health)**: Self-reported healthiness in the household. Measured on a scale of 0–100.

- **Number in household (household)**: Number of people living in the household.

- **Insecticide resistance (resistance)**: Some strains of mosquitoes are more resistant to insecticide and thus pose a higher risk of infecting people with malaria. This is measured on a scale of 0–100, with higher values indicating higher resistance.

### Import dataset

```r
pacman::p_load(readr)
nets <- read_csv("mosquito_nets.csv")
```

---

```r
 head(nets[,c(1:6)])
```

```
## # A tibble: 6 x 6
##      id net   net_num malaria_risk income health
##   <dbl> <lgl>   <dbl>        <dbl>  <dbl>  <dbl>
## 1     1 TRUE        1           33    781     56
## 2     2 FALSE       0           42    974     57
## 3     3 FALSE       0           80    502     15
## 4     4 TRUE        1           34    671     20
## 5     5 FALSE       0           44    728     17
## 6     6 FALSE       0           25   1050     48
```

```r
 head(nets[,c(7:10)])
```

```
## # A tibble: 6 x 4
##   household eligible temperature resistance
##       <dbl> <lgl>          <dbl>      <dbl>
## 1         2 FALSE           21.1         59
## 2         4 FALSE           26.5         73
## 3         3 FALSE           25.6         65
## 4         5 TRUE            21.3         46
## 5         5 FALSE           19.2         54
## 6         1 FALSE           25.3         34
```

---
class: center,middle

# Draw DAG

![](data:image/png;base64,#DAG.PNG)

### health, income, and temperature

---

# Step1: Generate Propensity score

```r
###fit logistic regression to get propensity score
model_net <- glm(net ~ income + temperature + health,
                 data = nets,
                 family = binomial(link = "logit"))
# tidy(model_net, exponentiate = TRUE)
pacman::p_load(broom)##augment_column
net_probabilities <- augment_columns(
  model_net,##model
  nets,  ## original data
  type.predict="response" ## type of the prediction
  ) %>% 
  rename(propensity = .fitted)

net_probabilities %>% 
  select(id, net, income, temperature, health, propensity) %>% 
  head()
```

```
## # A tibble: 6 x 6
##      id net   income temperature health propensity
##   <dbl> <lgl>  <dbl>       <dbl>  <dbl>      <dbl>
## 1     1 TRUE     781        21.1     56      0.367
## 2     2 FALSE    974        26.5     57      0.389
## 3     3 FALSE    502        25.6     15      0.158
## 4     4 TRUE     671        21.3     20      0.263
## 5     5 FALSE    728        19.2     17      0.308
## 6     6 FALSE   1050        25.3     48      0.429
```
---
# Step2: Calculate IPW

```r
net_ipw <- net_probabilities %>%
  mutate(ipw = (net_num / propensity) + ((1 - net_num) / (1 - propensity)))

# Look at the first few rows of a few columns
net_ipw %>% 
  select(id, net, income, temperature, health, propensity, ipw) %>% 
  head()
```

```
## # A tibble: 6 x 7
##      id net   income temperature health propensity   ipw
##   <dbl> <lgl>  <dbl>       <dbl>  <dbl>      <dbl> <dbl>
## 1     1 TRUE     781        21.1     56      0.367  2.72
## 2     2 FALSE    974        26.5     57      0.389  1.64
## 3     3 FALSE    502        25.6     15      0.158  1.19
## 4     4 TRUE     671        21.3     20      0.263  3.81
## 5     5 FALSE    728        19.2     17      0.308  1.44
## 6     6 FALSE   1050        25.3     48      0.429  1.75
```
---
# Step3: Check balance

```r
# pacman::p_load(jstable)
pacman::p_load(tableone)# svyCreateTableOne
pacman::p_load(survey)#svydesign
sdm_dat_weighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ ipw, nest = FALSE, data = net_ipw)

sdm_dat_unweighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ 1, nest = FALSE, data = net_ipw)

tabWeighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_weighted, test = FALSE)

tabunweighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_unweighted , test = FALSE)
```
---
# Step3: Check balance

```r
## Show table with SMD
print(tabunweighted, smd = TRUE)
```

```
##                          Stratified by net
##                           FALSE            TRUE            SMD   
##   n                       1071.00          681.00                
##   income (mean (SD))       872.75 (172.69) 955.19 (201.63)  0.439
##   health (mean (SD))        48.06 (17.22)   54.91 (18.93)   0.379
##   temperature (mean (SD))   24.09 (4.03)    23.38 (4.20)    0.172
```

```r
print(tabWeighted, smd = TRUE)
```

```
##                          Stratified by net
##                           FALSE            TRUE             SMD   
##   n                       1741.09          1784.96                
##   income (mean (SD))       899.89 (175.51)  890.39 (210.85)  0.049
##   health (mean (SD))        50.38 (17.49)    49.71 (19.40)   0.036
##   temperature (mean (SD))   23.83 (4.03)     23.91 (4.24)    0.019
```
---
# Step4: Modelling with IPW

```r
model_unweighted <- lm(malaria_risk ~ net,
                       data = nets)

tidy(model_unweighted)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)     41.9     0.405     104.  0        
## 2 netTRUE        -16.3     0.649     -25.1 2.25e-119
```

```r
model_ipw <- lm(malaria_risk ~ net, 
                data = net_ipw,
                weights = ipw)
tidy(model_ipw)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     39.7     0.468      84.7 0       
## 2 netTRUE        -10.1     0.658     -15.4 3.21e-50
```
---
class: inverse, center, middle

# One last thing...

---
class: inverse, center, middle

# Based on the casual quantities of interest, different versions of weights could be calculated,

# and different pseudo-populations are generated.

# However, the ultimate goal is always the same.(i.e., achieve exchangeability)
---
class: inverse, center, middle
# Casual quantities of interest

---
### 1. Average Treatment Effect
**Calculation** : `\(IPW=T_i/PS+(1−T_i)/(1-PS)\)`

**Target population**: the whole population, both treated and controlled.

![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-11-1.png)
---
### 2. Average Treatment Effect Among the Treated

**Calculation** : `\(IPW=(PS*T_i)/PS+[PS*(1−T_i)]/(1-PS)\)`

**Target population**: treated population

![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-12-1.png)

---
### 3. Average Treatment Effect Among the Controls

**Calculation** : `\(IPW=[(1-PS)*T_i]/PS+[(1-PS)*(1−T_i)]/(1-PS)\)`

**Target population**: treated population

![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-13-1.png)
---

### 4. Average Treatment Effect Among the Evenly Matchable

**Calculation** : `\(IPW=[min{PS,1-PS}]/[T_i*PS+(1−T_i)*(1-PS)]\)`

**Target population**: treated population
![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-14-1.png)
---

#### PSM result:
![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-16-1.png)

---
### 5. Average Treatment Effect Among the Overlap Population

**Calculation** : `\(IPW=(1-PS)*T_i+PS*(1−T_i)\)`

**Target population**: treated population

![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-17-1.png)

---
class: inverse, center,middle
![](data:image/png;base64,#https://media0.giphy.com/media/5GoVLqeAOo6PK/200.gif)

## I am so happy:)

---
# References

#### [1] https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/#how-do-we-incorporate-a-propensity-score-in-a-weight

#### [2] https://evalf20.classes.andrewheiss.com/example/matching-ipw/

#### [3] https://www.youtube.com/watch?v=CKm1rZlAwuA&list=WL

#### [4] Nicholas C Chesnaye, Vianda S Stel, Giovanni Tripepi, Friedo W Dekker, Edouard L Fu, Carmine Zoccali, Kitty J Jager, An introduction to inverse probability of treatment weighting in observational research, Clinical Kidney Journal, Volume 15, Issue 1, January 2022, Pages 14–20, https://doi.org/10.1093/ckj/sfab158

#### [5] https://www.youtube.com/watch?v=PfLYPt9ur4g

#### [6] https://cran.r-project.org/web/packages/tableone/vignettes/smd.html