+ - 0:00:00
Notes for current slide
Notes for next slide

Inverse probability weighting

with R

Lin Yu

2022-06-07

1 / 36

Rationale???

2 / 36

3 / 36

That is...

4 / 36

That is...

Our goal

4 / 36

How?

5 / 36

Strategy 1: Regression

6 / 36

Strategy 2: Propensity sore

is a probability

is an individual’s probability of receiving treatment(or being exposed) given measured characteristics

P(exposed|covariates)=P(exposed|X1,X2,...Xn,X1X2,X12)

7 / 36

Application of propensity score

8 / 36

Stratification

9 / 36

Stratification

Matching

Unmatched, loss information, potential bias

9 / 36

Weighting (topic today)

10 / 36

Goal:

create pseudo-population

achieve exchangeability

11 / 36

Goal:

create pseudo-population

achieve exchangeability

11 / 36

Did we achieve exchangeability?

i.e., P(A=1|Z=1)=P(A=1|Z=0)=P(A=1)?

P(A=1|Z=1)=(54+6)/(54+6+24+36)=0.5

P(A=1|Z=0)=(28+12)/(28+12+8+32)=0.5

P(A=1)=0.5

12 / 36

Further understanding on IPW

suppose Z, A, and Y are binary variables, the original sample size is shown below:

treatment(A=1) control(A=0)
Z=1 a b
Z=0 c d

When Z=1:

In treatment group: PS=P(A=1|Z=1)=a/(a+b), weight1=1/PS=(a+b)/a

In control group: 1PS=P(A=0|Z=1)=b/(a+b), weight2=1/(1PS)=(a+b)/b

Therefore,

weighted sample size with Z=1 in treatment(A=1) group is aweight1=a(a+b)/a=a+b;

weighted sample size with Z=1 in control(A=0) group is bweight2=b(a+b)/b=a+b

13 / 36

Similarly, when Z = 0:

in treatment group: PS=P(A=1|Z=0)=c/(c+d), weight3=1/PS=(c+d)/c

in control group: 1PS=P(A=0|Z=0)=d/(c+d), weight4=1/(1PS)=(c+d)/d

sample size with Z=0 in treatment(A=1) group = cweight3=c(c+d)/c=c+d

sample size with Z=0 in the control(A=0) group = dweight4=d(c+d)/d=c+d

to sum up, the weighted sample size is:

treatment(A=1) control(A=0)
Z=1 a+b a+b
Z=0 c+d c+d

that is, exchangeability achieved!

14 / 36

Notes

15 / 36

Notes

Calculation

IPW=Ti/PS+(1Ti)/(1PS)

15 / 36

Notes

Calculation

IPW=Ti/PS+(1Ti)/(1PS)

Assumptions:

1. Sufficient measured confounders;

2. All participants have a non-zero probability of receiving treatment.

15 / 36

An example (with R codes)

16 / 36

The Data

17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

Variables

The mosquito_net data set contains the following variables:

17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

Variables

The mosquito_net data set contains the following variables:

  • Malaria risk (malaria_risk): The likelihood that someone in the household will be infected with malaria, range 0-1.
17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

Variables

The mosquito_net data set contains the following variables:

  • Malaria risk (malaria_risk): The likelihood that someone in the household will be infected with malaria, range 0-1.

  • Mosquito net (net and net_num): A binary variable indicating if the household used mosquito nets. Eligible for program (eligible): A binary variable indicating if the household is eligible for the free net program.

17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

Variables

The mosquito_net data set contains the following variables:

  • Malaria risk (malaria_risk): The likelihood that someone in the household will be infected with malaria, range 0-1.

  • Mosquito net (net and net_num): A binary variable indicating if the household used mosquito nets. Eligible for program (eligible): A binary variable indicating if the household is eligible for the free net program.

  • Income (income): The household’s monthly income.

17 / 36

The Data

Objective

Our goal in this example is to estimate the causal effect of bed net usage on malaria risk using only observational data.

Variables

The mosquito_net data set contains the following variables:

  • Malaria risk (malaria_risk): The likelihood that someone in the household will be infected with malaria, range 0-1.

  • Mosquito net (net and net_num): A binary variable indicating if the household used mosquito nets. Eligible for program (eligible): A binary variable indicating if the household is eligible for the free net program.

  • Income (income): The household’s monthly income.

  • Nighttime temperatures (temperature): The average temperature at night.

17 / 36

The data (continued)

Variables

18 / 36

The data (continued)

Variables

  • Health (health): Self-reported healthiness in the household. Measured on a scale of 0–100.
18 / 36

The data (continued)

Variables

  • Health (health): Self-reported healthiness in the household. Measured on a scale of 0–100.

  • Number in household (household): Number of people living in the household.

18 / 36

The data (continued)

Variables

  • Health (health): Self-reported healthiness in the household. Measured on a scale of 0–100.

  • Number in household (household): Number of people living in the household.

  • Insecticide resistance (resistance): Some strains of mosquitoes are more resistant to insecticide and thus pose a higher risk of infecting people with malaria. This is measured on a scale of 0–100, with higher values indicating higher resistance.

18 / 36

The data (continued)

Variables

  • Health (health): Self-reported healthiness in the household. Measured on a scale of 0–100.

  • Number in household (household): Number of people living in the household.

  • Insecticide resistance (resistance): Some strains of mosquitoes are more resistant to insecticide and thus pose a higher risk of infecting people with malaria. This is measured on a scale of 0–100, with higher values indicating higher resistance.

Import dataset

pacman::p_load(readr)
nets <- read_csv("mosquito_nets.csv")
18 / 36
head(nets[,c(1:6)])
## # A tibble: 6 x 6
## id net net_num malaria_risk income health
## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 1 TRUE 1 33 781 56
## 2 2 FALSE 0 42 974 57
## 3 3 FALSE 0 80 502 15
## 4 4 TRUE 1 34 671 20
## 5 5 FALSE 0 44 728 17
## 6 6 FALSE 0 25 1050 48
head(nets[,c(7:10)])
## # A tibble: 6 x 4
## household eligible temperature resistance
## <dbl> <lgl> <dbl> <dbl>
## 1 2 FALSE 21.1 59
## 2 4 FALSE 26.5 73
## 3 3 FALSE 25.6 65
## 4 5 TRUE 21.3 46
## 5 5 FALSE 19.2 54
## 6 1 FALSE 25.3 34
19 / 36

Draw DAG

20 / 36

Draw DAG

health, income, and temperature

20 / 36

Step1: Generate Propensity score

###fit logistic regression to get propensity score
model_net <- glm(net ~ income + temperature + health,
data = nets,
family = binomial(link = "logit"))
# tidy(model_net, exponentiate = TRUE)
pacman::p_load(broom)##augment_column
net_probabilities <- augment_columns(
model_net,##model
nets, ## original data
type.predict="response" ## type of the prediction
) %>%
rename(propensity = .fitted)
net_probabilities %>%
select(id, net, income, temperature, health, propensity) %>%
head()
## # A tibble: 6 x 6
## id net income temperature health propensity
## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 1 TRUE 781 21.1 56 0.367
## 2 2 FALSE 974 26.5 57 0.389
## 3 3 FALSE 502 25.6 15 0.158
## 4 4 TRUE 671 21.3 20 0.263
## 5 5 FALSE 728 19.2 17 0.308
## 6 6 FALSE 1050 25.3 48 0.429
21 / 36

Step2: Calculate IPW

net_ipw <- net_probabilities %>%
mutate(ipw = (net_num / propensity) + ((1 - net_num) / (1 - propensity)))
# Look at the first few rows of a few columns
net_ipw %>%
select(id, net, income, temperature, health, propensity, ipw) %>%
head()
## # A tibble: 6 x 7
## id net income temperature health propensity ipw
## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 TRUE 781 21.1 56 0.367 2.72
## 2 2 FALSE 974 26.5 57 0.389 1.64
## 3 3 FALSE 502 25.6 15 0.158 1.19
## 4 4 TRUE 671 21.3 20 0.263 3.81
## 5 5 FALSE 728 19.2 17 0.308 1.44
## 6 6 FALSE 1050 25.3 48 0.429 1.75
22 / 36

Step3: Check balance

# pacman::p_load(jstable)
pacman::p_load(tableone)# svyCreateTableOne
pacman::p_load(survey)#svydesign
sdm_dat_weighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ ipw, nest = FALSE, data = net_ipw)
sdm_dat_unweighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ 1, nest = FALSE, data = net_ipw)
tabWeighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_weighted, test = FALSE)
tabunweighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_unweighted , test = FALSE)
23 / 36

Step3: Check balance

## Show table with SMD
print(tabunweighted, smd = TRUE)
## Stratified by net
## FALSE TRUE SMD
## n 1071.00 681.00
## income (mean (SD)) 872.75 (172.69) 955.19 (201.63) 0.439
## health (mean (SD)) 48.06 (17.22) 54.91 (18.93) 0.379
## temperature (mean (SD)) 24.09 (4.03) 23.38 (4.20) 0.172
print(tabWeighted, smd = TRUE)
## Stratified by net
## FALSE TRUE SMD
## n 1741.09 1784.96
## income (mean (SD)) 899.89 (175.51) 890.39 (210.85) 0.049
## health (mean (SD)) 50.38 (17.49) 49.71 (19.40) 0.036
## temperature (mean (SD)) 23.83 (4.03) 23.91 (4.24) 0.019
24 / 36

Step4: Modelling with IPW

model_unweighted <- lm(malaria_risk ~ net,
data = nets)
tidy(model_unweighted)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 41.9 0.405 104. 0
## 2 netTRUE -16.3 0.649 -25.1 2.25e-119
model_ipw <- lm(malaria_risk ~ net,
data = net_ipw,
weights = ipw)
tidy(model_ipw)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 39.7 0.468 84.7 0
## 2 netTRUE -10.1 0.658 -15.4 3.21e-50
25 / 36

One last thing...

26 / 36

Based on the casual quantities of interest, different versions of weights could be calculated,

27 / 36

Based on the casual quantities of interest, different versions of weights could be calculated,

and different pseudo-populations are generated.

27 / 36

Based on the casual quantities of interest, different versions of weights could be calculated,

and different pseudo-populations are generated.

However, the ultimate goal is always the same.(i.e., achieve exchangeability)

27 / 36

Casual quantities of interest

28 / 36

1. Average Treatment Effect

Calculation : IPW=Ti/PS+(1Ti)/(1PS)

Target population: the whole population, both treated and controlled.

29 / 36

2. Average Treatment Effect Among the Treated

Calculation : IPW=(PSTi)/PS+[PS(1Ti)]/(1PS)

Target population: treated population

30 / 36

3. Average Treatment Effect Among the Controls

Calculation : IPW=[(1PS)Ti]/PS+[(1PS)(1Ti)]/(1PS)

Target population: treated population

31 / 36

4. Average Treatment Effect Among the Evenly Matchable

Calculation : IPW=[minPS,1PS]/[TiPS+(1Ti)(1PS)]

Target population: treated population

32 / 36

PSM result:

33 / 36

5. Average Treatment Effect Among the Overlap Population

Calculation : IPW=(1PS)Ti+PS(1Ti)

Target population: treated population

34 / 36

I am so happy:)

35 / 36

References

[1] https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/#how-do-we-incorporate-a-propensity-score-in-a-weight

[2] https://evalf20.classes.andrewheiss.com/example/matching-ipw/

[3] https://www.youtube.com/watch?v=CKm1rZlAwuA&list=WL

[4] Nicholas C Chesnaye, Vianda S Stel, Giovanni Tripepi, Friedo W Dekker, Edouard L Fu, Carmine Zoccali, Kitty J Jager, An introduction to inverse probability of treatment weighting in observational research, Clinical Kidney Journal, Volume 15, Issue 1, January 2022, Pages 14–20, https://doi.org/10.1093/ckj/sfab158

[5] https://www.youtube.com/watch?v=PfLYPt9ur4g

[6] https://cran.r-project.org/web/packages/tableone/vignettes/smd.html

36 / 36

Rationale???

2 / 36
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow