class: center, middle, inverse, title-slide .title[ # Inverse probability weighting ] .subtitle[ ## with R ] .author[ ### Lin Yu ] .date[ ### 2022-06-07 ] --- background-image: url(data:image/png;base64,#https://images.ctfassets.net/mrop88jh71hl/2KW9pmaCJANvvkLmpe2GiV/143b1b0b5e8ab03de46ae3f1f7bcee0d/kid-computational-thinking-light.jpeg) background-position: 50% 50% class: center, bottom, inverse # Rationale??? --- class: center, middle # ![](data:image/png;base64,#testimage.PNG) --- class: center, middle .pull-left[ # That is... ![](data:image/png;base64,#confounder.PNG) ] -- .pull-right[ # Our goal ![](data:image/png;base64,#removearrow.PNG) ] --- background-image: url(data:image/png;base64,#https://images.saymedia-content.com/.image/ar_16:9%2Cc_fill%2Ccs_srgb%2Cfl_progressive%2Cq_auto:eco%2Cw_1200/MTc0MTgxMzc5NzA4Mjk4NzQ4/thinking-thoughts-without-language.jpg) background-position: 50% 50% class: center, bottom, inverse # How? --- class: center,inverse, middle # Strategy 1: Regression --- class: inverse, middle # Strategy 2: Propensity sore ### is a **probability** ### is an individual’s probability of receiving treatment(or being exposed) given **measured** characteristics ### `\(P(exposed| covariates)=P(exposed|X_1,X_2,...X_n,X_1*X_2,X_1^2)\)` --- class: inverse, middle, center # Application of propensity score --- class: center,middle .pull-left[ ## Stratification ![](data:image/png;base64,#stratification.PNG) ] -- .pull-right[ ## Matching ![](data:image/png;base64,#Matching.PNG) ### Unmatched, loss information, potential bias ] --- class: center,middle ## Weighting (topic today) ![](data:image/png;base64,#weighting.PNG) --- .pull-left[ ![](data:image/png;base64,#exampledata.PNG) ### Goal: #### create pseudo-population #### achieve exchangeability ] -- .pull-right[ ![](data:image/png;base64,#exampledata2.PNG) ![](data:image/png;base64,#exampledata3.PNG) ![](data:image/png;base64,#exampledata4.PNG) ] --- class: center, middle ![](data:image/png;base64,#exampledata5.PNG) ### Did we achieve exchangeability? #### i.e., `\(P(A=1|Z=1)=P(A=1|Z=0)=P(A=1)\)`? `\(P(A=1|Z=1)=(54+6)/(54+6+24+36)=0.5\)` `\(P(A=1|Z=0)=(28+12)/(28+12+8+32)=0.5\)` `\(P(A=1)=0.5\)` --- ### Further understanding on IPW suppose Z, A, and Y are binary variables, the original sample size is shown below: | | treatment(A=1) | control(A=0) | |-----|----------------|--------------| | Z=1 | a | b | | Z=0 | c | d | #### When Z=1: In treatment group: `\(PS=P(A=1|Z=1)= a/(a+b)\)`, `\(weight_1 =1/PS= (a+b)/a\)` In control group: `\(1-PS=P(A=0|Z=1)=b/(a+b)\)`, `\(weight_2=1/(1-PS)=(a+b)/b\)` Therefore, weighted sample size with Z=1 in treatment(A=1) group is `\(a*weight_1=a*(a+b)/a=a+b\)`; weighted sample size with Z=1 in control(A=0) group is `\(b*weight_2=b*(a+b)/b=a+b\)` --- #### Similarly, when Z = 0: in treatment group: `\(PS = P(A=1|Z=0)=c/(c+d)\)`, `\(weight_3 = 1/PS = (c+d)/c\)` in control group: `\(1-PS = P(A=0|Z=0)=d/(c+d)\)`, `\(weight_4 = 1/(1-PS)=(c+d)/d\)` sample size with Z=0 in treatment(A=1) group = `\(c*weight_3 = c*(c+d)/c=c+d\)` sample size with Z=0 in the control(A=0) group = `\(d*weight_4 = d*(c+d)/d=c+d\)` to sum up, the weighted sample size is: | | treatment(A=1) | control(A=0) | |-----|----------------|--------------| | Z=1 | a+b | a+b | | Z=0 | c+d | c+d | that is, exchangeability achieved! --- ### Notes -- ### Calculation `\(IPW=T_i/PS+(1−T_i)/(1-PS)\)` -- ### Assumptions: #### 1. Sufficient measured confounders; #### 2. All participants have a non-zero probability of receiving treatment. --- class: inverse, center, middle # An example (with R codes) --- # The Data -- ### Objective Our goal in this example is to estimate the causal effect of **bed net usage** on **malaria risk** using only observational data. -- ### Variables The **mosquito_net** data set contains the following variables: -- - **Malaria risk (malaria_risk)**: The likelihood that someone in the household will be infected with malaria, range 0-1. -- - **Mosquito net (net and net_num)**: A binary variable indicating if the household used mosquito nets. Eligible for program (eligible): A binary variable indicating if the household is eligible for the free net program. -- - **Income (income)**: The household’s monthly income. -- - **Nighttime temperatures (temperature)**: The average temperature at night. --- # The data (continued) ### Variables -- - **Health (health)**: Self-reported healthiness in the household. Measured on a scale of 0–100. -- - **Number in household (household)**: Number of people living in the household. -- - **Insecticide resistance (resistance)**: Some strains of mosquitoes are more resistant to insecticide and thus pose a higher risk of infecting people with malaria. This is measured on a scale of 0–100, with higher values indicating higher resistance. -- ### Import dataset ```r pacman::p_load(readr) nets <- read_csv("mosquito_nets.csv") ``` --- ```r head(nets[,c(1:6)]) ``` ``` ## # A tibble: 6 x 6 ## id net net_num malaria_risk income health ## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> ## 1 1 TRUE 1 33 781 56 ## 2 2 FALSE 0 42 974 57 ## 3 3 FALSE 0 80 502 15 ## 4 4 TRUE 1 34 671 20 ## 5 5 FALSE 0 44 728 17 ## 6 6 FALSE 0 25 1050 48 ``` ```r head(nets[,c(7:10)]) ``` ``` ## # A tibble: 6 x 4 ## household eligible temperature resistance ## <dbl> <lgl> <dbl> <dbl> ## 1 2 FALSE 21.1 59 ## 2 4 FALSE 26.5 73 ## 3 3 FALSE 25.6 65 ## 4 5 TRUE 21.3 46 ## 5 5 FALSE 19.2 54 ## 6 1 FALSE 25.3 34 ``` --- class: center,middle # Draw DAG ![](data:image/png;base64,#DAG.PNG) -- ### health, income, and temperature --- # Step1: Generate Propensity score ```r ###fit logistic regression to get propensity score model_net <- glm(net ~ income + temperature + health, data = nets, family = binomial(link = "logit")) # tidy(model_net, exponentiate = TRUE) pacman::p_load(broom)##augment_column net_probabilities <- augment_columns( model_net,##model nets, ## original data type.predict="response" ## type of the prediction ) %>% rename(propensity = .fitted) net_probabilities %>% select(id, net, income, temperature, health, propensity) %>% head() ``` ``` ## # A tibble: 6 x 6 ## id net income temperature health propensity ## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> ## 1 1 TRUE 781 21.1 56 0.367 ## 2 2 FALSE 974 26.5 57 0.389 ## 3 3 FALSE 502 25.6 15 0.158 ## 4 4 TRUE 671 21.3 20 0.263 ## 5 5 FALSE 728 19.2 17 0.308 ## 6 6 FALSE 1050 25.3 48 0.429 ``` --- # Step2: Calculate IPW ```r net_ipw <- net_probabilities %>% mutate(ipw = (net_num / propensity) + ((1 - net_num) / (1 - propensity))) # Look at the first few rows of a few columns net_ipw %>% select(id, net, income, temperature, health, propensity, ipw) %>% head() ``` ``` ## # A tibble: 6 x 7 ## id net income temperature health propensity ipw ## <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1 TRUE 781 21.1 56 0.367 2.72 ## 2 2 FALSE 974 26.5 57 0.389 1.64 ## 3 3 FALSE 502 25.6 15 0.158 1.19 ## 4 4 TRUE 671 21.3 20 0.263 3.81 ## 5 5 FALSE 728 19.2 17 0.308 1.44 ## 6 6 FALSE 1050 25.3 48 0.429 1.75 ``` --- # Step3: Check balance ```r # pacman::p_load(jstable) pacman::p_load(tableone)# svyCreateTableOne pacman::p_load(survey)#svydesign sdm_dat_weighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ ipw, nest = FALSE, data = net_ipw) sdm_dat_unweighted <- svydesign(ids = ~ id, strata = ~ net, weights = ~ 1, nest = FALSE, data = net_ipw) tabWeighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_weighted, test = FALSE) tabunweighted <- svyCreateTableOne(vars = c('income','health','temperature'), strata = "net", data =sdm_dat_unweighted , test = FALSE) ``` --- # Step3: Check balance ```r ## Show table with SMD print(tabunweighted, smd = TRUE) ``` ``` ## Stratified by net ## FALSE TRUE SMD ## n 1071.00 681.00 ## income (mean (SD)) 872.75 (172.69) 955.19 (201.63) 0.439 ## health (mean (SD)) 48.06 (17.22) 54.91 (18.93) 0.379 ## temperature (mean (SD)) 24.09 (4.03) 23.38 (4.20) 0.172 ``` ```r print(tabWeighted, smd = TRUE) ``` ``` ## Stratified by net ## FALSE TRUE SMD ## n 1741.09 1784.96 ## income (mean (SD)) 899.89 (175.51) 890.39 (210.85) 0.049 ## health (mean (SD)) 50.38 (17.49) 49.71 (19.40) 0.036 ## temperature (mean (SD)) 23.83 (4.03) 23.91 (4.24) 0.019 ``` --- # Step4: Modelling with IPW ```r model_unweighted <- lm(malaria_risk ~ net, data = nets) tidy(model_unweighted) ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 41.9 0.405 104. 0 ## 2 netTRUE -16.3 0.649 -25.1 2.25e-119 ``` ```r model_ipw <- lm(malaria_risk ~ net, data = net_ipw, weights = ipw) tidy(model_ipw) ``` ``` ## # A tibble: 2 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 39.7 0.468 84.7 0 ## 2 netTRUE -10.1 0.658 -15.4 3.21e-50 ``` --- class: inverse, center, middle # One last thing... --- class: inverse, center, middle # Based on the casual quantities of interest, different versions of weights could be calculated, -- # and different pseudo-populations are generated. -- # However, the ultimate goal is always the same.(i.e., achieve exchangeability) --- class: inverse, center, middle # Casual quantities of interest --- ### 1. Average Treatment Effect **Calculation** : `\(IPW=T_i/PS+(1−T_i)/(1-PS)\)` **Target population**: the whole population, both treated and controlled. ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-11-1.png)<!-- --> --- ### 2. Average Treatment Effect Among the Treated **Calculation** : `\(IPW=(PS*T_i)/PS+[PS*(1−T_i)]/(1-PS)\)` **Target population**: treated population ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-12-1.png)<!-- --> --- ### 3. Average Treatment Effect Among the Controls **Calculation** : `\(IPW=[(1-PS)*T_i]/PS+[(1-PS)*(1−T_i)]/(1-PS)\)` **Target population**: treated population ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-13-1.png)<!-- --> --- ### 4. Average Treatment Effect Among the Evenly Matchable **Calculation** : `\(IPW=[min{PS,1-PS}]/[T_i*PS+(1−T_i)*(1-PS)]\)` **Target population**: treated population ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-14-1.png)<!-- --> --- #### PSM result: ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-16-1.png)<!-- --> --- ### 5. Average Treatment Effect Among the Overlap Population **Calculation** : `\(IPW=(1-PS)*T_i+PS*(1−T_i)\)` **Target population**: treated population ![](data:image/png;base64,#IPW-pre_files/figure-html/unnamed-chunk-17-1.png)<!-- --> --- class: inverse, center,middle ![](data:image/png;base64,#https://media0.giphy.com/media/5GoVLqeAOo6PK/200.gif) ## I am so happy:) --- # References #### [1] https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/#how-do-we-incorporate-a-propensity-score-in-a-weight #### [2] https://evalf20.classes.andrewheiss.com/example/matching-ipw/ #### [3] https://www.youtube.com/watch?v=CKm1rZlAwuA&list=WL #### [4] Nicholas C Chesnaye, Vianda S Stel, Giovanni Tripepi, Friedo W Dekker, Edouard L Fu, Carmine Zoccali, Kitty J Jager, An introduction to inverse probability of treatment weighting in observational research, Clinical Kidney Journal, Volume 15, Issue 1, January 2022, Pages 14–20, https://doi.org/10.1093/ckj/sfab158 #### [5] https://www.youtube.com/watch?v=PfLYPt9ur4g #### [6] https://cran.r-project.org/web/packages/tableone/vignettes/smd.html