Change-in-estimate approach: Assessing confounding effects

Zhiqiang Wang

Confounding; Logistic regression; Cox proportional hazards model; Linear regression

Overview

The ‘chest’ package systematically calculates and compares effect estimates from models with different combinations of variables. It calculates the change in the effect estimate as variables are added to the model sequentially in a step-wise fashion. Depending on the modelling method, the effect estimate is a regression coefficient, an odds ratio or a hazard ratio. At each step, the one variable that causes the largest change among the remaining variables is added to the model. The results from all models are summarized in one graph and one data frame. This approach can be used to assess confounding effects in epidemiological studies and biomedical research, including clinical trials.
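The step-wise procedure described above can be sketched in base R. This is a simplified illustration on simulated data, not the package's implementation: the data frame `dat`, its columns and the helper `exposure_or()` are hypothetical, and "largest change" is assumed to mean the largest absolute percentage change in the odds ratio.

```r
# Simulated data (hypothetical; this is not the package's diab_df)
set.seed(1)
n <- 500
age <- rnorm(n, 50, 10)
sex <- rbinom(n, 1, 0.5)
exposure <- rbinom(n, 1, plogis(-3 + 0.05 * age))
outcome <- rbinom(n, 1, plogis(-4 + 0.7 * exposure + 0.05 * age + 0.2 * sex))
dat <- data.frame(outcome, exposure, age, sex)

# Odds ratio for `exposure` from a logistic model adjusted for `covars`
exposure_or <- function(covars) {
  f <- reformulate(c("exposure", covars), response = "outcome")
  fit <- glm(f, family = binomial, data = dat)
  unname(exp(coef(fit)["exposure"]))
}

remaining <- c("age", "sex")   # candidate variables not yet in the model
selected <- character(0)       # variables already added
current <- exposure_or(selected)
steps <- data.frame(variables = "Crude", OR = current, Change = NA_real_)

while (length(remaining) > 0) {
  # Fit one model per remaining candidate and compare with the current estimate
  cand <- sapply(remaining, function(v) exposure_or(c(selected, v)))
  change <- 100 * (cand - current) / current
  best <- which.max(abs(change))   # assumption: largest absolute change wins
  steps <- rbind(steps, data.frame(variables = paste("+", remaining[best]),
                                   OR = cand[[best]],
                                   Change = change[[best]]))
  selected <- c(selected, remaining[best])
  current <- cand[[best]]
  remaining <- remaining[-best]
}
steps
```

Each row of `steps` holds the estimate after one more variable has entered the model, which mirrors the layout of the result tables shown in the sections below.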

Installation

You can install the released version of chest from CRAN with:

install.packages("chest")

Getting Started

library(chest)
names(diab_df)
#>  [1] "Endpoint"  "mid"       "Diabetes"  "Age"       "Sex"       "BMI"      
#>  [7] "Married"   "Smoke"     "CVD"       "Cancer"    "Education" "Income"   
#> [13] "t0"        "t1"

Data: diabetes and mortality

The data frame ‘diab_df’ is used to examine the association between Diabetes and the mortality Endpoint. The data set is included to demonstrate the functions in this package rather than to answer any research question.

‘chest_speedglm’: report odds ratios at all steps with logistic regression models

chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD","Education", "Income"), 
  zero = 1, data = diab_df)

#>       variables       OR       lb       ub      Change        p    n
#> 1         Crude 2.312758 1.838975 2.908604          NA 7.57e-13 2372
#> 2         + Age 3.348051 2.500654 4.482605  44.7644279 4.83e-16 2372
#> 3      + Income 2.940581 2.153766 4.014836 -12.1703550 1.13e-11 2061
#> 4         + CVD 2.831739 2.068823 3.875994  -3.7013899 8.09e-11 2061
#> 5       + Smoke 2.930687 2.135170 4.022595   3.4942324 2.84e-11 2061
#> 6         + Sex 2.901284 2.113594 3.982528  -1.0032733 4.38e-11 2061
#> 7   + Education 2.881540 2.096819 3.959939  -0.6805040 6.81e-11 2051
#> 8      + Cancer 2.863889 2.083367 3.936828  -0.6125761 9.11e-11 2051
#> 9     + Married 2.878757 2.093150 3.959219   0.5191530 7.88e-11 2048

All odds ratios are for the association between Diabetes and the mortality Endpoint after each of the other factors has been added to the model sequentially.
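The Change column in the table above appears to be the percentage change in the odds ratio relative to the previous step. This reading is inferred from the printed values rather than taken from the package documentation. A quick check in base R, using the first three odds ratios copied from the table:

```r
# Odds ratios copied from the first three rows of the table above
or <- c(Crude = 2.312758, Age = 3.348051, Income = 2.940581)

# Percentage change of each estimate relative to the one before it
change <- 100 * diff(or) / head(or, -1)
round(change, 2)
#>    Age Income 
#>  44.76 -12.17
```

These values agree with the reported Change of 44.76 for Age and -12.17 for Income.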

We can alter some details of the graph. For example, we used zero = 1 to mark the no-effect line. Users can also save the result table as a data frame for further presentation and analysis.

When the list of variables is long, or the same list is to be used repeatedly, store it in an object:

vlist <- c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD","Education", "Income")
chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = vlist,   zero = 1, data = diab_df)

#>       variables       OR       lb       ub      Change        p    n
#> 1         Crude 2.312758 1.838975 2.908604          NA 7.57e-13 2372
#> 2         + Age 3.348051 2.500654 4.482605  44.7644279 4.83e-16 2372
#> 3      + Income 2.940581 2.153766 4.014836 -12.1703550 1.13e-11 2061
#> 4         + CVD 2.831739 2.068823 3.875994  -3.7013899 8.09e-11 2061
#> 5       + Smoke 2.930687 2.135170 4.022595   3.4942324 2.84e-11 2061
#> 6         + Sex 2.901284 2.113594 3.982528  -1.0032733 4.38e-11 2061
#> 7   + Education 2.881540 2.096819 3.959939  -0.6805040 6.81e-11 2051
#> 8      + Cancer 2.863889 2.083367 3.936828  -0.6125761 9.11e-11 2051
#> 9     + Married 2.878757 2.093150 3.959219   0.5191530 7.88e-11 2048

Remove missing values, and change non-effect line

chest_speedglm(
  crude = "Endpoint ~ Diabetes", xlist = vlist, 
  data = diab_df, zero = c(0.98, 1.02),  na_omit = TRUE)

#>       variables       OR       lb       ub       Change        p    n
#> 1         Crude 2.305786 1.809862 2.937600           NA 1.37e-11 2048
#> 2         + Age 3.297099 2.421755 4.488837  42.99238627 3.50e-14 2048
#> 3      + Income 2.902046 2.125534 3.962238 -11.98183743 2.00e-11 2048
#> 4         + CVD 2.797882 2.044220 3.829405  -3.58931015 1.32e-10 2048
#> 5       + Smoke 2.900600 2.113218 3.981361   3.67127605 4.39e-11 2048
#> 6         + Sex 2.872333 2.092484 3.942824  -0.97453037 6.65e-11 2048
#> 7     + Married 2.894543 2.107594 3.975330   0.77324421 5.19e-11 2048
#> 8      + Cancer 2.881019 2.097209 3.957769  -0.46723505 6.52e-11 2048
#> 9   + Education 2.878757 2.093150 3.959219  -0.07851916 7.88e-11 2048
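In the earlier tables n falls from 2372 to 2048 as variables with missing values enter the model, whereas with na_omit = TRUE n is 2048 at every step. This suggests that the option restricts every model to the rows that are complete on all listed variables, so that changes in the estimate are not mixed with changes in the sample. The same idea can be sketched in base R on a small hypothetical data frame (`toy` and its columns are made up for illustration):

```r
# Hypothetical data with missing values in two covariates
toy <- data.frame(
  y  = c(1, 0, 1, 0, 1),
  x1 = c(2.1, NA, 3.3, 1.8, 2.7),
  x2 = c("a", "b", NA, "a", "b")
)

# Keep only rows with no missing value in any variable used by any model
vars <- c("y", "x1", "x2")
toy_complete <- toy[complete.cases(toy[, vars]), ]

nrow(toy)           # all rows
nrow(toy_complete)  # rows shared by every model
```

Fitting all models on `toy_complete` keeps n constant from the crude model to the fully adjusted one.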

Add terms such as an interaction between Age and Sex, and age squared

library(tidyverse)
#> -- Attaching packages -------------------------------------------- tidyverse 1.3.0 --
#> v ggplot2 3.2.1     v purrr   0.3.3
#> v tibble  2.1.3     v dplyr   0.8.3
#> v tidyr   1.0.0     v stringr 1.4.0
#> v readr   1.3.1     v forcats 0.4.0
#> -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
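# Derive the interaction and squared terms as ordinary columns
diab_df <- diab_df %>%
  mutate(Age_Sex = Age * Sex, Age2 = Age^2)
# Include the derived columns in the variable list like any other variable
vlist_1 <- c("Age", "Sex", "Age2", "Age_Sex", "Married", "Cancer", "CVD", "Education", "Income")
chest_speedglm(crude = "Endpoint ~ Diabetes", xlist = vlist_1, na_omit = TRUE, data = diab_df)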

#>        variables       OR       lb       ub       Change        p    n
#> 1          Crude 2.305786 1.809862 2.937600           NA 1.37e-11 2048
#> 2          + Age 3.297099 2.421755 4.488837  42.99238627 3.50e-14 2048
#> 3       + Income 2.902046 2.125534 3.962238 -11.98183743 2.00e-11 2048
#> 4          + CVD 2.797882 2.044220 3.829405  -3.58931015 1.32e-10 2048
#> 5         + Age2 2.765179 2.028997 3.768472  -1.16884530 1.20e-10 2048
#> 6          + Sex 2.738303 2.008891 3.732558  -0.97195568 1.84e-10 2048
#> 7      + Age_Sex 2.770902 2.030833 3.780664   1.19049060 1.29e-10 2048
#> 8      + Married 2.794136 2.046620 3.814677   0.83848576 9.89e-11 2048
#> 9       + Cancer 2.780324 2.036062 3.796643  -0.49431423 1.25e-10 2048
#> 10   + Education 2.777871 2.032476 3.796634  -0.08822106 1.46e-10 2048
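The derived terms do not require the tidyverse. As a minimal sketch, the same columns can be created with base R's `transform()`. The data frame `toy` below is hypothetical, and Sex is assumed to be coded numerically (0/1) so that the product is defined:

```r
# Hypothetical data; Sex is assumed to be coded 0/1
toy <- data.frame(Age = c(40, 55, 63), Sex = c(0, 1, 1))

# Interaction and squared terms as new columns
toy <- transform(toy, Age_Sex = Age * Sex, Age2 = Age^2)
toy
```

The new column names can then be passed in the variable list in the same way as the original variables.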

chest_glm: Logistic regression using generalized linear models (‘glm’)

‘chest_glm’ is slower than ‘chest_speedglm’. We can use indicate = TRUE to monitor the progress. If it is too slow, you may want to try ‘chest_speedglm’.

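# A shorter list under its own name (vlist_short), so that vlist defined earlier stays unchanged for the later examples
vlist_short <- c("Age", "Sex", "Married", "Smoke", "Education")
chest_glm(crude = "Endpoint ~ Diabetes", xlist = vlist_short, data = diab_df, indicate = TRUE)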

chest_cox: Using Cox Proportional Hazards Models: ‘coxph’ of ‘survival’ package


chest_cox(crude = "Surv(t0, t1, Endpoint) ~ Diabetes", xlist = vlist,
          na_omit = TRUE, data = diab_df, zero = 1)

#>       variables       HR       lb       ub     Change            p    n
#> 1         Crude 1.588134 1.434544 1.758167         NA 4.950249e-19 2048
#> 2         + CVD 1.526276 1.377192 1.691499 -3.8949795 7.454317e-16 2048
#> 3      + Income 1.480726 1.335380 1.641891 -2.9844079 9.581156e-14 2048
#> 4       + Smoke 1.514956 1.366037 1.680108  2.3116907 3.596810e-15 2048
#> 5         + Sex 1.498963 1.351879 1.662049 -1.0556582 1.571022e-14 2048
#> 6     + Married 1.512616 1.363974 1.677456  0.9108110 4.451213e-15 2048
#> 7         + Age 1.526426 1.376076 1.693202  0.9129952 1.305521e-15 2048
#> 8      + Cancer 1.517896 1.368399 1.683726 -0.5587865 3.050629e-15 2048
#> 9   + Education 1.514437 1.365204 1.679982 -0.2279234 4.453189e-15 2048

chest_clogit: Conditional logistic regression: ‘clogit’ of ‘survival’ package

chest_clogit(crude = "Endpoint ~ Diabetes + strata(mid)", 
             xlist = vlist, data = diab_df, zero = 1)

#>       variables       OR       lb       ub    Change            p    n
#> 1         Crude 2.586950 1.719871 3.891170        NA 5.033866e-06 2372
#> 2      + Income 2.850010 1.752942 4.633671 10.168718 2.405822e-05 2061
#> 3     + Married 3.133480 1.875838 5.234301  9.946296 1.283423e-05 2058
#> 4   + Education 3.030468 1.810620 5.072149 -3.287484 2.452619e-05 2048
#> 5       + Smoke 3.128331 1.839469 5.320260  3.229314 2.559384e-05 2048
#> 6         + Age 3.212487 1.883223 5.480007  2.690121 1.844153e-05 2048
#> 7         + CVD 3.148114 1.824571 5.431754 -2.003848 3.776568e-05 2048
#> 8      + Cancer 3.100427 1.790709 5.368067 -1.514782 5.340664e-05 2048
#> 9         + Sex 3.100427 1.790709 5.368067  0.000000 5.340664e-05 2048

Notes: