vandomed@gmail.com

The purpose of **tab** is to make it easier to create tables for papers, including Table 1’s showing characteristics of the sample and summary tables for fitted regression models. Currently, the following functions are included:

- `tabmeans` compares means in two or more groups.
- `tabmedians` compares medians in two or more groups.
- `tabfreq` compares frequencies in two or more groups.
- `tabmulti` compares multiple variables in two or more groups.
- `tabmeans.svy`, `tabmedians.svy`, `tabfreq.svy`, and `tabmulti.svy` serve the same purposes as the above functions, but for complex survey data.
- `tabglm` summarizes generalized linear models (GLM’s) fit via `glm` or `survey::svyglm`.
- `tabgee` summarizes generalized estimating equation models (GEE’s) fit via `gee::gee`.
- `tabcoxph` summarizes Cox Proportional Hazards models fit via `survival::coxph` or `survey::svycoxph`.

You can use `tabmulti` to compare characteristics across levels of a factor variable, e.g. here comparing age, sex, and race by treatment group in the toy dataset `tabdata`.

`tabmulti(Age + Sex + Race ~ Group, data = tabdata) %>% kable()`

Variable | Control | Treatment | P |
---|---|---|---|
Age, M (SD) | 70.5 (5.3) | 69.5 (5.9) | 0.15 |
Sex, n (%) | | | <0.001 |
Female | 93 (68.4) | 62 (38.5) | |
Male | 43 (31.6) | 99 (61.5) | |
Race, n (%) | | | 0.29 |
White | 46 (34.1) | 65 (39.6) | |
Black | 36 (26.7) | 52 (31.7) | |
Mexican American | 21 (15.6) | 19 (11.6) | |
Other | 32 (23.7) | 28 (17.1) | |
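To make the contents of each row concrete, here is a base-R sketch of the same kind of summaries on simulated data. The data frame `toy` is hypothetical (it is not `tabdata`), and I'm assuming a two-sample t-test for the means and a chi-square test for the frequencies, which are common choices; check `?tabmeans` and `?tabfreq` for what **tab** actually uses by default.

```r
set.seed(1)

# Simulated stand-in for tabdata (hypothetical values, not the package's dataset)
toy <- data.frame(
  Group = rep(c("Control", "Treatment"), each = 50),
  Age   = rnorm(100, mean = 70, sd = 5),
  Sex   = sample(c("Female", "Male"), 100, replace = TRUE)
)

# Continuous row: mean (SD) by group, plus a two-sample t-test p-value
age_mean <- tapply(toy$Age, toy$Group, mean)
age_sd   <- tapply(toy$Age, toy$Group, sd)
age_p    <- t.test(Age ~ Group, data = toy)$p.value

# Categorical row: counts, column percentages, and a chi-square p-value
sex_n   <- table(toy$Sex, toy$Group)
sex_pct <- 100 * prop.table(sex_n, margin = 2)
sex_p   <- chisq.test(sex_n)$p.value

# Format the continuous row the way the table prints it: "M (SD)" per group
age_cells <- sprintf("%.1f (%.1f)", age_mean, age_sd)
age_cells
```

The percentages are computed within each group (column percentages), which matches the table above: 93 of the 136 controls with known sex is 68.4%.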

To illustrate some options, we can request `Age` and `Race` to print as `Age (years)` and `Race/ethnicity`, compare medians rather than means for age, exclude observations with a missing value for any of the variables (`listwise.deletion = TRUE`), and include the sample sizes in the column headings:

```
tabmulti(Age + Sex + Race ~ Group, data = tabdata,
         yvarlabels = list(Age = "Age (years)", Race = "Race/ethnicity"),
         ymeasures = c("median", "freq", "freq"),
         listwise.deletion = TRUE,
         n.headings = TRUE) %>% kable()
```

Variable | Control (n = 134) | Treatment (n = 158) | P |
---|---|---|---|
Age (years), Median (IQR) | 70.0 (9.8) | 69.0 (11.0) | 0.19 |
Sex, n (%) | | | <0.001 |
Female | 92 (68.7) | 60 (38.0) | |
Male | 42 (31.3) | 98 (62.0) | |
Race/ethnicity, n (%) | | | 0.26 |
White | 46 (34.3) | 64 (40.5) | |
Black | 36 (26.9) | 50 (31.6) | |
Mexican American | 21 (15.7) | 17 (10.8) | |
Other | 31 (23.1) | 27 (17.1) | |
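The group sizes in these headings (134 and 158) are smaller than the sex counts in the first table (136 and 161), which is what you would expect when rows with any missing value are dropped before all comparisons. A minimal base-R sketch of that idea, using a hypothetical five-row data frame rather than `tabdata`:

```r
# Hypothetical data with some missing values (not the package's dataset)
toy <- data.frame(
  Group = c("Control", "Control", "Treatment", "Treatment", "Treatment"),
  Age   = c(70, NA, 68, 72, 66),
  Sex   = c("F", "M", "M", NA, "F")
)

# Available-case counts: each variable uses whatever rows it has
n_age_available <- table(toy$Group[!is.na(toy$Age)])

# Listwise deletion: keep only rows that are complete on every variable
complete   <- toy[complete.cases(toy[, c("Group", "Age", "Sex")]), ]
n_by_group <- table(complete$Group)

# Column headings with sample sizes, in the style of the table above
headings <- paste0(names(n_by_group), " (n = ", as.vector(n_by_group), ")")
headings
```

With listwise deletion every row of the table is based on the same observations, so a single n per column is meaningful.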

Logistic regression for 1-year mortality vs. age, sex, and treatment, with the binary factor variables displayed in a “compressed” format:

```
fit <- glm(death_1yr ~ Age + Sex + Group, data = tabdata, family = binomial)
fit %>% tabglm(factor.compression = 5) %>% kable()
```

Variable | Beta (SE) | OR (95% CI) | P |
---|---|---|---|
Intercept | -2.02 (1.76) | - | 0.25 |
Age | 0.02 (0.02) | 1.02 (0.97, 1.07) | 0.50 |
Male | 0.11 (0.29) | 1.12 (0.63, 1.97) | 0.70 |
Treatment | -0.04 (0.29) | 0.96 (0.54, 1.69) | 0.88 |
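The OR column is just the Beta column on the exponentiated scale. As a quick check, here is the arithmetic for the `Male` row, assuming a Wald-type interval (estimate plus or minus a normal quantile times the SE); that assumption is mine, though it reproduces the printed values for this row.

```r
# Values copied from the "Male" row of the table above
beta <- 0.11
se   <- 0.29

z <- qnorm(0.975)  # about 1.96 for a 95% interval

or    <- exp(beta)           # odds ratio
lower <- exp(beta - z * se)  # lower 95% limit
upper <- exp(beta + z * se)  # upper 95% limit

or_cell <- sprintf("%.2f (%.2f, %.2f)", or, lower, upper)
or_cell
# "1.12 (0.63, 1.97)"
```

Other rows may differ in the last digit if you redo this by hand, because the table shows Beta and SE rounded to two decimals.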

GEE for high blood pressure (measured at 3 time points longitudinally) vs. various predictors, with some higher-order terms:

```
library("gee")
# Reshape from wide (one row per subject) to long (one row per visit)
tabdata2 <- reshape(data = tabdata,
                    varying = c("bp.1", "bp.2", "bp.3", "highbp.1", "highbp.2", "highbp.3"),
                    timevar = "bp.visit", direction = "long")
# gee() expects each subject's rows to be contiguous, so sort by id
tabdata2 <- tabdata2[order(tabdata2$id), ]
fit <- gee(highbp ~ poly(Age, 2, raw = TRUE) + Sex + Race + Race*Sex,
           id = id, data = tabdata2, family = "binomial", corstr = "unstructured")
fit %>% tabgee(data = tabdata2) %>% kable()
```

Variable | Beta (SE) | OR (95% CI) | P |
---|---|---|---|
Intercept | -3.10 (14.84) | - | 0.83 |
Age | 0.06 (0.43) | 1.06 (0.46, 2.45) | 0.89 |
Age squared | -0.00 (0.00) | 1.00 (0.99, 1.01) | 0.88 |
Sex | | | |
Female (ref) | - | - | - |
Male | 0.48 (0.29) | 1.61 (0.91, 2.84) | 0.10 |
Race | | | |
White (ref) | - | - | - |
Black | 0.04 (0.32) | 1.04 (0.56, 1.95) | 0.90 |
Mexican American | 0.13 (0.38) | 1.14 (0.55, 2.39) | 0.72 |
Other | -0.83 (0.37) | 0.43 (0.21, 0.89) | 0.02 |
Sex by Race | | | |
Male, Black | 0.23 (0.42) | 1.26 (0.55, 2.87) | 0.58 |
Male, Mexican American | 0.27 (0.54) | 1.31 (0.46, 3.75) | 0.61 |
Male, Other | 1.11 (0.51) | 3.05 (1.12, 8.25) | 0.03 |

Note that we had to set `data = tabdata2` here, because `gee` objects don’t store all of the information on factor variables (unlike `glm` objects).
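If the `reshape` call in the GEE example looks opaque, this small sketch shows what it does. The data frame `wide` is hypothetical (two subjects, made-up values); the column naming pattern `bp.1`, `highbp.1`, etc. is the same as in the example, and `reshape` splits those names at the dot to work out the variable names and visit numbers.

```r
# Hypothetical wide data: one row per subject, one column per visit
wide <- data.frame(
  id       = 1:2,
  Age      = c(70, 65),
  bp.1     = c(120, 135), bp.2     = c(125, 140), bp.3     = c(118, 150),
  highbp.1 = c(0, 0),     highbp.2 = c(0, 1),     highbp.3 = c(0, 1)
)

# Wide to long: names like "bp.1" are split at "." into variable ("bp") and time (1)
long <- reshape(data = wide,
                varying = c("bp.1", "bp.2", "bp.3", "highbp.1", "highbp.2", "highbp.3"),
                timevar = "bp.visit", direction = "long")

# Put each subject's visits on consecutive rows
long <- long[order(long$id), ]
long
```

The result has one row per subject per visit (six rows here), with `bp` and `highbp` as single columns and `bp.visit` recording the visit number. Time-constant variables such as `Age` are repeated on each row.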

Survival model for mortality vs. predictors, again compressing the factor variables, and requesting slightly different columns (i.e. no p-values):

```
library("survival")
fit <- coxph(Surv(time = time, event = delta) ~ Age + Sex + Group, data = tabdata)
fit %>% tabcoxph(factor.compression = 5, columns = c("beta", "hr.ci")) %>% kable()
```

Variable | Beta | HR (95% CI) |
---|---|---|
Age | 0.03 | 1.03 (1.00, 1.06) |
Male | 0.01 | 1.01 (0.74, 1.39) |
Treatment | -0.05 | 0.95 (0.69, 1.30) |
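For readers curious where these two columns come from, here is a sketch that builds a similar table by hand from a `coxph` fit. It uses the `lung` dataset bundled with **survival** as a stand-in (an assumption for illustration; it is not `tabdata`), and it is not how `tabcoxph` is implemented.

```r
library(survival)

# Assumption: survival's bundled `lung` dataset stands in for tabdata
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

beta <- coef(fit)              # log hazard ratios
ci   <- summary(fit)$conf.int  # columns: exp(coef), exp(-coef), lower .95, upper .95

# Assemble "Beta" and "HR (95% CI)" columns as formatted strings
hr_table <- data.frame(
  Variable = names(beta),
  Beta     = sprintf("%.2f", beta),
  HR_CI    = sprintf("%.2f (%.2f, %.2f)",
                     ci[, "exp(coef)"], ci[, "lower .95"], ci[, "upper .95"])
)
hr_table
```

The hazard ratio is `exp(beta)`, so a Beta of 0.03 for age in the table above corresponds to the HR of 1.03.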

The functions in **tab** can also accommodate complex survey data. To illustrate with the included dataset `tabsvydata` (which is data from NHANES 2003-2004, except for the made-up variables `time` and `event`), here’s a Table 1:

```
library("survey")
design <- svydesign(
  data = tabsvydata,
  ids = ~sdmvpsu,
  strata = ~sdmvstra,
  weights = ~wtmec2yr,
  nest = TRUE
)
tabmulti.svy(Age + Race + BMI ~ Sex, design = design) %>% kable()
```

Variable | Female | Male | P |
---|---|---|---|
Age, M (SD) | 37.0 (22.5) | 34.8 (21.7) | <0.001 |
Race, % (SE) | | | 0.08 |
Non-Hispanic White | 69.7 (3.7) | 69.6 (3.8) | |
Non-Hispanic Black | 13.2 (2.0) | 11.9 (1.9) | |
Mexican American | 8.6 (2.1) | 9.8 (2.2) | |
Other | 8.4 (1.0) | 8.8 (1.3) | |
BMI, M (SD) | 26.4 (7.5) | 26.0 (6.4) | 0.11 |
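The means and percentages in a survey table are weighted, so they estimate population quantities rather than describing the rows of the dataset. A tiny base-R sketch of the difference, with made-up values and weights (not NHANES data):

```r
# Hypothetical mini-sample: BMI values with unequal sampling weights
bmi  <- c(22.0, 31.5, 27.0, 24.5)
wt   <- c(1000, 250, 4000, 750)  # made-up weights, not NHANES values
male <- c(TRUE, FALSE, TRUE, FALSE)

unweighted_mean <- mean(bmi)                   # treats every row equally
weighted_mean   <- weighted.mean(bmi, w = wt)  # each row counts in proportion to its weight

# Weighted percentage of males: share of total weight, not share of rows
pct_male <- 100 * sum(wt[male]) / sum(wt)

c(unweighted = unweighted_mean, weighted = weighted_mean, pct_male = pct_male)
```

Point estimates like these are easy to compute by hand, but the standard errors and p-values also depend on the strata and sampling units, which is why the design object from `svydesign` is passed to the `.svy` functions.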

And here’s a linear regression:

```
fit <- svyglm(BMI ~ Age + Sex + Race, design = design)
fit %>% tabglm(factor.compression = 3) %>% kable()
```

Variable | Beta (SE) | 95% CI | P |
---|---|---|---|
Intercept | 20.95 (0.34) | (20.27, 21.62) | <0.001 |
Age | 0.14 (0.00) | (0.13, 0.15) | <0.001 |
Female (ref) | - | - | - |
Male | -0.07 (0.23) | (-0.51, 0.37) | 0.76 |
Non-Hispanic White (ref) | - | - | - |
Non-Hispanic Black | 1.91 (0.23) | (1.46, 2.35) | <0.001 |
Mexican American | 1.06 (0.30) | (0.47, 1.66) | 0.006 |
Other | -1.09 (0.33) | (-1.73, -0.45) | 0.007 |

All of the functions in **tab** have an argument called `print.html`, which can be used to export tables to word processors. Setting `print.html = TRUE` will write an HTML table to your current working directory. You can open the table (e.g. in Chrome) and copy/paste it into your report.
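If you would rather control the file name and location yourself, a similar result can be had with **knitr** alone. This is an alternative route, not what `print.html` does internally; the table contents and the file name `table1.html` below are placeholders for illustration.

```r
library(knitr)

# Hypothetical summary table, typed in by hand for illustration
tbl <- data.frame(
  Variable  = c("Age, M (SD)", "Male, n (%)"),
  Control   = c("70.5 (5.3)", "43 (31.6)"),
  Treatment = c("69.5 (5.9)", "99 (61.5)")
)

# Render as HTML and write to a file (tempdir() here; use any path you like)
html     <- kable(tbl, format = "html")
out_file <- file.path(tempdir(), "table1.html")
writeLines(as.character(html), out_file)

out_file
```

Opening the resulting file in a browser and copying the table into a word processor works the same way as described above.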

I used **knitr**’s `kable` function for the examples here, but other approaches should also work (e.g. **xtable**’s `xtable` or **pander**’s `pandoc.table`).

Lumley, Thomas. 2019. *Survey: Analysis of Complex Survey Samples*. https://CRAN.R-project.org/package=survey.

Lumley, Thomas, and others. 2004. “Analysis of Complex Survey Samples.” *Journal of Statistical Software* 9 (1): 1–19.

Carey, Vincent J. Ported to R by Thomas Lumley and Brian Ripley. 2015. *Gee: Generalized Estimation Equation Solver*. https://CRAN.R-project.org/package=gee.

Therneau, Terry M., and Patricia M. Grambsch. 2000. *Modeling Survival Data: Extending the Cox Model*. New York: Springer.

Therneau, Terry M. 2015. *A Package for Survival Analysis in S*. https://CRAN.R-project.org/package=survival.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In *Implementing Reproducible Computational Research*, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

———. 2015. *Dynamic Documents with R and Knitr*. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.name/knitr/.

———. 2018. *Knitr: A General-Purpose Package for Dynamic Report Generation in R*. https://yihui.name/knitr/.