How to show characteristics of study population in R with a single line of code

One advantage of R is the availability of packages dedicated to a single task. For example, the tableone package creates a table one with very little code. If you are not familiar with Table 1, it is the table used in research papers to describe the characteristics of the study population: means and standard deviations for continuous variables, and counts and proportions for categorical variables.

Load the libraries

library(tidyverse)
library(RNHANES)
library(tableone)
library(labelled)

I will import the data from the NHANES 2009-2010 cycle. The variables I selected are:

SEQN: id
RIAGENDR: gender
RIDAGEYR: age in years
BMXBMI: body mass index
LBXVIDMS: vitamin D
RIDRETH1: race/ethnicity
DIQ010: diabetes yes/no

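# Join the 2009-2010 demographics, diabetes, body measures and vitamin D files by SEQN,
# keeping only the variables listed above after each join
dat = nhanes_load_data("DEMO_F", "2009-2010") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1) %>%
  left_join(nhanes_load_data("DIQ_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010) %>%
  left_join(nhanes_load_data("BMX_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010, BMXBMI) %>%
  left_join(nhanes_load_data("VID_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010, BMXBMI, LBXVIDMS) %>%
  # Recode the numeric NHANES codes into factors with readable levels
  mutate(gender = recode_factor(RIAGENDR,
                                `1` = "Males",
                                `2` = "Females"),
         # A BMI of 25 or more is classed as overweight, anything below as normal weight
         BMI = as.factor(if_else(BMXBMI >= 25, "Overweight", "Normal weight")),
         # Only codes 1 and 2 are recoded; other DIQ010 codes end up as NA
         dm = recode_factor(DIQ010,
                            `1` = "Yes",
                            `2` = "No"),
         # Codes 1 and 2 are merged into a single Hispanic level
         race = recode_factor(RIDRETH1,
                              `1` = "Hispanic",
                              `2` = "Hispanic",
                              `3` = "White",
                              `4` = "Black",
                              `5` = "Others")) %>%
  # Keep the id, the two continuous variables and the recoded factors
  select(SEQN, RIDAGEYR, LBXVIDMS, gender, BMI, dm, race)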

Table One

OK, the dataset is ready. To create a quick and simple table one, run the following code.

dat %>% 
  CreateTableOne(vars = select(dat, -SEQN) %>% names(), data = .) %>% 
  kableone()

	Overall
n	10537
RIDAGEYR (mean (SD))	32.60 (24.91)
LBXVIDMS (mean (SD))	64.06 (24.41)
gender = Females (%)	5312 (50.4)
BMI = Overweight (%)	4880 (51.8)
dm = No (%)	9230 (92.6)
race (%)
Hispanic	3517 (33.4)
White	4420 (41.9)
Black	1957 (18.6)
Others	643 ( 6.1)

The expression select(dat, -SEQN) %>% names() returns the names of all variables except the id (SEQN). To exclude other variables from the table one, add them to select() with a minus sign in the same way.
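As a minimal sketch of how the negative selection builds the vars vector, the example below drops two columns at once. The data frame toy and its columns (id, age, bmi, notes) are made up for illustration and are not NHANES variables.

```r
library(dplyr)

# Hypothetical toy data; the column names are invented for this example
toy <- tibble(
  id    = 1:4,
  age   = c(34, 51, 27, 63),
  bmi   = c(22.1, 30.4, 26.7, 24.9),
  notes = c("a", "b", "c", "d")
)

# Drop the id and the free-text column, then keep the names of what is left
vars <- toy %>%
  select(-id, -notes) %>%
  names()

print(vars)
```

Here vars contains only "age" and "bmi", which is the kind of character vector that the vars argument of CreateTableOne() expects.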

However, a table one for publication should show descriptive labels rather than raw variable names. For this, I will use the package labelled.

var_label(dat) = list(
  RIDAGEYR = "Age, years",
  BMI = "BMI",
  LBXVIDMS = "Vitamin D",
  gender = "Females",
  dm = "Diabetes ")
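Variables left out of the list, like race here, keep their column name in the table. A quick way to check which labels were attached is to read them back with var_label(). The sketch below does this on a small made-up data frame (toy, age and grp are hypothetical names), so it runs without downloading NHANES.

```r
library(labelled)

# Hypothetical toy data; only `age` receives a label
toy <- data.frame(
  age = c(34, 51, 27),
  grp = factor(c("a", "b", "a"))
)

var_label(toy) <- list(age = "Age, years")

# Read the labels back: a labelled variable returns its label,
# an unlabelled one returns NULL
age_label <- var_label(toy$age)
grp_label <- var_label(toy$grp)

print(age_label)
print(is.null(grp_label))
```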

This is how the table one looks with variable labels:

dat %>% 
    CreateTableOne(
    vars = select(dat, -SEQN) %>% names(), 
    test = FALSE,
    data = .) -> tab_one
print(tab_one, varLabels = TRUE)
##                         
##                          Overall      
##   n                      10537        
##   Age, years (mean (SD)) 32.60 (24.91)
##   Vitamin D (mean (SD))  64.06 (24.41)
##   Females = Females (%)   5312 (50.4) 
##   BMI = Overweight (%)    4880 (51.8) 
##   Diabetes  = No (%)      9230 (92.6) 
##   race (%)                            
##      Hispanic             3517 (33.4) 
##      White                4420 (41.9) 
##      Black                1957 (18.6) 
##      Others                643 ( 6.1)

Often, I view the characteristics of the population stratified by gender. In other words, I compare men with women.

dat %>% 
  CreateTableOne(
    vars = select(dat, -SEQN, -gender) %>% names(),
    strata ="gender",
    data = .,
    test = FALSE) -> tab_one
print(tab_one, varLabels = TRUE) 
##                         Stratified by gender
##                          Males         Females      
##   n                       5225          5312        
##   Age, years (mean (SD)) 32.24 (24.94) 32.95 (24.87)
##   Vitamin D (mean (SD))  63.91 (22.22) 64.22 (26.39)
##   BMI = Overweight (%)    2427 (51.9)   2453 (51.8) 
##   Diabetes  = No (%)      4573 (92.4)   4657 (92.8) 
##   race (%)                                          
##      Hispanic             1739 (33.3)   1778 (33.5) 
##      White                2202 (42.1)   2218 (41.8) 
##      Black                 972 (18.6)    985 (18.5) 
##      Others                312 ( 6.0)    331 ( 6.2)

To show the p-values for the comparison between strata, set test = TRUE.
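The sketch below illustrates the effect of test = TRUE on a small simulated data set, so it runs without downloading NHANES. The data frame toy and its columns (group, age, smoker) are assumptions made for this example. As far as I know, tableone defaults to oneway.test() for continuous variables and chisq.test() for categorical ones, but check the package documentation before reporting the results.

```r
library(tableone)

# Simulated toy data; names and values are invented for illustration
set.seed(1)
toy <- data.frame(
  group  = factor(rep(c("A", "B"), each = 50)),
  age    = c(rnorm(50, mean = 40, sd = 10), rnorm(50, mean = 45, sd = 10)),
  smoker = factor(sample(c("No", "Yes"), 100, replace = TRUE))
)

tab_p <- CreateTableOne(
  vars   = c("age", "smoker"),
  strata = "group",
  data   = toy,
  test   = TRUE
)

# printToggle = FALSE returns the formatted table as a character matrix
# instead of printing it, which makes the extra columns easy to inspect
mat <- print(tab_p, printToggle = FALSE)
print(colnames(mat))
```

With test = TRUE, the matrix gains a p column next to the strata columns.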

To add more styling to your table, I suggest taking a look at the kableExtra package.
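One possible way to combine the two packages is sketched below: convert the table one to a character matrix, pass it to knitr::kable() as HTML, and style the result with kable_styling(). The toy data and the chosen bootstrap options are assumptions made for illustration, not a recommended layout.

```r
library(tableone)
library(kableExtra)

# Hypothetical toy data; names and values are invented for this example
toy <- data.frame(
  age    = c(34, 51, 27, 63, 45, 38),
  smoker = factor(c("No", "Yes", "No", "No", "Yes", "No"))
)

tab <- CreateTableOne(vars = c("age", "smoker"), data = toy)

# Get the formatted table as a character matrix without printing it
mat <- print(tab, printToggle = FALSE)

# Render as HTML and apply striped, hover-highlighted rows
styled <- kable_styling(
  knitr::kable(mat, format = "html"),
  bootstrap_options = c("striped", "hover"),
  full_width = FALSE
)
```

In an R Markdown document rendered to HTML, printing styled in a chunk displays the formatted table.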
