An advantage of R is the availability of packages dedicated to a single task. For example, the tableone package creates a Table 1 with little coding effort. If you are not familiar with Table 1, it is used in research to describe the characteristics of the study population, typically means and standard deviations for continuous variables and counts and proportions for categorical variables.
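To make the idea concrete, here is a minimal base R sketch of the quantities a Table 1 reports. The `toy` data frame and its values are made up for illustration and are not NHANES data.

```r
# Hypothetical toy data (illustrative values only, not from NHANES)
toy <- data.frame(
  age    = c(34, 51, 27, 45, 62, 39),
  gender = factor(c("Males", "Females", "Females", "Males", "Females", "Females"))
)

# Continuous variable: mean and standard deviation
age_mean <- mean(toy$age)
age_sd   <- sd(toy$age)

# Categorical variable: count and percentage for each level
gender_n   <- table(toy$gender)
gender_pct <- round(100 * prop.table(gender_n), 1)

cat(sprintf("Age, mean (SD): %.2f (%.2f)\n", age_mean, age_sd))
print(data.frame(
  level   = names(gender_n),
  n       = as.vector(gender_n),
  percent = as.vector(gender_pct)
))
```

tableone automates exactly this kind of bookkeeping for every variable at once.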
Load the libraries
library(tidyverse)
library(RNHANES)
library(tableone)
library(labelled)
I will import the data from NHANES. The variables I selected are:
- SEQN: id
- RIAGENDR: gender
- RIDAGEYR: age in years
- BMXBMI: body mass index
- LBXVIDMS: vitamin D
- RIDRETH1: race/ethnicity
- DIQ010: diabetes yes/no
# Download four 2009-2010 NHANES files and merge them by participant id (SEQN)
dat = nhanes_load_data("DEMO_F", "2009-2010") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1) %>%
  left_join(nhanes_load_data("DIQ_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010) %>%
  left_join(nhanes_load_data("BMX_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010, BMXBMI) %>%
  left_join(nhanes_load_data("VID_F", "2009-2010"), by = "SEQN") %>%
  select(SEQN, RIAGENDR, RIDAGEYR, RIDRETH1, DIQ010, BMXBMI, LBXVIDMS) %>%
  # Recode the numeric NHANES codes into factors with readable levels
  mutate(
    gender = recode_factor(RIAGENDR,
                           `1` = "Males",
                           `2` = "Females"),
    # Dichotomize BMI at 25 kg/m2
    BMI = as.factor(if_else(BMXBMI >= 25, "Overweight", "Normal weight")),
    dm = recode_factor(DIQ010,
                       `1` = "Yes",
                       `2` = "No"),
    # Codes 1 (Mexican American) and 2 (Other Hispanic) are combined
    race = recode_factor(RIDRETH1,
                         `1` = "Hispanic",
                         `2` = "Hispanic",
                         `3` = "White",
                         `4` = "Black",
                         `5` = "Others")) %>%
  # Keep only the id and the analysis variables
  select(SEQN, RIDAGEYR, LBXVIDMS, gender, BMI, dm, race)
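The recoding step can be tried without downloading anything. The sketch below applies the same `recode_factor()` and `if_else()` calls to a made-up data frame; the `toy` object and its values are my own illustration, only the column names mirror NHANES.

```r
library(dplyr)

# Hypothetical values; only the column names mirror the NHANES variables
toy <- data.frame(
  RIAGENDR = c(1, 2, 2, 1),
  BMXBMI   = c(22.4, 31.0, NA, 25.0)
)

toy <- toy %>%
  mutate(
    # Factor levels follow the order in which the replacements are listed
    gender = recode_factor(RIAGENDR, `1` = "Males", `2` = "Females"),
    # A missing BMI stays missing instead of falling into either category
    BMI = as.factor(if_else(BMXBMI >= 25, "Overweight", "Normal weight"))
  )

print(levels(toy$gender))
print(table(toy$BMI, useNA = "ifany"))
```

Note that a BMI of exactly 25 lands in "Overweight" because the comparison is `>=`.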
Table One
OK, the dataset is ready. To create a quick and simple table one, run the following code.
dat %>%
  CreateTableOne(vars = select(dat, -SEQN) %>% names(), data = .) %>%
  kableone()
| | Overall |
|---|---|
| n | 10537 |
| RIDAGEYR (mean (SD)) | 32.60 (24.91) |
| LBXVIDMS (mean (SD)) | 64.06 (24.41) |
| gender = Females (%) | 5312 (50.4) |
| BMI = Overweight (%) | 4880 (51.8) |
| dm = No (%) | 9230 (92.6) |
| race (%) | |
| Hispanic | 3517 (33.4) |
| White | 4420 (41.9) |
| Black | 1957 (18.6) |
| Others | 643 ( 6.1) |
I use select(dat, -SEQN) %>% names() to get the names of all variables except the id (SEQN). To exclude other variables from the table, add them to select() with a leading minus sign (-).
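If you prefer base R, the same list of variable names can be built with `setdiff()`. This is an alternative I am sketching here, not what the code above uses; the `toy` data frame is hypothetical and only borrows the column names.

```r
# Hypothetical data frame that borrows column names from `dat`
toy <- data.frame(
  SEQN     = 1:3,
  RIDAGEYR = c(20, 35, 50),
  gender   = c("Males", "Females", "Males")
)

# All variable names except the id
vars_all <- setdiff(names(toy), "SEQN")

# Exclude several variables at once by listing them
vars_no_gender <- setdiff(names(toy), c("SEQN", "gender"))

print(vars_all)
print(vars_no_gender)
```

Either vector can be passed to the `vars` argument of `CreateTableOne()`.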
However, in a Table 1 for publication, I need to show descriptive labels instead of raw variable names. For this, I will use the labelled package.
var_label(dat) = list(
  RIDAGEYR = "Age, years",
  BMI = "BMI",
  LBXVIDMS = "Vitamin D",
  gender = "Females",
  dm = "Diabetes")
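To check which variables actually carry a label, `var_label()` can also be used as a getter. The sketch below runs on a made-up data frame (`toy` is hypothetical); any variable left without a label, like `race` here, is shown in the table under its raw name.

```r
library(labelled)

# Hypothetical data frame; only the column names mirror `dat`
toy <- data.frame(
  RIDAGEYR = c(20, 35),
  LBXVIDMS = c(60.1, 71.3),
  race     = c("White", "Black")
)

# Assign labels to some of the columns
var_label(toy) <- list(
  RIDAGEYR = "Age, years",
  LBXVIDMS = "Vitamin D"
)

# Read back a single label
print(var_label(toy$RIDAGEYR))

# List the columns that still have no label
labs <- var_label(toy)
unlabelled <- names(labs)[vapply(labs, is.null, logical(1))]
print(unlabelled)
```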
This is how Table 1 looks with variable labels:
dat %>%
  CreateTableOne(
    vars = select(dat, -SEQN) %>% names(),
    test = FALSE,
    data = .) -> tab_one
print(tab_one, varLabels = TRUE)
##
##                           Overall
##   n                       10537
##   Age, years (mean (SD))  32.60 (24.91)
##   Vitamin D (mean (SD))   64.06 (24.41)
##   Females = Females (%)   5312 (50.4)
##   BMI = Overweight (%)    4880 (51.8)
##   Diabetes = No (%)       9230 (92.6)
##   race (%)
##      Hispanic             3517 (33.4)
##      White                4420 (41.9)
##      Black                1957 (18.6)
##      Others                643 ( 6.1)
Often, I view the characteristics of the population stratified by gender. In other words, I compare men with women.
dat %>%
  CreateTableOne(
    vars = select(dat, -SEQN, -gender) %>% names(),
    strata = "gender",
    data = .,
    test = FALSE) -> tab_one
print(tab_one, varLabels = TRUE)
##                           Stratified by gender
##                           Males          Females
##   n                       5225           5312
##   Age, years (mean (SD))  32.24 (24.94)  32.95 (24.87)
##   Vitamin D (mean (SD))   63.91 (22.22)  64.22 (26.39)
##   BMI = Overweight (%)    2427 (51.9)    2453 (51.8)
##   Diabetes = No (%)       4573 (92.4)    4657 (92.8)
##   race (%)
##      Hispanic             1739 (33.3)    1778 (33.5)
##      White                2202 (42.1)    2218 (41.8)
##      Black                 972 (18.6)     985 (18.5)
##      Others                312 ( 6.0)     331 ( 6.2)
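If you want to experiment with stratification without downloading NHANES, a small made-up data frame is enough. In this sketch the `toy` data and its column names are hypothetical; `printToggle = FALSE` makes `print()` return the table as a character matrix instead of printing it, which is handy for exporting.

```r
library(tableone)

# Hypothetical toy data (illustrative values only)
toy <- data.frame(
  id     = 1:8,
  age    = c(34, 51, 27, 45, 62, 39, 58, 23),
  sex    = factor(c("M", "F", "M", "F", "M", "F", "M", "F")),
  smoker = factor(c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No"))
)

# Describe every column except the id and the stratifying variable
vars <- setdiff(names(toy), c("id", "sex"))

tab <- CreateTableOne(vars = vars, strata = "sex", data = toy, test = FALSE)

# Capture the formatted table as a character matrix, one column per stratum
m <- print(tab, printToggle = FALSE)
print(m, quote = FALSE)
```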
To show the p-value for the comparison, set test = TRUE.
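As far as I know, with its default settings tableone compares continuous variables with `oneway.test()` (assuming equal variances) and categorical variables with `chisq.test()` (with continuity correction); please check `?CreateTableOne` for your installed version. The base R sketch below runs those two tests on made-up data to show where such p-values come from. The `toy` data frame is hypothetical.

```r
# Hypothetical toy data (illustrative values only)
toy <- data.frame(
  gender = factor(rep(c("Males", "Females"), each = 6)),
  age    = c(34, 51, 27, 45, 62, 39, 36, 48, 29, 55, 60, 41),
  dm     = factor(c("Yes", "No", "No", "No", "Yes", "No",
                    "No", "No", "Yes", "No", "No", "No"))
)

# Continuous variable: one-way ANOVA with equal variances,
# which for two groups matches the classic two-sample t-test
p_age <- oneway.test(age ~ gender, data = toy, var.equal = TRUE)$p.value

# Categorical variable: chi-squared test with continuity correction.
# Warnings are suppressed only because the toy counts are tiny.
p_dm <- suppressWarnings(
  chisq.test(table(toy$dm, toy$gender), correct = TRUE)$p.value
)

cat(sprintf("age: p = %.3f\ndm:  p = %.3f\n", p_age, p_dm))
```

With small cell counts like these, an exact test would be more appropriate than the chi-squared approximation.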
To add more styling to your table, I suggest taking a look at the kableExtra package.