In this post, I will show how to run a linear regression analysis for multiple independent or dependent variables. This should not be confused with a multivariable-adjusted model; this tutorial is not about multivariable models.
The goal is to automate the regression analysis when you want to investigate more than one independent or dependent variable at a time, and to avoid repeating many separate models such as the ones shown below:
# lm(dependent ~ independent1 + var1 + var2, data=data)
# lm(dependent ~ independent2 + var1 + var2, data=data)
# lm(dependent ~ independent3 + var1 + var2, data=data)
Such investigations are common in genetics, but they also come in handy whenever I have several variables to investigate. I have previously seen similar posts that build loops to run regression analyses for multiple variables, but loops can be challenging to create and edit.
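For reference, a loop-based version might look something like the sketch below; the data frame and variable names (data, dependent, independent1, and so on) are placeholders matching the commented formulas above, not objects defined in this post.
# Hypothetical loop: fit one model per independent variable of interest.
# 'data', 'dependent', 'independent1'..'independent3', 'var1', 'var2' are placeholders.
independents <- c("independent1", "independent2", "independent3")
models <- list()
for (x in independents) {
  f <- as.formula(paste("dependent ~", x, "+ var1 + var2"))
  models[[x]] <- lm(f, data = data)
}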
Below I will show how you can quickly achieve the same results by using the broom and other packages.
Load the libraries
library(reshape)
library(tidyverse)
library(broom)
Create the dataset
This dataset has 7 variables: id, age, sex, bmi, sbp, dbp, and gluc.
set.seed(3319)
dt <- data.frame(
  id   = 1:100,
  age  = rnorm(100, mean = 65, sd = 8),
  sex  = sample(c("Women", "Men"), 100, replace = TRUE, prob = c(0.4, 0.6)),
  bmi  = rnorm(100, mean = 27, sd = 4),
  sbp  = rnorm(100, mean = 145, sd = 30),
  dbp  = rnorm(100, mean = 80, sd = 20),
  gluc = rnorm(100, mean = 140, sd = 50)
)
dt %>%
slice(1:5)
## id age sex bmi sbp dbp gluc
## 1 1 62.93794 Women 29.17697 114.7606 80.10420 82.78492
## 2 2 77.51981 Women 32.85751 162.5616 84.43917 109.96731
## 3 3 63.61055 Men 38.33631 181.8345 67.32783 184.93135
## 4 4 54.69469 Women 27.92725 115.0467 97.35000 208.07860
## 5 5 63.26105 Women 21.67979 156.6037 69.39819 149.68155
Let's suppose I am interested in the association of bmi (dependent) with gluc, sbp, and dbp. Normally, I would have to create 3 separate models, one for each independent variable. See the example below:
# lm(bmi ~ gluc + age + sex, data=dt)
# lm(bmi ~ sbp + age + sex, data=dt)
# lm(bmi ~ dbp + age + sex, data=dt)
While 3 models are doable, 100 models would be extremely time-consuming, so it is worth automating the process. The first step is to transform the dataset from wide to long format. The variables I am interested in evaluating (in this example gluc, sbp, and dbp) are the ones reshaped into long format, while the rest are kept in id.vars.
The code below transforms the dataset from wide to long using the melt function.
long = melt(dt, id.vars = c("id", "bmi", "age", "sex"))
long %>%
slice(1:5)
## id bmi age sex variable value
## 1 1 29.17697 62.93794 Women sbp 114.7606
## 2 2 32.85751 77.51981 Women sbp 162.5616
## 3 3 38.33631 63.61055 Men sbp 181.8345
## 4 4 27.92725 54.69469 Women sbp 115.0467
## 5 5 21.67979 63.26105 Women sbp 156.6037
Two new columns are created: one column (variable) shows the variable name, and the other (value) holds the corresponding value for that variable. If the transformation from wide to long is not clear, go back to the dt dataset and compare its values with those in long.
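If you prefer to stay within the tidyverse, the same reshaping can be done with pivot_longer from tidyr (loaded as part of the tidyverse); this is a minimal sketch that should produce an equivalent long dataset, apart from row order and the tibble class:
# Reshape sbp, dbp and gluc into variable/value columns with tidyr
long2 <- dt %>%
  pivot_longer(cols = c(sbp, dbp, gluc),
               names_to = "variable",
               values_to = "value")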
Linear regression
Now that the dataset is ready, I will run a linear regression by group. The grouping variable (variable) includes gluc, sbp, and dbp, and the value column used in the model contains the values of whichever variable defines the group. In addition, I do some further coding: I filter by term to keep only the variables of interest, and I round and rename the estimate, std.error, and p.value columns.
long %>%
  group_by(variable) %>%
  do(tidy(lm(bmi ~ value + age + sex, .))) %>%
  filter(term == "value") %>%
  mutate(Beta = as.character(round(estimate, 3)),
         "P Value" = round(p.value, 3),
         SE = round(std.error, 3)) %>%
  select(Beta, SE, "P Value") %>%
  as.data.frame()
## variable Beta SE P Value
## 1 sbp -0.001 0.013 0.965
## 2 dbp -0.02 0.020 0.313
## 3 gluc -0.001 0.007 0.883
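As a side note, do() has been superseded in recent versions of dplyr, so the same table can also be built with nest() and map() from tidyr and purrr (both loaded with the tidyverse); this is a minimal sketch, assuming a recent tidyr version:
# Nest the data by variable, fit one model per group, and tidy the results
long %>%
  nest(data = -variable) %>%
  mutate(fit = map(data, ~ lm(bmi ~ value + age + sex, data = .x)),
         tidied = map(fit, tidy)) %>%
  unnest(tidied) %>%
  filter(term == "value") %>%
  select(variable, estimate, std.error, p.value)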
Having the results of the regression analysis in this form allows me to export them to a table or a PDF report.
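For example, the pipeline above can be stored in an object and then exported; the file name below is only an example:
# Same pipeline as above, assigned to an object
results <- long %>%
  group_by(variable) %>%
  do(tidy(lm(bmi ~ value + age + sex, .))) %>%
  filter(term == "value") %>%
  mutate(Beta = as.character(round(estimate, 3)),
         "P Value" = round(p.value, 3),
         SE = round(std.error, 3)) %>%
  select(Beta, SE, "P Value") %>%
  as.data.frame()

# Write to CSV; knitr::kable(results) would render the same table in an R Markdown report
write.csv(results, "regression_results.csv", row.names = FALSE)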
If you have questions or comments about this post please let me know.