In this post, I will show you the advantages of using a heatmap to visualize data. The key feature of a heatmap is that it shows the value of a third variable as color intensity across two variables of interest (i.e., X and Y). This is very useful when you want to give a general overview of your variables.
The task of this analysis is to visualize BMI across age and race in Americans using NHANES data.
Load the libraries
library(RNHANES)
library(tidyverse)
Select the dataset from NHANES:
dt1314 = nhanes_load_data("DEMO_H", "2013-2014") %>%
  select(SEQN, cycle, RIDAGEYR, RIDRETH1, INDFMIN2) %>%
  transmute(SEQN=SEQN, wave=cycle, Age=RIDAGEYR, RIDRETH1, INDFMIN2) %>%
  left_join(nhanes_load_data("BMX_H", "2013-2014"), by="SEQN") %>%
  select(SEQN, wave, Age, RIDRETH1, INDFMIN2, BMXBMI)
Recode and modify variables
I prepare the data by keeping only participants older than 18 years and removing those with missing BMI. I also rename and recode some variables.
dat = dt1314 %>%
  filter(Age > 18, !is.na(BMXBMI)) %>%
  rename(BMI = BMXBMI) %>%
  mutate(Race = recode_factor(RIDRETH1,
                              `1` = "Mexican American",
                              `2` = "Hispanic",
                              `3` = "Non-Hispanic, White",
                              `4` = "Non-Hispanic, Black",
                              `5` = "Others"))
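To see what recode_factor does with the numeric codes, here is a minimal sketch on a made-up vector. The vector `codes` is hypothetical and only stands in for RIDRETH1; it is not NHANES data.

```r
library(dplyr)

# Hypothetical race codes, standing in for the RIDRETH1 column
codes <- c(3, 1, 4, 1, 5)

# Map each numeric code to a label; backticks are needed because
# the names on the left-hand side are numbers
race <- recode_factor(codes,
                      `1` = "Mexican American",
                      `3` = "Non-Hispanic, White",
                      `4` = "Non-Hispanic, Black",
                      `5` = "Others")

levels(race)  # levels follow the order of the replacements in the call
table(race)   # number of observations per label
```

The levels come out in the order they are listed in the call, not in order of appearance in the data. That order is also the one ggplot2 uses when the factor is placed on a discrete axis.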
Visualization
Now, when I visualize data across two variables, the first thing that comes to mind is a line or point plot.
geom_line
ggplot(dat, aes(x = Age, y = BMI)) +
  geom_line(aes(color = Race))
It is difficult to grasp anything in the plot above.
Let's try the function facet_wrap to separate the races from each other.
facet_wrap
ggplot(dat, aes(x = Age, y = BMI)) +
  geom_line(aes(color = Race)) +
  facet_wrap(~Race)
This plot is better, but it would still be good to have everything in one figure.
Heatmap
geom_raster is the function to build a heatmap.
ggplot(dat, aes(Age, Race)) +
  geom_raster(aes(fill = BMI))
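One thing to keep in mind: several participants share the same age and race, so more than one row falls into each cell of the raster, and a cell can only show one fill value. A possible refinement, sketched below under my own assumptions, is to summarise BMI per cell first and plot the summary. The data frame `toy`, its simulated values, and the choice of the mean as the summary are all illustrative and are not part of the original analysis.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for `dat` (hypothetical values, not NHANES data)
set.seed(1)
toy <- data.frame(
  Age  = sample(19:80, 2000, replace = TRUE),
  Race = sample(c("Group A", "Group B", "Group C"), 2000, replace = TRUE),
  BMI  = rnorm(2000, mean = 28, sd = 6)
)

# One row per Age x Race cell: the mean BMI and the number of participants
cell_means <- toy %>%
  group_by(Age, Race) %>%
  summarise(mean_BMI = mean(BMI), n = n(), .groups = "drop")

# Heatmap of the per-cell mean instead of the raw observations
p <- ggplot(cell_means, aes(Age, Race)) +
  geom_raster(aes(fill = mean_BMI))
p
```

With the real data, the same group_by and summarise step could be applied to `dat` before plotting. The column `n` is kept so that cells based on very few participants can be spotted.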
To set your own colors, use the scale_fill_gradientn function.
ggplot(dat, aes(Age, Race)) +
  geom_raster(aes(fill = BMI)) +
  scale_fill_gradientn(colours=c("white", "red"))
With this plot, first, I can distinguish the highest BMI immediately across age and race. Second, it is easy to compare the values of BMI by race for a given age. Third, all this information is in one plot.
If you have a suggestion for visualizing the data, or if I missed any critical ggplot2 function, please comment below.