Creating a new variable, or transforming an old variable into a new one, is usually a simple task in R. The basic form is newvariable <- oldvariable, where <- is the assignment operator. In a data frame, new variables are always added horizontally, as new columns. The arithmetic operators * (multiplication), + (addition), - (subtraction) and / (division) are commonly used to create new variables.
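These operators work element by element, so the same idea applies to plain vectors before they go into a data frame. The following is a minimal sketch using two made-up vectors, x and y. These are hypothetical names and are not part of the dataset below.

```r
# Two hypothetical numeric vectors of equal length
x <- c(10, 20, 30)
y <- c(2, 4, 5)

# Each operator is applied position by position, producing a new vector
sum_xy  <- x + y   # 12 24 35
diff_xy <- x - y   # 8 16 25
prod_xy <- x * y   # 20 80 150
div_xy  <- x / y   # 5 5 6

print(prod_xy)
```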
Let's create a dataset:
hospital <- c("New York", "California")
patients <- c(150, 350)
costs <- c(3.1, 2.5)
df <- data.frame(hospital, patients, costs)
The dataset we created is called df. Printing it shows:
df
    hospital patients costs
1   New York      150   3.1
2 California      350   2.5
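The same data frame can also be built in a single call by naming the columns inside data.frame(). This avoids leaving the separate vectors in the workspace. In the sketch below, stringsAsFactors = FALSE is set explicitly so that hospital stays a character column in older R versions too. This has been the default since R 4.0.0.

```r
# Build the data frame directly from named columns
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5),
  stringsAsFactors = FALSE
)

# Inspect the structure: number of rows, column names and column types
str(df)
```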
Now we will create a new variable called totcosts, the number of patients multiplied by the costs, as shown below:
df$totcosts <- df$patients * df$costs
Let's see the dataset again:
df
    hospital patients costs totcosts
1   New York      150   3.1      465
2 California      350   2.5      875
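Base R also offers transform() and within(), which evaluate the expression inside the data frame so the df$ prefix can be left out. The sketch below rebuilds the example data so it runs on its own. Note that both functions return a modified copy rather than changing df in place.

```r
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# transform(): new columns are given as name = expression
df2 <- transform(df, totcosts = patients * costs)

# within(): the same result using ordinary assignment syntax
df3 <- within(df, totcosts <- patients * costs)

# df itself still has only its original three columns
print(df2)
```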
Next, we rename and recode variables. To rename costs to costs_euro, we first copy it into a new variable:
df$costs_euro <- df$costs
Then we delete the old variable by assigning NULL to it:
df$costs <- NULL
Let's see the dataset again:
df
    hospital patients totcosts costs_euro
1   New York      150      465        3.1
2 California      350      875        2.5
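Copying and deleting works, but a column can also be renamed in place by changing its entry in names(df). With this approach the column keeps its position. The sketch below rebuilds the example data. It also shows dropping several columns at once by subsetting, as an alternative to assigning NULL one column at a time.

```r
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)
df$totcosts <- df$patients * df$costs

# Replace the name "costs" with "costs_euro" without copying the data
names(df)[names(df) == "costs"] <- "costs_euro"

# Keep every column except patients and totcosts; drop = FALSE keeps a data frame
df_small <- df[, setdiff(names(df), c("patients", "totcosts")), drop = FALSE]

print(df)
print(df_small)
```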
Here is an example of how to recode the variable patients, changing 150 to 100 and 350 to 300 (any other value becomes NA):
df$patients <- ifelse(df$patients==150, 100, ifelse(df$patients==350, 300, NA))
Let's see the dataset again:
df
    hospital patients totcosts costs_euro
1   New York      100      465        3.1
2 California      300      875        2.5
For recoding I used the function ifelse(), but other approaches work as well.
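This sketch shows two other base R approaches on a small hypothetical vector that also contains a value (200) not covered by any rule. The first is a lookup table built with match(). The second is logical indexing. They differ in how they treat unmatched values.

```r
patients <- c(150, 350, 200)

# Lookup-table recode: old values and their replacements, position by position
old_values <- c(150, 350)
new_values <- c(100, 300)

# match() returns each value's position in old_values (NA if absent),
# so unmatched values become NA, like the nested ifelse() above
recoded_match <- new_values[match(patients, old_values)]   # 100 300 NA

# Logical-index recode: only matching entries are overwritten, so 200 is kept.
# The assignments run in sequence, so a new value must not equal a later old value.
recoded_index <- patients
recoded_index[recoded_index == 150] <- 100
recoded_index[recoded_index == 350] <- 300                 # 100 300 200
```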
Merging datasets
Merging datasets means combining different datasets into one. If the datasets are stored in different locations, you first need to import them into R, as explained previously. You can merge columns, by adding new variables, or merge rows, by adding observations.
To add columns, use the function merge(), which requires the datasets being merged to share a common variable (a key). If the datasets do not have a common variable, use the function cbind() instead. However, cbind() simply places the datasets side by side. Both datasets must therefore have the same number of rows, in the same order.
In the code below, dataset1 and dataset2 are merged by the variable id, which is present in both datasets, so new columns are added:
finaldt <- merge(dataset1, dataset2, by="id")
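The sketch below uses two small, invented versions of dataset1 and dataset2. It shows that merge() matches rows by id regardless of their order. By default, merge() keeps only ids found in both datasets, while all = TRUE keeps every id and fills the gaps with NA. The arguments all.x and all.y keep the unmatched rows of just one side.

```r
# Two small hypothetical datasets that share the key column "id"
dataset1 <- data.frame(id = c(1, 2, 3), hospital = c("A", "B", "C"))
dataset2 <- data.frame(id = c(3, 1, 4), patients = c(30, 10, 40))

# Default: only ids present in both datasets (here 1 and 3)
inner <- merge(dataset1, dataset2, by = "id")

# all = TRUE: every id from either dataset, with NA where a value is missing
full <- merge(dataset1, dataset2, by = "id", all = TRUE)

print(inner)
print(full)
```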
Alternatively, when we know that both datasets have their rows in the same order, we can add columns with cbind():
finaldt <- cbind(dataset1, dataset2)
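Because cbind() pairs rows purely by position, sorting both datasets on a shared identifier before binding helps avoid silent mismatches. The sketch below uses hypothetical datasets that contain an id column only so that the ordering can be checked. The duplicate id column is left out of the second dataset.

```r
# Hypothetical datasets with the same ids, stored in different row orders
dataset1 <- data.frame(id = c(2, 1, 3), hospital = c("B", "A", "C"))
dataset2 <- data.frame(id = c(1, 3, 2), patients = c(10, 30, 20))

# Sort both datasets by id so that row i refers to the same unit in each
dataset1 <- dataset1[order(dataset1$id), ]
dataset2 <- dataset2[order(dataset2$id), ]

# Stop with an error if the rows still do not line up
stopifnot(identical(dataset1$id, dataset2$id))

# Bind side by side, taking only the new column from dataset2
finaldt <- cbind(dataset1, dataset2["patients"])
print(finaldt)
```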
To add rows, use the function rbind(). When you merge datasets by rows, the datasets must have exactly the same variable names and the same number of variables.
Here is an example of merging datasets by adding rows:
finaldt <- rbind(dataset1, dataset2)
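For data frames, rbind() matches columns by name, so the variables may appear in a different order. However, a mismatched variable name causes an error. The sketch below uses hypothetical datasets to show both cases and catches the error with tryCatch() so the script keeps running.

```r
# Two hypothetical datasets with the same variables in a different column order
dataset1 <- data.frame(id = c(1, 2), patients = c(150, 350))
dataset2 <- data.frame(patients = c(200), id = c(3))

# Columns are matched by name, so the stacked result is id, patients
finaldt <- rbind(dataset1, dataset2)
print(finaldt)

# A dataset with a different variable name cannot be stacked
dataset3 <- data.frame(id = 4, n_patients = 120)
result <- tryCatch(rbind(dataset1, dataset3),
                   error = function(e) conditionMessage(e))
print(result)
```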
Do you have any questions? Post a comment below.