How to Create, Rename, Recode and Merge Variables in R

Creating a new variable, or transforming an existing variable into a new one, is usually a simple task in R.

The usual pattern is an assignment of the form newvariable <- oldvariable. In a data frame, each new variable is added as a new column to the right of the existing ones. New variables are typically created with the arithmetic operators * for multiplication, + for addition, - for subtraction, and / for division.

Let's create a dataset:

hospital <- c("New York", "California")
patients <- c(150, 350)
costs <- c(3.1, 2.5)
df <- data.frame(hospital, patients, costs)

The dataset we created is called df:

df
    hospital patients costs
1   New York      150   3.1
2 California      350   2.5

Now we will create a new variable called totcosts, as shown below:

df$totcosts <- df$patients * df$costs

Let's look at the dataset again:

df
    hospital patients costs totcosts
1   New York      150   3.1      465
2 California      350   2.5      875
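The other arithmetic operators work the same way. The following is only a minimal sketch: it uses a stand-alone copy of the example data under the hypothetical name demo, and the new column names, the flat fee of 0.5 and the baseline of 100 are assumptions made up for illustration.

```r
# Stand-alone copy of the example data (the name "demo" is hypothetical)
demo <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# Addition: add an assumed flat fee of 0.5 to each cost
demo$costs_plus_fee <- demo$costs + 0.5

# Subtraction: patients above an assumed baseline of 100
demo$patients_above_100 <- demo$patients - 100

# Division: each hospital's share of all patients
demo$patient_share <- demo$patients / sum(demo$patients)

demo
```

Each assignment appends one more column on the right, just like totcosts above.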

Next, we will rename and recode a variable in R.
Using the dataset above, we rename the variable costs by first copying it under a new name:

df$costs_euro <- df$costs

Then we delete the original variable by assigning NULL to it:

df$costs <- NULL

Let's look at the dataset again:

df
    hospital patients totcosts costs_euro
1   New York      150      465        3.1
2 California      350      875        2.5
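A variable can also be renamed in place, without copying and deleting, by changing the data frame's column names through names(). This is a sketch of one common base R approach rather than the only way to do it; the data frame name demo is hypothetical and is used so the example does not overwrite df.

```r
# Stand-alone copy of the example data (the name "demo" is hypothetical)
demo <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# Replace the name "costs" with "costs_euro"; values and column position stay the same
names(demo)[names(demo) == "costs"] <- "costs_euro"

names(demo)
```

Unlike the copy-and-delete approach, this keeps the renamed column in its original position.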

Here is an example of how to recode the variable patients:

df$patients <- ifelse(df$patients==150, 100, ifelse(df$patients==350, 300, NA))

Let's look at the dataset again:

df
    hospital patients totcosts costs_euro
1   New York      100      465        3.1
2 California      300      875        2.5

To recode the variable I used the function ifelse(), but you can use other functions as well.
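As an illustration of such alternatives, here is a minimal sketch with two base R options: logical indexing and cut(). The vector names recoded and size, the break points and the labels "small" and "large" are assumptions chosen for this example.

```r
# The original patient counts from the example
patients <- c(150, 350)

# Option 1: logical indexing into a vector that starts out as NA,
# so any value not listed below stays NA (like the final NA in ifelse)
recoded <- rep(NA_real_, length(patients))
recoded[patients == 150] <- 100
recoded[patients == 350] <- 300

# Option 2: cut() recodes ranges of values into labelled categories
# (hypothetical break points: up to 200 is "small", above 200 is "large")
size <- cut(patients, breaks = c(0, 200, Inf), labels = c("small", "large"))

recoded
size
```

Logical indexing is convenient for a handful of exact values, while cut() suits recoding a numeric variable into groups.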

Merging datasets

Merging datasets means combining different datasets into one. If the datasets are stored in different locations, you first need to import them into R, as we explained previously. You can merge by columns, which adds new variables, or by rows, which adds new observations.

To add columns, use the function merge(), which requires that the datasets you merge share a common variable. If the datasets don't have a common variable, use the function cbind() instead. However, cbind() pairs rows purely by position, so both datasets must have the same number of rows and be in the same order.

The code below merges dataset1 and dataset2 by the variable id, which is present in both datasets, and so adds new columns:

finaldt <- merge(dataset1, dataset2, by="id")
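To make this concrete, here is a small sketch with made-up contents for dataset1 and dataset2 (the values and the name finaldt_all are assumptions for illustration). By default merge() keeps only the rows whose id appears in both datasets; the all = TRUE argument keeps unmatched rows as well and fills the gaps with NA.

```r
# Made-up example data: the two datasets share the variable "id"
dataset1 <- data.frame(id = c(1, 2, 3), patients = c(150, 350, 200))
dataset2 <- data.frame(id = c(3, 1, 4), costs = c(4.0, 3.1, 2.5))

# Default merge: keeps only ids present in both datasets (1 and 3)
finaldt <- merge(dataset1, dataset2, by = "id")

# all = TRUE: keeps every id from either dataset, with NA where data is missing
finaldt_all <- merge(dataset1, dataset2, by = "id", all = TRUE)

finaldt
finaldt_all
```

Note that the row order of dataset2 does not matter here, because merge() matches rows by the value of id.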

Or we can merge datasets by adding columns when we know that both datasets have the same number of rows in the same order:

finaldt <- cbind(dataset1, dataset2)
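The sketch below shows why the row order matters for cbind(). All data frame names (left, right, mismatched, right_sorted, aligned) and the column hospital_b are hypothetical, and reordering with match() is just one possible way to align the rows.

```r
# Made-up example data: "right" lists the same hospitals in the opposite order
left  <- data.frame(hospital = c("New York", "California"),
                    patients = c(150, 350))
right <- data.frame(hospital_b = c("California", "New York"),
                    costs = c(2.5, 3.1))

# cbind() pairs rows by position, so New York ends up next to California's costs
mismatched <- cbind(left, right)

# Reorder "right" so its rows follow the order of "left", then bind again
right_sorted <- right[match(left$hospital, right$hospital_b), ]
rownames(right_sorted) <- NULL
aligned <- cbind(left, right_sorted)

mismatched
aligned
```

When a shared identifier like this exists, merge() is usually the safer choice, because it does the matching for you.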

To add rows, use the function rbind(). When you merge datasets by rows, it is important that they have exactly the same variable names and the same number of variables.

Here is an example of merging datasets by adding rows:

finaldt <- rbind(dataset1, dataset2)
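A small sketch with made-up data illustrates the requirement on variable names. The contents of dataset1 and dataset2, as well as dataset3 and its column beds, are assumptions for this example. For data frames, rbind() matches columns by name, so the column order may differ, but a differing name causes an error.

```r
# Made-up example data: same variable names, but in a different column order
dataset1 <- data.frame(hospital = c("New York", "California"),
                       patients = c(150, 350))
dataset2 <- data.frame(patients = 200, hospital = "Texas")

# Columns are matched by name, so the different order is not a problem
finaldt <- rbind(dataset1, dataset2)

# A dataset with a different variable name ("beds") cannot be appended;
# tryCatch() captures the error so the script keeps running
dataset3 <- data.frame(hospital = "Florida", beds = 120)
result <- tryCatch(rbind(dataset1, dataset3),
                   error = function(e) "error")

finaldt
result
```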

If you have any questions, post a comment below.
