Have you ever struggled to import hundreds of small data files? It can be very time-consuming, or even impossible. I was in this situation some time ago, when I had a folder with approximately three thousand CSV files and wanted to combine them into a single dataset.
At the time, I was thinking of writing a for loop to import each file separately and then merge all the small datasets.
# file1 = read_csv("file1.csv")
# file2 = read_csv("file2.csv")
# file3 = read_csv("file3.csv")
I didn't know how that would work, or even whether it would be possible to merge 3000 datasets easily. Anyway, I started searching for similar questions, and I don't remember finding anything helpful until I discovered the plyr package. I am happy to share it with you. It doesn't take much code.
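For comparison, here is a minimal base-R sketch of the loop I originally had in mind: read each file into one element of a list inside a `for` loop, then stack the pieces with `do.call(rbind, ...)`. It writes two small example files to a temporary folder so it runs on its own; the folder name, file names, and values are illustrative only, and the approach assumes every file has the same columns.

```r
# Create a temporary folder with two small example CSV files (illustrative data)
tmp <- file.path(tempdir(), "loop_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = c(1, 23), b = c(34, 55)), file.path(tmp, "file1.csv"), row.names = FALSE)
write.csv(data.frame(a = c(32, 34), b = c(21, 23)), file.path(tmp, "file2.csv"), row.names = FALSE)

# Full paths of every file ending in ".csv"
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file into one element of a pre-allocated list
pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  pieces[[i]] <- read.csv(files[i])
}

# Stack all pieces row-wise into a single data frame
dat_loop <- do.call(rbind, pieces)
dat_loop
```

The loop works, but you have to manage the list and the final merge yourself, which is exactly the bookkeeping that ldply takes care of.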
Load the packages
library(plyr)
library(readr)
For this post, I created 3 CSV files and put them in a folder named csvfolder on my desktop. You can do the same if you want to replicate this post. I set the working directory in R and used the function list.files to list all files in the folder with the .csv extension.
setwd("~/Desktop")
mydir = "csvfolder"
myfiles = list.files(path=mydir, pattern="\\.csv$", full.names=TRUE)
myfiles
## [1] "csvfolder/file1.csv" "csvfolder/file2.csv" "csvfolder/file3.csv"
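The `pattern` argument of list.files is a regular expression, not a shell wildcard: `.` matches any character and the match is not anchored to the end of the name. The pattern `"\\.csv$"` (a literal `.csv` at the very end) selects CSV files precisely, and `glob2rx()` from the utils package can translate a wildcard such as `"*.csv"` into an equivalent regular expression. The sketch below uses hypothetical file names in a temporary folder to show that both patterns skip a text file whose name merely contains "csv".

```r
# Temporary folder with two CSV files and one text file whose name contains "csv"
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("file1.csv", "file2.csv", "notes_csv.txt"))))

# Anchored regular expression: a literal ".csv" at the end of the name
csv_regex <- list.files(tmp, pattern = "\\.csv$")

# The same selection, built from a shell-style wildcard
csv_glob <- list.files(tmp, pattern = glob2rx("*.csv"))

csv_regex
csv_glob
```

Both calls return only `file1.csv` and `file2.csv`.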
In the RStudio environment, I now have only the locations of the CSV files; no file has been loaded yet. To load all the files and combine them into a single dataset, I use ldply and apply the read_csv function to each file.
dat_csv = ldply(myfiles, read_csv)
dat_csv
## a b c
## 1 1 34 98
## 2 23 55 10
## 3 43 67 3
## 4 32 21 56
## 5 34 23 57
## 6 31 24 58
## 7 43 65 77
## 8 45 63 78
## 9 57 61 79
Done!
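One extension you may find useful when combining thousands of files: if the vector of paths has names, ldply turns those names into an index column (named by its `.id` argument), so you can tell which file each row came from. The sketch below assumes a plyr version that supports `.id` and recreates the three example files from this post (values copied from the outputs shown) in a temporary folder; the column name `"file"` is my own choice.

```r
library(plyr)
library(readr)

# Recreate the three example files from this post in a temporary folder
tmp <- file.path(tempdir(), "ldply_demo")
dir.create(tmp, showWarnings = FALSE)
write_csv(data.frame(a = c(1, 23, 43), b = c(34, 55, 67), c = c(98, 10, 3)), file.path(tmp, "file1.csv"))
write_csv(data.frame(a = c(32, 34, 31), b = c(21, 23, 24), c = c(56, 57, 58)), file.path(tmp, "file2.csv"))
write_csv(data.frame(a = c(43, 45, 57), b = c(65, 63, 61), c = c(77, 78, 79)), file.path(tmp, "file3.csv"))

myfiles <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name; ldply stores those names in the "file" column
dat_csv <- ldply(setNames(myfiles, basename(myfiles)), read_csv, .id = "file")
dat_csv
```

The result has the same nine rows as before, plus a leading `file` column (a factor) holding `file1.csv`, `file2.csv`, or `file3.csv`.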
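You can apply the same approach to import .txt files as well. Use the function read.table for .txt files, and list them with a pattern that matches .txt instead of .csv. The code below assumes tab-delimited files with a header row:
# myfiles_txt = list.files(path=mydir, pattern="\\.txt$", full.names=TRUE)
# dat_txt = ldply(myfiles_txt, read.table, sep = "\t", fill=TRUE, header = TRUE)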
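To try the .txt version end to end, here is a self-contained sketch that writes two small tab-delimited files (illustrative values) to a temporary folder, lists them with a `.txt` pattern, and combines them with ldply and read.table using the same arguments as above.

```r
library(plyr)

# Two small tab-delimited example files (illustrative values) in a temporary folder
tmp <- file.path(tempdir(), "txt_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(a = c(1, 23), b = c(34, 55)), file.path(tmp, "file1.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(a = c(32, 34), b = c(21, 23)), file.path(tmp, "file2.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# List the .txt files, then read and stack them just like the CSV files
txtfiles <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
dat_txt <- ldply(txtfiles, read.table, sep = "\t", fill = TRUE, header = TRUE)
dat_txt
```

Any extra arguments after the function name (here `sep`, `fill`, and `header`) are passed on by ldply to every read.table call.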
Extra
Below, I import each file separately to show that the data and variable names correspond to those in dat_csv above.
read_csv(myfiles[1])
## # A tibble: 3 x 3
## a b c
## <int> <int> <int>
## 1 1 34 98
## 2 23 55 10
## 3 43 67 3
read_csv(myfiles[2])
## # A tibble: 3 x 3
## a b c
## <int> <int> <int>
## 1 32 21 56
## 2 34 23 57
## 3 31 24 58
read_csv(myfiles[3])
## # A tibble: 3 x 3
## a b c
## <int> <int> <int>
## 1 43 65 77
## 2 45 63 78
## 3 57 61 79
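With thousands of files, comparing each one by eye is impractical. A minimal sketch of an automated version of the check above, assuming the same three example files (values copied from the outputs in this post) written to a temporary folder: read every file individually, stack the pieces with base R's rbind, and confirm that the result matches dat_csv row for row.

```r
library(plyr)
library(readr)

# Recreate the three example files from this post in a temporary folder
tmp <- file.path(tempdir(), "check_demo")
dir.create(tmp, showWarnings = FALSE)
values <- list(
  file1 = data.frame(a = c(1, 23, 43), b = c(34, 55, 67), c = c(98, 10, 3)),
  file2 = data.frame(a = c(32, 34, 31), b = c(21, 23, 24), c = c(56, 57, 58)),
  file3 = data.frame(a = c(43, 45, 57), b = c(65, 63, 61), c = c(77, 78, 79))
)
for (nm in names(values)) write_csv(values[[nm]], file.path(tmp, paste0(nm, ".csv")))
myfiles <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Combined import, as in the post
dat_csv <- ldply(myfiles, read_csv)

# Read each file on its own and stack the pieces with base R
pieces <- lapply(myfiles, function(f) as.data.frame(read_csv(f)))
stacked <- do.call(rbind, pieces)

# Same number of rows, same column names, same values in the same order
same <- nrow(dat_csv) == nrow(stacked) &&
  identical(names(dat_csv), names(stacked)) &&
  all(as.matrix(dat_csv) == as.matrix(stacked))
same
```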
I hope you learned something new today and will share it with your peers. Who knows, it may be helpful for someone else.