Often I use R to handle large datasets, analyze the data and filter out the data I don’t need.
When all this is done, I usually use write.csv()
to print my data off and reopen it in Google Sheets.
My workflow would look something like this:
full_data <- read.csv("some_dataset.csv") #R analysis ending up with relevant_data write.csv(relevant_data, "relevant_data.csv") #continue work in Google Sheets
However, there’s an R package that provides a bridge between your Google account and your R environment: googlesheets.
Using this package we can read data directly from Google, modify it and create new files within our Google Drive.
Step 1: Install googlesheets
install.packages(googlesheets) library(googlesheets)
Step 2: Authenticate your Google account
Before we can do anything we need to allow google sheets to access our account.
We can do this by running this:
gs_auth(new_user = TRUE)
Have a browser open (Google Chrome worked for me) and it should open a new tab asking you to connect via an account:
Click on an account below this message and then ‘allow’ and it should take you to a page saying it has worked and to go back to R.
You can rerun this command any time you want to change accounts.
Sometimes if you don’t use the token for a while it will run out and you will have to refresh it, which it will initiate automatically if you run a command that requires you to connect to the Google API (i.e. any of the specialised googlesheets functions).
Step 3: See what’s in your Google Account
Calling the function gs_ls() will show you spreadsheets in your account.
gs_ls() # A tibble: 15 x 10 sheet_title author perm version updated sheet_key 1 for googlesheets rforjournali… rw new 2017-12-11 09:44:54 1Y0WCfTW… 2 Avon and Somerset Septe… rforjournali… rw new 2017-11-19 12:46:55 1TfC5Fs6… 3 Mid year 2015 UK popula… rforjournali… rw new 2017-10-25 21:19:52 1Vqg560s… 4 Cleveland 2016-09 rforjournali… rw new 2016-11-26 10:18:16 19xBr8nU… 5 Rankings of US presiden… rforjournali… rw new 2016-11-08 04:39:55 11PZxq7y… 6 Tennis #1s rforjournali… rw new 2016-11-06 20:04:42 1Riz8GRs… 7 Young persons railcard rob.grant rw new 2016-11-06 13:05:55 1XZsjJxu… 8 Copy of Young persons r… rforjournali… rw new 2016-11-05 18:14:38 1oUpRS-D… 9 defective rforjournali… rw new 2016-11-05 11:40:30 1jWZBILC… 10 Asylum rforjournali… rw new 2016-10-27 19:04:05 1CRMl2_1… 11 Buses rforjournali… rw new 2016-10-24 20:07:41 1qy9Z-sn… 12 Untitled spreadsheet rforjournali… rw new 2016-10-24 19:22:42 1_f_FI5n… 13 Population rforjournali… rw new 2016-10-24 18:29:17 1rrOQuV5… 14 Drugs rforjournali… rw new 2016-10-18 21:37:29 1UTsnGM6… 15 Food rforjournali… rw new 2016-10-15 13:24:30 1aWEAPR4…
Step 4: Read a spreadsheet
I am going to select the first spreadsheet ‘for googlesheets’ by its title. It’s a selection of 50 random numbers between 0 and 1 (you can recreate this function with runif()
in R).
for_gs <- gs_title("for googlesheets")
You can also locate the sheet by the key (the letters, numbers and characters after the /d/ in the URL) for the same result
for_gs <- gs_key("your_key_here")
This gives us a list, which we can turn into a data frame using the gs_read()
command.
for_gs_sheet <- gs_read(for_gs) str(for_gs_sheet) Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 50 obs. of 2 variables: $ Number.x: num 0.4696 0.1587 0.0949 0.1823 0.0885 ... $ Number.y: num 0.67551 0.7041 0.00167 0.51302 0.20114 ... - attr(*, "spec")=List of 2 ..$ cols :List of 2 .. ..$ Number.x: list() .. .. ..- attr(*, "class")= chr "collector_double" "collector" .. ..$ Number.y: list() .. .. ..- attr(*, "class")= chr "collector_double" "collector" ..$ default: list() .. ..- attr(*, "class")= chr "collector_guess" "collector" ..- attr(*, "class")= chr "col_spec"
Step 5: Modify the spreadsheet
Next up, we modify our spreadsheet using the gs_edit_cells()
function.
This function has several arguments that we need to employ to edit our spreadsheet properly.
gs_edit_cells(for_gs, ws = "Sheet1", anchor = "A2", input = c(1,2), byrow = TRUE)
The ws
argument refers to the sheet name in the spreadsheet. The anchor
argument refers to the cell from which the modification will begin. In my example, I am editing two cells, where the first one will be the anchor cell A2. The byrow
argument indicates that the modification will apply horizontally (change to FALSE for vertical editing).
Note that this won’t change our data frame for_gs_sheet
that is based on this spreadsheet; just the spreadsheet itself.
Cell A2 now has a value of 1. A3 is 2.
Step 6: Create a Google Sheets file using R
We can create new spreadsheets using this package using gs_new()
.
We’ll use the mtcars
dataset as a test:
gs_new(title = "mtcars", ws_title = "first_sheet", input = mtcars)
It worked, except it didn’t include the rownames, which contains the cars.
That doesn’t matter, we can add them using gs_edit_cells()
, changing the byrow
argument to FALSE this time.
#register the new mtcars sheet in R mtcars_sheet <- gs_new(title = "mtcars", ws_title = "first_sheet", input = mtcars) #insert the rownames vertically in column L gs_edit_cells(mtcars_sheet, ws = "first_sheet", anchor = "L2", input = rownames(mtcars), byrow = FALSE)
Final thoughts
That was a quick overview of the most basic functions of the google sheets package.
This is a really useful package. A lot of my work involves reading data in Google Sheets either before or after using R.
Googlesheets means I won’t have to bother with read.csv()
or write.csv()
as much in the future, saving me time.
So thanks to Jenny Bryan for creating it!