The first and most important step in data analysis is gathering data from all available sources, primary or secondary. Data comes in many formats, from flat files (.txt, .csv) to spreadsheet formats such as Excel. These files may be stored locally on your system or in your working directory. Packages such as utils (part of base R), readr, data.table, and XLConnect expose functions for reading such locally saved files.
Sometimes, however, the files are stored on a remote server, and the data is no longer in a familiar file format such as .txt, .csv, or .xlsx. In such cases, the most common formats for data on the Web are JSON, XML, and HTML. This is where accessing Web data in R comes into the picture.
We refer to such data as Web data. The exposed path used to access it, which is simply a URL, is referred to as an API. When we want to access and work with Web data in RStudio, we invoke (consume) the corresponding API using an HTTP client in R.
HTTP: The Hypertext Transfer Protocol (HTTP) is designed to enable communication between clients and servers. HTTP defines several methods that can be used to consume an API; the two most commonly used are:
- GET: is used to request data from a specified resource.
- POST: is used to send data to a server to create/update a resource.
Assume this is our base URL (API)
https://reqres.in/api/users
Types of URLs (based on how we send data to the API)
Directory-based URL (segments separated by “/”). The path looks very similar to a local file path.
https://reqres.in/api/users/pageid/1
Here pageid is the key and 1 is its value, both passed as path segments. This API would fetch all data from the users table where pageid is 1.
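As a minimal sketch of the directory-based form, the URL can be assembled by joining the segments with "/". The helper name build_path_url is hypothetical and only illustrates the idea; it uses base R alone.

```r
# Base URL taken from the text above.
base_url <- "https://reqres.in/api/users"

# Hypothetical helper: appends a key and its value as two path segments.
build_path_url <- function(base, key, value) {
  paste(base, key, value, sep = "/")
}

path_url <- build_path_url(base_url, "pageid", 1)
print(path_url)
```

The result is the same directory-based URL shown above, built from its parts instead of typed by hand.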
Parameter-based URL. The URL contains key-value pairs that follow a “?” and are separated by “&”.
https://reqres.in/api/users?pageid=1&userid=5
Here pageid and userid are keys, and 1 and 5 are their respective values.
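The parameter-based form can be sketched the same way: each key is joined to its value with "=", and the pairs are joined with "&". The helper build_query_url is hypothetical, and this simple version assumes the keys and values contain no characters that need percent-encoding.

```r
# Base URL and parameters taken from the text above.
base_url <- "https://reqres.in/api/users"
params <- list(pageid = 1, userid = 5)

# Hypothetical helper: turns a named list into "key=value" pairs
# and appends them to the base URL after a "?".
build_query_url <- function(base, params) {
  pairs <- paste(names(params), unlist(params), sep = "=")
  paste0(base, "?", paste(pairs, collapse = "&"))
}

query_url <- build_query_url(base_url, params)
print(query_url)
```

In practice an HTTP client usually builds and encodes the query string for you, so a hand-rolled helper like this is mainly useful for seeing how the URL is put together.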
To consume such APIs in R, we mainly rely on the packages below:
httr
This package does much of the heavy lifting when working with Web data by exposing some very useful functions. It provides an HTTP client for accessing APIs with the GET and POST methods, passing query parameters, and verifying that the fetched response is in the expected data format and error-free.
jsonlite
To convert a received JSON response into a readable R object or a data frame, jsonlite converts JSON to R objects and vice versa.
rlist
To perform additional manipulation on the data structure of a received JSON response, rlist exposes some important functions, list.select and list.stack. These functions are useful for getting parsed JSON data into a data frame.
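To make the "JSON to R object and vice versa" point concrete, here is a small offline sketch using jsonlite. The JSON string is a hand-written sample modeled on the user records used later in this post, not a live API response.

```r
library(jsonlite)

# Hand-written sample JSON (an assumption for illustration).
user_json <- '{"id": 4, "first_name": "Eve", "last_name": "Holt"}'

# JSON text -> R object: a JSON object becomes a named list.
user <- fromJSON(user_json)
str(user)

# R object -> JSON text: auto_unbox = TRUE writes length-one vectors
# as scalars instead of one-element arrays.
user_json_again <- toJSON(user, auto_unbox = TRUE)
print(user_json_again)
```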
Import all required libraries
# This package is required for accessing APIs (HTTP or HTTPS URLs on the Web)
library(httr)
# This package exposes functions for manipulating lists (e.g. list.select, list.stack)
library(rlist)
# This package exposes functions to convert JSON/text to a data frame
library(jsonlite)
# This package is used to manipulate data
library(dplyr)
resp <- GET("https://reqres.in/api/users?page=2")
# Once we get the response from the API, we use two very basic functions of httr.
http_type(resp) # Tells us the content type of the response fetched by the GET() call to the API.
## [1] "application/json"
http_error(resp) # Checks whether the response is an HTTP error (FALSE means it is safe to process).
## [1] FALSE
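The two checks above can be wrapped into one small function so that every request is validated the same way. This is only a sketch: fetch_json_text is a hypothetical name, not part of httr, and it assumes the endpoint is expected to return JSON.

```r
library(httr)

# Hypothetical helper: fetch a URL and return its body as text,
# stopping early if the request failed or the response is not JSON.
fetch_json_text <- function(url) {
  resp <- GET(url)

  # Turn HTTP errors (4xx/5xx) into R errors.
  stop_for_status(resp)

  # Check the content type before trying to parse anything.
  if (http_type(resp) != "application/json") {
    stop("Expected a JSON response but got: ", http_type(resp), call. = FALSE)
  }

  content(resp, as = "text", encoding = "UTF-8")
}
```

Calling fetch_json_text() with the URL used above should return the raw JSON text, provided the service is reachable.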
As we can see, the API is parameter-based and expects a query parameter. Above, we put the query parameter inside the URL. Now we will supply it separately, as a list passed to the query argument of GET().
query<-list(page="2")
resp<-GET("https://reqres.in/api/users",query=query)
http_type(resp)
## [1] "application/json"
http_error(resp)
## [1] FALSE
# Shows the raw data, which is unstructured and hard to read
jsonRespText<-content(resp,as="text")
jsonRespText
## [1] "{\"page\":2,\"per_page\":3,\"total\":12,\"total_pages\":4,\"data\":[{\"id\":4,\"first_name\":\"Eve\",\"last_name\":\"Holt\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg\"},{\"id\":5,\"first_name\":\"Charles\",\"last_name\":\"Morris\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg\"},{\"id\":6,\"first_name\":\"Tracey\",\"last_name\":\"Ramos\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg\"}]}"
# Structured data in the form of R vectors and lists
jsonRespParsed<-content(resp,as="parsed")
jsonRespParsed
## $page
## [1] 2
##
## $per_page
## [1] 3
##
## $total
## [1] 12
##
## $total_pages
## [1] 4
##
## $data
## $data[[1]]
## $data[[1]]$id
## [1] 4
##
## $data[[1]]$first_name
## [1] "Eve"
##
## $data[[1]]$last_name
## [1] "Holt"
##
## $data[[1]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg"
##
##
## $data[[2]]
## $data[[2]]$id
## [1] 5
##
## $data[[2]]$first_name
## [1] "Charles"
##
## $data[[2]]$last_name
## [1] "Morris"
##
## $data[[2]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg"
##
##
## $data[[3]]
## $data[[3]]$id
## [1] 6
##
## $data[[3]]$first_name
## [1] "Tracey"
##
## $data[[3]]$last_name
## [1] "Ramos"
##
## $data[[3]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg"
Convert the JSON response, which is in text format, to a data frame using the jsonlite package
fromJSON(jsonRespText)
## $page
## [1] 2
##
## $per_page
## [1] 3
##
## $total
## [1] 12
##
## $total_pages
## [1] 4
##
## $data
## id first_name last_name
## 1 4 Eve Holt
## 2 5 Charles Morris
## 3 6 Tracey Ramos
## avatar
## 1 https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg
## 2 https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg
## 3 https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg
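As the output shows, fromJSON() returns a list whose data element is already a data frame. The offline sketch below pulls that element out, and also shows that simplifyVector = FALSE keeps the nested list structure instead. The JSON string is a shortened, hand-written sample modeled on the response above (avatar fields omitted), so it is an assumption rather than real API output.

```r
library(jsonlite)

# Shortened sample modeled on the API response (an assumption).
sample_json <- '{"page": 2, "total": 12, "data": [
  {"id": 4, "first_name": "Eve", "last_name": "Holt"},
  {"id": 5, "first_name": "Charles", "last_name": "Morris"}
]}'

# Default behaviour: arrays of records are simplified to a data frame.
parsed <- fromJSON(sample_json)
users_df <- parsed$data
print(users_df)

# With simplifyVector = FALSE the records stay as a list of lists.
parsed_list <- fromJSON(sample_json, simplifyVector = FALSE)
str(parsed_list$data[[1]])
```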
We can extract the required columns from the parsed JSON response and create our data frame using the dplyr and base R packages.
modJson <- jsonRespParsed$data # Access the data element of the list and ignore the other elements
modJson
## [[1]]
## [[1]]$id
## [1] 4
##
## [[1]]$first_name
## [1] "Eve"
##
## [[1]]$last_name
## [1] "Holt"
##
## [[1]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg"
##
##
## [[2]]
## [[2]]$id
## [1] 5
##
## [[2]]$first_name
## [1] "Charles"
##
## [[2]]$last_name
## [1] "Morris"
##
## [[2]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg"
##
##
## [[3]]
## [[3]]$id
## [1] 6
##
## [[3]]$first_name
## [1] "Tracey"
##
## [[3]]$last_name
## [1] "Ramos"
##
## [[3]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg"
#Using dplyr and base R
modJson%>%bind_rows%>%select(id,first_name,last_name,avatar)
## # A tibble: 3 x 4
## id first_name last_name avatar
## <int> <chr> <chr> <chr>
## 1 4 Eve Holt https://s3.amazonaws.com/uifaces/faces/twitt~
## 2 5 Charles Morris https://s3.amazonaws.com/uifaces/faces/twitt~
## 3 6 Tracey Ramos https://s3.amazonaws.com/uifaces/faces/twitt~
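For comparison, a similar result can be sketched with base R alone: convert each record to a one-row data frame, then bind the rows together. The users list below is a hand-built stand-in for modJson with the avatar field omitted, so it is an assumption rather than the live data.

```r
# Hand-built stand-in for the parsed records (an assumption).
users <- list(
  list(id = 4L, first_name = "Eve", last_name = "Holt"),
  list(id = 5L, first_name = "Charles", last_name = "Morris")
)

# Each record becomes a one-row data frame.
rows <- lapply(users, function(u) as.data.frame(u, stringsAsFactors = FALSE))

# Stack the one-row data frames, then keep only the columns we need.
users_df <- do.call(rbind, rows)
users_df <- users_df[, c("id", "first_name")]
print(users_df)
```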
Using the rlist package. Since our data has been converted into a list, we use list.select to filter columns and list.stack to create a data frame.
list.select(modJson,id,first_name)
## [[1]]
## [[1]]$id
## [1] 4
##
## [[1]]$first_name
## [1] "Eve"
##
##
## [[2]]
## [[2]]$id
## [1] 5
##
## [[2]]$first_name
## [1] "Charles"
##
##
## [[3]]
## [[3]]$id
## [1] 6
##
## [[3]]$first_name
## [1] "Tracey"
list.stack(modJson)
## id first_name last_name
## 1 4 Eve Holt
## 2 5 Charles Morris
## 3 6 Tracey Ramos
## avatar
## 1 https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg
## 2 https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg
## 3 https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg
The results obtained from dplyr, base R, and rlist are very similar.
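The two rlist functions can also be chained, so that only the selected fields end up as columns. The sketch below uses a hand-built list standing in for modJson (an assumption, with the avatar field omitted).

```r
library(rlist)

# Hand-built stand-in for the parsed records (an assumption).
users <- list(
  list(id = 4L, first_name = "Eve", last_name = "Holt"),
  list(id = 5L, first_name = "Charles", last_name = "Morris")
)

# Keep only id and first_name in every record...
selected <- list.select(users, id, first_name)

# ...then stack the records into a data frame, one row per record.
users_df <- list.stack(selected)
print(users_df)
```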
# Sending data to a server with POST
post_result <- POST(url = "http://httpbin.org/post", body = "this is a test") # the body argument accepts the data we wish to send to the server
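A POST body is often structured data rather than plain text. The sketch below shows one way to send a named list as JSON with httr. The payload fields and the post_json helper are hypothetical, and the function needs a reachable server to actually run.

```r
library(httr)
library(jsonlite)

# Hypothetical payload, for illustration only.
payload <- list(name = "Eve", job = "analyst")

# Preview of the JSON text that such a payload corresponds to.
body_preview <- toJSON(payload, auto_unbox = TRUE)
print(body_preview)

# Hypothetical helper: send the payload as a JSON body and
# return the parsed response.
post_json <- function(url, payload) {
  # encode = "json" asks httr to serialise the list as JSON.
  resp <- POST(url, body = payload, encode = "json")
  stop_for_status(resp)
  content(resp, as = "parsed")
}
```

If the endpoint echoes requests back, as httpbin.org/post is assumed to do here, calling post_json("http://httpbin.org/post", payload) should return a list that includes the data that was sent.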
Note: All APIs used in the examples above are open APIs.