In my last post, I answered some of the common questions that a person who has just begun exploring R needs to know. As we go further, this post covers some essential components that you need to understand to master R.
But before we start, wouldn’t it be great to know why we are investing so much of our time in R? What are its benefits after all?
It is free and open source
This means that anyone can download and use R free of charge, and anyone can access the source code, modify it, and improve it. As a result, many programmers have contributed improvements and fixes to R, which has helped make it stable and reliable.
R can run everywhere
R is available for Windows, Unix systems (such as Linux), and Mac.
It supports Extensions
Developers can easily write their own software and distribute it as add-on packages. Because creating these packages is relatively easy, literally thousands of them exist.
It’s not just about Statistics
R was developed by statisticians to make statistical processing easier. As R expanded beyond its origins in statistics, many people who would describe themselves as programmers rather than statisticians became involved with it. As a result, R is now well suited to a wide variety of non-statistical tasks, including data processing, data visualization, and analysis of all sorts. R is used in fields such as finance, natural language processing, genetics, biology, and market research, to name just a few.
Runs without a Compiler
R is an interpreted language, which means that, unlike compiled languages such as C and Java, you don't need a compiler to first create a program from your code before you can use it. R interprets the code you provide directly and converts it into lower-level calls to pre-compiled code and functions.
Now that we know why we are keen on learning R, let’s begin with some basic concepts!
What are Vectors in R?
A vector is the simplest type of data structure in R. Simply put, a vector is a sequence of data elements of the same basic type. The members of a vector are called its components (or elements).
Here is a vector containing three numeric values, 2, 3, and 5:
c(2, 3, 5)
[1] 2 3 5
And here is a vector of logical values.
c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1]  TRUE FALSE  TRUE FALSE FALSE
The number of members in a vector is given by the length() function:
length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
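Because every element of a vector must share one basic type, R quietly converts values when you mix types in c(). A minimal sketch of that behaviour (the variable names are just for illustration):

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)              # "character"
mixed                     # "1" "a" "TRUE"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums_and_logicals <- c(1, TRUE, FALSE)
class(nums_and_logicals)  # "numeric"
nums_and_logicals         # 1 1 0
```

R picks the most flexible type that can hold every value, which is why one string turns the whole vector into text.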
There are several types of vectors, such as:
1) Numeric vectors, containing all kinds of numbers.
2) Integer vectors, containing integer values. (An integer vector is a special kind of numeric vector.)
3) Logical vectors, containing logical values (TRUE and/or FALSE).
4) Character vectors, containing text.
5) Datetime vectors, containing dates and times in different formats.
6) Factors, a special type of vector for working with categories.
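To see these types side by side, here is a small sketch that builds one vector of each kind and asks R for its class with class(). The values and dates are made up, and as.Date() covers plain dates only (date-times use other classes such as POSIXct):

```r
num_vec  <- c(1.5, 2, 3.25)                          # numeric (double)
int_vec  <- c(1L, 2L, 3L)                            # integer: the L suffix marks integer literals
lgl_vec  <- c(TRUE, FALSE)                           # logical
chr_vec  <- c("a", "b")                              # character
date_vec <- as.Date(c("2024-01-15", "2024-02-01"))   # Date
fct_vec  <- factor(c("low", "high", "low"))          # factor

# class() of each vector, collected into one character vector
types <- sapply(list(num_vec, int_vec, lgl_vec, chr_vec, date_vec, fct_vec), class)
types                # "numeric" "integer" "logical" "character" "Date" "factor"

is.numeric(int_vec)  # TRUE: integer vectors also count as numeric
levels(fct_vec)      # "high" "low" (the categories, sorted alphabetically)
```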
Combining Vectors
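The name c() is usually read as "combine" or "concatenate". Strictly speaking, it doesn't create vectors from scratch: in R, even a single value is a vector of length one, so c() simply combines vectors into a longer one.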
You can also use the c() function to combine vectors that have more than one value, as in the following example:
fruits <- c("Apple", "oranges", "banana")
vegetables <- c("cabbage", "spinach", "tomatoes")
all_basket_items <- c(fruits, vegetables)
all_basket_items
[1] "Apple" "oranges" "banana" "cabbage" "spinach"
[6] "tomatoes"
The result of this code is a vector with all six values. The c() function keeps the values in the order you give them, which illustrates a second important feature of vectors: vectors have an order.
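A quick sketch to show that order matters: swapping the arguments of c() gives a different vector, and appending a single value works the same way because a single value is itself a vector of length one ("kiwi" is just an example value):

```r
fruits <- c("Apple", "oranges", "banana")
vegetables <- c("cabbage", "spinach", "tomatoes")

# The same two vectors in the opposite order give a different result
c(vegetables, fruits)
identical(c(fruits, vegetables), c(vegetables, fruits))  # FALSE

# Appending a single value: "kiwi" is simply a vector of length one
more_fruits <- c(fruits, "kiwi")
more_fruits          # "Apple" "oranges" "banana" "kiwi"
length(more_fruits)  # 4
```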
Repeating Vectors
You can combine a vector with itself if you want to repeat it, but if you want to repeat the values in a vector many times, using the c() function becomes impractical. R makes life easier by offering a function for repeating a vector: rep().
You can use the rep() function in several ways. If you want to repeat the complete vector, specify the argument times. For example, to repeat the vector c(0, 0, 7) three times, use this code:
rep(c(0, 0, 7), times = 3)
[1] 0 0 7 0 0 7 0 0 7
You also can repeat every value by specifying the argument each, like this:
rep(c(2, 4, 2), each = 3)
[1] 2 2 2 4 4 4 2 2 2
You can also tell R how often each value has to be repeated by giving times a vector of counts, one per value. For example:
rep(c(0, 7), times = c(4, 2))
[1] 0 0 0 0 7 7
And, as with seq(), you can use the argument length.out to tell R how long you want the result to be. R repeats the vector until it reaches that length, even if the last repetition is incomplete. For example:
rep(1:3, length.out = 7)
[1] 1 2 3 1 2 3 1
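These arguments can also be mixed. A small sketch, reusing the c(0, 0, 7) example and a couple of made-up vectors:

```r
x <- c(0, 0, 7)

# times = 3 is the same as combining the vector with itself three times
identical(rep(x, times = 3), c(x, x, x))   # TRUE

# each and times together: each value is repeated first, then the whole result
rep(1:2, each = 2, times = 2)              # 1 1 2 2 1 1 2 2

# length.out works for character vectors too and may stop mid-repetition
rep(c("a", "b"), length.out = 5)           # "a" "b" "a" "b" "a"
```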
Changing values in a vector
Let's see this through an example on the fruits vector we created earlier:
fruits[2] <- "strawberries"
fruits
[1] "Apple" "strawberries" "banana"
This replaces the second element of the vector with "strawberries".
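Indexing works for more than one position at a time. The sketch below builds on the fruits vector (the replacement values are made up) and also shows what happens when you assign past the end of a vector:

```r
fruits <- c("Apple", "oranges", "banana")
fruits[2] <- "strawberries"

# Replace several positions at once: one new value per index
fruits[c(1, 3)] <- c("apple", "mango")
fruits                    # "apple" "strawberries" "mango"

# Logical indexing: replace every element that matches a condition
fruits[fruits == "mango"] <- "papaya"

# Assigning past the end extends the vector and fills the gap with NA
fruits[5] <- "grapes"
fruits                    # "apple" "strawberries" "papaya" NA "grapes"
length(fruits)            # 5
```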
For a quick snapshot of vectors, see: Vectors in R
How do Functions work?
R comes with many built-in functions, from simple ones, such as round() for rounding a number and factorial() for calculating its factorial, to ones for more sophisticated tasks such as random sampling.
Using a function is pretty simple. Just write the name of the function and then the data you want the function to operate on in parentheses:
round(3.1415)
[1] 3
If a function takes more than one argument, separate the arguments with commas.
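A few calls to illustrate, using round(), factorial() and sample() from base R. The sample() result is random, so no output is shown for it; set.seed() only makes the draw repeatable:

```r
round(3.1415, digits = 2)  # 3.14 (argument given by name)
round(3.1415, 2)           # 3.14 (same argument, matched by position)
factorial(5)               # 120

set.seed(42)                     # fix the random number generator so the draw is repeatable
drawn <- sample(1:10, size = 3)  # three distinct values from 1 to 10
drawn
```

Naming arguments, as in digits = 2, is optional but makes calls easier to read when a function has several arguments.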
R offers a huge number of functions for all kinds of analysis tasks. A good way to learn them is to first get an overview of what is available and then start using them.
You can explore a list of commonly used R functions and what they do here: Useful Functions in R
Now that we know what vectors and functions are in R, what do we mean by vectorizing your functions?
A vectorized function works not just on a single value but on a whole vector of values at the same time.
Sounds complex? Well, it's not! The following code shows what I mean:
Suppose I want to store the points scored by a team in 5 different rounds. I put them in a vector:
scores_Team_A <- c(13, 14, 17, 19, 18)
scores_Team_A
[1] 13 14 17 19 18
Now we can simply apply a function to the whole vector:
sum(scores_Team_A)
[1] 81
You could get the same result by going over the vector number by number and adding each new number to the sum of the previous ones, but that method would require more code and take longer to run. You won't notice the difference with just five numbers, but it becomes noticeable on much larger vectors.
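For comparison, here is a sketch of the loop approach next to sum(), plus two other vectorized operations on the same scores:

```r
scores_Team_A <- c(13, 14, 17, 19, 18)

# The loop version: add the scores one at a time
total <- 0
for (score in scores_Team_A) {
  total <- total + score
}
total                  # 81

# The vectorized version: one call does the same work
sum(scores_Team_A)     # 81

# Arithmetic and comparisons are vectorized too, element by element
scores_Team_A * 2      # 26 28 34 38 36
scores_Team_A > 15     # FALSE FALSE TRUE TRUE TRUE
```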
A less obvious example of a vectorized function is paste(). If you make a vector with the first names of the members of your family, paste() can add the last name to all of them with one command, as in the following example:
firstnames <- c("Joris", "Carolien", "Koen")
lastname <- "Meys"
paste(firstnames, lastname)
[1] "Joris Meys" "Carolien Meys" "Koen Meys"
R takes the vector firstnames and pastes lastname onto each of its values.
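paste() offers a couple of variations worth knowing. A short sketch; the ages vector is hypothetical data added only for illustration:

```r
firstnames <- c("Joris", "Carolien", "Koen")
lastname <- "Meys"

# sep controls what goes between the pieces (a space by default)
with_underscore <- paste(firstnames, lastname, sep = "_")
with_underscore          # "Joris_Meys" "Carolien_Meys" "Koen_Meys"

# paste0() is paste() with sep = ""
paste0(firstnames, "!")  # "Joris!" "Carolien!" "Koen!"

# Two vectors of the same length are pasted element by element
ages <- c(40, 38, 12)    # hypothetical ages, for illustration only
name_age <- paste(firstnames, ages)
name_age                 # "Joris 40" "Carolien 38" "Koen 12"
```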
What information does the str() function give?
str(scores_Team_A)
num [1:5] 13 14 17 19 18
1) First, it tells you that this is a num (numeric) vector.
2) Next to the vector type, R gives you the dimensions of the vector. This example has only one dimension, and that dimension has indices ranging from 1 to 5.
3) Finally, R gives you the first few values of the vector. In this example, the vector has only 5 values, so you see all of them.
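The same three pieces of information show up for other vector types. A sketch comparing a few calls; note that str() prints rather than returns, and each printed line starts with a space:

```r
scores_Team_A <- c(13, 14, 17, 19, 18)

str(scores_Team_A)   # num [1:5] 13 14 17 19 18
str(1:3)             # int [1:3] 1 2 3
str(c("a", "b"))     # chr [1:2] "a" "b"
str(c(TRUE, FALSE))  # logi [1:2] TRUE FALSE

# For longer vectors, str() shows only the first values followed by ...
str(1:100)           # int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
```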
Apart from str(), R has a set of functions that let you test the type of a vector. They all follow the same naming pattern: is, a dot, and then the name of the type, for example is.numeric(), is.character(), and is.logical():
is.numeric(scores_Team_A)
[1] TRUE
To learn more about str(), see: Str function
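A few more of the is.* functions applied to the scores vector. One detail worth noting: typing numbers such as 13 creates double (numeric) values, not integers:

```r
scores_Team_A <- c(13, 14, 17, 19, 18)

is.numeric(scores_Team_A)       # TRUE
is.character(scores_Team_A)     # FALSE
is.integer(scores_Team_A)       # FALSE: numbers typed like 13 are stored as doubles
is.integer(1:5)                 # TRUE: the colon operator creates integers
is.logical(scores_Team_A > 15)  # TRUE: a comparison returns a logical vector
```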
Of course, functions are a huge topic and need a lot of self-exploration to understand. You can visit this link to see some basic operations related to functions and start exploring and manipulating data with them: Functions in R
Now let's talk about packages
Until now, we’ve spoken about functions that are available in the basic installation of R. But the real power of R lies in the fact that everyone can write his or her own functions and share them with other R users in an organized manner. Many knowledgeable people have written convenient functions with R, and often a new statistical method is published together with R code. Most of these authors distribute their code as R packages (collections of R code, Help files, datasets, and so on, that can be incorporated easily into R itself).
For a list of the most used packages in R, see: Packages in R
To install a package, all you have to do is use the install.packages() function.
After a while, you can end up with a collection of many packages. If R loaded all of them at the beginning of each session, that would take a lot of memory and time. So, before you can use a package, you have to load it into R by using the library() function. The library is the directory where the packages are installed.
If you want to unload a package, the detach() function lets you do it, but you have to specify that it's a package you're detaching, like this:
detach(package:pkgname)   # replace pkgname with the name of the package
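The install, load, and unload cycle can be sketched end to end. This example assumes the splines package, which ships with R, so no download is needed; the install.packages() line is commented out because it requires an internet connection, and ggplot2 there is only an example name:

```r
# install.packages("ggplot2")      # example only: downloads a package from CRAN (needs internet)

.libPaths()                        # the library directories where packages are installed

library(splines)                   # attach a package that ships with R
"package:splines" %in% search()    # TRUE: it is now on the search path

detach(package:splines)            # detach it from the search path again
"package:splines" %in% search()    # FALSE
```

search() lists everything currently attached, which makes it a handy way to check what library() and detach() actually did.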
The vector and function basics covered here are just meant to kick-start that fire in the belly to explore R. You can now begin creating vectors, manipulating them, and using functions to understand your data better.