R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. They started working on the tool in 1993 with the intention of helping their students, and were later encouraged to make it open source. The language is based on another single-letter programming language called S, whose commercial implementation, S-PLUS (S+), still exists.
One of the major reasons for the popularity of R is that R and its packages are Open Source and Free.
Getting Help in R
R has an extensive help system, and it is one of the best features of R programming. One can access the documentation of functions and packages by using help() or ?. These functions provide access to the documentation pages for R functions, data sets, and other objects. Almost all documentation pages for R packages and functions contain a couple of examples showing how to use the function.
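As a small illustration of the help system, the sketch below looks up the page for mean(), which is picked here only as a convenient example function. The example() and apropos() helpers are base R functions that are not mentioned in the text above; they are included as an assumption about what a reader might try next.

```r
# Open the documentation page for mean(); the two forms are equivalent
help("mean")
?mean

# Run the examples listed at the bottom of the mean() help page
example("mean")

# List objects on the search path whose names contain "mean"
meanLike <- apropos("mean")
meanLike
```

In an interactive session the help page opens in a viewer or pager; in a script it is printed as plain text.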
List of Topics Covered
Things You Will Master
- Operators in R
- Working with numbers and strings
2.1 Working With Numbers
2.2 Working With Strings
- Data types and structure
3.1 Data Types in R Programming
3.2 Data Structures in R Programming
3.2.1 Vector manipulations and important functions
3.2.1.1 Defining vectors
3.2.1.2 Verifying and checking the class of the vectors
3.2.1.3 Accessing the elements of a vector
3.2.1.4 Replacing and adding values to a vector
3.2.1.5 Getting the index of a particular element
3.2.1.6 Sorting, subsetting, and removing vectors
3.3 List manipulation functions
3.3.1 Defining list - simple and named lists
3.3.2 Referencing and replacing values of a list
3.3.3 Flatten out a list using unlist() function
3.3.4 Checking the class of each vector in a list
- Matrix Manipulation
4.1 Defining Matrix
4.2 List of important matrix manipulation functions
Operators in R
R supports almost all the popular binary and logical operators. I am sure you will be familiar with almost all of them.
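|Operator|Description|
|---|---|
|+|Addition|
|-|Subtraction|
|*|Multiplication|
|/|Division|
|x %/% y|Integer division|
|** or ^|Exponentiation|
|x %% y|Modulus (gives the remainder)|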
The operators mentioned above can be used with scalars, vectors and matrices.
Arithmetic operators in action
    # Adding two values
    2 + 2
    # Multiplying
    23 * 34
    # Integer division
    1990 %/% 23
    # Calculating the modulus
    7 %% 2
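The arithmetic operators also work element by element on vectors and matrices. The following is a minimal sketch of that behaviour; the object names (v, m, and so on) are made up for illustration.

```r
# The same arithmetic operators applied element by element to a vector
v <- c(1, 2, 3)
doubled <- v * 2                # 2 4 6
summed  <- v + c(10, 20, 30)    # 11 22 33

# On a matrix the operators are applied to every cell
m <- matrix(1:4, nrow = 2)      # columns are (1, 2) and (3, 4)
squared   <- m ^ 2              # columns become (1, 4) and (9, 16)
remainder <- m %% 2             # 1 for odd entries, 0 for even entries

doubled
summed
squared
remainder
```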
Although R is a remarkable statistical tool, one thing that can be exasperating is that it is a case-sensitive language. This means that view and View are treated as two different objects.
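Case sensitivity is easy to check at the console. In the sketch below the variable names salesTotal and SalesTotal are hypothetical and exist only to show that names differing in case refer to different objects.

```r
# Two names that differ only in case are two separate objects
salesTotal <- 100
SalesTotal <- 250
salesTotal == SalesTotal                 # FALSE

# The same holds for strings
sameName <- identical("view", "View")    # FALSE

# Function names are case-sensitive too: toupper() exists, ToUpper() does not
hasLower <- exists("toupper")            # TRUE
hasMixed <- exists("ToUpper")            # FALSE
```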
|Operator|Description|
|---|---|
|>|Greater than|
|<|Less than|
|>=|Greater than or equal to|
|<=|Less than or equal to|
|==|Equal to|
|!=|Not equal to|
|!x|Not x|
|x & y|x and y|
Logical Operators in action
    # Using greater than
    10 > 11
    # Using equal to (string comparison is case-sensitive)
    "Hanna" == "hanna"
    # Using not x
    !(10 == 11)
    # Using the AND operator
    (10 == 10) & (2 == 2)
An assignment operator is used in programming languages to save (assign) a value to a variable, which can then be used for further processing. In R we use the assignment operator (<-) to assign a value. We can also use the equals sign (=). However, the assignment operator (<-) is far more popular than the equals sign.
    # Assigning a numeric value
    num <- 23
    num
    # Assigning a string value
    strng <- "Hanna"
    strng
As in most other programming languages, strings are defined using double or single quotes.
Numbers and Strings
Numbers and strings make up most datasets, so it is important to understand the most common tasks and functions you will need when working with them.
Working With Numbers
Generating sequence of numbers
To generate a sequence of numbers, one can use either the colon operator (:) or the seq() function.
    # Using (:) to generate a sequence of integers
    1:10
    # Using the seq() function to generate a sequence with a step of 0.7
    seq(10, 20, by = 0.7)
Generating uniformly distributed random numbers
Among the many available functions, the ones I like the most are runif() and sample().
    # Using runif() to generate 10 random numbers
    # By default it generates numbers between 0 and 1
    runif(10)
    # Generating numbers between 200 and 500
    runif(10, min = 200, max = 500)
    # Generating four random numbers WITH REPLACEMENT
    sample(10:15, 4, replace = TRUE)
    # Generating four random numbers WITHOUT REPLACEMENT
    sample(10:15, 4, replace = FALSE)
Generate random numbers from normal distribution
A normal distribution is a distribution that follows a bell curve. Statistically speaking, its mean, median, and mode are all the same.
    # Using rnorm() to generate 10 random numbers (mean 0 and sd 1 by default)
    rnorm(10)
    # Setting the desired mean and standard deviation
    rnorm(10, mean = 5, sd = 2)
Generating same sequence of random numbers
This can be achieved by using set.seed(), a very useful function that ensures you can reproduce the same results. Its main argument is an integer seed; using the same seed gives you the same results.
    # With set.seed()
    # Output 1
    set.seed(23)
    rnorm(10, mean = 5, sd = 2)
    # Output 2 (identical to Output 1)
    set.seed(23)
    rnorm(10, mean = 5, sd = 2)
    # Without set.seed()
    # Output 1
    rnorm(10, mean = 5, sd = 2)
    # Output 2 (differs from Output 1)
    rnorm(10, mean = 5, sd = 2)
Rounding numbers to the nearest value
There are a couple of ways to achieve this. One can round values to the nearest integer, upwards, downwards, or towards zero. The following functions can be used for each of these tasks.
    # Generating a sequence of numbers
    numerSeq <- seq(0, 1, by = 0.05)
    # Rounding to the nearest integer; a value of exactly .5 goes to
    # the even digit, so round(0.5) is 0
    round(numerSeq)
    # Rounding to one decimal place
    round(numerSeq, 1)
    # Rounding up
    ceiling(numerSeq)
    # Rounding down
    floor(numerSeq)
    # Rounding towards zero
    trunc(numerSeq)
Working With Strings
The two tasks which are very critical from the data analysis point of view are combining strings and searching and replacing strings.
Knowing how to combine strings, or a string with a number, can be of great help. I often use this to format or print my final output, and it is also useful during analysis. The most widely used functions for this are paste() (a space is the default separator), paste0() (no separator), and sprintf().
    # Combining two strings using the paste() function
    paste("Hanna", "Ask")
    # Choosing a different separator
    paste("Hanna", "Ask", sep = "$")
    # Using the paste0() function
    paste0("Hanna", "Ask")
You can also pass a collection of strings to the paste() function. Such a collection of similar elements is formally called a vector in R. More on this later.
    # Creating a vector of strings
    strgVec <- c("Cat", "Dog", "Fish", "Cow")
    # Combining the values with +
    paste(strgVec, collapse = "+")
    # Using the sprintf() function to combine two strings
    sprintf("My name is %s", "Hanna")
    # Combining a string and an integer
    sprintf("My name is %s and I am %d years old", "Hanna", 30)
Searching and Replacing strings
We will cover three very useful functions here: sub(), gsub(), and grep().
    # Defining a string
    strng <- "You’re gonna need a bigger boat boat."
    # Replacing the first occurrence of boat with car
    sub("boat", "car", strng)
    # Replacing boat with car at all instances
    gsub("boat", "car", strng)
    # Returns the indices of the elements that match the pattern;
    # [car] is a character class that matches any of c, a, or r
    grep("[car]", letters)
Data types and structure
In R there are six data types and four data structures.
- Character - a text (string) value. Example - “Hanna”, “Dog”, “Male”.
- Numeric - a number stored with a decimal part. Example - 10.4, 12.45.
- Integer - a whole number with no decimal part. Example - 109, 123, 34.
- Logical - the boolean values. Example - TRUE, FALSE.
- Factor - a qualitative variable which can be either nominal or ordinal. If it is ordinal, it is called an ordered factor. Example Nominal - “Male” and “Female”. Example Ordinal - “Good”, “Average” and “Best”.
- Complex - a number which has an imaginary part.
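Ordered factors and complex numbers are the two types in this list that are least self-explanatory, so here is a minimal sketch of both. The rating vector and the level order Average < Good < Best are assumptions made for illustration.

```r
# An ordered factor with an explicit level order (assumed: Average < Good < Best)
rating <- factor(c("Good", "Average", "Best", "Good"),
                 levels = c("Average", "Good", "Best"),
                 ordered = TRUE)
class(rating)                       # "ordered" "factor"
levels(rating)

# Ordered factors can be compared: is "Best" higher than "Good"?
isBetter <- rating[3] > rating[1]   # TRUE

# A complex number with real part 3 and imaginary part 2
z <- 3 + 2i
class(z)                            # "complex"
Re(z)
Im(z)
```

Without the levels argument, factor() sorts the levels alphabetically, which is rarely the intended order for ordinal data.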
As in other programming languages, the data structures in R are defined by their dimensionality and by the homogeneity of the data types they can hold.
Vector - Also formally known as an atomic vector. A vector is one-dimensional and can hold only one type of data.
List - A list is also a one-dimensional structure; however, it can hold multiple data types.
Matrix - A matrix is a two-dimensional structure and can hold only one data type.
Data Frame - A data frame is also a two-dimensional structure, but it can hold multiple types of data.
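The differences between the four structures can be sketched in a few lines. All object names and values below are hypothetical; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Vector: one dimension, one data type
vec <- c(1.5, 2.5, 3.5)
# Mixing types in a vector coerces everything to a single type
mixed <- c(1, "a", TRUE)
class(mixed)                  # "character"

# List: one dimension, each element keeps its own type
lst <- list(1.5, "a", TRUE)
listClasses <- sapply(lst, class)   # "numeric" "character" "logical"

# Matrix: two dimensions, one data type
mat <- matrix(1:6, nrow = 2, ncol = 3)
dim(mat)                      # 2 3

# Data frame: two dimensions, each column can have its own type
df <- data.frame(empName = c("Chris", "Robin"),
                 empSalary = c(2000, 4000),
                 stringsAsFactors = FALSE)
dim(df)                       # 2 2
columnClasses <- sapply(df, class)

# A plain vector has no dim attribute
dim(vec)                      # NULL
```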
Now we will learn about some of the most basic data manipulation functions. Knowledge of these functions is an absolute must for anyone who wants to move forward and perform any kind of data analysis task.
Here is a collection of the functions which are used to define vectors of the different data types in R programming.
    # Defining a character vector
    characterVector <- c("Football", "Cricket", "Tennis", "Badminton")
    # Defining a numeric vector
    numericVector <- c(12.3, 23.4, 17.9, 89.7)
    # Defining an integer vector (the L suffix marks an integer)
    integerVector <- c(12L, 23L, 17L, 89L)
    # Defining a logical vector
    logicVector <- c(TRUE, FALSE, TRUE, TRUE)
    # Defining a factor - nominal
    factorVector <- factor(characterVector)
    # Defining a factor - ordinal (levels are ordered alphabetically by default)
    orderedFactorVector <- factor(characterVector, ordered = TRUE)
Verifying and checking the class of the vectors
For a vector, checking the object type returns the type of the data it holds. To check the class of a vector we can use either the class() function or the typeof() function. There are other functions, but these are the common ones.
    # Using class() to check the object type
    class(numericVector)     # "numeric"
    # Using typeof() to check the internal storage type
    typeof(numericVector)    # "double"
If you just wish to confirm the type of a vector, you can use the is.*() family of functions. These functions return TRUE if the vector is of the specified type and FALSE otherwise.
    # Checking if the vector is of character type
    is.character(numericVector)
    # Checking if the vector is of numeric type
    is.numeric(numericVector)
Accessing the elements of a vector
The elements of a vector can be accessed by index. Unlike programming languages such as C and Python, where indexing starts from 0, indexing in R starts from 1.
    # Extracting the third element
    characterVector[3]
    # Extracting multiple elements
    characterVector[c(1, 3)]
    # Dropping an element (returns a copy without the first element)
    characterVector[-1]
    # Dropping multiple elements
    characterVector[-c(1, 3)]
One is not allowed to mix positive and negative index values in the same call.
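The restriction on mixing index signs can be seen with a short sketch. The sports vector is a stand-in defined here so the example runs on its own, and tryCatch() is used only to capture the error instead of stopping the script.

```r
sports <- c("Football", "Cricket", "Tennis", "Badminton")

# All-positive indices select elements
onlyPositive <- sports[c(1, 3)]     # "Football" "Tennis"

# All-negative indices drop elements
onlyNegative <- sports[-c(1, 3)]    # "Cricket" "Badminton"

# Mixing the two raises an error, which is caught here
mixedResult <- tryCatch(
  sports[c(-1, 3)],
  error = function(e) "error: cannot mix signs"
)
mixedResult
```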
Replacing and adding values to a vector
To replace existing values in a vector, first select the value using square brackets [] and then simply assign a new value to it.
    # Replacing Football with Basketball
    characterVector[1] <- "Basketball"
    characterVector
    # Replacing more than one value
    numericVector[c(1, 4)] <- c(55, 66)
    numericVector
To add new values to a vector, you can use any of the approaches below, depending on your requirement.
Using index
The numericVector contains 4 elements. We will add a new element to this vector by using an index. However, this method only allows us to add a new element at the end of the vector.
    # Adding an element at the end (index 5, one past the current length)
    numericVector[5] <- 77
    numericVector
Using c() function
With the c() function you can add a new element either at the beginning or at the end.
    # Adding an element at the end
    numericVector <- c(numericVector, 99)
    numericVector
    # Adding an element at the beginning
    numericVector <- c(99, numericVector)
    numericVector
Using append() function
If you wish to add a new element at any given index in a vector, then the append() function is the correct choice. The function takes three arguments: the vector, the values to insert, and the index after which to insert them.
    # Using append() to add a value after the 4th position
    numericVector <- append(numericVector,  # vector
                            99,             # element to be inserted
                            4)              # index after which to insert
Getting the index of a particular element
    # Printing the indices of the values which are equal to 99
    which(numericVector == 99.0)
Other important vector manipulation functions
Below is the list of functions which you will use day in, day out for tasks related to data analysis or manipulation.
Sorting a vector
    # Sorting in ascending order
    numericVector[order(numericVector)]
    # Sorting in descending order
    numericVector[order(numericVector, decreasing = TRUE)]
Checking and Removing missing values
    # Adding an NA value to the end of the vector
    numericVector <- c(numericVector, NA)
    # Checking whether missing values are present
    is.na(numericVector)
    # Removing NA values using ! (not)
    numericVector[!is.na(numericVector)]
    # Removing NA values using the na.omit() function
    na.omit(numericVector)
Subsetting the vector and getting length of a vector
    # Getting the elements greater than 30
    numericVector <- numericVector[numericVector > 30]
    # Checking the total number of elements in the new vector
    length(numericVector)
To define a list we use the list() function. The function can be used to create a simple list or a named list.
    # Defining a list
    example1 <- list(c(2, 3, 4), c("aa", "bb", "cc", "dd"), c(TRUE, TRUE))
    example1
    # Defining vectors
    empName <- c("Chris", "Robin", "Matt")
    empSalary <- c(2000, 4000, 6000)
    bonusGiven <- c(TRUE, TRUE, FALSE)
    # Defining a list using vectors
    listStruct <- list(empName, empSalary, bonusGiven)
    listStruct
    # Defining a named list
    namedListStruct <- list("empName" = empName,
                            "empSalary" = empSalary,
                            "bonusGiven" = bonusGiven)
    namedListStruct
Referencing values of a list
A value inside a list can be accessed using its index or its name (if it is a named list). Fundamentally, a list is nothing but a collection of vectors. This means we can apply all the data manipulations which we have just learned in the Vector Manipulation section.
Extracting values from a list
    # Extracting the vector of emp names from the unnamed list
    listStruct[[1]]
    # Extracting the vector of emp names from the named list
    namedListStruct$empName
    # Extracting Robin from the emp names in the unnamed list
    listStruct[[1]][2]
    # Extracting Robin from the emp names in the named list
    namedListStruct$empName[2]
Replacing values in a list
    # Replacing the salary for Robin with 8000
    listStruct[[2]][2] <- 8000
    listStruct
    # or in the named list
    namedListStruct$empSalary[2] <- 8000
    namedListStruct
Unlisting the list
The unlist() function can be used to flatten out the list to one level.
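A minimal sketch of flattening follows, using two small hypothetical lists. Because the result is a single atomic vector, mixed element types are coerced to one common type.

```r
# A named list holding a character vector and a numeric vector
empList <- list(empName = c("Chris", "Robin"),
                empSalary = c(2000, 4000))

# Flattening gives one character vector; names get a numeric suffix
flatEmp <- unlist(empList)
flatEmp
names(flatEmp)      # "empName1" "empName2" "empSalary1" "empSalary2"

# Nested lists are flattened as well
nested <- list(1:2, list(3, 4:5))
flatNested <- unlist(nested)    # 1 2 3 4 5
flatNested
```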
Checking the class of each vector in a list
A list can consist of multiple levels, and one can also create a nested list. Also, lists can be used to bundle objects of different classes and lengths.
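One way to check the class of each element is to apply class() across the list with sapply(). This is a sketch rather than the only approach; the empRecords list, including its nested extra element, is hypothetical.

```r
# A list bundling vectors of different classes plus a nested list
empRecords <- list(empName = c("Chris", "Robin", "Matt"),
                   empSalary = c(2000, 4000, 6000),
                   bonusGiven = c(TRUE, TRUE, FALSE),
                   extra = list(1:3, "note"))

# Class of each top-level element
elementClasses <- sapply(empRecords, class)
elementClasses

# Length of each top-level element
elementLengths <- lengths(empRecords)
elementLengths

# Compact overview of the whole structure, including nested levels
str(empRecords)
```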
As a matrix is a two-dimensional structure, we need to specify the number of rows and the number of columns when defining it.
    # Defining a matrix filled row by row
    matStruct <- matrix(integerVector, nrow = 2, ncol = 2, byrow = TRUE)
    # Defining a matrix filled column by column
    matStruct1 <- matrix(integerVector, nrow = 2, ncol = 2, byrow = FALSE)
Basic operations related to matrix
The code snippet below shows some functions which are good to know and will help you with your data science work.
    # Naming the columns
    colnames(matStruct) <- c("col1", "col2")
    # Naming the rows
    rownames(matStruct) <- c("row1", "row2")
    # Getting the dimensions of the matrix
    dim(matStruct)
    # Getting the number of rows
    nrow(matStruct)
    # Getting the number of columns
    ncol(matStruct)
    # Accessing the values in column 2
    matStruct[, 2]
    # Accessing the values in row 1
    matStruct[1, ]
    # Combining two matrices by columns
    cbind(matStruct, matStruct1)
    # Combining two matrices by rows (appending)
    rbind(matStruct, matStruct1)