# R Basics

## Overview

R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. They started working on the tool in the early 1990s with the intention of helping their students, and were later encouraged to make it open source. The language is based on another single-letter programming language called S, whose commercial implementation, S-PLUS (also written S+), still exists.

One of the major reasons for the popularity of R is that R and its packages are **Open Source and Free**.

## Getting Help in R

R has an extensive help system, and this is one of the best features of R programming. You can access the documentation for functions and packages by using `help()` or `?`. These functions provide access to the documentation pages for R functions, data sets, and other objects. Almost all documentation pages for R packages and functions contain a couple of examples showing how to use the function.

```
help(mean)
?mean
```
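Since most help pages ship with examples, it can be handy to run them directly. The following is a small sketch using base R's `example()` and `apropos()`; the variable name `meanMatches` is only illustrative.

```r
# Run the examples from the help page of mean()
example(mean)

# List objects on the search path whose names contain "mean"
meanMatches <- apropos("mean")
meanMatches
```

`apropos()` is useful when you remember only part of a function name and want to find the page to pass to `help()`.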

## Topics Covered

### Things You Will Master

1. Operators in R
2. Working with numbers and strings
    - 2.1 Working with numbers
    - 2.2 Working with strings
3. Data types and structure
    - 3.1 Data types in R programming
    - 3.2 Data structures in R programming
        - 3.2.1 Vector manipulations and important functions
            - 3.2.1.1 Defining vectors
            - 3.2.1.2 Verifying and checking the class of the vectors
            - 3.2.1.3 Accessing the elements of a vector
            - 3.2.1.4 Replacing and adding values to a vector
            - 3.2.1.5 Getting the index of a particular element
            - 3.2.1.6 Sorting, subsetting, and removing vectors
    - 3.3 List manipulation functions
        - 3.3.1 Defining list - simple and named lists
        - 3.3.2 Referencing and replacing values of a list
        - 3.3.3 Flatten out a list using unlist() function
        - 3.3.4 Checking the class of each vector in a list
4. Matrix manipulation
    - 4.1 Defining a matrix
    - 4.2 List of important matrix manipulation functions

## Operators in R

R supports almost all the popular arithmetic and logical operators. You will likely be familiar with most of them.

### Binary/Arithmetic Operators

| Operator | Description |
|---|---|
| `+` | Addition |
| `-` | Subtraction |
| `*` | Multiplication |
| `/` | Division |
| `**` or `^` | Exponentiation |
| `x %/% y` | Integer division |
| `x %% y` | Modulus (gives the remainder) |

The operators mentioned above can be used with scalars, vectors and matrices.

#### Arithmetic operators in action

```
# Adding two values
2 + 2
# Multiplying
23 * 34
# Integer division
1990 %/% 23
# Calculating modulus
7 %% 2
```
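The same operators work element by element on vectors and matrices. Here is a short sketch of that behaviour; the names `vec` and `mat` and their values are made up for illustration.

```r
# A numeric vector and a 2 x 2 matrix (illustrative values)
vec <- c(1, 2, 3, 4)
mat <- matrix(1:4, nrow = 2)

# A scalar is applied to every element of the vector
vecDoubled <- vec * 2      # 2 4 6 8
vecSquared <- vec ^ 2      # 1 4 9 16
vecRemainder <- vec %% 2   # 1 0 1 0

# The same holds for every cell of a matrix
matPlusTen <- mat + 10

vecDoubled
vecSquared
vecRemainder
matPlusTen
```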

### Caution

Although R is a remarkable statistical tool, one thing that can be exasperating is that it is a case-sensitive language. This means that **view** and **View** are treated as two different names.

### Logical Operators

| Operator | Description |
|---|---|
| `>` | Greater than |
| `>=` | Greater than or equal to |
| `<` | Less than |
| `<=` | Less than or equal to |
| `==` | Equal to |
| `!=` | Not equal to |
| `x \| y` | x or y |
| `x & y` | x and y |
| `!x` | Not x |

#### Logical Operators in action

```
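# Using greater than
10 > 11
# Using equal to (string comparison is case-sensitive)
"Hanna" == "hanna"
# Using not: ! applies to the whole comparison, so this is !(10 == 11)
!10 == 11
# Using the AND operator
(10 == 10) & (2 == 2)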
```
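Like the arithmetic operators, the comparison and logical operators work element by element on vectors. The sketch below uses a made-up `ages` vector to show `&`, `|` and `!` together.

```r
# Illustrative data
ages <- c(15, 22, 37, 64)

# Each comparison returns one logical value per element
isAdult <- ages >= 18                        # FALSE TRUE TRUE TRUE
isWorkingAge <- (ages >= 18) & (ages < 60)   # FALSE TRUE TRUE FALSE
isYoungOrOld <- (ages < 18) | (ages >= 60)   # TRUE FALSE FALSE TRUE
isMinor <- !isAdult                          # TRUE FALSE FALSE FALSE

isAdult
isWorkingAge
isYoungOrOld
isMinor
```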

#### Assignment operator

An assignment operator is used in programming languages to save (assign) a value to a variable. This variable can then be used for further processing. In R we use the **assignment operator (<-)** to assign a value. We can also use the **equals sign (=)**. However, the **assignment operator (<-)** is far more popular than the equals sign.

```
# Assigning number values
num <- 23
num
# Assigning string value
strng <- "Hanna"
strng
```

As in most other programming languages, strings are defined using double or single quotes.

## Numbers and Strings

Numbers and strings make up most datasets, so it is important to understand the most common tasks and functions you will need when working with them.

### Working With Numbers

#### Generating sequence of numbers

To generate a sequence of numbers, you can use either the **colon operator (:)** or the `seq()` function.

```
# Using (:) to generate sequence of integer numbers
1:10
# Using seq() function to generate sequence of numbers
seq(10, 20, by = 0.7)
```
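A few more variations may be useful. This sketch assumes you sometimes want a fixed number of points rather than a fixed step, for which `seq()` offers the `length.out` argument; the variable names are illustrative.

```r
# Five evenly spaced values between 0 and 1
fivePoints <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# The colon operator also counts downwards
countdown <- 10:1

# seq_len() gives the integers 1, 2, ..., n
firstFive <- seq_len(5)                   # 1 2 3 4 5

fivePoints
countdown
firstFive
```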

#### Generating uniformly distributed random numbers

Among the many functions available, the ones I like the most are `runif()` and `sample()`.

```
# Using runif() function to generate 10 random numbers
# By default generates number between 0 and 1
runif(10)
# Generating numbers between 200 to 500
runif(10, min = 200, max = 500)
# Generating four random numbers WITH replacement
sample(10:15, 4, replace = TRUE)
# Generating four random numbers WITHOUT replacement
sample(10:15, 4, replace = FALSE)
```

#### Generate random numbers from normal distribution

A normal distribution is a distribution that follows a bell curve. Statistically speaking, its mean, median and mode are all the same.

```
# Using rnorm() function to generate 10 random numbers
rnorm(10)
# Setting the desired standard deviation and mean
rnorm(10, mean = 5, sd = 2)
```

#### Generating the same sequence of random numbers

This can be achieved by using `set.seed()`, a very useful function which ensures that you are able to reproduce the same results. The function takes one argument, which can be any integer. Keeping that number the same gives you the same results.

```
# With set.seed()
# Output 1
set.seed(23)
rnorm(10, mean = 5, sd = 2)
# Output 2
set.seed(23)
rnorm(10, mean = 5, sd = 2)
# Without set.seed()
# Output 1
rnorm(10, mean = 5, sd = 2)
# Output 2
rnorm(10, mean = 5, sd = 2)
```
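To check the effect rather than compare the printed numbers by eye, the draws can be stored and compared with `identical()`. This is a minimal sketch; the variable names are illustrative.

```r
# Two draws made right after setting the same seed
set.seed(23)
firstDraw <- rnorm(5)
set.seed(23)
secondDraw <- rnorm(5)

# A third draw made without resetting the seed
thirdDraw <- rnorm(5)

identical(firstDraw, secondDraw)   # TRUE: same seed, same numbers
identical(secondDraw, thirdDraw)   # FALSE: the generator has moved on
```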

#### Rounding numbers to the nearest value

There are a couple of ways to achieve this. You can round values to the nearest integer, upwards, downwards, or towards zero. The following set of functions can be used to achieve each of these tasks.

```
# Generating a sequence of numbers
numerSeq <- seq(0, 1, by = 0.05)
# Rounding to nearest integer - exact halves go to the even integer, so round(0.5) is 0
round(numerSeq)
# Rounding to one decimal point
round(numerSeq, 1)
# Rounding towards upper side value
ceiling(numerSeq)
# Rounding towards lower side value
floor(numerSeq)
# Rounding towards Zero
trunc(numerSeq)
```
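The differences between these functions are easiest to see on exact halves and negative values. The sketch below uses a made-up vector `halves`; note that base R's `round()` sends an exact half to the even integer.

```r
# Values that sit exactly halfway between two integers
halves <- c(0.5, 1.5, 2.5, -0.5, -1.5)

roundedHalves <- round(halves)   # 0 2 2 0 -2 (halves go to the even integer)
upperHalves <- ceiling(halves)   # 1 2 3 0 -1
lowerHalves <- floor(halves)     # 0 1 2 -1 -2
truncHalves <- trunc(halves)     # 0 1 2 0 -1 (towards zero)

roundedHalves
upperHalves
lowerHalves
truncHalves
```

For positive numbers `trunc()` behaves like `floor()`, and for negative numbers it behaves like `ceiling()`.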

### Working With Strings

Two tasks that are critical from a data analysis point of view are combining strings and searching and replacing strings.

#### Combining strings

Knowing how to combine strings, or a string with a number, can be of great help. I often use this to format or print my final output, and it is also useful during the analysis itself. With these two tasks in mind, the most widely used functions are `paste()` (a space is the default separator), `paste0()` (no separator), and `sprintf()`.

```
# Combining two strings using paste() function
paste("Hanna", "Ask")
# Choosing different separator
paste("Hanna", "Ask", sep = "$")
# Using paste0() function
paste0("Hanna", "Ask")
```

You can also pass a collection of strings to the `paste()` function. Such a collection of elements of the same type is formally called a vector in R. More on this later.

```
# Creating a vector of string
strgVec <- c("Cat", "Dog", "Fish", "Cow")
# Combining the values with +
paste(strgVec, collapse = "+")
```

```
# Using sprintf() function to combine two strings
sprintf("My name is %s", "Hanna")
# Combining a string and an integer
sprintf("My name is %s and I am %d years old", "Hanna", 30)
```
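Both `paste()` and `sprintf()` also work on whole vectors, which helps when labelling many values at once. The sketch below uses invented data (`animals`, `counts`) purely for illustration.

```r
# Illustrative data: a string vector and a numeric vector of equal length
animals <- c("Cat", "Dog", "Fish")
counts <- c(2, 5, 12)

# paste() combines the vectors element by element
labels <- paste(animals, counts, sep = ": ")   # "Cat: 2" "Dog: 5" "Fish: 12"

# sprintf() can also control number formatting, here two decimal places
priceLine <- sprintf("%s costs %.2f", "Tea", 3.5)   # "Tea costs 3.50"

labels
priceLine
```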

#### Searching and replacing strings

We will cover three very useful functions here: `sub()`, `gsub()`, and `grep()`.

```
# Defining a string
strng <- "You're gonna need a bigger boat boat."
# Replacing the first occurrence of boat with car
sub("boat", "car", strng)
# Replacing boat with car at all instances
gsub("boat", "car", strng)
# Returns the indices of the elements that match the pattern;
# [car] is a character class matching "a", "c" or "r"
grep("[car]", letters)
```
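These functions are usually applied to a vector of strings rather than a single value. The following sketch, built on a made-up `fruits` vector, also shows `grepl()` and the `value` argument of `grep()`, which are close relatives of the three functions above.

```r
# Illustrative data
fruits <- c("apple", "banana", "cherry", "grape")

# grep() returns positions; value = TRUE returns the matching strings instead
matchIndex <- grep("an", fruits)                  # 2
matchValues <- grep("ap", fruits, value = TRUE)   # "apple" "grape"

# grepl() returns one TRUE/FALSE per element; ^ anchors the start of the string
startsWithB <- grepl("^b", fruits)                # FALSE TRUE FALSE FALSE

# sub() changes the first match in each string, gsub() changes all of them
firstOnly <- sub("a", "A", fruits)    # "Apple" "bAnana" "cherry" "grApe"
allMatches <- gsub("a", "A", fruits)  # "Apple" "bAnAnA" "cherry" "grApe"
```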

## Data types and structure

We will look at six data types and four data structures in R.

### Data Types

- **Character** - a collection of strings. Example - “Hanna”, “Dog”, “Male”.
- **Numeric** - a numeric value which is represented with decimal points. Example - 10.4, 12.45.
- **Integer** - also a number, but only the integer part. Example - 109, 123, 34.
- **Logical** - the boolean values. Example - TRUE, FALSE.
- **Factor** - a qualitative variable which can be either of nominal or ordinal type. If it is ordinal, it is called an ordered factor. Example Nominal - “Male” and “Female”. Example Ordinal - “Good”, “Average” and “Best”.
- **Complex** - a number which has an imaginary part.

### Data Structure

As in any other programming language, the data structures in R are defined by their dimensionality and by the homogeneity of the data types they can hold.

- **Vector** - Vectors are also formally known as **Atomic Vectors**. A vector can hold only **one type of data** and is **one-dimensional**.
- **List** - A list is also a **one-dimensional** structure; however, it can be used to save **multiple data types**.
- **Matrix** - A matrix is a **two-dimensional** structure and can only save **one data type**.
- **Data Frame** - A data frame is also a **two-dimensional** structure but can save **multiple types of data**.
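As a quick illustration of the four structures, here is a minimal sketch that builds one of each with base R. All names and values are invented for the example, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# Vector: one dimension, one data type
scores <- c(81, 92, 77)

# List: one dimension, mixed data types
record <- list(name = "Hanna", scores = scores, passed = TRUE)

# Matrix: two dimensions, one data type
grid <- matrix(1:6, nrow = 2)

# Data frame: two dimensions, a different type allowed in each column
people <- data.frame(name = c("Chris", "Robin", "Matt"),
                     salary = c(2000, 4000, 6000),
                     stringsAsFactors = FALSE)

dim(grid)                          # 2 3
dim(people)                        # 3 2
columnTypes <- sapply(people, class)
columnTypes                        # name: "character", salary: "numeric"
```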

### Vector manipulation

Now we will learn about some of the most basic data manipulation functions. Knowing these functions is an absolute must for anyone who wants to move forward and perform any kind of data analysis task.

#### Defining vectors

The code below shows how to define vectors of the different data types in R.

```
# Defining character vectors
characterVector <- c("Football", "Cricket", "Tennis", "Badminton")
# Defining numeric vectors
numericVector <- c(12.3, 23.4, 17.9, 89.7)
# Defining integer vectors
integerVector <- c(12L, 23L, 17L, 89L)
# Defining logical vectors
logicVector <- c(TRUE, FALSE, TRUE, TRUE)
# Defining factor - nominal
factorVector <- factor(characterVector)
# Defining factor - ordinal
orderedFactorVector <- factor(characterVector, ordered = TRUE)
```
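When `ordered = TRUE` is used without further arguments, the levels are sorted alphabetically, which is rarely the intended ranking. The sketch below assumes a made-up rating scale and sets the order explicitly through the `levels` argument of `factor()`.

```r
# Illustrative ratings with a natural order: Average < Good < Best
ratings <- c("Good", "Best", "Average", "Good")
ratingFactor <- factor(ratings,
                       levels = c("Average", "Good", "Best"),
                       ordered = TRUE)

levels(ratingFactor)                 # "Average" "Good" "Best"

# Ordered factors support comparisons that respect the level order
belowBest <- ratingFactor < "Best"   # TRUE FALSE TRUE TRUE
topRating <- as.character(max(ratingFactor))   # "Best"

belowBest
topRating
```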

#### Verifying and checking the class of the vectors

When we check the data structure type of a vector, R returns the type of the data it holds. To check the class of a vector we can use either the `class()` function or the `typeof()` function. There are other functions, but these are the common ones.

```
# Using class() function to check the object type
class(numericVector)
# Using typeof() function to check the object type
typeof(numericVector)
```

If you just wish to be certain about the type of a vector, you can use the `is` family of functions. These functions return TRUE if the vector belongs to the specific type; otherwise they return FALSE.

```
# Checking if the vector is character type
is.character(numericVector)
# Checking if the vector is numeric type
is.numeric(numericVector)
```
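`class()` and `typeof()` do not always agree, because `typeof()` reports how the values are stored internally. The following sketch, with illustrative variable names, shows the two cases where this is most visible.

```r
# Illustrative vectors
decimals <- c(1.5, 2.5)
wholeNumbers <- c(1L, 2L)
groups <- factor(c("a", "b"))

class(decimals)    # "numeric"
typeof(decimals)   # "double"

class(groups)      # "factor"
typeof(groups)     # "integer": factors are stored as integer codes

is.numeric(wholeNumbers)   # TRUE: integers count as numeric
is.numeric(groups)         # FALSE: factors do not
```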

#### Accessing the elements of a vector

The elements inside a vector can be accessed using an index. Unlike other programming languages such as C and Python, indexing in R starts from 1.

```
# Extracting third elements
characterVector[3]
# Extracting multiple elements
characterVector[c(1,3)]
# Excluding an element (a negative index drops it from the result)
characterVector[-1]
# Excluding multiple elements
characterVector[-c(1,3)]
```

You are not allowed to mix positive and negative index values in the same call.
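Two details are worth seeing in code: a negative index returns a shortened copy and leaves the original vector untouched, and mixing signs raises an error. This is a small sketch with an illustrative `sports` vector; `tryCatch()` is used only so the error can be inspected instead of stopping the script.

```r
# Illustrative data
sports <- c("Football", "Cricket", "Tennis", "Badminton")

# Dropping the first element returns a new, shorter vector
withoutFirst <- sports[-1]
length(withoutFirst)   # 3
length(sports)         # 4: the original is unchanged

# Mixing positive and negative indices is an error
mixedResult <- tryCatch(sports[c(-1, 2)], error = function(e) e)
inherits(mixedResult, "error")   # TRUE
```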

#### Replacing and adding values to a vector

To replace existing values in a vector, first select the value using **square brackets []** and then simply assign a new value to it.

```
# Replacing football with basketball
characterVector[1] <- "Basketball"
characterVector
# Replacing more than one value
numericVector[c(1,4)] <- c(55, 66)
numericVector
```

To add new values to a vector, you can use any of the approaches below, depending on your requirement.

##### Using Index

The **numericVector** contains 4 elements. We will add a new element to this vector by using an index. However, this method is only suited to adding a new element at the end of the vector.

```
# Adding element at the end.
numericVector[5] <- 77
numericVector
```

##### Using c() function

By using the `c()` function you can add a new element either at the beginning or at the end.

```
# Adding element at the end.
numericVector <- c(numericVector, 99)
numericVector
# Adding element at the beginning
numericVector <- c(99, numericVector)
numericVector
```

##### Using append() function

If you wish to add a new element at any given index in a vector, then the `append()` function is the correct choice. The function takes three arguments: the vector, the values to insert, and the index after which to insert them.

```
# Using append function to add a value after the 4th position
numericVector <- append(numericVector, # vector
                        99,            # element to be inserted
                        4)             # index after which to insert
```

#### Getting the index of a particular element

```
# Printing the index of values which are equal to 99
which(numericVector == 99.0)
```

#### Other important vector manipulation functions

Below is a list of functions which you will be using day in, day out for tasks related to data analysis or manipulation.

##### Sorting a vector

```
# Sorting in ascending order
numericVector[order(numericVector)]
# Sorting in descending order
numericVector[order(numericVector, decreasing = TRUE)]
```
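Base R also provides `sort()`, which returns the sorted values directly, while `order()` returns positions and can therefore be used to reorder a second vector. The sketch below uses invented employee data to show both.

```r
# Illustrative data: two vectors that belong together
empNames <- c("Robin", "Chris", "Matt")
salaries <- c(4000, 2000, 6000)

# sort() returns the values themselves
ascending <- sort(salaries)                       # 2000 4000 6000
descending <- sort(salaries, decreasing = TRUE)   # 6000 4000 2000

# order() returns the positions that would sort the vector
salaryOrder <- order(salaries)                    # 2 1 3

# Those positions can reorder a related vector
namesBySalary <- empNames[salaryOrder]            # "Chris" "Robin" "Matt"
```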

##### Checking and Removing missing values

```
# Adding NA value to a vector
numericVector[2] <- NA
# Checking if missing value is present
is.na(numericVector)
# Removing NA values using ! not
numericVector[!is.na(numericVector)]
# Removing NA values using na.omit() function
na.omit(numericVector)
```
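Missing values also affect summary functions such as `mean()`, which return `NA` unless told to skip them. The following sketch uses a made-up `readings` vector and the `na.rm` argument.

```r
# Illustrative data with two missing values
readings <- c(10, NA, 20, NA, 30)

missingCount <- sum(is.na(readings))        # 2
missingIndex <- which(is.na(readings))      # 2 4

plainMean <- mean(readings)                 # NA: missing values propagate
cleanMean <- mean(readings, na.rm = TRUE)   # 20

missingCount
missingIndex
plainMean
cleanMean
```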

##### Subsetting the vector and getting length of a vector

```
# Getting elements greater than 30 (excluding the NA added above)
numericVector <- numericVector[!is.na(numericVector) & numericVector > 30]
# Checking total number of elements in the new vector
length(numericVector)
```

### List manipulation

#### Defining list

To define a list we use the `list()` function. The function can be used to create a simple list or a named list.

```
# Defining list
example1 <- list(c(2,3,4), c("aa", "bb", "cc", "dd"), c(TRUE, TRUE))
example1
# Defining vectors
empName <- c("Chris", "Robin", "Matt")
empSalary <- c(2000, 4000, 6000)
bonusGiven <- c(TRUE, TRUE, FALSE)
# Defining list using vectors
listStruct <- list(empName, empSalary, bonusGiven)
listStruct
# Defining Named list
namedListStruct <- list("empName" = empName,
                        "empSalary" = empSalary,
                        "bonusGiven" = bonusGiven)
namedListStruct
```

#### Referencing values of a list

A value inside a list can be accessed using its index or its name (if it is a named list). The lists defined above are collections of vectors, which means we can apply all the data manipulations we have just learned in the **Vector manipulation** section.

##### Extracting values from a list

```
# Extracting the vector of emp names from the unnamed list
listStruct[[1]]
# Extracting the vector of emp names from the named list
namedListStruct$empName
# Extracting Robin from the emp names in the unnamed list
listStruct[[1]][2]
# Extracting Robin from the emp names in the named list
namedListStruct$empName[2]
```
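A common source of confusion is the difference between single and double square brackets on a list. The sketch below uses an illustrative `staff` list: `[ ]` returns a smaller list, while `[[ ]]` and `$` return the element itself.

```r
# Illustrative named list
staff <- list(empName = c("Chris", "Robin", "Matt"),
              empSalary = c(2000, 4000, 6000))

# Single brackets keep the list wrapper
class(staff[1])      # "list"

# Double brackets return the stored vector
class(staff[[1]])    # "character"

# [["name"]] and $name are two ways to reach the same element
sameElement <- identical(staff[["empName"]], staff$empName)   # TRUE
sameElement
```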

##### Replacing values in a list

```
# Replace salary for Robin by 8000
listStruct[[2]][2] <- 8000
listStruct
# or in named list
namedListStruct$empSalary[2] <- 8000
namedListStruct
```

##### Unlisting the list

The `unlist()` function can be used to flatten a list into a single vector. Because a vector holds only one data type, mixed values are converted to a common type.

`unlist(listStruct)`

##### Checking the class of each vector in a list

`lapply(listStruct, class)`

A list can consist of multiple levels, and one can also create a nested list. Also, lists can be used to bundle objects of different classes and lengths.
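As a small sketch of a nested list, the example below stores a list inside a list and then flattens it. The names `nested`, `team` and `lead` are invented for illustration.

```r
# A list whose second element is itself a list
nested <- list(team = "Analytics",
               lead = list(name = "Robin", salary = 8000))

# Reaching into the inner list by name
leadName <- nested$lead$name                  # "Robin"
leadSalary <- nested[["lead"]][["salary"]]    # 8000

# unlist() flattens every level and builds compound names
flat <- unlist(nested)
names(flat)          # "team" "lead.name" "lead.salary"
is.character(flat)   # TRUE: the salary was converted to a string
```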

### Matrix manipulation

#### Defining a matrix

As a matrix is a two-dimensional structure, we need to specify the number of rows and the number of columns when defining it.

```
# Defining a matrix filled row by row
matStruct <- matrix(integerVector,
                    nrow = 2, ncol = 2,
                    byrow = TRUE)
# Defining a matrix filled column by column
matStruct1 <- matrix(integerVector,
                     nrow = 2, ncol = 2,
                     byrow = FALSE)
```
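The effect of `byrow` is easiest to see by printing both results side by side. This sketch repeats the same four integer values under the illustrative name `cellValues` so that it runs on its own.

```r
# The same four values, filled in two different ways
cellValues <- c(12L, 23L, 17L, 89L)

filledByRow <- matrix(cellValues, nrow = 2, ncol = 2, byrow = TRUE)
filledByCol <- matrix(cellValues, nrow = 2, ncol = 2, byrow = FALSE)

filledByRow
#      [,1] [,2]
# [1,]   12   23
# [2,]   17   89

filledByCol
#      [,1] [,2]
# [1,]   12   17
# [2,]   23   89

# One is the transpose of the other
all(filledByRow == t(filledByCol))   # TRUE
```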

#### Basic operations related to matrices

The code snippet below shows some functions that are good to know and will help you with your data science work.

```
# naming columns
colnames(matStruct) <- c("col1", "col2")
# naming rows
rownames(matStruct) <- c("row1", "row2")
# Getting the dimension of the matrix
dim(matStruct)
# Getting the count of the rows
nrow(matStruct)
# Getting the count of the columns
ncol(matStruct)
# Accessing the values of the 2nd column
matStruct[, 2]
# Accessing the values of the 1st row
matStruct[1, ]
# Combining two matrices by columns
cbind(matStruct, matStruct1)
# Combining two matrices by rows - appending
rbind(matStruct, matStruct1)
```