R Statistics Blog

Data Science From R Programmers Point Of View

It's All About Functions

By now you have probably figured out that R is not a traditional programming language. Most of what you do in R is done by calling functions, and related functions are bundled together into packages. Thanks to the open and free community, over 9000 packages have been contributed over the years. Today one can find a function for almost any statistical task in R with a little research. However, you may still want to create your own custom function. Such a function is called a user defined function. In this chapter we will see how to define one.

R Packages - A collection of similar functions

Before we see how to define a custom function, let us see how to download an R package and start using its functions. The task is fairly easy: use the install.packages() function to download a package from CRAN to your local system. Once the package is downloaded, all you need to do is load it into your environment when you need it, and you will have access to its functions.

To load a package you can use either the library() or the require() function.

Remember

These two functions differ slightly in how they handle a package that is not installed. require() returns FALSE and gives a warning, whereas library() throws an error. Because require() returns a logical value, it is convenient inside a function when you want to check whether the package was loaded and handle the failure yourself.

Installing and loading R Packages

# Installing the {devtools} package, passing arguments
install.packages("devtools",          # Package name
                 dependencies = TRUE, # Also install the packages that devtools depends on
                 quiet = TRUE)        # Reduce the information printed on the console
                  
# Loading a package
library(devtools)
# Loading using require()
require(devtools)
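
The difference between library() and require() described in the note above can be checked with a short sketch. The package name notARealPackage123 is a made-up placeholder, assumed not to be installed on your system.

```r
# "notARealPackage123" is a hypothetical name, assumed NOT to be installed
pkg <- "notARealPackage123"

# require() returns FALSE when the package is missing
# (normally with a warning, which is silenced here)
loaded <- suppressWarnings(require(pkg, character.only = TRUE))
print(loaded)

# library() signals an error instead, so it has to be caught explicitly
result <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "error")
print(result)
```

Inside your own function, the logical value returned by require() can go straight into an if() condition, while library() needs a tryCatch() as above.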

Things You Will Master

  1. Installing packages from GitHub using devtools
  2. Defining custom functions in R
  3. Defining anonymous functions in R
  4. Using the apply family of functions

Installing R Packages from GitHub

One can install an R package directly from a GitHub repository by using the install_github() function, which is available in the devtools package.

# Installing a package from GitHub (requires the devtools package)
devtools::install_github("tidyverse/dplyr")

How to create user defined function in R

A function definition in R consists of three parts. First is the name of the function. Second is the function() keyword, inside whose parentheses we declare the parameters. Third is the body, the code to execute, which goes inside the curly brackets.

For example, let’s create a function to add two numbers. This function requires two parameters.

# Creating a function to add two numbers
addTwoNumber <- function(a = 1, b = 1){
  return(a + b)
}

# Calling this function
addTwoNumber(a = 10, b = 10)

  1. addTwoNumber - is the name of the function
  2. function(a = 1, b = 1) - here a and b are the arguments which this function can take. We have also provided 1 as the default value for both parameters.
  3. return(a + b) - this section returns the value after adding a and b.
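
Because both parameters have default values, the function can also be called with fewer arguments. Here is a small sketch of the different calling styles (the definition is repeated so the block runs on its own):

```r
# Same definition as above
addTwoNumber <- function(a = 1, b = 1) {
  return(a + b)
}

addTwoNumber()          # both defaults are used: 1 + 1
addTwoNumber(5)         # a = 5 by position, b keeps its default
addTwoNumber(b = 3)     # b = 3 by name, a keeps its default
addTwoNumber(10, 10)    # positional matching, same as a = 10, b = 10
```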

Defining anonymous functions in R

An anonymous function is a function that has no name; such functions are at times also called inline functions. Let’s see how we can create and use them. For example, suppose you want to calculate the sum of each column in a data frame, ignoring any NA values.

# Getting column wise sum for each variable
output <- apply(mtcars, 2, function(x){ sum(x, na.rm = TRUE)})
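
To make clear what "anonymous" means here, the sketch below compares the inline version with two equivalent alternatives: defining the function with a name first (sumNoNA is a name chosen only for this illustration), and passing sum directly while letting apply() forward the na.rm argument.

```r
# Anonymous (inline) function, as in the example above
anon_result <- apply(mtcars, 2, function(x) { sum(x, na.rm = TRUE) })

# The same function, but defined with a name first
# (sumNoNA is a hypothetical name used for illustration)
sumNoNA <- function(x) {
  sum(x, na.rm = TRUE)
}
named_result <- apply(mtcars, 2, sumNoNA)

# Extra arguments given after the function are passed on to it by apply()
direct_result <- apply(mtcars, 2, sum, na.rm = TRUE)

identical(anon_result, named_result)
identical(anon_result, direct_result)
```

The anonymous form is handy when the function is short and used only once; a named function is usually the better choice when you need it in several places.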

Apply family functions

In the previous example, we saw the apply() function, which works like a for loop but without the need to write the loop yourself. This makes the apply family of functions easy to write and, in many cases, at least as fast as an explicit loop. Let me share my favorite functions from the apply family.

Using apply function

Using the apply() function one can apply almost any function to either all the rows or all the columns of a matrix or data frame. The function applied can also be a user defined function. In the above code we saw how to use apply() to apply a function to all columns (second argument 2). In the next example, we will apply the same function to all the rows (second argument 1). This should return 32 values as we have 32 rows in the mtcars dataset.

# Getting row wise sum for each car (row)
output <- apply(mtcars, 1, function(x){ sum(x, na.rm = TRUE)})
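
Here is a quick check of the row-wise result, as a sketch: there should be one value per car, named after the row. Base R's rowSums() is used here only as a cross-check.

```r
row_sums <- apply(mtcars, 1, function(x) { sum(x, na.rm = TRUE) })

length(row_sums)        # one value per row of mtcars
head(names(row_sums))   # the results keep the row names (car models)

# Cross-check against the built-in rowSums()
all.equal(row_sums, rowSums(mtcars, na.rm = TRUE))
```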

Using lapply and sapply functions

The lapply() and sapply() functions are very similar to the apply() function which we just learned. However, they take no row or column argument: they apply the requested function to each element of a list or vector, which for a data frame means each column. The two functions differ from each other in the output they produce. lapply() always returns a list, whereas sapply() simplifies the result to a vector (or matrix) when possible.

Working example

In this example, we will achieve the same task of getting the sum of each column using the lapply() and sapply() functions. This way you should be able to compare the final outputs of the three functions: apply(), lapply(), and sapply().

# Using lapply() function
lapply(mtcars, FUN = function(x){ sum(x, na.rm = TRUE)})

# Using sapply() function
sapply(mtcars, FUN = function(x){ sum(x, na.rm = TRUE)})
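
The difference in output type can be inspected directly. In this sketch, sum_no_na is a helper name introduced only for illustration.

```r
# Helper used by both calls (hypothetical name)
sum_no_na <- function(x) sum(x, na.rm = TRUE)

as_list   <- lapply(mtcars, sum_no_na)
as_vector <- sapply(mtcars, sum_no_na)

class(as_list)     # "list"
class(as_vector)   # "numeric"

as_list$mpg        # list elements are accessed with $ or [[ ]]
as_vector["mpg"]   # vector elements are accessed with [ ]

# The values are the same; only the container differs
identical(unlist(as_list), as_vector)
```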

Using tapply function

The tapply() function is used when you need to apply a function by a grouping variable. It splits the data by a factor variable and returns the function output for each level.

Working example

In this example, we will compute the average sepal length of flowers by species.

# Using tapply function to get the average 
# sepal length by flower species
tapply(iris$Sepal.Length, iris$Species, mean)
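
To see what tapply() does behind the scenes, the sketch below reproduces the same result in two explicit steps, using split() to form the groups and sapply() to apply mean() to each group. This only illustrates the idea; it is not how tapply() is implemented internally.

```r
by_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)

# The same result in two explicit steps: split into groups, then apply mean()
groups   <- split(iris$Sepal.Length, iris$Species)
by_split <- sapply(groups, mean)

by_tapply
by_split
```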

Closing Note

In this chapter, we reviewed the concepts of functions and packages, and learned how to use some of the important apply family functions. You have now gathered enough knowledge about R and are ready to enter the exciting world of data analysis! In the last chapters, we intend to create a list of useful functions for working with data objects. This will be a continuous effort and we request you to contribute to the same.
Last updated on 4 Jan 2019 / Published on 17 Oct 2017