R Statistics Blog

Data Science From R Programmers Point Of View

Quick ggplot2 Tutorial

ggplot2 is a reliable system for describing and building graphs, capable of producing elegant and aesthetically pleasing graphics. Its framework is quite different from that of the base graphics package and is based on the grammar of graphics (originally introduced by Leland Wilkinson). You may not find it intuitive at first, but don't worry, we are here to help. Together we will master it to the core.

Basic plotting framework for ggplot

ggplot(data = dataset name) + 
  <GEOM_FUNCTION>(mapping = aes(variable name))

Things You Will Master

  1. Mapping the aesthetics(using aes)
  2. Mapping Geometric shapes(using geom)
  3. Using Facets
  4. Mapping colors to variable
  5. Coordinate systems
  6. Statistical Transformation Support
  7. Themes Themes Themes

Mapping the aesthetics(using aes)

An aesthetic is a visual property of the objects in your plot. In other words, aesthetics are the different ways in which you can represent your data points: you can change things like the size, shape or color of the points. By mapping variables to aesthetics (using aes()) you can convey information that would otherwise stay hidden in your dataset.

For example, you can map color to the cylinder variable (cyl) to see how the relationship between mileage and weight varies with the number of cylinders. So let us take our framework and add the aesthetics. Here we have three variables, and thus we pass three arguments to the aes() function.

# Loading the library
library(ggplot2)

# Loading data and converting the cyl variable to a factor
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Adding aesthetics: x, y and color are mapped to three variables
ggplot(data = mtcars) + 
  geom_point(mapping = aes(x = mpg, y = wt, color = cyl))

An example of aesthetic
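Color is not the only aesthetic you can map. As a minimal sketch, assuming you also want to show horsepower (hp) through point size and cylinders through point shape, the same framework can be extended as follows. The object names cars and p are only illustrative.

```r
library(ggplot2)

# Work on a copy so that the built-in mtcars dataset stays untouched
cars <- transform(mtcars, cyl = factor(cyl))

# Map four variables: position (x, y), shape (cyl) and size (hp)
p <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = wt, shape = cyl, size = hp))

p
```

Each mapped aesthetic gets its own legend, so it is usually best to map only as many variables as the reader can comfortably decode.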

Mapping Geometric shapes(using geom)

The geometric shapes in ggplot2 are visual objects which you can use to describe your data. For example, one can plot a histogram or a boxplot to describe the distribution of a variable.

These two plots provide almost the same information but through different visual objects. These objects are defined in ggplot2 using geoms, which means the geom you choose defines the type of plot. For example, a histogram uses the histogram geom, a barplot uses the bar geom, a line plot uses the line geom and so on. There is one exception to the naming: scatter plots are drawn with the point geom.

Let's draw the two charts mentioned above, a histogram and a boxplot, for the total sleep hours of animals (the msleep dataset that ships with ggplot2).

Attention

Every geom function requires you to map at least one aesthetic to it. However, not every aesthetic makes sense for every geom. For example, you can set the shape of a point but you cannot set the shape of a line.

Building histogram

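# Building a histogram with an orange outline
# (a fixed color goes outside aes(); inside aes() it would be treated as a variable)
ggplot(data = msleep) + 
  geom_histogram(mapping = aes(x = sleep_total), color = "orange")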

Histogram Using geom

Building boxplot

# Building a boxplot
ggplot(data = msleep) + 
  geom_boxplot(mapping = aes(y = sleep_total))

Boxplot Using geom
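Because the geom alone decides the type of plot, you can keep the data and the aesthetics fixed and swap only the geom. The following is a minimal sketch of that idea; the object name base and the choice of 20 bins are assumptions made for illustration.

```r
library(ggplot2)

# Data and aesthetics are declared once and reused
base <- ggplot(data = msleep, mapping = aes(x = sleep_total))

# Same data, two different visual objects
hist_plot    <- base + geom_histogram(bins = 20)
density_plot <- base + geom_density()

hist_plot
density_plot
```

Declaring aes() inside ggplot() makes the mapping available to every geom you add afterwards.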

Using Facets in ggplot2

Facets are a way to add additional categorical variables to your plot. Faceting divides the data into two or more groups, and each group is plotted in its own panel.

Now there are two ways in which you can use facets:

A. If you want to split the data by only one variable, use facet_wrap(). In the following syntax you will notice a tilde (~); the formula is the first argument of facet_wrap(). After the tilde you should mention the name of the variable by which you want to split.

Checking the distribution of total sleep by the animal's diet type (vore).

# Working example of facet_wrap
ggplot(data = msleep) + 
  geom_histogram(mapping = aes(x = sleep_total)) +
  facet_wrap(~ vore)

facet_wrap from ggplot2

B. If you want to split the data by a combination of two variables, you can use facet_grid(). Here the two variables should be separated by the tilde (~): the variable on the left defines the rows and the variable on the right defines the columns.

Checking the scatter plot of mpg against disp, splitting the data by cyl and am.

# Loading data
data(mtcars)
# Converting the cylinder (cyl) and transmission (am) variables to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# Working example of facet_grid
ggplot(data = mtcars) + 
  geom_point(mapping = aes(x = mpg, y = disp)) +
  facet_grid(cyl ~ am)

facet_grid from ggplot2
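facet_grid() can also split by a single variable if you put a dot on the other side of the tilde. Here is a minimal sketch, assuming you want the cyl panels laid out either side by side or stacked; the object names are only illustrative.

```r
library(ggplot2)

# Work on a copy so that the built-in mtcars dataset stays untouched
cars <- transform(mtcars, cyl = factor(cyl))

scatter <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = disp))

# Dot on the left: one column per level of cyl
by_column <- scatter + facet_grid(. ~ cyl)

# Dot on the right: one row per level of cyl
by_row <- scatter + facet_grid(cyl ~ .)

by_column
by_row
```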

Mapping colors to variables in ggplot2

Color can be a game changer in any data visualization, so it is important to learn how to use it. In ggplot2, points, lines and bars are drawn in black or dark grey by default, which at times makes things difficult to read and to distinguish from one another.

In ggplot2 there are a couple of ways in which you can use color.

A. You can simply assign a fixed color to objects, lines and points. To color filled objects, such as bars, use the fill argument; to color lines and points, use the color argument. Below are quick examples of both cases.

Using color argument

# Making the points blue color in the scatter plot
ggplot(data = mtcars) + 
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")

Assigning color to the points

Using fill argument

# Making the bars of histogram blue
ggplot(data = iris) + 
  geom_histogram(mapping = aes(x = Sepal.Width), fill = "blue")

Assigning color to objects like bars

B. You can use color to represent the values of a third variable, as we already did in the very first example under mapping the aesthetics.
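The difference between the two approaches comes down to where the argument is written: outside aes() it sets one fixed color, inside aes() it maps a variable to color. A minimal side-by-side sketch, with illustrative object names:

```r
library(ggplot2)

# Work on a copy so that the built-in mtcars dataset stays untouched
cars <- transform(mtcars, cyl = factor(cyl))

# A. Setting: every point is blue, no legend is drawn
fixed_colour <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = wt), colour = "blue")

# B. Mapping: one color per level of cyl, with a legend
mapped_colour <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = wt, colour = cyl))

fixed_colour
mapped_colour
```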

Attention

By default ggplot2 uses scale_fill_hue() and scale_colour_hue() to pick colors for discrete variables. You can change the luminance of these colors if you wish. Other color scales are also available in R through the RColorBrewer package.
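As a rough sketch of changing the luminance, the l argument of scale_fill_hue() can be lowered from its default of 65 to get darker fills. The value 40 below is an arbitrary choice for illustration.

```r
library(ggplot2)

# Work on a copy so that the built-in mtcars dataset stays untouched
cars <- transform(mtcars, cyl = factor(cyl))

boxes <- ggplot(data = cars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl))

# Default hue scale versus a darker variant (lower luminance)
default_hue <- boxes + scale_fill_hue()
darker_hue  <- boxes + scale_fill_hue(l = 40)

default_hue
darker_hue
```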

Example 1 - Showcasing the default RColorBrewer setup

ggplot(data = mtcars) + 
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) +
  scale_fill_brewer()

Example 1 RColorBrewer Default

Example 2 - Showcasing Set1 palette colors

ggplot(data = mtcars) + 
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) +
  scale_fill_brewer(palette="Set1")

Example 2 RColorBrewer Set1

Example 3 - Showcasing Spectral palette colors

ggplot(data = mtcars) + 
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) +
  scale_fill_brewer(palette="Spectral")

Example 3 RColorBrewer Spectral

For your reference, here is the RColorBrewer palette chart.

RColorBrewer
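If you prefer to look up the palettes from the R console, the RColorBrewer package can list them directly. A small sketch, assuming the package is installed (it normally arrives together with ggplot2); the object name set1 is only illustrative.

```r
library(RColorBrewer)

# One row per palette: maximum number of colors, category, colorblind safety
head(brewer.pal.info)

# Extracting three colors from the Set1 palette as hex codes
set1 <- brewer.pal(n = 3, name = "Set1")
set1

# Uncomment to draw a chart of every available palette
# display.brewer.all()
```

The palette names shown in the row names of brewer.pal.info are the values you can pass to the palette argument of scale_fill_brewer().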

Understanding the Coordinate System of ggplot2

The coordinate system of ggplot2 is a little complicated. But don't worry, we will not dig too deep for now; instead, here are a few coordinate systems to start with. If you can remember them, most of the job is done, and nothing should stop you from creating awesome charts using ggplot2. As you use ggplot2 more and more, I am sure you will take deeper plunges into its coordinate systems. To start with, I have shortlisted 5 coordinate functions, listed below:

1. coord_cartesian() - This is the default coordinate system in ggplot2. According to this system the X and Y positions of each point act independently to determine its location on the graph.
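Although coord_cartesian() is applied automatically, calling it yourself is handy for zooming: its xlim and ylim arguments change the visible window without dropping any data. A minimal sketch, with an arbitrary zoom range chosen for illustration:

```r
library(ggplot2)

scatter <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point()

# Zoom in on cars between 15 and 25 mpg; all 32 rows are still part of the plot
zoomed <- scatter + coord_cartesian(xlim = c(15, 25))

zoomed
```

This differs from setting limits on the scale (for example with xlim()), which removes the points outside the range before anything is computed.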

2. coord_flip() - This is helpful when you want to build horizontal graphs. The function switches the X and Y axes. For example, you can use coord_flip() to draw horizontal boxplots.

ggplot(data = mtcars) + 
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) +
  scale_fill_brewer(palette="Set1") +
  coord_flip()

Coordinate system flipping the sides

3. coord_polar() - This uses polar coordinates. Combined with a bar chart, it produces coxcomb or pie charts.

# Loading the gridExtra library
library(gridExtra)
# Generating a barplot
bar <- ggplot(data = mtcars) + 
  geom_bar(
    mapping = aes(x = cyl, fill = cyl),
    width = 1
  )

# Saving the two plots
plot1 <- bar + coord_flip()
plot2 <- bar + coord_polar()
# Plotting the graphs side by side in two columns
grid.arrange(plot1, plot2, ncol = 2)

Attention

In the above code we have used the gridExtra package. I love this package: it makes plotting multiple charts on the same canvas very easy.

Coordinate system for building polar graphs

4. coord_map() - This function projects a portion of the earth, which is roughly spherical, onto a flat 2D plane. We use geom_polygon() along with coord_map() to draw a map with the correct aspect ratio. If you do not understand what this means, just run the code once without the coord_map() part.

# An example showcasing the map of Italy
# (map_data() needs the maps package, coord_map() needs the mapproj package)
italy <- map_data("italy")
ggplot(italy, aes(long, lat, group = group)) +
   geom_polygon(fill = "lightblue", colour = "black") +
   coord_map()

Coordinate system for plotting map

5. coord_fixed() - This coordinate system forces a fixed ratio between the length of one unit on the y axis and the length of one unit on the x axis, whatever the size of the plotting window. Check out the examples below:

# Building a scatter plot
plot <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Setting the ratio to 1
ratio1 <- plot + coord_fixed(ratio = 1)
# Setting the ratio to 3
ratio3 <- plot + coord_fixed(ratio = 3)
# Plotting them in a grid
grid.arrange(ratio1, ratio3, ncol = 2)

Adjusting the aspect ratio using coord_fixed

Support for Statistical Transformation in ggplot

Among the many useful features of ggplot2, one which may become dear to you is the support for statistical transformations. These functions save a lot of time, as you don't have to prepare the data beforehand; the statistical calculations are done on the go. There are many statistical functions and we encourage you to explore them. Below I have listed some of the most widely used ones.

1. stat_count() - Creates a bar plot showing the frequency count of each level of a categorical variable.

# Plotting the bar chart of cylinder counts
ggplot(data = mtcars) + 
  stat_count(mapping = aes(x = cyl))

Stat_count function
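Stats and geoms work in pairs: geom_bar() uses stat_count() by default, so the two calls below should draw the same chart. This is a small sketch to make that pairing visible; the object names are only illustrative.

```r
library(ggplot2)

# Work on a copy so that the built-in mtcars dataset stays untouched
cars <- transform(mtcars, cyl = factor(cyl))

# Starting from the stat: the bar geom is chosen for you
via_stat <- ggplot(data = cars) +
  stat_count(mapping = aes(x = cyl))

# Starting from the geom: the count stat is chosen for you
via_geom <- ggplot(data = cars) +
  geom_bar(mapping = aes(x = cyl))

# The counts computed behind the scenes are identical
layer_data(via_stat, 1)$count
layer_data(via_geom, 1)$count
```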

2. stat_density() - Creates a kernel density plot. A kernel density estimate is a smoothed version of a histogram, which makes it a useful alternative to the histogram for showing the distribution of a continuous variable.

# Plotting the density of petal length
ggplot(data = iris) + 
  stat_density(mapping = aes(x = Petal.Length))

Creating density plot in ggplot2

3. stat_summary() - The function summarises the y variable for each unique value of the x variable.

# Plotting the mean of Petal.Length per species, with min and max as the range
# (ggplot2 >= 3.3.0; older versions call these arguments fun.y, fun.ymin and fun.ymax)
ggplot(data = iris) + 
  stat_summary(mapping = aes(x = Species, y = Petal.Length),
    fun.min = min,
    fun.max = max,
    fun = mean)

Plotting summary statistics in ggplot2
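To see what stat_summary() computes, it can help to pass a single summary function through fun.data and compare the result with a manual calculation. The helper range_summary below is hypothetical, written only for this sketch; it returns the mean together with the minimum and maximum.

```r
library(ggplot2)

# Hypothetical helper: must return a data frame with y, ymin and ymax
range_summary <- function(v) {
  data.frame(y = mean(v), ymin = min(v), ymax = max(v))
}

summary_plot <- ggplot(data = iris,
                       mapping = aes(x = Species, y = Petal.Length)) +
  stat_summary(fun.data = range_summary)

summary_plot

# The same means computed by hand, one per species
tapply(iris$Petal.Length, iris$Species, mean)
```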

4. stat_smooth() - Adds a smoothed line to a scatter plot. The geom_smooth() function used below is the geom that applies this stat by default.

# Adding smooth line to the scatter plot
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

Adding Smooth line to a scatter plot in ggplot2
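The smoothing method can be changed. As a minimal sketch, assuming a straight regression line is wanted instead of the default curve, method = "lm" fits a linear model and se = FALSE hides the confidence band.

```r
library(ggplot2)

# Scatter plot with a straight-line fit and no confidence band
linear_fit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

linear_fit
```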

Themes Themes Themes

You must have noticed that the default theme of ggplot2 is rather greyish. If you are not a great fan of grey, don't worry: ggplot2 has a number of themes for you to choose from.

library(gridExtra)
p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_bw") +
  theme_bw()
  
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_linedraw") +
  theme_linedraw()
  
p3 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_gray") +
  theme_gray()
  
p4 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_dark") +
  theme_dark()

p5 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_minimal") +
  theme_minimal()
  
p6 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth() +
  ggtitle("theme_void") +
  theme_void()

grid.arrange(p1,p2,p3,p4,p5,p6, ncol = 3, nrow = 2)

ggplot themes examples
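Since the six plots differ only in their theme, the comparison can also be written more compactly by building the plot once and adding each theme in turn. This is just one possible sketch; the names base, themes and plots are illustrative, and the final step assumes the gridExtra package is installed.

```r
library(ggplot2)

# The part shared by all six plots
base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

# A named list of the themes to compare
themes <- list(
  theme_bw       = theme_bw(),
  theme_linedraw = theme_linedraw(),
  theme_gray     = theme_gray(),
  theme_dark     = theme_dark(),
  theme_minimal  = theme_minimal(),
  theme_void     = theme_void()
)

# One plot per theme, titled with the theme's name
plots <- Map(function(name, th) base + ggtitle(name) + th,
             names(themes), themes)

# Arranging the plots in a 3 x 2 grid, if gridExtra is available
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gridExtra::grid.arrange(grobs = plots, ncol = 3, nrow = 2)
}
```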

Closing Note

In this chapter,
Last updated on 5 Jan 2019 / Published on 17 Oct 2017