# R Statistics Blog

Data Science From an R Programmer's Point Of View

# Quick ggplot2 Tutorial

ggplot2 is a reliable system for describing and building graphs, and it is capable of creating elegant, aesthetically pleasing graphics. Its framework is quite different from that of the base `graphics` package: it is based on the grammar of graphics, originally introduced by Leland Wilkinson. At first you may not find it intuitive, but don't worry, we are here to help. Together we will master it to the core.

## Basic plotting framework for ggplot

```r
ggplot(data = <DATASET_NAME>) +
  <GEOM_FUNCTION>(mapping = aes(<VARIABLE_NAMES>))
```
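To make the template concrete, here is a minimal sketch using the `mpg` dataset that ships with ggplot2 (the dataset and variables are our choice, for illustration only). The plot is stored in a variable named `p`, which shows that a ggplot is an ordinary R object: each `+` adds one more layer to it.

```r
library(ggplot2)

# dataset: mpg, geom function: geom_point, variables: displ and hwy
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# The stored plot can be extended: + adds another layer on top
p <- p + geom_smooth(mapping = aes(x = displ, y = hwy), method = "lm", formula = y ~ x)

print(p)
```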

### Things You Will Master

1. Mapping the aesthetics (using aes)
2. Mapping Geometric shapes (using geom)
3. Using Facets
4. Mapping colors to variables
5. Coordinate systems
6. Statistical Transformation Support
7. Themes Themes Themes

## Mapping the aesthetics (using aes)

An aesthetic is a visual property of the objects in your plot, such as their position, size, shape, or color. In other words, aesthetics are the different ways in which you can represent your data points: to show the data points you can change things like their size, shape, or color. By mapping variables to aesthetics with `aes()`, you can convey the information hidden in your dataset.
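As a minimal sketch of mapping several aesthetics at once (using `hp` for size and `cyl` for shape is our choice, for illustration), each variable named inside `aes()` gets its own scale and legend:

```r
library(ggplot2)

cars_df <- mtcars
cars_df$cyl <- as.factor(cars_df$cyl)  # shape needs a discrete variable

# x and y give position; size and shape encode two more variables
p <- ggplot(data = cars_df) +
  geom_point(mapping = aes(x = mpg, y = wt, size = hp, shape = cyl))

print(p)
```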

For example, you can map color to the cylinder variable (`cyl`) to see how the relationship between mileage (`mpg`) and weight (`wt`) varies with the number of cylinders. So let us take our framework and add the aesthetics. Here we have three variables, so we pass three arguments to the `aes()` function.

```r
# Loading the library
library(ggplot2)

data(mtcars)
# Converting cyl to a factor so it gets a discrete color scale
mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = cyl))
```

## Mapping Geometric shapes (using geom)

The geometric shapes in ggplot2 are the visual objects you use to describe your data. For example, you can plot a histogram or a boxplot to describe the distribution of a variable.

These two plots provide almost the same information, but through different visual objects. In ggplot2 these objects are called geoms, and choosing a geom is how you choose the type of plot. For example, a histogram uses the histogram geom (`geom_histogram()`), a bar plot uses the bar geom (`geom_bar()`), a line plot uses the line geom (`geom_line()`), and so on. The one naming exception is the scatter plot, which uses the point geom (`geom_point()`).

Let's see how we can draw the two charts mentioned above with geoms, using the total sleep hours (`sleep_total`) of animals from the `msleep` dataset that ships with ggplot2.

### Attention

Every geom function takes a `mapping` argument. However, not every aesthetic works with every geom. For example, you can set the shape of a point, but you cannot set the shape of a line; a line has a linetype instead.
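To illustrate, here is a sketch using the `economics_long` data bundled with ggplot2 (the two series picked are arbitrary): `linetype` is an aesthetic that `geom_line()` understands, so each series is drawn with its own dash pattern.

```r
library(ggplot2)

# Two series from economics_long (columns: date, variable, value, value01)
two_series <- subset(economics_long, variable %in% c("psavert", "uempmed"))

# linetype is a line aesthetic: each series gets its own dash pattern
p <- ggplot(data = two_series) +
  geom_line(mapping = aes(x = date, y = value, linetype = variable))

print(p)
```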

### Building histogram

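```r
# Building a histogram with orange bar outlines
# A constant color goes outside aes(); inside aes() it would be treated as data
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), color = "orange")
```

### Building boxplot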

```r
# Building a boxplot
ggplot(data = msleep) +
  geom_boxplot(mapping = aes(y = sleep_total))
```

## Using Facets in ggplot2

Facets are a way to add additional categorical variables to your plot. Faceting divides the data into two or more groups, one for each level (or combination of levels) of the faceting variables, and draws a separate panel for each group.

Now there are two ways in which you can use facets:

A. If you want to split the data by only one variable, use `facet_wrap()`. Its first argument is a formula that starts with a tilde (`~`), followed by the name of the variable you want to split by.

Checking the distribution of total sleep by diet type (`vore`):

```r
# Working example of facet_wrap
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total)) +
  facet_wrap(~ vore)
```

B. If you want to split the data by a combination of two variables, use `facet_grid()`. Here the two variables are separated by a tilde (`~`): the variable before it defines the rows and the one after it defines the columns.

Checking the scatter plot of `mpg` against `disp`, with the data split by number of cylinders (`cyl`) and transmission type (`am`):

```r
# Loading data
data(mtcars)
# Converting cylinder (cyl) and transmission (am: 0 = automatic, 1 = manual) to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# Working example of facet_grid: rows by cyl, columns by am
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = disp)) +
  facet_grid(cyl ~ am)
```

## Mapping colors to variables in ggplot2

Color can be a game changer in any data visualization, so it is important to learn how to use it. Also, by default ggplot2 draws points and lines in black and fills bars in dark grey, which can make things hard to read and distinguish from one another.

In ggplot2 there are a couple of ways in which you can use color.

A. You can simply assign a fixed color to objects such as bars, lines and points. For filled objects like bars, use the `fill` argument; for lines and points, use the `color` argument. Below is a quick example of each case.

### Using `color` argument

```r
# Making the points blue in the scatter plot
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")
```

### Using `fill` argument

```r
# Making the bars of the histogram blue
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Sepal.Width), fill = "blue")
```

B. You can use color to map the values of a third variable, which we have already learned in the very first example under mapping the aesthetics.

### Attention

By default, ggplot2 uses `scale_fill_hue()` and `scale_colour_hue()` to pick colors for discrete variables. You can change the luminance of these colors with their `l` argument. ggplot2 also provides `scale_fill_brewer()` and `scale_colour_brewer()`, which use the color palettes from the RColorBrewer package.
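A minimal sketch of changing the luminance, using a boxplot like the ones in the examples that follow: `scale_fill_hue()` keeps the default hues, but its `l` argument (65 by default) controls lightness, so a lower value gives darker fills. The value 40 is an arbitrary choice for illustration.

```r
library(ggplot2)

df_cyl <- mtcars
df_cyl$cyl <- as.factor(df_cyl$cyl)

# Default hue palette
default_plot <- ggplot(data = df_cyl) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl))

# Same hues, lower luminance (darker fills)
darker_plot <- default_plot + scale_fill_hue(l = 40)

print(darker_plot)
```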

#### Example 1 - Showcasing the default RColorBrewer palette (Blues)

```r
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer()
```

#### Example 2 - Showcasing Set1 palette colors

```r
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Set1")
```

#### Example 3 - Showcasing Spectral palette colors

```r
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Spectral")
```

For reference, you can draw a chart of all RColorBrewer palettes with `RColorBrewer::display.brewer.all()`.

## Understanding the Coordinate System of ggplot2

The coordinate system of ggplot2 is a little complicated. But don't worry: we will not dig too deep for now, and will instead give you a few coordinate systems to start with. If you can remember them, most of the job is done, and this should not stop you from creating awesome charts with ggplot2. As you use ggplot2 more and more, you will be able to take deeper plunges into its coordinate systems. To start with, I have shortlisted five coordinate functions:

1. `coord_cartesian()` - This is the default coordinate system in ggplot2. According to this system the X and Y positions of each point act independently to determine its location on the graph.
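`coord_cartesian()` is also the usual way to zoom in. In this minimal sketch (the limits 2 to 4 are arbitrary), setting limits on the coordinate system only changes what is visible, whereas setting them on the scale with `xlim()` drops the points outside the range before any statistics, such as the fitted line, are computed.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom: every point still feeds the regression line, only the view changes
zoomed <- base + coord_cartesian(xlim = c(2, 4))

# Scale limits: points with wt outside 2..4 become NA and are left out of the fit
clipped <- base + xlim(2, 4)

print(zoomed)
```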

2. `coord_flip()` - This is helpful in cases when you want to build horizontal graphs. This function switches the X and Y axis. For example, you can use `coord_flip` to draw horizontal boxplots.

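```r
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Set1") +
  coord_flip()
```

3. `coord_polar()` - This uses polar coordinates, which turns a bar chart into a Coxcomb (polar area) chart or, with `theta = "y"` and a single stacked bar, into a pie chart. The example below draws the same bar chart flipped and in polar coordinates.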

```r
# Loading the gridExtra library
library(gridExtra)
# Generating a bar plot of cylinder counts
bar <- ggplot(data = mtcars) +
  geom_bar(
    mapping = aes(x = cyl, fill = cyl),
    width = 1
  )

# Saving the two plots
plot1 <- bar + coord_flip()
plot2 <- bar + coord_polar()
# Plotting the graphs side by side in two columns
grid.arrange(plot1, plot2, ncol = 2)
```
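The Coxcomb and the pie chart differ only in which position is mapped to the angle. A minimal sketch, using a constant `x` (`factor(1)`) so that all counts stack into one bar before `coord_polar(theta = "y")` wraps its height around the circle:

```r
library(ggplot2)
library(gridExtra)

df_cyl <- mtcars
df_cyl$cyl <- as.factor(df_cyl$cyl)

# Coxcomb: one bar per cylinder level, bar height mapped to the radius
coxcomb <- ggplot(df_cyl) +
  geom_bar(aes(x = cyl, fill = cyl), width = 1) +
  coord_polar()

# Pie: a single stacked bar (constant x), bar height mapped to the angle
pie_chart <- ggplot(df_cyl) +
  geom_bar(aes(x = factor(1), fill = cyl), width = 1) +
  coord_polar(theta = "y")

grid.arrange(coxcomb, pie_chart, ncol = 2)
```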

### Attention

In the above code we have used the gridExtra package. I love this package: it makes plotting multiple charts on the same canvas very easy.

4. `coord_map()` - This function projects a portion of the earth, which is approximately spherical, onto a flat 2D plane using a map projection. We use `geom_polygon()` along with `coord_map()` to draw a map with a correct aspect ratio. If you do not understand what this means, run the code once without the `coord_map()` part and compare.

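```r
# An example showcasing the map of Italy
# map_data() needs the maps package installed; coord_map() needs mapproj
italy <- map_data("italy")
ggplot(italy, aes(long, lat, group = group)) +
  geom_polygon(fill = "lightblue", colour = "black") +
  coord_map()
```

5. `coord_fixed()` - This coordinate system fixes the aspect ratio of the plot. Its `ratio` argument is the number of units on the y-axis equivalent to one unit on the x-axis, so `ratio = 1` makes one unit on the x-axis as long as one unit on the y-axis. Check out the examples below: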

```r
# Building a scatter plot
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Setting the ratio to 1
ratio1 <- p + coord_fixed(ratio = 1)
# Setting the ratio to 3
ratio3 <- p + coord_fixed(ratio = 3)
# Plotting them side by side
grid.arrange(ratio1, ratio3, ncol = 2)
```

## Support for Statistical Transformation in ggplot

Among the many useful features of ggplot2, the one that may become dearest to you is its support for statistical transformations. These functions save a lot of time because you don't have to prepare the data yourself: the statistics are computed on the fly. There are many statistical functions and we encourage you to explore them; below I have listed some of the most widely used ones.

1. `stat_count()` - Creates a bar plot showing the frequency count of each level of a categorical variable.

```r
# Plotting the bar chart of cylinder counts
ggplot(data = mtcars) +
  stat_count(mapping = aes(x = cyl))
```

2. `stat_density()` - Creates a kernel density plot. A kernel density estimate is a smoothed version of the histogram and a very useful alternative to it for showing a distribution.

```r
# Plotting the kernel density of petal length
ggplot(data = iris) +
  stat_density(mapping = aes(x = Petal.Length))
```

3. `stat_summary()` - Summarises the Y variable for each unique value of the X variable. In the example below it draws the mean petal length of each species, with a line from the minimum to the maximum.

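```r
# Summarising petal length by species: mean (point), min and max (line)
# (before ggplot2 3.3.0 these arguments were fun.ymin, fun.ymax and fun.y)
ggplot(data = iris) +
  stat_summary(mapping = aes(x = Species, y = Petal.Length),
               fun.min = min,
               fun.max = max,
               fun = mean)
```

4. `stat_smooth()` - Adds a smoothed line to a scatter plot. `geom_smooth()`, used below, draws the same layer because its default stat is `stat_smooth()`.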

```r
# Adding a smooth line to the scatter plot
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
```

## Themes Themes Themes

You must have noticed that the default ggplot2 theme (`theme_gray()`) has a greyish background. If you are not a great fan of grey, don't worry: ggplot2 ships several complete themes for you to choose from.

```r
library(gridExtra)

# The same scatter plot with a smoother, reused for every theme
base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

p1 <- base + ggtitle("theme_bw") + theme_bw()
p2 <- base + ggtitle("theme_linedraw") + theme_linedraw()
p3 <- base + ggtitle("theme_gray") + theme_gray()
p4 <- base + ggtitle("theme_dark") + theme_dark()
p5 <- base + ggtitle("theme_minimal") + theme_minimal()
p6 <- base + ggtitle("theme_void") + theme_void()

grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3, nrow = 2)
```

### Closing Note

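In this chapter, we covered aesthetics, geoms, facets, colors, coordinate systems, statistical transformations and themes in ggplot2.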
Last updated on 23 Nov 2019
Published on 17 Oct 2017