Quick ggplot2 Tutorial
ggplot2 is a reliable system for describing and building graphs. The package is capable of creating elegant and aesthetically pleasing graphics. The framework of ggplot2 is quite different from the base graphics package and is based on the grammar of graphics (originally introduced by Leland Wilkinson). At first you may not find it intuitive, but don't worry, we are here to help. Together we will master it to the core.
Basic plotting framework for ggplot
ggplot(data = <DATASET>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
Things You Will Master
- Mapping the aesthetics(using aes)
- Mapping Geometric shapes(using geom)
- Using Facets
- Mapping colors to variables
- Coordinate systems
- Statistical Transformation Support
- Themes Themes Themes
Mapping the aesthetics(using aes)
An aesthetic is a visual property of the objects you plot in your graph. In other words, aesthetics are the different ways in which you can represent your data points. To showcase the data points you can change things like the size, shape or color of the points. Thus, by using aesthetics (represented by aes()) you can convey information which is hidden in your dataset.
For example, you can map color to the cylinder variable to reveal how the relationship between mileage and weight differs by number of cylinders. So let us take our framework and add the aesthetics. Here we have three variables, and thus we have to pass three arguments to aes().
# Loading the library
library(ggplot2)

# Loading data and converting the cyl variable to a factor
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Adding aesthetics: mileage on x, weight on y, cylinders as color
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = cyl))
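The paragraph above mentions size and shape as well as color. As a minimal sketch, several aesthetics can be mapped in a single aes() call. The choice of hp for size and am for shape is only an assumption made for illustration, and the names cars and p_aes are hypothetical.

```r
library(ggplot2)

# Work on a local copy so the built-in mtcars stays untouched
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)
cars$am  <- as.factor(cars$am)

# Cylinders drive the color, transmission type the shape, horsepower the size
p_aes <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = cyl, shape = am, size = hp))

p_aes
```

Each mapped aesthetic gets its own legend, so it is usually worth mapping only as many as the reader can decode.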
Mapping Geometric shapes(using geom)
The geometric shapes in ggplot are visual objects which you can use to describe your data. For example, one can plot histogram or boxplot to describe the distribution of a variable.
These two plots provide almost the same information but through different visual objects. These objects are defined in ggplot using geoms. That means you use a geom to define your plot. For example, a histogram uses the histogram geom, a barplot uses the bar geom, a line plot uses the line geom and so on. There is one exception to this naming: we use the point geom to plot scatter plots.
Let’s see how we can draw the charts which we mentioned in the above example using geoms for the total sleep hours of animals.
Every geom function requires you to map an aesthetic to it. However, not every aesthetic works with every geom. For example, you can set the shape of a point, but you cannot set the shape of a line.
# Building a histogram with orange bar outlines
# (a fixed color is set outside aes(), not inside it)
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), color = "orange")

# Building a boxplot
ggplot(data = msleep) +
  geom_boxplot(mapping = aes(y = sleep_total))
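The bar and line geoms mentioned above follow the same pattern. The sketch below assumes the mpg and economics datasets that ship with ggplot2; they are not used in the original examples, and the names bar_plot and line_plot are only illustrative.

```r
library(ggplot2)

# Bar geom: counts the cars in each vehicle class of the mpg dataset
bar_plot <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class))

# Line geom: number of unemployed over time from the economics dataset
line_plot <- ggplot(data = economics) +
  geom_line(mapping = aes(x = date, y = unemploy))

bar_plot
line_plot
```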
Using Facets in ggplot2
Faceting is a way to add additional categorical variables to your plot. Facets build the chart by dividing the data into two or more groups, and each group is plotted in its own panel.
Now there are two ways in which you can use facets:
A. If you want to split the data by only one variable, then use facet_wrap(). In the following syntax you will notice a tilde (~); the formula that starts with it is the first argument by default. After the tilde, mention the name of the variable by which you want to make the split.
Checking the distribution of total sleep by kind of animal.
# Working example of facet_wrap
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total)) +
  facet_wrap(~ vore)
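facet_wrap() also lets you control the panel layout. The following is a minimal sketch, assuming you want two columns of panels and a separate y axis per panel; the values ncol = 2, scales = "free_y" and bins = 10 are illustrative choices, not part of the original example.

```r
library(ggplot2)

# Same histogram as above, with an explicit bin count to silence the
# default-bins message, laid out in two columns with independent y axes
p_wrap <- ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), bins = 10) +
  facet_wrap(~ vore, ncol = 2, scales = "free_y")

p_wrap
```

Free scales make small groups easier to read, at the cost of making bar heights harder to compare across panels.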
B. If you want to split the data by a combination of two variables, then you can use facet_grid(). Here the two variables should be separated by the tilde (~): the variable on the left defines the rows and the variable on the right defines the columns.
Checking the scatter plot between mpg and disp variable by splitting the data by cyl and am type.
# Loading data
data(mtcars)

# Converting the cylinder (cyl) and automatic (am) variables to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# Working example of facet_grid
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = disp)) +
  facet_grid(cyl ~ am)
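With bare values such as 0 and 1 in the strip labels, a grid can be hard to read. As a small sketch, the labeller argument with label_both prints the variable name next to each value. The use of label_both here is an illustrative choice, and cars and p_grid are hypothetical names.

```r
library(ggplot2)

# Local copy with the two faceting variables as factors
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)
cars$am  <- as.factor(cars$am)

# Rows are split by cyl, columns by am; strips read "cyl: 4", "am: 0", ...
p_grid <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = disp)) +
  facet_grid(cyl ~ am, labeller = label_both)

p_grid
```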
Mapping colors to variables in ggplot2
Colors can be a game changer in any data visualization, and thus it is important for us to learn about them. Apart from this, the default colors in ggplot are on a grey scale, and at times this may make things difficult to read and distinguish from one another.
In ggplot there are a couple of ways in which you can use color.
A. You can simply assign colors to objects, lines and points. To add color to filled objects, like bars, use the fill argument. To set the color of lines and points, use the color argument. Below are quick examples of both cases.
# Making the points blue in the scatter plot
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")

# Making the bars of the histogram blue
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Sepal.Width), fill = "blue")
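A common stumbling block is the difference between setting a color (outside aes()) and mapping it (inside aes()). The sketch below contrasts the two and uses layer_data() to inspect which colors ggplot2 actually assigned; the plot names are hypothetical.

```r
library(ggplot2)

# Setting: color is a fixed value outside aes(), so every point is blue
set_plot <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")

# Mapping: inside aes(), "blue" is treated as data (a one-level variable),
# so ggplot2 picks a color from its default scale and adds a legend
map_plot <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = "blue"))

unique(layer_data(set_plot)$colour)  # the literal color that was set
unique(layer_data(map_plot)$colour)  # a color chosen by the default scale
```

If a plot shows an unexpected color together with a legend entry named after the color you asked for, the value most likely ended up inside aes().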
B. We can use color to map the values of a third variable, which we have already learned in the very first example under mapping aesthetics. The examples below map fill to cyl and change the palette with scale_fill_brewer().
Example 1 - Showcasing the default RColorBrewer setup
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer()
Example 2 - Showcasing Set1 palette colors
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Set1")
Example 3 - Showcasing Spectral palette colors
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Spectral")
For your reference, the RColorBrewer palette chart shows every available palette; you can draw it yourself with RColorBrewer::display.brewer.all().
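The palette names accepted by scale_fill_brewer() can also be looked up from R itself. This sketch assumes the RColorBrewer package is installed (it normally arrives as a dependency of ggplot2).

```r
library(RColorBrewer)

# Table of all palettes: maximum number of colors and category
# (seq = sequential, div = diverging, qual = qualitative)
head(brewer.pal.info)

# The hex codes of three colors from the Set1 palette
brewer.pal(n = 3, name = "Set1")

# Draw the full palette chart in the plotting window
display.brewer.all()
```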
Understanding the Coordinate System of ggplot2
The coordinate system of ggplot is a little complicated. But don't worry, we will not dig into it too much for now; I will provide you with a few coordinate systems to start with. If you can remember them, I think most of the job is done, and this should not stop you from creating awesome charts using ggplot2. As you use ggplot more and more, I am sure you will be able to take deeper plunges into the ggplot coordinate system. To start with, I have shortlisted 5 coordinate functions, which are mentioned below:
coord_cartesian() - This is the default coordinate system in ggplot2. According to this system the X and Y positions of each point act independently to determine its location on the graph.
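Although coord_cartesian() is applied by default, calling it explicitly is handy for zooming. This is a minimal sketch, assuming you want to focus on cars between 15 and 25 mpg; the limits and the plot names are illustrative.

```r
library(ggplot2)

scatter <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt))

# Zoom in on the 15-25 mpg window; points outside the window are
# hidden from view but are not removed from the data
zoomed <- scatter + coord_cartesian(xlim = c(15, 25))

zoomed
```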
coord_flip() - This is helpful in cases when you want to build horizontal graphs. This function switches the X and Y axes. For example, you can use coord_flip() to draw horizontal boxplots.
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl)) +
  scale_fill_brewer(palette = "Set1") +
  coord_flip()
coord_polar() - This uses polar coordinates to turn bar charts into coxcomb or pie charts.
# Loading the gridExtra library
library(gridExtra)

# Generating a barplot
bar <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = cyl, fill = cyl), width = 1)

# Saving the two plots
plot1 <- bar + coord_flip()
plot2 <- bar + coord_polar()

# Plotting the graphs side by side in two columns
grid.arrange(plot1, plot2, ncol = 2)
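The coxcomb chart above keeps one wedge per bar. To get a classic pie chart instead, one common approach is to stack everything into a single bar and map its height to the angle. The sketch below assumes that approach; the names cars, stacked and pie are hypothetical.

```r
library(ggplot2)

cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# One stacked bar (a single dummy x position) with one segment per cyl level
stacked <- ggplot(data = cars) +
  geom_bar(mapping = aes(x = "", fill = cyl), width = 1)

# Mapping the bar height (y) to the angle turns the stack into a pie
pie <- stacked + coord_polar(theta = "y")

pie
```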
coord_map() - This function creates a 2D map of the desired location on Earth. We use geom_polygon() along with coord_map() to draw a map with a maintained aspect ratio. If you do not understand what this means, then just run the code once without the coord_map() part. Note that map_data() needs the maps package and coord_map() needs the mapproj package to be installed.
# An example showcasing the map of Italy
italy <- map_data("italy")
ggplot(italy, aes(long, lat, group = group)) +
  geom_polygon(fill = "lightblue", colour = "black") +
  coord_map()
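coord_fixed() - This coordinate system forces a fixed aspect ratio between the axes. The ratio is expressed as Y over X: with ratio = 1, one unit on the Y axis is as long as one unit on the X axis, and a larger ratio makes Y units longer relative to X units. Check out the examples below: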
# Building a scatter plot
plot <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# Setting the ratio to 1
ratio1 <- plot + coord_fixed(ratio = 1)

# Setting the ratio to 3
ratio3 <- plot + coord_fixed(ratio = 3)

# Plotting them in a grid
grid.arrange(ratio1, ratio3, ncol = 2)
Support for Statistical Transformation in ggplot
Among the many useful features of ggplot2, the one which may become dear to you is the support for statistical transformations. These functions save a lot of time, as you don't have to prepare the data beforehand and the statistical calculations are done on the go. Again, there are many statistical functions and we encourage you to explore them. Below I have listed some of the most widely used ones.
stat_count() - Creates a bar plot showcasing the frequency count of each level of a categorical variable.
# Plotting the bar chart of cylinder counts
ggplot(data = mtcars) +
  stat_count(mapping = aes(x = cyl))
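To see what "on the go" means in practice, the sketch below compares counts computed by ggplot2 with counts prepared by hand. geom_bar() uses stat_count() by default, while geom_col() plots the numbers exactly as given; the object names are hypothetical.

```r
library(ggplot2)

cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# Counting on the fly: geom_bar() uses stat_count() by default
on_the_fly <- ggplot(data = cars) +
  geom_bar(mapping = aes(x = cyl))

# Counting by hand first, then plotting the numbers as they are
counts <- as.data.frame(table(cyl = cars$cyl))  # columns: cyl, Freq
by_hand <- ggplot(data = counts) +
  geom_col(mapping = aes(x = cyl, y = Freq))

layer_data(on_the_fly)$count  # the counts stat_count() computed
counts$Freq                   # the same counts computed manually
```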
stat_density() - Creates a kernel density plot. A kernel density estimate is a smoothed version of a histogram, which makes it a very useful alternative for showing the distribution of a variable.
# Plotting the kernel density of petal length
ggplot(data = iris) +
  stat_density(mapping = aes(x = Petal.Length))
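stat_summary() - This function summarises the Y variable for each unique value of the X variable.
# Plotting the mean petal length per species with its min-max range
# (fun, fun.min and fun.max replace fun.y, fun.ymin and fun.ymax,
# which are deprecated since ggplot2 3.3.0)
ggplot(data = iris) +
  stat_summary(
    mapping = aes(x = Species, y = Petal.Length),
    fun.min = min,
    fun.max = max,
    fun = mean
  )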
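Instead of three separate functions, stat_summary() can take a single fun.data function that returns y, ymin and ymax together. The sketch below uses mean_se(), which ships with ggplot2 and returns the mean plus and minus one standard error; choosing it here is an assumption for illustration.

```r
library(ggplot2)

# Mean petal length per species with a +/- one standard error range
p_summary <- ggplot(data = iris) +
  stat_summary(
    mapping = aes(x = Species, y = Petal.Length),
    fun.data = mean_se
  )

p_summary

# The summarised values ggplot2 computed, one row per species
layer_data(p_summary)[, c("x", "y", "ymin", "ymax")]
```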
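stat_smooth() - Adds a smooth line to a scatter plot. geom_smooth() is the matching geom and uses this stat by default, so the two can be used interchangeably.
# Adding a smooth line to the scatter plot
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()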
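The default smoother is a flexible curve. If a straight regression line is preferred, the method can be stated explicitly. This is a sketch with illustrative argument choices (method = "lm", se = FALSE); p_lm is a hypothetical name.

```r
library(ggplot2)

# Scatter plot with a straight least-squares line and no confidence band
p_lm <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_lm
```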
Themes Themes Themes
You must have noticed that the default theme for ggplot2 is pretty much greyish in color. If you are not a great fan of grey, then don't worry, ggplot2 has a number of themes for you to choose from.
library(gridExtra)

# The same scatter plot with a smooth line is reused for every theme
base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

p1 <- base + ggtitle("theme_bw") + theme_bw()
p2 <- base + ggtitle("theme_linedraw") + theme_linedraw()
p3 <- base + ggtitle("theme_gray") + theme_gray()
p4 <- base + ggtitle("theme_dark") + theme_dark()
p5 <- base + ggtitle("theme_minimal") + theme_minimal()
p6 <- base + ggtitle("theme_void") + theme_void()

# Arranging the six plots in a 2 x 3 grid
grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3, nrow = 2)
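A complete theme can also be fine-tuned element by element with theme(). The sketch below is only an illustration: the legend position, the bold title and the mapping of color to drv are assumed choices, not part of the original examples.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  ggtitle("theme_minimal with a bottom legend") +
  theme_minimal() +
  # theme() adjusts single elements on top of the complete theme,
  # so it has to come after theme_minimal()
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )

p_theme
```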