Correlation coefficient are used to describe the degree of association between quantitaive variables. The value of correlation lies between +1 to -1. The signs only idicate the direction of the relationship. That means a +0.86 value is equal to -0.86. However, the -ve sign indicates that if one variable increases the other decreases and +ve indicates that if one variable increases the other also increases. A value in the range of +0.20 to -0.20 indicates very weak or no correlation.
R Programming supports variety of correlations. However, we will only be discussing about Pearson, Spearman, and Kendall correlation as these are used most of the time. You can briefly learn about the correlation and types, HERE
cor function produces all the above mentioned correlation coefficients. Although the
cor function finds the correlation for a matrix, it does not provide any information related to significance of correlation. If you are interested in that you can use
Things You Will Master
- Quick look - Types of correlation
- Generate correlation matrix
- Testing correlation for significance
- Visualizing correlation matrix using
Quick Look - Types of correlation
Let us quickly learn when to use which correlation.
1. Pearson correlation - Pearson correlation is used when we want to assess the degree of association between two quantitative variables.
cor(x, method = "pearson")
2. Spearman correlation - Use spearman correlation when you want to assess the degree of association between rank-ordered variables.
cor(x, method = "spearman")
3. Kendall’s correlation - Kendall’s correlation can also be used to assess the degree of association between rank-ordered variables. However, it is non-parametric measure.
cor(x, method = "kendall")
Generate correlation matrix
One can generate correlation matrix given any correlation type using
cor function. However, just ensure that you have carefully looked into the data type. This will ensure that you produce the correct results using appropriate correlation. Let us generate correlation between the variables of
# Computing correlation corMat <- cor(x= iris[, -5], method = "pearson") # Rounding the values to two decimal round(corMat, 2)
# Output Sepal.Length Sepal.Width Petal.Length Petal.Width Sepal.Length 1.00 -0.12 0.87 0.82 Sepal.Width -0.12 1.00 -0.43 -0.37 Petal.Length 0.87 -0.43 1.00 0.96 Petal.Width 0.82 -0.37 0.96 1.00
Above matrix suggests that
Sepal.Length have high correlation. Similarly, you find other variables which show high correlation between each other.
The same function can be used to print the correlation matrix between the two ranked variables. For example, we can calculate the correlation between the cylinder type and gear. Both these variables are ranked variables.
# Looking into the spearman correlation cor(mtcars[, c("cyl", "gear")], method = "spearman")
# Output cyl gear cyl 1.0000000 -0.5643105 gear -0.5643105 1.0000000
Testing correlation for significance
To check the statistical significance of of the correlation we can use
cor.test function. The function generates the p-value which when compared to alpha value reveals if the correlation is statistically significant or not.
According to the decision rule, if p-value is less than alpha(0.05) we reject the null hypothesis. Here null hypothesis is - that correlation between the two variables is equal to zero.
# Checking significance of correlation cor.test(mtcars$mpg, mtcars$disp)
# Output Pearson's product-moment correlation data: mtcars$mpg and mtcars$disp t = -8.7472, df = 30, p-value = 0.000000000938 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.9233594 -0.7081376 sample estimates: cor -0.8475514
Based upon the above test results we conclude that the correlation between mileage and displacement variable is significant.
Visualizing correlation Matrix
Visualization is a powerful tool. It speeds up the process of understanding and digesting the important points. As your dataset grows it gets more and more difficult to go through the numbers present in your correlation matrix. So the best way to represent your insights about the relationship between variables is through correlation charts.
We are sharing some of the examples below and that means you can use whatever suits your need. For building these graphs we are using a package called as
corrplot. If you dont have this R Package then use
install.packages() to install it on your local system.
# loading package require(corrgram) corrgram(mtcars[, c("mpg", "wt", "disp", "hp", "qsec")], order=TRUE)
In the above graph:
The Red Shade indicates negative correlation between the variables, darker the shade stronger the association.
The Blue Shade indicates positive correlation between the variables, darker the shade stronger the association.
Example 2 - Visualizing correlation matrix using
There are seven different shapes or you can say ways in which you can represent the information - “pie”, “circle”, “square”, “number”, “ellipse”, “shade”, “color”.
The first argument to the function is a correlation matrix
# loading library require(corrplot) # Generating correlation matrix corMat <- cor(mtcars[, c("mpg", "wt", "disp", "hp", "qsec")]) # Building the correlation plot corrplot(corMat, method="pie")
Example 3 - Changing the shape to square
# Generating correlation matrix corMat <- cor(mtcars[, c("mpg", "wt", "disp", "hp", "qsec")]) # Building the correlation plot corrplot(corMat, method="square")
Example 4 - Representing the correlation information using numbers
# Generating correlation matrix corMat <- cor(mtcars[, c("mpg", "wt", "disp", "hp", "qsec")]) # Building the correlation plot corrplot(corMat, method="number")
Example 5 - Changing the layout of the correlation graph
So far we have been drawing the full correlation matrix. However as we know that upper triangle matrix and lower triangle matrix are similar so you can choose to represent only one half of the table.
# Generating correlation matrix corMat <- cor(mtcars[, c("mpg", "wt", "disp", "hp", "qsec")]) # Building the correlation plot corrplot(corMat, method="circle", type = "upper")