The Excel correlation formula is used to find the relationship between two or more data sets. In statistics, the correlation coefficient is a value between -1 and +1 that describes how strongly two variables are related. A positive value indicates a positive correlation: for two variables x and y, when x increases, y also tends to increase, and a value of +1 means the relationship is perfectly linear. On the other hand, a negative correlation coefficient means that when x increases, y tends to decrease. A correlation coefficient of 0 indicates no linear correlation between the two data sets.
There are different types of correlation coefficients in statistics, but we will look at the most common one, the Pearson correlation coefficient. In Excel, the CORREL function is used to find this coefficient; it is one of Excel’s statistical functions. In this article, you will learn how to find the correlation coefficient of two data sets.
Table of Contents
- What Is the Correlation Formula in Excel?
- Where The Correlation Coefficient Is Used
- How to Find the Correlation Coefficient in Excel
- Closing Thoughts
What Is the Correlation Formula in Excel?
In statistics, the Pearson correlation coefficient, also called Pearson’s r, measures the linear relationship between n pairs of values (x, y). It is calculated with the following formula:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
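The formula translates directly into code. The sketch below is a minimal Python version that takes n and the five sums; the function name `pearson_r_from_sums` is hypothetical, and the sample sums come from the made-up data x = 1, 2, 3 and y = 2, 4, 6.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the raw-score formula (hypothetical helper)."""
    numerator = n * sum_xy - sum_x * sum_y
    # Product of the two bracketed terms under the square root.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Sums for x = 1, 2, 3 and y = 2, 4, 6: n=3, Σx=6, Σy=12, Σxy=28, Σx²=14, Σy²=56
print(pearson_r_from_sums(3, 6, 12, 28, 14, 56))  # 1.0
```

If either bracketed term is zero (all x values or all y values identical), the division fails, which corresponds to the case where the coefficient is undefined.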
Where The Correlation Coefficient Is Used
An example of a positive correlation is that when the temperature rises, ice cream sales also rise. An example of a negative correlation is the relationship between a student’s number of absences and their grade point average: more absences tend to go with a lower GPA. The amount of coffee an individual consumes does not correlate with their IQ level; this is an example of a neutral, or zero, correlation.
For a financial analyst, the CORREL function is useful for finding the correlation between two variables, e.g. a particular stock and the market index.
Let’s take an example and calculate the correlation coefficient of two data sets by hand. The following table shows two variables: age and heart rate in beats per minute. Here, the column Age is x, and the column Heart rate is y. We need the columns xy, x², and y², and the sums Σx, Σy, Σxy, Σx², and Σy² (from which (Σx)² and (Σy)² follow), to substitute into the formula given below.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
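The helper columns described above can be built mechanically. This hypothetical `helper_table` function is a sketch that assumes the data is given as two equal-length lists of numbers; the sample values are illustrative and are not the age and heart-rate data.

```python
def helper_table(x, y):
    """Hypothetical helper: one row per pair (x, y, xy, x^2, y^2) plus column totals."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]
    totals = {
        "n": len(rows),
        "sum_x": sum(r[0] for r in rows),
        "sum_y": sum(r[1] for r in rows),
        "sum_xy": sum(r[2] for r in rows),
        "sum_x2": sum(r[3] for r in rows),
        "sum_y2": sum(r[4] for r in rows),
    }
    return rows, totals


# Illustrative data only, not the article's table.
rows, totals = helper_table([1, 2, 3], [2, 4, 6])
print("x  y  xy  x^2  y^2")
for row in rows:
    print(*row)
print(totals)
```

The squared sums (Σx)² and (Σy)² are not stored separately because they follow directly from `sum_x` and `sum_y`.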
The following steps explain how to calculate the correlation coefficient by hand. Take these values from the table and substitute them into the formula: Σx = 15, Σy = 590, Σxy = 1700, Σx² = 55, Σy² = 70150, and n = 5.
$$ r = \frac{5(1700) - (15)(590)}{\sqrt{\left[5(55) - (15)^2\right]\left[5(70150) - (590)^2\right]}} \approx -0.96 $$
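The arithmetic can be double-checked in a few lines of Python, using only the sums quoted above; the comments show the intermediate values of the numerator and the two bracketed terms.

```python
import math

# Sums taken from the worked example above.
n, sum_x, sum_y = 5, 15, 590
sum_xy, sum_x2, sum_y2 = 1700, 55, 70150

numerator = n * sum_xy - sum_x * sum_y   # 8500 - 8850 = -350
x_term = n * sum_x2 - sum_x ** 2         # 275 - 225 = 50
y_term = n * sum_y2 - sum_y ** 2         # 350750 - 348100 = 2650
r = numerator / math.sqrt(x_term * y_term)

print(round(r, 2))  # -0.96
```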
Since r is negative and close to -1, age and heart rate are strongly negatively correlated in this data. You would arrive at the same value with the Excel CORREL function, which we cover in the following section.
Excel CORREL Function
The CORREL function is one of Excel’s statistical functions. It calculates the correlation coefficient between two variables. Its syntax is =CORREL(array1, array2), and its two arguments are:
Array1: the first cell range of values. The values can be numbers, or names, arrays, or references that contain numbers.
Array2: the second cell range of values, which can likewise be numbers, or names, arrays, or references that contain numbers.
Let’s use the same example to calculate the coefficient with CORREL. Select the cell where you want the result, then click the fx (Insert Function) button next to the formula bar. This opens the Insert Function dialogue box, where you can pick a category from the drop-down menu and then select the function.
The list shows common functions such as SUM, AVERAGE, IF, PROPER, and MAXIFS. Choose CORREL from the list. In the Function Arguments window that opens, enter the two arguments: on the sheet, drag over cells A2:A6 for array1 and cells B2:B6 for array2. Click OK. The correlation coefficient is displayed in the cell, and the formula =CORREL(A2:A6,B2:B6) appears in the formula bar. You can also type this formula directly into the cell.
There’s another way to insert the CORREL function: click Formulas → More Functions → Statistical → CORREL. Make sure to select the cell where you want the result to appear before choosing the function.
The above data sets give a value of -0.96, matching the hand calculation. It means the two data sets are negatively correlated: as x (age) increases, y (heart rate) tends to decrease, which is evident from the data sets.
Let’s look at another example to understand how correlation in Excel works. In this example, the two random variables for which we need to find the correlation coefficient are the columns Wages and CPI (Consumer Price Index).
Click the fx button, insert the CORREL function, enter the Wages and CPI ranges as array1 and array2, and click OK. The result is 0.92, indicating a strong positive correlation.
A positive correlation indicates that as the values in array1 (Wages) increase, the values of the other variable, CPI, also tend to increase.
How to Find the Correlation Coefficient in Excel
You can also find the correlation coefficient with the Analysis ToolPak. Go to File → Options to open the Excel Options dialogue box. On the left side of the dialogue box, click Add-ins to see the list of Microsoft Office add-ins, which includes Analysis ToolPak. The Analysis ToolPak is an add-in that provides data analysis tools for statistical and engineering analysis.
At the bottom of the dialogue box is a Manage drop-down menu that lists the types of add-ins that can be included in Excel. Choose Excel Add-ins and click the Go… button next to it.
Check the option Analysis ToolPak, then click OK.
After this step, a new Analysis group containing the Data Analysis command appears on the Data tab; it was not available until now. Go to the Data tab, click Data Analysis, select Correlation, and click OK. The Correlation window pops up, where you enter the inputs for the analysis.
In Input Range, enter the cell range of the data, including the cells that hold the column labels. Then check the Labels in First Row box so that Excel treats the first row as labels. In Output Range, enter the cell where the result should appear; in this example, cell A8, where the result is displayed.
1. How do you use correlation in Excel?
The correlation formula finds the relationship between two or more data sets: it measures how strongly two random variables are correlated with each other. Data made up of pairs of values of two variables is called bivariate data. In Excel, you can calculate the coefficient with =CORREL(array1, array2) or with the Correlation tool in the Analysis ToolPak.
2. How do you plot correlation coefficients in Excel charts?
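A scatter plot is the usual chart for showing the correlation between two random variables. In Excel, select the two columns and insert a Scatter chart from the Insert tab. Each point represents one pair of values, with X on the horizontal axis and Y on the vertical axis; by convention, X is the independent variable and Y the dependent one. Points that trend upward show a positive correlation, points that trend downward show a negative correlation, and points with no clear trend show no correlation.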
3. What possible errors can you expect from a CORREL function?
- The #N/A error occurs when the given arrays are of different lengths. If array1 and array2 have a different number of data points, CORREL returns the #N/A error value.
- The #DIV/0! error occurs if either array is empty or if the standard deviation of its values equals zero.
- Cells containing zero are included in the calculation; however, text, logical values, and empty cells in the arrays are ignored.
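These rules can be modelled in a short sketch. `correl_like` is hypothetical, not Excel’s implementation; it returns the error strings instead of raising, and it assumes that a pair is dropped whenever either of its values is text, a logical value, or empty (represented here as `None`), and that the length check applies to the arrays as given, before that filtering.

```python
import math
from numbers import Real


def _is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, Real) and not isinstance(v, bool)


def correl_like(array1, array2):
    """Hypothetical model of CORREL's rules: returns a float, "#N/A", or "#DIV/0!"."""
    if len(array1) != len(array2):
        return "#N/A"  # different number of data points
    # Assumption: drop a pair when either value is text, logical, or empty (None).
    pairs = [(a, b) for a, b in zip(array1, array2) if _is_number(a) and _is_number(b)]
    n = len(pairs)
    if n == 0:
        return "#DIV/0!"  # nothing left to correlate
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return "#DIV/0!"  # zero standard deviation
    return sxy / math.sqrt(sxx * syy)


print(correl_like([1, 2, 3], [1, 2]))                    # #N/A
print(correl_like([1, 2, 3], [5, 5, 5]))                 # #DIV/0!
print(correl_like([1, 0, "n/a", 3], [2, 0, 7, True]))    # 1.0 (zeros kept, text/logical pairs dropped)
```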
Closing Thoughts
This article explored how to find the correlation coefficient between two random data sets in Excel. CORREL is a statistical function widely used by data analysts to measure how closely two variables are related, which can help when predicting the outcomes of an event or an experiment.
There are many other functions and features in Excel that can help you manage data more efficiently. Check out our courses in Excel and Microsoft Office Applications to learn what a Pivot table is and how to insert slicers. You can also learn what lookup functions are and how to use them in our Lookups in Excel – Beginner’s Guide. You will earn Micro-credentials on completing the course.