What Is Correlation Analysis? (And How To Find It In Excel)

By Indeed Editorial Team

Published 12 October 2022

The Indeed Editorial Team comprises a diverse and talented team of writers, researchers and subject matter experts equipped with Indeed's data and insights to deliver useful tips to help guide your career journey.

Researchers use different statistical methods to study and interpret data. Correlation analysis is one such method, and it helps analysts discover how variables relate to one another. Understanding how to use it can help you identify the direction and strength of relationships between variables. In this article, we answer the question, 'What is correlation analysis?', outline its benefits and discuss how to calculate it in Excel using two methods.

What Is Correlation Analysis?

Correlation analysis is a statistical method that data analysts use to discover whether a relationship exists between two variables or data sets. If a relationship is present, this method can also determine its direction and strength. The quantitative data that this analysis uses most often comes from polls, surveys and other research methods. There are several ways to calculate correlation, but the most prevalent in data analysis is the Pearson correlation coefficient, 'r', which measures how closely two variables follow a straight-line relationship. The formula for this method is:

r = (covariance of x and y) / ((standard deviation of x) × (standard deviation of y))
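As a rough illustration of this formula, the following Python sketch computes 'r' step by step using only the standard library. The function name `pearson_r` and the sample values are made up for this example. It assumes the sample versions of covariance and standard deviation (dividing by n - 1); the population versions give the same 'r' because the divisor cancels out.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r = covariance(x, y) / (std dev of x * std dev of y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (divide by n - 1).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return cov_xy / (sd_x * sd_y)

# Made-up example data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3))
```

With these invented values, the result is close to 1, which suggests a strong positive relationship.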

What Are The Different Correlation Possibilities?

When studying the correlation between two variables, you can notice a positive correlation, a negative correlation or a lack of correlation. Here is a more in-depth explanation of each of the three correlation possibilities:

Positive correlation

A positive correlation indicates that both variables within a data set move in the same direction. As one variable increases, the other tends to increase as well. If the value of 'r' equals +1, there is a perfect positive correlation present. A positive value that is closer to zero still represents a positive correlation, but a weaker one.

Negative correlation

A negative correlation indicates that the two variables move in opposite directions. As one variable increases, the other tends to decrease. An 'r' value of -1 means that a perfect negative correlation is present. A negative value that is closer to zero represents a weaker negative correlation.

No correlation

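No correlation means that there is no linear relationship present between two variables. The 'r' value equals zero, or sits very close to zero in real data, when there is no correlation present. Because Pearson's coefficient only measures straight-line relationships, two variables can still be related in a curved way while 'r' is near zero. It is best to have a large sample size so that you can more accurately determine whether there is truly a lack of correlation.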
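The three possibilities can be sketched with small, invented data sets. The helper `pearson_r` and the variable names below are hypothetical. The third data set is deliberately curved to show that an 'r' of zero only rules out a straight-line relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Compact Pearson's r; the (n - 1) divisors cancel, so plain sums are enough.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
same_direction = [2 * v + 1 for v in x]       # rises as x rises
opposite_direction = [-3 * v + 4 for v in x]  # falls as x rises
curved = [v * v for v in x]                   # depends on x, but not linearly

r_pos = pearson_r(x, same_direction)
r_neg = pearson_r(x, opposite_direction)
r_zero = pearson_r(x, curved)
print(round(r_pos, 3), round(r_neg, 3), round(r_zero, 3))
```

The first two data sets are exact straight lines, so they give a perfect positive and a perfect negative correlation. Real survey data rarely produces values this clean.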

Benefits Of Finding Correlation In A Data Set

Here are some key benefits of finding correlation in a data set:

  • It helps data analysts identify strong and weak relationships within data.

  • It allows data analysts to determine if there are no relationships present in a data set.

  • It helps researchers establish a starting point as they begin to study relationships for the first time.

  • It allows researchers to narrow their findings as they investigate relationships further.

  • It simplifies research findings, as the correlation value ranges from -1 to 1 when using Pearson's correlation coefficient.

How To Calculate Correlation In Excel Using The CORREL Function

Here is a list of steps on how to calculate correlation in Excel using the CORREL function:

1. Enter your data into two columns

The first step to calculating correlation in Excel using the CORREL function is to enter your data into two adjacent columns. Enter all of your x-values in the left column. Enter all of your y-values in the right column.

2. Enter the CORREL function

The next step is to enter the CORREL function in a separate cell. Begin by typing an equal sign and follow it with the function name 'CORREL'. Insert an open parenthesis. Insert the first array by typing the cell reference of the first value, a colon and the cell reference of the last value. Add a comma and type the second array in the same format. Then, insert a close parenthesis. For example, if your x-values are in cells A2 to A11 and your y-values are in cells B2 to B11, the finished formula reads =CORREL(A2:A11,B2:B11).

3. Run the function

Confirm that you have entered the function correctly. Press the 'Enter' key on your keyboard. Notice that the correlation value is now visible within the same cell where you originally typed the formula.
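To make the structure of the formula concrete, here is a minimal Python sketch that assembles the text you would type into the cell. The helper `build_correl_formula` is hypothetical and is not part of Excel; it only joins the pieces in the order described in step two.

```python
def build_correl_formula(first_col, second_col, first_row, last_row):
    """Compose an Excel CORREL formula for two column ranges (hypothetical helper)."""
    if first_row > last_row:
        raise ValueError("first_row must not be greater than last_row")
    array1 = f"{first_col}{first_row}:{first_col}{last_row}"
    array2 = f"{second_col}{first_row}:{second_col}{last_row}"
    # Equal sign, function name, open parenthesis, first array,
    # comma, second array, close parenthesis.
    return f"=CORREL({array1},{array2})"

formula = build_correl_formula("A", "B", 2, 11)
print(formula)  # prints =CORREL(A2:A11,B2:B11)
```

Both arrays cover the same rows so that each x-value lines up with its y-value. Some regional settings in Excel may expect a semicolon rather than a comma between the two arrays.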

How To Calculate Correlation In Excel Using The Data Analysis ToolPak

While the CORREL function is a useful way to find the correlation within a data set, it is not the only method. If you do not want to remember the CORREL function and how to compose it, you can use the Data Analysis ToolPak. Here is a list of steps on how to calculate correlation in Excel using the Data Analysis ToolPak:

1. Instal the Data Analysis ToolPak

The Data Analysis ToolPak is an optional add-in for Excel, so instal it if it is not already active. Start by opening the 'File' tab. Select 'Options.' Navigate to the 'Add-Ins' category on the sidebar.

Click on it and find the entry that says 'Analysis ToolPak' underneath the 'Inactive Application Add-Ins' section. Check that the 'Manage' box at the bottom of the window shows 'Excel Add-ins' and select the 'Go' button beside it. In the dialogue box that opens, tick the 'Analysis ToolPak' option and click 'OK.' Return to your Excel file and click on the 'Data' tab. Go to the 'Analysis' group to confirm that the 'Data Analysis' button is now visible.

2. Enter your data into your worksheet

Once you enable the Data Analysis ToolPak, you can type your data into your worksheet. Consider restricting your data to two columns if possible so you can easily use the Data Analysis ToolPak. Place all of your x-values in the left column and all of your y-values in the right column. Assign your columns custom names according to the variable types you have. Alternatively, you can name each column 'x' and 'y' for simplicity.

3. Select 'Correlation' in the 'Data Analysis' tool

View the tool ribbon on your spreadsheet. Click on the 'Data' tab. Navigate to the 'Analysis' group and choose the button that says 'Data Analysis.' Select the 'Correlation' option. Choose 'OK.'

4. Input the variables for which you want to find the correlation

The next step is to designate the variables for which you want to find the correlation. Locate the 'Input Range' box. You can manually type in the range using your keyboard. The tool expects one continuous block of cells, so enter a single range that covers both columns: type the first cell of the left column, a colon and the last cell of the right column, such as A1:B11.

Alternatively, you can highlight your variables using your cursor. As you highlight the appropriate cells, you can see their cell references appear in the 'Input Range' box.

5. Clarify your cell selections

The next step is to clarify your cell selections. Ensure that the input range is accurate. Then, under 'Grouped By', you can select either the 'Columns' or 'Rows' option. If you followed the data input advice in step two, you can choose the 'Columns' option. Most spreadsheet users input data into columns by default rather than rows, so the 'Columns' option is usually the most intuitive.

It is also important to consider whether the data you selected includes column headers. If it does, you do not need to reselect the data. Instead, check the box that reads 'Labels in first row.' This action informs the program that the first row of the selected range contains header text rather than numeric data. It prevents the program from treating the headers as values, which can otherwise cause an error.
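The effect of that checkbox can be sketched in Python. This is only an illustration of the idea, not how Excel works internally; the function `split_labels` and the sample sheet are hypothetical.

```python
def split_labels(rows, labels_in_first_row):
    """Return (labels, numeric_rows) for a list of worksheet-like rows."""
    if labels_in_first_row:
        # Treat the first row as header text and keep it out of the numbers.
        labels, data = list(rows[0]), rows[1:]
    else:
        # No header row: make up generic names for the columns.
        labels = [f"Column {i + 1}" for i in range(len(rows[0]))]
        data = rows
    # float() raises ValueError if header text is mistaken for data.
    numeric_rows = [[float(cell) for cell in row] for row in data]
    return labels, numeric_rows

sheet = [["x", "y"], [1, 52], [2, 55], [3, 61]]
labels, values = split_labels(sheet, labels_in_first_row=True)
print(labels)
print(values)
```

Calling the same function on this sheet with `labels_in_first_row=False` raises a `ValueError`, because the text 'x' cannot be read as a number.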

6. Designate your output range

The final step to calculating correlation in Excel is to designate an output range. Locate the text box that says 'Output Range.' Type the reference of the cell where you want the output to begin. Alternatively, you can use your cursor to click on that cell. Choose the 'OK' button to finalise your selections. The program then produces a small table of correlation values, and the value where your two variables intersect is the correlation for your data.
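As a rough picture of what a table of correlation values contains, the following Python sketch computes 'r' for every pair of columns. The names `pearson_r` and `correlation_matrix` and the data are hypothetical, and the layout is an assumption for illustration rather than a copy of the ToolPak's output format.

```python
from math import sqrt

def pearson_r(xs, ys):
    # Compact Pearson's r; the (n - 1) divisors cancel, so plain sums are enough.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(labels, columns):
    """Return {row_label: {col_label: r}} for every pair of columns."""
    return {
        a: {b: pearson_r(col_a, col_b) for b, col_b in zip(labels, columns)}
        for a, col_a in zip(labels, columns)
    }

# Made-up data with the column names from step two.
labels = ["x", "y"]
columns = [[1, 2, 3, 4, 5], [52, 55, 61, 64, 70]]
matrix = correlation_matrix(labels, columns)
for row_label in labels:
    print(row_label, [round(matrix[row_label][c], 3) for c in labels])
```

Each variable correlates perfectly with itself, so the diagonal of the table is 1. The value of interest is the one that pairs 'x' with 'y'.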

Please note that none of the companies, institutions or organisations mentioned in this article are affiliated with Indeed.
