Overview
Karl Pearson’s Correlation Coefficient is a method in statistics used to measure how strongly two sets of data are related. It tells us whether the change in one variable is connected to the change in another.
This coefficient is also called Pearson’s r, and it’s often used when studying relationships in linear regression.
This method gives a number between -1 and +1, which shows how strong or weak the connection is. It’s a helpful tool in comparing trends and patterns in data.
Karl Pearson’s coefficient of correlation is a linear correlation coefficient that ranges from -1 to +1. A value of -1 signifies a perfect negative correlation, while +1 indicates a perfect positive correlation.
There are 3 assumptions of Karl Pearson’s coefficient of correlation:
1. The relationship between the two variables is linear.
2. There is a cause-and-effect relationship between the variables.
3. Each variable is affected by a large number of independent causes, so that it is approximately normally distributed.
Degree of Correlation
Perfect correlation: When the value is exactly +1 (positive) or -1 (negative).
High, moderate or low correlation: When the value lies strictly between 0 and ±1, judged by how close it is to ±1.
No correlation: When the value is zero.
Karl Pearson’s correlation coefficient is calculated as the covariance of the two variables divided by the product of the standard deviation of each data sample. It is the normalization of the covariance between the two variables to give an interpretable score.
Karl Pearson’s correlation coefficient formula is given below:
\(r = {\sum(X - \bar{X})(Y - \bar{Y})\over{\sqrt{\sum(X - \bar{X})^2}\sqrt{\sum(Y - \bar{Y})^2}}}\)
where \(\bar{X}\) = mean of X variable
\(\bar{Y}\) = mean of Y variable
Covariance Formula: \(Cov (X, Y) = {\sum(X - \bar{X})(Y - \bar{Y})\over{N}} = {\sum{xy}\over{N}}\), where x and y denote the deviations of X and Y from their means.
There are 4 methods to calculate Karl Pearson’s Coefficient of Correlation which are given below:
In the actual mean method, deviations are taken from the actual arithmetic means of the two variables. The actual mean is found by adding up all the values and dividing by their count, and r is then computed directly from the deviations about these means.
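The actual mean method can be sketched in a few lines of Python (a minimal illustration; the function name and sample data are our own, not from the article):

```python
import math

def pearson_actual_mean(x, y):
    """Pearson's r from deviations about the actual means of x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear data: r should be exactly 1
print(pearson_actual_mean([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```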
If the given data is large, the assumed mean method is recommended over the direct method, since it reduces the calculations and keeps the numerical values small. Under this method, the correlation coefficient is calculated from deviations taken about conveniently chosen assumed means rather than the actual means, where dx = deviation of X from its assumed mean and dy = deviation of Y from its assumed mean. The result is unchanged, and Pearson’s coefficient of correlation always lies between +1 and -1.
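A sketch of the assumed mean method (the helper name and the sample data are illustrative assumptions; any choice of assumed means a and b yields the same r):

```python
import math

def pearson_assumed_mean(x, y, a, b):
    """Pearson's r via deviations dx = x - a, dy = y - b from assumed means.
    Any choice of a and b gives the same result as the actual-mean method."""
    n = len(x)
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    sdx, sdy = sum(dx), sum(dy)
    sdxdy = sum(u * v for u, v in zip(dx, dy))
    sdx2 = sum(u * u for u in dx)
    sdy2 = sum(v * v for v in dy)
    # Assumed-mean form of Pearson's formula
    num = sdxdy - sdx * sdy / n
    den = math.sqrt(sdx2 - sdx ** 2 / n) * math.sqrt(sdy2 - sdy ** 2 / n)
    return num / den

data_x = [12, 9, 8, 10, 11, 13, 7]
data_y = [14, 8, 6, 9, 11, 12, 3]
# Two different choices of assumed means agree
print(round(pearson_assumed_mean(data_x, data_y, 10, 9), 4))  # 0.9485
```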
The step deviation method extends the assumed mean (short-cut) method for large values: the deviations are further divided by a common factor, reducing them to smaller numbers. For this reason it is also called the change of origin and scale method. To calculate the Pearson product-moment correlation by the step deviation method, first determine the covariance of the two (reduced) variables, then calculate each variable’s standard deviation; the correlation coefficient is the covariance divided by the product of the two standard deviations.
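A sketch of the step deviation method (illustrative helper and data; dividing the deviations by common factors h and k leaves r unchanged):

```python
import math

def pearson_step_deviation(x, y, a, b, h, k):
    """Pearson's r via step deviations u = (x - a)/h, v = (y - b)/k.
    Choosing h and k as common factors keeps the arithmetic small;
    r is unaffected by the change of origin and scale."""
    n = len(x)
    u = [(xi - a) / h for xi in x]
    v = [(yi - b) / k for yi in y]
    su, sv = sum(u), sum(v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    su2 = sum(ui * ui for ui in u)
    sv2 = sum(vi * vi for vi in v)
    num = suv - su * sv / n
    den = math.sqrt(su2 - su ** 2 / n) * math.sqrt(sv2 - sv ** 2 / n)
    return num / den

# Wide-ranging values reduced with a = 150, h = 10 and b = 60, k = 5
x = [110, 120, 130, 140, 150]
y = [45, 55, 50, 60, 65]
print(round(pearson_step_deviation(x, y, 150, 60, 10, 5), 4))  # 0.9
```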
Steps involved in the calculation of Karl Pearson’s coefficient of correlation by the direct method: find N, \(\sum{x}, \sum{y}, \sum{xy}, \sum{x^2}\) and \(\sum{y^2}\) from the raw observations, then substitute them into
\(r = {N\sum{xy} - \sum{x}\sum{y}\over{\sqrt{N\sum{x^2} - (\sum{x})^2}\sqrt{N\sum{y^2} - (\sum{y})^2}}}\)
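The direct method works straight from the raw sums, without computing the means explicitly, as this minimal sketch shows (function name and data are illustrative):

```python
import math

def pearson_direct(x, y):
    """Direct method: r from N, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    sy2 = sum(yi * yi for yi in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    return num / den

print(pearson_direct([1, 2, 3], [2, 4, 6]))  # 1.0
```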
The correlation coefficient shows how strongly two variables are related and in what direction. Based on this, there are three main types:
Positive Correlation: Both variables move in the same direction.
If one value increases, the other also increases.
Example: The more time you spend exercising, the more calories you burn.
Negative Correlation: The two variables move in opposite directions.
If one value goes up, the other comes down.
Example: As the price of a product increases, the demand for it usually decreases.
Zero Correlation: There is no connection between the two variables.
A change in one does not affect the other.
Example: A person’s height has nothing to do with their intelligence.
The correlation coefficient tells us how strongly two variables are related. A key point to remember is that this value does not change if we change the scale or origin of the data.
For example, suppose we shift each variable by a constant (change of origin) and divide by a positive constant (change of scale), say U = (X - a)/h and V = (Y - b)/k. The correlation coefficient of U and V is exactly the same as that of X and Y, as proved in Property 2 below.
Karl Pearson’s coefficient of correlation shows the following properties with proof:
Property 1: Karl Pearson’s Coefficient of Correlation (r) lies between -1 and 1, i.e. \(-1\leq{r}\leq1\)
Proof: Suppose X and Y are two variables that take values \((x_i, y_i)\), i = 1, 2, 3, …, n, with means \(\bar{x}, \bar{y}\) and standard deviations \(\sigma_x, \sigma_y\) respectively.
\(\begin{matrix}
\text{ Let us consider, }\\
\sum[{x-\bar{x}\over{\sigma_x}} \pm {y-\bar{y}\over{\sigma_y}}]^2\geq{0}\\
\sum[({x-\bar{x}\over{\sigma_x}})^2 + ({y-\bar{y}\over{\sigma_y}})^2 \pm 2{(x-\bar{x})(y-\bar{y})\over{\sigma_x\sigma_y}}]\geq{0}\\
{1\over{\sigma_x^2}}\sum(x-\bar{x})^2 + {1\over{\sigma_y^2}}\sum(y-\bar{y})^2 \pm {2\over{\sigma_x\sigma_y}}{\sum(x-\bar{x})(y-\bar{y})}\geq{0}\\
\text{ Dividing both sides by n, we get }\\
{1\over{\sigma_x^2}}{\sum(x-\bar{x})^2\over{n}} + {1\over{\sigma_y^2}}{\sum(y-\bar{y})^2\over{n}} \pm {2\over{\sigma_x\sigma_y}}{\sum(x-\bar{x})(y-\bar{y})\over{n}}\geq{0}\\
{1\over{\sigma_x^2}}\sigma_x^2 + {1\over{\sigma_y^2}}\sigma_y^2 \pm {2\over{\sigma_x\sigma_y}} cov(x,y)\geq{0}\\
1 + 1 \pm{2r} \geq{0}\\
2 \pm{2r} \geq{0}\\
2 (1 \pm{r}) \geq{0}\\
(1 \pm{r}) \geq{0}\\
\text{ Taking the + sign: } (1 + {r}) \geq{0} \Rightarrow r \geq -1\\
\text{ Taking the - sign: } (1 - {r}) \geq{0} \Rightarrow r \leq 1\\
\therefore -1\leq{r}\leq1
\end{matrix}\)
The least value of r is –1 and the most is +1. If r = +1, there is a perfect positive correlation between the two variables. If r = -1, there is a perfect negative correlation.
If r = 0, then there is no linear relation between the variables. However, there may be a non-linear relationship between the variables.
If r is positive but close to zero, there is a weak positive correlation; if it is close to +1, there is a strong positive correlation.
Property 2: Correlation coefficient is independent of change in origin and scale
Proof: Suppose, X and Y are the original variables and after changing origin and scale, we have
\(\begin{matrix}
U = {X - a \over{h}} \text{ and } V = {Y - b \over{k}} \text{ where a, b, h, k are constants with } h > 0, k > 0.\\
X – a = hU \text{ and } Y – b = kV\\
X = a + hU \text{ and } Y = b + kV\\
\bar{X} = a + h\bar{U} \text{ and } \bar{Y} = b + k\bar{V}\\
X – \bar{X} = h(U – \bar{U}) \text{ and } Y – \bar{Y} = k(V – \bar{V}) \\
\text{ Now, } r_{xy} = {\sum(x - \bar{x})(y - \bar{y})\over{\sqrt{\sum(x - \bar{x})^2}\sqrt{\sum(y - \bar{y})^2}}}\\
r_{xy} = {hk\sum(U - \bar{U})(V - \bar{V})\over{\sqrt{h^2\sum(U - \bar{U})^2}\sqrt{k^2\sum(V - \bar{V})^2}}}\\
r_{xy} = {hk\sum(U - \bar{U})(V - \bar{V})\over{hk\sqrt{\sum(U - \bar{U})^2}\sqrt{\sum(V - \bar{V})^2}}} \text{ (as h, k > 0) }\\
= {\sum(U - \bar{U})(V - \bar{V})\over{\sqrt{\sum(U - \bar{U})^2}\sqrt{\sum(V - \bar{V})^2}}} = r_{u,v}\\
r_{xy} = r_{u,v}
\end{matrix}\)
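A quick numerical check of Property 2 (a minimal sketch with made-up data; u and v are X and Y after a change of origin and scale):

```python
import math

def pearson(x, y):
    """Pearson's r from the definition (actual-mean form)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    num = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xb) ** 2 for a in x) * sum((b - yb) ** 2 for b in y))
    return num / den

x = [2, 4, 6, 8, 11]
y = [3, 5, 4, 9, 10]
# Change of origin and scale: u = (x - 5)/2, v = (y - 4)/3
u = [(xi - 5) / 2 for xi in x]
v = [(yi - 4) / 3 for yi in y]
print(math.isclose(pearson(x, y), pearson(u, v)))  # True
```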
Property 3: Two independent variables are uncorrelated but the converse is not true
Proof: If two variables are independent then their covariance is zero, i.e., cov (X, Y) = 0
\(\therefore r_{xy} = {cov (X, Y)\over{\sigma_x\sigma_y}} = {0\over{\sigma_x\sigma_y}} = 0\)
Thus, if two variables are independent, their coefficient of correlation is zero, i.e., independent variables are uncorrelated.
But the converse is not true. If \(r_{xy} = 0\), it only means there is no linear correlation between the variables, since Karl Pearson’s coefficient \(r_{xy}\) measures only the linear relationship. There may be a strong non-linear or curvilinear relationship even though \(r_{xy} = 0\).
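A small numerical illustration of the converse failing (made-up data): with y = x² over x values symmetric about zero, y is completely determined by x, yet r = 0.

```python
def pearson(x, y):
    """Pearson's r from the definition (actual-mean form)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    num = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    den = (sum((a - xb) ** 2 for a in x) * sum((b - yb) ** 2 for b in y)) ** 0.5
    return num / den

x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # y is fully determined by x (non-linear)
print(pearson(x, y))  # 0.0 -- zero linear correlation despite total dependence
```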
A correlation coefficient is a pure number independent of the unit of measurement.
The correlation coefficient is symmetric.
Example 1: Compute the correlation coefficient between x and y from the following data: \(n = 10, \sum{xy} = 220, \sum{x^2} = 200, \sum{y^2} = 262, \sum{x} = 40, \sum{y} = 50\)
Solution: \(\begin{matrix}
\text{ The formula to find the Pearson correlation coefficient is given by }\\
r = r_{xy} = \frac{Cov(x , y)}{S_x\times{S_y}}\\
Cov (x, y) = {\sum{xy}\over{n}} - \bar{x}\times\bar{y}\\
\text{ Mean of “x” }= {\sum{x}\over{n}} = {40\over{10}} = 4\\
\text{ Mean of “y” }= {\sum{y}\over{n}} = {50\over{10}} = 5\\
\text{ Cov (x, y) } = {220\over{10}} - 4 \times 5\\
\text{ Cov (x, y) } = 22 - 20\\
\text{ Cov (x, y) } = 2\\
\text{ SD of “x” } = \sqrt{ (\sum{x^2}/n) - (\bar{x})^2 }\\
\text{ SD of “x” } = \sqrt{ (200/10) - (4)^2 }\\
\text{ SD of “x” } =\sqrt{ 20 - 16 }\\
\text{ SD of “x” } =\sqrt{ 4 }\\
\text{ SD of “x” } = 2\\
\text{ SD of “y” } = \sqrt{ (\sum{y^2}/n) - (\bar{y})^2 }\\
\text{ SD of “y” } = \sqrt{ ({262\over{10}}) - (5)^2 }\\
\text{ SD of “y” } = \sqrt{ 26.2 - 25 }\\
\text{ SD of “y” } = \sqrt{ 1.2 }\\
\text{ SD of “y” } = 1.0954\\
\text{ Pearson correlation coefficient is }\\
r = 2 / (2 \times 1.0954)\\
r = {2\over{2.1908}}\\
r = 0.91
\end{matrix}\)
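The arithmetic in Example 1 can be double-checked with a short script built only from the given sums (a sketch; the variable names are ours):

```python
import math

# Summary values given in Example 1
n, s_xy, s_x2, s_y2, s_x, s_y = 10, 220, 200, 262, 40, 50

cov = s_xy / n - (s_x / n) * (s_y / n)        # 22 - 20 = 2
sd_x = math.sqrt(s_x2 / n - (s_x / n) ** 2)   # sqrt(20 - 16) = 2
sd_y = math.sqrt(s_y2 / n - (s_y / n) ** 2)   # sqrt(26.2 - 25) ~ 1.0954
r = cov / (sd_x * sd_y)
print(round(r, 2))  # 0.91
```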
Example 2: Find Karl Pearson’s Correlation Coefficient by the assumed mean method, given \(N = 8, \sum{dx} = 47, \sum{dy} = 108, \sum{dx^2} = 1475, \sum{dy^2} = 3468, \sum{dxdy} = 2116\).
Solution:
\(\begin{matrix}
r = {{\sum{dxdy} - {\sum{dx}\times\sum{dy}\over{N}}}\over{\sqrt{\sum{dx^2} - {(\sum{dx})^2\over{N}}}\times\sqrt{\sum{dy^2} - {(\sum{dy})^2\over{N}}}}}\\
r = {{2116 - {47\times108\over{8}}}\over{\sqrt{1475 - {47^2\over{8}}}\times\sqrt{3468 - {108^2\over{8}}}}}\\
r = {2116 - 634.5 \over{\sqrt{1475 - 276.125} \times\sqrt{3468 - 1458}}}\\
r = {1481.5 \over{\sqrt{1198.875} \times\sqrt{2010}}}\\
r = {1481.5 \over{34.62\times44.83}}\\
r = {1481.5 \over{1552.0146}}\\
r = 0.955
\end{matrix}\)
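Example 2 can likewise be verified from the given deviation sums (a sketch; at full precision r is about 0.954, and the worked solution's 0.955 reflects rounding the square roots to two decimals):

```python
import math

# Summary values given in Example 2
n = 8
s_dx, s_dy = 47, 108
s_dx2, s_dy2, s_dxdy = 1475, 3468, 2116

num = s_dxdy - s_dx * s_dy / n                                   # 1481.5
den = math.sqrt(s_dx2 - s_dx ** 2 / n) * math.sqrt(s_dy2 - s_dy ** 2 / n)
r = num / den
print(round(r, 2))  # 0.95
```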