Unit 3: Bivariate Data and Correlation

1. Bivariate Data and Scatter Diagram

Bivariate Data

Data that involves two different variables, where we are interested in the relationship between them. Each observation consists of a pair of values (x, y).

Scatter Diagram (or Scatter Plot)

The simplest way to visualize bivariate data. It's a graph where each (x, y) pair is plotted as a single point on a 2D plane.

The pattern of the points helps us identify the type (linear or non-linear), the direction (positive or negative), and the strength of the relationship.


2. Karl Pearson's Coefficient of Correlation (r)

Also known as the "product-moment correlation coefficient." It is a numerical measure of the strength and direction of the linear relationship between two quantitative variables.

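Properties of 'r':
- r always lies between -1 and +1 (-1 ≤ r ≤ +1).
- r > 0 indicates a positive linear relationship, r < 0 a negative one, and r = 0 no linear relationship.
- The closer |r| is to 1, the stronger the linear relationship.
- r is a pure number (it has no units).
- r is symmetric: the correlation between x and y equals the correlation between y and x.
- r is independent of change of origin and scale.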

Formulas for 'r':

1. Covariance Method

r = Cov(x, y) / (σx * σy)

Where:
- Cov(x, y) = ( Σ[(x - x̄)(y - ȳ)] ) / n (Covariance of x and y)
- σx = sqrt( (Σ(x - x̄)²) / n ) (Standard deviation of x)
- σy = sqrt( (Σ(y - ȳ)²) / n ) (Standard deviation of y)
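The covariance method can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `pearson_r_covariance` and the sample data are assumptions made up for the example, and all divisors are n, as in the definitions above.

```python
from math import sqrt

def pearson_r_covariance(xs, ys):
    """Pearson's r via r = Cov(x, y) / (σx * σy), using divisor n throughout."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance: mean of the products of deviations from the means
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # Standard deviations of x and y
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical sample data (5 pairs)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_cov = pearson_r_covariance(xs, ys)
print(round(r_cov, 4))  # 0.7746
```

Here Σ(x - x̄)(y - ȳ) = 6, Σ(x - x̄)² = 10 and Σ(y - ȳ)² = 6, so r = 6 / sqrt(60) ≈ 0.7746. The function fails with a division by zero if either variable is constant, because its standard deviation is then 0.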

2. Raw Data (Computational) Formula

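This is usually the most practical formula for hand calculation, because it works directly with the raw values and avoids computing deviations from the means.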

r = [ n(Σxy) - (Σx)(Σy) ] / sqrt[ [n(Σx²) - (Σx)²] * [n(Σy²) - (Σy)²] ]

Exam Tip: To use this formula, create a table with 5 columns: x, y, x², y², xy. Then, find the sum (Σ) of each column and plug the values into the formula along with 'n' (the number of pairs).
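The five-column table can be mirrored in a few lines of Python. This is only a sketch with made-up sample data; the variable names (`table`, `sum_xy`, `r_raw`, and so on) are illustrative assumptions.

```python
from math import sqrt

# Hypothetical sample data (5 pairs)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# One row per pair, with the five columns: x, y, x², y², xy
table = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
n = len(table)

# Column totals: Σx, Σy, Σx², Σy², Σxy
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*table))

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_raw = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 15 20 55 86 66
print(round(r_raw, 4))                       # 0.7746
```

With these totals the numerator is 5(66) - (15)(20) = 30 and the denominator is sqrt(50 * 30) = sqrt(1500), giving r ≈ 0.7746.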

3. Spearman's Rank Correlation Coefficient (ρ or R)

This coefficient measures the strength and direction of the monotonic relationship (a relationship that consistently increases or decreases, but not necessarily in a straight line) between two variables.

It is used when:

  1. The data is ordinal (ranked), like "best," "second best."
  2. The quantitative data has significant outliers, which would distort Pearson's 'r'.

It is simply Pearson's 'r' calculated on the *ranks* of the data, not the values themselves.

Formula (when ranks are not tied):

R = 1 - [ ( 6 * Σd² ) / ( n * (n² - 1) ) ]

Where:
- d = Difference between the ranks of a pair: Rx - Ry
- n = Number of pairs of observations
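As a rough sketch of the untied case, the Python below ranks each variable and applies the formula. The helper names `ranks_no_ties` and `spearman_no_ties` and the sample data are assumptions for illustration. Ranks are assigned with the smallest value as rank 1; ranking from largest to smallest gives the same R, provided both variables are ranked the same way.

```python
def ranks_no_ties(values):
    """Rank values from smallest (rank 1) to largest; rejects tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values present; the untied formula does not apply")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """R = 1 - 6 Σd² / (n (n² - 1)), valid when there are no tied ranks."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two pairs and equal-length inputs")
    rx = ranks_no_ties(xs)
    ry = ranks_no_ties(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # Σd² with d = Rx - Ry
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical sample data (5 pairs, no ties)
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 20, 45, 40]
print(ranks_no_ties(ys))                    # [1, 3, 2, 5, 4]
print(round(spearman_no_ties(xs, ys), 4))   # 0.8
```

Here the rank differences are 0, -1, 1, -1, 1, so Σd² = 4 and R = 1 - 24/120 = 0.8.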

Formula (when ranks are tied):

If two or more items have the same value, each is assigned the average of the ranks they would otherwise occupy (e.g., if 3 items are tied for 5th place, they would occupy ranks 5, 6, and 7, so each gets rank (5+6+7)/3 = 6).

When ties occur, a Correction Factor (CF) must be added to Σd².

CF = Σ [ m * (m² - 1) / 12 ]
- 'm' is the number of times an item is repeated (tied). You sum this for *all* tied groups in *both* x and y.

Corrected Formula:
R = 1 - [ ( 6 * (Σd² + CF) ) / ( n * (n² - 1) ) ]
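The tied-rank procedure might be sketched in Python as below. The function names (`average_ranks`, `correction_factor`, `spearman_with_ties`) and the sample data are assumptions for illustration, and the code simply applies the corrected formula as stated above. When many values are tied, its result can differ from Pearson's 'r' computed directly on the average ranks.

```python
from collections import Counter

def average_ranks(values):
    """Rank from smallest (rank 1); tied values share the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The tied group occupies ranks i+1 .. j, whose average is (i + 1 + j) / 2
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

def correction_factor(values):
    """Sum of m(m² - 1)/12 over every tied group (m > 1) in one variable."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    """R = 1 - 6 (Σd² + CF) / (n (n² - 1)), with CF summed over both x and y."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two pairs and equal-length inputs")
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = correction_factor(xs) + correction_factor(ys)
    return 1 - (6 * (sum_d2 + cf)) / (n * (n ** 2 - 1))

# Hypothetical sample data: 20 appears twice in x, 8 appears three times in y
xs = [10, 20, 20, 30, 40]
ys = [5, 8, 8, 8, 12]
print(average_ranks(xs))                      # [1.0, 2.5, 2.5, 4.0, 5.0]
print(average_ranks(ys))                      # [1.0, 3.0, 3.0, 3.0, 5.0]
print(round(spearman_with_ties(xs, ys), 4))   # 0.8
```

In this example Σd² = 1.5 and CF = 2(2² - 1)/12 + 3(3² - 1)/12 = 0.5 + 2 = 2.5, so R = 1 - 6(4)/120 = 0.8.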