Unit 2: Measures of Central Tendency & Dispersion

1. Measures of Central Tendency (Averages)

A measure of central tendency is a single value that describes a dataset by identifying its central position. Such measures are also known as "averages."

Mathematical Averages

1. Arithmetic Mean (AM or Mean)

The sum of all observations divided by the number of observations.

Ungrouped Data: x̄ = (x1 + x2 + ... + xn) / n = (Σx) / n
Grouped Data: x̄ = (Σf * x) / (Σf) = (Σf * x) / N
(where x = midpoint of class, f = frequency, N = total frequency)
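
As a quick check of the two mean formulas, here is a minimal Python sketch. The function names and the data are made up for illustration; they are not part of the syllabus.

```python
def mean_ungrouped(xs):
    """x̄ = (Σx) / n for raw observations."""
    return sum(xs) / len(xs)


def mean_grouped(midpoints, freqs):
    """x̄ = (Σ f*x) / N, where x is the class midpoint and N = Σf."""
    total_f = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / total_f


# Made-up data for illustration only.
marks = [4, 8, 15, 16, 23, 42]
midpoints = [5, 15, 25, 35]   # classes 0-10, 10-20, 20-30, 30-40
freqs = [2, 3, 4, 1]

print(mean_ungrouped(marks))           # 108 / 6 = 18.0
print(mean_grouped(midpoints, freqs))  # 190 / 10 = 19.0
```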

2. Geometric Mean (GM)

The n-th root of the product of n positive observations. Used for averaging ratios, percentages, or growth rates.

Ungrouped Data: GM = (x1 * x2 * ... * xn)^(1/n)
Using Logs: log(GM) = (Σ log(x)) / n => GM = Antilog[ (Σ log(x)) / n ]
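
The direct and log forms of the GM should agree. The sketch below checks this on made-up values. It uses natural logs with exp as the "antilog" (an assumption on my part; base-10 logs from a log table give the same result), and the function names are illustrative.

```python
import math


def geometric_mean_direct(xs):
    """GM = (x1 * x2 * ... * xn)^(1/n); all x must be positive."""
    return math.prod(xs) ** (1 / len(xs))


def geometric_mean_logs(xs):
    """GM = antilog[(Σ log x) / n], here with natural logs and exp."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Made-up growth factors: the product is 64, whose cube root is 4.
growth = [2, 4, 8]
print(geometric_mean_direct(growth))  # approximately 4
print(geometric_mean_logs(growth))    # approximately 4
```

The log form is usually the safer one for long lists, since the running product can overflow or underflow.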

3. Harmonic Mean (HM)

The reciprocal of the arithmetic mean of the reciprocals of the observations. Used for averaging rates and speeds.

Ungrouped Data: HM = n / (Σ (1/x))
Grouped Data: HM = N / (Σ (f/x))
Note: For any set of positive numbers: AM ≥ GM ≥ HM, with equality only when all the numbers are equal.
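
A short sketch of the HM formulas and the ordering of the three means, using a made-up "two legs at different speeds" example. Function names are illustrative only.

```python
import math


def harmonic_mean(xs):
    """HM = n / Σ(1/x); all x must be positive."""
    return len(xs) / sum(1 / x for x in xs)


def harmonic_mean_grouped(values, freqs):
    """HM = N / Σ(f/x), with N = Σf."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Made-up example: equal distances covered at 40 km/h and 60 km/h.
speeds = [40, 60]
am = sum(speeds) / len(speeds)
gm = math.prod(speeds) ** (1 / len(speeds))
hm = harmonic_mean(speeds)

print(am, gm, hm)   # roughly 50, 48.99, 48
print(am >= gm >= hm)
```

The average speed over the whole trip is the HM (about 48 km/h), not the AM of 50 km/h.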

Positional / Partitional Averages

1. Median (Md)

The middle value of a dataset that has been arranged in order (ascending or descending).

Ungrouped Data:
- If n is odd: Median = Value of the ((n+1)/2)-th item.
- If n is even: Median = Average of the (n/2)-th and ((n/2) + 1)-th items.

Grouped Data: Median = L + [ ( (N/2) - cf ) / f ] * h
(L = lower boundary of median class, N = total frequency, cf = cumulative frequency *before* median class, f = frequency of median class, h = class width)
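
The median rules above can be sketched in Python as follows. The class table is made up, equal class widths are assumed, and the function names are illustrative.

```python
def median_ungrouped(xs):
    """Middle item if n is odd, average of the two middle items if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # the ((n+1)/2)-th item, 1-indexed
    return (s[mid - 1] + s[mid]) / 2  # the (n/2)-th and (n/2 + 1)-th items


def median_grouped(lower_bounds, freqs, h):
    """Median = L + [((N/2) - cf) / f] * h, assuming equal class width h."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency before the current class
    for L, f in zip(lower_bounds, freqs):
        if f > 0 and cf + f >= half:  # first class whose cumulative freq reaches N/2
            return L + ((half - cf) / f) * h
        cf += f
    raise ValueError("no median class found")


# Made-up table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]   # N = 40, so N/2 = 20

print(median_ungrouped([3, 1, 2]))               # 2
print(median_ungrouped([4, 1, 3, 2]))            # 2.5
print(median_grouped(lower_bounds, freqs, 10))   # 20 + (7/12)*10, about 25.83
```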

2. Mode (Mo)

The value that appears most frequently in a dataset.

Grouped Data: Mode = L + [ (f1 - f0) / (2*f1 - f0 - f2) ] * h
(L = lower boundary of modal class, f1 = freq of modal class, f0 = freq of pre-modal class, f2 = freq of post-modal class, h = class width)
Empirical Relationship: For a moderately skewed distribution:
Mean - Mode ≈ 3 * (Mean - Median) or Mode ≈ 3*Median - 2*Mean
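
A minimal sketch of the grouped mode formula and the empirical relationship. The table is made up. Treating a missing neighbouring class as frequency 0 (when the modal class is first or last) is an assumption of this sketch, not something the formula itself specifies.

```python
def mode_grouped(lower_bounds, freqs, h):
    """Mode = L + [(f1 - f0) / (2*f1 - f0 - f2)] * h, assuming equal class width h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no pre-modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # assumption: 0 if no post-modal class
    return lower_bounds[i] + ((f1 - f0) / (2 * f1 - f0 - f2)) * h


def empirical_mode(mean, median):
    """Mode ≈ 3*Median - 2*Mean (moderately skewed distributions only)."""
    return 3 * median - 2 * mean


# Made-up table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]

print(mode_grouped(lower_bounds, freqs, 10))  # 20 + (4/6)*10, about 26.67
print(empirical_mode(mean=50, median=52))     # 3*52 - 2*50 = 56
```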

2. Partition Values

Values that divide an ordered dataset into a number of equal parts.

1. Quartiles (Q)

Divide the data into 4 equal parts (Q1, Q2, Q3).

2. Deciles (D)

Divide the data into 10 equal parts (D1, D2, ... D9).

3. Percentiles (P)

Divide the data into 100 equal parts (P1, P2, ... P99).

Note: Q1 = P25, Q2 = D5 = P50 = Median, Q3 = P75.
Formula for Grouped Data (Percentile 'k'):
Pk = L + [ ( (k*N/100) - cf ) / f ] * h
(To find Q1, use k=25. To find Q3, use k=75. To find D4, use k=40, etc.)
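
Since every partition value reduces to a percentile, one function covers quartiles, deciles, and percentiles. The sketch below assumes equal class widths and uses a made-up table; the function name is illustrative.

```python
def percentile_grouped(k, lower_bounds, freqs, h):
    """Pk = L + [((k*N/100) - cf) / f] * h, assuming equal class width h."""
    target = k * sum(freqs) / 100
    cf = 0  # cumulative frequency before the current class
    for L, f in zip(lower_bounds, freqs):
        if f > 0 and cf + f >= target:  # class containing the k-th percentile
            return L + ((target - cf) / f) * h
        cf += f
    raise ValueError("k must be between 0 and 100")


# Made-up table: classes 0-10, 10-20, 20-30, 30-40, 40-50 with N = 40.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]

q1 = percentile_grouped(25, lower_bounds, freqs, 10)  # 10 + (5/8)*10  = 16.25
q3 = percentile_grouped(75, lower_bounds, freqs, 10)  # 30 + (5/10)*10 = 35.0
d4 = percentile_grouped(40, lower_bounds, freqs, 10)  # 20 + (3/12)*10 = 22.5
print(q1, q3, d4)
```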

3. Measures of Dispersion (Variability)

Measures that describe the spread, scatter, or variation of data points in a dataset. Low dispersion means the data are clustered tightly around the center; high dispersion means they are widely scattered.

Absolute Measures of Dispersion

(Expressed in the same units as the data)

1. Range

The simplest measure. The difference between the largest and smallest observation.

Range = Largest Value (L) - Smallest Value (S)

2. Inter-Quartile Range (IQR)

The range of the middle 50% of the data. It is a resistant measure of spread.

IQR = Q3 - Q1

3. Quartile Deviation (QD) or Semi-Interquartile Range

Half of the Inter-Quartile Range.

QD = (Q3 - Q1) / 2
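
The three simplest measures, sketched in Python. The observations and the quartile values are hypothetical numbers chosen for illustration; the quartiles are taken as given rather than computed here.

```python
def data_range(xs):
    """Range = Largest Value - Smallest Value."""
    return max(xs) - min(xs)


def iqr(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1


def quartile_deviation(q1, q3):
    """QD = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


# Hypothetical values for illustration.
scores = [12, 7, 30, 18, 25]
print(data_range(scores))               # 30 - 7 = 23
print(iqr(16.25, 35.0))                 # 18.75
print(quartile_deviation(16.25, 35.0))  # 9.375
```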

4. Mean Deviation (MD)

The arithmetic mean of the absolute deviations of the observations from a measure of central tendency (mean, median, or mode).

MD (from mean): (Σ |x - x̄|) / n
MD (from median): (Σ |x - Md|) / n
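
A small sketch of the mean deviation about the mean and about the median, on made-up data. The helper name is illustrative; `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics


def mean_deviation(xs, center):
    """MD = Σ|x - center| / n, where center is the mean, median, or mode."""
    return sum(abs(x - center) for x in xs) / len(xs)


# Made-up data with one large value, so mean (4) and median (3) differ.
data = [1, 2, 3, 4, 10]

md_mean = mean_deviation(data, statistics.mean(data))      # (3+2+1+0+6)/5 = 2.4
md_median = mean_deviation(data, statistics.median(data))  # (2+1+0+1+7)/5 = 2.2
print(md_mean, md_median)
```

The deviation about the median comes out smaller here, which matches the general property that the mean deviation is least when taken about the median.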

5. Variance and Standard Deviation (SD)

The most important and widely used measures of dispersion.

Variance (σ² or s²): The average of the squared deviations from the mean.
Population Variance: σ² = (Σ (x - μ)²) / N
Sample Variance: s² = (Σ (x - x̄)²) / (n - 1) (Note the 'n-1' for unbiased estimate)
Standard Deviation (σ or s): The square root of the variance.
SD (σ or s) = sqrt(Variance)
Computational Formula: s = sqrt[ ( (Σx²) - ( (Σx)² / n ) ) / (n - 1) ]
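
To see that the definitional and computational formulas agree, here is a minimal sketch on made-up data. The function names are illustrative, and `statistics.stdev` (standard library, n - 1 divisor) is used only as a cross-check.

```python
import math
import statistics


def population_variance(xs):
    """σ² = Σ(x - μ)² / N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """s² = Σ(x - x̄)² / (n - 1)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd_computational(xs):
    """s = sqrt[(Σx² - (Σx)²/n) / (n - 1)]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


# Made-up data: mean = 5, Σ(x - x̄)² = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))                  # 32 / 8 = 4.0
print(math.sqrt(population_variance(data)))       # 2.0
print(sample_variance(data))                      # 32 / 7, about 4.571
print(sample_sd_computational(data))              # about 2.138
print(math.isclose(sample_sd_computational(data), statistics.stdev(data)))
```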

4. Coefficient of Variation (Relative Dispersion)

Absolute measures (like SD) cannot be used to compare the variability of two different datasets if they have different units (e.g., heights vs. weights) or different means.

We use a relative measure, the Coefficient of Variation (CV).

Coefficient of Variation (CV): The ratio of the standard deviation to the mean, usually expressed as a percentage.
CV = (Standard Deviation / Mean) * 100
CV = (s / x̄) * 100
Exam Tip: A common question is: "Team A has a mean score of 80 with SD=5. Team B has a mean score of 50 with SD=4. Which team is more consistent?"
- CV(A) = (5 / 80) * 100 = 6.25%
- CV(B) = (4 / 50) * 100 = 8%
- Answer: Team A is more consistent because its CV is lower.
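
The Team A / Team B comparison can be reproduced with a one-line function. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / Mean) * 100, expressed as a percentage."""
    return sd / mean * 100


cv_a = coefficient_of_variation(sd=5, mean=80)   # 6.25
cv_b = coefficient_of_variation(sd=4, mean=50)   # about 8
more_consistent = "Team A" if cv_a < cv_b else "Team B"
print(cv_a, cv_b, more_consistent)
```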

5. Graphical Representation of Measures

1. Ogives (Cumulative Frequency Curves)

An ogive is a graph of a cumulative frequency distribution. It is used to graphically locate partition values (Median, Quartiles, etc.).

Finding the Median: The Median is the X-coordinate of the intersection point of the "Less Than" and "More Than" ogives.
Alternatively, on a "Less Than" ogive, find the N/2 value on the Y-axis, draw a horizontal line to the curve, and then a vertical line down to the X-axis. This X-value is the Median.

2. Box Plot (Box-and-Whisker Plot)

A graphical summary of a distribution based on five numbers: Minimum, Q1, Median (Q2), Q3, and Maximum.

A box plot clearly shows the center (Median), spread (IQR/box length), and skewness (position of median in the box) of the data.
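
The five numbers behind a box plot can be computed as in the sketch below. Textbooks and software differ on how quartiles of raw data are interpolated; this sketch assumes the "inclusive" method of the standard library's `statistics.quantiles`, so results may differ slightly from a hand method. The data are made up.

```python
import statistics


def five_number_summary(xs):
    """Return (Minimum, Q1, Median, Q3, Maximum) for raw data."""
    # Assumption: 'inclusive' interpolation; other conventions give other quartiles.
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, q2, q3, max(xs)


# Made-up data: the integers 1 to 9 in shuffled order.
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]
minimum, q1, median, q3, maximum = five_number_summary(data)

print(minimum, q1, median, q3, maximum)  # 1 3.0 5.0 7.0 9
print("IQR (box length):", q3 - q1)      # 4.0
```

Here the median sits in the middle of the box, which is what a symmetric distribution looks like on a box plot.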


6. Moments

Moments are a set of statistical parameters used to describe the characteristics (shape, center, spread) of a distribution.

1. Raw Moments (μ'r) - Moments about Origin (Zero)

The r-th raw moment is the arithmetic mean of the r-th power of the observations.

μ'r = (Σ x^r) / n (Ungrouped)
μ'r = (Σ f * x^r) / N (Grouped)

2. Central Moments (μr) - Moments about the Mean

The r-th central moment is the arithmetic mean of the r-th power of the deviations from the mean.

μr = (Σ (x - x̄)^r) / n (Ungrouped)
μr = (Σ f * (x - x̄)^r) / N (Grouped)
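
A minimal sketch of raw and central moments for ungrouped data, on made-up values. The function names are illustrative. The last line checks the standard identity μ2 = μ'2 - (μ'1)², i.e. variance equals the mean of squares minus the square of the mean.

```python
def raw_moment(xs, r):
    """r-th raw moment: mean of x to the power r."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th central moment: mean of (x - x̄) to the power r."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** r for x in xs) / len(xs)


# Made-up data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(raw_moment(data, 1))      # 5.0  (the mean)
print(raw_moment(data, 2))      # 232 / 8 = 29.0
print(central_moment(data, 1))  # 0.0  (always zero)
print(central_moment(data, 2))  # 32 / 8 = 4.0  (population variance)
print(central_moment(data, 3))  # 42 / 8 = 5.25
print(central_moment(data, 4))  # 356 / 8 = 44.5
print(central_moment(data, 2) == raw_moment(data, 2) - raw_moment(data, 1) ** 2)
```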

7. Sheppard's Corrections for Moments

When calculating moments from grouped data (a continuous frequency distribution), we assume all values in a class are at the midpoint. This introduces a "grouping error."

Sheppard's corrections adjust the calculated central moments (μr) to give a more accurate estimate, assuming the distribution is continuous and tapers off to zero at both ends.

Let 'h' be the uniform class width.

Corrected μ1 = μ1 = 0 (No change)
Corrected μ2 = μ2 - (h² / 12)
Corrected μ3 = μ3 (No change)
Corrected μ4 = μ4 - (h² / 2) * μ2 + (7 * h⁴ / 240)
Exam Tip: You usually only need to remember the correction for the second moment (variance). Corrected Variance = Calculated Variance - (h²/12).
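
The corrections are simple arithmetic, sketched below. The moment values and class width are hypothetical numbers picked only to show the calculation; the function name is illustrative.

```python
def sheppard_corrected(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections to central moments from grouped data.

    h is the uniform class width. Returns (corrected μ2, μ3, μ4).
    """
    c2 = mu2 - h ** 2 / 12
    c3 = mu3                                           # third moment is unchanged
    c4 = mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240
    return c2, c3, c4


# Hypothetical grouped-data moments with class width h = 10.
c2, c3, c4 = sheppard_corrected(mu2=120, mu3=50, mu4=40000, h=10)
print(c2)  # 120 - 100/12, about 111.67
print(c3)  # 50
print(c4)  # 40000 - 6000 + 70000/240, about 34291.67
```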

8. Measures of Skewness and Kurtosis

Skewness (Shape)

Measures the asymmetry or lack of symmetry of a distribution.

Measures of Skewness:

  1. Karl Pearson's Coefficient (Skp):
    Skp = (Mean - Mode) / Standard Deviation
    (Approximate) Skp = 3 * (Mean - Median) / Standard Deviation
  2. Bowley's Coefficient (Skb): (Based on quartiles)
    Skb = (Q3 + Q1 - 2*Median) / (Q3 - Q1)
  3. Moment-based Coefficient (β1 and γ1):
    β1 = (μ3)² / (μ2)³
    γ1 = ±sqrt(β1) = μ3 / (μ2)^(1.5), taking the sign of μ3
    (If γ1 > 0, positive skew. If γ1 < 0, negative skew. If γ1 = 0, symmetrical)
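
The skewness coefficients listed above, sketched as plain functions. All input values are hypothetical and the function names are illustrative. The moment-based version computes γ1 directly from μ3 so that its sign is preserved.

```python
def pearson_skew(mean, mode, sd):
    """Skp = (Mean - Mode) / SD."""
    return (mean - mode) / sd


def pearson_skew_median(mean, median, sd):
    """Approximate Skp = 3 * (Mean - Median) / SD."""
    return 3 * (mean - median) / sd


def bowley_skew(q1, median, q3):
    """Skb = (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def moment_skew(mu2, mu3):
    """Return (β1, γ1): β1 = μ3² / μ2³ and γ1 = μ3 / μ2^1.5."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5   # keeps the sign of μ3
    return beta1, gamma1


# Hypothetical summary values for illustration.
print(pearson_skew(mean=50, mode=44, sd=12))           # 6/12 = 0.5
print(pearson_skew_median(mean=50, median=48, sd=12))  # 3*2/12 = 0.5
print(bowley_skew(q1=20, median=30, q3=50))            # 10/30, about 0.333
beta1, gamma1 = moment_skew(mu2=4, mu3=5.25)
print(beta1, gamma1)                                   # about 0.4307 and 0.65625
```

All four values are positive, so each coefficient points to a positive (right) skew for these inputs.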

Kurtosis (Peakedness)

Measures the peakedness or flatness of a distribution compared to the standard Normal distribution.

Measures of Kurtosis (β2 and γ2):

β2 = μ4 / (μ2)²
γ2 = β2 - 3 (This is "excess kurtosis")
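
A short sketch of the kurtosis measures from the second and fourth central moments. The moment values are hypothetical and the function name is illustrative. The Normal distribution has β2 = 3, which is why 3 is subtracted to get the excess.

```python
def moment_kurtosis(mu2, mu4):
    """Return (β2, γ2): β2 = μ4 / μ2² and γ2 = β2 - 3 (excess kurtosis)."""
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3
    return beta2, gamma2


# Hypothetical central moments for illustration.
beta2, gamma2 = moment_kurtosis(mu2=4, mu4=44.5)
print(beta2)   # 44.5 / 16 = 2.78125
print(gamma2)  # -0.21875, slightly flatter than the Normal curve
```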