Unit 3: Measures of Dispersion, Skewness & Kurtosis

1. Concept of Dispersion & Ideal Measure
2. Absolute Measures of Dispersion
3. Relative Measures of Dispersion
4. Moments (Raw and Central)
5. Measures of Skewness and Kurtosis

1. Concept of Dispersion & Ideal Measure

Concept of Dispersion

Measures of central tendency (like the mean) tell us the "center" of the data, but they don't tell us about its spread or variability.

Example:
Set A: {49, 50, 51} => Mean = 50
Set B: {1, 50, 99} => Mean = 50
Both sets have the same mean, but Set B is much more "dispersed" (spread out) than Set A.
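
The comparison above can be checked with a few lines of Python. This is only a minimal sketch; the names set_a, set_b and summarize are illustrative and not part of the original notes.

```python
from statistics import mean

# The two example sets from the notes.
set_a = [49, 50, 51]
set_b = [1, 50, 99]

def summarize(data):
    """Return (mean, spread), where spread is largest minus smallest value."""
    return mean(data), max(data) - min(data)

print("Set A:", summarize(set_a))  # same mean as Set B, small spread
print("Set B:", summarize(set_b))  # same mean as Set A, large spread
```

The mean alone cannot tell the two sets apart; the second number in each pair can.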

Dispersion: The measure of the variation of the items, or the degree to which data points are scattered around the central value.

Criteria for an Ideal Measure of Dispersion

An ideal measure of dispersion should meet the same criteria as an ideal average:

Rigidly Defined
Easy to Understand and Calculate
Based on All Observations
Suitable for Further Algebraic Treatment
Not Unduly Affected by Extreme Values

2. Absolute Measures of Dispersion

These measures are expressed in the units of the original data (e.g., cm, kg, $). The one exception is the variance, which is in squared units (e.g., cm²).

1. Range

The simplest measure: the difference between the largest and smallest observations.

Range = Largest Value (L) - Smallest Value (S)

Pros: Easy to calculate.
Cons: Based on only two values, highly affected by outliers.
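
As a rough illustration of the outlier problem, the sketch below computes the range before and after one extreme value is added. The scores are hypothetical numbers made up for this example, and value_range is an illustrative helper name.

```python
def value_range(data):
    """Range = largest value (L) minus smallest value (S)."""
    return max(data) - min(data)

# Hypothetical scores, invented for illustration only.
scores = [10, 11, 12, 12, 13]
scores_with_outlier = scores + [100]

print(value_range(scores))               # 3
print(value_range(scores_with_outlier))  # 90
```

A single unusual observation changes the range from 3 to 90, even though the other five values are untouched.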

2. Quartile Deviation (QD)

Also known as the Semi-Interquartile Range. It measures the spread of the middle 50% of the data.

QD = (Q3 - Q1) / 2
(where Q1 = 1st Quartile, Q3 = 3rd Quartile)

Pros: Not affected by extreme values, because it depends only on Q1 and Q3.
Cons: Ignores 50% of the data (the lowest 25% and the highest 25%).
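
A small sketch of the QD calculation follows. Textbooks and software use several slightly different rules for locating quartiles; this example assumes the "inclusive" rule of Python's statistics.quantiles, so hand calculations with another rule may give slightly different Q1 and Q3. The data and the helper name quartile_deviation are illustrative.

```python
from statistics import quantiles

def quartile_deviation(data):
    """QD = (Q3 - Q1) / 2, using the 'inclusive' quartile rule (an assumption)."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical data; the second list replaces 18 with an extreme value.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]
data_with_outlier = [2, 4, 6, 8, 10, 12, 14, 16, 180]

print(quartile_deviation(data))               # 4.0
print(quartile_deviation(data_with_outlier))  # 4.0
```

Under this rule Q1 = 6 and Q3 = 14 in both lists, so the extreme value 180 leaves the QD unchanged.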

3. Mean Deviation (MD)

The arithmetic mean of the absolute deviations of the observations from a measure of central tendency (usually the median or mean).

MD (from mean) = (Σ |x - x̄|) / n
MD (from median) = (Σ |x - Median|) / n

Pros: Uses all data points.
Cons: Taking absolute values discards the signs of the deviations, which makes MD awkward for further algebraic treatment.
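
The two MD formulas can be sketched as one function that takes the chosen center as an argument. The data set and the name mean_deviation are illustrative assumptions, not from the notes.

```python
from statistics import mean, median

def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations of data from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

# Hypothetical data with mean 4 and median 3.
data = [1, 2, 3, 4, 10]

md_mean = mean_deviation(data, mean(data))      # deviations taken from the mean
md_median = mean_deviation(data, median(data))  # deviations taken from the median
print(md_mean, md_median)
```

Here the MD about the median comes out smaller than the MD about the mean. That is a general property: the sum of absolute deviations is smallest when measured from the median.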

4. Variance and Standard Deviation (SD)

The most important and widely used measures of dispersion. The Standard Deviation is the "gold standard" of dispersion measures.

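Variance (s² or σ²): The average of the squared deviations from the mean. For a sample, the sum is divided by n - 1 instead of n so that s² is an unbiased estimate of σ².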

Sample Variance: s² = (Σ (x - x̄)²) / (n - 1)
Population Variance: σ² = (Σ (x - μ)²) / N

Standard Deviation (s or σ): The positive square root of the variance.

SD (s) = sqrt(Variance)

Pros: Uses all data, mathematically sound (by squaring, it fixes the sign problem of MD), basis for many other statistical methods.
Cons: Affected by outliers (due to squaring).
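
The sample and population formulas differ only in the divisor, which a short sketch makes visible. The function names and the data are illustrative; the last lines compare against Python's standard statistics module as a cross-check.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

def population_variance(data):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

# Hypothetical data with mean 5 and sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # 2.0 (population SD)
print(sample_variance(data))                 # 32 / 7, slightly larger

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 always gives a slightly larger value than dividing by N, and the gap shrinks as the number of observations grows.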

3. Relative Measures of Dispersion

Absolute measures (like SD) cannot be used to compare the variability of two different datasets if they have different units (e.g., heights in cm vs. weights in kg) or very different means.

We use relative measures, which are unit-free ratios or percentages.

1. Coefficient of Dispersion (General)

This is a general term for any relative measure of dispersion. Each absolute measure has a corresponding relative measure.

Coefficient of Range = (L - S) / (L + S)
Coefficient of Quartile Deviation = (Q3 - Q1) / (Q3 + Q1)
Coefficient of Mean Deviation = MD / (Mean or Median)
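
To see what "unit-free" means in practice, the sketch below computes two of these coefficients for the same hypothetical heights recorded in centimetres and in metres. The quartiles use the "inclusive" rule of statistics.quantiles, which is an assumption; other quartile rules may give slightly different values. The formulas also assume positive data, so the denominators are not zero.

```python
from statistics import quantiles

def coefficient_of_range(data):
    """(L - S) / (L + S)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)

def coefficient_of_quartile_deviation(data):
    """(Q3 - Q1) / (Q3 + Q1), with 'inclusive' quartiles (an assumption)."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)

# The same hypothetical heights in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [x / 100 for x in heights_cm]

print(coefficient_of_range(heights_cm), coefficient_of_range(heights_m))
print(coefficient_of_quartile_deviation(heights_cm),
      coefficient_of_quartile_deviation(heights_m))
```

Each pair of printed values agrees (up to floating-point rounding), whereas the plain range would be 40 in one unit and 0.4 in the other.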

2. Coefficient of Variation (CV)

This is the most important and widely used relative measure. It corresponds to the Standard Deviation.

Coefficient of Variation (CV): The ratio of the standard deviation to the mean, expressed as a percentage.

CV = (Standard Deviation / Mean) * 100
CV = (s / x̄) * 100

A lower CV means the data is more consistent, more stable, or less variable.
A higher CV means the data is less consistent, less stable, or more variable.

Exam Tip: A common question is: "Team A has a mean score of 80 with SD=5. Team B has a mean score of 50 with SD=4. Which team is more consistent?"
- CV(A) = (5 / 80) * 100 = 6.25%
- CV(B) = (4 / 50) * 100 = 8%
- Answer: Team A is more consistent because its CV is lower.
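
The same comparison can be written as a short sketch. The function name coefficient_of_variation is illustrative, and the code assumes the mean is not zero.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / mean) * 100, as a percentage. Assumes mean is not zero."""
    return sd / mean * 100

# Figures from the exam tip above.
cv_a = coefficient_of_variation(5, 80)
cv_b = coefficient_of_variation(4, 50)

# The team with the lower CV is the more consistent one.
more_consistent = "Team A" if cv_a < cv_b else "Team B"
print(cv_a, cv_b, more_consistent)
```

Note that Team B has the smaller SD (4 versus 5) but the larger CV, which is why the comparison has to be made on the relative measure.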

4. Moments (Raw and Central)

Moments are a set of statistical parameters used to describe the characteristics (shape, center, spread) of a distribution.

1. Raw Moments (μ'_r) - Moments about Origin (Zero)

The r-th raw moment is the arithmetic mean of the r-th power of the observations.

μ'_r = (Σ x^r) / n (Ungrouped, n = number of observations)
μ'_r = (Σ f * x^r) / N (Grouped, N = Σ f = total frequency)

The first raw moment (r=1) is the Arithmetic Mean: μ'₁ = x̄
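
Both raw-moment formulas can be sketched with one function, treating ungrouped data as the special case where every frequency is 1. The name raw_moment and the sample values are illustrative assumptions.

```python
def raw_moment(values, r, freqs=None):
    """r-th moment about the origin; freqs are optional frequencies f."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: each value counted once
    total = sum(freqs)             # n (ungrouped) or N = sum of f (grouped)
    return sum(f * x ** r for x, f in zip(values, freqs)) / total

# Ungrouped, hypothetical data.
data = [1, 2, 3, 4]
print(raw_moment(data, 1))  # 2.5, the arithmetic mean
print(raw_moment(data, 2))  # 7.5

# Grouped, hypothetical data: values x with frequencies f (N = 10).
x = [1, 2, 3]
f = [2, 3, 5]
print(raw_moment(x, 1, f), raw_moment(x, 2, f))
```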

2. Central Moments (μ_r) - Moments about the Mean

The r-th central moment is the arithmetic mean of the r-th power of the deviations from the mean.

μ_r = (Σ (x - x̄)^r) / n (Ungrouped, n = number of observations)
μ_r = (Σ f * (x - x̄)^r) / N (Grouped, N = Σ f = total frequency)

The first central moment (r=1) is always zero: μ₁ = 0
The second central moment (r=2) is the Variance in its population form (divisor n, not n - 1): μ₂ = σ²
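
The two properties above can be verified numerically. This is a minimal sketch for ungrouped data; the helper names and the data are illustrative. The final check uses the standard identity μ₂ = μ'₂ - (μ'₁)², which is not stated in these notes and is included only as a cross-check.

```python
def raw_moment(data, r):
    """r-th moment about the origin (ungrouped data)."""
    return sum(x ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th moment about the mean (ungrouped data, divisor n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(central_moment(data, 1))  # 0.0
print(central_moment(data, 2))  # 4.0, the variance with divisor n

# Cross-check: second central moment from the first two raw moments.
print(raw_moment(data, 2) - raw_moment(data, 1) ** 2)  # 4.0
```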

5. Measures of Skewness and Kurtosis

These measures describe the shape of the distribution.

Skewness (Asymmetry)

Measures the asymmetry or lack of symmetry of a distribution.

Symmetrical Distribution: The shape is identical on both sides of the center (e.g., the "bell" curve).
- Mean = Median = Mode (for a single-peaked distribution)
- Skewness = 0
Positively Skewed (Skewed to the Right): The "tail" is longer on the right.
- Typically Mean > Median > Mode
- Skewness > 0
Negatively Skewed (Skewed to the Left): The "tail" is longer on the left.
- Typically Mean < Median < Mode
- Skewness < 0

Moment-based Coefficient (β₁):

β₁ = (μ₃)² / (μ₂)³

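(If β₁ = 0, symmetrical. If β₁ > 0, skewed. Because β₁ is a ratio of a square to a cube of the variance, it is never negative, so the direction of the skew is read from the sign of μ₃: positive for right skew, negative for left skew.)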
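
A brief sketch of the β₁ calculation is given below, using central moments with divisor n. The helper names, the tolerance argument and the three small data sets are assumptions made for illustration; the second data set is a mirror image of the first.

```python
def central_moment(data, r):
    """r-th moment about the mean (ungrouped data, divisor n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

def beta_1(data):
    """Moment coefficient of skewness: mu_3 squared over mu_2 cubed."""
    return central_moment(data, 3) ** 2 / central_moment(data, 2) ** 3

def skew_direction(data, tol=1e-12):
    """Direction of skew from the sign of mu_3 (tol guards float noise)."""
    mu3 = central_moment(data, 3)
    if abs(mu3) <= tol:
        return "symmetrical"
    return "positively skewed" if mu3 > 0 else "negatively skewed"

right_tailed = [1, 1, 1, 2, 5]   # long tail on the right
left_tailed = [3, 3, 3, 2, -1]   # mirror image of right_tailed
symmetric = [1, 2, 3, 4, 5]

for d in (right_tailed, left_tailed, symmetric):
    print(beta_1(d), skew_direction(d))
```

The two mirrored data sets produce the same β₁, which is why the sign of μ₃ is needed to tell right skew from left skew.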

Kurtosis (Peakedness)

Measures the peakedness or flatness of a distribution compared to the Normal distribution.

Leptokurtic: More peaked, sharper peak, and heavier/fatter tails.
Mesokurtic: The "normal" bell shape (like the Normal Distribution).
Platykurtic: Flatter, more rounded peak, and lighter/thinner tails.

Moment-based Coefficient (β₂):

β₂ = μ₄ / (μ₂)²

If β₂ = 3, it is Mesokurtic (Normal).
If β₂ > 3, it is Leptokurtic (Leap/Peaked).
If β₂ < 3, it is Platykurtic (Flat/Plateau).
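
The classification rule above can be sketched as follows. The helper names, the tolerance used for the "equal to 3" case, and the two data sets are illustrative assumptions; with real data β₂ will rarely equal 3 exactly, so some tolerance is a practical choice rather than part of the definition.

```python
def central_moment(data, r):
    """r-th moment about the mean (ungrouped data, divisor n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

def beta_2(data):
    """Moment coefficient of kurtosis: mu_4 over mu_2 squared."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def classify_kurtosis(data, tol=1e-9):
    """Apply the beta_2 rule; tol is an assumed tolerance around 3."""
    b2 = beta_2(data)
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                      # evenly spread values
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]  # most values at the center, two far out

print(beta_2(flat), classify_kurtosis(flat))
print(beta_2(heavy_tailed), classify_kurtosis(heavy_tailed))
```

The evenly spread data gives β₂ below 3 (platykurtic), while the data concentrated at the center with two distant values gives β₂ above 3 (leptokurtic).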