Unit 3: Measures of Dispersion, Skewness & Kurtosis
        
        
            1. Concept of Dispersion & Ideal Measure
            
            Concept of Dispersion
            Measures of central tendency (like the mean) tell us the "center" of the data, but they don't tell us about its spread or variability.
            Example:
            
            Set A: {49, 50, 51}  => Mean = 50
            
            Set B: {1, 50, 99}  => Mean = 50
            
            Both sets have the same mean, but Set B is much more "dispersed" (spread out) than Set A.
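            As a quick check of the example above, the following minimal Python sketch (standard library only; the variable names are illustrative) confirms that the two sets share a mean while their deviations from it differ sharply.

```python
from statistics import mean

set_a = [49, 50, 51]
set_b = [1, 50, 99]

for name, data in (("A", set_a), ("B", set_b)):
    centre = mean(data)
    # Signed distance of each observation from the mean
    deviations = [x - centre for x in data]
    print(f"Set {name}: mean = {centre}, deviations = {deviations}")
```

            Set A's deviations are -1, 0, 1, while Set B's are -49, 0, 49: the same centre, very different scatter.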
            
            
                Dispersion: The measure of the variation of the items, or the degree to which data points are scattered around the central value.
            
            
            Criteria for an Ideal Measure of Dispersion
            An ideal measure of dispersion should satisfy the same criteria as an ideal average:
            
                - Rigidly Defined
                - Easy to Understand and Calculate
                - Based on All Observations
                - Suitable for Further Algebraic Treatment
                - Not Unduly Affected by Extreme Values
        
            2. Absolute Measures of Dispersion
            These measures are expressed in the same units as the original data (e.g., cm, kg, $).
            1. Range
            The simplest measure: the difference between the largest and smallest observations.
            Range = Largest Value (L) - Smallest Value (S)
            
                - Pros: Easy to calculate.
                - Cons: Based on only the two extreme values, so it is highly sensitive to outliers and says nothing about how the remaining observations are spread.
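            The sensitivity of the range to a single extreme value can be sketched as follows. The helper name value_range and the scores are made up for illustration.

```python
def value_range(data):
    """Range = largest value (L) - smallest value (S)."""
    return max(data) - min(data)

scores = [10, 12, 11, 13, 12]
with_outlier = scores + [100]  # one extreme observation added

print(value_range(scores))        # 13 - 10 = 3
print(value_range(with_outlier))  # 100 - 10 = 90
```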
            2. Quartile Deviation (QD)
            Also known as the Semi-Interquartile Range. It measures the spread of the middle 50% of the data.
            
                QD = (Q3 - Q1) / 2
                
                (where Q1 = 1st Quartile, Q3 = 3rd Quartile)
            
            
                - Pros: Not affected by extreme values, because it uses only the middle 50% of the data.
                - Cons: Ignores the lowest 25% and the highest 25% of the data.
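            A minimal sketch of the QD calculation using Python's statistics.quantiles. Quartile conventions differ between textbooks and software; this sketch assumes the module's default "exclusive" method, which places Q1 at the (N + 1)/4-th ordered item. The data are illustrative.

```python
from statistics import quantiles

def quartile_deviation(data):
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    # The default "exclusive" method puts Q1 at the (N + 1)/4-th ordered
    # item for N observations, interpolating between neighbours if needed.
    q1, _, q3 = quantiles(data, n=4)
    return (q3 - q1) / 2

data = [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(quartile_deviation(data))  # Q1 = 5, Q3 = 15, so QD = 5.0
```

            A different quartile convention (for example method="inclusive") can give a slightly different QD for the same data.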
            3. Mean Deviation (MD)
            The arithmetic mean of the absolute deviations of the observations from a measure of central tendency (usually the median or mean).
            
                MD (from mean) = (Σ |x - x̄|) / n
                
                MD (from median) = (Σ |x - Median|) / n
            
            
                - Pros: Uses all data points.
                - Cons: Takes absolute values, which discards the signs of the deviations and makes it unsuitable for further algebraic treatment.
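            Both MD formulas can be sketched with one small helper. The function name mean_deviation and the data are illustrative assumptions.

```python
from statistics import mean, median

def mean_deviation(data, centre):
    """Average absolute deviation of the data from a chosen central value."""
    return sum(abs(x - centre) for x in data) / len(data)

data = [1, 2, 3, 4, 10]
print(mean_deviation(data, mean(data)))    # about the mean (4): 12 / 5 = 2.4
print(mean_deviation(data, median(data)))  # about the median (3): 11 / 5 = 2.2
```

            The value about the median is the smaller of the two here, which is why the median is often preferred as the reference point for MD.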
            4. Variance and Standard Deviation (SD)
            The most important and widely used measures of dispersion. The Standard Deviation is the "gold standard" of dispersion measures.
            
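                Variance (s² for a sample, σ² for a population): The average of the squared deviations from the mean. The sample formula divides by n - 1 rather than n so that s² is an unbiased estimate of σ².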
            
            
                Sample Variance: s² = (Σ (x - x̄)²) / (n - 1)
                
                Population Variance: σ² = (Σ (x - μ)²) / N
            
            
                Standard Deviation (s or σ): The positive square root of the variance.
            
            
                SD (s) = sqrt(Variance)
            
            
                - Pros: Uses all data, mathematically sound (by squaring, it fixes the sign problem of MD), basis for many other statistical methods.
                - Cons: Affected by outliers (due to squaring).
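            The sample and population formulas can be compared with Python's statistics module, where variance and stdev divide by n - 1 and pvariance and pstdev divide by N. This sketch reuses Set A and Set B from Section 1.

```python
from statistics import variance, pvariance, stdev, pstdev

set_a = [49, 50, 51]
set_b = [1, 50, 99]

for name, data in (("A", set_a), ("B", set_b)):
    print(f"Set {name}")
    print("  sample variance s^2 (divide by n - 1):", variance(data))
    print("  population variance sigma^2 (divide by N):", pvariance(data))
    print("  sample SD s:", stdev(data))
    print("  population SD sigma:", pstdev(data))
```

            For these sets the sample SD is 1 for Set A and 49 for Set B, which puts a number on the difference in spread that the equal means hide.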
        
            3. Relative Measures of Dispersion
            Absolute measures (like SD) cannot be used to compare the variability of two different datasets if they have different units (e.g., heights in cm vs. weights in kg) or very different means.
            We use relative measures, which are unit-free ratios or percentages.
            
            1. Coefficient of Dispersion (General)
            This is a general term for any relative measure of dispersion. Each absolute measure has a corresponding relative measure.
            
                - Coefficient of Range = (L - S) / (L + S)
                - Coefficient of Quartile Deviation = (Q3 - Q1) / (Q3 + Q1)
                - Coefficient of Mean Deviation = MD / (Mean or Median)
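            The three coefficients above can be sketched for one small dataset. The data are illustrative, the quartiles assume the default "exclusive" method of statistics.quantiles, and the mean deviation here is taken about the median.

```python
from statistics import median, quantiles

data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# Coefficient of Range = (L - S) / (L + S)
largest, smallest = max(data), min(data)
coeff_range = (largest - smallest) / (largest + smallest)

# Coefficient of Quartile Deviation = (Q3 - Q1) / (Q3 + Q1)
q1, _, q3 = quantiles(data, n=4)  # default "exclusive" quartile convention
coeff_qd = (q3 - q1) / (q3 + q1)

# Coefficient of Mean Deviation = MD / Median (MD taken about the median)
med = median(data)
md = sum(abs(x - med) for x in data) / len(data)
coeff_md = md / med

print(coeff_range)  # (18 - 2) / (18 + 2) = 0.8
print(coeff_qd)     # (15 - 5) / (15 + 5) = 0.5
print(round(coeff_md, 4))
```

            Each result is a pure number with no units, which is what makes comparisons across differently scaled datasets possible.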
            2. Coefficient of Variation (CV)
            This is the most important and widely used relative measure. It corresponds to the Standard Deviation.
            
                Coefficient of Variation (CV): The ratio of the standard deviation to the mean, expressed as a percentage.
            
            
                CV = (Standard Deviation / Mean) * 100
                
                CV = (s / x̄) * 100
            
            
                - A lower CV means the data is more consistent, more stable, or less variable.
                - A higher CV means the data is less consistent, less stable, or more variable.
                Exam Tip: A common question is: "Team A has a mean score of 80 with SD=5. Team B has a mean score of 50 with SD=4. Which team is more consistent?"
                
                - CV(A) = (5 / 80) * 100 = 6.25%
                
                - CV(B) = (4 / 50) * 100 = 8%
                
                - Answer: Team A is more consistent because its CV is lower.
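            The exam-tip calculation can be checked with a short sketch. The function name coefficient_of_variation is an illustrative choice, not a library call.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, as a percentage."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=80)
cv_b = coefficient_of_variation(sd=4, mean=50)
print(cv_a, cv_b)  # 6.25 8.0

# The team with the lower CV is the more consistent one
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)  # A
```

            Team A has the larger SD (5 against 4) yet is the more consistent team, because its variability is smaller relative to its mean.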
            
        
        
        
            4. Moments (Raw and Central)
            Moments are a set of statistical parameters used to describe the characteristics (shape, center, spread) of a distribution.
            
            1. Raw Moments (μ'r) - Moments about Origin (Zero)
            The r-th raw moment is the arithmetic mean of the r-th power of the observations.
            
                μ'r = (Σ x^r) / n   (Ungrouped)
                
                μ'r = (Σ f * x^r) / N   (Grouped, where N = Σ f)
            
            
                - The first raw moment (r=1) is the Arithmetic Mean: μ'1 = x̄
            
            2. Central Moments (μr) - Moments about the Mean
            The r-th central moment is the arithmetic mean of the r-th power of the deviations from the mean.
            
                μr = (Σ (x - x̄)^r) / n   (Ungrouped)
                
                μr = (Σ f * (x - x̄)^r) / N   (Grouped, where N = Σ f)
            
            
                - The first central moment (r=1) is always zero: μ1 = 0
                - The second central moment (r=2) is the Variance computed with divisor n: μ2 = σ²
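            The ungrouped formulas for raw and central moments can be sketched as two small functions. The names raw_moment and central_moment and the data are illustrative assumptions.

```python
def raw_moment(data, r):
    """r-th moment about the origin: mean of x**r."""
    return sum(x ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th moment about the mean: mean of (x - mean)**r."""
    m = raw_moment(data, 1)  # the first raw moment is the arithmetic mean
    return sum((x - m) ** r for x in data) / len(data)

data = [2, 4, 6, 8, 10]
print(raw_moment(data, 1))      # arithmetic mean: 6.0
print(central_moment(data, 1))  # always zero: 0.0
print(central_moment(data, 2))  # variance with divisor n: 8.0
```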
        
            5. Measures of Skewness and Kurtosis
            These measures describe the shape of the distribution.
            
            Skewness (Asymmetry)
            Measures the asymmetry or lack of symmetry of a distribution.
            
                - Symmetrical Distribution: The "bell" shape is identical on both sides.
                    - Mean = Median = Mode
                    - Skewness = 0
                - Positively Skewed (Skewed to the Right): The "tail" is longer on the right.
                    - Mean > Median > Mode
                    - Skewness > 0
                - Negatively Skewed (Skewed to the Left): The "tail" is longer on the left.
                    - Mean < Median < Mode
                    - Skewness < 0
            
            Moment-based Coefficient (β1):
            
                β1 = (μ3)² / (μ2)³
            
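            (β1 = 0 indicates a symmetrical distribution and β1 > 0 a skewed one. Because μ3 is squared, β1 is never negative, so the direction of the skew is read from the sign of μ3: positive for right skew, negative for left skew.)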
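            The β1 formula can be sketched directly from the central moments. The helper names and the data are made up for illustration; the sketch also prints μ3 so that the direction of skew is visible alongside β1.

```python
def central_moment(data, r):
    """r-th central moment (ungrouped): mean of (x - mean)**r."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

def beta1(data):
    """Moment coefficient of skewness: mu3**2 / mu2**3."""
    return central_moment(data, 3) ** 2 / central_moment(data, 2) ** 3

data = [1, 2, 3, 4, 10]        # long tail on the right
mirrored = [-x for x in data]  # same shape reflected: long tail on the left

for sample in (data, mirrored):
    mu3 = central_moment(sample, 3)
    print(f"mu3 = {mu3}, beta1 = {round(beta1(sample), 3)}")
```

            Both samples give β1 = 1.296, while μ3 is +36 for the first and -36 for the mirrored one.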
            
            Kurtosis (Peakedness)
            Measures the peakedness or flatness of a distribution compared to the standard Normal distribution.
            
                - Leptokurtic: More peaked, sharper peak, and heavier/fatter tails.
                - Mesokurtic: The "normal" bell shape (like the Normal Distribution).
                - Platykurtic: Flatter, more rounded peak, and lighter/thinner tails.
            
            Moment-based Coefficient (β2):
            
                β2 = μ4 / (μ2)²
            
            
                - If β2 = 3, it is Mesokurtic (Normal).
                - If β2 > 3, it is Leptokurtic (Leap/Peaked).
                - If β2 < 3, it is Platykurtic (Flat/Plateau).
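            The β2 rule above can be sketched as a small classifier. The helper names, the tolerance tol (used because a computed β2 will rarely equal 3 exactly), and both datasets are illustrative assumptions.

```python
def central_moment(data, r):
    """r-th central moment (ungrouped): mean of (x - mean)**r."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

def beta2(data):
    """Moment coefficient of kurtosis: mu4 / mu2**2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def classify_kurtosis(b2, tol=1e-9):
    # tol is an illustrative tolerance for treating b2 as equal to 3
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

evenly_spread = [2, 4, 6, 8, 10]
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

for data in (evenly_spread, heavy_tails):
    b2 = beta2(data)
    print(round(b2, 3), classify_kurtosis(b2))
```

            The evenly spread data give β2 of about 1.7 (platykurtic), while the data with most values at the centre and two far-out values give β2 = 5 (leptokurtic).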