Unit 3: Descriptive Statistics in Excel

Table of Contents

1. Introduction to Descriptive Statistics

Descriptive Statistics: A set of methods used to summarize and describe the main features of a dataset. It does not make inferences or predictions, it simply *describes* what the data shows.

It is typically broken down into two categories:

  1. Measures of Central Tendency: These find the "center" or "average" of the data (e.g., Mean, Median, Mode).
  2. Measures of Dispersion (or Variability): These describe the "spread" or "scatter" of the data (e.g., Variance, Standard Deviation, Range).

2. Measures of Central Tendency (Using Functions)

Excel has built-in functions for all major measures of central tendency. Assume your data is in the range A2:A101.

Mean (Arithmetic Average)

The sum of all values divided by the count of values.

=AVERAGE(A2:A101)

Interpretation: The "balancing point" of the data. It is sensitive to outliers.

Median

The middle value when the data is sorted. 50% of data is above it, 50% is below.

=MEDIAN(A2:A101)

Interpretation: The "true middle" of the data. It is not sensitive to outliers, making it a "robust" measure.

Mode

The value that occurs most frequently in the dataset.

=MODE.SNGL(A2:A101)

Interpretation: The most "typical" or "popular" value. A dataset can have no mode or multiple modes (=MODE.MULT).

Mean vs. Median: A key analysis step is to compare the mean and median.
  • If Mean ≈ Median: The data is likely symmetric.
  • If Mean > Median: The data is positively skewed (pulled right by high outliers).
  • If Mean < Median: The data is negatively skewed (pulled left by low outliers).

3. Measures of Dispersion (Using Functions)

These functions measure how spread out your data is. Assume data in A2:A101.

Range

The difference between the maximum and minimum values.

=MAX(A2:A101) - MIN(A2:A101)

Interpretation: The total spread of the data. Very sensitive to outliers.

Variance

The average of the squared differences from the Mean. It measures the average spread.

Sample vs. Population: Always use the .S (Sample) version unless you are 100% certain you have data for the *entire* population. The .S version divides by (n-1).
=VAR.S(A2:A101)
=VAR.P(A2:A101)

Interpretation: Hard to interpret directly because its units are squared (e.g., "squared dollars"). Its main use is to calculate the standard deviation.

Standard Deviation

The square root of the variance. This is the most common measure of spread, expressed in the same units as the data.

=STDEV.S(A2:A101)
=STDEV.P(A2:A101)

Interpretation: A "typical" or "average" distance of a data point from the mean.


4. Introduction to the Data Analysis ToolPak

The Analysis ToolPak is a powerful Excel add-in that can perform complex statistical analysis with a few clicks, generating a full report.

How to Enable the ToolPak (One-time setup):

  1. Go to File > Options.
  2. Click Add-ins on the left.
  3. At the bottom, in the Manage: box, select Excel Add-ins and click Go...
  4. Check the box for Analysis ToolPak and click OK.
  5. The Data Analysis button will now appear on your Data tab.

Using the "Descriptive Statistics" Tool:

  1. Go to the Data tab and click Data Analysis.
  2. Select Descriptive Statistics from the list and click OK.
  3. Input Range: Select your data column (e.g., A1:A101).
  4. Labels in first row: Check this box if you selected the header.
  5. Output Range: Select a single empty cell where you want the report to start.
  6. Check the box: Summary statistics.
  7. Click OK.

5. Interpretation of Descriptive Statistics

When the ToolPak generates its report, you must be able to interpret it.

Typical Data Analysis ToolPak Output
Statistic Interpretation
Mean The "average" value.
Standard Error A measure of how accurate the sample mean is as an estimate of the population mean. (Smaller is better).
Median The 50th percentile, or "middle" value. Not sensitive to outliers.
Mode The most frequent value. (#N/A means no mode).
Standard Deviation The average spread around the mean. (Use this to understand variability).
Sample Variance The square of the standard deviation.
Kurtosis Measures "peakedness". (This is Excess Kurtosis, so 0 is normal, >0 is peaked, <0 is flat).
Skewness Measures asymmetry. (0 is symmetric, >0 is positive skew, <0 is negative skew).
Range Total spread (Max - Min).
Minimum The smallest value in the dataset.
Maximum The largest value in the dataset.
Sum The total of all values.
Count The number of observations (n).

Key Analytical Questions: