Unit 1: Statistical Methods

1. Statistical Methods: Definition, Scope, and Limitations

Definition of Statistics

Statistics is a branch of science that deals with the collection, organization, presentation, analysis, and interpretation of data to make effective decisions.

Scope of Statistics

Statistics is used in almost every field:

Limitations of Statistics

Exam Tip: Be prepared to define statistics and list its five stages (Collection to Interpretation). The limitations are a very common short-answer question.

2. Concepts of Statistical Population and Sample

Population

Population (or Universe): The entire group of individuals, items, or objects of interest in a statistical study.

Sample

Sample: A subset or a part of the population selected to represent the characteristics of the whole population.

We study samples because it is often too costly, time-consuming, or impossible to study the entire population. The process of selecting a sample is called sampling.

Parameter vs. Statistic

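This is a crucial distinction: a parameter is a numerical measure that describes a population, while a statistic is a numerical measure computed from a sample.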

Mnemonic: Parameter for Population. Statistic for Sample.
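
The distinction can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the population is a made-up list of 1,000 heights, the seed is arbitrary, and the sample size of 50 is chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 1,000 people, generated for illustration.
population = [random.gauss(170, 10) for _ in range(1000)]

# Sampling: select 50 members of the population without replacement.
sample = random.sample(population, 50)

# Parameter: a measure computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means will usually be close but not identical; in practice the parameter is unknown and the statistic is used to estimate it.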

3. Types of Data

Data can be broadly classified into two main types:

1. Qualitative (or Categorical) Data

Data that represents characteristics or attributes. It cannot be measured numerically but can be sorted into categories.

2. Quantitative (or Numerical) Data

Data that is numerical and represents a measurable quantity.

a) Discrete Data

Data that can only take specific, distinct values (usually integers). It is "counted." There are gaps between possible values.

b) Continuous Data

Data that can take any value within a given range. It is "measured." There are no gaps between possible values (though our measurements are limited by our tools).

Common Mistake: Don't assume "discrete" means "whole numbers only." Shoe size (e.g., 7, 7.5, 8, 8.5) is discrete because it can take only specific values, not *any* value between 7 and 9. Money is also technically discrete (you can't pay $10.502), but it is often treated as continuous because the number of possible values is so large.

Other Data Classifications

a) Cross-Sectional Data

Data collected on different subjects (people, firms, countries) at the same point in time or over the same period.

b) Time Series Data

Data collected on the same subject or variable over a period of time, usually at regular intervals.
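
A minimal Python sketch of the two classifications follows. The records (countries "A", "B", "C" and their unemployment rates) are entirely made up for illustration. Fixing the year and varying the subject gives cross-sectional data; fixing the subject and varying the year gives a time series.

```python
# Hypothetical records: (country, year, unemployment rate in %). All values are made up.
records = [
    ("A", 2021, 5.1), ("A", 2022, 4.8), ("A", 2023, 4.5),
    ("B", 2021, 7.3), ("B", 2022, 7.0), ("B", 2023, 6.6),
    ("C", 2021, 3.9), ("C", 2022, 4.1), ("C", 2023, 4.0),
]

# Cross-sectional data: different subjects observed at the same point in time.
cross_section_2022 = {country: rate for country, year, rate in records if year == 2022}

# Time series data: the same subject observed at regular intervals over time.
time_series_a = [(year, rate) for country, year, rate in records if country == "A"]

print("Cross-section (2022):", cross_section_2022)
print("Time series (country A):", time_series_a)
```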


4. Scales of Measurement

These scales (or levels) describe how much information the values assigned to a variable carry. They are hierarchical: each level has all the properties of the levels below it, plus one more.

1. Nominal Scale

The simplest scale. Data consists of categories or names only. There is no natural order or ranking.

2. Ordinal Scale

Data can be categorized and these categories have a natural order or rank. However, the *differences* between the ranks are not meaningful or uniform.

3. Interval Scale

Data is numerical, ordered, and the differences between values are meaningful and uniform. However, there is no true zero point (zero is arbitrary and doesn't mean "absence").

4. Ratio Scale

The highest level of measurement. It has all the properties of the interval scale, plus a true zero point, which indicates the "absence" of the quantity.

Exam Tip: A classic question is "Differentiate between Interval and Ratio scales." The key answer is the true zero point. Ask yourself: "Does 0 mean the absence of the thing?" If yes, it's Ratio. If no, it's Interval.
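
The sketch below is one way to see why the true zero matters. It assumes two standard textbook examples that are not taken from these notes: temperature in Celsius or Fahrenheit as an interval scale, and length as a ratio scale. The values 10 and 20 are arbitrary.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: zero is arbitrary, so ratios depend on the unit chosen.
c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

ratio_c = c_high / c_low   # 2.0 in Celsius
ratio_f = f_high / f_low   # 68 / 50 = 1.36 in Fahrenheit, so "twice as hot" is not meaningful

# Ratio scale: zero means "no length", so ratios survive a change of unit.
m_short, m_long = 10.0, 20.0                      # metres
cm_short, cm_long = m_short * 100, m_long * 100   # centimetres

ratio_m = m_long / m_short      # 2.0
ratio_cm = cm_long / cm_short   # 2.0, the same ratio in either unit

print("Temperature ratio (C, F):", ratio_c, ratio_f)
print("Length ratio (m, cm):   ", ratio_m, ratio_cm)
```

On the interval scale only differences are comparable across units; on the ratio scale statements such as "twice as long" are valid.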

5. Collection of Data

1. Primary Data

Data collected for the first time by the researcher, specifically for the purpose of the study. It is original, raw data.

Major Sources (Methods of Collection):

2. Secondary Data

Data that has already been collected by someone else for some other purpose, but is used by the researcher for their current study. It is "second-hand" data.

Major Sources:

Precautions in Using Secondary Data

Before using secondary data, you must check its:

  1. Reliability: Who collected the data? What is their reputation? Were the methods sound?
  2. Suitability: Does the data fit your research purpose? The original purpose might be different.
  3. Adequacy: Is the data sufficient for your study? Is the sample size large enough? Is it up-to-date?

6. Presentation of Data: Classification and Tabulation

After collection, raw data is unorganized and hard to understand. We must organize it.

Classification

The process of sorting data into groups or classes based on their common characteristics.

Tabulation

The systematic arrangement of classified data into rows and columns with a title and headings.

Main Parts of a Statistical Table:

  1. Table Number: For identification (e.g., "Table 1.1").
  2. Title: A clear and concise description of the table's contents.
  3. Headnote: A brief note below the title explaining the unit of measurement (e.g., "in '000s" or "in USD").
  4. Stubs: The headings for the rows (usually on the left).
  5. Captions: The headings for the columns.
  6. Body: The main part of the table containing the numerical data.
  7. Footnote: To clarify any specific item in the table.
  8. Source Note: To indicate the source of the data (especially for secondary data).

Example of a Table:

Table 1: Student Enrollment by Gender and Course, 2025

Course        Male   Female   Total
Statistics      50       70     120
Economics       80       60     140
Total          130      130     260

Source: College Admission Records
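
As a rough sketch of how the totals in Table 1 arise, the Python snippet below stores the four body cells and derives the row totals, column totals, and grand total. The counts come from Table 1; the dictionary layout and variable names are just one possible choice.

```python
# Body of Table 1: stubs (courses) map to captions (Male/Female) and their counts.
enrollment = {
    "Statistics": {"Male": 50, "Female": 70},
    "Economics": {"Male": 80, "Female": 60},
}
columns = ["Male", "Female"]

# Row totals: sum across the columns for each course.
row_totals = {course: sum(counts[c] for c in columns)
              for course, counts in enrollment.items()}

# Column totals: sum down the rows for each column.
col_totals = {c: sum(counts[c] for counts in enrollment.values())
              for c in columns}

grand_total = sum(row_totals.values())

# Print the table with aligned columns.
print(f"{'Course':<12}{'Male':>8}{'Female':>8}{'Total':>8}")
for course, counts in enrollment.items():
    print(f"{course:<12}{counts['Male']:>8}{counts['Female']:>8}{row_totals[course]:>8}")
print(f"{'Total':<12}{col_totals['Male']:>8}{col_totals['Female']:>8}{grand_total:>8}")
```

A useful check on any two-way table is that the row totals and the column totals add up to the same grand total.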

7. Frequency Distributions and Graphical Representations

Frequency Distribution

A table that organizes data into classes (or groups) and shows the number of observations (frequency) that fall into each class.

1. Discrete Frequency Distribution

Used for discrete data. We list each distinct value and its corresponding frequency.

Example: Number of children in 20 families: 0, 1, 2, 2, 1, 3, 0, 1, 1, 2, 3, 2, 1, 0, 1, 2, 2, 1, 1, 0

Discrete Frequency Distribution

Number of Children (x)   Tally Marks   Frequency (f)
0                        ||||          4
1                        ||||| |||     8
2                        ||||| |       6
3                        ||            2
Total                                  20
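
The same tally can be reproduced in a few lines of Python. This is a minimal sketch using the 20 observations from the example above; `collections.Counter` is one convenient way to count, not the only one.

```python
from collections import Counter

# Number of children in 20 families (data from the example above).
children = [0, 1, 2, 2, 1, 3, 0, 1, 1, 2, 3, 2, 1, 0, 1, 2, 2, 1, 1, 0]

# Count how many times each distinct value occurs.
frequency = Counter(children)

print(f"{'x':<8}{'f':>4}")
for value in sorted(frequency):
    print(f"{value:<8}{frequency[value]:>4}")
print(f"{'Total':<8}{sum(frequency.values()):>4}")
```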

2. Continuous Frequency Distribution

Used for continuous data (or discrete data with a wide range). Data is grouped into class intervals.
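
A minimal sketch of grouping data into class intervals is shown below. The marks, the class width of 10, and the range 0 to 50 are all made up for illustration. The sketch also assumes that each class includes its lower limit and excludes its upper limit, so a value of 20 falls in 20-30 rather than 10-20.

```python
# Hypothetical marks of 15 students (made up for illustration).
marks = [12, 25, 37, 41, 8, 19, 33, 46, 28, 22, 15, 39, 44, 31, 27]

class_width = 10   # assumed class width
lower_limit = 0    # assumed lower limit of the first class
upper_limit = 50   # assumed upper limit of the last class

# Build the class intervals: (0, 10), (10, 20), ..., (40, 50).
intervals = [(lo, lo + class_width)
             for lo in range(lower_limit, upper_limit, class_width)]

# Count the observations in each class (lower limit included, upper excluded).
frequency = {interval: 0 for interval in intervals}
for mark in marks:
    for lo, hi in intervals:
        if lo <= mark < hi:
            frequency[(lo, hi)] += 1
            break

for (lo, hi), f in frequency.items():
    print(f"{lo:>2}-{hi:<2}  {f}")
print(f"Total  {sum(frequency.values())}")
```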

Graphical Representations

1. Histogram

A graph of a continuous frequency distribution. It consists of adjacent rectangles.

2. Frequency Polygon

A line graph representing a frequency distribution.
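
A frequency polygon is usually drawn by plotting each class frequency against the class midpoint and joining the points with straight lines. The sketch below computes those points for a hypothetical grouped distribution; the classes and frequencies are made up for illustration, and the plotting itself is left out.

```python
# Hypothetical grouped distribution: (lower boundary, upper boundary) -> frequency.
distribution = {
    (0, 10): 1,
    (10, 20): 3,
    (20, 30): 4,
    (30, 40): 4,
    (40, 50): 3,
}

# Class midpoint = (lower boundary + upper boundary) / 2.
points = [((lo + hi) / 2, f) for (lo, hi), f in distribution.items()]

# Each (midpoint, frequency) pair is one vertex of the frequency polygon.
for midpoint, f in points:
    print(f"midpoint {midpoint:>5.1f}  frequency {f}")
```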

3. Frequency Curve

A smoothed version of a frequency polygon. It is drawn as a freehand curve through the points of a frequency polygon.

It gives a better idea of the shape of the distribution (e.g., normal, skewed).

Exam Tip: Know the key difference: A Histogram uses class *boundaries* on the X-axis and has adjacent bars. A Bar Graph (used for categorical data) uses class *names* and has gaps between the bars. A Frequency Polygon uses class *midpoints*.