Unit 1: Introduction and Data Presentation

1. Introduction, Scope, and Limitations of Statistics
2. Concepts of Statistical Population and Sample
3. Primary and Secondary Data
4. Types of Data (Qualitative, Quantitative, etc.)
5. Scales of Measurement
6. Presentation of Data by Tables and Diagrams
7. Frequency Distribution, Histogram, and Frequency Polygon
8. Cumulative Frequency Curves (Ogives)

1. Introduction, Scope, and Limitations of Statistics

Introduction to Statistics

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Scope of Statistics

Statistics is applied in nearly every field:

Business and Economics: Market research, quality control, financial analysis, forecasting.
Government: Policy making, census, public health.
Science: Medical studies, engineering, biology (biostatistics).
Social Sciences: Analyzing survey data, studying behavioral trends.

Limitations of Statistics

Deals with aggregates: Statistics provides insights about a group, not a single individual.
Deals with quantitative data: It cannot directly measure qualitative attributes such as "honesty" or "beauty"; these can be studied only indirectly, after converting them into numbers (e.g., ranks or scores).
Prone to misuse: Statistical results can be manipulated to present a biased view.
Statistical laws are not exact: They hold only on average or in the long run, not necessarily in every individual case.

Exam Tip: Be prepared to define statistics and list its five main functions (collection to interpretation). The limitations are a very common short-answer question.

2. Concepts of Statistical Population and Sample

Population

Population (or Universe): The complete set of all individuals, items, or objects of interest in a study.

Finite Population: A population with a countable number of units (e.g., students in a college).
Infinite Population: A population with an uncountable or theoretically infinite number of units (e.g., the results of all possible rolls of a die, since the die can in principle be rolled indefinitely; the die itself has only six faces).

Sample

Sample: A subset or a part of the population selected to represent the characteristics of the whole population.

We study samples because it is often too costly, time-consuming, or impossible to study the entire population.

Parameter vs. Statistic

Parameter: A numerical value summarizing a characteristic of the population (e.g., population mean μ, population standard deviation σ).
Statistic: A numerical value summarizing a characteristic of the sample (e.g., sample mean x̄, sample standard deviation s). We use statistics to estimate parameters.

Mnemonic: Parameter for Population. Statistic for Sample.
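The distinction can be seen in a short simulation. This is only a sketch: the population of 500 marks is randomly generated (hypothetical data), and the sample of 50 is drawn by simple random sampling without replacement. The parameters μ and σ are computed from every unit; the statistics x̄ and s come from the sample alone and serve as estimates of μ and σ.

```python
import random
import statistics

# Hypothetical population: marks (30-100) of 500 students, generated only for illustration.
random.seed(1)
population = [random.randint(30, 100) for _ in range(500)]

# Parameters: computed from every unit of the population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population SD, divides by N

# Statistics: computed from a simple random sample of 50 students (without replacement).
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample SD, divides by n - 1

print(f"Parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Running it with a different seed gives a different sample and hence different statistics, while the parameters stay fixed.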

3. Primary and Secondary Data

1. Primary Data

Data collected for the first time by the researcher, specifically for the purpose of the study. It is original, raw data.

Methods: Surveys, personal interviews, experiments, direct observation.
Pros: Specific to the research needs, more reliable.
Cons: Expensive, time-consuming to collect.

2. Secondary Data

Data that has already been collected by someone else for some other purpose, but is used by the researcher for their current study.

Sources: Government publications (e.g., census), company records, journals, websites, books.
Pros: Cheaper and faster to obtain.
Cons: May not exactly fit the needs of the current study; may be outdated or inaccurate.

4. Types of Data (Qualitative, Quantitative, etc.)

1. Qualitative (or Categorical) Data

Represents characteristics or attributes. It is non-numerical.

Example: Gender (Male, Female), Eye Color (Blue, Brown), Blood Type (A, B, O).

2. Quantitative (or Numerical) Data

Represents a measurable quantity. It is numerical.

a) Discrete Data

Data that can only take specific, distinct values (usually integers). It is "counted."

Example: Number of children in a family (0, 1, 2, ...).
Example: Number of cars sold by a dealer.

b) Continuous Data

Data that can take any value within a given range. It is "measured."

Example: Height (170.1 cm, 170.11 cm, ...).
Example: Temperature, time.

3. Time Series Data

Data collected on the same subject or variable over successive periods of time.

Example: Monthly sales of a company from 2020 to 2025.
Example: Daily stock price of Google for one year.

4. Ordinal Data

Data whose categories have a natural order or ranking, but the differences between ranks are not necessarily equal and cannot be measured. It is categorical data with an order, so it is often described as sitting between qualitative and quantitative data.

Example: Customer satisfaction (Poor, Fair, Good, Excellent).
Example: Educational level (High School, Bachelor's, Master's).

5. Scales of Measurement

These scales describe the properties of the data. They are hierarchical: each scale has all the properties of the one before it, plus one more.

1. Nominal Scale

Data consists of categories or names only. There is no order.

Properties: Identity (e.g., 'Male' is different from 'Female').
Examples: Gender, Jersey Numbers, Zip Codes.

2. Ordinal Scale

Data can be categorized, and these categories have a natural order or rank. The differences between ranks are not necessarily equal and cannot be measured.

Properties: Identity + Order.
Examples: Ranks in a competition (1st, 2nd, 3rd), education level, satisfaction ratings. (This is the "Ordinal Data" from the previous section).

3. Interval Scale

Data is numerical, ordered, and the differences between values are meaningful. However, there is no true zero point (zero is arbitrary).

Properties: Identity + Order + Meaningful Differences.
Examples: Temperature in Celsius/Fahrenheit (0°C doesn't mean "no heat"), IQ scores, calendar years.

4. Ratio Scale

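The highest level. It has all the properties of the interval scale, plus a true zero point, which represents complete absence of the quantity. Because of the true zero, ratios are meaningful (e.g., 20 kg is twice as heavy as 10 kg).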

Properties: Identity + Order + Meaningful Differences + True Zero.
Examples: Height, weight, age, income, distance. (0 kg means "no weight").
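One way to see why the true zero matters is to change units. The sketch below (the helper names c_to_f and kg_to_lb are illustrative, and 2.20462 lb/kg is an approximate factor) shows that a ratio of Celsius temperatures changes when converted to Fahrenheit, while differences keep their meaning; a ratio of weights stays the same in any unit.

```python
def c_to_f(c):
    """Celsius to Fahrenheit: both are interval scales with arbitrary zeros."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Kilograms to pounds: both are ratio scales sharing a true zero."""
    return kg * 2.20462  # approximate conversion factor

# Interval scale: the ratio depends on the unit, so "twice as hot" is meaningless.
print(20 / 10)                      # ratio in Celsius
print(c_to_f(20) / c_to_f(10))      # ratio in Fahrenheit (68 / 50)

# Differences are meaningful: a 10-degree Celsius gap is always an 18-degree Fahrenheit gap.
print(c_to_f(20) - c_to_f(10))

# Ratio scale: 20 kg is twice 10 kg in any unit.
print(kg_to_lb(20) / kg_to_lb(10))
```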

Exam Tip: A classic question is "Differentiate between Interval and Ratio scales." The key answer is the true zero point.

6. Presentation of Data by Tables and Diagrams

Tabulation (Tables)

The systematic arrangement of data into rows and columns with a title and headings. A good table is clear, concise, and self-explanatory.

Main Parts of a Table:

Table Number & Title: What the table is about.
Headnote: Unit of measurement (e.g., "in '000s").
Stubs: Row headings.
Captions: Column headings.
Body: The numerical data.
Footnote & Source Note: Clarifications and source of data.
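As an illustration of how these parts fit together, here is a minimal sketch that lays out a plain-text table. The function make_table and all figures in the example call are hypothetical, chosen only to show where each part (number and title, headnote, stubs, captions, body, footnote, source note) appears.

```python
def make_table(number, title, headnote, stub_head, captions, rows, footnote, source):
    """Return a plain-text table built from the standard parts of a table.

    rows: list of (stub, [values...]) pairs; all names here are illustrative.
    """
    header = [stub_head] + captions                        # captions = column headings
    body = [[stub] + [str(v) for v in values] for stub, values in rows]
    widths = [max(len(r[i]) for r in [header] + body) for i in range(len(header))]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths))

    lines = [f"Table {number}: {title}", f"({headnote})", fmt(header),
             "-+-".join("-" * w for w in widths)]
    lines += [fmt(r) for r in body]                        # stubs + numerical body
    lines += [f"Note: {footnote}", f"Source: {source}"]
    return "\n".join(lines)

print(make_table(
    1, "Students enrolled by faculty (hypothetical figures)", "in '000s",
    "Faculty", ["2023", "2024"],
    [("Arts", [1.2, 1.4]), ("Science", [2.1, 2.3])],
    "Figures are invented for illustration.", "Hypothetical college records",
))
```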

Diagrams

Visual representations of data. The two diagrams below are commonly used for qualitative (categorical) data.

Bar Diagram (or Bar Chart): Uses rectangular bars (of equal width) to represent frequencies. The height of the bar is proportional to the frequency. There are gaps between bars.
Pie Diagram (or Pie Chart): A circle divided into sectors, where the angle of each sector is proportional to the frequency of that category.
- Angle = (Frequency / Total Frequency) * 360°
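The sector-angle formula can be applied directly, as in this small sketch. The helper pie_angles and the blood-type counts are hypothetical. Multiplying by 360 before dividing keeps the arithmetic exact for these counts, and the angles always add up to 360°.

```python
def pie_angles(freqs):
    """Angle of each sector = (frequency / total frequency) * 360 degrees."""
    total = sum(freqs.values())
    return {category: f * 360 / total for category, f in freqs.items()}

# Hypothetical blood-type counts for 40 people.
counts = {"A": 12, "B": 10, "AB": 4, "O": 14}
angles = pie_angles(counts)
print(angles)  # {'A': 108.0, 'B': 90.0, 'AB': 36.0, 'O': 126.0}
print(sum(angles.values()))  # 360.0
```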

7. Frequency Distribution, Histogram, and Frequency Polygon

Frequency Distribution

A table that organizes data into classes (or groups) and shows the number of observations (frequency) that fall into each class. Used for quantitative data.
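A minimal sketch of building a grouped frequency distribution, assuming equal-width exclusive classes (the upper limit of a class belongs to the next class). The function name frequency_distribution and the marks are hypothetical.

```python
def frequency_distribution(data, lower, width, k):
    """Count observations in k exclusive classes [lower + i*width, lower + (i+1)*width).

    An observation equal to a class's upper limit is counted in the next class.
    """
    counts = [0] * k
    for x in data:
        i = int((x - lower) // width)  # index of the class containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the chosen classes")
        counts[i] += 1
    return [((lower + i * width, lower + (i + 1) * width), counts[i]) for i in range(k)]

# Hypothetical marks of 20 students.
marks = [12, 25, 37, 41, 45, 18, 29, 33, 48, 22,
         39, 27, 15, 31, 44, 36, 24, 9, 38, 42]
for (a, b), f in frequency_distribution(marks, 0, 10, 5):
    print(f"{a}-{b}: {f}")
```

For these marks the frequencies of the classes 0-10, 10-20, 20-30, 30-40, and 40-50 come out as 1, 3, 5, 6, and 5.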

Histogram

A graphical representation of a continuous frequency distribution. It consists of adjacent rectangles.

The X-axis represents the class boundaries (to ensure no gaps).
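The Y-axis represents the frequency when all classes have equal width; for unequal widths it represents the frequency density (frequency ÷ class width).
The area of each rectangle is proportional to the frequency.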
There are no gaps between the bars.
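When class widths differ, plotting raw frequencies as heights would exaggerate the wider classes. This sketch (hypothetical helper histogram_heights and hypothetical data with one double-width class) computes frequency density as the bar height, so that height × width gives back the frequency.

```python
def histogram_heights(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency).

    Height = frequency density = frequency / class width, so that
    area (height * width) equals the frequency even for unequal widths.
    """
    return [(lo, hi, f / (hi - lo)) for lo, hi, f in classes]

# Hypothetical distribution with one wider class (20-40).
classes = [(0, 10, 4), (10, 20, 8), (20, 40, 12), (40, 50, 6)]
for lo, hi, h in histogram_heights(classes):
    print(f"{lo}-{hi}: height {h:.2f}, area {h * (hi - lo):.1f}")
```

Here the 20-40 class has the largest frequency (12) but, at a height of 0.6, a shorter bar than the 10-20 class (0.8), because its frequency is spread over twice the width.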

Histogram vs. Bar Chart: A Bar Chart is for categorical data (has gaps), while a Histogram is for continuous/grouped data (no gaps).

Frequency Polygon

A line graph representing a frequency distribution.

It is drawn by plotting the class marks (midpoints) on the X-axis against the frequencies on the Y-axis.
The points are then joined by straight lines.
The polygon is "closed" by joining the first and last points to the class marks of hypothetical classes just before the first class and just after the last class, each with zero frequency.
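The plotting points described above can be computed as follows, assuming equal-width classes. The function name and the frequencies (classes 0-10 up to 40-50) are hypothetical. The two closing points lie one class width beyond the first and last class marks; the left one may fall below zero, which is acceptable because it only serves to close the figure.

```python
def frequency_polygon_points(lower, width, freqs):
    """Return (class mark, frequency) points for equal-width classes,
    closed with a zero-frequency class mark at each end."""
    marks = [lower + width * (i + 0.5) for i in range(len(freqs))]  # midpoints
    points = list(zip(marks, freqs))
    return [(marks[0] - width, 0)] + points + [(marks[-1] + width, 0)]

# Hypothetical classes 0-10, 10-20, ..., 40-50.
print(frequency_polygon_points(0, 10, [1, 3, 5, 6, 5]))
```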

8. Cumulative Frequency Curves (Ogives)

An ogive (or cumulative frequency curve) is a graph of a cumulative frequency distribution. It is very useful for finding partition values like the median and quartiles graphically.

1. Less Than Ogive

Plotting: We plot Upper Class Boundaries on the X-axis against their corresponding "Less Than" Cumulative Frequencies on the Y-axis.
Shape: The curve rises from left to right, starting from 0 (plotted at the lower boundary of the first class) and ending at the total frequency (N).

2. More Than Ogive

Plotting: We plot Lower Class Boundaries on the X-axis against their corresponding "More Than" Cumulative Frequencies on the Y-axis.
Shape: The curve falls from left to right, starting from the total frequency (N) and ending at 0 (plotted at the upper boundary of the last class).
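The coordinates for both ogives can be generated from the class boundaries and frequencies. In this sketch (hypothetical function ogive_points, hypothetical frequencies), the "More Than" value at each boundary is N minus the "Less Than" value there, which is why one curve rises exactly as the other falls.

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """boundaries: class boundaries b0 < b1 < ... < bk; freqs: k class frequencies.

    Less-than ogive: (upper boundary, cumulative frequency), starting at (b0, 0).
    More-than ogive: (lower boundary, frequency at or above it), ending at (bk, 0).
    """
    n = sum(freqs)
    less_than = [(boundaries[0], 0)] + list(zip(boundaries[1:], accumulate(freqs)))
    more_than = [(b, n - cf) for b, cf in less_than]
    return less_than, more_than

less, more = ogive_points([0, 10, 20, 30, 40, 50], [1, 3, 5, 6, 5])
print("Less than:", less)
print("More than:", more)
```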

Finding the Median: The Median is the X-coordinate of the intersection point of the "Less Than" and "More Than" ogives.
Alternatively, on a "Less Than" ogive, find the N/2 value on the Y-axis, draw a horizontal line to the curve, and then a vertical line down to the X-axis. This X-value is the Median.
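If the ogive points are joined by straight line segments, reading N/2 off the "Less Than" ogive is a linear interpolation inside the median class, and it gives the same value as the standard grouped-data formula Median = L + ((N/2 − cf) / f) × h (L = lower boundary of the median class, cf = cumulative frequency before it, f = its frequency, h = its width). The two straight-line ogives also cross at this point, since the "More Than" value is always N minus the "Less Than" value. A freehand smooth curve gives only an approximation of this value. The function below is a hypothetical sketch with illustrative data.

```python
def median_from_ogive(boundaries, freqs):
    """Read N/2 off a less-than ogive whose points are joined by straight lines.

    This linear reading equals the usual grouped-data formula
    Median = L + ((N/2 - cf) / f) * h.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cf = 0  # cumulative frequency below the current class
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if cf + f >= half:
            # Linear interpolation inside the median class.
            return lower + (half - cf) / f * (upper - lower)
        cf += f

median = median_from_ogive([0, 10, 20, 30, 40, 50], [1, 3, 5, 6, 5])
print(round(median, 2))
```

With N = 20, N/2 = 10 falls in the 30-40 class (cf = 9, f = 6), so the median is 30 + (1/6) × 10 ≈ 31.67.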