Unit 1: Data Representation

1. Concepts of Statistical Population and Sample
2. Primary and Secondary Data
3. Qualitative and Quantitative Data
4. Scales of Measurement
5. Presentation of Data by Tables and Diagrams
6. Frequency Distributions
7. Graphical Representation (Histogram, Frequency Polygon, Ogive)

1. Concepts of Statistical Population and Sample

Population

Population (or Universe): The entire group of individuals, items, or objects of interest in a statistical study.

Finite Population: A population where the number of units is countable and finite.
- Example: Number of students in your college.
Infinite Population: A population where the number of units is theoretically infinite or so large that it is considered infinite.
- Example: The outcomes of rolling a die again and again without end.

Sample

Sample: A subset or a part of the population selected to represent the characteristics of the whole population.

We study samples because it is often too costly, time-consuming, or impossible to study the entire population. The process of selecting a sample is called sampling.

Example: To find the average height of all students in your college (population), you select 100 students (sample) and measure their heights.
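
The height example can be sketched in a few lines of Python. This is only an illustration: the population of 2,000 heights is simulated (the notes give no real data), and the sample is assumed to be a simple random sample drawn without replacement.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of 2,000 students, simulated for illustration.
rng = random.Random(42)
population = [rng.gauss(165, 8) for _ in range(2000)]

# Assumed sampling scheme: simple random sample of 100 students, without replacement.
sample = rng.sample(population, 100)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"Population mean: {population_mean:.1f} cm")
print(f"Sample mean:     {sample_mean:.1f} cm")
```

The two means will usually be close but not identical; that gap is the price of measuring 100 students instead of all 2,000.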

2. Primary and Secondary Data

1. Primary Data

Data collected for the first time by the researcher, specifically for the purpose of the study. It is original, raw data.

Methods: Surveys, personal interviews, experiments, direct observation.
Pros: Tailored to the research needs; generally more reliable, because the researcher controls how it is collected.
Cons: Very expensive and time-consuming to collect.

2. Secondary Data

Data that has already been collected by someone else for some other purpose, but is used by the researcher for their current study.

Sources: Government publications (e.g., census reports), company records, journals, websites, books.
Pros: Cheaper and faster to obtain.
Cons: May not fit the study's needs exactly, and may be outdated or inaccurate. Always check its reliability and suitability before use.

3. Qualitative and Quantitative Data

1. Qualitative (or Categorical) Data

Data that represents characteristics, attributes, or qualities. It cannot be measured numerically but can be sorted into categories.

Example: Gender (Male, Female, Other)
Example: Eye Color (Blue, Brown, Green)
Example: Blood Type (A, B, AB, O)

2. Quantitative (or Numerical) Data

Data that is numerical and represents a measurable quantity. It answers questions of "how much" or "how many."

a) Discrete Data

Data that can only take specific, distinct values (usually integers). It is "counted." There are gaps between possible values.

Example: Number of children in a family (0, 1, 2, 3... but not 2.5)
Example: Number of cars passing a toll booth in an hour.

b) Continuous Data

Data that can take any value within a given range. It is "measured." There are no gaps between possible values (though our measurements are limited by our tools).

Example: Height of a student (e.g., 170.5 cm, 170.51 cm...)
Example: Temperature of a room.

4. Scales of Measurement

These scales (or levels) describe the nature of information within the values assigned to variables. They are hierarchical (each level up adds more properties).

1. Nominal Scale

The simplest scale. Data consists of categories or names only. There is no natural order or ranking.

Properties: Identity (e.g., 'Male' is different from 'Female').
Operations: Counting (frequency), finding the mode.
Examples: Jersey numbers, zip codes, eye color, marital status.

2. Ordinal Scale

Data can be categorized and these categories have a natural order or rank. However, the *differences* between the ranks are not meaningful or uniform.

Properties: Identity + Order.
Operations: Count, mode, median, rank correlation.
Examples: Customer satisfaction (Poor, Fair, Good, Excellent), educational level (High School, Bachelor's, Master's), grades (A, B, C).

3. Interval Scale

Data is numerical, ordered, and the differences between values are meaningful and uniform. However, there is no true zero point (zero is arbitrary and doesn't mean "absence").

Properties: Identity + Order + Meaningful Differences.
Operations: Count, mode, median, mean, standard deviation. Addition and subtraction are meaningful.
Examples: Temperature in Celsius or Fahrenheit (0°C doesn't mean "no heat"), calendar years (Year 0 is arbitrary), IQ scores.

4. Ratio Scale

The highest level of measurement. It has all the properties of the interval scale, plus a true zero point, which indicates the "absence" of the quantity.

Properties: Identity + Order + Meaningful Differences + True Zero.
Operations: All statistical operations, including multiplication and division (we can form ratios).
Examples: Height, weight, age, income, distance. (0 kg means "no weight," $0 means "no money").

Exam Tip: A classic question is "Differentiate between Interval and Ratio scales." The key answer is the true zero point. Ask yourself: "Does 0 mean the absence of the thing?" If yes, it's Ratio. If no, it's Interval.
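
One way to see the Interval vs Ratio difference numerically: a ratio is meaningful only if it stays the same when the unit changes. The sketch below uses the standard Celsius-to-Fahrenheit formula and an approximate kg-to-lb factor; the function names are made up for this example.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (the zero point shifts, so this is an interval scale)."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate factor; zero stays zero, so ratio scale)."""
    return kg * 2.20462


# Interval scale: the "ratio" of 20 degrees to 10 degrees depends on the unit.
temp_ratio_c = 20 / 10                   # 2.0
temp_ratio_f = c_to_f(20) / c_to_f(10)   # 68 / 50 = 1.36

# Ratio scale: 20 kg is twice 10 kg in any unit.
mass_ratio_kg = 20 / 10
mass_ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(f"Temperature ratio: {temp_ratio_c:.2f} in Celsius, {temp_ratio_f:.2f} in Fahrenheit")
print(f"Mass ratio:        {mass_ratio_kg:.2f} in kg, {mass_ratio_lb:.2f} in lb")
```

Because the temperature ratio changes with the unit, "20°C is twice as hot as 10°C" is not a valid statement, while "20 kg is twice as heavy as 10 kg" is.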

5. Presentation of Data by Tables and Diagrams

After collection, raw data is unorganized. We must organize it for clarity.

Tabulation (Tables)

The systematic arrangement of classified data into rows and columns with a title and headings.

Example of a Table:

Table 1.1: Student Enrollment by Course, 2025
Course	Number of Students
Statistics	120
Economics	140
History	90
Total	350
Source: College Admission Records

Diagrams

Visual representations of data. The two diagrams below are used mainly for qualitative (categorical) data.

Bar Diagram (or Bar Chart): Uses rectangular bars (of equal width) to represent frequencies. The height of the bar is proportional to the frequency. There are gaps between bars.
Pie Diagram (or Pie Chart): A circle divided into sectors, where the angle of each sector is proportional to the frequency of that category.
- Angle = (Frequency / Total Frequency) * 360°
Example: For Statistics in Table 1.1, Angle = (120 / 350) * 360° ≈ 123.4°
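
The same angle formula can be applied to every row of Table 1.1. A minimal sketch, using the enrollment figures from the table (the variable names are illustrative):

```python
# Enrollment figures from Table 1.1.
enrollment = {"Statistics": 120, "Economics": 140, "History": 90}

total = sum(enrollment.values())  # 350

# Angle = (Frequency / Total Frequency) * 360 degrees
angles = {course: count / total * 360 for course, count in enrollment.items()}

for course, angle in angles.items():
    print(f"{course}: {angle:.1f} degrees")
# Statistics: 123.4 degrees
# Economics: 144.0 degrees
# History: 92.6 degrees
```

A quick check on any pie chart calculation: the sector angles must add up to 360°.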

6. Frequency Distributions

A table that organizes data into classes (or groups) and shows the number of observations (frequency) that fall into each class.

1. Discrete Frequency Distribution

Used for discrete data. We list each distinct value and its corresponding frequency.

Discrete Frequency Distribution (No. of children in 20 families)
Number of Children (x)	Frequency (f)
0	4
1	8
2	6
3	2
Total	20
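
A discrete frequency distribution is just a tally of the raw observations. The sketch below uses a hypothetical raw list of 20 family sizes, constructed so that it reproduces the table above; the original raw data is not given in these notes.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 20 families,
# chosen to match the frequencies in the table above.
children = [1, 0, 2, 1, 1, 3, 2, 0, 1, 2,
            1, 0, 2, 1, 3, 1, 2, 0, 1, 2]

# Tally how often each distinct value occurs.
freq = Counter(children)

print("x\tf")
for x in sorted(freq):
    print(f"{x}\t{freq[x]}")
print(f"Total\t{sum(freq.values())}")
```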

2. Continuous Frequency Distribution

Used for continuous data. Data is grouped into class intervals.

Class Limits: The lowest (Lower Limit) and highest (Upper Limit) values a class can have (e.g., 10 and 19 for the class 10-19).
Exclusive Method: Class intervals like 10-20, 20-30, etc. The upper limit (20) is excluded from the first class and included in the next. This is the preferred method for continuous data.
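Inclusive Method: Class intervals like 10-19, 20-29, etc. Both limits are included. To make it continuous for graphing, we must find Class Boundaries: subtract half the gap between consecutive classes from each lower limit and add it to each upper limit. Here the gap is 20 - 19 = 1, so the boundaries are 9.5 - 19.5, 19.5 - 29.5, etc.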
Class Mark (Midpoint): (Lower Limit + Upper Limit) / 2.
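
The conversion from inclusive class limits to class boundaries, and the class marks, can be sketched as follows. The classes 10-19 and 20-29 come from the notes; the third class, 30-39, is added here as an assumption to show the pattern.

```python
# Inclusive class limits; (30, 39) is a hypothetical extra class for illustration.
inclusive_classes = [(10, 19), (20, 29), (30, 39)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = inclusive_classes[1][0] - inclusive_classes[0][1]  # 20 - 19 = 1
half_gap = gap / 2

# Class boundaries: widen each class by half the gap on both sides.
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in inclusive_classes]

# Class mark = (Lower Limit + Upper Limit) / 2
marks = [(lo + hi) / 2 for lo, hi in inclusive_classes]

for (lo, hi), (b_lo, b_hi), mark in zip(inclusive_classes, boundaries, marks):
    print(f"{lo}-{hi}: boundaries {b_lo}-{b_hi}, class mark {mark}")
```

The class mark is the same whether it is computed from the limits or from the boundaries, since both ends move by the same amount.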

3. Cumulative Frequency Distribution

Shows the total frequency *up to* or *more than* a certain class boundary.

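Less Than: Running total of the frequencies from the first class downward, recorded against each upper class boundary. (e.g., "Less than 20", "Less than 30").
More Than: Running total of the frequencies from the last class upward, recorded against each lower class boundary. (e.g., "More than 10", "More than 20"). Strictly, "More than 10" here means "10 or more," because the lower boundary belongs to the class.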
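
Both kinds of cumulative frequency are running totals, taken in opposite directions. A minimal sketch with a made-up distribution (the classes and frequencies below are hypothetical, not from the notes):

```python
from itertools import accumulate

# Hypothetical continuous frequency distribution (exclusive method).
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 5]

# "Less than" type: running total from the first class down.
less_than = list(accumulate(freqs))               # [5, 13, 25, 30]

# "More than" type: running total from the last class up.
more_than = list(accumulate(freqs[::-1]))[::-1]   # [30, 25, 17, 5]

for (lo, hi), lt, mt in zip(classes, less_than, more_than):
    print(f"Less than {hi}: {lt}\tMore than {lo}: {mt}")
```

The last "less than" value and the first "more than" value both equal the total frequency (30 here), which is a useful arithmetic check.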

7. Graphical Representation (Histogram, Frequency Polygon, Ogive)

1. Histogram

A graph of a continuous frequency distribution. It consists of adjacent rectangles.

The X-axis represents the class boundaries (using the exclusive method or boundaries from the inclusive method).
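The Y-axis represents the frequency. If the class widths are unequal, plot frequency density (frequency / class width) instead, so that the area of each bar stays proportional to its frequency.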
There are no gaps between the bars.
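
The bar heights of a histogram come from tallying raw values into classes. The sketch below assumes the exclusive method (a value equal to an upper limit goes into the next class) and uses hypothetical raw values; it prints a rough text histogram rather than drawing a real graph.

```python
from bisect import bisect_right

# Class boundaries for the classes 10-20, 20-30, 30-40, 40-50.
edges = [10, 20, 30, 40, 50]

# Hypothetical raw measurements, for illustration only.
values = [12.5, 20.0, 19.9, 35.2, 41.0, 28.7, 30.0, 47.3, 22.1, 33.3]

counts = [0] * (len(edges) - 1)
for v in values:
    # Exclusive method: lower boundary included, upper boundary excluded.
    if edges[0] <= v < edges[-1]:
        counts[bisect_right(edges, v) - 1] += 1

# One row per class; the row length stands in for the bar height.
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi} | {'#' * c} ({c})")
```

Note that 20.0 is counted in 20-30 and 30.0 in 30-40, which is exactly what the exclusive method prescribes.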

2. Frequency Polygon

A line graph representing a frequency distribution.

It is drawn by plotting the class marks (midpoints) on the X-axis against the frequencies on the Y-axis.
The points are then joined by straight lines.
The polygon is "closed" by joining the first and last points to hypothetical class marks at either end with zero frequency.
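
The points to plot for a frequency polygon can be listed as follows. The distribution is hypothetical, and the sketch assumes all classes have the same width, so the two closing points sit one class width beyond the first and last class marks.

```python
# Hypothetical continuous frequency distribution with equal class widths.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 5]

width = classes[0][1] - classes[0][0]            # 10
marks = [(lo + hi) / 2 for lo, hi in classes]    # [15.0, 25.0, 35.0, 45.0]

# Close the polygon with zero-frequency class marks at both ends.
points = (
    [(marks[0] - width, 0)]
    + list(zip(marks, freqs))
    + [(marks[-1] + width, 0)]
)

for x, y in points:
    print(f"({x}, {y})")
```

Joining these points in order with straight lines gives the closed polygon.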

3. Ogives (Cumulative Frequency Curves)

An ogive is a graph of a cumulative frequency distribution. It is used to find partition values like the median and quartiles graphically.

Less Than Ogive:
- Plot Upper Class Boundaries on the X-axis.
- Plot "Less Than" Cumulative Frequencies on the Y-axis.
- The curve rises from left to right.
More Than Ogive:
- Plot Lower Class Boundaries on the X-axis.
- Plot "More Than" Cumulative Frequencies on the Y-axis.
- The curve falls from left to right.

Finding the Median: The Median is the X-coordinate of the intersection point of the "Less Than" and "More Than" ogives.
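
At every class boundary, the "More Than" cumulative frequency equals N minus the "Less Than" cumulative frequency, so the two ogives cross at height N/2. Reading the median off the graph therefore amounts to finding where the "Less Than" ogive reaches N/2. The sketch below does this by straight-line interpolation; the function name and the data are hypothetical.

```python
def median_from_ogive(boundaries, freqs):
    """Find the x-value where the "less than" ogive reaches N/2.

    boundaries: class boundaries in increasing order (one more than len(freqs)).
    freqs: class frequencies.
    Assumes the ogive is drawn with straight segments between boundaries.
    """
    half = sum(freqs) / 2
    cum = 0  # "less than" cumulative frequency at the current lower boundary
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        if f > 0 and cum + f >= half:
            # Interpolate along the segment from (lo, cum) to (hi, cum + f).
            return lo + (half - cum) / f * (hi - lo)
        cum += f
    raise ValueError("distribution has no observations")


# Hypothetical distribution: classes 10-20, 20-30, 30-40, 40-50.
boundaries = [10, 20, 30, 40, 50]
freqs = [5, 8, 12, 5]

median = median_from_ogive(boundaries, freqs)
print(f"Median is approximately {median:.2f}")  # Median is approximately 31.67
```

Here N = 30, so N/2 = 15. The "Less Than" ogive passes through (30, 13) and (40, 25), giving 30 + (15 - 13) / 12 * 10 ≈ 31.67.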