Unit 1: Data Representation

1. Concepts of Statistical Population and Sample

Population

Population (or Universe): The entire group of individuals, items, or objects of interest in a statistical study.

Sample

Sample: A subset or a part of the population selected to represent the characteristics of the whole population.

We study samples because it is often too costly, time-consuming, or impossible to study the entire population. The process of selecting a sample is called sampling.
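
As a minimal sketch of sampling, the snippet below draws a simple random sample without replacement and compares its mean with the population mean. The population of 1,000 ID numbers, the sample size of 50, and the choice of simple random sampling are all assumptions made for illustration; the notes do not prescribe a particular sampling method.

```python
import random

# Hypothetical population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

# Fix the seed so the draw is reproducible.
rng = random.Random(42)

# Simple random sample of 50 units, drawn without replacement.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"Population size: {len(population)}, sample size: {len(sample)}")
print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is the price paid for not studying every unit.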


2. Primary and Secondary Data

1. Primary Data

Data collected for the first time by the researcher, specifically for the purpose of the study. It is original, raw data.

2. Secondary Data

Data that has already been collected by someone else for some other purpose, but is used by the researcher for their current study.


3. Qualitative and Quantitative Data

1. Qualitative (or Categorical) Data

Data that represents characteristics, attributes, or qualities. It cannot be measured numerically but can be sorted into categories.

2. Quantitative (or Numerical) Data

Data that is numerical and represents a measurable quantity. It answers questions of "how much" or "how many."

a) Discrete Data

Data that can only take specific, distinct values (usually integers). It is "counted." There are gaps between possible values.

b) Continuous Data

Data that can take any value within a given range. It is "measured." There are no gaps between possible values (though our measurements are limited by our tools).


4. Scales of Measurement

These scales (or levels) describe the nature of information within the values assigned to variables. They are hierarchical (each level up adds more properties).

1. Nominal Scale

The simplest scale. Data consists of categories or names only. There is no natural order or ranking.

2. Ordinal Scale

Data can be categorized and these categories have a natural order or rank. However, the *differences* between the ranks are not meaningful or uniform.

3. Interval Scale

Data is numerical, ordered, and the differences between values are meaningful and uniform. However, there is no true zero point (zero is arbitrary and doesn't mean "absence").

4. Ratio Scale

The highest level of measurement. It has all the properties of the interval scale, plus a true zero point, which indicates the "absence" of the quantity.

Exam Tip: A classic question is "Differentiate between Interval and Ratio scales." The key answer is the true zero point. Ask yourself: "Does 0 mean the absence of the thing?" If yes, it's Ratio. If no, it's Interval.
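
The true-zero test can be checked numerically. The sketch below uses two assumed examples that do not appear in the notes: temperature in degrees Celsius (interval) and weight in kilograms (ratio). The pound conversion factor is approximate. On an interval scale, a statement like "twice as much" changes when the unit changes; on a ratio scale it does not.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15

# Interval scale: 0 degrees Celsius is an arbitrary zero, so ratios are not meaningful.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_kelvin = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

# Differences are still meaningful: they are the same in both units.
diff_celsius = c_high - c_low
diff_kelvin = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratio scale: 0 kg means "no weight", so ratios survive a change of unit.
KG_TO_LB = 2.20462  # approximate conversion factor
kg_low, kg_high = 10.0, 20.0
ratio_kg = kg_high / kg_low
ratio_lb = (kg_high * KG_TO_LB) / (kg_low * KG_TO_LB)

print(f"Celsius ratio: {ratio_celsius:.3f}, Kelvin ratio: {ratio_kelvin:.3f}")
print(f"Celsius difference: {diff_celsius:.2f}, Kelvin difference: {diff_kelvin:.2f}")
print(f"Kilogram ratio: {ratio_kg:.3f}, pound ratio: {ratio_lb:.3f}")
```

So 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius, while 20 kg is twice as heavy as 10 kg in any unit.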

5. Presentation of Data by Tables and Diagrams

After collection, raw data is unorganized. We must organize it for clarity.

Tabulation (Tables)

The systematic arrangement of classified data into rows and columns with a title and headings.

Example of a Table:

Table 1.1: Student Enrollment by Course, 2025

Course        Number of Students
Statistics    120
Economics     140
History        90
Total         350

Source: College Admission Records

Diagrams

Visual representations of data. Diagrams such as bar diagrams and pie charts are commonly used for qualitative (categorical) data.


6. Frequency Distributions

A table that organizes data into classes (or groups) and shows the number of observations (frequency) that fall into each class.

1. Discrete Frequency Distribution

Used for discrete data. We list each distinct value and its corresponding frequency.

Discrete Frequency Distribution (No. of children in 20 families)

Number of Children (x)    Frequency (f)
0                         4
1                         8
2                         6
3                         2
Total                     20
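
A distribution like the one above can be tallied from raw observations. The sketch below assumes a hypothetical list of 20 raw values, constructed only so that its counts match the table; the original raw data is not given in the notes.

```python
from collections import Counter

# Hypothetical raw observations (children per family), built to match the table.
children = [1, 0, 2, 1, 1, 3, 0, 2, 1, 2,
            1, 0, 2, 1, 3, 1, 2, 0, 1, 2]

# Count how many times each distinct value occurs.
freq = Counter(children)

print("x  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")
print(f"Total {sum(freq.values())}")
```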

2. Continuous Frequency Distribution

Used for continuous data. Data is grouped into class intervals.
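
Grouping into class intervals can be sketched as follows. The heights, the starting point of 150 cm, and the class width of 5 cm are assumptions chosen for illustration. The sketch uses the exclusive method, in which each class includes its lower limit but not its upper limit.

```python
# Hypothetical continuous data: heights in cm (illustrative only).
heights = [151.2, 154.8, 156.0, 158.3, 159.9, 160.0, 161.5, 162.7,
           163.4, 164.9, 165.0, 166.2, 167.8, 168.1, 169.5, 170.0,
           171.3, 172.6, 174.4, 178.9]

lower, width, n_classes = 150, 5, 6  # classes 150-155, 155-160, ..., 175-180

# Exclusive method: a value equal to an upper limit falls into the next class.
freq = [0] * n_classes
for h in heights:
    index = int((h - lower) // width)
    freq[index] += 1

for i, f in enumerate(freq):
    lo = lower + i * width
    print(f"{lo}-{lo + width}: {f}")
print(f"Total: {sum(freq)}")
```

Note that 160.0 is counted in 160-165, not in 155-160, which is what keeps the classes from overlapping.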

3. Cumulative Frequency Distribution

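Shows a running total of frequencies: either the number of observations below each upper class boundary (the *less than* type) or the number at or above each lower class boundary (the *more than* type).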
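
Both kinds of cumulative frequency can be computed from the same class frequencies. The class intervals and frequencies below are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical class intervals and frequencies (illustrative only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [5, 8, 12, 9, 6]
total = sum(freq)

# "Less than" type: running total up to each upper class boundary.
less_than = list(accumulate(freq))

# "More than" type: observations at or above each lower class boundary.
more_than = [total - below for below in [0] + less_than[:-1]]

for (lo, hi), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {hi}: {lt:2d}    {lo} or more: {mt:2d}")
```

The last "less than" value and the first "more than" value both equal the total number of observations, which is a quick check on the arithmetic.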


7. Graphical Representation (Histogram, Frequency Polygon, Ogive)

1. Histogram

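A graph of a continuous frequency distribution. It consists of adjacent rectangles, one per class interval, with no gaps between them; when class widths are equal, the height of each rectangle is proportional to the class frequency.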

2. Frequency Polygon

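A line graph of a frequency distribution, drawn by plotting each class frequency against its class midpoint and joining the points with straight lines.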
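
One common construction plots each class frequency against its class midpoint and closes the polygon on the x-axis by adding an empty class at each end. The sketch below computes the points to plot; the class intervals and frequencies are hypothetical, and equal class widths are assumed.

```python
# Hypothetical class intervals and frequencies (illustrative only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [5, 8, 12, 9, 6]
width = 10  # assumes all classes have the same width

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Add a zero-frequency class at each end so the polygon meets the x-axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + freq + [0]
points = list(zip(xs, ys))

for x, y in points:
    print(f"({x}, {y})")
```

Joining these points in order with straight lines gives the frequency polygon.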

3. Ogives (Cumulative Frequency Curves)

An ogive is a graph of a cumulative frequency distribution. It is used to find partition values like the median and quartiles graphically.

Finding the Median: The Median is the X-coordinate of the intersection point of the "Less Than" and "More Than" ogives.
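
The intersection can also be located numerically. The sketch below treats both ogives as straight-line segments between class boundaries, finds where they cross, and cross-checks the result against the usual grouped-data interpolation formula for the median, l + ((N/2 - cf) / f) * h. The class boundaries and frequencies are hypothetical values chosen for illustration.

```python
# Hypothetical grouped data (illustrative only).
boundaries = [0, 10, 20, 30, 40, 50]
freq = [5, 8, 12, 9, 6]
total = sum(freq)

# Ogive ordinates at each class boundary.
less_than = [0]
for f in freq:
    less_than.append(less_than[-1] + f)
more_than = [total - c for c in less_than]

# Find the segment where the "less than" ogive rises above the "more than" ogive.
median = None
for i in range(len(freq)):
    d0 = less_than[i] - more_than[i]
    d1 = less_than[i + 1] - more_than[i + 1]
    if d0 <= 0 < d1:
        # Linear interpolation for the x at which the two curves are equal.
        t = -d0 / (d1 - d0)
        h = boundaries[i + 1] - boundaries[i]
        median = boundaries[i] + t * h
        # Cross-check: l + ((N/2 - cf) / f) * h for the same class.
        formula = boundaries[i] + (total / 2 - less_than[i]) / freq[i] * h
        break

print(f"Median from ogive intersection: {median:.2f}")
print(f"Median from formula:            {formula:.2f}")
```

The two ogives always add up to N at any x, so they cross at height N/2, which is why the intersection gives the median.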