Unit 1: Introduction and Data Presentation

1. Introduction, Scope, and Limitations of Statistics

Introduction to Statistics

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Scope of Statistics

Statistics is applied in nearly every field, for example business, economics, medicine, and the social sciences.

Limitations of Statistics

Exam Tip: Be prepared to define statistics and list its five main functions (collection to interpretation). The limitations are a very common short-answer question.

2. Concepts of Statistical Population and Sample

Population

Population (or Universe): The complete set of all individuals, items, or objects of interest in a study.

Sample

Sample: A subset or a part of the population selected to represent the characteristics of the whole population.

We study samples because it is often too costly, time-consuming, or impossible to study the entire population.

Parameter vs. Statistic

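Parameter: A numerical measure that describes a characteristic of the population (e.g., the population mean).

Statistic: A numerical measure that describes a characteristic of a sample (e.g., the sample mean).

Mnemonic: Parameter for Population. Statistic for Sample.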
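The difference between a value computed from the whole population and the same value computed from a sample can be seen in a short sketch. The marks below are made-up data, and the population size (1,000), sample size (50), and seed are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: exam marks of 1,000 students (made-up data).
random.seed(42)  # fixed seed so the run is repeatable
population = [random.randint(30, 100) for _ in range(1000)]

# Computed from the WHOLE population: this is a parameter.
population_mean = statistics.mean(population)

# Computed from a SAMPLE only: this is a statistic.
sample = random.sample(population, 50)  # simple random sample, no replacement
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means will usually be close but not identical. The sample mean changes from sample to sample, while the population mean is fixed.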

3. Primary and Secondary Data

1. Primary Data

Data collected for the first time by the researcher, specifically for the purpose of the study. It is original, raw data.

2. Secondary Data

Data that has already been collected by someone else for some other purpose, but is used by the researcher for their current study.


4. Types of Data (Qualitative, Quantitative, etc.)

1. Qualitative (or Categorical) Data

Represents characteristics or attributes. It is non-numerical.

2. Quantitative (or Numerical) Data

Represents a measurable quantity. It is numerical.

a) Discrete Data

Data that can only take specific, distinct values (usually integers). It is "counted."

b) Continuous Data

Data that can take any value within a given range. It is "measured."

3. Time Series Data

Data collected on the same subject or variable over successive periods of time.

4. Ordinal Data

Categorical data whose categories have a natural order or ranking, but where the gaps between ranks cannot be measured or assumed equal. It sits between purely qualitative and quantitative data.


5. Scales of Measurement

These four scales describe the properties of the data. They are hierarchical: each scale has all the properties of the one before it, plus one more.

1. Nominal Scale

Data consists of categories or names only. There is no order.

2. Ordinal Scale

Data can be categorized, and the categories have a natural order or rank. However, the differences between ranks cannot be measured and are not necessarily equal.

3. Interval Scale

Data is numerical, ordered, and the differences between values are meaningful. However, there is no true zero point (zero is arbitrary).

4. Ratio Scale

The highest level. It has all the properties of the interval scale, plus a true zero point, where zero means the complete absence of the quantity. Because of this, ratios of values are meaningful.

Exam Tip: A classic question is "Differentiate between Interval and Ratio scales." The key answer is the true zero point.
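The effect of a true zero can be checked numerically. The sketch below uses temperature in Celsius and Fahrenheit as an example of an interval scale and weight in kilograms and pounds as an example of a ratio scale. The function names are hypothetical helpers written for this illustration.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert kilograms to pounds (both are ratio scales)."""
    return kg * 2.20462


# Interval scale: the ratio changes with the unit,
# so "20 degrees is twice as hot as 10 degrees" is not meaningful.
ratio_c = 20 / 10                    # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36 in Fahrenheit

# Ratio scale: the ratio is the same in any unit,
# so "20 kg is twice as heavy as 10 kg" is meaningful.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(ratio_c, ratio_f)    # ratios disagree
print(ratio_kg, ratio_lb)  # ratios agree (up to rounding)
```

The temperature ratio depends on where zero was placed, which is exactly what "arbitrary zero" means. The weight ratio does not, because zero weight means no weight in every unit.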

6. Presentation of Data by Tables and Diagrams

Tabulation (Tables)

The systematic arrangement of data into rows and columns with a title and headings. A good table is clear, concise, and self-explanatory.

Main Parts of a Table:

  1. Table Number & Title: What the table is about.
  2. Headnote: Unit of measurement (e.g., "in '000s").
  3. Stubs: Row headings.
  4. Captions: Column headings.
  5. Body: The numerical data.
  6. Footnote & Source Note: Clarifications and source of data.

Diagrams

Visual representations of data, such as bar diagrams and pie charts. They are most commonly used for qualitative or categorical data.


7. Frequency Distribution, Histogram, and Frequency Polygon

Frequency Distribution

A table that organizes data into classes (or groups) and shows the number of observations (frequency) that fall into each class. Used for quantitative data.
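As a minimal sketch, a frequency distribution can be built by counting how many observations fall into each class. The marks are made-up data, and the choice of four exclusive classes of width 10 starting at 10 is an assumption for this example.

```python
# Hypothetical marks of 20 students (made-up data for illustration).
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38,
         27, 31, 49, 15, 36, 24, 42, 34, 28, 39]

# Exclusive classes 10-20, 20-30, 30-40, 40-50:
# the lower limit is included, the upper limit is excluded.
lower, width, n_classes = 10, 10, 4

frequencies = [0] * n_classes
for x in marks:
    index = (x - lower) // width  # position of the class that x falls into
    frequencies[index] += 1

for i, f in enumerate(frequencies):
    lo = lower + i * width
    print(f"{lo}-{lo + width}: {f}")
```

The frequencies always add up to the total number of observations, which is a quick check that no value was missed or counted twice.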

Histogram

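A graphical representation of a continuous frequency distribution. It consists of adjacent rectangles, one per class, whose widths are the class intervals and whose areas are proportional to the class frequencies.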

Histogram vs. Bar Chart: A Bar Chart is for categorical data (has gaps), while a Histogram is for continuous/grouped data (no gaps).

Frequency Polygon

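A line graph representing a frequency distribution. It is drawn by plotting each class frequency against the class midpoint and joining the points with straight lines.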
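The points of a frequency polygon can be computed before drawing. The sketch below uses a made-up grouped table and follows the common convention of closing the polygon with a zero-frequency point one class width beyond each end. That closing step is an assumption of this example.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 20, 3), (20, 30, 6), (30, 40, 7), (40, 50, 4)]

# Each point is (class midpoint, frequency).
points = [((lo + hi) / 2, f) for lo, hi, f in classes]

# Close the polygon with zero-frequency midpoints of an
# imaginary class before the first and after the last class.
width = classes[0][1] - classes[0][0]
first_mid, last_mid = points[0][0], points[-1][0]
polygon = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

for x, y in polygon:
    print(x, y)
```

Joining these points in order with straight lines gives the frequency polygon.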


8. Cumulative Frequency Curves (Ogives)

An ogive (or cumulative frequency curve) is a graph of a cumulative frequency distribution. It is very useful for finding partition values like the median and quartiles graphically.

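1. Less Than Ogive

Cumulative frequencies are plotted against the upper class limits, so the curve rises from left to right.

2. More Than Ogive

Cumulative frequencies, counted from the highest class downward, are plotted against the lower class limits, so the curve falls from left to right.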

Finding the Median: The Median is the X-coordinate of the intersection point of the "Less Than" and "More Than" ogives.

Alternatively, on a "Less Than" ogive, locate N/2 on the Y-axis (where N is the total frequency), draw a horizontal line to the curve, and then drop a vertical line to the X-axis. This X-value is the Median.
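The graphical method can be imitated numerically. The sketch below builds the points of both ogives from a made-up grouped table and then reads N/2 off the "Less Than" ogive, assuming the ogive is drawn with straight line segments between plotted points. The function name is a hypothetical helper for this illustration.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(10, 20, 3), (20, 30, 6), (30, 40, 7), (40, 50, 4)]

# "Less Than" ogive points: (upper class limit, cumulative frequency).
less_than = []
running = 0
for lo, hi, f in classes:
    running += f
    less_than.append((hi, running))

# "More Than" ogive points: (lower class limit, frequency at or above it).
more_than = []
remaining = running  # running now equals the total frequency N
for lo, hi, f in classes:
    more_than.append((lo, remaining))
    remaining -= f


def median_from_ogive(classes):
    """Read N/2 off the less-than ogive, assuming straight segments."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cum_before = 0
    for lo, hi, f in classes:
        if f > 0 and cum_before + f >= target:
            # Move along the segment in proportion to the frequency needed.
            return lo + (target - cum_before) / f * (hi - lo)
        cum_before += f
    raise ValueError("no observations")


median = median_from_ogive(classes)
print(less_than)
print(more_than)
print(round(median, 2))
```

With straight segments, the X-value found this way is also where the two ogives cross, since both curves equal N/2 at that point.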