Unit 5: Bivariate Analysis, Pivot Tables & Vlookup

Table of Contents

1. Introduction to Bivariate Analysis

Bivariate Analysis: The analysis of two variables (bi-variate) to determine the empirical relationship between them.

It helps us answer questions like:

The main tools for this are Correlation (to measure strength) and Regression (to model the relationship).


2. Correlation Coefficient in Excel (Karl Pearson's)

The Pearson correlation coefficient (r) measures the strength and direction of a *linear* relationship between two quantitative variables. It ranges from -1 to +1.

How to Calculate in Excel:

Assume X-data (e.g., 'Study Hours') is in A2:A51 and Y-data (e.g., 'Exam Score') is in B2:B51.

Method 1: CORREL Function

=CORREL(A2:A51, B2:B51)

Method 2: Data Analysis ToolPak

  1. Go to Data > Data Analysis.
  2. Select Correlation and click OK.
  3. Input Range: Select *both* columns at once (A1:B51, including headers).
  4. Check Labels in first row.
  5. Select an Output Range and click OK.

This will produce a correlation matrix, which is useful if you have many variables.

Interpretation:


3. Simple Linear Regression in Excel

This finds the best-fitting straight line (y = a + bx) to describe the relationship. This is the most powerful tool in the Data Analysis ToolPak.

How to Perform Regression Analysis:

  1. Go to Data > Data Analysis.
  2. Select Regression and click OK.
  3. Input Y Range: Select your dependent variable (e.g., 'Sales' in B1:B51).
  4. Input X Range: Select your independent variable (e.g., 'Advertising' in A1:A51).
  5. Check Labels if you included the headers.
  6. Select an Output Range.
  7. (Optional but recommended) Check Residuals and Residual Plots to check assumptions (see Unit 4).
  8. Click OK.

Excel will produce a detailed "Summary Output" table.


4. Interpretation of Regression Output

The "Summary Output" table from the Regression tool is the most important part of the analysis.

Part 1: Regression Statistics

R Square This is your value (see Unit 4). Interpretation: "85% of the variation in Sales is explained by Advertising."
Adjusted R Square A modified R² used in *multiple* regression (with many X variables). For simple regression, just use R Square.

Part 2: ANOVA Table

This table tests the overall significance of the model.

Significance F This is the p-value for the whole model. If Significance F < 0.05, it means your model is statistically significant (i.e., it's not useless).

Part 3: Coefficients Table (The Most Important Part)

This table gives you the equation of your line.

Coefficients P-value
Intercept 1000 0.002
X Variable (e.g., 'Advertising') 50 0.0001

Interpretation of Coefficients Table:


5. Introduction to Pivot Tables

Pivot Table: An interactive tool to quickly summarize, "pivot," and analyze large amounts of data. It lets you slice and dice data without using complex formulas.

Applications in Data Analysis:

How to Create a Pivot Table:

  1. Select your entire data range (including headers).
  2. Go to the Insert tab and click PivotTable.
  3. Confirm your data range and choose "New Worksheet". Click OK.
  4. A "PivotTable Fields" pane appears on the right.
  5. Drag and drop your fields into the four areas:
    • ROWS: What you want to group by (e.g., drag 'Region' here).
    • COLUMNS: A second way to group (e.g., drag 'Product Category' here).
    • VALUES: The number you want to aggregate (e.g., drag 'Sales' here). It will default to Sum of Sales. You can click it to change to Average, Count, etc.
    • FILTERS: A top-level filter for the whole report (e.g., drag 'Year' here).

The table automatically updates as you drag and drop fields, allowing for rapid, interactive analysis.


6. Using Vlookup to Join Datasets

VLOOKUP (Vertical Lookup): An Excel function that searches for a value in the first column of a table and returns a corresponding value from a different column in the same row.

It is the "join" of Excel, used to combine data from two different tables that share a common key.

Scenario:
- Table 1 (on Sheet1): StudentID (A), SaleAmount (B)
- Table 2 (on Sheet2): StudentID (A), StudentName (B), Region (C)
Goal: Add the 'StudentName' and 'Region' to Table 1.

Formula Syntax:

=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

Example:

In cell C2 of Sheet1 (to get the name):

=VLOOKUP(A2, Sheet2!$A:$C, 2, FALSE)

In cell D2 of Sheet1 (to get the region):

=VLOOKUP(A2, Sheet2!$A:$C, 3, FALSE)

You can then drag these formulas down for all rows.