Unit 5: Bivariate Analysis, Pivot Tables & Vlookup
        
        
            1. Introduction to Bivariate Analysis
            
                Bivariate Analysis: The analysis of two variables (bi-variate) to determine the empirical relationship between them.
            
            It helps us answer questions like:
            
                - Is there a relationship between Advertising (X) and Sales (Y)?
- How strong is the relationship?
- What is the direction of the relationship (positive or negative)?
The main tools for this are Correlation (to measure strength) and Regression (to model the relationship).
        
        
        
            2. Correlation Coefficient in Excel (Karl Pearson's)
            The Pearson correlation coefficient (r) measures the strength and direction of a *linear* relationship between two quantitative variables. It ranges from -1 to +1.
            
            How to Calculate in Excel:
            Assume X-data (e.g., 'Study Hours') is in A2:A51 and Y-data (e.g., 'Exam Score') is in B2:B51.
            
            Method 1: CORREL Function
            =CORREL(A2:A51, B2:B51)
            
            Method 2: Data Analysis ToolPak
            
                - Go to Data > Data Analysis.
- Select Correlation and click OK.
- Input Range: Select *both* columns at once (A1:B51, including headers).
- Check Labels in first row.
- Select an Output Range and click OK.
This will produce a correlation matrix, which is useful if you have many variables.
            
            Interpretation:
            
                - r ≈ +1: Strong positive linear relationship.
- r ≈ -1: Strong negative linear relationship.
- r ≈ 0: Very weak or no *linear* relationship.
        
            3. Simple Linear Regression in Excel
            This finds the best-fitting straight line (y = a + bx) to describe the relationship. This is the most powerful tool in the Data Analysis ToolPak.
            
            How to Perform Regression Analysis:
            
                - Go to Data > Data Analysis.
- Select Regression and click OK.
- Input Y Range: Select your dependent variable (e.g., 'Sales' in B1:B51).
- Input X Range: Select your independent variable (e.g., 'Advertising' in A1:A51).
- Check Labels if you included the headers.
- Select an Output Range.
- (Optional but recommended) Check Residuals and Residual Plots to check assumptions (see Unit 4).
- Click OK.
Excel will produce a detailed "Summary Output" table.
        
        
        
            4. Interpretation of Regression Output
            The "Summary Output" table from the Regression tool is the most important part of the analysis.
            
            Part 1: Regression Statistics
            
                
                    | R Square | This is your R² value (see Unit 4). Interpretation: "85% of the variation in Sales is explained by Advertising." | 
                
                    | Adjusted R Square | A modified R² used in *multiple* regression (with many X variables). For simple regression, just use R Square. | 
            
            Part 2: ANOVA Table
            This table tests the overall significance of the model.
            
                
                    | Significance F | This is the p-value for the whole model. If Significance F < 0.05, it means your model is statistically significant (i.e., it's not useless). | 
            
            Part 3: Coefficients Table (The Most Important Part)
            This table gives you the equation of your line.
            
                
                    
                        |  | Coefficients | P-value | 
                
                
                    
                        | Intercept | 1000 | 0.002 | 
                    
                        | X Variable (e.g., 'Advertising') | 50 | 0.0001 | 
                
            
            
            Interpretation of Coefficients Table:
            
                - Coefficients: These give you the equation.
                    
                        - The Intercept value (1000) is your 'a'.
- The X Variable value (50) is your slope 'b'.
- Equation: Sales = 1000 + 50 * Advertising
 
- P-value: This tells you if a specific coefficient is statistically significant.
                    
                        - Rule: If P-value < 0.05, the variable is significant.
- Intercept P-value (0.002): This is < 0.05, so the intercept is significant.
- Advertising P-value (0.0001): This is < 0.05, so 'Advertising' is a significant predictor of 'Sales'. If this value was large (e.g., 0.30), it would mean Advertising has no statistically significant effect on Sales.
 
        
            5. Introduction to Pivot Tables
            
                Pivot Table: An interactive tool to quickly summarize, "pivot," and analyze large amounts of data. It lets you slice and dice data without using complex formulas.
            
            
            Applications in Data Analysis:
            
                - Summarizing sales by region, salesperson, and product.
- Counting frequencies of categorical variables.
- Finding the average (or sum, max, min) for different groups.
How to Create a Pivot Table:
            
                - Select your entire data range (including headers).
- Go to the Insert tab and click PivotTable.
- Confirm your data range and choose "New Worksheet". Click OK.
- A "PivotTable Fields" pane appears on the right.
- Drag and drop your fields into the four areas:
                    
                        - ROWS: What you want to group by (e.g., drag 'Region' here).
- COLUMNS: A second way to group (e.g., drag 'Product Category' here).
- VALUES: The number you want to aggregate (e.g., drag 'Sales' here). It will default to Sum of Sales. You can click it to change toAverage,Count, etc.
- FILTERS: A top-level filter for the whole report (e.g., drag 'Year' here).
 
The table automatically updates as you drag and drop fields, allowing for rapid, interactive analysis.
        
        
        
            6. Using Vlookup to Join Datasets
            
                VLOOKUP (Vertical Lookup): An Excel function that searches for a value in the first column of a table and returns a corresponding value from a different column in the same row.
            
            It is the "join" of Excel, used to combine data from two different tables that share a common key.
            
            Scenario:
            
            - Table 1 (on Sheet1): StudentID (A), SaleAmount (B)
            
            - Table 2 (on Sheet2): StudentID (A), StudentName (B), Region (C)
            
            Goal: Add the 'StudentName' and 'Region' to Table 1.
            
            Formula Syntax:
            
                =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
            
            
                - lookup_value: The value you are looking for (e.g., cell A2in Sheet1, which contains the StudentID).
- table_array: The *entire* second table you are searching in (e.g., Sheet2!$A:$C). Important: Use dollar signs ($) to "lock" the range so it doesn't move when you copy the formula down.
- col_index_num: The column number in Table 2 that has the data you want to *return*. (In our example, StudentNameis column 2 andRegionis column 3).
- [range_lookup]: Always use FALSE or 0. This forces Excel to find an exact match.
Example:
            In cell C2 of Sheet1 (to get the name):
            
                =VLOOKUP(A2, Sheet2!$A:$C, 2, FALSE)
            
            In cell D2 of Sheet1 (to get the region):
            
                =VLOOKUP(A2, Sheet2!$A:$C, 3, FALSE)
            
            You can then drag these formulas down for all rows.