Unit 1: Introduction to Excel for Data Analysis

Table of Contents

1. Introduction to Microsoft Excel and its Features

What is Excel?

Microsoft Excel is a spreadsheet program used to record, organize, analyze, and visualize quantitative data. It is a fundamental tool for data analysis.

Key Features and Components:


2. Basic Data Entry and Editing Techniques

Data Entry

Editing Techniques


3. Importing and Exporting Data

Importing Data

Often, your data is in an external file, like a .csv (Comma Separated Values) or text file.

  1. Go to the Data tab.
  2. In the Get & Transform Data group, select From Text/CSV.
  3. Browse for your file and click Import.
  4. A preview window appears. Check the Delimiter (e.g., Comma, Tab) to ensure Excel is splitting the columns correctly.
  5. Click Load.

Exporting Data

You can save your Excel sheet in other formats.

  1. Go to File > Save As.
  2. Choose a location.
  3. In the "Save as type" dropdown menu, select the format you need (e.g., CSV (Comma delimited), PDF).
  4. Click Save.

4. Sorting and Filtering Data

Sorting

Arranging your data in a specific order (e.g., A-Z, Smallest to Largest).

  1. Select your entire data range (including headers).
  2. Go to the Data tab > Sort & Filter group.
  3. Click the Sort button for multi-level sorting (e.g., sort by 'State', then by 'City').
  4. Check "My data has headers".
  5. Choose the column to sort by, the sort criteria (Values, Color, etc.), and the order (A-Z).
Common Pitfall: Always select *all* your data before sorting a single column, or you will scramble your records!

Filtering

Hiding rows that do not meet your criteria, allowing you to focus on a subset of data.

  1. Select your data range (or just the header row).
  2. Go to the Data tab > Sort & Filter group.
  3. Click the Filter button. Dropdown arrows will appear on your headers.
  4. Click the arrow on a header to see filter options. You can uncheck values you don't want to see, or use Number Filters (e.g., "Greater Than", "Top 10").

5. Removing Duplicates

This tool permanently deletes rows that are identical based on the columns you select.

  1. Select the data range.
  2. Go to the Data tab > Data Tools group.
  3. Click Remove Duplicates.
  4. A dialog box appears. Check the columns you want Excel to use to identify a duplicate row. (e.g., if you only check 'Email', it will remove any row with the same email, even if the names are different).
  5. Click OK. Excel will report how many duplicates were found and removed.

6. Data Validation

A tool to restrict what can be entered into a cell, ensuring data integrity and preventing errors.

  1. Select the cell(s) where you want to apply the rule.
  2. Go to the Data tab > Data Tools group.
  3. Click Data Validation.
  4. In the Settings tab:
    • Allow: Choose the data type (e.g., Whole number, List, Date).
    • Data: Set the criteria (e.g., "between", "greater than").
    • Example: To allow only 'Male' or 'Female' in a cell, choose Allow: List and in Source: type Male,Female.
  5. (Optional) In the Input Message tab, you can write a prompt for the user.
  6. (Optional) In the Error Alert tab, you can write a custom error message if invalid data is entered.

7. Handling Missing Observations

Real-world data is often incomplete ("missing"). We cannot simply run calculations on cells that are blank or contain errors (like #N/A).

Methods for Handling Missing Data:

  1. Find Blank Cells:
    • Select your data column.
    • Go to Home > Find & Select > Go To Special...
    • Select Blanks and click OK. All blank cells will be highlighted. You can then color them or enter a value.
  2. Deletion:
    • Listwise Deletion: Delete the entire row. This is easy but can remove a lot of good data.
  3. Imputation (Replacing the value):
    • Mean/Median Imputation: Replace the missing value with the mean or median of the entire column. This is a common statistical technique.
      1. Calculate the mean of the column (e.g., =AVERAGE(A2:A100)).
      2. Use the "Go To Special... Blanks" method above.
      3. Type the mean value (e.g., 50.5) and press Ctrl+Enter to fill all highlighted blank cells at once.
  4. Using Functions that Ignore Blanks:
    • Functions like AVERAGE, SUM, MEDIAN, STDEV.S, etc., automatically ignore blank cells.
    • If your "missing" data is text (e.g., "NA"), these functions will fail. Use AVERAGEA (which treats text as 0) or filter/replace "NA" first.