Clean, Prepare and Transform Data for Analysis

In this Tableau Academy course you'll look at how you can use Tableau Prep to load and combine data from multiple sources and then clean, prepare and transform it into the required data model ready for your analysis. You'll also learn how these steps can then be repeated and performed automatically to save you time and reduce manual effort and errors.

Course Type:
Data Preparation
Updated:
January 2022
Instructor:
Alan Murray
Course Length:
2h

WHAT YOU'LL LEARN

  • What Tableau Prep Builder is and how to get it
  • Review data sources to identify data prep requirements
  • Combining data from multiple sources
  • Quick and easy data cleaning in Tableau Prep
  • Splitting data out into separate fields
  • Using custom calculations to transform data
  • How to use fuzzy matching and lookups in Tableau Prep
  • Correcting common spelling errors and grouping data
  • Using joins to add additional data
  • Investigating and resolving issues with your data
  • Using wildcards to filter out records
  • Automating data preparation into one click
  • Using joins, aggregations and filters to transform data
  • Outputting and connecting to your cleaned data.

COURSE PREVIEW

COURSE DESCRIPTION

A key part of any data analysis or visualisation project is getting the data you need for the project into a state where it's fit for analysis or visualisation. Creating outputs based on inaccurate data is a leading cause for dis-satisfaction for end users and no matter how good the final report, dashboard or visualisation looks it's unlikely to be used if the data isn't trusted and of a high quality.

In this course we take you step by step through a project to collect and then clean, prepare and transform the data we need for a project including:

  • Combing data from multiple csv files, Excel Sheets or databases into a single consistent table.
  • Cleaning up date formats and removing spaces, currency symbols and characters from numerical fields.
  • Identifying duplicated records and investigation and resolving data quality issues.
  • Automatically correcting spelling errors or typo's in names, countries and other text fields.
  • Automatically combining, or splitting out, data fields to meet the needs of your project.
  • Pivoting and aggregating data into the most appropriate format and size for your analysis.
  • Removing unrequired records and data fields to size the data appropriately for your needs.
  • Transforming and re-distributing your data without changing the underlying data source.
  • Automating your steps into a single click that can be run whenever your data is updated.

We'll do all of this in Tableau Prep Builder which brings the same easy to use drag and drop functionality from Tableau Desktop to your data preparation tasks and it's all included within your Tableau license!

Save yourself the hard work and headache of trying to wrestle and wrangle your data into the format you need it and reduce the amount of manual effort in trying to spot and correct issues with your data by letting Tableau Prep take care of it for you.

In the course we'll introduce you to all of the functionality of Tableau Prep by showing you real hands on practical applications of using input steps, data cleansing steps, aggregation, pivots and transformations.

Once you've completed the course you'll have complete confidence in your abilities to make the most of Tableau Prep and you'll never resort to manually cleaning your data in Excel ever again!

The Course Project

This course follows a real life project to give you hands on experience of cleaning, preparing and transforming your data. In this project we need to create a report that shows how much of our employees time is being spent on specific projects each month within our organisation and also show how much this equates to in terms of our employee costs.

First we'll need to combine timesheet data from multiple teams, countries and regions which shows how much time each person has booked to each project each month. These come from a variety of sources and aren't consistent so we'll look at how we can combine and clean these into a single consistent table.

A lot of the data is manually entered so we'll also look at dealing with typos, differences in spellings and removing unnecessary data from date and numerical fields as well as splitting out data that is combined into single columns and grouping data together.

An additional challenge in this project is that within the timesheet data employees also book time they spend on admin, meetings, annual leave etc. that is not directly attributed to a project and this appears as a separate project within the timesheet booking system. We'll need to find a way of distributing this time across the projects each person works on as their time and costs need to add up to 100% of their time/costs for the report. This gives us the opportunity to look at transforming, calculating and distributing data without changing the underlying data source.

Finally, we'll need to clean, filter and add in additional data from our HR files as well as employee costs to be able to complete our final report.

The report will be run at least monthly and may also need to be updated within the month at short notice so we also show you how to automate theses steps so they can be run on demand at the click of a button!

COURSE CURRICULUM

WHAT YOU'LL BUILD

The images below show what you'll build in this project, click any of the images for a closer look.

ONCE YOU'VE COMPLETED THIS COURSE YOU WILL HAVE

  • Been introduced to Tableau Prep
  • Connected to Excel files in Tableau Prep
  • Reviewed data sources to determine preparation steps
  • Combined data from multiple sources into a single table
  • Used an input step to load data
  • Used a union step to combine data
  • Used a wildcard untion to combine data from multiple sheets
  • Renamed a Tableau Prep step &¬†added a description
  • Connected to a text/csv file in Tableau Prep
  • Used a clean step to remove text from numbers
  • Used a clean step to remove spaces from numbers
  • Changed the data type of a field
  • Edited field names
  • Created a new (duplicated) field
  • Used split values/custom split to separate data into columns
  • Used a custom calculation to convert date in text format into a date field
  • Used the group values feature to remove spelling mistakes and typos and clean and match values
  • Used the group values manually function to group countries into regions
  • Used a join step to add additional data to your table
  • Manually corrected spelling mistakes and typos
  • Used the matches panel to see where there are issues with join or missing/duplicate records
  • Used the review changes panel to see/edit your changes
  • Manually selected fields to join data on
  • Used the summary of join results panel to check data has been correctly joined
  • Used a data aggregation step
  • Used filters to remove unrequired data
  • Used a calculated field to concatenate data
  • Investigated and resolved issues with your data
  • Identified none unique records
  • Used a wildcard match filter to remove records containing a given word or string
  • Used an output step to export your data
  • Seen how Tableau Prep steps can be automated and run with one click when data is refreshed
  • Aggregated data into a pivot table
  • Used joins, aggregations and filters to transform and re-allocate your data
  • Removed fields that aren't needed in your output
  • Re-ordered your columns for ease of use
  • Reviewed data output and export options
  • Reviewed options to overwrite or append data on output
  • Exported data to a .hyper Tableau Extract file
  • Exported data to an Excel file
  • Saved and re-run a Tableau Prep Flow file
  • Connected to a Tableau Extract file
  • Used a fixed Level of Detail calculation in Tableau Prep
  • Connected to your cleaned and prepared data output in Tableau Desktop