Data Preparation & Cleaning for ML

Duration: 3h 7m 23s
Language: English
Access: Paid

Course description

Have you ever heard the expression "data preparation and cleaning"? It is arguably the most important part of the entire machine learning process. Real-world data is often "messy": it can contain errors, omissions, duplicates, and outliers, all of which distort results and degrade model performance. That is why data must be cleaned and made ready before analysis.
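Problems like these can usually be spotted before any modelling starts. Below is a minimal pandas sketch. It assumes a small hypothetical DataFrame, and `df`, `audit`, and the column names are illustrative rather than taken from the course. It counts missing cells and duplicate rows, and it flags outliers with the common 1.5 * IQR rule:

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with the problems described above:
# a missing value, a duplicated row, and an implausible age (230).
df = pd.DataFrame({
    "age":    [34, 29, np.nan, 41, 29, 230],
    "income": [52_000, 48_000, 61_000, 75_000, 48_000, 58_000],
    "city":   ["Paris", "Berlin", "Paris", "Madrid", "Berlin", "Berlin"],
})

def audit(frame: pd.DataFrame) -> dict:
    """Count missing cells per column, duplicate rows, and IQR outliers per numeric column."""
    report = {
        "missing": frame.isna().sum().to_dict(),
        "duplicates": int(frame.duplicated().sum()),
        "outliers": {},
    }
    for col in frame.select_dtypes(include="number").columns:
        # quantile() skips NaN, so missing values do not affect the fences.
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outside = (frame[col] < lower) | (frame[col] > upper)
        report["outliers"][col] = int(outside.sum())
    return report

print(audit(df))
```

The 1.5 * IQR threshold is only a convention. Whether a flagged value is an error to fix or a genuine extreme to keep depends on the domain.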
Simply put, data preparation and cleaning are the practical response to the principle of "garbage in, garbage out." Identifying and correcting errors, removing corrupted and duplicate records, filling in missing values, and handling outliers are all essential preparation steps. The process can be labor-intensive, but data quality largely determines whether a project succeeds: even the most advanced machine learning algorithms cannot learn reliable patterns from unstructured or "dirty" data. To help you approach your ML projects with confidence, this mini-course covers the essentials of data preparation:
- We'll start with an 8-step checklist to keep in mind when launching any project.
- We'll delve into the theory, including missing values, outliers, feature selection, and more.
- We'll move on to practice: for each segment, you'll complete tasks in Python, working with real data.
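The steps listed above can be chained into a single preprocessing routine. The sketch below shows one possible arrangement using pandas and scikit-learn. It is not the course's own code. The toy data, the assumed 0-120 plausible age range, and the choice of median imputation, IQR clipping, one-hot encoding, and standard scaling are all illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: a duplicate row, missing values and an impossible age.
raw = pd.DataFrame({
    "age":    [34, 29, np.nan, 41, 29, 230],
    "income": [52_000, 48_000, 61_000, 75_000, 48_000, np.nan],
    "city":   ["Paris", "Berlin", np.nan, "Madrid", "Berlin", "Berlin"],
})

# 1. Remove exact duplicate records.
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Correct errors: treat ages outside an assumed 0-120 range as missing.
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan

# 3. Handle outliers by clipping income to the 1.5 * IQR fences (one option of several).
q1, q3 = clean["income"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["income"] = clean["income"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 4. Impute, encode and scale in one reusable transformer.
preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardise.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "income"]),
    # Categorical column: fill gaps with the most frequent value, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city"]),
])

X = preprocess.fit_transform(clean)
print(X.shape)  # 5 rows; 2 scaled numeric + 3 one-hot city columns
```

Keeping imputation, encoding, and scaling inside a `ColumnTransformer` means they can be fitted on training data only and then reused on test data, so test-set statistics do not leak into the model.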


All Course Lessons (18)

#   Lesson                                     Duration
1   Introduction (demo)                        01:02
2   ML Prep Checklist                          07:18
3   Theory: Missing Values                     08:48
4   Missing Values with Pandas                 12:43
5   Missing Values with SimpleImputer          11:06
6   Missing Values with KNNImputer             11:50
7   Theory: Categorical Variables              08:19
8   Categorical Variables: One-Hot Encoding    10:51
9   Theory: Outliers                           08:56
10  Outliers: Hands-On                         13:35
11  Theory: Feature Scaling                    09:20
12  Feature Scaling: Hands-On                  08:19
13  Theory: Feature Selection                  12:05
14  Practical: Correlation Matrix              04:27
15  Practical: Univariate Testing              17:54
16  Practical: RFECV                           13:49
17  Theory: Model Validation                   08:54
18  Practical: Model Validation                18:07
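Lessons 13 to 18 cover feature selection, including RFECV, and model validation. Here is a rough sketch of how the two can be combined in scikit-learn. It is not the course's setup. The synthetic `make_classification` data and the logistic-regression base estimator are assumptions. Feature selection is placed inside the pipeline that gets cross-validated:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 10 features, 4 informative, 2 redundant.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# RFECV repeatedly drops the weakest feature (smallest |coefficient|) and keeps
# the number of features with the best mean cross-validated accuracy.
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=cv, scoring="accuracy")

# Because selection sits inside the pipeline, it is re-fitted on each outer
# training fold, so the held-out fold never influences which features are kept.
model = make_pipeline(StandardScaler(), selector)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data to inspect the final feature count.
model.fit(X, y)
print("features kept:", model[-1].n_features_)
```

Selecting features on the full dataset and only then cross-validating would give an optimistically biased score. Nesting the selection inside the validated pipeline avoids that.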


Books

Note to students from Andrew Jones


Similar courses

Machine Learning & Containers on AWS
Source: Andreas Kretz
In this practical course, you will learn how to build a complete data pipeline on the AWS platform - from obtaining data from the Twitter API to analysis, stora
1 hour 33 minutes 34 seconds

Learn to Build Machine Learning Systems That Don't Suck
Source: Santiago Valdarrama
A live, interactive course that will teach you from scratch how to design, create, and implement ready-to-use ML systems - no fluff and academic...
32 hours 6 minutes 40 seconds

Build a Simple Neural Network & Learn Backpropagation
Source: zerotomastery.io
Learn backpropagation and gradient descent by writing a simple neural network from scratch in Python - without libraries, just the basics. Ideal...
4 hours 34 minutes 9 seconds

Machine Learning with Spark ML
Source: zerotomastery.io
Learn to use Spark ML for creating scalable machine learning solutions. Practice with regression, classification, feature engineering...
2 hours 7 minutes 29 seconds

Predictive Analytics & Machine Learning
Source: LunarTech
Predictive analytics and machine learning is a course that will help you master key concepts and practical skills in data forecasting...
55 minutes 15 seconds