Data Engineering with Hadoop

7h 3m
English
Paid

Big Data is not just a buzzword, but a real phenomenon. Every day, companies around the world collect and process vast amounts of data at high speeds. This data is often unstructured and inconsistent, making it nearly impossible to process using traditional methods.

One platform that has proven itself for big data work is Apache Hadoop, an open-source framework written in Java that stores and processes large volumes of data across clusters of machines using simple programming models. Hadoop is designed to be flexible, fast, and affordable, and it detects and handles failures at the application level.
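To give a feel for the "simple programming models" mentioned above, here is a minimal sketch of the map, shuffle, and reduce steps behind MapReduce, written in plain Python and run in a single process. It is an illustration only, not the course's lab code and not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would distribute these steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The point of the model is that the map and reduce functions never share state, which is what lets a framework run many copies of them in parallel and rerun a failed piece on another machine.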

What You Will Learn

In this course led by Suyog Nagaokar, you will gain a comprehensive understanding of the Hadoop architecture and its components:

  • HDFS
  • YARN
  • MapReduce
  • Hive
  • Sqoop

The course includes theoretical foundations and practical lab exercises. You will learn to:

  • Understand the concept of the Hadoop ecosystem
  • Use basic Hadoop commands
  • Implement solutions based on each Hadoop component to solve real business problems

You will install and configure a full Hadoop environment using Cloudera Quickstart VM right on your computer. In practice, you will learn to:

  • Store and query data using Sqoop, Hive, and MySQL
  • Write Hive queries to analyze data on Hadoop
  • Work with data clusters using HDFS, MapReduce, and YARN
  • Manage clusters using Hue

Requirements

  • A PC running a 64-bit version of Windows or Linux, with internet access
  • At least 8 GB of free (not total) RAM for the practical tasks; with less, you can still follow the lessons but will not be able to do the labs
  • Basic programming skills, preferably in Python
  • Familiarity with the Linux command line will be a big plus

The course is suitable for both beginners and those who want to deepen their knowledge in Big Data and learn to work with one of the most popular frameworks in the industry.

Watch Online Data Engineering with Hadoop

# Title Duration
1 What can you expect from this course? 02:10
2 Introduction to Big Data 14:50
3 What is Hadoop? Why Hadoop? 05:38
4 Hadoop Architecture – Overview 02:39
5 Hadoop Architecture – Key services 07:13
6 Storage/Processing characteristics 07:51
7 Store and process data in HDFS 03:56
8 Handling failures - Part 1 05:10
9 Handling failures - Part 2 07:33
10 Rack Awareness 05:59
11 Hadoop 1 vs Hadoop 2 12:51
12 Hadoop Ecosystem 03:36
13 Vanilla/HDP/CDH/Cloud distributions 10:12
14 Install Cloudera Quickstart Docker 07:19
15 Hands-on with Linux and Hadoop commands 05:49
16 Hive Overview 04:54
17 How Hive works 05:57
18 Hive query execution flow 04:59
19 Creating a Data Warehouse & Loading data 05:10
20 Creating a Hive Table 21:19
21 Load data from local & HDFS 17:19
22 Internal tables vs External tables 17:20
23 Partitioning & Bucketing (Cardinality concept) 16:24
24 Static Partitioning - Lab 14:58
25 Dynamic Partitioning - Lab 13:55
26 Bucketing - Lab 22:32
27 Storing Hive query output 11:34
28 Hive SerDe 14:26
29 ORC File Format 14:10
30 Sqoop overview 03:52
31 Sqoop list-databases and list-tables 06:31
32 Sqoop Eval 03:59
33 Import RDBMS table with Sqoop 11:40
34 Handling parallelism in Sqoop 09:02
35 Import table without primary key 11:01
36 Custom Query for Sqoop Import 08:48
37 Incremental Sqoop Import - Append 09:52
38 Incremental Sqoop Import - Last Modified 13:55
39 Sqoop Job 08:01
40 Sqoop Import to a Hive table 10:59
41 Sqoop Import all tables - Part 1 06:20
42 Sqoop Import all tables - Part 2 14:03
43 Sqoop Export 06:14
44 Export Hive table 04:36
45 Export with Staging table 06:24
