Data Engineering with Hadoop
Big Data is not just a buzzword but a real phenomenon: every day, companies around the world collect and process vast amounts of data at high speed. Much of this data is unstructured and inconsistent, which makes it impractical to handle with traditional tools.
One platform that has proven itself for working with big data is Apache Hadoop, an open-source Java framework for storing and processing large volumes of data across clusters using simple programming models. Hadoop is a flexible, fast, and cost-effective architecture designed to detect and handle failures at the application level.
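To make the "simple programming models" concrete, below is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain Python scripts reading stdin and writing stdout. This is an illustrative sketch only, not course material; the file paths and the streaming-jar invocation shown in the comments are assumptions about a typical setup.

```python
#!/usr/bin/env python3
"""Minimal word count in the Hadoop Streaming style: the mapper and reducer
are ordinary scripts that read lines from stdin and write tab-separated
key/value pairs to stdout. Paths and jar names below are illustrative only."""
import sys
from itertools import groupby


def mapper():
    # Emit "<word>\t1" for every word in the input split.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word.lower()}\t1")


def reducer():
    # Hadoop sorts mapper output by key, so equal words arrive grouped together.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")


if __name__ == "__main__":
    # Run locally as:  cat input.txt | wordcount.py map | sort | wordcount.py reduce
    # On a cluster the same script would be passed to the hadoop-streaming jar, e.g.:
    #   hadoop jar hadoop-streaming.jar -mapper "wordcount.py map" \
    #       -reducer "wordcount.py reduce" -input /data/in -output /data/out
    mapper() if sys.argv[1:2] == ["map"] else reducer()
```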
What You Will Learn
In this course led by Suyog Nagaokar, you will gain a comprehensive understanding of the Hadoop architecture and its components:
- HDFS
- YARN
- MapReduce
- Hive
- Sqoop
The course includes theoretical foundations and practical lab exercises. You will learn to:
- Understand the concept of the Hadoop ecosystem
- Use basic Hadoop commands (a short HDFS command sketch follows this list)
- Implement solutions based on each Hadoop component to solve real business problems
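As a taste of those basic commands, here is a minimal sketch that drives a few standard HDFS shell commands (`hdfs dfs -mkdir`, `-put`, `-ls`, `-cat`) from Python's subprocess module. The `/user/cloudera` paths and the sales.csv file are hypothetical and assume a running Hadoop environment such as the Cloudera Quickstart VM.

```python
import subprocess

# A few standard HDFS shell commands, driven from Python for convenience.
# The /user/cloudera paths and sales.csv are illustrative only and assume
# a running Hadoop environment (e.g. the Cloudera Quickstart VM).

def hdfs(*args):
    """Run `hdfs dfs <args>` and return its stdout as text."""
    result = subprocess.run(["hdfs", "dfs", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

hdfs("-mkdir", "-p", "/user/cloudera/demo")              # create a directory in HDFS
hdfs("-put", "-f", "sales.csv", "/user/cloudera/demo/")  # copy a local file into HDFS
print(hdfs("-ls", "/user/cloudera/demo"))                # list the directory
print(hdfs("-cat", "/user/cloudera/demo/sales.csv"))     # read the file back
```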
You will install and configure a full Hadoop environment on your own computer using the Cloudera Quickstart VM. In practice, you will learn to:
- Store and query data using Sqoop, Hive, and MySQL
- Write Hive queries to analyze data on Hadoop (see the query sketch after this list)
- Work with data clusters using HDFS, MapReduce, and YARN
- Manage clusters using Hue
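To give a feel for querying data on Hadoop from code, here is a minimal sketch that runs a HiveQL aggregation from Python using the third-party PyHive client (not part of the course toolchain), assuming HiveServer2 is reachable on its default port 10000. The `orders` table and its columns are hypothetical examples of data you might previously have loaded with Sqoop or Hive.

```python
# Minimal HiveQL query from Python. Assumes the third-party PyHive package
# (pip install "pyhive[hive]") and HiveServer2 reachable on its default port;
# the `orders` table and column names are hypothetical examples.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000,
                       username="cloudera", database="default")
cursor = conn.cursor()

# A typical analytical query: order counts per status.
cursor.execute("""
    SELECT order_status, COUNT(*) AS cnt
    FROM orders
    GROUP BY order_status
    ORDER BY cnt DESC
""")

for status, cnt in cursor.fetchall():
    print(f"{status}\t{cnt}")

cursor.close()
conn.close()
```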
Requirements
- A PC with a 64-bit version of Windows or Linux and internet access
- At least 8 GB of free (not total) RAM for the practical tasks (with less RAM you can still follow the training, but without the hands-on practice)
- Basic programming skills, preferably in Python
- Familiarity with the Linux command line will be a big plus
The course is suitable both for beginners and for those who want to deepen their knowledge of Big Data and learn to work with one of the most popular frameworks in the industry.
Watch Online: Data Engineering with Hadoop
| # | Title | Duration |
|---|---|---|
1 | What can you expect from this course? | 02:10 |
2 | Introduction to Big Data | 14:50 |
3 | What is Hadoop? Why Hadoop? | 05:38 |
4 | Hadoop Architecture – Overview | 02:39 |
5 | Hadoop Architecture – Key services | 07:13 |
6 | Storage/Processing characteristics | 07:51 |
7 | Store and process data in HDFS | 03:56 |
8 | Handling failures - Part 1 | 05:10 |
9 | Handling failures - Part 2 | 07:33 |
10 | Rack Awareness | 05:59 |
11 | Hadoop 1 v/s Hadoop 2 | 12:51 |
12 | Hadoop Ecosystem | 03:36 |
13 | Vanilla/HDP/CDH/Cloud distributions | 10:12 |
14 | Install Cloudera Quickstart Docker | 07:19 |
15 | Hands-on with Linux and Hadoop commands | 05:49 |
16 | Hive Overview | 04:54 |
17 | How Hive works | 05:57 |
18 | Hive query execution flow | 04:59 |
19 | Creating a Data Warehouse & Loading data | 05:10 |
20 | Creating a Hive Table | 21:19 |
21 | Load data from local & HDFS | 17:19 |
22 | Internal tables vs External tables | 17:20 |
23 | Partitioning & Bucketing (Cardinality concept) | 16:24 |
24 | Static Partitioning - Lab | 14:58 |
25 | Dynamic Partitioning - Lab | 13:55 |
26 | Bucketing - Lab | 22:32 |
27 | Storing Hive query output | 11:34 |
28 | Hive SerDe | 14:26 |
29 | ORC File Format | 14:10 |
30 | Sqoop overview | 03:52 |
31 | Sqoop list-databases and list-tables | 06:31 |
32 | Sqoop Eval | 03:59 |
33 | Import RDBMS table with Sqoop | 11:40 |
34 | Handling parallelism in Sqoop | 09:02 |
35 | Import table without primary key | 11:01 |
36 | Custom Query for Sqoop Import | 08:48 |
37 | Incremental Sqoop Import - Append | 09:52 |
38 | Incremental Sqoop Import - Last Modified | 13:55 |
39 | Sqoop Job | 08:01 |
40 | Sqoop Import to a Hive table | 10:59 |
41 | Sqoop Import all tables - Part 1 | 06:20 |
42 | Sqoop Import all tables - Part 2 | 14:03 |
43 | Sqoop Export | 06:14 |
44 | Export Hive table | 04:36 |
45 | Export with Staging table | 06:24 |