CourseFlix

Data Engineering with Hadoop

7h 3m
English
Paid

Big Data is not just a buzzword, but a real phenomenon. Every day, companies around the world collect and process vast amounts of data at high speed. This data is often unstructured and inconsistent, making it nearly impossible to handle with traditional methods. One platform that has proven itself for working with big data is Apache Hadoop: an open-source Java framework that stores and processes large volumes of data across clusters of machines using simple programming models. Hadoop is a flexible, fast, and cost-effective architecture capable of detecting and handling failures at the application level.

Course Overview

In this course led by Suyog Nagaokar, you will gain a comprehensive understanding of the Hadoop architecture and its components:

  • HDFS (Hadoop Distributed File System) - for large-scale storage
  • YARN (Yet Another Resource Negotiator) - for resource management
  • MapReduce - for data processing
  • Hive - for SQL-like querying
  • Sqoop - for transferring data between Hadoop and relational databases

The course includes both theoretical foundations and practical lab exercises. By the end of the course, you will be able to:

  • Grasp the essential elements of the Hadoop ecosystem
  • Execute basic Hadoop commands
  • Create solutions using each Hadoop component for tackling real-world business challenges
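To give a flavor of the "basic Hadoop commands" the early lessons cover, here is a short illustrative sketch of common HDFS shell invocations. The paths and file names are hypothetical examples, and running them assumes a working Hadoop installation (such as the Cloudera Quickstart VM used in this course):

```shell
# List the contents of an HDFS directory (path is an example)
hdfs dfs -ls /user/cloudera

# Copy a local file into HDFS
hdfs dfs -put sales.csv /user/cloudera/sales.csv

# Inspect the first lines of a file stored in HDFS
hdfs dfs -cat /user/cloudera/sales.csv | head

# Check filesystem health, block placement, and replication
hdfs fsck /user/cloudera -files -blocks
```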

Practical Application

You will install and configure a full Hadoop environment using the Cloudera Quickstart VM directly on your computer. In practice, you will learn to:

  • Utilize Sqoop, Hive, and MySQL for data storage and querying
  • Craft and execute Hive queries for data analysis on Hadoop
  • Manage data clusters efficiently using HDFS, MapReduce, and YARN
  • Operate clusters with the Hue interface
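As a hedged sketch of the Sqoop-plus-Hive workflow practiced in the labs: a typical session imports a relational table into HDFS with Sqoop, then queries it with Hive. The connection string, credentials, and table names below are hypothetical placeholders, and the commands assume a configured Hadoop environment with MySQL, Sqoop, and Hive available:

```shell
# Import a MySQL table into HDFS with Sqoop (connection details are examples)
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username cloudera --password cloudera \
  --table orders \
  --target-dir /user/cloudera/orders \
  --num-mappers 2

# Run an aggregate query with Hive over a table assumed to be
# defined on top of the imported data
hive -e "SELECT order_status, COUNT(*) FROM orders GROUP BY order_status;"
```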

Course Requirements

  • A PC with a 64-bit version of Windows or Linux and internet access
  • At least 8 GB of free (not total) RAM for the practical tasks; with less, you can still follow the theory but not complete the hands-on labs
  • Basic programming skills, preferably with Python
  • Familiarity with the Linux command line is highly advantageous

This course is ideal for both beginners and those who wish to expand their knowledge in Big Data and master one of the industry's most popular frameworks.

Additional Resources

https://github.com/team-data-science/Hadoop-Suyog-Nagaokar

About the Author: Suyog Nagaokar

Suyog Nagaokar is a software engineer and educator focused on the Hadoop / big-data ecosystem — the foundational platform for processing data at scale that anchored a generation of data-engineering work.

His CourseFlix listing carries Data Engineering with Hadoop — a structured treatment of the Hadoop ecosystem: HDFS, MapReduce, YARN, the Hive / Pig / Spark layers on top, and the operational patterns for running Hadoop clusters in production.

Material is paid and aimed at data engineers picking up Hadoop for legacy or current production systems. For broader content, see CourseFlix's Data processing and analysis category page.

Watch Online (45 lessons)

You can watch up to 10 minutes for free. Subscribe to unlock all 45 lessons in this course and access 10,000+ hours of premium content across all courses.

All Course Lessons (45)

  1. What can you expect from this course? (Demo) · 02:10
  2. Introduction to Big Data · 14:50
  3. What is Hadoop? Why Hadoop? · 05:38
  4. Hadoop Architecture – Overview · 02:39
  5. Hadoop Architecture – Key services · 07:13
  6. Storage/Processing characteristics · 07:51
  7. Store and process data in HDFS · 03:56
  8. Handling failures - Part 1 · 05:10
  9. Handling failures - Part 2 · 07:33
  10. Rack Awareness · 05:59
  11. Hadoop 1 v/s Hadoop 2 · 12:51
  12. Hadoop Ecosystem · 03:36
  13. Vanilla/HDP/CDH/Cloud distributions · 10:12
  14. Install Cloudera Quickstart Docker · 07:19
  15. Hands-on with Linux and Hadoop commands · 05:49
  16. Hive Overview · 04:54
  17. How Hive works · 05:57
  18. Hive query execution flow · 04:59
  19. Creating a Data Warehouse & Loading data · 05:10
  20. Creating a Hive Table · 21:19
  21. Load data from local & HDFS · 17:19
  22. Internal tables vs External tables · 17:20
  23. Partitioning & Bucketing (Cardinality concept) · 16:24
  24. Static Partitioning - Lab · 14:58
  25. Dynamic Partitioning - Lab · 13:55
  26. Bucketing - Lab · 22:32
  27. Storing Hive query output · 11:34
  28. Hive SerDe · 14:26
  29. ORC File Format · 14:10
  30. Sqoop overview · 03:52
  31. Sqoop list-databases and list-tables · 06:31
  32. Sqoop Eval · 03:59
  33. Import RDBMS table with Sqoop · 11:40
  34. Handling parallelism in Sqoop · 09:02
  35. Import table without primary key · 11:01
  36. Custom Query for Sqoop Import · 08:48
  37. Incremental Sqoop Import - Append · 09:52
  38. Incremental Sqoop Import - Last Modified · 13:55
  39. Sqoop Job · 08:01
  40. Sqoop Import to a Hive table · 10:59
  41. Sqoop Import all tables - Part 1 · 06:20
  42. Sqoop Import all tables - Part 2 · 14:03
  43. Sqoop Export · 06:14
  44. Export Hive table · 04:36
  45. Export with Staging table · 06:24
Unlock unlimited learning

Get instant access to all 45 lessons in this course, plus thousands of other premium courses. One subscription, unlimited knowledge.

Learn more about subscription


Related courses

  • Data Engineering on Azure thumbnail

    Data Engineering on Azure

    Sources: Kristijan Bakarić
    Microsoft Azure is a cloud platform offering more than 200 products and services for data storage, management, virtual machine deployment, and...
    1 hour 20 minutes 57 seconds
  • Getting Started with Embedded AI | Edge AI thumbnail

    Getting Started with Embedded AI | Edge AI

    Sources: Udemy
    Nowadays, you may have heard of many keywords like Embedded AI, Embedded ML, or Edge AI; the meaning behind them is the same, i.e. to make an AI algorithm or model
    3 hours 33 minutes 42 seconds · Rating: 5/5
  • Data Structures and Algorithmic Trading: Machine Learning thumbnail

    Data Structures and Algorithmic Trading: Machine Learning

    Sources: Udemy
    Data Structures and Algorithmic trading is a method of executing orders using automated pre-programmed trading instructions over time. They were developed so th
    2 hours 20 minutes 32 seconds · Rating: 5/5

Frequently asked questions

What is Data Engineering with Hadoop about?
Big Data is not just a buzzword, but a real phenomenon. Every day, companies around the world collect and process vast amounts of data at high speeds. This data is often unstructured and inconsistent, making it nearly impossible to process…
Who teaches Data Engineering with Hadoop?
Data Engineering with Hadoop is taught by Suyog Nagaokar. You can find more courses by this instructor on the corresponding source page.
How long is Data Engineering with Hadoop?
Data Engineering with Hadoop contains 45 lessons with a total runtime of 7 hours 3 minutes. All lessons are available to watch online at your own pace.
Is Data Engineering with Hadoop free to watch?
Data Engineering with Hadoop is part of CourseFlix's premium catalog. A CourseFlix subscription unlocks the full video player; the course description, table of contents, and preview information are available to everyone.
Where can I watch Data Engineering with Hadoop online?
Data Engineering with Hadoop is available to watch online on CourseFlix at https://courseflix.net/course/data-engineering-with-hadoop. The page hosts every lesson with the integrated video player; no download is required.