Data Engineering with Hadoop

7h 3m
English
Paid

Course description

Big Data is not just a buzzword but a real phenomenon. Every day, companies around the world collect and process vast amounts of data at high speed. Much of this data is unstructured and inconsistent, which makes it nearly impossible to process with traditional methods.

Apache Hadoop is one of the platforms that has proven itself for working with big data. It is an open-source Java framework for storing and processing large volumes of data across clusters of computers using simple programming models. Hadoop is flexible, fast, and affordable, and it is designed to detect and handle failures at the application level.
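The "simple programming model" at Hadoop's core is MapReduce. A map function emits key/value pairs. The framework then sorts and groups those pairs by key. Finally, a reduce function aggregates each group. The following is a minimal, single-process Python sketch of that flow using the classic word-count example. It does not use any Hadoop API, and the function names are illustrative only. On a real cluster, Hadoop runs many mappers and reducers in parallel and performs the shuffle across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lowercased.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate every count collected for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Reducers see keys in sorted order, so sort before reducing.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

With Hadoop Streaming, the same map and reduce logic can be written as Python scripts. Each script reads lines from standard input and writes tab-separated key/value pairs to standard output.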

What You Will Learn

In this course, led by Suyog Nagaokar, you will gain a comprehensive understanding of the Hadoop architecture, its core components, and key ecosystem tools:

  • HDFS
  • YARN
  • MapReduce
  • Hive
  • Sqoop

The course includes theoretical foundations and practical lab exercises. You will learn to:

  • Understand the concept of the Hadoop ecosystem
  • Use basic Hadoop commands
  • Implement solutions based on each Hadoop component to solve real business problems

You will install and configure a complete Hadoop environment on your own computer using the Cloudera QuickStart VM. In the hands-on labs, you will learn to:

  • Move data between MySQL and Hadoop with Sqoop, and store and query it with Hive
  • Write Hive queries to analyze data on Hadoop
  • Work with data clusters using HDFS, MapReduce, and YARN
  • Manage clusters using Hue
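As a rough illustration of how HDFS stores data, the following Python sketch computes how many blocks and block replicas a single file occupies. It assumes the common Hadoop 2.x and later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). A given cluster, or even a single file, may use different values. The function name `hdfs_footprint` is hypothetical. The sketch is arithmetic only and does not contact a cluster.

```python
# Assumed defaults (Hadoop 2.x and later); real clusters may override them.
BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize: 128 MB
REPLICATION = 3                 # dfs.replication


def hdfs_footprint(file_size: int,
                   block_size: int = BLOCK_SIZE,
                   replication: int = REPLICATION) -> tuple[int, int, int]:
    """Return (blocks, block_replicas, raw_bytes) for a file of `file_size` bytes."""
    # Integer ceiling division: a partial last block still counts as one block.
    blocks = (file_size + block_size - 1) // block_size
    # Every block is copied `replication` times onto different DataNodes.
    block_replicas = blocks * replication
    # The last block is not padded, so raw disk usage is just size x replication.
    raw_bytes = file_size * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(hdfs_footprint(one_gib))  # (8, 24, 3221225472)
```

Because HDFS does not pad the final block, a 300 MB file occupies three blocks (128 + 128 + 44 MB). At replication 3, it therefore uses only 900 MB of raw disk.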

Requirements

  • A PC with a 64-bit version of Windows or Linux and internet access
  • At least 8 GB of free (not total) RAM for the hands-on labs (with less, you can still follow the lessons, but not the practical exercises)
  • Basic programming skills, preferably in Python
  • Familiarity with the Linux command line will be a big plus

The course suits both beginners and those who want to deepen their knowledge of Big Data and learn to work with one of the most popular frameworks in the industry.

All Course Lessons (45)

  1. What can you expect from this course? (02:10, demo)
  2. Introduction to Big Data (14:50)
  3. What is Hadoop? Why Hadoop? (05:38)
  4. Hadoop Architecture – Overview (02:39)
  5. Hadoop Architecture – Key services (07:13)
  6. Storage/Processing characteristics (07:51)
  7. Store and process data in HDFS (03:56)
  8. Handling failures - Part 1 (05:10)
  9. Handling failures - Part 2 (07:33)
  10. Rack Awareness (05:59)
  11. Hadoop 1 vs Hadoop 2 (12:51)
  12. Hadoop Ecosystem (03:36)
  13. Vanilla/HDP/CDH/Cloud distributions (10:12)
  14. Install Cloudera Quickstart Docker (07:19)
  15. Hands-on with Linux and Hadoop commands (05:49)
  16. Hive Overview (04:54)
  17. How Hive works (05:57)
  18. Hive query execution flow (04:59)
  19. Creating a Data Warehouse & Loading data (05:10)
  20. Creating a Hive Table (21:19)
  21. Load data from local & HDFS (17:19)
  22. Internal tables vs External tables (17:20)
  23. Partitioning & Bucketing: Cardinality concept (16:24)
  24. Static Partitioning - Lab (14:58)
  25. Dynamic Partitioning - Lab (13:55)
  26. Bucketing - Lab (22:32)
  27. Storing Hive query output (11:34)
  28. Hive SerDe (14:26)
  29. ORC File Format (14:10)
  30. Sqoop overview (03:52)
  31. Sqoop list-databases and list-tables (06:31)
  32. Sqoop Eval (03:59)
  33. Import RDBMS table with Sqoop (11:40)
  34. Handling parallelism in Sqoop (09:02)
  35. Import table without primary key (11:01)
  36. Custom Query for Sqoop Import (08:48)
  37. Incremental Sqoop Import - Append (09:52)
  38. Incremental Sqoop Import - Last Modified (13:55)
  39. Sqoop Job (08:01)
  40. Sqoop Import to a Hive table (10:59)
  41. Sqoop Import all tables - Part 1 (06:20)
  42. Sqoop Import all tables - Part 2 (14:03)
  43. Sqoop Export (06:14)
  44. Export Hive table (04:36)
  45. Export with Staging table (06:24)

Similar courses

TensorFlow Developer Certificate in 2023: Zero to Mastery

Sources: zerotomastery.io
Learn TensorFlow. Pass the TensorFlow Developer Certificate Exam. Get Hired as a TensorFlow developer. This course will take you from a TensorFlow beginner to b
62 hours 43 minutes 54 seconds
Streaming with Kafka & Spark

Sources: Andreas Kretz
This course is a comprehensive project with a full cycle of real-time data processing. You will work with data from an online store, including invoices...
2 hours 46 minutes 25 seconds
Apache Kafka Fundamentals

Sources: Andreas Kretz
In this course, you will acquire the basic knowledge necessary for confidently starting to work with Apache Kafka. You will learn how to set up a message...
1 hour 4 minutes 52 seconds
The Data Bootcamp: Transform your Data using dbt™

Sources: udemy
Are you looking for a cutting-edge way to extract, load, and transform your data? Do you want to know more about dbt™ and how to use it? Well, this is the course
4 hours 10 minutes 51 seconds
Data Analysis for Beginners: Excel & Pivot Tables

Sources: zerotomastery.io
This short course on data analysis in Excel is perfect for beginners who want to acquire skills in analyzing structured data using two of Excel's most...
2 hours 10 minutes 21 seconds