4.05 out of 5
54 reviews on Udemy

A Big Data Hadoop and Spark project for absolute beginners

Hadoop, Spark, Python, Scala, Dataproc, AWS S3 Data Lake, Glue, Athena and Machine Learning through a real-world use case
Instructor:
FutureX Skill
4,447 students enrolled
Learn Big Data, Hadoop and Spark from scratch using Python and Scala. You will also learn how to use free cloud tools to get started with Hadoop and Spark programming in minutes. Additionally, you will find two bonus projects: an AWS data lake solution and a Machine Learning classification model.

A bank is launching a new credit card and wants to identify prospects it can target in its marketing campaign.

It has received prospect data from various internal and third-party sources. The data has various issues, such as missing or unknown values in certain fields, and must be cleansed before any kind of analysis can be done.

Since the data volume is huge, with billions of records, the bank has asked you to use Big Data Hadoop and Spark technology to cleanse, transform and analyze it.

What you will learn:

  • Big Data and Hadoop concepts

  • How to create a free Hadoop and Spark cluster using Google Dataproc

  • Hadoop hands-on – HDFS, Hive

  • Why Spark was needed

  • Python basics

  • PySpark RDD – hands-on

  • PySpark SQL, DataFrame – hands-on

  • Project work using PySpark and Hive

  • Scala basics

  • Spark Scala DataFrame

  • Project work using Spark Scala

  • Google Colab environment

  • Bonus project – Applying Spark transformations to data stored in AWS S3 using Glue and viewing the data using Athena

  • Bonus project – Build your first Machine Learning model using Python and scikit-learn to predict whether a customer will buy or not

Prerequisites:

  • Some basic programming skills

  • Some knowledge of SQL queries

Introduction

1. Introduction

Big Data Hadoop concepts

1. Big Data concepts
2. Hadoop concepts

Hadoop - Hands-On

1. Creating a free Hadoop and Spark cluster using Google Dataproc
2. Reading HDFS data using Hive

Spark concepts and hands-on

1. Spark concepts
2. Installing Spark on Google Colab
3. Python basics
4. PySpark RDD
5. PySpark - Spark SQL and DataFrame
6. Running PySpark on a Hadoop cluster

Project - Bank prospects marketing data cleansing using Spark

1. Project - Bank prospects marketing data transformation using Hadoop and Spark
2. Rapid Revision - Big Data, Hadoop and Spark concepts

Running the project in Scala

1. Scala basics
2. Spark SQL DataFrame using Scala
3. Bank prospects marketing project in Scala

Advanced Hive

1. Fast queries with Hive Partitioning
2. Fast queries with Hive Bucketing

Advanced Spark

1. Advanced Spark datasets
2. User Defined Function (UDF)
3. Joins - Left, Right, Inner, Outer

Bonus - Bank prospects data transformation using AWS S3, Glue and Athena

1. Introduction to AWS data lake use case
   Learn the advantages of a serverless data lake solution over a Hadoop platform.
2. AWS data lake - S3, Glue and Athena introduction
3. Create a data lake on AWS S3
4. AWS Glue crawler and AWS Athena query tool
5. ETL transformation using AWS Glue
6. Triggering an AWS Glue job with a serverless AWS Lambda function
7. Project - Bank prospects data transformation using S3, Glue & Athena services
   Run the bank transformation code using AWS Glue. Store the prospects data in an S3 bucket and apply the transformation using the same code you executed in the Colab and Dataproc environments.

Bonus project - Build your first Machine Learning Model

1. Use case introduction
2. AI/ML introduction & Google Colab environment
3. NumPy, Pandas, Matplotlib
4. Anaconda Spyder
5. Build your first Machine Learning Model
6. Feature Engineering

Detailed Rating

5 stars: 18
4 stars: 20
3 stars: 10
2 stars: 4
1 star: 2

30-Day Money-Back Guarantee

Includes

5 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion