What you’ll learn
- Machine Learning Fundamentals
- Data Preparation and Preprocessing for Machine Learning
- Machine Learning Models
- Recommendation Systems
Requirements
- Basic understanding of the Hadoop ecosystem
- Some prior programming or scripting experience
Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data.
In this course, discover how to work with this powerful platform for machine learning. I will discuss MLlib, the Spark machine learning library, which provides tools for data scientists and analysts who would rather find solutions to business problems than code, test, and maintain their own machine learning libraries. I will show you how to use DataFrames to organize and structure data, and I will cover data preparation and the most commonly used types of machine learning algorithms: clustering, classification, regression, and recommendations.
Topics include:
- Machine learning workflows
- Organizing data in DataFrames
- Preprocessing and data preparation steps for machine learning
- Clustering data
- Classification algorithms
- Regression methods available in Spark MLlib
- Common approaches to designing recommendation systems
Who this course is for:
- If you have some software development background and want to learn one of the most popular technologies in big data analysis, this course is for you.
- If your data science job involves, or will involve, processing large amounts of data, you need to know about Spark.
- If you’re training for a new career in data science, machine learning, or big data, Spark is an important skill to have.