



Applied Big Data Spark with Python


Due to the COVID-19 pandemic, our training courses are taught in an online classroom.

Gain in-depth knowledge from industry professionals, test your skills with hands-on assignments & demos, and get access to valuable resources and tools.

This course is an introduction to building machine learning models at scale, using Python as the programming language. The lessons focus on Spark concepts such as resilient distributed datasets (RDDs), DataFrames, and Spark SQL, and on advanced machine learning algorithms from the Spark MLlib library. After this course, you will be able to build models on very large datasets, such as a recommender system based on Alternating Least Squares (ALS) and a model that predicts airplane delays. This course is ideal for data scientists who want to step into the world of Big Data and master Spark. To follow the program, you need strong experience with Python and general knowledge of machine learning.

Are you interested? Contact us and we will get in touch with you.

 


About the training & classes

The Applied Big Data Spark with Python training is split into three days. Each class is described below:

 
Spark: I

In this first Apache Spark class, we introduce basic Spark concepts and the Resilient Distributed Dataset (RDD) API, which is at the core of Apache Spark.

During the practical session, participants use the RDD API from Python to analyze a MovieLens dataset. The training includes theory and hands-on exercises.

After this training you will have gained knowledge about:

  • Spark concepts, roots and history
  • How Spark relates to Hadoop
  • How Spark solves challenges in concurrent and parallel programming
  • Spark RDDs and the RDD API
  • Spark deploy modes

Spark: II

In the second Apache Spark class, you will be introduced to Spark’s DataFrame API and Spark SQL. These APIs are optimized for structured (tabular) data and provide SQL access to very large datasets. During the practical session, participants are introduced to the APIs and then analyze the MovieLens dataset using Spark SQL.

The training includes theory, demos, and hands-on exercises.

After this training you will have gained knowledge about:

  • Spark’s DataFrame API
  • Spark SQL
  • The Parquet storage format

Spark: III

In the third Apache Spark class, you will be introduced to machine learning concepts with Spark’s MLlib API and learn how to apply them at scale. During the practical sessions, participants work on a recommender system and on predicting airplane delays.

The training includes theory, demos, and hands-on exercises.

After this training you will have gained knowledge about:

  • Basic machine learning concepts
  • Spark MLlib
  • Pipelines in Spark
  • Building a basic Recommender System in Spark
  • Using Spark and machine learning for predictions