



Applied Big Data Spark with R


Due to COVID-19, our training courses are taught in an online classroom.

Receive in-depth knowledge from industry professionals, test your skills with hands-on assignments & demos, and get access to valuable resources and tools.

This course is an introduction to building machine learning models at scale, using R as the programming language. The lessons focus on Spark concepts such as DataFrames and Spark SQL, and on advanced machine learning algorithms using the sparklyr library. After this course, you will be able to design models on very large datasets, such as a churn model based on customer data, an income prediction model based on people data, a classification model for detecting spam in SMS text messages, and a joke recommendation system using Alternating Least Squares (ALS). This course is ideal for data scientists who want to step into the world of Big Data. Strong experience with R and general machine learning knowledge are required.

Are you interested? Contact us and we will get in touch with you.

 

Get in touch for more information

Fill in the form and we will contact you about the Big Data Spark with R training:


About the training & classes

The Applied Big Data Spark with R training is split over 2 days. A detailed description of each class follows:

Spark with R: I

This training teaches Apache Spark through its R API. It is part 1 of a series of 2 courses. In this first training, you will be introduced to basic Spark concepts, including Spark’s DataFrame API and Spark SQL. These APIs are optimized for structured, tabular data and allow SQL access to very large datasets.

During the practical session, participants will be introduced to the R API and then analyze some simple examples using Spark SQL. The training includes theory, demos, and hands-on exercises.

After this training, you will have gained knowledge of:

  • Spark’s DataFrame API
  • Spark SQL
  • The Parquet storage format

Spark with R: II

This training teaches Apache Spark through its R API. It is part 2 of a series of 2 courses.

In this second training, you will be introduced to machine learning concepts with Spark’s MLlib API and learn how to apply them at scale. During the practical sessions, participants will work on a churn model based on customer data, an income prediction model based on people data, a classification model for detecting spam in SMS text messages, and a joke recommendation system using ALS.

The training includes theory, demos, and hands-on exercises.

After this training, you will have gained knowledge of:

  • Apache Arrow and UDFs with sparklyr
  • Basic machine learning concepts
  • Spark MLlib
  • Pipelines in Spark
  • Using Spark and machine learning for predictions