Hands-on with SparkR


Course code: 
Time Unit: 

As of June 2015, SparkR is integrated into the main Spark project. In the original version, no Spark MLlib machine learning algorithms were accessible from R, but this is changing rapidly. For example, generalized linear models (glm) are already accessible from R. We will work with the latest Spark version.
In this one-day SparkR course, you will learn how Spark works under the hood (the MapReduce paradigm, lazy evaluation, …) and how to use SparkR.
You will start by setting up a local Spark cluster and accessing it from R.
Next, you will learn basic data transformations in SparkR, written either in R code or in Spark SQL.
Finally, we will use SparkR’s glm, compare it to R’s glm, and implement our own machine learning algorithm.



CHAPTER 1: Introduction to Spark
CHAPTER 2: Really short introduction to R
CHAPTER 3: Starting SparkR
CHAPTER 4: SparkR code
CHAPTER 5: Integrating R code in Spark
CHAPTER 6: Machine learning from scratch in SparkR
CHAPTER 7: Newest features in the latest version
CHAPTER 8: Spark MLlib in SparkR



Previous experience with R is required; some familiarity with Apache Spark is useful but not required.



This course is aimed at R developers who want to use the power of Big Data.
Big Data developers who want to use the power of R are also welcome to follow this course; however, they are not the primary target audience.