Data science: the process (use cases in R)




This course gives an overview of a typical data science process, from collecting your data through building prediction models to presenting and integrating your final results.
The course is the second part of our data science track.
The data science process is explained from the ground up. The math behind the models is left out as much as possible, since it is the topic of the first course of the track, “The math behind data science”.

The course follows the five stages of the data science process:

- Collect (including cleaning and sampling)
- Describe (descriptive statistics, data visualization)
- Discover (exploratory data analysis, hypothesis testing, building models)
- Predict (extrapolation, patterns, model validation)
- Advise (data-driven decisions)

Use cases covering the entire cycle will be provided.

Learning objectives:
- Explain the data science process and how it differs from typical IT development cycles
- Get an idea of what machine learning can do for you
- Know that data science and machine learning are not magic
- Understand R code in a data science use case



CHAPTER 1: Introduction to data science
- What is data science?
- The data science process
- Data science use cases

CHAPTER 2: Introduction to machine learning
- What is machine learning?
- A quick overview of some algorithms
- Machine learning use cases

CHAPTER 3: The data science process: pre-modeling
- Collect and clean your data
- To sample or not to sample
- Summary statistics and plots
- Exploratory data analysis
- Class tutorial (no coding required)

CHAPTER 4: The data science process: prediction models
- Prediction models
- Model validation
- Class tutorial (no coding required)

CHAPTER 5: The data science process: Advise
- How to use your actual predictions
- Integration in applications

CHAPTER 6: The data science process: the entire cycle
- Example use case provided in R



Prerequisites:
- Some basic understanding of prediction models
- The first part of the data science track, “The math behind data science”, will help you understand the models but is not required
- Some experience with R is useful for reading the code (see our R track, especially “Getting started with R”), but you are not expected to write any code yourself



This course is aimed at management and BI personnel who want to know how they could benefit from data science, and who want to understand the process well enough to identify what their company would need in order to get started.