The math behind data science


Course code: 
Time Unit: 

Big data and data science are terms you find everywhere on the internet. But what can data science do for you, and how does it work its magic? This course gives you the mathematical background of the statistical and machine learning techniques that data science uses to extract knowledge from your data.

The course also shows examples of statistics in Excel and R, and of machine learning in an easy drag-and-drop environment such as Weka. It does not, however, provide hands-on training in these tools. For that, we refer you to the course “Data science: the process”, the second part of our data science track.

In short, this course gives an overview of the math behind the following statistical and machine learning techniques:

- Descriptive statistics
- Hypothesis testing
- Data quality
- General linear models (t-test, ANOVA, regression, mixed models, repeated measures)
- Correlation
- The Bayesian approach (vs. the frequentist approach)
- Recommendation engines
- Prediction/classification models

Learning objectives:
- Understand the basics of statistics
- Get an idea of what statistics can do for you
- Grasp the meaning of data quality and know what you can do to improve the quality of your data, both early and late in your data-gathering process
- Get to know the difference between correlation and causation
- Gain insight into widely used recommendation engines
- Understand the difference between probability estimators and classifiers
- Understand a range of common machine learning techniques
- Get an idea of deep learning techniques such as neural networks



CHAPTER 1: Introduction to statistics
- What is statistics?
- Probabilities
- Descriptive statistics
- Data quality

CHAPTER 2: Statistics in action
- Hypothesis testing
- Correlation
- General linear models
- Non-parametric variants

CHAPTER 3: Other analytical methods
- Repeated measures
- Bayesian analysis
- Multivariate techniques
- And more…

CHAPTER 4: Machine learning
- What is machine learning?
- What can machine learning do for you?

CHAPTER 5: Recommendation engines
- Item similarity
- User similarity

CHAPTER 6: Probability estimators and classifiers
- The difference
- Basic methods: logistic regression, decision tree, naive Bayes, …
- More advanced models: GAM (generalized additive models), random forest, …
- Support vector machines

CHAPTER 7: Introduction to deep learning
- Neural networks



No prior knowledge is required, but an interest in statistics and mathematics is recommended.



This course is aimed at management and BI staff who want to understand the basic math behind data science and learn what it can do for them.