- Lectures (CM): 50h
- Integrated courses (CI): -
- Tutorials (TD): -
- Practical work (TP): -
- Independent student work (TE): 90h
Language of instruction: French
Description of the course content
1) The Data Science part is structured in four macro blocks (short code sketches follow this list):
- The art of learning from data. What is learning; supervised learning and function approximation; bias-variance trade-off; model accuracy, assessment and selection; cross validation.
- Regression methods and regularization. Least squares revisited; model selection and regularization; subset selection methods; shrinkage methods (ridge, LASSO, LARS, elastic nets); dimension reduction methods (PCA, PLS).
- Classification. Linear regression on indicator matrices; logistic regression; linear and quadratic discriminant analysis (LDA and QDA); hyperplane separation theorems; optimal separating hyperplane; “kernel trick”; Support Vector Machines (SVM).
- Tree-based methods. Stratified feature space; tree-building process; recursive binary splitting and pruning.
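The following is a minimal Python sketch of the shrinkage and cross-validation topics listed above. It assumes scikit-learn and a synthetic dataset; it is an illustration only, not part of the official course material.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data: 200 samples, 50 features, only 10 informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge and LASSO with the penalty strength chosen by 5-fold cross-validation.
alphas = np.logspace(-3, 3, 50)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)
lasso = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train, y_train)

for name, model in [("ridge", ridge), ("lasso", lasso)]:
    mse = mean_squared_error(y_test, model.predict(X_test))
    n_nonzero = np.sum(np.abs(model.coef_) > 1e-8)
    print(f"{name}: alpha={model.alpha_:.4f}, test MSE={mse:.1f}, "
          f"non-zero coefficients={n_nonzero}/{X.shape[1]}")
```

The LASSO output typically shows many coefficients shrunk exactly to zero, which is the variable-selection behaviour discussed in the regularization block.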
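In the same spirit, a hedged sketch of the classification and tree-based blocks: logistic regression, a kernel SVM and a shallow (pruned) decision tree compared by cross-validated accuracy on synthetic data, again assuming scikit-learn.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Two interleaving half-moons: a non-linear decision boundary is needed.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "decision tree (max_depth=4)": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# 5-fold cross-validated accuracy: the kernel SVM and the tree can capture the
# non-linear boundary, while the linear classifier cannot.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: CV accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```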
2) The Deep Learning part covers the following topics (a code sketch follows this list):
1. Machine learning paradigm; overfitting and underfitting; bias and variance; gradient-based learning; motivations for deep models; historical trends in artificial neural network research.
2. Architecture design for deep feedforward neural networks; hidden layers, hidden and output units; universal approximation theorem; the language of computational graphs; the back-propagation algorithm.
3. Surrogate loss functions; batch/minibatch deterministic and stochastic methods; main challenges in neural network optimization (ill-conditioning, local minima, flat regions, cliffs, etc.); stochastic gradient descent; momentum; Nesterov momentum; parameter initialization strategies; algorithms with adaptive learning rates; supervised pre-training.
4. Regularization strategies for deep models; parameter norm penalties; data augmentation and sparse representations; early stopping; ensemble methods; dropout; adversarial training.
5. Introduction to convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
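As an illustration of items 2-4, the sketch below trains a small feedforward network with minibatch SGD (Nesterov momentum), dropout and early stopping. The use of tf.keras and synthetic data is an assumption made for this example; the course may rely on other tools (e.g. Keras for R).

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20)).astype("float32")
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=2000) > 0).astype("float32")

# Deep feedforward network: two hidden layers with dropout regularization.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer
    tf.keras.layers.Dropout(0.3),                    # dropout regularization
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output unit
])

# Minibatch SGD with Nesterov momentum; cross-entropy as the surrogate loss.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Early stopping monitors the validation loss and restores the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=200, batch_size=32,
          callbacks=[early_stop], verbose=0)
```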
Skills to be acquired
Upon completion of this course, students will have a solid theoretical knowledge of the most effective supervised machine learning techniques and practical experience implementing them. In particular, they will be able to:
- Select the appropriate method based on the scope of the problem and the available data.
- Implement a range of regression and classification methods.
- Develop predictive tools for economics and business problems.
- Source, store and pre-process heterogeneous (large-scale) data.
- Choose, design and train supervised machine learning models.
- Code in R and Python.
- Present an empirical project in public.
Bibliography and recommended reading
Part 1:
- Hastie T., R. Tibshirani, J. Friedman, 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer.
- James G., D. Witten, T. Hastie, R. Tibshirani, 2013, An Introduction to Statistical Learning with Applications in R, Springer.
Part 2:
- Goodfellow, I., Y. Bengio, & A. Courville, 2016, Deep Learning, MIT Press.
- Chollet, F., & J. J. Allaire, 2017, Deep Learning with R, Manning Publications.
- Chollet, F., 2017, Deep Learning with Python, Manning Publications.
Contact
Faculté des sciences économiques et de gestion (FSEG)
61, avenue de la Forêt Noire, 67085 STRASBOURG CEDEX
03 68 85 21 78
Contact form
Course coordinator
Stefano Bianchini