Statistics and Computing

In 2020, EMbeDS began actively promoting the development and coordination of courses related to its core research mission. These include courses on programming and data processing, which build on a Python course first offered by Andrea Vandin and Daniele Licari to students of the Sant’Anna School in spring 2020, and courses on statistical methods, which build on earlier courses taught at the School by Francesca Chiaromonte, with an emphasis on applications to large contemporary datasets.

In the second semester of a.y. 2020-21, we offered a sequence of four coordinated courses entitled Statistics, Computing & Data Processing (descriptions and materials are available online).

Because of the COVID-19 pandemic, the courses were held in blended mode or entirely online, using modern e-learning technologies and an innovative experiential approach to online learning. They attracted a large enrollment of undergraduate and graduate students from the Sant’Anna School, as well as students from other programs in the Pisan academic community. The four courses were:

  • Applied Statistics (Chiara Seghieri), which provided a 10-hour review of basic statistical concepts and of linear and generalized linear models.
  • Topics in Statistical Learning (Francesca Chiaromonte and Jacopo di Iorio), which, through two modules of 20 and 10 hours respectively, introduced students to a range of topics in contemporary Statistical Learning (clustering, unsupervised and supervised dimension reduction, classification, smoothing, cross-validation, resampling, regularization and feature selection), with a focus on hands-on data analysis, Practicum sessions held in R, and group projects.
  • Introduction to Programming and Data Processing (Andrea Vandin and Daniele Licari), which, through two modules of 20 and 10 hours respectively, introduced students to structured computer programming and to data processing techniques (manipulation and visualization), using Python as the reference language.
  • Statistical Methods for Large, Complex Data (Francesca Chiaromonte and Andrea Vandin), which, in 14 additional hours, covered contemporary approaches to the analysis of high-dimensional, ultra-high-dimensional, and ultra-large datasets, including those characterized by internal structures and imbalances.

Introduction to Programming and Data Processing, by Andrea Vandin and Daniele Licari, was also offered in different formats for the PhD Program in Computer Science at the Gran Sasso Science Institute (GSSI) in L'Aquila, and for Italian companies through the ARTES4 Industry 4.0 Competence Center. These editions helped develop "gamified" quizzes for self-assessment and for monitoring the learning process, automatically tested coding exercises that provide hands-on experience, integrated teaching material (slides and executable code in a single document), and a variety of applications and examples.

We plan to further refine and expand the courses and modules offered in the Statistics, Computing & Data Processing sequence in a.y. 2021-22.