High dimensional statistical data analysis: an overview from general concepts to an application to breast cancer survival prediction

  • Date June 20, 2018
  • Hour 3 pm
  • Room GSSI Library
  • Speaker Claudia Angelini (CNR Istituto per le Applicazioni del Calcolo “Mauro Picone”)
  • Area Mathematics


Modern applications of statistical theory and methods require the analysis of large datasets in which a huge number of variables (or features) p is observed on a limited number of samples N. A typical example is genomics, where the expression of thousands of genes (or other omic features) is measured on tens or at most hundreds of individuals. From a statistical point of view, such situations lead to the so-called “p>>N” problem and to the curse of dimensionality. As a result, the analysis of high-dimensional data has been one of the great statistical challenges of the last two decades. In this seminar, we will briefly review the main concepts and problems that arise when analyzing high-dimensional data, then describe a recent approach that combines variable screening and penalized network-based Cox regression models for the identification of high- and low-risk groups in breast cancer and the selection of potential biomarkers.