Università Cattolica del Sacro Cuore

Seminars - Events

Wednesday, June 8, 2022

Statistical Bridges Series

Department of Mathematics – Aston University, Birmingham, UK

The Replication Crisis in Research: A Progress Report


Evidence for the unreliability of research claims continues to grow, and has led to the emergence of the so-called "replication crisis". Arguably the most significant cause of such unreliability is the concept of statistical significance. Its role in undermining reliable inference has been noted in fields from medicine and psychology to finance and computer science. Following an unprecedented call for action from the American Statistical Association, the statistical community has responded with a range of alternatives, from quick fixes to paradigm shifts. I review the impact of these attempts to move beyond statistical significance, and suggest some ways forward.


Webinar Video



Tuesday, May 31, 2022 


Scuola Superiore Sant'Anna and Penn State University

Information Matrices and Numbers in Large Supervised Problems



Contemporary high-throughput data gathering techniques, measuring massive numbers of features simultaneously and/or merging information from multiple sources, lead to high or ultra-high dimensional supervised problems. In this talk I will briefly introduce two classes of statistical methods used to tackle such problems: Sufficient Dimension Reduction (SDR) techniques, which extract a small number of synthetic features to capture information on the outcome variable; and Screening algorithms, which are used to remove features irrelevant to the outcome prior to the use of dimension reduction or feature selection techniques. In particular, I will present recent SDR [1] and Screening [2] approaches based on a Fisher Information framework. This is joint work with Debmalya Nandy (University of Colorado), Weixin Yao (UC Riverside), Runze Li (Penn State University) and Bruce Lindsay (in memoriam).

[1] Weixin Yao, Debmalya Nandy, Bruce G. Lindsay & Francesca Chiaromonte (2019) Covariate Information Matrix for Sufficient Dimension Reduction, Journal of the American Statistical Association, 114:528, 1752-1764, DOI: 10.1080/01621459.2018.1515080

[2] Debmalya Nandy, Francesca Chiaromonte & Runze Li (2021) Covariate Information Number for Feature Screening in Ultrahigh-Dimensional Supervised Problems, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1864380.
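The covariate information number of [2] requires more machinery than fits here, but the screening idea itself - rank each feature by a cheap marginal utility and keep only the top-ranked ones before any refined modeling - can be sketched with a simple correlation-based screen. Everything below (the marginal-correlation criterion and the toy data) is an illustrative stand-in, not the authors' method:

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)

def marginal_screen(X, y, keep):
    """Rank features by |marginal correlation| with the outcome and
    return the indices of the top `keep` features."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:keep]

# Toy high-dimensional setting: 200 observations, 400 features,
# only features 0 and 1 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(400)] for _ in range(200)]
y = [2 * row[0] - 3 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]
kept = marginal_screen(X, y, keep=10)
# Features 0 and 1 should survive the screen.
```

After screening, any dimension reduction or variable selection method is run on the surviving columns only, which is what makes ultrahigh-dimensional problems computationally manageable.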



Wednesday, April 13, 2022

Francesco DENTI

Department of Statistical Sciences, Università Cattolica del Sacro Cuore


Bayesian nonparametric mixtures for novelty detection and partially exchangeable data



Bayesian mixtures have become an extremely popular statistical tool: they are widely applied to address different tasks such as density estimation, model-based clustering, and novelty detection.

In this talk, we will briefly introduce the fundamentals of this topic, with a specific focus on nonparametric mixtures, along with the computational methods used to perform inference.

Then, we will discuss how the basic models can be extended to address the challenges that arise when dealing with more complex data, presenting two examples.

First, we will present a two-stage semiparametric Bayesian model for novelty detection called Brand. Brand is a model-based classifier that robustly learns the characteristics of observed classes from a training set while allowing for the presence of unseen classes in the test set. The classification of the test data into seen and unseen classes - the latter containing novelties and outliers - is obtained via nested mixtures.

Second, we will discuss mixtures for partially exchangeable datasets, where observations are naturally organized into groups. Examples of this setting are microbiome abundance tables, where microbial frequencies are recorded within different subjects, or the well-known Spotify datasets, which report audio features for multiple songs by various artists.

After introducing the Nested Dirichlet Process, we will discuss a potential shortcoming of its use for distributional clustering and propose a solution, the Common Atoms Model (CAM).

CAM is a Bayesian nonparametric model that estimates a two-layer clustering solution, grouping both the individual observations and the groups they belong to, while allowing information sharing across statistical units.
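As a minimal, generic illustration of the nonparametric mixtures these models build on - not the Brand or CAM implementations - here is a truncated stick-breaking construction of a Dirichlet process mixture of Gaussians; the truncation level, concentration parameter, and base measure are arbitrary illustrative choices:

```python
import random

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights of a Dirichlet process
    with concentration parameter alpha."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        b = rng.betavariate(1, alpha)
        weights.append(b * remaining)
        remaining *= 1 - b
    weights[-1] += remaining  # fold the leftover stick into the last atom
    return weights

def sample_dp_mixture(n, alpha=1.0, truncation=20, seed=1):
    """Draw n points from a truncated DP mixture of N(mu_k, 1) kernels,
    with atom locations mu_k drawn from a N(0, 5) base measure."""
    rng = random.Random(seed)
    weights = stick_breaking(alpha, truncation, rng)
    atoms = [rng.gauss(0, 5) for _ in range(truncation)]
    draws = [rng.gauss(atoms[rng.choices(range(truncation), weights)[0]], 1)
             for _ in range(n)]
    return weights, atoms, draws

weights, atoms, draws = sample_dp_mixture(500)
# The weights sum to one; with alpha = 1 a few atoms carry most of the mass,
# which is what induces clustering of the sampled observations.
```

Posterior inference for such models (e.g. via Gibbs sampling over cluster allocations) is what the computational part of the talk concerns; the sketch above only shows the generative side.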


Webinar Video



Wednesday, March 2, 2022


Department of Statistics and Quantitative Methods, Università degli Studi di Milano-Bicocca


Dimensionality reduction and ranking extraction from multidimensional systems of ordinal indicators



The problem of constructing synthetic indices and rankings from multidimensional systems of ordinal indicators is increasingly widespread in socio-economic statistics and in support of evaluation processes and multi-criteria decision/policy making; nevertheless, the methodological apparatus for the analysis of multidimensional ordinal data is still limited and largely borrowed from the statistical analysis of quantitative variables. The aim of the seminar is to show how it is instead possible to set up a genuine "ordinal data analysis", importing into statistical methodology the appropriate mathematical structures, starting from the theory of order relations, a branch of discrete mathematics devoted to the properties of partially ordered and quasi-ordered sets. In particular, starting from the problem of measuring multidimensional poverty, well-being, or sustainability, the seminar addresses dimensionality reduction and ranking extraction for systems of ordinal and partially ordered data, introducing the most recent methodological developments, illustrating both the algorithms already available and those under development, and discussing their strengths and weaknesses, with particular attention to computational aspects. The seminar concludes with an overview of the theoretical and applied research lines currently being developed, providing a map of the problems solved and of those still open in the multidimensional ordinal analysis of socio-economic data.
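A minimal sketch of the order-theoretic starting point - the componentwise partial order among profiles of ordinal indicators, under which some pairs of units are simply incomparable - may help fix ideas. The profiles below are invented for illustration:

```python
from itertools import combinations

def dominates(a, b):
    """Profile a dominates b if a >= b on every ordinal indicator."""
    return all(x >= y for x, y in zip(a, b))

def comparability(profiles):
    """Split pairs of profiles into comparable and incomparable ones
    under the componentwise (product) order."""
    comparable, incomparable = [], []
    for a, b in combinations(profiles, 2):
        if dominates(a, b) or dominates(b, a):
            comparable.append((a, b))
        else:
            incomparable.append((a, b))
    return comparable, incomparable

# Four profiles on three ordinal indicators (higher = better off).
profiles = [(1, 1, 1), (2, 2, 1), (1, 3, 2), (3, 3, 3)]
comparable, incomparable = comparability(profiles)
# (2, 2, 1) and (1, 3, 2) are incomparable: neither dominates the other,
# so no ranking between them follows from the data alone.
```

The presence of incomparable pairs is exactly why ranking extraction from ordinal systems cannot simply reuse methods designed for quantitative variables, where every pair of values is comparable.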



Monday, October 11, 2021

Statistical Bridges Series

Eric-Jan Wagenmakers
Department of Psychological Methods, University of Amsterdam, http://ejwagenmakers.com   

From p-values to Bayesian evidence

Despite its continuing dominance in empirical research, the p-value suffers from a series of well-known statistical limitations; for instance, it cannot quantify evidence in favor of the null hypothesis, it cannot be monitored until the results are sufficiently compelling, and it tends to reject the null even when the evidence is ambiguous. Here I present a simple set of equations that allow researchers to transform their p-values into an approximate objective Bayes factor for the test of a point null hypothesis against a composite alternative. The transformed quantity is able to quantify evidence in favor of the null hypothesis, may be monitored until it is sufficiently compelling, and does not reject the null when the evidence is ambiguous.
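The talk's specific equations are in the associated work; a widely known transformation in the same spirit, which I use here purely as an illustration, is the Sellke-Bayarri-Berger bound: for p < 1/e, the Bayes factor in favor of the alternative can be at most 1/(-e·p·ln p).

```python
import math

def bayes_factor_bound(p):
    """Sellke-Bayarri-Berger upper bound on the Bayes factor BF10
    (alternative over null) implied by a p-value p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound holds only for 0 < p < 1/e")
    return 1 / (-math.e * p * math.log(p))

# A "significant" p = 0.05 caps the evidence for H1 at roughly 2.5:
print(round(bayes_factor_bound(0.05), 2))  # prints 2.46
```

The point the bound makes is the same one the talk develops: p = 0.05 corresponds to, at best, weak evidence against the null once translated to the Bayes-factor scale.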

Webinar Video


Friday, June 11, 2021

Statistical Bridges Series

Elizabeth Ogburn
Johns Hopkins University, Department of Biostatistics - https://www.eogburn.com   

Social network dependence, unmeasured confounding, and the replication crisis

In joint work with Youjin Lee, we showed that social network dependence can result in spurious associations, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well-known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we tested for possible spurious associations due to network dependence in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure.  In the latter part of the talk I will discuss how the phenomenon of spurious associations due to dependence is related to unmeasured confounding by network structure, akin to confounding by population structure in GWAS studies, and how this relationship sheds light on methods to control for both spurious associations and unmeasured confounding.

Most relevant paper:
Youjin Lee & Elizabeth L. Ogburn (2020): Network Dependence Can Lead to Spurious Associations and Invalid Inference, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1782219
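The test statistic in the paper adapts Moran's I to network data; the classical Moran's I on which it builds can be computed directly. The toy graph and node values below are illustrative, and this is the textbook statistic, not the authors' adapted version:

```python
def morans_i(adjacency, x):
    """Classical Moran's I for values x observed at the nodes of a graph
    given by a symmetric 0/1 adjacency matrix."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    w_total = sum(sum(row) for row in adjacency)
    num = sum(adjacency[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

# Path graph 0-1-2-3 with values that are similar along network ties:
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, 1.2, 3.0, 3.1]
# A clearly positive Moran's I signals network dependence in x, the kind
# of dependence that standard i.i.d.-based inference ignores.
```

In a convenience sample drawn from a single community, a markedly positive value of such a statistic on the residuals is a warning that nominal standard errors may be anticonservative.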

Webinar Video


Wednesday, May 26, 2021
Jianyi Lin
Università Cattolica del Sacro Cuore

Computational complexity and exact algorithms for size-constrained clustering problems

Classical cluster analysis in a geometric ambient space can benefit from a priori background information in the form of problem constraints, such as cluster size constraints, introduced to avoid unbalanced clusterings and hence improve solution quality. I will present the problem of finding a partition of a given set of n points of d-dimensional real space into k clusters such that the lp-norm induced distance of all points from their cluster centroid is globally minimized and each cluster has a prescribed cardinality. This general problem is as computationally intractable as its unconstrained counterpart, the k-Means problem, which was shown to be NP-hard for general k by S. Dasgupta only in 2008, although the corresponding heuristic has been in widespread use since the 1950s. Computational hardness results will be presented for certain variants of size-constrained geometric clustering, while for the cases tractable in polynomial time and space some globally optimal exact algorithms based on computational and algebraic geometry techniques will be illustrated.
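As a toy illustration of what "exact" means here - not the computational-geometry algorithms of the talk, which scale far better - one can, for tiny n, enumerate every assignment with the prescribed cluster sizes and minimize the total squared Euclidean distance to the cluster centroids (the p = 2 case):

```python
from itertools import permutations

def cost(points, labels, k):
    """Total squared Euclidean distance of points to their cluster centroids."""
    total = 0.0
    for c in range(k):
        cluster = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid))
                     for p in cluster)
    return total

def exact_size_constrained(points, sizes):
    """Exhaustive global optimum over all assignments with the
    prescribed cluster cardinalities (feasible only for tiny n)."""
    base = [c for c, s in enumerate(sizes) for _ in range(s)]
    best_labels, best_cost = None, float("inf")
    for labels in set(permutations(base)):  # all size-respecting assignments
        value = cost(points, labels, len(sizes))
        if value < best_cost:
            best_labels, best_cost = list(labels), value
    return best_labels, best_cost

points = [(0, 0), (0, 1), (10, 0), (10, 1), (10, 2)]
labels, value = exact_size_constrained(points, sizes=(2, 3))
# The two left-hand points form one cluster, the three right-hand ones the other.
```

The number of size-respecting assignments grows multinomially in n, which is why exhaustive search is hopeless beyond toy instances and structured exact algorithms are the subject of the talk.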


Wednesday, May 5, 2021 
Stefano Rizzelli
Università Cattolica del Sacro Cuore - École Polytechnique Fédérale de Lausanne

Data-dependent choice of prior hyperparameters in Bayesian inference: consistency and merging of posterior distributions

The Bayesian inferential paradigm prescribes the specification of a prior distribution on the parameters of the statistical model. For complex models, the subjective elicitation of prior hyperparameters can be a delicate and difficult task. This is particularly the case for hyperparameters affecting posterior inference via complexity penalization, shrinkage effects, etc. In the absence of sufficient a priori information, a principled specification of a hyper-prior distribution can be difficult too, and can complicate computations. It is common practice to resort to a data-driven choice of the prior hyperparameters as a shortcut: this approach is commonly called empirical Bayes (EB). Although not rigorous from a Bayesian standpoint, the traditional folklore of EB analysis is that it provides approximations to genuine Bayesian inference, while enjoying some frequentist asymptotic guarantees. We give a new illustration of EB posterior consistency in a semiparametric estimation problem involving the analysis of extreme multivariate events. We then turn to parametric models and focus on merging in total variation between EB and Bayesian posterior/predictive distributions, almost surely as the sample size increases. We provide new results refining those in Petrone et al. (2014) and illustrate their applications in the context of variable selection.
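A textbook instance of the EB shortcut - chosen for its closed form, and not the semiparametric extreme-value setting of the talk - is the normal means model: X_i ~ N(theta_i, 1) with prior theta_i ~ N(0, tau^2). Marginally X_i ~ N(0, 1 + tau^2), so the marginal maximum-likelihood choice of the hyperparameter is tau^2 = max(0, mean(X_i^2) - 1):

```python
import random

def eb_normal_means(x):
    """Empirical Bayes for X_i ~ N(theta_i, 1), theta_i ~ N(0, tau^2).
    Estimates tau^2 by marginal maximum likelihood, then returns the
    EB posterior means, which shrink each observation toward zero."""
    tau2 = max(0.0, sum(v * v for v in x) / len(x) - 1.0)
    shrink = tau2 / (1.0 + tau2)  # posterior-mean shrinkage factor
    return tau2, [shrink * v for v in x]

rng = random.Random(42)
theta = [rng.gauss(0, 2) for _ in range(2000)]   # true tau = 2, tau^2 = 4
x = [rng.gauss(t, 1) for t in theta]
tau2_hat, post_means = eb_normal_means(x)
# tau2_hat should land near 4, and every posterior mean is a shrunken x_i.
```

The "merging" results of the talk concern exactly when such data-dependent plug-ins produce posteriors that become indistinguishable, in total variation, from genuinely Bayesian ones as the sample size grows.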

Archive of seminars and events from previous years: