Seminars - Events | Università Cattolica del Sacro Cuore


Thursday, May 8th, 2025
Alessandro CASA

Associate Professor - Free University of Bozen-Bolzano

Sparse High-Dimensional Covariance Estimation via Pairwise Composite Likelihood

Abstract

Estimating covariance matrices in high-dimensional settings poses significant computational and statistical challenges. Pairwise likelihood offers a practical alternative to the full likelihood by considering only bivariate marginal distributions. In this work, we introduce a novel approach for estimating sparse high-dimensional covariance matrices by maximizing a truncated pairwise likelihood function. Our method strategically selects only those pairwise likelihood terms corresponding to nonzero covariance elements, leading to significant computational gains and improved interpretability. The truncation is guided by minimizing the L2-distance between pairwise and full likelihood scores, coupled with an L1-penalty to promote sparsity at the level of pairwise likelihood components. Unlike traditional penalization methods that shrink individual covariance parameters, our technique performs direct selection of pairwise likelihood objects, thereby preserving the unbiasedness of the underlying estimating equations. We will present theoretical results establishing the selection consistency of our procedure, demonstrating its ability to correctly identify the nonzero covariance structure even as the dimensionality grows. Furthermore, we will showcase the empirical performance of our method on both synthetic and real-world datasets, highlighting its effectiveness in achieving stable and accurate covariance estimation.
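As a hedged sketch of the notation behind this abstract (the symbols below are illustrative assumptions, not taken from the talk): for centered Gaussian observations y_1, ..., y_n in R^p with covariance matrix Sigma, a pairwise log-likelihood sums bivariate Gaussian log-densities over the p(p-1)/2 variable pairs, and a truncated version keeps only the pairs flagged by binary weights w_jk.

```latex
\ell_{\mathrm{P}}(\Sigma) = \sum_{j<k} \ell_{jk}(\Sigma),
\qquad
\ell_{jk}(\Sigma) = \sum_{i=1}^{n} \log \phi_2\!\left( (y_{ij}, y_{ik})^{\top};\, 0,\, \Sigma_{\{j,k\}} \right),
\qquad
\ell_{w}(\Sigma) = \sum_{j<k} w_{jk}\, \ell_{jk}(\Sigma), \quad w_{jk} \in \{0, 1\}.
```

Here \phi_2 denotes the bivariate normal density and \Sigma_{\{j,k\}} the 2x2 submatrix of \Sigma for variables j and k. Under this reading, w_jk = 1 would mark the pairs whose covariance is retained as nonzero; how the weights are actually chosen is the subject of the talk and is not reproduced here.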

In collaboration with Center for Applied Statistics in Business and Economics

Friday, April 4th, 2025
Bruno ARPINO
Full Professor - University of Padova

Ties and Older People's Well-Being: Evidence and Methodological Challenges

Abstract:

Demographic transformations such as population aging, declining fertility, increasing migration, and shifts in marriage and partnership patterns are reshaping family structures and intergenerational relationships. As a result, older adults are experiencing new family dynamics, including rising kinlessness and evolving roles within multigenerational networks. Understanding how these changes impact the well-being of older people is crucial for both researchers and policymakers.
This seminar will explore the complex relationship between family ties and older adults’ well-being, drawing on empirical studies of changing family structures and relationships. Particular attention will be given to the role of grandparental childcare—an increasingly significant aspect of intergenerational support. I will discuss the potential benefits and challenges of this caregiving role, including its implications for grandparents' health and well-being.
A key methodological challenge in this area of research is establishing causal relationships between grandparental childcare and older people’s well-being. In the final part of the seminar, I will review the most commonly used statistical methods for causal inference and present the results of a small simulation study comparing their performance. The seminar will conclude with a discussion on future research directions, including the potential of machine learning techniques for advancing this field and the bidirectional link between family relationships and older adults’ engagement with digital technologies.

Tuesday, March 11th, 2025

Andrea Cappozzo

Associate Professor - Università Cattolica del Sacro Cuore

Model-Based Clustering of Right-Censored Survival Data with Frailties and Random Covariates

Abstract:

We present a novel approach for clustering multilevel survival data that accounts for baseline heterogeneity and the local distributions of explanatory variables. The proposed method identifies patient clusters with distinct survival patterns and evaluates the hierarchical structure’s impact on survival within each cluster. A stochastic EM algorithm, specifically adapted for right-censored survival data, is employed to maximize the objective function. We demonstrate the effectiveness of the proposed methodology by analyzing survival times in COVID-19 patients with heart failure, successfully revealing latent patient profiles, assessing hospital-level effects within clusters, and quantifying the influence of respiratory diseases on survival.

Thursday, February 20th, 2025
Luis CARVALHO
Associate Professor - Department of Mathematics and Statistics - Boston University

Deviance Matrix Factorization

Abstract:
The singular value decomposition can be used to find a low-rank representation of a matrix under the Frobenius norm (entrywise square-error loss) and, for this reason, it enjoys a ubiquitous presence in many areas, including in Statistics with principal component and factor analyses. In this talk, we discuss a generalization of this matrix factorization, the deviance matrix factorization (DMF), that assumes broader deviance losses and thus allows for more meaningful and representative decompositions under different data domains and variance assumptions. We provide an efficient algorithm for the DMF and discuss using entrywise weights to represent missing data. We propose two tests to identify suitable decomposition ranks and data distributions and prove a few theoretical guarantees such as consistency. To showcase the practical performance of the proposed decomposition, we present a number of case studies in genetics, network analysis, and image classification. Finally, we offer a few directions for future work. This is joint work with Liang Wang.

Thursday, January 16th, 2025

Raya MUTTARAK

Full Professor - Dipartimento di Scienze Statistiche "Paolo Fortunati" - Università di Bologna

Exploring data and methodological approaches for assessing climate change impacts on population dynamics

Abstract:

The extreme record-breaking heat in April and May 2024 in the Asian continent, major hurricanes like Helene and Milton in the US and severe flooding in Central Europe and Emilia-Romagna, to name a few, are examples of extreme events that are documented to be attributable to anthropogenic climate change. Indeed, it is evident that the impacts of human-induced climate change on our lives, livelihoods and wellbeing are already being felt. This raises the question of whether, in which direction, and to what extent climate change also influences demographic processes by affecting fertility, mortality and migration, the three key demographic outcomes driving population change. Although it is highly plausible that climate change also affects population trends, to date existing global population projections have not taken into account the climate feedback on demographic processes. This talk aims to present current evidence on the impact of climatic factors on demographic outcomes with a focus on fertility and migration. I will explore the methodological approaches and data employed to examine the connections between climate change and demographic behaviors. Additionally, I will discuss whether population projections should incorporate the impacts of climate feedback on demographic processes.

Thursday, December 19th, 2024

Alessandro ZITO

Postdoctoral research fellow in Biostatistics at the Harvard T.H. Chan School of Public Health.

Compressive Bayesian non-negative matrix factorization for mutational signatures analysis

Abstract:

Non-negative matrix factorization (NMF) is widely used in many applications for dimensionality reduction. Inferring an appropriate number of factors for NMF is a challenging problem, and several approaches based on information criteria or sparsity-inducing priors have been proposed. However, inference in these models is often complicated and computationally costly. In this talk, we will describe a novel methodology for overfitted Bayesian NMF models that uses compressive hyperpriors to force unneeded factors down to negligible values while only imposing mild shrinkage on needed ones. The method uses simple semi-conjugate priors to facilitate inference while setting the strength of the hyperprior in a data-dependent way to achieve this compressive property. This results in a simple yet effective way to find the appropriate rank of any NMF decomposition, allowing for better interpretability of the resulting factors. We will discuss theoretical results establishing the compressive property, and show the benefits of our method within the context of mutational signatures analysis, which has become a routine practice in cancer genomics. In particular, our framework enables the use of biologically informed priors on the signatures, yielding significantly improved accuracy.

Thursday, June 27th, 2024

Donatello TELESCA

Professor of Biostatistics - UCLA

Mixed membership models and phase variability in functional data analysis

Abstract: A common concern in the field of functional data analysis is the challenge of temporal misalignment, which is typically addressed using curve registration methods. Currently, most of these methods assume the data is governed by a single common shape or a finite mixture of population level shapes. We introduce more flexibility using mixed membership models. Individual observations are assumed to partially belong to different pure mixtures, allowing for variation across multiple functional features. We propose a Bayesian hierarchical model to estimate the underlying shapes, as well as the individual time-transformation functions and levels of membership. Motivating this work is data from EEG signals in children with autism spectrum disorder (ASD). Our method agrees with the neuroimaging literature, recovering the 1/f pink noise feature distinctly from the peak in the alpha band. Furthermore, the introduction of a regression component in the estimation of time-transformation functions quantifies the effect of age and clinical designation on the location of the peak alpha frequency (PAF).

Wednesday, June 12th, 2024

Stefano CASTRUCCIO

Associate Professor - University of Notre Dame, Indiana, USA

Stochastic environmental modeling in a time of convergence: physics meets artificial intelligence

Abstract: It is widely acknowledged that the relentless surge of Volume, Velocity and Variety of data, as well as the simultaneous increase of computational resources have stimulated the development of data-driven methods with unprecedented flexibility and predictive power. However, not every environmental study entails a large data set: many applications, ranging from astronomy to paleoclimatology, have a high associated sampling cost and are instead constrained by physics-informed partial differential equations. Throughout the past few years, a new and powerful paradigm has emerged in the machine learning literature, merging data-driven and physics-informed problems, hence providing a unified framework for a whole spectrum of problems ranging from data-rich/context-poor to data-poor/context-rich. In this talk, I will present this new framework and discuss some of the most recent efforts to reformulate it as a stochastic model-based approach, thereby allowing calibrated uncertainty quantification.

Tuesday, May 28th, 2024

Fulvio DE SANTIS

Full Professor of Statistics - Università La Sapienza, Rome

On the distribution of the risk function induced by a prior

Abstract:

In the frequentist approach to statistical decision theory, the risk function quantifies the average performance of a decision over the sample space. In parametric inference, the risk function depends on the parameter of the model. Hence, when a prior distribution is assigned to the parameter, the risk function is a random variable, typically summarized by its expected value, the Bayes risk. However, for a good assessment of the quality of a decision function, it might be useful to explore the whole distribution of its random risk and to consider additional summaries to complement or to replace the Bayes risk. We here discuss some classes of standard yet relevant models and decision problems where the cdf and the pdf of the random risk can be determined in closed-form or easily approximated with basic Monte Carlo. Issues related to sample size determination are also discussed. Illustrative examples are taken from the literature on clinical trials, a context where this approach is receiving increasing attention.
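To make the idea of a "random risk" concrete, here is a minimal sketch under assumptions that are not from the talk: a Binomial(n, theta) model, the sample proportion as estimator, squared-error loss, and a Uniform(0, 1) prior. In that toy setting the risk is theta(1 - theta)/n, its prior expectation (the Bayes risk) is 1/(6n), and its cdf is 1 - sqrt(1 - 4nr), so both can be checked against basic Monte Carlo.

```python
import math
import random


def risk(theta, n):
    """Frequentist risk (mean squared error) of the sample proportion X/n."""
    return theta * (1.0 - theta) / n


def simulate_random_risk(n, a, b, draws, seed=0):
    """Draw theta from a Beta(a, b) prior and return the induced risk values."""
    rng = random.Random(seed)
    return [risk(rng.betavariate(a, b), n) for _ in range(draws)]


def empirical_cdf(sample, r):
    """Fraction of simulated risks that are at most r."""
    return sum(x <= r for x in sample) / len(sample)


def exact_cdf_uniform_prior(r, n):
    """Closed-form cdf of the risk when theta ~ Uniform(0, 1)."""
    return 1.0 - math.sqrt(max(0.0, 1.0 - 4.0 * n * r))


n = 20
risks = simulate_random_risk(n, a=1.0, b=1.0, draws=100_000)

bayes_risk_mc = sum(risks) / len(risks)   # Monte Carlo estimate of the Bayes risk
bayes_risk_exact = 1.0 / (6 * n)          # E[theta(1 - theta)] = 1/6 under the uniform prior

# Share of prior mass whose risk is below the Bayes risk, simulated and exact.
share_below_mc = empirical_cdf(risks, bayes_risk_exact)
share_below_exact = exact_cdf_uniform_prior(bayes_risk_exact, n)
```

In this toy case the distribution of the risk is skewed toward its maximum 1/(4n), which is the kind of information a single expected value hides.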

Thursday, January 25th, 2024
Daniele Durante
Assistant professor of Statistics, Department of Decision Sciences, Bocconi University

Bayesian Nonparametric Stochastic Block Modeling of Criminal Networks

Abstract:

Europol recently defined criminal networks as a modern version of the Hydra mythological creature, with covert and complex structure. Indeed, relationship data among criminals are subject to measurement errors, structured missingness patterns, and exhibit a sophisticated combination of an unknown number of core-periphery, assortative and disassortative structures that may encode key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of community detection algorithms routinely used in criminology, thereby leading to overly simplified and possibly biased reconstructions of organized crime architectures. In this seminar, I will present a number of model-based solutions which aim at filling these gaps via a combination of stochastic block models and priors for random partitions arising from Bayesian nonparametrics. These include Gibbs-type priors, and random partition priors driven by the urn scheme of a hierarchical normalized completely random measure. Product-partition models to incorporate criminals' attributes, and zero-inflated Poisson representations accounting for weighted edges and security strategies, will also be discussed. Collapsed Gibbs samplers for posterior computation are presented, and refined strategies for estimation, prediction, uncertainty quantification and model selection will be outlined. Results are illustrated in an application to an Italian Mafia network, where the proposed models unveil a structure of the criminal organization mostly hidden to state-of-the-art alternatives routinely used in criminology. I will conclude the seminar with ideas on how to learn the evolutionary history of the criminal organization from the relationship data among its criminals via a novel combination of latent space models for network data and phylogenetic trees.

Wednesday, December 13th, 2023

Matteo Iacopini

Lecturer in Statistics, Queen Mary University of London

Static and Dynamic BART for Rank-Order Data

Abstract

Ranking lists are often provided at regular time intervals by one or multiple rankers in a range of applications, including sports, marketing, and politics. Most popular methods for rank-order data postulate a linear specification for the latent scores, which determine the observed ranks, and ignore the temporal dependence of the ranking lists. To address these issues, novel nonparametric static (ROBART) and autoregressive (ARROBART) models are introduced, with latent scores defined as nonlinear Bayesian additive regression tree functions of covariates. To make inferences in the dynamic ARROBART model, closed-form filtering, predictive, and smoothing distributions for the latent time-varying scores are derived. These results are applied in a Gibbs sampler with data augmentation for posterior inference. The proposed methods are shown to outperform existing competitors in simulation studies, and the advantages of the dynamic model are demonstrated by forecasts of weekly pollster rankings of NCAA football teams.

Monday, June 5th, 2023

Marzia A. Cremona

Université Laval - Québec (Québec) G1V 0A6 (Canada)

smoothEM: a new approach for the simultaneous assessment of smooth patterns and spikes

Abstract:

We consider functional data where an underlying smooth curve is composed not just with errors, but also with irregular spikes that (a) are themselves of interest, and (b) can negatively affect our ability to characterize the underlying curve. We propose an approach that, combining regularized spline smoothing and an Expectation-Maximization algorithm, allows one to both identify spikes and estimate the smooth component. Imposing some assumptions on the error distribution, we prove consistency of EM estimates. Next, we demonstrate the performance of our proposal on finite samples and its robustness to violations of these assumptions through simulations. Finally, we apply our proposal to data on the annual heatwaves index in the US and on weekly electricity consumption in Ireland. In both datasets, we are able to characterize underlying smooth trends and to pinpoint irregular/extreme behaviors.

Work in collaboration with Huy Dang (Penn State University) and Francesca Chiaromonte (Penn State University and Sant’Anna School of Advanced Studies)

Thursday, May 18th, 2023

Matteo SESIA

Department of Data Sciences and Operations, Marshall School of Business - University of Southern California

Conformal Inference for Frequency Estimation with Sketched Data

Abstract

A flexible model-free method is developed to construct a confidence interval for the frequency of a queried object in a very large data set, based on a much smaller sketch of the data. The approach requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals for random queries using a conformal inference approach. After achieving marginal coverage for random queries under the assumption of data exchangeability, the proposed method is extended to provide stronger inferences accounting for possibly heterogeneous frequencies of different random queries, redundant queries, and distribution shifts. While the presented methods are broadly applicable, this work focuses on use cases involving the count-min sketch algorithm and a non-linear variation thereof, to facilitate comparison to prior work. In particular, the developed methods are compared empirically to frequentist and Bayesian alternatives, through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
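For readers unfamiliar with the count-min sketch mentioned in the abstract, the following is a minimal standard-library sketch of that data structure only; it does not implement the conformal inference procedure from the talk. The class name, the hashing scheme (SHA-256 with a row prefix), and the toy word list are illustrative assumptions.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Fixed-size table of counters that approximates item frequencies."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, item, row):
        # One hash function per row, derived here by prefixing the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        # Each item increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += 1

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is an upper bound on the true count.
        return min(self.table[row][self._column(item, row)]
                   for row in range(self.depth))


words = "the cat and the dog and the bird".split()
true_counts = Counter(words)

sketch = CountMinSketch(width=16, depth=4)
for word in words:
    sketch.add(word)

estimates = {word: sketch.estimate(word) for word in true_counts}
```

The sketch never underestimates a frequency but may overestimate it when items collide; quantifying that one-sided error without knowing the data distribution is the problem the abstract addresses.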

Seminar slides

Monday, March 27th, 2023

Augusto FASANO

Scalable and accurate variational Bayes for high-dimensional binary regression models and beyond

Abstract:

Bayesian binary probit regression and its extensions to time-dependent observations and multi-class responses are popular tools in binary and categorical data regression due to their interpretability and non-restrictive assumptions. Although the theory is well established in the frequentist literature, these models are still the subject of active research in the Bayesian framework, aimed at overcoming computational issues or inaccuracies in high dimensions as well as the lack of a closed-form expression for the posterior distribution of the model parameters in many cases. We develop a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit regression with binary responses and Gaussian priors, resulting in a unified skew-normal (SUN) approximating distribution that converges to the exact posterior as the number of predictors increases. Moreover, we derive closed-form expressions for posterior distributions arising from models that account for correlated binary time-series and multi-class responses, developing computational methods that outperform state-of-the-art routines. Finally, we show that such methodological and computational results can be extended to a broad variety of routinely used regression models by leveraging SUN conjugacy.

Monday, November 5th, 2022

Gerardo GALLO

Head of the Permanent Population Census Service at Istat

Administrative "signs of life": what they are and how they are used in the permanent population census

Abstract

From 1861 to 2001, population censuses underwent considerable changes. Even so, the basic framework of the population census remained more or less unchanged, at least until the start of the new millennium. Until 2001, the census survey was conducted across the whole national territory by municipal enumerators who went door to door to dwellings and other housing structures to count households, institutional households, and usual residents, and to record their main characteristics.

The 2011 census retained complete and simultaneous enumeration but, for the first time in Italy, used a starting list of survey units customized from the nominative data of the municipal population register lists (Liste anagrafiche comunali, Lac) of all Italian municipalities. This was a decisive step toward building a centralized statistical register of the resident population, which in the following years made it possible to carry out an Italian census based on administrative data, as has long been the case in other European countries.

In 2012, Law no. 221/2012, which converted Decree-Law no. 179/2012 into law, launched the permanent population census, carried out every year and based on a combination of sample surveys and statistically processed administrative data.

Against this background, the seminar will illustrate the most recent procedures adopted by researchers at the Istituto Nazionale di Statistica to select the statistical sources suited to observing individuals' "signs of life" in terms of "usual residence in Italy". It will also show how these signals enter the overall population counting process, in compliance with European census regulations. Finally, it will touch on the routes suggested by Istat for carrying out research based on census data.

Thursday, November 1st, 2022

Andrea POZZI

Università Cattolica del Sacro Cuore di Brescia

Solving Sequential Decision-Making Problems with Reinforcement Learning

Abstract

Reinforcement learning is an area of machine learning concerned with how an agent should act in an environment to maximize the notion of cumulative reward, where the interaction between the agent and the environment is usually represented as a Markov Decision Process. Reinforcement learning, which is studied in computer science, control theory, and statistics, is one of the most promising artificial intelligence techniques of the last decade. In particular, its use in combination with deep neural networks has led to extraordinary results, such as achieving super-human performance in numerous games (see for instance Mnih et al. 2013, “Playing Atari with Deep Reinforcement Learning”). In real-world scenarios, reinforcement learning is mainly used when: i) a model of the environment is known but an analytic solution is not available, ii) the only way to collect information about the environment is to directly interact with it. Among the different real-world applications of reinforcement learning, the focus is here directed to: autonomous vehicle control; algo-trading and portfolio management in finance; text summarization in natural language processing.
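As an illustration of learning purely from interaction with a Markov Decision Process, here is a minimal tabular Q-learning sketch on a toy five-state corridor. The environment, reward, constants, and function names are assumptions made for this example and are unrelated to the applications discussed in the talk. A learning rate of 1.0 is used only because the toy environment is deterministic.

```python
import random

N_STATES = 5                 # states 0..4 along a corridor
GOAL = N_STATES - 1          # reaching state 4 ends the episode
ACTIONS = (-1, +1)           # index 0 = move left, index 1 = move right
GAMMA = 0.9                  # discount factor
ALPHA = 1.0                  # learning rate (fine for a deterministic toy problem)


def step(state, action):
    """Deterministic transition: move, stay inside the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=50, seed=0):
    """Off-policy Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
# Greedy action index in each non-terminal state (1 means "move right").
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

The agent is never given the transition rule; it only observes states and rewards, which corresponds to case ii) in the abstract.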

Webinar Video

Friday, July 15th, 2022

Mario BERAHA

Politecnico di Milano

Normalized Latent Measure Factor Models

Abstract

We propose a methodology for modeling and comparing probability distributions within a Bayesian nonparametric framework. Building on dependent normalized random measures, we consider a prior distribution for a collection of discrete random measures where each measure is a linear combination of a set of latent measures, interpretable as characteristic traits shared by different distributions, with positive random weights. The model is non-identified and a method for post-processing posterior samples to achieve identified inference is developed. This uses Riemannian optimization to solve a non-trivial optimization problem over a Lie group of matrices. The effectiveness of our approach is validated on simulated data and in applications to two real-world data sets: school student test scores and personal incomes in California. Our approach leads to interesting insights for populations and easily interpretable posterior inference.

Friday, July 15th, 2022

Andrea GILARDI

Università degli studi Milano Bicocca

Lattice models for spatial data on linear networks

Abstract

In recent years, we have observed a surge of interest in the statistical analysis of spatial data lying on or alongside networks. Car crashes, vehicle thefts, and ambulance interventions are just a few of the most typical examples, where the edges of the network represent an abstraction of roads, rivers or railways. In this talk, we discuss two approaches for the analysis of car crashes at the street network level. In both cases, the analyses are based on a major city (Leeds, UK) in which car crashes of different severity were recorded over several years. In the first project, we introduce a multivariate Bayesian hierarchical model that includes spatially structured and unstructured random effects to capture the spatial nature of the events and the dependencies between the severity levels. We also discuss a novel procedure for testing the presence of the modifiable areal unit problem (MAUP) at the network-lattice level. In the second part of the talk, we present a series of preliminary results that extend the first project by including an external covariate that suffers from spatial measurement error. The suggested methodology is exemplified considering estimates of traffic volumes at the road network level obtained from mobile devices.

Wednesday, June 8th, 2022

Statistical Bridges Series

Robert MATTHEWS
Department of Mathematics – Aston University, Birmingham, UK

The Replication Crisis in Research: A Progress Report

Abstract

Evidence for the unreliability of research claims continues to grow, and has led to the emergence of the so-called "replication crisis". Arguably the most significant cause of such unreliability is the concept of statistical significance. Its ability to undermine reliable inference has been noted in fields from medicine and psychology to finance and computer science. Following an unprecedented call for action from the American Statistical Association, the statistical community has responded with a range of alternatives, from quick fixes to paradigm shifts. I review the impact of these attempts to move beyond statistical significance, and suggest some ways forward.

Webinar Video

Tuesday, May 31, 2022

Francesca CHIAROMONTE

Scuola Superiore Sant'Anna and Penn State University

Information matrices and Numbers in Large Supervised Problems

Abstract

Contemporary high-throughput data gathering techniques, measuring massive numbers of features simultaneously and/or merging information from multiple sources, lead to high or ultra-high dimensional supervised problems. In this talk I will briefly introduce two classes of statistical methods used to tackle such problems: Sufficient Dimension Reduction (SDR) techniques, which extract a small number of synthetic features to capture information on the outcome variable; and Screening algorithms, which are used to remove features irrelevant to the outcome prior to the use of dimension reduction or feature selection techniques. In particular, I will present recent SDR [1] and Screening [2] approaches based on a Fisher Information framework. This is joint work with Debmalya Nandy (University of Colorado), Weixin Yao (UC Riverside), Runze Li (Penn State University) and Bruce Lindsay (in memoriam).

[1] Weixin Yao, Debmalya Nandy, Bruce G. Lindsay & Francesca Chiaromonte (2019) Covariate Information Matrix for Sufficient Dimension Reduction, Journal of the American Statistical Association, 114:528, 1752-1764, DOI: 10.1080/01621459.2018.1515080

[2] Debmalya Nandy, Francesca Chiaromonte & Runze Li (2021) Covariate Information Number for Feature Screening in Ultrahigh-Dimensional Supervised Problems, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1864380.

Wednesday, April 13, 2022

Francesco DENTI

Dipartimento di Scienze statistiche, Università Cattolica del Sacro Cuore

Bayesian nonparametric mixtures for novelty detection and partially exchangeable data

Abstract

Bayesian mixtures have become an extremely popular statistical tool: they are widely applied to address different tasks such as density estimation, model-based clustering, and novelty detection.

In this talk, we will briefly introduce the fundamentals of this topic with a specific focus on nonparametric mixtures, along with the computational methods used to perform inference.

Then, we will discuss how we can extend the basic models to address the challenges that arise when dealing with more complex data, discussing two examples.

First, we will present a two-stage semiparametric Bayesian model for novelty detection called Brand. Brand is a model-based classifier that robustly learns the characteristics of observed classes from a training set while allowing for the presence of unseen classes in the test set. The classification of the test data into seen and unseen classes - the latter containing novelties and outliers - is obtained via nested mixtures.

Second, we will discuss mixtures for partially exchangeable datasets, where observations are naturally organized into groups. Examples of this setting are microbiome abundance tables, where microbe frequencies are recorded within different subjects, or the famous Spotify datasets, which present audio features for multiple songs authored by various artists.

After introducing the Nested Dirichlet process, we will discuss a potential shortcoming of its usage for distributional clustering and propose a solution discussing the Common Atom Model (CAM).

CAM is a Bayesian nonparametric model that allows the estimation of a two-layer clustering solution, grouping observations and groups while allowing information sharing across the statistical units.

Webinar Video

Wednesday, March 2nd, 2022

Marco FATTORE

Dipartimento di Statistica e Metodi Quantitativi, Università degli Studi di Milano-Bicocca

Dimensionality reduction and ranking extraction from multidimensional systems of ordinal indicators

Abstract

The problem of building synthetic indices and rankings from multidimensional systems of ordinal indicators is increasingly common in socio-economic statistics and in support of evaluation and multi-criteria decision/policy making processes; nevertheless, the methodological toolkit for analyzing multidimensional ordinal data is still limited and largely borrowed from the statistical analysis of quantitative variables. The seminar aims to show that it is instead possible to set up an "ordinal data analysis" by importing the proper mathematical structures into statistical methodology, starting from the theory of order relations, a branch of discrete mathematics devoted to the properties of partially ordered and quasi-ordered sets. In particular, taking the measurement of multidimensional poverty, well-being, or sustainability as its starting point, the seminar addresses dimensionality reduction and ranking extraction for systems of ordinal and partially ordered data, introducing the most recent methodological developments, illustrating both the algorithms already available and those under development, and discussing their strengths and weaknesses, with particular attention to computational aspects. The seminar concludes with an overview of the theoretical and applied lines of research under development, providing a map of solved and still-open problems in the ordinal multidimensional analysis of socio-economic data.

Monday, October 11, 2021

Statistical Bridges Series

Eric-Jan Wagenmakers
Department of Psychological Methods, University of Amsterdam, https://ejwagenmakers.com

From p-values to Bayesian evidence

Abstract
Despite its continuing dominance in empirical research, the p-value suffers from a series of well-known statistical limitations; for instance, it cannot quantify evidence in favor of the null hypothesis, it cannot be monitored until the results are sufficiently compelling, and it tends to reject the null even when the evidence is ambiguous. Here I present a simple set of equations that allow researchers to transform their p-values into an approximate objective Bayes factor for the test of a point null hypothesis against a composite alternative. The transformed quantity is able to quantify evidence in favor of the null hypothesis, may be monitored until it is sufficiently compelling, and does not reject the null when the evidence is ambiguous.

Webinar Video

Friday, June 11, 2021

Statistical Bridges Series

Elizabeth Ogburn
Johns Hopkins University, Department of Biostatistics - https://www.eogburn.com

Social network dependence, unmeasured confounding, and the replication crisis

Abstract
In joint work with Youjin Lee, we showed that social network dependence can result in spurious associations, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well-known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we tested for possible spurious associations due to network dependence in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure. In the latter part of the talk I will discuss how the phenomenon of spurious associations due to dependence is related to unmeasured confounding by network structure, akin to confounding by population structure in GWAS studies, and how this relationship sheds light on methods to control for both spurious associations and unmeasured confounding.

Most relevant paper:
Youjin Lee & Elizabeth L. Ogburn (2020): Network Dependence Can Lead to Spurious Associations and Invalid Inference, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1782219
https://www.tandfonline.com/doi/pdf/10.1080/01621459.2020.1782219?casa_token=rvG9oMzmKLIAAAAA:T6IW864UWrT9z1yctCnZf0qAByjVbsOseMvsaw3uWmp1jhY8bQdiEFLXbzEFd8XFZY8qYQHDw-1W

Webinar Video
Interview

Wednesday, May 26, 2021
Jianyi Lin
Università Cattolica del Sacro Cuore

Computational complexity and exact algorithms for size-constrained clustering problems

Abstract
Classical clustering analysis in geometric ambient space can benefit from a priori background information in the form of problem constraints, such as cluster size constraints introduced to avoid unbalanced clusterings and hence improve solution quality. I will present the problem of finding a partition of a given set of n points of the d-dimensional real space into k clusters such that the distance, induced by the ℓp-norm, of all points from their cluster centroid is globally minimized and each cluster has a prescribed cardinality. This general problem is as computationally intractable as its unconstrained counterpart, the k-Means problem, which was shown to be NP-hard for a general parameter k only in 2008 by S. Dasgupta, although the corresponding heuristic has been widespread since the 1950s. Computational hardness results will be presented for certain variants of size-constrained geometric clustering, while for the cases that are tractable in polynomial time and space, some globally optimal exact algorithms based on computational and algebraic geometry techniques will be illustrated.
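
To make the problem statement concrete, the following is a minimal sketch of an exhaustive search for the size-constrained partition on a tiny instance. It is not one of the algorithms from the talk. It assumes the squared Euclidean case (p = 2), with the centroid taken as the cluster mean, and the function names `cluster_cost` and `best_size_constrained_partition` are hypothetical. The running time grows combinatorially with n, consistent with the hardness of the general problem.

```python
from itertools import combinations


def cluster_cost(points):
    """Sum of squared Euclidean distances from each point to the cluster mean."""
    d = len(points[0])
    centroid = [sum(p[j] for p in points) / len(points) for j in range(d)]
    return sum((p[j] - centroid[j]) ** 2 for p in points for j in range(d))


def best_size_constrained_partition(points, sizes):
    """Try every partition of `points` into clusters with the prescribed sizes.

    Returns (minimal total cost, list of clusters as lists of point indices).
    Assumption: p = 2, so the cost of a cluster is measured from its mean.
    """
    if any(s <= 0 for s in sizes):
        raise ValueError("every cluster size must be positive")
    if sum(sizes) != len(points):
        raise ValueError("cluster sizes must sum to the number of points")

    def search(remaining, sizes_left):
        # No clusters left to fill: nothing more to pay.
        if not sizes_left:
            return 0.0, []
        best_cost, best_clusters = float("inf"), None
        # Choose the members of the next cluster, then recurse on the rest.
        for chosen in combinations(remaining, sizes_left[0]):
            rest = [i for i in remaining if i not in chosen]
            sub_cost, sub_clusters = search(rest, sizes_left[1:])
            cost = cluster_cost([points[i] for i in chosen]) + sub_cost
            if cost < best_cost:
                best_cost, best_clusters = cost, [list(chosen)] + sub_clusters
        return best_cost, best_clusters

    return search(list(range(len(points))), list(sizes))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (10, 2)]
    print(best_size_constrained_partition(pts, (2, 3)))
```

Because the clusters are filled in the order given by `sizes`, equal sizes cause some partitions to be visited more than once. This is harmless for correctness and keeps the sketch short.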

Wednesday, May 5, 2021
Stefano Rizzelli
Università Cattolica del Sacro Cuore - École Polytechnique Fédérale de Lausanne

Data-dependent choice of prior hyperparameters in Bayesian inference: consistency and merging of posterior distributions

Abstract
The Bayesian inferential paradigm prescribes the specification of a prior distribution on the parameters of the statistical model. For complex models, the subjective elicitation of prior hyper-parameters can be a delicate and difficult task. This is particularly the case for hyper-parameters affecting posterior inference via complexity penalization, shrinkage effects, etc. In the absence of sufficient a priori information, a principled specification of a hyper-prior distribution can also be difficult and can complicate computations. It is common practice to resort to a data-driven choice of the prior hyper-parameters as a shortcut: this approach is commonly called empirical Bayes (EB). Although not rigorous from a Bayesian standpoint, the traditional folklore of EB analysis is that it provides approximations to genuine Bayesian inference, while enjoying some frequentist asymptotic guarantees. We give a new illustration of EB posterior consistency in a semiparametric estimation problem, involving the analysis of extreme multivariate events. We then turn to parametric models and focus on merging in total variation between EB and Bayesian posterior/predictive distributions, almost surely as the sample size increases. We provide new results refining those in Petrone et al. (2014) and illustrate their applications in the context of variable selection.
