## Seminars - Events

Monday, October 11, 2021, at 17.00 (CEST)

**Eric-Jan Wagenmakers**

Department of Psychological Methods, University of Amsterdam, http://ejwagenmakers.com

**From p-values to Bayesian evidence**

**Abstract**

Despite its continuing dominance in empirical research, the p-value suffers from a series of well-known statistical limitations; for instance, it cannot quantify evidence in favor of the null hypothesis, it cannot be monitored until the results are sufficiently compelling, and it tends to reject the null even when the evidence is ambiguous. Here I present a simple set of equations that allow researchers to transform their p-values into an approximate objective Bayes factor for the test of a point null hypothesis against a composite alternative. The transformed quantity is able to quantify evidence in favor of the null hypothesis, may be monitored until it is sufficiently compelling, and does not reject the null when the evidence is ambiguous.
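
As a rough illustration of what turning a p-value into a statement about evidence can look like, the sketch below implements the Sellke-Bayarri-Berger calibration, which bounds the Bayes factor in favor of the null from below by -e·p·ln(p) for p < 1/e. This is a different and older device than the equations presented in the talk, which the abstract does not reproduce: it uses only the p-value, and because it is a bound rather than an approximation it cannot by itself express evidence in favor of the null. The function name `bf01_lower_bound` is hypothetical.

```python
import math


def bf01_lower_bound(p: float) -> float:
    """Sellke-Bayarri-Berger lower bound on the Bayes factor BF01.

    NOTE: this is a stand-in calibration, not the transformation
    described in the talk. It depends on the p-value only.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1.0 / math.e:
        # The calibration is only informative for p < 1/e;
        # beyond that the bound is the trivial value 1.
        return 1.0
    return -math.e * p * math.log(p)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        bound = bf01_lower_bound(p)
        # 1 / bound is the largest Bayes factor in favor of the
        # alternative that this calibration allows.
        print(f"p = {p}: BF01 >= {bound:.4f}, BF10 <= {1.0 / bound:.2f}")
```

For p = 0.05 the bound is about 0.41, so under this calibration the data favor the alternative by a factor of at most roughly 2.5.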

Friday, June 11, 2021, at 18.00 (CEST)

**Elizabeth Ogburn**

Johns Hopkins University, Department of Biostatistics - https://www.eogburn.com

**Social network dependence, unmeasured confounding, and the replication crisis**

**Abstract**

In joint work with Youjin Lee, we showed that social network dependence can result in spurious associations, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we tested for possible spurious associations due to network dependence in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure. In the latter part of the talk I will discuss how the phenomenon of spurious associations due to dependence is related to unmeasured confounding by network structure, akin to confounding by population structure in genome-wide association studies (GWAS), and how this relationship sheds light on methods to control for both spurious associations and unmeasured confounding.

**Most relevant paper:**

Youjin Lee & Elizabeth L. Ogburn (2020): Network Dependence Can Lead to Spurious Associations and Invalid Inference, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1782219

https://www.tandfonline.com/doi/pdf/10.1080/01621459.2020.1782219

Wednesday, **May 26**, at **11.30 a.m.**

**Jianyi Lin**

Università Cattolica del Sacro Cuore

**Computational complexity and exact algorithms for size-constrained clustering problems**

**Abstract**

Classical clustering analysis in geometric ambient space can benefit from a priori background information in the form of problem constraints, such as cluster size constraints introduced to avoid unbalanced clusterings and hence improve solution quality. I will present the problem of finding a partition of a given set of n points of the d-dimensional real space into k clusters such that the lp-norm-induced distance of all points from their cluster centroid is globally minimized and each cluster has a prescribed cardinality. This general problem is as computationally intractable as its unconstrained counterpart, the k-Means problem, which was shown to be NP-hard for general k by S. Dasgupta only in 2008, although the corresponding heuristic has been in widespread use since the '50s. Computational hardness results will be presented for certain variants of size-constrained geometric clustering, while for the cases that are tractable in polynomial time and space, some globally optimal exact algorithms based on computational and algebraic geometry techniques will be illustrated.
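
To make the problem statement concrete, here is a minimal brute-force sketch that enumerates every partition with the prescribed cluster sizes and keeps the cheapest one. It is not one of the algorithms from the talk: it assumes the squared Euclidean cost (so that the centroid is the cluster mean), it runs in exponential time, and it is only usable on a handful of points. The names `cluster_cost` and `exact_size_constrained_clustering` are hypothetical.

```python
from itertools import combinations


def cluster_cost(cluster):
    """Sum of squared Euclidean distances to the cluster mean."""
    dim = len(cluster[0])
    centroid = [sum(pt[j] for pt in cluster) / len(cluster) for j in range(dim)]
    return sum((pt[j] - centroid[j]) ** 2 for pt in cluster for j in range(dim))


def exact_size_constrained_clustering(points, sizes):
    """Exhaustive search over partitions with prescribed cluster sizes.

    Returns (cost, clusters), where clusters is a list of index tuples
    and the i-th tuple has exactly sizes[i] elements.
    """
    if any(s <= 0 for s in sizes) or sum(sizes) != len(points):
        raise ValueError("sizes must be positive and sum to len(points)")

    best = (float("inf"), None)

    def search(remaining, sizes_left, chosen, cost):
        nonlocal best
        if not sizes_left:
            if cost < best[0]:
                best = (cost, chosen)
            return
        if cost >= best[0]:
            return  # partial cost already no better than the incumbent
        for combo in combinations(remaining, sizes_left[0]):
            taken = set(combo)
            rest = tuple(i for i in remaining if i not in taken)
            extra = cluster_cost([points[i] for i in combo])
            search(rest, sizes_left[1:], chosen + [combo], cost + extra)

    search(tuple(range(len(points))), tuple(sizes), [], 0.0)
    return best


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,), (12.0,)]
    print(exact_size_constrained_clustering(pts, (2, 3)))
```

On the five collinear points in the usage example, with sizes (2, 3), the search groups the two leftmost points together and the three rightmost points together.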

Wednesday, **May 5**, at **11.30 a.m.**

**Stefano Rizzelli**

Università Cattolica del Sacro Cuore - École Polytechnique Fédérale de Lausanne

**Data-dependent choice of prior hyperparameters in Bayesian inference: consistency and merging of posterior distributions**

**Abstract**

The Bayesian inferential paradigm prescribes the specification of a prior distribution on the parameters of the statistical model. For complex models, the subjective elicitation of prior hyperparameters can be a delicate and difficult task. This is particularly the case for hyperparameters affecting posterior inference via complexity penalization, shrinkage effects, etc. In the absence of sufficient a priori information, a principled specification of a hyper-prior distribution can also be difficult and can complicate computations. It is common practice to resort to a data-driven choice of the prior hyperparameters as a shortcut: this approach is commonly called empirical Bayes (EB). Although EB is not rigorous from a Bayesian standpoint, the folklore of EB analysis holds that it provides approximations to genuine Bayesian inference, while enjoying some frequentist asymptotic guarantees. We give a new illustration of EB posterior consistency in a semiparametric estimation problem, involving the analysis of extreme multivariate events. We then turn to parametric models and focus on merging in total variation between EB and Bayesian posterior/predictive distributions, almost surely as the sample size increases. We provide new results refining those in Petrone et al. (2014) and illustrate their applications in the context of variable selection.
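
For readers unfamiliar with the data-driven choice of hyperparameters, the following toy sketch may help. It is not the semiparametric or variable-selection setting of the talk; it assumes the textbook normal means model, x_i | θ_i ~ N(θ_i, 1) with prior θ_i ~ N(0, τ²), where the hyperparameter τ² is estimated by maximizing the marginal likelihood x_i ~ N(0, 1 + τ²) and then plugged into the posterior mean. The function name `eb_normal_means` is hypothetical.

```python
def eb_normal_means(x):
    """Empirical Bayes posterior means in a toy normal means model.

    Assumed model (an illustration only):
        x_i | theta_i ~ N(theta_i, 1),   theta_i ~ N(0, tau2).
    Marginally x_i ~ N(0, 1 + tau2), so the marginal maximum likelihood
    estimate of tau2 is max(0, mean(x_i^2) - 1).
    """
    n = len(x)
    if n == 0:
        raise ValueError("x must be non-empty")
    tau2_hat = max(0.0, sum(v * v for v in x) / n - 1.0)
    # With tau2 plugged in, the posterior mean of theta_i shrinks x_i
    # toward zero by the factor tau2 / (1 + tau2).
    shrinkage = tau2_hat / (1.0 + tau2_hat)
    return tau2_hat, [shrinkage * v for v in x]


if __name__ == "__main__":
    tau2_hat, post_means = eb_normal_means([2.0, -2.0, 2.0, -2.0])
    print(tau2_hat, post_means)
```

In the usage example the mean of the squared observations is 4, so the estimated τ² is 3 and each observation is shrunk by a factor of 0.75. A fully Bayesian analysis would instead place a hyper-prior on τ² and integrate over it.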