Department of Pure Mathematics and Mathematical Statistics

(Joint work with Ellen Graham and Marco Carone)

The work I will present in this talk contributes to the broader goal of developing a unified, automated methodology for efficient debiased machine learning inference using individual-level data fused from multiple independent sources. The increasing availability of such data has spurred the development of new statistical theory for data integration, including a recent comprehensive framework by Li and Luedtke for cases where data sources align with different subsets of conditional distributions corresponding to a single factorization of the target distribution. However, many real-world data fusion problems violate this structure. Examples include integrating data from different epidemiological study designs, addressing measurement error with validation studies, and handling two-sample instrumental variable problems, all cases where existing theory falls short.

In this talk, I will introduce a new framework that significantly extends the reach of the current theory by enabling the integration of individual-level data when sources align with conditional distributions that do not conform to a single factorization of the target distribution. I will present universal results characterizing the class of influence functions of regular asymptotically linear estimators, together with the efficient influence function, for any pathwise differentiable parameter, regardless of the number of data sources, the parameter of interest, or the statistical model. This theory opens new avenues for machine-learning-assisted, semiparametric efficient estimation, pushing the boundaries of data integration in modern statistical science.

Further information

Time:

May 8th 2026
13:00 to 14:00

Venue:

MR12, Centre for Mathematical Sciences

Series:

Statistics