Multisite studies are essential for gathering comprehensive data in scientific research. However, these collaborative studies often face a significant challenge: some sites may not have recorded all key variables. Traditionally, researchers would use data from sites that did record the variable to fill the gap, but this isn't always possible due to logistical or legal constraints.
Cross-site imputation was developed to address this issue. It is a novel approach that allows researchers to recover missing data without pooling individual-level information. Instead of sharing raw data, sites share estimated regression coefficients and their variances, which capture the statistical relationship between variables. The method was successfully applied to recover missing data in studies across Swedish hospitals, so that all sites could be included in the analysis.
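The idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a continuous variable, a linear regression model, and simulated data, and the function names `fit_at_donor_site` and `impute_at_recipient_site` are hypothetical. One site fits the model and shares only summary statistics; the other site draws coefficients from them and generates several imputed versions of the missing variable.

```python
import numpy as np

def fit_at_donor_site(X, y):
    """Fit ordinary least squares where y was observed; return summaries only."""
    X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # estimated coefficients
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov_beta = sigma2 * np.linalg.inv(X1.T @ X1)      # covariance of coefficients
    return {"beta": beta, "cov_beta": cov_beta, "sigma2": sigma2}

def impute_at_recipient_site(X, shared, rng, n_imputations=5):
    """Impute the unrecorded variable using only the shared summaries."""
    X1 = np.column_stack([np.ones(len(X)), X])
    imputations = []
    for _ in range(n_imputations):
        # Draw coefficients to reflect their estimation uncertainty.
        beta_draw = rng.multivariate_normal(shared["beta"], shared["cov_beta"])
        # Add residual noise so imputed values vary like real observations.
        # Simplification: sigma2 is held fixed rather than drawn as well.
        noise = rng.normal(0.0, np.sqrt(shared["sigma2"]), size=len(X))
        imputations.append(X1 @ beta_draw + noise)
    return np.stack(imputations)                      # shape: (n_imputations, n_rows)

# Simulated example: site A recorded y, site B did not.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(500, 2))
y_a = 1.0 + X_a @ np.array([0.5, -0.3]) + rng.normal(0.0, 1.0, size=500)
X_b = rng.normal(size=(300, 2))

shared = fit_at_donor_site(X_a, y_a)                  # only this dictionary leaves site A
imputed = impute_at_recipient_site(X_b, shared, rng)  # computed entirely at site B
```

Only the dictionary of coefficients, their covariance matrix, and the residual variance crosses site boundaries; no individual-level rows do. Each imputed dataset at site B would then be analysed separately and the results combined, as in standard multiple imputation.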
This approach is particularly important for multicenter studies, where privacy and data security are paramount. By extending multiple imputation techniques to these settings, researchers can address missing data while keeping individual-level data private. Further methodological research will need to address situations in which cross-site imputation cannot be used because of large heterogeneity between study sites.
In summary, cross-site imputation offers a practical and privacy-preserving solution for handling missing data in multisite studies, paving the way for more robust and inclusive research. Sites that observed a variable share estimated regression coefficients and variances, which are then used to impute that variable at sites where it was not recorded.
Publication
Thiesmeier R, Madley-Dowd P, Orsini N, Ahlqvist VH. Cross-site imputation can recover missing variables in federated multicenter studies. Journal of Clinical Epidemiology. 2025;184:111820. doi: 10.1016/j.jclinepi.2025.111820.