scFedVI: A Privacy-Preserving Approach to Mitigating Batch Effects in Single-Cell RNA-Sequencing Data

Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been provided and is not available.

National scientific document ID:

IBIS12_183

Indexing date: 12 Aban 1403 (Solar Hijri)

Abstract:

The growing field of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity. In this study, we introduce Single-Cell Federated Variational Inference (scFedVI), a novel federated learning-based method to address the challenge of batch effects in scRNA-seq data analysis. Batch effects, arising from variations in cell processing such as different chips, sequencing lanes, or harvest times, significantly affect transcriptome measurements, creating discrepancies within and across experiments. To mitigate these, we integrated deep neural networks, specifically Variational Autoencoders (VAEs) [1], into our federated learning framework for sophisticated batch correction, enhancing biological insights while maintaining data integrity. Our approach utilizes the inherent differences in each client's dataset as a feature rather than a limitation, enabling more robust and generalizable models. By distributing the learning process across clients, each possessing their unique scRNA-seq dataset with distinct batch characteristics, we employed the Federated Averaging (Fed-Avg) [2] algorithm to aggregate the learned models. This approach, prioritizing data privacy, demonstrates enhanced effectiveness in batch effect correction compared to running single-cell variational inference individually on each client's data. Performance evaluation using the k-nearest-neighbor batch effect test (kBET) and the Adjusted Rand Index (ARI) for clustering confirms that scFedVI outperforms scVI [3], a current leading method in batch correction and integration for single-cell data. Furthermore, we establish the robustness of scFedVI by testing various scenarios involving different numbers of clients, ranging from 2 to 5. Our results, validated across diverse pancreatic and nervous system scRNA-seq datasets, illustrate that scFedVI not only effectively corrects batch effects but also utilizes these variations to enhance overall data analysis. This is a significant advancement over conventional non-private batch correction methods, which typically aim merely to eliminate these effects. This method opens new avenues for collaborative research across different laboratories without compromising data privacy or integrity.
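
The abstract names Federated Averaging (Fed-Avg) as the aggregation step but gives no implementation details. The following is a minimal sketch of that step only, under stated assumptions: each client's model parameters are represented as plain lists of floats (stand-ins for VAE weights), and each client is weighted by its number of local cells. The function name `fed_avg`, the parameter names, and the example values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count.

    Assumption: all clients share the same parameter names and shapes,
    as they would when training copies of one model architecture.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            # Weighted sum over clients for the i-th entry of this parameter.
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two illustrative clients with made-up weights and cell counts.
clients = [
    {"encoder.weight": [1.0, 2.0], "encoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "encoder.bias": [1.0]},
]
sizes = [100, 300]

global_params = fed_avg(clients, sizes)
print(global_params)
# {'encoder.weight': [2.5, 3.5], 'encoder.bias': [0.75]}
```

In a federated round, the server would send `global_params` back to every client for further local training, so only model weights, never the raw expression data, leave a client.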

Authors

P Mokhber

Department of Computer Science, Sharif University of Technology, Tehran, Iran

A Gargoori Motlagh

Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran