A Contrastive Learning Framework for Single-Cell Multi-Omics Data Integration
Publication year: 1403 SH (2024–2025)
Document type: Conference paper
Language: English
The full text of this paper has not been published; only the abstract (or extended abstract) is available in the database.
National document ID: IBIS13_084
Indexing date: 10 Ordibehesht 1404 (30 April 2025)
Abstract:
The advancement of single-cell omics technologies has transformed our understanding of the function and heterogeneity of biological systems. Methods such as SHARE-seq and SNARE-seq capture gene expression and chromatin accessibility, while CITE-seq measures gene expression and cell-surface protein abundance. However, analyzing each modality independently yields only partial insights. Integrating these modalities offers a solution but is challenging because of differences in their distributions and feature spaces. Considerable effort has gone into developing efficient computational frameworks to address this problem, most of which learn low-dimensional joint embeddings of the omics modalities. Some methods, such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA), use linear transformations of the input data, with tools such as Seurat combining PCA, CCA, and Mutual Nearest Neighbors for alignment. MOFA uses matrix factorization to derive shared and modality-specific representations. More recent multi-modal deep learning approaches, such as scGLUE, employ variational autoencoders to capture hierarchical, nonlinear patterns and align multi-omics representations end-to-end. While these methods demonstrate promising results and strong performance, they often suffer from low signal-to-noise ratios (Wang et al., 2023) and over-complicated architectures. Here, we present a neural network architecture inspired by the CLIP model developed by OpenAI (Radford et al., 2021) for paired single-cell multi-omics integration. The framework consists of two encoders, each learning a low-dimensional representation of its input modality; these representations are then aligned using a contrastive loss function.
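The abstract does not give the exact loss, but a CLIP-style alignment of two per-modality encoders typically uses a symmetric InfoNCE objective over a batch of paired cells: embeddings from both encoders are L2-normalized, a scaled cosine-similarity matrix is formed, and matched pairs (the diagonal) are treated as classification targets in both directions. A minimal plain-Python sketch under those assumptions (function names and the temperature value are illustrative, not the authors' code):

```python
import math

def l2_normalize(v):
    # scale a vector to unit length so dot products become cosine similarities
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy_rows(sim):
    # mean over rows of -log softmax(row)[i], where the target for row i
    # is column i (the matched cell on the diagonal)
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for numerical stability
        logsum = m + math.log(sum(math.exp(s - m) for s in row))
        total += logsum - row[i]
    return total / len(sim)

def clip_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE loss for paired embeddings of two modalities.

    emb_a[i] and emb_b[i] are assumed to come from the same cell.
    """
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    # temperature-scaled cosine-similarity matrix: rows index modality A cells
    sim = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    sim_t = [list(col) for col in zip(*sim)]
    # classify A->B and B->A; average the two directions
    return 0.5 * (cross_entropy_rows(sim) + cross_entropy_rows(sim_t))
```

In practice the encoders and this loss would be trained jointly with a deep learning framework; the sketch only shows the objective that pulls matched cells together and pushes mismatched cells apart.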
We benchmarked this model against two baselines (PCA and an autoencoder with reconstruction loss) and three state-of-the-art models (MOFA (Argelaguet et al., 2018), Harmony (Korsunsky et al., 2019), and Con-AAE (Wang et al., 2023)) on three real-world datasets: SHARE-seq (Ma et al., 2020), PBMC (10x Genomics, 2020), and CITE-seq (Stoeckius et al., 2017). All evaluations were performed on unseen test data with 10 replications. Benchmarks used four measures: Average Silhouette Width (ASW), which scores the clustering quality of the latent representations with respect to cell types, and Recall at k, cell-type accuracy, and median rank, which score the quality of the integration. The results show that our framework outperforms the other models on most metrics. Moreover, it achieved high ASW values relative to the original datasets, reflecting the model's ability to denoise single-cell data and extract biological signal. In addition, we assessed the model's ability to handle unpaired multi-omics data, where it achieved high values on most metrics compared to the other frameworks. These findings position our framework as a high-potential platform that can be extended to downstream applications such as cell-type annotation and disease subtyping.
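The rank-based integration metrics mentioned above (Recall at k and median rank) are conventionally computed from a cross-modality similarity matrix on paired test cells, where the true match for cell i in one modality is cell i in the other. A minimal plain-Python sketch of that convention (not the paper's evaluation code; function names are illustrative):

```python
def match_ranks(sim):
    # 1-based rank of the true pair (the diagonal entry) within each row,
    # where higher similarity means a better match
    ranks = []
    for i, row in enumerate(sim):
        rank = 1 + sum(1 for j, s in enumerate(row) if j != i and s > row[i])
        ranks.append(rank)
    return ranks

def recall_at_k(sim, k):
    # fraction of cells whose true cross-modality match ranks in the top k
    ranks = match_ranks(sim)
    return sum(r <= k for r in ranks) / len(ranks)

def median_rank(sim):
    # median of the true-match ranks; lower is better integration
    ranks = sorted(match_ranks(sim))
    n = len(ranks)
    return ranks[n // 2] if n % 2 else 0.5 * (ranks[n // 2 - 1] + ranks[n // 2])
```

A perfectly integrated embedding gives every cell rank 1 (Recall at 1 of 1.0, median rank 1); poorly aligned modalities push the true matches down the ranking.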
Authors:
Amir Ebrahimi
Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
Alireza Fotuhi Siahpirani
Department of Bioinformatics, Institute of Biochemistry & Biophysics, University of Tehran, Tehran, Iran
Hesam Montazeri
Department of Bioinformatics, Institute of Biochemistry & Biophysics, University of Tehran, Tehran, Iran