Enhancing Model Evaluation in Single-Cell Perturbation Response Prediction

Year of publication: 1402 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been made available.

National scientific document ID:

IBIS12_197

Indexing date: 12 Aban 1403 (Solar Hijri)

Abstract:

Various computational approaches, such as variational autoencoders [1] and neural optimal transport methods [2], have been used to predict single-cell perturbation responses. Correlation or distance metrics that compare observed and predicted gene expression profiles have previously been used for model assessment. However, we have observed that these metrics may not accurately reflect the performance of different prediction models because they are sensitive to the variable ranges of individual genes. To improve model evaluation, this study introduces several new evaluation metrics intended to provide more accurate measures of prediction accuracy.
These metrics assess the accuracy of model predictions against observed data. Alongside correlations across samples, we evaluated the normalized root mean square error, R-squared, and the average Pearson correlation coefficient across genes, which are insensitive to the variable ranges of genes. We also used the Frobenius norm of the difference between the observed and predicted correlation matrices as an additional metric; this metric can be applied across both samples and genes. In addition, we calculated the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for a discriminative model that predicts the observed correlation matrix by varying the threshold applied to the predicted correlations.
Two single-cell RNA sequencing datasets were used: PBMC cell types under interferon-beta stimulation [3], and a subset of the SCIPlex3 dataset [4] comprising three cell types and 10 single-dose drug conditions. After the necessary preprocessing with Scanpy, the CPA and cellOT models, along with their baselines (no-perturbation, vector arithmetic, and PCA + vector arithmetic), were applied to predict the gene expression profiles of different conditions across cell types.
Evaluating the predictions against observed values with these metrics indicated that relying solely on assessment across samples, without considering the range of each gene, can be misleading: the baseline models showed better accuracy under the metrics that account for gene expression ranges and under the graph-based metrics. Permutation tests were used to provide a reference for comparing the evaluated model performance against random models. We show that the proposed evaluation metrics provide valuable insight into the accuracy of model predictions.
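
The abstract contrasts correlations across samples with gene-wise metrics (NRMSE, R-squared, average Pearson correlation across genes) that do not depend on each gene's range. The sketch below illustrates that contrast on toy data; it is not the authors' code. It assumes observed and predicted data are paired samples x genes matrices (for example pseudo-bulk profiles per condition), and it normalizes each gene's RMSE by that gene's observed range, which is an assumption since the abstract does not name the normalizer. All function names are hypothetical.

```python
import numpy as np


def _pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def mean_corr_across_samples(obs, pred):
    """Per sample (row): correlate observed vs predicted over genes, then average."""
    return float(np.mean([_pearson(o, p) for o, p in zip(obs, pred)]))


def mean_corr_across_genes(obs, pred):
    """Per gene (column): correlate observed vs predicted over samples, then average."""
    return float(np.mean([_pearson(obs[:, j], pred[:, j]) for j in range(obs.shape[1])]))


def mean_nrmse(obs, pred):
    """Per-gene RMSE divided by that gene's observed range (assumed normalizer), averaged."""
    rmse = np.sqrt(np.mean((obs - pred) ** 2, axis=0))
    gene_range = obs.max(axis=0) - obs.min(axis=0)
    return float(np.mean(rmse / gene_range))


def mean_r2(obs, pred):
    """Per-gene coefficient of determination, averaged over genes."""
    ss_res = np.sum((obs - pred) ** 2, axis=0)
    ss_tot = np.sum((obs - obs.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


# Toy data: 20 samples x 50 genes whose baseline levels span 0-100.
# The "prediction" gets every gene's level right but none of the
# sample-to-sample variation.
rng = np.random.default_rng(0)
gene_levels = rng.uniform(0, 100, size=50)
obs = gene_levels + rng.normal(size=(20, 50))
pred = gene_levels + rng.normal(size=(20, 50))

print(mean_corr_across_samples(obs, pred))  # close to 1: dominated by gene levels
print(mean_corr_across_genes(obs, pred))    # close to 0: no real per-gene signal
print(mean_nrmse(obs, pred))
print(mean_r2(obs, pred))                   # negative: worse than each gene's mean
```

In this toy case the sample-wise correlation looks excellent only because genes with large expression levels dominate each profile, while the gene-wise metrics expose that the prediction carries no per-gene information.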
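
The Frobenius-norm metric described in the abstract compares the observed and predicted correlation matrices and can be computed over genes or over samples. A minimal NumPy sketch follows; the function name `corr_frobenius_distance`, its `over` parameter, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def corr_frobenius_distance(obs, pred, over="genes"):
    """Frobenius norm of (observed correlation matrix - predicted correlation matrix).

    over="genes":   gene-by-gene correlations (columns are the variables)
    over="samples": sample-by-sample correlations (rows are the variables)
    """
    if over not in ("genes", "samples"):
        raise ValueError("over must be 'genes' or 'samples'")
    rowvar = over == "samples"
    c_obs = np.corrcoef(obs, rowvar=rowvar)
    c_pred = np.corrcoef(pred, rowvar=rowvar)
    return float(np.linalg.norm(c_obs - c_pred, ord="fro"))


# Toy data: 30 samples x 8 genes that all follow one latent factor,
# so the observed genes are strongly co-expressed.
rng = np.random.default_rng(1)
latent = rng.normal(size=(30, 1))
obs = latent @ np.ones((1, 8)) + 0.3 * rng.normal(size=(30, 8))

# A positive affine rescaling leaves every Pearson correlation unchanged.
rescaled = 5.0 * obs + 2.0
# Shuffling each gene independently across samples destroys co-expression.
shuffled = np.column_stack([rng.permutation(obs[:, j]) for j in range(obs.shape[1])])

print(corr_frobenius_distance(obs, rescaled))             # ~0
print(corr_frobenius_distance(obs, rescaled, "samples"))  # ~0
print(corr_frobenius_distance(obs, shuffled))             # clearly > 0
```

Because Pearson correlation ignores per-variable scale and offset, this metric rewards getting the co-expression structure right rather than the absolute expression values.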
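
The abstract also mentions an ROC AUC for predicting the observed correlation matrix by sweeping thresholds over the predicted correlations. One possible reading is sketched below, under stated assumptions: gene pairs whose absolute observed correlation exceeds a hypothetical `cutoff` of 0.5 are treated as true edges, the absolute predicted correlation is the score, and AUC is computed through its rank interpretation (equivalent to integrating the ROC curve over all thresholds). The function names are hypothetical; the paper's exact binarization is not stated in the abstract.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC via its rank interpretation: probability that a random positive
    scores higher than a random negative (ties count as 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative pair")
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def correlation_auc(obs, pred, cutoff=0.5):
    """How well predicted gene-gene correlations recover the observed ones.

    Pairs with |observed r| > cutoff are true edges (cutoff is a hypothetical
    choice); |predicted r| is the score that the ROC curve thresholds.
    """
    c_obs = np.corrcoef(obs, rowvar=False)
    c_pred = np.corrcoef(pred, rowvar=False)
    iu = np.triu_indices_from(c_obs, k=1)  # each gene pair once, no diagonal
    return roc_auc(np.abs(c_obs[iu]) > cutoff, np.abs(c_pred[iu]))


# Toy data: 40 samples x 6 genes in two co-expression modules (0-2 and 3-5).
rng = np.random.default_rng(2)
module_a = rng.normal(size=(40, 1))
module_b = rng.normal(size=(40, 1))
obs = np.hstack([module_a.repeat(3, axis=1), module_b.repeat(3, axis=1)])
obs = obs + 0.3 * rng.normal(size=obs.shape)

print(correlation_auc(obs, obs))  # 1.0: the observed matrix recovers itself
print(correlation_auc(obs, obs + 0.5 * rng.normal(size=obs.shape)))
```

The pairwise comparison is O(positives x negatives), which is fine for a sketch; for many genes a rank-based or library implementation would be preferable.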
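
The baselines named in the abstract (no-perturbation, vector arithmetic, and PCA + vector arithmetic) are not defined there in detail. The sketch below shows one common interpretation, which is an assumption: predict a held-out cell type's perturbed profile from its control cells by adding the mean control-to-perturbed shift learned on training cell types, either directly in gene space or inside a PCA space fitted with NumPy's SVD. Function names and toy data are hypothetical.

```python
import numpy as np


def no_perturbation(ctrl_target):
    """Baseline: assume the perturbation leaves expression unchanged."""
    return ctrl_target.copy()


def vector_arithmetic(ctrl_train, stim_train, ctrl_target):
    """Baseline: add the mean perturbation shift seen in training cell types (gene space)."""
    delta = stim_train.mean(axis=0) - ctrl_train.mean(axis=0)
    return ctrl_target + delta


def pca_vector_arithmetic(ctrl_train, stim_train, ctrl_target, n_components=10):
    """Baseline: the same shift, computed in a PCA space fitted on training cells
    and mapped back to gene space."""
    train = np.vstack([ctrl_train, stim_train])
    center = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - center, full_matrices=False)
    axes = vt[:n_components]  # (n_components, n_genes), orthonormal rows

    def to_pcs(x):
        return (x - center) @ axes.T

    delta = to_pcs(stim_train).mean(axis=0) - to_pcs(ctrl_train).mean(axis=0)
    return (to_pcs(ctrl_target) + delta) @ axes + center


# Toy data: 20 genes; the perturbation adds the same shift in every cell type.
rng = np.random.default_rng(3)
shift = 3.0 * rng.normal(size=20)
ctrl_train = rng.normal(size=(100, 20))
stim_train = rng.normal(size=(100, 20)) + shift
ctrl_target = rng.normal(size=(50, 20)) + 1.0  # held-out cell type with offset baseline
true_mean = ctrl_target.mean(axis=0) + shift

for name, pred in [
    ("no-perturbation", no_perturbation(ctrl_target)),
    ("vector arithmetic", vector_arithmetic(ctrl_train, stim_train, ctrl_target)),
    ("PCA + vector arithmetic", pca_vector_arithmetic(ctrl_train, stim_train, ctrl_target)),
]:
    # Distance between predicted and true mean perturbed profile.
    print(name, np.linalg.norm(pred.mean(axis=0) - true_mean))
```

With `n_components` equal to the number of genes, the PCA variant reduces exactly to gene-space vector arithmetic; fewer components trade reconstruction fidelity for denoising.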
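
The abstract uses permutation tests to compare model scores with random models but does not say how those random models are built. The sketch below assumes, purely for illustration, that a random model is the model's own prediction with its sample pairing shuffled, and that a higher metric value is better. It uses mean R-squared across genes as the example metric; `permutation_test` and `mean_r2_across_genes` are hypothetical names.

```python
import numpy as np


def mean_r2_across_genes(obs, pred):
    """Per-gene coefficient of determination, averaged over genes."""
    ss_res = ((obs - pred) ** 2).sum(axis=0)
    ss_tot = ((obs - obs.mean(axis=0)) ** 2).sum(axis=0)
    return float((1.0 - ss_res / ss_tot).mean())


def permutation_test(metric, obs, pred, n_perm=1000, seed=0):
    """Compare metric(obs, pred) with a null distribution of "random models"
    made by shuffling which predicted sample is paired with which observed one.
    Assumes a higher metric value means a better prediction."""
    rng = np.random.default_rng(seed)
    observed = metric(obs, pred)
    null = np.array([metric(obs, pred[rng.permutation(len(pred))]) for _ in range(n_perm)])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, float(p_value)


rng = np.random.default_rng(4)
obs = rng.normal(size=(30, 10))
good_pred = obs + 0.3 * rng.normal(size=obs.shape)

score, null, p = permutation_test(mean_r2_across_genes, obs, good_pred, n_perm=200)
print(f"score={score:.3f}  null mean={null.mean():.3f}  p={p:.4f}")
```

Other null constructions (for example shuffling genes, or permuting condition labels before training) would answer different questions, so the choice of permutation should match what "random model" is meant to represent.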

Keywords:

perturbation responses, single-cell RNA sequencing, evaluation metrics

Authors

Mina Karimpour

Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran

Hesam Montazeri

Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran