A Data-Centric Literature Review of SHAP and LIME for Tabular Decision Support Systems

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper is available as a 15-page PDF.

National scientific document ID:

ICAICS01_016

Indexing date: 19 Khordad 1405

Abstract:

Decision Support Systems (DSS) in healthcare and finance increasingly rely on SHAP and LIME to mitigate the "black box" problem of machine learning models. While traditional research focuses on algorithmic mechanics, this paper provides a data-centric literature review. It shifts the focus from the internal logic of the explainer to how intrinsic data characteristics and preprocessing decisions shape the resulting output. The review identifies critical vulnerabilities linked to feature dependence, where collinearity leads to marginalization failures and "attribution splitting". It further examines data quality gaps, specifically how imputation artifacts act as confounding factors that can lead explainers to highlight reconstructed values rather than observed evidence. Additionally, the paper addresses how class imbalance and synthetic resampling distort decision boundaries, while sampling stochasticity introduces instability through out-of-distribution perturbations. Security and privacy concerns are also explored, including adversarial manipulation and membership inference attacks facilitated by specific attribution patterns. Understanding these data-driven limitations is essential for data scientists and AI practitioners who need to interpret model outputs reliably before deployment. Ultimately, this work argues that the fidelity of an explanation is inextricably linked to the underlying data distribution. Achieving robust and trustworthy explainability is a fundamentally data-driven challenge that requires the development of data-aware XAI methods and standardized evaluation protocols. To understand this shift, consider that a machine learning explanation is like a photograph of a landscape; its clarity depends less on the camera's brand (the algorithm) and more on the weather conditions and terrain (the data) being captured.

Keywords:

Authors

Raheleh Yoosefzadeh

Independent Researcher