Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Publication year: 1399 (Solar Hijri calendar)
Document type: Journal article
Language: English
The full text of this article is available as a 14-page PDF.

National scientific document ID: JR_JHES-8-2_002

Indexing date: 8 Azar 1402

Abstract:

Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation and the interpretation of the coefficients. For high-dimensional problems, classical methods are not reliable because of the large number of predictor variables. In addition, classical methods are affected by the presence of outliers and collinearity.

Methods: Nowadays, many real-world data sets have a high-dimensional structure. To handle this problem, we used the least absolute shrinkage and selection operator (LASSO). Also, because of the flexibility and applicability of the semiparametric model in medical data, it can be used for modeling genomic data. Motivated by these points, an improved robust approach for high-dimensional data sets was developed for the analysis of gene expression and for prediction in the presence of outliers.

Results: Outliers are among the common problems in regression analysis. In the regression context, an outlier is a point that fails to follow the main linear pattern of the data. The ordinary least-squares estimator was found to be potentially sensitive to outliers, which motivated the investigation of robust estimators. Generally, robust regression is among the most popular topics in the statistics community. In the present study, least trimmed squares (LTS) estimation was applied to overcome the outlier problem.

Conclusions: We have proposed an optimization approach for semiparametric models to combat outliers in the data set. Specifically, based on a LASSO penalization scheme, we have formulated the semiparametric model as a nonlinear integer programming problem, which can be effectively solved by any evolutionary algorithm. We have also studied a real-world application related to riboflavin production. The results showed that the proposed method was reasonably efficient compared with the LTS method.
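To make the LTS idea mentioned in the abstract concrete, the following is a minimal sketch for simple linear regression only. It is not the authors' method, which combines differencing, a LASSO penalty, and an integer programming formulation. The sketch assumes the usual LTS objective (the sum of the h smallest squared residuals) and approximates it with random two-point starts followed by concentration steps. The function names `ols_fit` and `lts_fit`, the synthetic data, and the choice h = 15 are all illustrative assumptions.

```python
import random


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # all x equal: fall back to a horizontal line
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def lts_fit(xs, ys, h, n_starts=50, max_csteps=50, seed=0):
    """Approximate LTS fit: minimize the sum of the h smallest squared residuals.

    Hypothetical helper: random two-point starts, each refined by
    concentration steps (refit on the h best-fitting points).
    """
    rng = random.Random(seed)
    n = len(xs)
    best_obj, best_coef = float("inf"), None
    for _ in range(n_starts):
        subset = rng.sample(range(n), 2)
        for _ in range(max_csteps):
            a, b = ols_fit([xs[i] for i in subset], [ys[i] for i in subset])
            sq_res = [(ys[i] - a - b * xs[i]) ** 2 for i in range(n)]
            kept = sorted(range(n), key=lambda i: sq_res[i])[:h]
            if set(kept) == set(subset):  # subset is stable: converged
                break
            subset = kept
        obj = sum(sorted(sq_res)[:h])  # trimmed objective of the last fit
        if obj < best_obj:
            best_obj, best_coef = obj, (a, b)
    return best_coef, best_obj


# Synthetic data (assumption): y = 1 + 2x with small alternating noise,
# and the last four observations shifted upward to act as outliers.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
for i in range(16, 20):
    ys[i] += 40.0

ols_a, ols_b = ols_fit(xs, ys)
(lts_a, lts_b), lts_obj = lts_fit(xs, ys, h=15)
print("OLS intercept, slope:", ols_a, ols_b)
print("LTS intercept, slope:", lts_a, lts_b)
```

On this toy data the OLS slope is pulled well away from 2 by the four shifted points, while the trimmed fit stays close to the line followed by the remaining observations. In a high-dimensional setting such as the one the abstract describes, the inner least-squares fit would need to be replaced by a penalized one.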

Keywords:

High-dimensional data set, Ordinary least squares method, Outliers, Robust regression

Authors

Mahdi Roozbeh

Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran

Monireh Maanavi

Faculty of Mathematics, Statistics and Computer Science, Semnan University, Semnan, Iran

Saman Babaie-Kafaki

Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran
