When 105% is too much: Reflections on boundaries and statistical methods in diagnostic test accuracy meta-analytical modeling: A letter to the editor

Publication year: 1404 (SH)
Document type: Journal article
Language: English
Views: 13

The full text of this article has not been published; only the abstract (or extended abstract) is available in the database.

National scientific document ID:

JR_IJRM-23-12_008

Indexing date: 29 Bahman 1404

Abstract:

Reading a diagnostic test accuracy (DTA) meta-analysis often feels like a careful reconstruction of reasoning, coherence, and analytical validity. A reported sensitivity of 97.6%, with a 95% CI extending to 105.5%, is a red flag. Far from trivial, it reflects the use of statistical methods inappropriate for bounded data. Upon reviewing the meta-analysis by Keshari et al. (1), I identified several methodological issues that compromise the validity of their findings and merit correction.

Confidence intervals for sensitivity and specificity must fall within the 0-100% range, as these are proportions bounded between 0 and 1. Figure 2 in the meta-analysis (1) reports a sensitivity of 97.6% (95% CI: 89.65-105.55%) for Dugoff et al., an impossible result suggesting an unbounded model fitted without a transformation for proportion data (e.g., logit, log, or Freeman-Tukey double arcsine). The pooled sensitivity (90.88%, 95% CI: 80.92-100.85%) likewise exceeds 100%, indicating a formally invalid estimate. Instead of addressing the problem, the authors truncate the upper bound in the abstract, reporting “90.9 (95% CI: 80.9-100%)”, which presents an adjusted rather than a model-derived value. Such modification diminishes transparency and misrepresents analytic uncertainty.

The methods further conflate concepts by stating that heterogeneity was “evaluated using Cochran’s Q test and the DerSimonian-Laird method”. Cochran’s Q tests for between-study variability, while DerSimonian-Laird is a random-effects estimator applied after such variability has been detected. Although DerSimonian-Laird is cited in the methods (despite its limitations compared with restricted maximum likelihood) (2), several forest plots (e.g., Figures 2 and 3) indicate the use of restricted maximum likelihood estimation. This inconsistency between the reported and applied models reduces reproducibility.
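The boundedness argument can be illustrated with a minimal Python sketch (using hypothetical counts, not the original study data): a naive Wald interval computed on the raw proportion scale can spill past 100%, whereas an interval built on the logit scale and back-transformed is guaranteed to lie within (0, 1).

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Naive normal (Wald) interval; can exceed [0, 1] for extreme proportions."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def logit_ci(p_hat, n, z=1.96):
    """Approximate 95% CI built on the logit scale, then back-transformed.

    Because the inverse logit maps any real number into (0, 1), the
    resulting interval can never exceed the 0-100% range.
    """
    logit = math.log(p_hat / (1 - p_hat))
    se = 1 / math.sqrt(n * p_hat * (1 - p_hat))  # delta-method SE on logit scale
    inv = lambda x: 1 / (1 + math.exp(-x))       # inverse logit (expit)
    return inv(logit - z * se), inv(logit + z * se)

# Hypothetical example: 40 of 41 affected cases detected (illustrative only)
p_hat, n = 40 / 41, 41
print(wald_ci(p_hat, n))   # upper limit exceeds 1.0 (i.e., >100%)
print(logit_ci(p_hat, n))  # both limits stay strictly inside (0, 1)
```

The same logic underlies the logit, log, and Freeman-Tukey transformations mentioned above: the model is fitted on an unbounded scale and only the back-transformed summary is reported.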
The meta-analysis also performs univariate pooling of sensitivity without plotting specificity or employing hierarchical or bivariate models, which account for the correlation between sensitivity and specificity and for threshold effects (3, 4). This approach limits interpretability. Moreover, restricting the analysis to only 4 studies from a larger review of over 70 may introduce selection bias and diminish generalizability.

Clinical heterogeneity further undermines the pooled results. In Figure 2, the authors combine the sensitivity for all aneuploidies from Schlaikjær Hartwig et al. (5) with that for trisomy 21 from Dugoff et al. (6), yielding the invalid 95% CI noted earlier, even though the original trial reported a valid 97% (95% CI: 83.8-99.7%). Pooling such distinct endpoints without stratification or sensitivity analyses violates the principle of clinical coherence. Sensitivity and detection rate (diagnostic yield) are also used interchangeably, although they represent different measures: sensitivity denotes the proportion of true positives among affected individuals, whereas detection rate refers to positive tests among the screened population. This conceptual distinction is critical for interpretability.

A further error appears in the abstract: “MicroRNA levels were significantly increased (standardized mean difference 1.22, 95% CI: -0.90 to 3.34)”. Because the CI includes 0, the difference is not statistically significant; indeed, Figure 3 shows p = 0.26. Reporting it as significant misrepresents the evidence. The wide CI (-0.90 to 3.34) also reflects extreme imprecision, with heterogeneity indices (I² = 97.85%, Q = 38.6, p < 0.001, τ² = 4.45) confirming severe inconsistency that invalidates any pooled inference. Moreover, in Figure 3, all control groups appear with identical mean values (1.00), which the original data do not support; for instance, Lamadrid-Romero et al. reported no such uniformity.
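The significance claim can be checked directly from the reported interval. Assuming a symmetric normal-theory 95% CI, the standard error, z-statistic, and two-sided p-value are recoverable in a few lines of Python; this is a back-of-the-envelope check, not the authors' computation.

```python
import math

def p_from_ci(est, lo, hi, z_crit=1.959964):
    """Approximate two-sided p-value recovered from an estimate and its 95% CI.

    Assumes a symmetric normal-theory interval, so SE = (hi - lo) / (2 * z_crit).
    """
    se = (hi - lo) / (2 * z_crit)
    z = est / se
    # Two-sided p-value from the standard normal distribution via erf
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Values reported in the criticized abstract: SMD 1.22, 95% CI -0.90 to 3.34
p = p_from_ci(1.22, -0.90, 3.34)
print(round(p, 2))  # ≈ 0.26: consistent with Figure 3, far from significance
```

The recovered p-value agrees with the p = 0.26 shown in the forest plot, confirming that the interval and the "significant" label in the abstract cannot both be correct.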
If standardization or imputation was applied, this should have been stated explicitly, as standardized mean differences are sensitive to such transformations. Several sensitivity estimates, including those from Schlaikjær Hartwig et al. and Dugoff et al. (5, 6), were extracted directly without reconstructing 2×2 tables. Although convenient, this practice departs from recommended DTA standards, which require independent reconstruction to ensure consistent definitions and denominators. Omitting this step risks propagating biases and precludes assessment of threshold effects.

The study process also lacks essential transparency: it was not registered in PROSPERO, and inclusion/exclusion criteria are only broadly described. Such omissions conflict with accepted standards for systematic reviews and reduce reproducibility. The use of the Joanna Briggs Institute checklist instead of QUADAS-2, the standard tool for DTA quality assessment, further weakens the methodological rigor and diverges from PRISMA-DTA guidelines. Finally, the use of funnel plots to assess publication bias is inappropriate when fewer than 10 studies are included (3). With only 5 studies analyzed, such plots are underpowered and unreliable.

In summary, the meta-analysis contains several methodological errors that materially affect its conclusions. Reporting sensitivity values exceeding 100% and modifying confidence intervals post hoc indicates the need to revisit the underlying statistical models rather than adjust the presentation. Diagnostic meta-analysis requires bounded data transformations, hierarchical modeling, and transparent reporting to ensure valid inference. These observations are intended not as criticism but as constructive clarification, to support more rigorous and reproducible application of meta-analytic methods in diagnostic research.

Authors

Javier Arredondo Montero

Pediatric Surgery Department, Complejo Asistencial Universitario de León, León, Spain.

References:

  • Keshari JR, Prakash P, Sinha SR, Prakash P, Rani K, ...
  • Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, ...
  • Deeks JJ, Bossuyt PM, Leeflang MM, Takwoingi Y. Cochrane handbook ...
  • Arredondo Montero J. Diagnostic test accuracy meta-analysis: A practical guide ...
  • Schlaikjær Hartwig T, Ambye L, Gruhn JR, Petersen JF, Wrønding ...
  • Dugoff L, Koelper NC, Chasen ST, Russo ML, Roman AS, ...