Discriminative Cross-Modal Attention Approach for RGB-D Semantic Segmentation

Publication year: 1404 (Iranian calendar)
Document type: Journal article
Language: English

The full text of this article is available as a 10-page PDF.


National scientific document ID: JR_CKE-8-1_005

Indexed on: 28 Ordibehesht 1404 (Iranian calendar)

Abstract:

Scene understanding through semantic segmentation is a vital component for autonomous vehicles. Given the importance of safety in autonomous driving, existing methods are constantly striving to improve accuracy and reduce error. RGB-based semantic segmentation models typically underperform due to information loss in challenging situations such as lighting variations and limitations in distinguishing occluded objects of similar appearance. Therefore, recent studies have developed RGB-D semantic segmentation methods by employing attention-based fusion modules. Existing fusion modules typically combine cross-modal features by focusing on each modality independently, which limits their ability to capture the complementary nature of modalities. To address this issue, we propose a simple yet effective module called the Discriminative Cross-modal Attention Fusion (DCMAF) module. Specifically, the proposed module performs cross-modal discrimination using element-wise subtraction in an attention-based approach. By integrating the DCMAF module with efficient channel- and spatial-wise attention modules, we introduce the Discriminative Cross-modal Network (DCMNet), a scale- and appearance-invariant model. Extensive experiments demonstrate significant improvements, particularly in predicting small and fine objects, achieving an mIoU of 77.39% on the CamVid dataset, outperforming state-of-the-art RGB-based methods, and a remarkable mIoU of 82.8% on the Cityscapes dataset. As the CamVid dataset lacks depth information, we employ the DPT monocular depth estimation model to generate depth images.
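The abstract describes DCMAF as performing cross-modal discrimination via element-wise subtraction inside an attention mechanism. The paper's actual architecture is not reproduced here; the NumPy sketch below only illustrates the general difference-as-attention idea under stated assumptions: the function name, the sigmoid gating form, and the convex re-weighting of the two modalities are all hypothetical choices, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminative_fusion(rgb_feat, depth_feat):
    """Hypothetical sketch of difference-driven cross-modal fusion:
    the element-wise difference between RGB and depth features is
    turned into a gate that re-weights each modality before merging.
    (Gating form is an assumption, not the paper's DCMAF module.)"""
    diff = rgb_feat - depth_feat   # cross-modal discrimination by subtraction
    gate = sigmoid(diff)           # attention weights in (0, 1)
    # convex combination: where the gate is high, trust RGB; else depth
    fused = gate * rgb_feat + (1.0 - gate) * depth_feat
    return fused

# toy feature maps shaped (channels, height, width)
rng = np.random.default_rng(0)
rgb = rng.random((4, 8, 8)).astype(np.float32)
depth = rng.random((4, 8, 8)).astype(np.float32)
out = discriminative_fusion(rgb, depth)
print(out.shape)  # (4, 8, 8)
```

Because the output is an element-wise convex combination, each fused value stays between the corresponding RGB and depth feature values, and identical inputs pass through unchanged.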

Authors

Emad Mousavian

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Danial Qashqai

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Shahriar B. Shokouhi

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
