Discriminative Cross-Modal Attention Approach for RGB-D Semantic Segmentation

  • Publication year: 1404 (Iranian calendar)
  • Published in: Computer and Knowledge Engineering (CKE) Journal, Volume 8, Issue 1
  • CIVILICA COI code: JR_CKE-8-1_005
  • Paper language: English
  • Views: 60

Authors

Emad Mousavian

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Danial Qashqai

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Shahriar B. Shokouhi

Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract

Scene understanding through semantic segmentation is a vital component for autonomous vehicles. Given the importance of safety in autonomous driving, existing methods are constantly striving to improve accuracy and reduce error. RGB-based semantic segmentation models typically underperform due to information loss in challenging situations such as lighting variations and limitations in distinguishing occluded objects of similar appearance. Therefore, recent studies have developed RGB-D semantic segmentation methods by employing attention-based fusion modules. Existing fusion modules typically combine cross-modal features by focusing on each modality independently, which limits their ability to capture the complementary nature of modalities. To address this issue, we propose a simple yet effective module called the Discriminative Cross-modal Attention Fusion (DCMAF) module. Specifically, the proposed module performs cross-modal discrimination using element-wise subtraction in an attention-based approach. By integrating the DCMAF module with efficient channel- and spatial-wise attention modules, we introduce the Discriminative Cross-modal Network (DCMNet), a scale- and appearance-invariant model. Extensive experiments demonstrate significant improvements, particularly in predicting small and fine objects, achieving an mIoU of 77.39% on the CamVid dataset, outperforming state-of-the-art RGB-based methods, and a remarkable mIoU of 82.8% on the Cityscapes dataset. As the CamVid dataset lacks depth information, we employ the DPT monocular depth estimation model to generate depth images.
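The abstract describes the DCMAF module only at a high level: cross-modal discrimination via element-wise subtraction of RGB and depth features, followed by attention-based reweighting before fusion. The PyTorch sketch below illustrates that general pattern under stated assumptions; the class name `DiscriminativeFusionSketch`, the channel/spatial gating layout, the reduction ratio, and the fusion rule are illustrative choices, not the published DCMNet architecture.

```python
# Minimal sketch of subtraction-based cross-modal attention fusion,
# assuming same-shaped RGB and depth feature maps. Hypothetical design
# inspired by the abstract, NOT the authors' exact DCMAF module.
import torch
import torch.nn as nn


class DiscriminativeFusionSketch(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention computed from the RGB-minus-depth difference map.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention computed from the same difference map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Element-wise subtraction highlights where the modalities disagree.
        diff = rgb_feat - depth_feat
        ca = self.channel_gate(diff)   # (B, C, 1, 1) channel weights
        sa = self.spatial_gate(diff)   # (B, 1, H, W) spatial weights
        # Re-weight each modality with the discriminative attention and fuse.
        return rgb_feat * ca + depth_feat * sa


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 32, 32)
    depth = torch.randn(2, 64, 32, 32)
    out = DiscriminativeFusionSketch(64)(rgb, depth)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Per the abstract, the actual model combines this discriminative fusion with efficient channel- and spatial-wise attention modules inside DCMNet; the sketch above only conveys the subtraction-then-attention idea, not the full network.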

Keywords

Attention Mechanism, Autonomous Driving, Deep Learning, RGB-D Semantic Segmentation

More information about COI

COI stands for CIVILICA Object Identifier, the CIVILICA identifier for documents. A COI is a code assigned, according to the publication venue, to papers from domestic conferences and journals when they are indexed in the CIVILICA citation database.

The COI code serves as a national identifier for documents indexed in CIVILICA; it is unique and permanent, and can therefore always be cited and tracked.