ETwins: Enhanced Twins Transformer for Facial Expression Recognition

Year of publication: 1403
Document type: Conference paper
Language: English

The full text of this paper is available as a 13-page PDF file.

National scientific document ID:

EITCONF03_259

Indexing date: 18 Farvardin 1404

Abstract:

Facial expression recognition (FER) requires distinguishing between subtle visual differences, making it a challenging fine-grained classification task. This paper presents ETwins, an Enhanced Twins transformer architecture specifically tailored for facial expression recognition. ETwins builds on the Twins vision transformer and incorporates three key improvements: a weighted global average pooling layer that emphasizes critical facial regions, the integration of a class token for capturing both local and global features, and an enhanced positional encoding mechanism to embed essential contextual information. Our evaluations on widely used FER datasets demonstrate that ETwins achieves improved accuracy compared to larger baseline models while using fewer parameters. This result underscores the efficiency of custom vision transformers for fine-grained tasks, achieving competitive FER performance with a compact, parameter-efficient design.
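
To make the idea of weighted global average pooling concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not say how ETwins computes its pooling weights, so the softmax over a linear score used here is an assumption, and the names `weighted_global_average_pool` and `score_vector` as well as the token and feature sizes are hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def weighted_global_average_pool(tokens: np.ndarray,
                                 score_vector: np.ndarray) -> np.ndarray:
    """Pool (num_tokens, dim) patch features into a single (dim,) vector.

    ASSUMPTION: each token's weight comes from a softmax over a linear
    score. The abstract does not specify how ETwins derives its weights.
    """
    weights = softmax(tokens @ score_vector)  # (num_tokens,), sums to 1
    return weights @ tokens                   # convex combination of tokens


rng = np.random.default_rng(0)
tokens = rng.normal(size=(49, 64))  # hypothetical: 7x7 patch grid, 64-dim features
pooled = weighted_global_average_pool(tokens, rng.normal(size=64))

# A zero score vector gives every token the same weight, which
# recovers plain (unweighted) global average pooling.
uniform = weighted_global_average_pool(tokens, np.zeros(64))
```

In a trained model the score vector would be learned, so tokens covering informative facial regions could receive larger weights than background tokens.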

Authors

Ali Mohammad Pazandeh

Electrical Engineering Department, Sharif University of Technology, Azadi Ave., Tehran, Iran

Emad Fatemizadeh

Electrical Engineering Department, Sharif University of Technology, Azadi Ave., Tehran, Iran