Multi-Speaker Noise Reduction through Audio-Visual Fusion and Disentanglement

Publication year: 1402 (2023)
Document type: Conference paper
Language: English
Views: 183

This paper is available as a 9-page PDF file.

National scientific document ID: ICPCONF09_131

Indexing date: 8 Mehr 1402 (30 September 2023)

Abstract:

Recognizing speech from multiple simultaneous speakers is critical for real-world speech applications but is hampered by overlapping acoustic interference. This paper proposes a multi-view audio-visual deep learning architecture that uses visual speech cues to reduce noise and improve speech recognition accuracy in multi-talker settings. A visual processing front-end disentangles blended mouth movements into individual speaker representations using adversarial training. These representations are fused with beamformed audio encodings through temporally synchronized co-attention. Experiments demonstrate significant noise reduction and improved recognition accuracy compared to audio-only methods, and the model generalizes well to varying speaker counts and unseen speaker combinations. The proposed audio-visual fusion framework enables robust multi-speaker speech recognition to be deployed without per-speaker training.
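
The abstract describes a fusion step in which per-speaker visual representations (obtained by adversarial disentanglement) are combined with beamformed audio encodings through temporally synchronized co-attention. The paper's actual implementation is only available in the PDF; the following is a minimal, hypothetical PyTorch sketch of what such a co-attention fusion block could look like. The class name, tensor dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not details taken from the paper.

# Illustrative sketch only (not the authors' implementation): bidirectional
# cross-attention fusion of temporally aligned audio and visual embeddings.
import torch
import torch.nn as nn


class AudioVisualCoAttentionFusion(nn.Module):
    """Fuse per-frame audio and visual embeddings via bidirectional cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Audio queries attend over visual keys/values, and vice versa.
        self.audio_to_visual = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(2 * dim, dim)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio:  (batch, T, dim) beamformed audio frame encodings
        # visual: (batch, T, dim) per-speaker visual speech embeddings, assumed
        #         resampled to the same frame rate as the audio stream
        a_fused, _ = self.audio_to_visual(query=audio, key=visual, value=visual)
        v_fused, _ = self.visual_to_audio(query=visual, key=audio, value=audio)
        # Concatenate both attended views and project back to the model dimension.
        return self.out_proj(torch.cat([a_fused, v_fused], dim=-1))


if __name__ == "__main__":
    fusion = AudioVisualCoAttentionFusion()
    audio = torch.randn(2, 100, 256)    # 2 utterances, 100 frames each
    visual = torch.randn(2, 100, 256)
    print(fusion(audio, visual).shape)  # torch.Size([2, 100, 256])

In this sketch, temporal synchronization is assumed to have been handled upstream (both streams share the same frame rate), so co-attention reduces to frame-wise cross-attention in both directions; the fused representation would then feed a downstream recognition head.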

Authors

Faramarz Zareian

University of Genova, Computer Science (Master's degree), Genova, DIBRIS (Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi)