ArmanTTS single-speaker Persian dataset

Year of publication: 1401 (Solar Hijri)
Published in: First International Conference and Sixth National Conference on Computer, Information Technology and Applications of Artificial Intelligence
COI code: CEITCONF06_046
Paper language: English

Authors

Mohammd Hasan Shamgholi

MSc Student, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Vahid Saeedi

MSc Graduate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Javad Peymanfard

PhD Candidate, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Leila Alhabib

BSc Student, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Hossein Zeinali

Assistant Professor, School of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Abstract

TTS, or text-to-speech, is a complex process that can be accomplished through appropriate modeling with deep learning methods. Training such models requires a suitable dataset. Since little work of this kind exists for the Persian language, this paper introduces the single-speaker dataset ArmanTTS. We compared the characteristics of this dataset with those of several widely used datasets to show that ArmanTTS meets the necessary standards for training a Persian text-to-speech model. We also combined Tacotron 2 and HiFi-GAN to design a model that receives phonemes as input and outputs the corresponding speech. A mean opinion score (MOS) of 4.0 was obtained for real speech, 3.87 for the vocoder prediction, and 2.98 for the synthetic speech generated by the TTS model.

Keywords

dataset; Vocoders; Acoustic models
