Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots

Emotional well-being significantly influences mental health and overall quality of life. As therapy chatbots become increasingly prevalent, their ability to comprehend and respond empathetically to users' emotions remains limited. This paper addresses this limitation by proposing an approach to enhance therapy chatbots with auditory perception, enabling them to understand users' feelings and provide human-like empathy. The proposed method incorporates speech emotion recognition (SER) techniques using Convolutional Neural Network (CNN) models and the ShEMO dataset to accurately detect and classify negative emotions, including anger, fear, and sadness. The SER model achieves a validation accuracy of 88%, demonstrating its effectiveness in recognizing emotional states from speech signals. Furthermore, a recommender system is developed that uses the SER model's output to generate personalized recommendations for managing negative emotions. Because no dataset was available for this task, a new bilingual dataset was created for it. The recommender model achieves an accuracy of 98% by combining global vectors for word representation (GloVe) with LSTM models. To provide a more immersive and empathetic user experience, a text-to-speech model called Glow-TTS is integrated, enabling the therapy chatbot to audibly communicate the generated recommendations to users in both English and Persian. The proposed approach offers promising potential to enhance therapy chatbots by providing them with the ability to recognize and respond to users' emotions, ultimately improving the delivery of mental health support for both English- and Persian-speaking users.


Farideh Majidi

Computer Engineering Department, Islamic Azad University, South Tehran Branch, Tehran, Iran

Marzieh Bahrami

Computer Engineering Department, Islamic Azad University, South Tehran Branch, Tehran, Iran