TY - GEN
T1 - Creation and Analysis of Emotional Speech Database for Multiple Emotions Recognition
AU - Sato, Ryota
AU - Sasaki, Ryohei
AU - Suga, Norisato
AU - Furukawa, Toshihiro
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/5
Y1 - 2020/11/5
N2 - Speech emotion recognition (SER) is one of the latest challenges in human-computer interaction. Conventional SER classification methods output a single emotion label per utterance as the estimation result, because the conventional emotional speech databases used to train SER models assign a single emotion label to each utterance. However, multiple emotions are often expressed simultaneously, with different intensities, in human speech. To realize more natural SER than before, the presence of multiple emotions in one utterance should be taken into account. We therefore created an emotional speech database labeled with multiple emotions and their intensities. The database was created by extracting utterances in which emotions appear from existing video works. In addition, we evaluated the created database through statistical analysis. As a result, 2,025 samples were obtained, of which 1,525 contained multiple emotions.
AB - Speech emotion recognition (SER) is one of the latest challenges in human-computer interaction. Conventional SER classification methods output a single emotion label per utterance as the estimation result, because the conventional emotional speech databases used to train SER models assign a single emotion label to each utterance. However, multiple emotions are often expressed simultaneously, with different intensities, in human speech. To realize more natural SER than before, the presence of multiple emotions in one utterance should be taken into account. We therefore created an emotional speech database labeled with multiple emotions and their intensities. The database was created by extracting utterances in which emotions appear from existing video works. In addition, we evaluated the created database through statistical analysis. As a result, 2,025 samples were obtained, of which 1,525 contained multiple emotions.
KW - emotion estimation
KW - emotion recognition
KW - emotional intensity
KW - emotional speech database
KW - multiple emotions
KW - speech corpus
KW - speech emotion
UR - http://www.scopus.com/inward/record.url?scp=85099575273&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099575273&partnerID=8YFLogxK
U2 - 10.1109/O-COCOSDA50338.2020.9295041
DO - 10.1109/O-COCOSDA50338.2020.9295041
M3 - Conference contribution
AN - SCOPUS:85099575273
T3 - Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
SP - 33
EP - 37
BT - Proceedings of 2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd Conference of the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2020
Y2 - 5 November 2020 through 7 November 2020
ER -