Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems

Yoshiko Arimoto, Hiromi Kawatsu, Sumio Ohno, Hitoshi Iida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

To investigate emotion recognition from acoustic information, we recorded natural dialogs among two or three online-game players and constructed an emotional speech database. Two evaluators categorized each recorded utterance into one of the emotion categories, which were defined with reference to the eight primary emotions of Plutchik's three-dimensional circumplex model. In addition, 14 evaluators rated the utterances on a 5-point scale to obtain reference degrees of emotion. Eleven acoustic features were extracted from the utterances, and analysis of variance (ANOVA) was conducted to test for significant differences between emotions. Based on the ANOVA results, we conducted discriminant analysis to discriminate each emotion from the others, and multiple linear regression analysis to estimate the emotional degree of each utterance. The discriminant analysis achieved high correctness: 79.12% for Surprise and 70.11% for Sadness, with over 60% correctness for most of the other emotions. For emotional degree estimation, the adjusted R² per emotion ranged from 0.05 (Disgust) to 0.55 (Surprise) on closed sets, and the root-mean-square (RMS) residual on open sets ranged from 0.39 (Acceptance) to 0.59 (Anger).
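As a rough illustration of the pipeline the abstract describes (per-feature ANOVA, one-vs-rest discriminant analysis, and per-emotion multiple linear regression scored by adjusted R² on closed sets and RMS residual on open sets), here is a minimal sketch in Python. It is not the authors' code: the data are synthetic, the 11 feature columns and the emotion set are placeholders, and scikit-learn's linear discriminant analysis stands in for whatever discriminant procedure the paper actually used.

```python
# Minimal sketch of the analysis pipeline in the abstract -- NOT the authors'
# code. Data are synthetic; feature set, emotion labels, and the discriminant
# model are assumptions standing in for the paper's actual choices.
import numpy as np
from scipy.stats import f_oneway
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_utt, n_feat = 500, 11                    # utterances x acoustic features
X = rng.normal(size=(n_utt, n_feat))       # e.g., F0/power/duration statistics
emotions = ["Surprise", "Sadness", "Anger", "Disgust", "Acceptance"]
y_cat = rng.choice(emotions, size=n_utt)   # categorical label per utterance
# 5-point degree rating of each emotion per utterance (evaluator mean).
y_deg = {e: rng.uniform(1, 5, size=n_utt) for e in emotions}

# 1) One-way ANOVA per feature: does it differ significantly across emotions?
for j in range(n_feat):
    F, p = f_oneway(*(X[y_cat == e, j] for e in emotions))
    print(f"feature {j:2d}: F = {F:6.2f}, p = {p:.3f}")

# 2) One-vs-rest discriminant analysis: one emotion against all the others.
for e in emotions:
    y_bin = (y_cat == e).astype(int)
    acc = LinearDiscriminantAnalysis().fit(X, y_bin).score(X, y_bin)
    print(f"{e:10s}: {100 * acc:.2f}% correctness (closed set)")

# 3) Per-emotion multiple linear regression of emotional degree.
for e in emotions:
    X_tr, X_te, d_tr, d_te = train_test_split(X, y_deg[e], random_state=0)
    reg = LinearRegression().fit(X_tr, d_tr)
    n, p = X_tr.shape
    r2_adj = 1 - (1 - reg.score(X_tr, d_tr)) * (n - 1) / (n - p - 1)
    rms = np.sqrt(np.mean((reg.predict(X_te) - d_te) ** 2))
    print(f"{e:10s}: adj R^2 = {r2_adj:.2f} (closed), "
          f"RMS residual = {rms:.2f} (open)")
```

Note that scoring the discriminant model on its own training data, as in step 2, gives closed-set correctness only; the paper's separate open-set RMS figures suggest held-out evaluation for the regression step, which the train/test split above mimics.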

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 322-325
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 2008 Sep 22 - 2008 Sep 26

Other

Other: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Country: Australia
City: Brisbane, QLD
Period: 08/9/22 - 08/9/26

Keywords

  • Discriminant analysis
  • Emotional speech
  • Multiple regression analysis
  • Natural dialog
  • Prosody
  • Spontaneous speech

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

Arimoto, Y., Kawatsu, H., Ohno, S., & Iida, H. (2008). Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 322-325).
