The authors explore the construction of a system to classify the dominant emotion in spoken utterances in an environment where resources such as labelled utterances are scarce. The research addresses two issues relevant to detecting emotion in speech: (a) compensating for the lack of resources and (b) finding the features of speech that best characterise emotional expression in the cultural environment being studied (South African telephone speech). Emotional speech was divided into three classes: active, neutral and passive emotion. An emotional speech corpus was created by naive annotators using recordings of telephone speech from a customer service call centre. Features were extracted from the emotional speech samples, and the most suitable features were selected by sequential forward selection (SFS). A consistency check was performed to compensate for the lack of experienced annotators and emotional speech samples. The classification accuracy achieved is 76.9%, with 95% classification accuracy for active emotion.
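The abstract names sequential forward selection but does not describe the scoring criterion or stopping rule used. As a minimal sketch of the general technique only, the Python below starts from an empty feature set and greedily adds whichever feature most improves a caller-supplied score, stopping when no candidate helps. The function name, the `score` callback, the `max_features` limit and the toy scoring function are all illustrative assumptions, not the authors' implementation; in a setting like the one described, `score` would plausibly be a classifier's validation accuracy on the chosen features.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Greedy SFS: grow the subset one feature at a time while the score improves."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score the current subset extended by each remaining candidate.
        trials = [(score(selected + [f]), f) for f in candidates]
        trial_best, best_feature = max(trials, key=lambda t: t[0])
        if trial_best <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(best_feature)
        best_score = trial_best
    return selected, best_score


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical stand-in for classifier accuracy (made-up numbers)."""
    gains = {0: 0.10, 1: 0.40, 2: 0.25, 3: 0.0}
    total = sum(gains[f] for f in subset)
    # Assume features 1 and 2 carry overlapping information.
    if 1 in subset and 2 in subset:
        total -= 0.30
    return total


if __name__ == "__main__":
    features, final_score = sequential_forward_selection(4, toy_score, 3)
    print(features, final_score)
```

In the toy run, feature 1 is chosen first, then feature 0; feature 2 is rejected because its overlap with feature 1 lowers the combined score, which illustrates why SFS evaluates candidates jointly with the already selected subset rather than ranking features in isolation.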
Reference:
Martirosian, O and Barnard, E. 2007. Speech-based emotion detection in a resource-scarce environment. 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Pietermaritzburg, KwaZulu-Natal, South Africa, 28-30 November 2007, pp 5
Martirosian, O., & Barnard, E. (2007). Speech-based emotion detection in a resource-scarce environment. 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA). http://hdl.handle.net/10204/1975