This paper introduces a model of affect to improve prosody in text-to-speech synthesis. The model operates on the discourse level of text to predict the underlying linguistic factors that contribute towards emotional appraisal, rather than any particular surface emotion itself. The architecture of the model is described and its performance is evaluated on three levels: its predictive accuracy on text, its effect on natural speech, and its effect on synthesised speech.
Reference:
Schlunz, G.I. and Barnard, E. 2013. A discourse model of affect for text-to-speech synthesis. In: Conference Proceedings of the 24th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2013), Johannesburg, South Africa, 3 December 2013. http://hdl.handle.net/10204/7272