Building synthetic child voices is considered difficult because of the challenges associated with data collection. As a result, speaker adaptation combined with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain, since the approach works with limited amounts of data. An initial average voice model is trained on data from multiple speakers and then adapted to resemble a specific target child speaker. Because child speech data are scarce, the initial models used in this approach are mostly trained on adult speech. However, selecting appropriate training speakers from large corpora is not trivial: other than conducting exhaustive subjective listening tests, there is no means of determining which training speakers will yield the best-quality synthetic child voice. There is therefore a need for an objective measure that can easily identify a small set of training speakers that will yield the best-quality output. In this paper we investigate whether a relationship exists between objective and subjective voice evaluation measures with regard to the selection of training speakers for an average voice model used in speaker-adaptive HMM-based child speech synthesis. Results indicate that better-quality child voices are generated when the initial models are trained on speakers that are closer to the target speaker.
Reference:
Govender, A. and De Wet, F. 2016. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, 30 November - 2 December 2016, Stellenbosch, South Africa, p. 25-30. DOI: 10.1109/RoboMech.2016.7813193
Govender, A., & De Wet, F. (2016). Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. IEEE. http://hdl.handle.net/10204/9179