Objective measures to improve the selection of training speakers in HMM-based child speech synthesis

dc.contributor.author Govender, Avashna
dc.contributor.author De Wet, Febe
dc.date.accessioned 2017-06-07T07:11:24Z
dc.date.available 2017-06-07T07:11:24Z
dc.date.issued 2016-12
dc.identifier.citation Govender, A. and De Wet, F. 2016. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, 30 November - 2 December 2016, Stellenbosch, South Africa, p. 25-30. DOI: 10.1109/RoboMech.2016.7813193 en_US
dc.identifier.isbn 978-1-5090-3335-5
dc.identifier.uri DOI: 10.1109/RoboMech.2016.7813193
dc.identifier.uri http://ieeexplore.ieee.org/document/7813193/
dc.identifier.uri http://hdl.handle.net/10204/9179
dc.description 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference, 30 November - 2 December 2016, Stellenbosch, South Africa. en_US
dc.description.abstract Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain because the approach caters for limited amounts of data. An initial average voice model is trained using data from multiple speakers and adapted to resemble a specific target child speaker. Due to the scarcity of child speech data, initial models used in this approach are mostly trained with adult speech data. However, selection of appropriate training speakers from large corpora is not a trivial task because there is no means, other than conducting exhaustive subjective listening tests, to determine which training speakers will yield the best quality synthetic child voice. Therefore, there is a need to find an objective measure that can be used to easily identify a small set of training speakers that will yield the best quality output. In this paper we investigate whether a relationship exists between objective and subjective voice evaluation measures with regard to the selection of training speakers for an average voice model used in speaker-adaptive HMM child speech synthesis. Results indicate that, if training speakers that are closer to the target speaker are used to train initial models, better quality child voices are generated. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.ispartofseries Worklist;18123
dc.subject Synthetic child voices en_US
dc.subject Hidden Markov Model en_US
dc.title Objective measures to improve the selection of training speakers in HMM-based child speech synthesis en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Govender, A., & De Wet, F. (2016). Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. IEEE. http://hdl.handle.net/10204/9179 en_ZA
dc.identifier.chicagocitation Govender, Avashna, and Febe De Wet. "Objective measures to improve the selection of training speakers in HMM-based child speech synthesis." (2016): http://hdl.handle.net/10204/9179 en_ZA
dc.identifier.vancouvercitation Govender A, De Wet F. Objective measures to improve the selection of training speakers in HMM-based child speech synthesis. IEEE; 2016. http://hdl.handle.net/10204/9179 en_ZA
dc.identifier.ris
TY - Conference Presentation
AU - Govender, Avashna
AU - De Wet, Febe
AB - Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain because the approach caters for limited amounts of data. An initial average voice model is trained using data from multiple speakers and adapted to resemble a specific target child speaker. Due to the scarcity of child speech data, initial models used in this approach are mostly trained with adult speech data. However, selection of appropriate training speakers from large corpora is not a trivial task because there is no means, other than conducting exhaustive subjective listening tests, to determine which training speakers will yield the best quality synthetic child voice. Therefore, there is a need to find an objective measure that can be used to easily identify a small set of training speakers that will yield the best quality output. In this paper we investigate whether a relationship exists between objective and subjective voice evaluation measures with regard to the selection of training speakers for an average voice model used in speaker-adaptive HMM child speech synthesis. Results indicate that, if training speakers that are closer to the target speaker are used to train initial models, better quality child voices are generated.
DA - 2016-12
DB - ResearchSpace
DP - CSIR
KW - Synthetic child voices
KW - Hidden Markov Model
LK - https://researchspace.csir.co.za
PY - 2016
SM - 978-1-5090-3335-5
T1 - Objective measures to improve the selection of training speakers in HMM-based child speech synthesis
TI - Objective measures to improve the selection of training speakers in HMM-based child speech synthesis
UR - http://hdl.handle.net/10204/9179
ER -
en_ZA

