dc.contributor.author |
De Wet, Febe
|
|
dc.contributor.author |
Dlamini, Nkosikhona
|
|
dc.contributor.author |
Van der Walt, Willem J
|
|
dc.contributor.author |
Govender, Avashna
|
|
dc.date.accessioned |
2018-01-15T09:58:48Z |
|
dc.date.available |
2018-01-15T09:58:48Z |
|
dc.date.issued |
2017-12 |
|
dc.identifier.citation |
De Wet, F. et al. 2017. Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. 2017 PRASA-RobMech International Conference, Bloemfontein, South Africa, 29 November - 1 December 2017 |
en_US |
dc.identifier.isbn |
978-1-5386-2313-8 |
|
dc.identifier.uri |
http://www.rgems.co.za/Downloads/Events/2017_PRASA-RobMech_Program.pdf
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/9957
|
|
dc.description |
Copyright: 2017. The attached pdf contains the accepted version of the paper. For access to the published version, kindly consult the publisher's website. |
en_US |
dc.description.abstract |
Creating synthetic voices that are both natural and intelligible is a daunting challenge for well-resourced languages. The challenge is much bigger for languages in which the speech and text resources required for voice development are not available. In previous studies, audiobooks have been considered as an alternative source of speech data. The aim of the current study was to compare the quality of voices derived from audiobook data with voices based on data recorded by professional voice artists under studio conditions. Two sets of voices were evaluated: male voices built using a very small data set (around 3 hours, representing a severely resource-constrained scenario) and female voices trained on almost 10 hours of speech data. The results of subjective listening tests indicate that, while the majority of the listeners preferred the voice artists’ voices over the audiobook voices, the difference in naturalness was not perceived to be substantial. Results also showed that the artists’ voices outperform the audiobook voices in terms of intelligibility, especially if a limited amount of training data is available. However, if more training data is used, the difference in intelligibility can be reduced substantially. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.relation.ispartofseries |
Worklist;19991 |
|
dc.subject |
Audiobooks |
en_US |
dc.subject |
Speech synthesis |
en_US |
dc.subject |
Text-to-speech |
en_US |
dc.subject |
Under-resourced languages |
en_US |
dc.title |
Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
De Wet, F., Dlamini, N., Van der Walt, W. J., & Govender, A. (2017). Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. IEEE. http://hdl.handle.net/10204/9957 |
en_ZA |
dc.identifier.chicagocitation |
De Wet, Febe, Nkosikhona Dlamini, Willem J Van der Walt, and Avashna Govender. "Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data." (2017): http://hdl.handle.net/10204/9957 |
en_ZA |
dc.identifier.vancouvercitation |
De Wet F, Dlamini N, Van der Walt WJ, Govender A. Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. IEEE; 2017. http://hdl.handle.net/10204/9957 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - De Wet, Febe
AU - Dlamini, Nkosikhona
AU - Van der Walt, Willem J
AU - Govender, Avashna
AB - Creating synthetic voices that are both natural and intelligible is a daunting challenge for well-resourced languages. The challenge is much bigger for languages in which the speech and text resources required for voice development are not available. In previous studies, audiobooks have been considered as an alternative source of speech data. The aim of the current study was to compare the quality of voices derived from audiobook data with voices based on data recorded by professional voice artists under studio conditions. Two sets of voices were evaluated: male voices built using a very small data set (around 3 hours, representing a severely resource-constrained scenario) and female voices trained on almost 10 hours of speech data. The results of subjective listening tests indicate that, while the majority of the listeners preferred the voice artists’ voices over the audiobook voices, the difference in naturalness was not perceived to be substantial. Results also showed that the artists’ voices outperform the audiobook voices in terms of intelligibility, especially if a limited amount of training data is available. However, if more training data is used, the difference in intelligibility can be reduced substantially.
DA - 2017-12
DB - ResearchSpace
DP - CSIR
KW - Audiobooks
KW - Speech synthesis
KW - Text-to-speech
KW - Under-resourced languages
LK - https://researchspace.csir.co.za
PY - 2017
SM - 978-1-5386-2313-8
T1 - Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data
TI - Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data
UR - http://hdl.handle.net/10204/9957
ER -
|
en_ZA |