Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data

De Wet, Febe; Dlamini, Nkosikhona; Van der Walt, Willem J; Govender, Avashna

dc.contributor.author	De Wet, Febe
dc.contributor.author	Dlamini, Nkosikhona
dc.contributor.author	Van der Walt, Willem J
dc.contributor.author	Govender, Avashna
dc.date.accessioned	2018-01-15T09:58:48Z
dc.date.available	2018-01-15T09:58:48Z
dc.date.issued	2017-12
dc.identifier.citation	De Wet, F. et al. 2017. Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. 2017 PRASA-RobMech International Conference, Bloemfontein, South Africa, 29 November - 1 December 2017	en_US
dc.identifier.isbn	978-1-5386-2313-8
dc.identifier.uri	http://www.rgems.co.za/Downloads/Events/2017_PRASA-RobMech_Program.pdf
dc.identifier.uri	http://hdl.handle.net/10204/9957
dc.description	Copyright: 2017. The attached pdf contains the accepted version of the paper. For access to the published version, kindly consult the publisher's website.	en_US
dc.description.abstract	Creating synthetic voices that are both natural and intelligible is a daunting challenge for well-resourced languages. The challenge is much bigger for languages in which the speech and text resources required for voice development are not available. In previous studies, audiobooks have been considered as an alternative source of speech data. The aim of the current study was to compare the quality of voices derived from audiobook data with voices based on data recorded by professional voice artists under studio conditions. Two sets of voices were evaluated: male voices built using a very small data set (around 3 hours, representing a severely resource-constrained scenario) and female voices trained on almost 10 hours of speech data. The results of subjective listening tests indicate that, while the majority of the listeners preferred the voice artists’ voices over the audiobook voices, the difference in naturalness was not perceived to be substantial. Results also showed that the artists’ voices outperform the audiobook voices in terms of intelligibility, especially if a limited amount of training data is available. However, if more training data is used, the difference in intelligibility can be reduced substantially.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	Worklist;19991
dc.subject	Audiobooks	en_US
dc.subject	Speech synthesis	en_US
dc.subject	Text-to-speech	en_US
dc.subject	Under-resourced languages	en_US
dc.title	Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data	en_US
dc.type	Conference Presentation	en_US
dc.identifier.apacitation	De Wet, F., Dlamini, N., Van der Walt, W. J., & Govender, A. (2017). Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. IEEE. http://hdl.handle.net/10204/9957	en_ZA
dc.identifier.chicagocitation	De Wet, Febe, Nkosikhona Dlamini, Willem J Van der Walt, and Avashna Govender. "Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data." (2017): http://hdl.handle.net/10204/9957	en_ZA
dc.identifier.vancouvercitation	De Wet F, Dlamini N, Van der Walt WJ, Govender A. Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data. IEEE; 2017. http://hdl.handle.net/10204/9957	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - De Wet, Febe AU - Dlamini, Nkosikhona AU - Van der Walt, Willem J AU - Govender, Avashna AB - Creating synthetic voices that are both natural and intelligible is a daunting challenge for well-resourced languages. The challenge is much bigger for languages in which the speech and text resources required for voice development are not available. In previous studies, audiobooks have been considered as an alternative source of speech data. The aim of the current study was to compare the quality of voices derived from audiobook data with voices based on data recorded by professional voice artists under studio conditions. Two sets of voices were evaluated: male voices built using a very small data set (around 3 hours, representing a severely resource-constrained scenario) and female voices trained on almost 10 hours of speech data. The results of subjective listening tests indicate that, while the majority of the listeners preferred the voice artists’ voices over the audiobook voices, the difference in naturalness was not perceived to be substantial. Results also showed that the artists’ voices outperform the audiobook voices in terms of intelligibility, especially if a limited amount of training data is available. However, if more training data is used, the difference in intelligibility can be reduced substantially. DA - 2017-12 DB - ResearchSpace DP - CSIR KW - Audiobooks KW - Speech synthesis KW - Text-to-speech KW - Under-resourced languages LK - https://researchspace.csir.co.za PY - 2017 SM - 978-1-5386-2313-8 T1 - Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data TI - Building synthetic voices for under-resourced languages: a comparison between audiobook and studio data UR - http://hdl.handle.net/10204/9957 ER -	en_ZA

Files in this item

Name: De Wet_19991_2017.pdf

Size: 122.1Kb

Format: PDF

Description: Conference paper ...

This item appears in the following Collection(s)

Conference Publications
