ResearchSpace

Developing speech resources from parliamentary data for South African English

dc.contributor.author De Wet, Febe
dc.contributor.author Badenhorst, J
dc.contributor.author Modipa, T
dc.date.accessioned 2016-09-08T09:22:36Z
dc.date.available 2016-09-08T09:22:36Z
dc.date.issued 2016-05
dc.identifier.citation De Wet, F., Badenhorst, J. and Modipa, T. 2016. Developing speech resources from parliamentary data for South African English. In: 5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia en_US
dc.identifier.uri http://www.sciencedirect.com/science/article/pii/S1877050916300424
dc.identifier.uri http://hdl.handle.net/10204/8767
dc.description 5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia en_US
dc.description.abstract The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data. en_US
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.relation.ispartofseries Workflow;17238
dc.subject South African languages en_US
dc.subject Speech resources en_US
dc.subject Parliamentary data en_US
dc.title Developing speech resources from parliamentary data for South African English en_US
dc.type Article en_US
dc.identifier.apacitation De Wet, F., Badenhorst, J., & Modipa, T. (2016). Developing speech resources from parliamentary data for South African English. http://hdl.handle.net/10204/8767 en_ZA
dc.identifier.chicagocitation De Wet, Febe, J Badenhorst, and T Modipa. "Developing speech resources from parliamentary data for South African English." (2016) http://hdl.handle.net/10204/8767 en_ZA
dc.identifier.vancouvercitation De Wet F, Badenhorst J, Modipa T. Developing speech resources from parliamentary data for South African English. 2016; http://hdl.handle.net/10204/8767. en_ZA
dc.identifier.ris TY - Article AU - De Wet, Febe AU - Badenhorst, J AU - Modipa, T AB - The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data. DA - 2016-05 DB - ResearchSpace DP - CSIR KW - South African languages KW - Speech resources KW - Parliamentary data LK - https://researchspace.csir.co.za PY - 2016 T1 - Developing speech resources from parliamentary data for South African English TI - Developing speech resources from parliamentary data for South African English UR - http://hdl.handle.net/10204/8767 ER - en_ZA

