Developing speech resources from parliamentary data for South African English

De Wet, Febe; Badenhorst, J; Modipa, T

dc.contributor.author	De Wet, Febe
dc.contributor.author	Badenhorst, J
dc.contributor.author	Modipa, T
dc.date.accessioned	2016-09-08T09:22:36Z
dc.date.available	2016-09-08T09:22:36Z
dc.date.issued	2016-05
dc.identifier.citation	De Wet, F., Badenhorst, J. and Modipa, T. 2016. Developing speech resources from parliamentary data for South African English. In: 5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia	en_US
dc.identifier.uri	http://www.sciencedirect.com/science/article/pii/S1877050916300424
dc.identifier.uri	http://hdl.handle.net/10204/8767
dc.description	5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia	en_US
dc.description.abstract	The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data.	en_US
dc.language.iso	en	en_US
dc.publisher	Elsevier	en_US
dc.relation.ispartofseries	Workflow;17238
dc.subject	South African languages	en_US
dc.subject	Speech resources	en_US
dc.subject	Parliamentary data	en_US
dc.title	Developing speech resources from parliamentary data for South African English	en_US
dc.type	Article	en_US
dc.identifier.apacitation	De Wet, F., Badenhorst, J., & Modipa, T. (2016). Developing speech resources from parliamentary data for South African English. http://hdl.handle.net/10204/8767	en_ZA
dc.identifier.chicagocitation	De Wet, Febe, J Badenhorst, and T Modipa. "Developing speech resources from parliamentary data for South African English." (2016) http://hdl.handle.net/10204/8767	en_ZA
dc.identifier.vancouvercitation	De Wet F, Badenhorst J, Modipa T. Developing speech resources from parliamentary data for South African English. 2016; http://hdl.handle.net/10204/8767.	en_ZA
dc.identifier.ris	TY - Article AU - De Wet, Febe AU - Badenhorst, J AU - Modipa, T AB - The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data. DA - 2016-05 DB - ResearchSpace DP - CSIR KW - South African languages KW - Speech resources KW - Parliamentary data LK - https://researchspace.csir.co.za PY - 2016 T1 - Developing speech resources from parliamentary data for South African English TI - Developing speech resources from parliamentary data for South African English UR - http://hdl.handle.net/10204/8767 ER -	en_ZA

Files in this item

Name: De Wet4_2015_ABST ...

Size: 6.451Kb

Format: PDF

This item appears in the following Collection(s)

Journal Articles
