dc.contributor.author |
De Wet, Febe
|
|
dc.contributor.author |
Badenhorst, J
|
|
dc.contributor.author |
Modipa, T
|
|
dc.date.accessioned |
2016-09-08T09:22:36Z |
|
dc.date.available |
2016-09-08T09:22:36Z |
|
dc.date.issued |
2016-05 |
|
dc.identifier.citation |
De Wet, F., Badenhorst, J. and Modipa, T. 2016. Developing speech resources from parliamentary data for South African English. In: 5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia |
en_US |
dc.identifier.uri |
http://www.sciencedirect.com/science/article/pii/S1877050916300424
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/8767
|
|
dc.description |
5th Workshop on Spoken Language Technology for Under-Resourced Languages, SLTU 2016, 9-12 May 2016, Yogyakarta, Indonesia |
en_US |
dc.description.abstract |
The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Elsevier |
en_US |
dc.relation.ispartofseries |
Workflow;17238 |
|
dc.subject |
South African languages |
en_US |
dc.subject |
Speech resources |
en_US |
dc.subject |
Parliamentary data |
en_US |
dc.title |
Developing speech resources from parliamentary data for South African English |
en_US |
dc.type |
Article |
en_US |
dc.identifier.apacitation |
De Wet, F., Badenhorst, J., & Modipa, T. (2016). Developing speech resources from parliamentary data for South African English. http://hdl.handle.net/10204/8767 |
en_ZA |
dc.identifier.chicagocitation |
De Wet, Febe, J Badenhorst, and T Modipa. "Developing speech resources from parliamentary data for South African English." (2016) http://hdl.handle.net/10204/8767 |
en_ZA |
dc.identifier.vancouvercitation |
De Wet F, Badenhorst J, Modipa T. Developing speech resources from parliamentary data for South African English. 2016; http://hdl.handle.net/10204/8767. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - De Wet, Febe
AU - Badenhorst, J
AU - Modipa, T
AB - The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data.
DA - 2016-05
DB - ResearchSpace
DP - CSIR
KW - South African languages
KW - Speech resources
KW - Parliamentary data
LK - https://researchspace.csir.co.za
PY - 2016
T1 - Developing speech resources from parliamentary data for South African English
TI - Developing speech resources from parliamentary data for South African English
UR - http://hdl.handle.net/10204/8767
ER -
|
en_ZA |