ResearchSpace

Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system

dc.contributor.author Kamper, H
dc.contributor.author De Wet, Febe
dc.contributor.author Hain, T
dc.contributor.author Niesler, T
dc.date.accessioned 2014-07-30T09:25:30Z
dc.date.available 2014-07-30T09:25:30Z
dc.date.issued 2014-11
dc.identifier.citation Kamper, H, De Wet, F, Hain, T and Niesler, T. 2014. Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. Computer Speech and Language, vol. 28(6), pp 1255-1268 en_US
dc.identifier.issn 0885-2308
dc.identifier.uri http://ac.els-cdn.com/S0885230814000369/1-s2.0-S0885230814000369-main.pdf?_tid=58c8df98-1719-11e4-840c-00000aacb362&acdnat=1406636016_b7d0b92cbf742737e6b945258ef83077
dc.identifier.uri http://hdl.handle.net/10204/7550
dc.description Copyright: 2014 Elsevier. This is an ABSTRACT ONLY. The definitive version is published in Computer Speech and Language, vol. 28(6), pp 1255-1268 en_US
dc.description.abstract South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised, language modelling data has a weaker effect on performance and the pronunciation dictionary the smallest. en_US
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.relation.ispartofseries Workflow;13147
dc.subject Under-resourced languages en_US
dc.subject Accented speech en_US
dc.subject South African English en_US
dc.subject Varieties of English en_US
dc.subject Extensive speech resources en_US
dc.subject South African language models en_US
dc.title Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system en_US
dc.type Article en_US
dc.identifier.apacitation Kamper, H., De Wet, F., Hain, T., & Niesler, T. (2014). Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. http://hdl.handle.net/10204/7550 en_ZA
dc.identifier.chicagocitation Kamper, H, Febe De Wet, T Hain, and T Niesler. "Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system." (2014). http://hdl.handle.net/10204/7550 en_ZA
dc.identifier.vancouvercitation Kamper H, De Wet F, Hain T, Niesler T. Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. 2014; http://hdl.handle.net/10204/7550. en_ZA
dc.identifier.ris
TY - Article
AU - Kamper, H
AU - De Wet, Febe
AU - Hain, T
AU - Niesler, T
AB - South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised, language modelling data has a weaker effect on performance and the pronunciation dictionary the smallest.
DA - 2014-11
DB - ResearchSpace
DP - CSIR
KW - Under-resourced languages
KW - Accented speech
KW - South African English
KW - Varieties of English
KW - Extensive speech resources
KW - South African language models
LK - https://researchspace.csir.co.za
PY - 2014
SM - 0885-2308
T1 - Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system
TI - Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system
UR - http://hdl.handle.net/10204/7550
ER - en_ZA