dc.contributor.author |
Kamper, H
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.contributor.author |
Hain, T
|
|
dc.contributor.author |
Niesler, T
|
|
dc.date.accessioned |
2014-07-30T09:25:30Z |
|
dc.date.available |
2014-07-30T09:25:30Z |
|
dc.date.issued |
2014-11 |
|
dc.identifier.citation |
Kamper, H, De Wet, F, Hain, T and Niesler, T. 2014. Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. Computer Speech and Language, vol. 28(6), pp 1255-1268 |
en_US |
dc.identifier.issn |
0885-2308 |
|
dc.identifier.uri |
http://ac.els-cdn.com/S0885230814000369/1-s2.0-S0885230814000369-main.pdf?_tid=58c8df98-1719-11e4-840c-00000aacb362&acdnat=1406636016_b7d0b92cbf742737e6b945258ef83077
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/7550
|
|
dc.description |
Copyright: 2014 Elsevier. This is an ABSTRACT ONLY. The definitive version is published in Computer Speech and Language, vol. 28(6), pp 1255-1268 |
en_US |
dc.description.abstract |
South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically, we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case, we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation, we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised; language modelling data has a weaker effect on performance, and the pronunciation dictionary has the smallest. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Elsevier |
en_US |
dc.relation.ispartofseries |
Workflow;13147 |
|
dc.subject |
Under-resourced languages |
en_US |
dc.subject |
Accented speech |
en_US |
dc.subject |
South African English |
en_US |
dc.subject |
Varieties of English |
en_US |
dc.subject |
Extensive speech resources |
en_US |
dc.subject |
South African language models |
en_US |
dc.title |
Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system |
en_US |
dc.type |
Article |
en_US |
dc.identifier.apacitation |
Kamper, H., De Wet, F., Hain, T., & Niesler, T. (2014). Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. http://hdl.handle.net/10204/7550 |
en_ZA |
dc.identifier.chicagocitation |
Kamper, H, Febe De Wet, T Hain, and T Niesler. "Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system." (2014). http://hdl.handle.net/10204/7550 |
en_ZA |
dc.identifier.vancouvercitation |
Kamper H, De Wet F, Hain T, Niesler T. Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system. 2014; http://hdl.handle.net/10204/7550. |
en_ZA |
dc.identifier.ris |
TY - Article
AU - Kamper, H
AU - De Wet, Febe
AU - Hain, T
AU - Niesler, T
AB - South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically, we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case, we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation, we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised; language modelling data has a weaker effect on performance, and the pronunciation dictionary has the smallest.
DA - 2014-11
DB - ResearchSpace
DP - CSIR
KW - Under-resourced languages
KW - Accented speech
KW - South African English
KW - Varieties of English
KW - Extensive speech resources
KW - South African language models
LK - https://researchspace.csir.co.za
PY - 2014
SM - 0885-2308
T1 - Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system
TI - Capitalising on North American speech resources for the development of a South African English large vocabulary speech recognition system
UR - http://hdl.handle.net/10204/7550
ER -
|
en_ZA |