dc.contributor.author |
Kamper, H
|
|
dc.contributor.author |
De Wet, Febe
|
|
dc.contributor.author |
Hain, T
|
|
dc.contributor.author |
Niesler, T
|
|
dc.date.accessioned |
2012-05-31T14:19:27Z |
|
dc.date.available |
2012-05-31T14:19:27Z |
|
dc.date.issued |
2012-05 |
|
dc.identifier.citation |
Kamper, H, De Wet, F, Hain, T and Niesler, T. Resource development and experiments in automatic SA broadcast news transcription. SLTU 2012. 3rd Workshop on Spoken Language Technologies for Under-Resourced Languages, Cape Town, South Africa, 7-9 May 2012 |
en_US |
dc.identifier.uri |
http://hdl.handle.net/10204/5893
|
|
dc.description |
SLTU 2012. 3rd Workshop on Spoken Language Technologies for Under-Resourced Languages, Cape Town, South Africa, 7-9 May 2012 |
en_US |
dc.description.abstract |
We present a description of the development and evaluation of a first South African broadcast news transcription system. We describe a number of speech resources which have been collected in the resource-scarce South African environment for system development purposes: a 20-hour corpus of South African English (SAE) broadcast news, a 109M-word corpus of South African newspaper text collected for language modelling purposes, and a 60k-word SAE pronunciation dictionary. The development of our system is based on similar state-of-the-art broadcast news transcription systems and uses cross-word triphone HMMs, MF-PLP features and per-segment cepstral mean and per-bulletin cepstral variance normalisation. Our final system achieves a word error rate of 24.6%. We find that reasonable performance is achieved on spontaneous and telephone speech in our test data. Finally, we consider the recognition of MP3-compressed audio and show that performance deteriorates only at low bit-rates. |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.ispartofseries |
Workflow;9015 |
|
dc.subject |
Broadcast news transcription |
en_US |
dc.subject |
South African English |
en_US |
dc.subject |
Under-resourced languages |
en_US |
dc.subject |
English accents |
en_US |
dc.title |
Resource development and experiments in automatic SA broadcast news transcription |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Kamper, H., De Wet, F., Hain, T., & Niesler, T. (2012). Resource development and experiments in automatic SA broadcast news transcription. http://hdl.handle.net/10204/5893 |
en_ZA |
dc.identifier.chicagocitation |
Kamper, H, Febe De Wet, T Hain, and T Niesler. "Resource development and experiments in automatic SA broadcast news transcription." (2012): http://hdl.handle.net/10204/5893 |
en_ZA |
dc.identifier.vancouvercitation |
Kamper H, De Wet F, Hain T, Niesler T. Resource development and experiments in automatic SA broadcast news transcription; 2012. http://hdl.handle.net/10204/5893 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Kamper, H
AU - De Wet, Febe
AU - Hain, T
AU - Niesler, T
AB - We present a description of the development and evaluation of a first South African broadcast news transcription system. We describe a number of speech resources which have been collected in the resource-scarce South African environment for system development purposes: a 20-hour corpus of South African English (SAE) broadcast news, a 109M-word corpus of South African newspaper text collected for language modelling purposes, and a 60k-word SAE pronunciation dictionary. The development of our system is based on similar state-of-the-art broadcast news transcription systems and uses cross-word triphone HMMs, MF-PLP features and per-segment cepstral mean and per-bulletin cepstral variance normalisation. Our final system achieves a word error rate of 24.6%. We find that reasonable performance is achieved on spontaneous and telephone speech in our test data. Finally, we consider the recognition of MP3-compressed audio and show that performance deteriorates only at low bit-rates.
DA - 2012-05
DB - ResearchSpace
DP - CSIR
KW - Broadcast news transcription
KW - South African English
KW - Under-resourced languages
KW - English accents
LK - https://researchspace.csir.co.za
PY - 2012
T1 - Resource development and experiments in automatic SA broadcast news transcription
TI - Resource development and experiments in automatic SA broadcast news transcription
UR - http://hdl.handle.net/10204/5893
ER -
|
en_ZA |