dc.contributor.author | Badenhorst, Jacob AC |
dc.contributor.author | Martinus, Laura |
dc.contributor.author | De Wet, Febe |
dc.date.accessioned | 2019-03-26T06:40:59Z |
dc.date.available | 2019-03-26T06:40:59Z |
dc.date.issued | 2019-01 |
dc.identifier.citation | Badenhorst, J.A.C., Martinus, L. and De Wet, F. 2019. BLSTM harvesting of auxiliary NCHLT speech data. SAUPEC/RobMech/PRASA 2019 Conference, Bloemfontein, South Africa, 28-30 January 2019 | en_US
dc.identifier.isbn | 978-1-7281-0368-6 |
dc.identifier.uri | http://hdl.handle.net/10204/10860 |
dc.description | Paper presented at the SAUPEC/RobMech/PRASA 2019 Conference, Bloemfontein, South Africa, 28-30 January 2019 | en_US
dc.description.abstract | Since the release of the National Centre for Human Language Technology (NCHLT) Speech corpus, very few additional resources for automatic speech recognition (ASR) system development have been created for South Africa’s eleven official languages. The NCHLT corpus contained a curated but limited subset of the collected data. In this study, the auxiliary data that was not included in the released corpus was processed with the aim of improving the acoustic modelling of the NCHLT data. Recent advances in ASR modelling that incorporate deep learning approaches require even more data than previous techniques. Sophisticated neural models seem to accommodate the variability between related acoustic units better and are capable of exploiting speech resources containing more training examples. Our results show that time delay neural networks (TDNN) combined with bi-directional long short-term memory (BLSTM) models are effective, significantly reducing error rates across all languages with just 56 hours of training data. In addition, a cross-corpus evaluation of an Afrikaans system trained on the original NCHLT data plus harvested auxiliary data shows further improvements on this baseline. | en_US
dc.language.iso | en | en_US
dc.relation.ispartofseries | Worklist;22219 |
dc.subject | Automatic speech recognition | en_US
dc.subject | Bidirectional Long Short Term Memory | en_US
dc.subject | BLSTM | en_US
dc.subject | Kaldi | en_US
dc.subject | Languages | en_US
dc.subject | NCHLT corpora | en_US
dc.subject | Speech data | en_US
dc.subject | Under resourced | en_US
dc.title | BLSTM harvesting of auxiliary NCHLT speech data | en_US
dc.type | Conference Presentation | en_US
dc.identifier.apacitation | Badenhorst, J. A., Martinus, L., & De Wet, F. (2019). BLSTM harvesting of auxiliary NCHLT speech data. http://hdl.handle.net/10204/10860 | en_ZA
dc.identifier.chicagocitation | Badenhorst, Jacob AC, Laura Martinus, and Febe De Wet. "BLSTM harvesting of auxiliary NCHLT speech data." (2019): http://hdl.handle.net/10204/10860 | en_ZA
dc.identifier.vancouvercitation | Badenhorst JA, Martinus L, De Wet F. BLSTM harvesting of auxiliary NCHLT speech data; 2019. http://hdl.handle.net/10204/10860. | en_ZA
dc.identifier.ris |
TY - Conference Presentation
AU - Badenhorst, Jacob AC
AU - Martinus, Laura
AU - De Wet, Febe
AB - Since the release of the National Centre for Human Language Technology (NCHLT) Speech corpus, very few additional resources for automatic speech recognition (ASR) system development have been created for South Africa’s eleven official languages. The NCHLT corpus contained a curated but limited subset of the collected data. In this study, the auxiliary data that was not included in the released corpus was processed with the aim of improving the acoustic modelling of the NCHLT data. Recent advances in ASR modelling that incorporate deep learning approaches require even more data than previous techniques. Sophisticated neural models seem to accommodate the variability between related acoustic units better and are capable of exploiting speech resources containing more training examples. Our results show that time delay neural networks (TDNN) combined with bi-directional long short-term memory (BLSTM) models are effective, significantly reducing error rates across all languages with just 56 hours of training data. In addition, a cross-corpus evaluation of an Afrikaans system trained on the original NCHLT data plus harvested auxiliary data shows further improvements on this baseline.
DA - 2019-01
DB - ResearchSpace
DP - CSIR
KW - Automatic speech recognition
KW - Bidirectional Long Short Term Memory
KW - BLSTM
KW - Kaldi
KW - Languages
KW - NCHLT corpora
KW - Speech data
KW - Under resourced
LK - https://researchspace.csir.co.za
PY - 2019
SM - 978-1-7281-0368-6
T1 - BLSTM harvesting of auxiliary NCHLT speech data
TI - BLSTM harvesting of auxiliary NCHLT speech data
UR - http://hdl.handle.net/10204/10860
ER -
| en_ZA