ResearchSpace

BLSTM harvesting of auxiliary NCHLT speech data

dc.contributor.author Badenhorst, Jacob AC
dc.contributor.author Martinus, Laura
dc.contributor.author De Wet, Febe
dc.date.accessioned 2019-03-26T06:40:59Z
dc.date.available 2019-03-26T06:40:59Z
dc.date.issued 2019-01
dc.identifier.citation Badenhorst, J.A.C., Martinus, L. and De Wet, F. 2019. BLSTM harvesting of auxiliary NCHLT speech data. SAUPEC/RobMech/PRASA 2019 Conference, Bloemfontein, South Africa, 28-30 January 2019 en_US
dc.identifier.isbn 978-1-7281-0368-6
dc.identifier.uri http://hdl.handle.net/10204/10860
dc.description Paper presented at the SAUPEC/RobMech/PRASA 2019 Conference, Bloemfontein, South Africa, 28-30 January 2019 en_US
dc.description.abstract Since the release of the National Centre for Human Language Technology (NCHLT) Speech corpus, very few additional resources for automatic speech recognition (ASR) system development have been created for South Africa’s eleven official languages. The NCHLT corpus contained a curated but limited subset of the collected data. In this study, the auxiliary data that was not included in the released corpus was processed with the aim of improving the acoustic modelling of the NCHLT data. Recent advances in ASR modelling that incorporate deep learning approaches require even more data than previous techniques. Sophisticated neural models seem to accommodate the variability between related acoustic units better and are capable of exploiting speech resources containing more training examples. Our results show that time delay neural networks (TDNN) combined with bi-directional long short-term memory (BLSTM) models are effective, significantly reducing error rates across all languages with just 56 hours of training data. In addition, a cross-corpus evaluation of an Afrikaans system trained on the original NCHLT data plus harvested auxiliary data shows further improvements on this baseline. en_US
dc.language.iso en en_US
dc.relation.ispartofseries Worklist;22219
dc.subject Automatic speech recognition en_US
dc.subject Bidirectional Long Short Term Memory en_US
dc.subject BLSTM en_US
dc.subject Kaldi en_US
dc.subject Languages en_US
dc.subject NCHLT corpora en_US
dc.subject Speech data en_US
dc.subject Under-resourced en_US
dc.title BLSTM harvesting of auxiliary NCHLT speech data en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Badenhorst, J. A., Martinus, L., & De Wet, F. (2019). BLSTM harvesting of auxiliary NCHLT speech data. http://hdl.handle.net/10204/10860 en_ZA
dc.identifier.chicagocitation Badenhorst, Jacob AC, Laura Martinus, and Febe De Wet. "BLSTM harvesting of auxiliary NCHLT speech data." (2019): http://hdl.handle.net/10204/10860 en_ZA
dc.identifier.vancouvercitation Badenhorst JA, Martinus L, De Wet F, BLSTM harvesting of auxiliary NCHLT speech data; 2019. http://hdl.handle.net/10204/10860 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Badenhorst, Jacob AC AU - Martinus, Laura AU - De Wet, Febe AB - Since the release of the National Centre for Human Language Technology (NCHLT) Speech corpus, very few additional resources for automatic speech recognition (ASR) system development have been created for South Africa’s eleven official languages. The NCHLT corpus contained a curated but limited subset of the collected data. In this study, the auxiliary data that was not included in the released corpus was processed with the aim of improving the acoustic modelling of the NCHLT data. Recent advances in ASR modelling that incorporate deep learning approaches require even more data than previous techniques. Sophisticated neural models seem to accommodate the variability between related acoustic units better and are capable of exploiting speech resources containing more training examples. Our results show that time delay neural networks (TDNN) combined with bi-directional long short-term memory (BLSTM) models are effective, significantly reducing error rates across all languages with just 56 hours of training data. In addition, a cross-corpus evaluation of an Afrikaans system trained on the original NCHLT data plus harvested auxiliary data shows further improvements on this baseline. DA - 2019-01 DB - ResearchSpace DP - CSIR KW - Automatic speech recognition KW - Bidirectional Long Short Term Memory KW - BLSTM KW - Kaldi KW - Languages KW - NCHLT corpora KW - Speech data KW - Under-resourced LK - https://researchspace.csir.co.za PY - 2019 SM - 978-1-7281-0368-6 T1 - BLSTM harvesting of auxiliary NCHLT speech data TI - BLSTM harvesting of auxiliary NCHLT speech data UR - http://hdl.handle.net/10204/10860 ER - en_ZA
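
Note: the abstract above describes TDNN-BLSTM acoustic models trained on the NCHLT data, with Kaldi listed among the subject keywords. As a rough illustration of that architecture only (not the authors' Kaldi recipe or configuration), the sketch below defines a TDNN front end followed by a bi-directional LSTM in PyTorch; the class name, layer sizes, context widths and senone count are assumptions chosen for readability.

# Illustrative TDNN-BLSTM acoustic model sketch (assumed dimensions, not from the paper).
import torch
import torch.nn as nn

class TdnnBlstmAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, tdnn_dim=512, lstm_dim=512, num_senones=3000):
        super().__init__()
        # TDNN layers: 1-D convolutions over time that splice a small
        # temporal context around each frame (kernel_size = context width).
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, tdnn_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(tdnn_dim, tdnn_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Bi-directional LSTM over the TDNN output sequence.
        self.blstm = nn.LSTM(tdnn_dim, lstm_dim, num_layers=2,
                             batch_first=True, bidirectional=True)
        # Frame-level logits over tied triphone states (senones).
        self.output = nn.Linear(2 * lstm_dim, num_senones)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) acoustic features, e.g. MFCCs.
        x = self.tdnn(feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.blstm(x)
        return self.output(x)  # (batch, time, num_senones)

# Usage: 8 utterances of 200 frames of 40-dimensional features.
model = TdnnBlstmAcousticModel()
logits = model(torch.randn(8, 200, 40))
print(logits.shape)  # torch.Size([8, 200, 3000])

In Kaldi's nnet3 recipes the same idea is expressed as spliced TDNN layers interleaved with (B)LSTM layers; the PyTorch form is used here only because it is compact and self-contained.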