Since the release of the National Centre for Human Language Technology (NCHLT) Speech corpus, very few additional resources for automatic speech recognition (ASR) system development have been created for South Africa’s eleven official languages. The released NCHLT corpus contained a curated but limited subset of the collected data. In this study, the auxiliary data that was not included in the released corpus was processed with the aim of improving the acoustic modelling of the NCHLT data. Recent advances in ASR modelling that incorporate deep learning approaches require even more data than previous techniques. Sophisticated neural models seem to accommodate the variability between related acoustic units better and are capable of exploiting speech resources containing more training examples. Our results show that time delay neural networks (TDNN) combined with bi-directional long short-term memory (BLSTM) models are effective, significantly reducing error rates across all languages with just 56 hours of training data. In addition, a cross-corpus evaluation of an Afrikaans system trained on the original NCHLT data plus harvested auxiliary data shows further improvements on this baseline.
Reference:
Badenhorst, J.A.C., Martinus, L. and De Wet, F. 2019. BLSTM harvesting of auxiliary NCHLT speech data. SAUPEC/RobMech/PRASA 2019 Conference, Bloemfontein, South Africa, 28-30 January 2019
Badenhorst, J. A., Martinus, L., & De Wet, F. (2019). BLSTM harvesting of auxiliary NCHLT speech data. http://hdl.handle.net/10204/10860