When developing speech recognition systems in resource-constrained environments, careful design of the training corpus can play an important role in compensating for data scarcity. One factor to consider is the speaker composition of a corpus: finding the appropriate balance between the number of speakers and the number of utterances per speaker. The authors define a model stability measure based on the Bhattacharyya bound and apply it to analyse the intra- and inter-speaker variability of a training corpus. They find that different phone groups behave significantly differently from one another, but that similar trends are observed within each group. They demonstrate that beyond a predictable point, additional data from a single speaker no longer contributes to modelling accuracy, and they show the trends that can be expected when additional speakers are added.
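The abstract does not spell out the exact form of the stability measure, so the sketch below only illustrates the quantity it is built on: the standard Bhattacharyya distance and Bhattacharyya error bound between two Gaussians with diagonal covariance. This is a hedged illustration, not the authors' method. Applying it to two estimates of the same phone model trained on different data subsets is an assumption made here for the example, and the function names are hypothetical.

```python
import math
from typing import Sequence


def bhattacharyya_distance_diag(mu1: Sequence[float], var1: Sequence[float],
                                mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Hypothetical helper for illustration; it uses the standard closed form
    D_B = 1/8 sum_d (mu2-mu1)^2 / v_d + 1/2 sum_d ln(v_d / sqrt(v1_d * v2_d)),
    where v_d = (v1_d + v2_d) / 2 is the averaged variance per dimension.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all mean and variance vectors must have the same length")
    mean_term = 0.0  # Mahalanobis-style separation of the means
    cov_term = 0.0   # mismatch between the two covariances
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        if v1 <= 0.0 or v2 <= 0.0:
            raise ValueError("variances must be positive")
        v = 0.5 * (v1 + v2)
        mean_term += (m2 - m1) ** 2 / v
        cov_term += math.log(v / math.sqrt(v1 * v2))
    return 0.125 * mean_term + 0.5 * cov_term


def bhattacharyya_bound(mu1: Sequence[float], var1: Sequence[float],
                        mu2: Sequence[float], var2: Sequence[float],
                        prior1: float = 0.5, prior2: float = 0.5) -> float:
    """Upper bound on the Bayes error when separating the two Gaussians:
    P_e <= sqrt(P1 * P2) * exp(-D_B)."""
    d_b = bhattacharyya_distance_diag(mu1, var1, mu2, var2)
    return math.sqrt(prior1 * prior2) * math.exp(-d_b)
```

For example, two one-dimensional Gaussians with means 0 and 2 and unit variance give a distance of 0.5. Identical Gaussians give a distance of 0, so with equal priors the bound reaches its maximum of 0.5. Under the assumption above, a bound that stays close to 0.5 as more data is added would suggest that the model estimate has stopped changing.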
Reference:
Badenhorst, JAC and Davel, M. 2008. Data requirements for speaker independent acoustic models. 19th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, 27-28 November 2008, pp 147-152. http://hdl.handle.net/10204/3439