A novel approach to speech rate normalization is presented. Models are constructed to capture how variation in a specific speaker's speech rate influences the duration of phonemes. The models are evaluated in two ways. First, the mean square error in phoneme duration obtained with our normalization is compared to the same error when no such normalization is applied. Second, the durations of context-dependent phonemes are used in speaker verification. Both evaluations show that this approach to normalization is effective in counteracting the effect of variable speaking rates.
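The first evaluation can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a simple per-utterance stretch factor, estimated as the average ratio of observed phoneme duration to the speaker's mean duration for that phoneme. The function names (`utterance_rate`, `duration_mse`), the phoneme labels, and all durations are hypothetical toy values chosen for illustration.

```python
from statistics import mean


def utterance_rate(utterance, phoneme_means):
    """Estimate a stretch factor for one utterance (assumed model):
    the mean ratio of each observed duration to the speaker's mean
    duration for that phoneme. Values above 1 indicate slow speech."""
    return mean(dur / phoneme_means[ph] for ph, dur in utterance)


def duration_mse(utterances, phoneme_means, normalize):
    """Mean square error between phoneme durations and the speaker's
    per-phoneme means, with or without rate normalization."""
    errors = []
    for utt in utterances:
        rate = utterance_rate(utt, phoneme_means) if normalize else 1.0
        for ph, dur in utt:
            errors.append((dur / rate - phoneme_means[ph]) ** 2)
    return mean(errors)


# Hypothetical per-phoneme mean durations (seconds) for one speaker.
# In practice these would be estimated from that speaker's training data.
phoneme_means = {"a": 0.10, "s": 0.08, "t": 0.05}

# Toy utterances as (phoneme, duration) pairs at different speaking rates.
utterances = [
    [("a", 0.10), ("s", 0.08), ("t", 0.05)],    # typical rate
    [("a", 0.15), ("s", 0.12), ("t", 0.075)],   # slower speech
    [("a", 0.05), ("s", 0.04), ("t", 0.025)],   # faster speech
]

mse_raw = duration_mse(utterances, phoneme_means, normalize=False)
mse_norm = duration_mse(utterances, phoneme_means, normalize=True)
print(f"MSE without normalization: {mse_raw:.6f}")
print(f"MSE with normalization:    {mse_norm:.6f}")
```

In this toy data every phoneme in an utterance is stretched by the same factor, so normalization removes almost all of the error. Real speech would not scale this uniformly, which is why a speaker-specific model of how rate affects each phoneme is of interest.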
Reference:
Van Heerden, CJ and Barnard, E. 2006. Speech rate normalization used to improve speaker verification. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 6. http://hdl.handle.net/10204/1045