Pronunciation lexicons often contain pronunciation variants. This can create two problems: it can be difficult to define these variants in an internally consistent way, and it can also be difficult to extract generalised grapheme-to-phoneme rule sets from a lexicon containing variants. In this paper we address both issues by creating ‘pseudo-phonemes’ associated with sets of ‘generation restriction rules’ to model those pronunciations that are consistently realised as two or more variants. By pre-processing and post-processing the lexicon appropriately, grapheme-to-phoneme algorithms that previously could not deal with pronunciation variants can be extended to incorporate them easily, without requiring changes to the standard algorithms. We evaluate the effectiveness of this approach using the Default and Refine rule extraction algorithm, and apply the method to both the English Oxford Advanced Learner's Dictionary (OALD) and the Flemish FONILEX pronunciation lexicon. We find that the approach generalises to different languages, models phonemic variation accurately and identifies inconsistent variants in pre-existing lexicons.
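The pre-processing and post-processing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `collapse` and `expand`, the `|`-joined pseudo-phoneme naming, and the phoneme symbols are all hypothetical, and the sketch assumes the variants of a word are already aligned and of equal length. It does not model the generation restriction rules at all.

```python
from itertools import product

def collapse(variants, table):
    """Pre-processing: merge aligned, equal-length variants into a single
    pronunciation, replacing each differing position with a pseudo-phoneme.
    (Hypothetical helper, not taken from the paper.)"""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("this sketch only handles equal-length variants")
    merged = []
    for column in zip(*variants):
        options = tuple(sorted(set(column)))
        if len(options) == 1:
            merged.append(options[0])      # all variants agree here
        else:
            pseudo = "|".join(options)     # assumed naming scheme
            table[pseudo] = options        # remember what it stands for
            merged.append(pseudo)
    return merged

def expand(pronunciation, table):
    """Post-processing: turn a pronunciation containing pseudo-phonemes
    back into the list of variants it represents."""
    choices = [table.get(p, (p,)) for p in pronunciation]
    return [list(combo) for combo in product(*choices)]

# Illustrative two-variant entry (symbols are made up for the example).
table = {}
variants = [["ay", "dh", "er"], ["iy", "dh", "er"]]
single = collapse(variants, table)   # one pronunciation per word
restored = expand(single, table)     # variants recovered afterwards
```

Between the two steps, a standard grapheme-to-phoneme algorithm would see only one pronunciation per word, which is the point the abstract makes. When variants differ in more than one position, this naive expansion produces every combination, so it can generate pronunciations that were never in the lexicon.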
Reference:
Davel, M. and Barnard, E. 2006. Developing consistent pronunciation models for phonemic variants. Interspeech, Pittsburgh, September 2006, pp 4
Davel, M., & Barnard, E. (2006). Developing consistent pronunciation models for phonemic variants. http://hdl.handle.net/10204/843