The phonetic segmentation of recorded speech is a crucial factor in the quality of concatenative speech synthesis systems. The authors describe a likelihood-based error detection process that flags possible errors in such a segmentation so that they can be corrected manually. It is shown that this process can assist in the creation of high-accuracy segmentations. In particular, for an isiZulu corpus used in the creation of a unit-selection synthesizer, the process detected almost half of the errors present in a manual segmentation while flagging fewer than a quarter of all segments. Different phoneme classes are handled with differing degrees of success, with vowels being the most troublesome.
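To make the idea of likelihood-based flagging concrete, the following is a minimal sketch and not the authors' actual procedure, which the abstract does not specify. It assumes that each segment already carries an acoustic log-likelihood and a frame count from some alignment model, and it flags segments whose per-frame log-likelihood is unusually low relative to other segments of the same phoneme. The names `Segment`, `flag_suspect_segments`, and `z_threshold` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Segment:
    phoneme: str
    start: float           # segment start time in seconds
    end: float             # segment end time in seconds
    log_likelihood: float  # total acoustic log-likelihood (assumed to be given)
    n_frames: int          # number of acoustic frames in the segment


def flag_suspect_segments(segments, z_threshold=-1.0):
    """Return indices of segments whose per-frame log-likelihood is
    unusually low compared with other segments of the same phoneme.

    The z-score rule and the threshold are illustrative assumptions.
    """
    # Collect per-frame scores for each phoneme label.
    scores_by_phoneme = defaultdict(list)
    for seg in segments:
        scores_by_phoneme[seg.phoneme].append(seg.log_likelihood / seg.n_frames)

    # Per-phoneme mean and sample standard deviation (needs >= 2 samples).
    stats = {
        phoneme: (mean(scores), stdev(scores))
        for phoneme, scores in scores_by_phoneme.items()
        if len(scores) >= 2
    }

    flagged = []
    for index, seg in enumerate(segments):
        if seg.phoneme not in stats:
            continue  # too few examples to judge this phoneme
        mu, sigma = stats[seg.phoneme]
        if sigma == 0:
            continue  # all scores identical, nothing stands out
        z = (seg.log_likelihood / seg.n_frames - mu) / sigma
        if z < z_threshold:
            flagged.append(index)
    return flagged
```

In a workflow like the one the abstract describes, the flagged indices would be handed to a human annotator for inspection. Lowering `z_threshold` flags fewer segments at the cost of missing more errors.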
Reference:
Barnard, E and Davel, M. 2006. Automatic error detection in alignments for speech synthesis. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 4
Barnard, E., & Davel, M. (2006). Automatic error detection in alignments for speech synthesis. http://hdl.handle.net/10204/1044