Initial work towards processing Afrikaans spoken lectures in a resource-scarce environment is presented. Two approaches to acoustic modeling for eventual alignment are compared: (a) using a well-trained target-language acoustic model and (b) using an acoustic model from another language, in this case American English. The authors show that while target-language acoustic models are preferable, similar performance can be achieved by repeatedly bootstrapping with the American English model, segmenting and then adapting or training new models using the segmented spoken lectures. The eventual systems perform quite well, aligning more than 90% of a selected set of target words successfully.
Reference:
Van Heerden, CJ, De Villiers, P, Barnard, E and Davel, MH. Processing spoken lectures in resource-scarce environments. Proceedings of the Twenty-Second Annual Symposium of the Pattern Recognition Association of South Africa, Vanderbijlpark, South Africa, 22-25 November 2011, pp. 138-143.
Van Heerden, C., De Villiers, P., Barnard, E., & Davel, M. (2011). Processing spoken lectures in resource-scarce environments. PRASA. http://hdl.handle.net/10204/5681