Statistical machine translation techniques offer great promise for the development of automatic translation systems. However, the realization of this potential requires the availability of significant amounts of parallel bilingual texts. This paper reports on an attempt to reduce the amount of text that is required to obtain an acceptable translation system, through the use of active and semi-supervised learning. Systems were built using resources collected from South African government websites and the results evaluated using a standard automatic evaluation metric (BLEU). The authors show that significant improvements in translation quality can be achieved with very limited parallel corpora, and that both active learning and semi-supervised learning are useful in this context.
Reference:
Ronald, K. and Barnard, E. 2006. Statistical translation with scarce resources: a South African case study. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 5. http://hdl.handle.net/10204/880