Text-based language identification for the South African languages

Botha, G; Zimu, V; Barnard, E

http://hdl.handle.net/10204/951

Abstract:

The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa when n-gram statistics are used as features for classification. In particular, they compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, drawn from both a closed domain and an open domain. Reliable language identification is possible with as few as 15 words of input text. Although the SVM is generally the more accurate classifier, the additional computational complexity of training it may not be justified, given the importance of using a large value for n.
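
To make the likelihood-based approach mentioned in the abstract concrete, here is a minimal sketch. It is not the authors' implementation: it assumes character trigrams, add-one smoothing, and two toy training sentences invented for illustration, and the names `char_ngrams` and `NgramLikelihoodClassifier` are hypothetical. The paper's actual features, corpora, and values of n may differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text (lowercased, space-padded)."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLikelihoodClassifier:
    """Toy likelihood-based classifier over character n-gram counts (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length (assumed value, not taken from the paper)
        self.alpha = alpha  # additive smoothing constant (assumption)
        self.counts = {}    # language -> Counter of n-gram counts
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def fit(self, samples):
        """samples maps a language label to its training text."""
        for lang, text in samples.items():
            counts = Counter(char_ngrams(text, self.n))
            self.counts[lang] = counts
            self.totals[lang] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def log_likelihood(self, text, lang):
        """Sum of smoothed log-probabilities of the text's n-grams under one language."""
        counts = self.counts[lang]
        # The extra +1 reserves probability mass for n-grams never seen in training.
        denom = self.totals[lang] + self.alpha * (len(self.vocab) + 1)
        return sum(
            math.log((counts[gram] + self.alpha) / denom)
            for gram in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the language whose n-gram model gives the text the highest likelihood."""
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


# Toy training data, made up for this sketch; a real system needs far more text.
training = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "afrikaans": "die vinnige bruin jakkals spring oor die lui hond en die kat slaap in die huis",
}
clf = NgramLikelihoodClassifier(n=3).fit(training)
print(clf.predict("the dog is in the house"))
print(clf.predict("die hond is in die huis"))
```

An SVM variant would feed the same n-gram counts to a trained discriminative classifier instead of comparing per-language likelihoods, which is where the extra training cost noted in the abstract arises.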

Reference:

Botha, G, Zimu, V and Barnard, E. 2006. Text-based language identification for the South African languages. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 7

This paper was later published in the SAIEE Africa Research Journal, Vol 98(4), pp 141-146

Date:

Nov 2006

Keywords:

Language identification systems
Official languages
Support Vector Machine

Files in this item

Botha_2006.pdf
