dc.contributor.author |
Botha, G
|
|
dc.contributor.author |
Zimu, V
|
|
dc.contributor.author |
Barnard, E
|
|
dc.date.accessioned |
2007-07-04T06:15:37Z |
|
dc.date.available |
2007-07-04T06:15:37Z |
|
dc.date.issued |
2006-11 |
|
dc.identifier.citation |
Botha, G, Zimu, V and Barnard, E. 2006. Text-based language identification for the South African languages. 17th Annual Symposium of the Pattern Recognition Association of South Africa, Parys, South Africa, 29 Nov - 1 Dec 2006, pp 7 |
en |
dc.identifier.uri |
http://hdl.handle.net/10204/951
|
|
dc.description |
This paper was later published in the SAIEE Africa Research Journal, Vol 98(4), pp 141-146 |
|
dc.description.abstract |
The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training this classifier may not be justified in light of the importance of using a large value for n. |
en |
dc.language.iso |
en |
en |
dc.subject |
Language identification systems |
en |
dc.subject |
Official languages |
en |
dc.subject |
Support Vector Machine |
en |
dc.title |
Text-based language identification for the South African languages |
en |
dc.type |
Conference Presentation |
en |
dc.identifier.apacitation |
Botha, G., Zimu, V., & Barnard, E. (2006). Text-based language identification for the South African languages. http://hdl.handle.net/10204/951 |
en_ZA |
dc.identifier.chicagocitation |
Botha, G, V Zimu, and E Barnard. "Text-based language identification for the South African languages." (2006): http://hdl.handle.net/10204/951 |
en_ZA |
dc.identifier.vancouvercitation |
Botha G, Zimu V, Barnard E. Text-based language identification for the South African languages; 2006. http://hdl.handle.net/10204/951 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Botha, G
AU - Zimu, V
AU - Barnard, E
AB - The authors investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, the authors compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training this classifier may not be justified in light of the importance of using a large value for n.
DA - 2006-11
DB - ResearchSpace
DP - CSIR
KW - Language identification systems
KW - Official languages
KW - Support Vector Machine
LK - https://researchspace.csir.co.za
PY - 2006
T1 - Text-based language identification for the South African languages
TI - Text-based language identification for the South African languages
UR - http://hdl.handle.net/10204/951
ER -
|
en_ZA |