N-grams are used to quantify the similarity between two documents or between two collections of words. This paper shows how N-grams of length 3 and 4, coupled with text processing (including stop word removal and stemming according to MXit spelling conventions), can be used to categorize very short mathematical conversations conducted in MXit lingo into broad mathematical groups such as algebra, geometry, trigonometry, and calculus. MXit lingo is an abbreviated form of written English which children, teenagers and young adults utilise when communicating over cell phones using the popular MXit chat mechanism. Conversations from the "Dr Math" project were used for this analysis. "Dr Math" is a mathematical tutoring service which links primary and secondary school pupils to tutors from local universities, who assist the pupils with their mathematics homework.
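As a rough illustration of the idea, the sketch below scores the words of a short message against per-topic vocabulary using character N-grams. It is not the paper's implementation: the use of character-level N-grams, the Jaccard overlap measure, the function names, and the tiny topic vocabulary are all assumptions made for this example, and the stop word removal and MXit-specific stemming mentioned above are not reproduced.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of length n in text (lower-cased)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings (assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # both strings shorter than n: treat as no evidence
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def classify(message, topic_terms, n=3):
    """Return the topic whose vocabulary best matches any word in the message."""
    scores = {}
    for topic, terms in topic_terms.items():
        scores[topic] = max(
            (ngram_similarity(word, term, n)
             for word in message.split()
             for term in terms),
            default=0.0,
        )
    return max(scores, key=scores.get)


# Hypothetical topic vocabulary, for illustration only.
TOPICS = {
    "algebra": ["factorise", "equation"],
    "geometry": ["triangle", "circle"],
}

print(classify("hw do i factorise x2", TOPICS))
```

Because the comparison works on overlapping character sequences rather than whole words, an abbreviated or misspelled word can still share several N-grams with the intended term, which is what makes the approach plausible for chat lingo.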
Reference:
Butgereit, LL and Botha, RA. Using N-grams to identify mathematical topics in MXit lingo. Annual Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT 2011), Cape Town, South Africa, 3-5 October 2011, pp 40-48
Butgereit, L., & Botha, R. (2011). Using N-grams to identify mathematical topics in MXit lingo. ACM. http://hdl.handle.net/10204/5814