dc.contributor.author |
Butgereit, L
|
|
dc.contributor.author |
Botha, RA
|
|
dc.date.accessioned |
2013-12-12T07:34:27Z |
|
dc.date.available |
2013-12-12T07:34:27Z |
|
dc.date.issued |
2013-10 |
|
dc.identifier.citation |
Butgereit, L and Botha, R.A. 2013. A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language. In: South African Institute for Computer Scientists and Information Technologists (SAICSIT) 2013, 7-9 October 2013, East London, South Africa |
en_US |
dc.identifier.uri |
http://delivery.acm.org/10.1145/2520000/2513458/p1-butgereit.pdf?ip=146.64.81.115&id=2513458&acc=ACTIVE%20SERVICE&key=C2716FEBFA981EF16F26307A25115533B16AE41C93EF03EC&CFID=387516605&CFTOKEN=24848873&__acm__=1386666358_3b0c923db5c180fedd0a187906ec5273
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/7119
|
|
dc.description |
South African Institute for Computer Scientists and Information Technologists (SAICSIT) 2013, 7-9 October 2013, East London, South Africa. Abstract only attached. |
en_US |
dc.description.abstract |
Mobile Instant Messaging (MIM) systems have produced a new convention in writing where vowels are often omitted, where new suffixes have appeared, where numerals and symbols often appear in place of letters that have a similar shape or sound, and where words are often spelled phonetically. A word such as mister may be spelled numerous ways, including mista and mistr (with new suffixes). When both participants in a MIM conversation understand these new spelling conventions, there is no problem. But in a situation such as automated topic spotting, it is advantageous to attempt to associate these new spellings (mista and mistr) back to the original word (mister). This paper describes work in creating a spelling corrector for MIM conversations for use after stop words have been removed from a conversation, after words have been stemmed, and after double letters have been collapsed to single letters. Four different similarity calculations (Jaccard, Sørensen-Dice, Cosine, and Overlap) are investigated and tested with historical data from the Dr Math mobile tutoring environment. This research found that the Overlap similarity calculation was the least accurate of the four measured. In situations where the lengths of the various words were the same, the Sørensen-Dice and Cosine similarity calculations were identical. Jaccard and Sørensen-Dice worked equally well; however, they required different numerical cut-off values for misspelled words. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
ACM Digital Library |
en_US |
dc.relation.ispartofseries |
Workflow;11770 |
|
dc.subject |
Algorithms |
en_US |
dc.subject |
N-grams |
en_US |
dc.subject |
Spelling |
en_US |
dc.subject |
Dr Math |
en_US |
dc.subject |
Mobile Instant Messaging |
en_US |
dc.subject |
MIM |
en_US |
dc.title |
A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Butgereit, L., & Botha, R. A. (2013). A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language. ACM Digital Library. http://hdl.handle.net/10204/7119 |
en_ZA |
dc.identifier.chicagocitation |
Butgereit, L, and RA Botha. "A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language." (2013): http://hdl.handle.net/10204/7119 |
en_ZA |
dc.identifier.vancouvercitation |
Butgereit L, Botha RA. A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language. ACM Digital Library; 2013. http://hdl.handle.net/10204/7119 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Butgereit, L
AU - Botha, RA
AB - Mobile Instant Messaging (MIM) systems have produced a new convention in writing where vowels are often omitted, where new suffixes have appeared, where numerals and symbols often appear in place of letters that have a similar shape or sound, and where words are often spelled phonetically. A word such as mister may be spelled numerous ways, including mista and mistr (with new suffixes). When both participants in a MIM conversation understand these new spelling conventions, there is no problem. But in a situation such as automated topic spotting, it is advantageous to attempt to associate these new spellings (mista and mistr) back to the original word (mister). This paper describes work in creating a spelling corrector for MIM conversations for use after stop words have been removed from a conversation, after words have been stemmed, and after double letters have been collapsed to single letters. Four different similarity calculations (Jaccard, Sørensen-Dice, Cosine, and Overlap) are investigated and tested with historical data from the Dr Math mobile tutoring environment. This research found that the Overlap similarity calculation was the least accurate of the four measured. In situations where the lengths of the various words were the same, the Sørensen-Dice and Cosine similarity calculations were identical. Jaccard and Sørensen-Dice worked equally well; however, they required different numerical cut-off values for misspelled words.
DA - 2013-10
DB - ResearchSpace
DP - CSIR
KW - Algorithms
KW - N-grams
KW - Spelling
KW - Dr Math
KW - Mobile Instant Messaging
KW - MIM
LK - https://researchspace.csir.co.za
PY - 2013
T1 - A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language
TI - A comparison of different calculations for N-Gram similarities in a spelling corrector for mobile instant messaging language
UR - http://hdl.handle.net/10204/7119
ER -
|
en_ZA |
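
The four similarity calculations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state the n-gram size, whether words are padded, or whether n-grams are treated as sets or multisets, so the code below assumes unpadded character bigrams treated as sets, and all function names are hypothetical. Cosine is taken here in its set form, |A ∩ B| / sqrt(|A| · |B|).

```python
from math import sqrt


def char_ngrams(word, n=2):
    """Return the set of character n-grams of `word`.

    Assumption: unpadded bigrams treated as a set; the record does not
    say which n-gram variant the paper used.
    """
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """2 * |A & B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """|A & B| / sqrt(|A| * |B|), the set form of cosine similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def overlap(a, b):
    """|A & B| / min(|A|, |B|)"""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    target = char_ngrams("mister")
    for candidate in ("mista", "mistr"):
        grams = char_ngrams(candidate)
        print(
            candidate,
            jaccard(grams, target),
            sorensen_dice(grams, target),
            cosine(grams, target),
            overlap(grams, target),
        )
```

Under these assumptions, mistr and mister share three of six distinct bigrams, giving a Jaccard score of 0.5, while Sørensen-Dice gives 2/3 for the same pair. This is consistent with the abstract's remark that the two measures need different cut-off values. When both bigram sets have the same size, as with mista and mistr, the Sørensen-Dice and Cosine formulas both reduce to |A ∩ B| / |A|, so they return the same value.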