In the modern digital age, the widespread dissemination of misinformation has become a serious issue. Most work on identifying misinformation online has targeted English rather than low-resource languages such as Tshivenda. In this paper, we create a new dataset of news in the Tshivenda language to support the development of misinformation-detection resources for the language. In our proposed methodology, we leveraged conditional random fields (CRF), gated recurrent units (GRU), and long short-term memory (LSTM) networks to collect and annotate social media content. By applying these models to existing Tshivenda posts, we can assess their effectiveness at identifying false news in a low-resource language setting. This paper emphasises the vital need to combat misinformation in languages with limited resources, such as Tshivenda. Through the creation of a specialised dataset and the use of advanced techniques, it aims to address the spread of misinformation in under-represented language communities.
Reference:
Mukwevho, M., Rananga, S., Mbooi, M.S., Isong, B. & Marivate, V. 2024. Building a dataset for misinformation detection in the low-resource language. IST-Africa Conference (IST-Africa), 20-24 May 2024. http://hdl.handle.net/10204/13760