Twitter is a microblogging site with millions of daily users. Broadcast companies use Twitter to post short messages that engage their audience or share opinions about a particular topic or product. Because Twitter hosts a very large number of conversations, it is difficult to identify the categories of topics in the broadcasting domain. This paper proposes using unsupervised learning, specifically the latent Dirichlet allocation (LDA) method, to generate topics from unlabelled tweet data sets in the broadcasting domain. Approximately six groups of topics were generated, and each group was assigned a label or category. These labels were then used to annotate the data: the dominant label in each tweet was taken as its main category. Supervised learning was then used to train six machine learning models: multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models learned from the labelled data to predict the category of each tweet in the test data. The models were evaluated using accuracy and the F1 score. The linear support vector machine and the MLP obtained better classification results than the other trained models.
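The labelling and evaluation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each tweet already has a row of topic weights from an LDA model (for example, the document-topic matrix returned by scikit-learn's `LatentDirichletAllocation.fit_transform`). The topic names `topic_0` … `topic_5`, the helper functions, and the toy numbers are all hypothetical. Macro averaging for the F1 score is also an assumption, since the abstract does not say which averaging was used.

```python
# Hypothetical placeholder names for the six topic groups; the paper's
# actual category labels are not listed in the abstract.
TOPIC_LABELS = ["topic_0", "topic_1", "topic_2", "topic_3", "topic_4", "topic_5"]


def dominant_label(topic_weights, labels=TOPIC_LABELS):
    """Return the label of the topic with the highest weight for one tweet."""
    best = max(range(len(topic_weights)), key=lambda k: topic_weights[k])
    return labels[best]


def label_tweets(doc_topic_matrix, labels=TOPIC_LABELS):
    """Assign each tweet (one row of LDA topic weights) its dominant topic label."""
    return [dominant_label(row, labels) for row in doc_topic_matrix]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted category matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy document-topic weights for four tweets (made-up numbers, not paper data).
doc_topic = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.50, 0.10],
    [0.20, 0.20, 0.30, 0.10, 0.10, 0.10],
]
reference = label_tweets(doc_topic)
# Hypothetical predictions from a trained classifier for the same four tweets.
predicted = ["topic_0", "topic_1", "topic_4", "topic_0"]
print(accuracy(reference, predicted), macro_f1(reference, predicted))
```

In a full pipeline, the `reference` labels produced by `label_tweets` would serve as training targets for the supervised models, and `accuracy` and `macro_f1` would be computed on a held-out test split.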
Reference:
Sefara, T.J. & Rangata, M.R. 2023. Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087
2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023