ResearchSpace

Topic classification of tweets in the broadcasting domain using machine learning methods

dc.contributor.author Sefara, Tshephisho J
dc.contributor.author Rangata, Mapitsi R
dc.date.accessioned 2023-09-22T12:09:47Z
dc.date.available 2023-09-22T12:09:47Z
dc.date.issued 2023-08
dc.identifier.citation Sefara, T.J. & Rangata, M.R. 2023. Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087 . en_ZA
dc.identifier.isbn 979-8-3503-1480-9
dc.identifier.uri DOI: 10.1109/icABCD59051.2023.10220553
dc.identifier.uri http://hdl.handle.net/10204/13087
dc.description.abstract Twitter is one of the microblogging sites with millions of daily users. Broadcast companies use Twitter to share short messages to engage or share opinions about a particular topic or product. With a large number of conversations available on Twitter, it is difficult to identify the category of topics in the broadcasting domain. This paper proposes the use of unsupervised learning to generate topics from unlabelled tweet data sets in the broadcasting domain using the latent Dirichlet allocation (LDA) method. Approximately six groups of topics were generated and each group was assigned a label or category. These labels were used to label the data by finding the dominating label in each tweet as the main category. Supervised learning was conducted to train six machine learning models which are multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models were able to learn from the data to predict the category of each tweet from the testing data. The models were evaluated using accuracy and the f1 score. Linear support vector machine and MLP obtained better classification results compared to other trained models. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://ieeexplore.ieee.org/document/10220553 en_US
dc.relation.uri https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10220553 en_US
dc.relation.uri https://icabcd.org/2023/ en_US
dc.source 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023 en_US
dc.subject Topic modelling en_US
dc.subject Machine learning en_US
dc.subject Natural Language Processing en_US
dc.subject Twitter en_US
dc.subject Topic classification en_US
dc.title Topic classification of tweets in the broadcasting domain using machine learning methods en_US
dc.type Conference Presentation en_US
dc.description.pages 6 en_US
dc.description.note This is the preprint version of the paper. en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Sefara, T. J., & Rangata, M. R. (2023). Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087 en_ZA
dc.identifier.chicagocitation Sefara, Tshephisho J, and Mapitsi R Rangata. "Topic classification of tweets in the broadcasting domain using machine learning methods." 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023 (2023): http://hdl.handle.net/10204/13087 en_ZA
dc.identifier.vancouvercitation Sefara TJ, Rangata MR, Topic classification of tweets in the broadcasting domain using machine learning methods; 2023. http://hdl.handle.net/10204/13087 . en_ZA
dc.identifier.ris
  TY  - Conference Presentation
  AU  - Sefara, Tshephisho J
  AU  - Rangata, Mapitsi R
  AB  - Twitter is one of the microblogging sites with millions of daily users. Broadcast companies use Twitter to share short messages to engage or share opinions about a particular topic or product. With a large number of conversations available on Twitter, it is difficult to identify the category of topics in the broadcasting domain. This paper proposes the use of unsupervised learning to generate topics from unlabelled tweet data sets in the broadcasting domain using the latent Dirichlet allocation (LDA) method. Approximately six groups of topics were generated and each group was assigned a label or category. These labels were used to label the data by finding the dominating label in each tweet as the main category. Supervised learning was conducted to train six machine learning models which are multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models were able to learn from the data to predict the category of each tweet from the testing data. The models were evaluated using accuracy and the f1 score. Linear support vector machine and MLP obtained better classification results compared to other trained models.
  DA  - 2023-08
  DB  - ResearchSpace
  DP  - CSIR
  J1  - 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023
  KW  - Topic modelling
  KW  - Machine learning
  KW  - Natural Language Processing
  KW  - Twitter
  KW  - Topic classification
  LK  - https://researchspace.csir.co.za
  PY  - 2023
  SM  - 979-8-3503-1480-9
  T1  - Topic classification of tweets in the broadcasting domain using machine learning methods
  TI  - Topic classification of tweets in the broadcasting domain using machine learning methods
  UR  - http://hdl.handle.net/10204/13087
  ER  -
en_ZA
dc.identifier.worklist 27061 en_US
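
The abstract describes two steps that are easy to illustrate in isolation: labelling each tweet with its dominating topic, and scoring predictions with accuracy and the F1 score. The following is a minimal sketch of both, not the authors' implementation. The category names, the function names, and the choice of macro averaging for F1 are all assumptions; the record does not list the six labels or say how F1 was averaged.

```python
# Hypothetical category names: the record does not list the six labels.
CATEGORIES = ["news", "sport", "music", "drama", "politics", "other"]


def dominant_category(topic_weights, categories=CATEGORIES):
    """Return the category whose topic has the largest weight for one tweet.

    topic_weights is assumed to be one tweet's per-topic weights, for
    example one row of an LDA document-topic matrix. Ties go to the
    first topic with the maximum weight.
    """
    if len(topic_weights) != len(categories):
        raise ValueError("expected one weight per category")
    best = max(range(len(topic_weights)), key=lambda i: topic_weights[i])
    return categories[best]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a pipeline like the one the abstract outlines, `dominant_category` would be applied to every tweet to produce training labels, and the two metrics would be computed on a held-out test split for each of the six classifiers.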

