dc.contributor.author |
Sefara, Tshephisho J
|
|
dc.contributor.author |
Rangata, Mapitsi R
|
|
dc.date.accessioned |
2023-09-22T12:09:47Z |
|
dc.date.available |
2023-09-22T12:09:47Z |
|
dc.date.issued |
2023-08 |
|
dc.identifier.citation |
Sefara, T.J. & Rangata, M.R. 2023. Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087 . |
en_ZA |
dc.identifier.isbn |
979-8-3503-1480-9 |
|
dc.identifier.uri |
DOI: 10.1109/icABCD59051.2023.10220553
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/13087
|
|
dc.description.abstract |
Twitter is a microblogging site with millions of daily users. Broadcast companies use Twitter to share short messages that engage audiences or share opinions about a particular topic or product. Given the large number of conversations on Twitter, it is difficult to identify the topic categories in the broadcasting domain. This paper proposes using unsupervised learning, specifically the latent Dirichlet allocation (LDA) method, to generate topics from unlabelled tweet data sets in the broadcasting domain. Approximately six groups of topics were generated, and each group was assigned a label or category. These labels were then used to annotate the data, taking the dominant label in each tweet as its main category. Six machine learning models were trained with supervised learning: multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models learned to predict the category of each tweet in the test data and were evaluated using accuracy and the F1 score. The linear support vector machine and the MLP obtained better classification results than the other trained models. |
en_US |
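The labelling and evaluation steps described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: it assumes an LDA model has already produced a per-tweet topic distribution, the category names in `TOPIC_LABELS` are hypothetical placeholders (the record does not name the topic groups), and "F1 score" is assumed to mean macro-averaged F1, which the record does not specify.

```python
# Hypothetical category names for illustration only; the record states that
# roughly six topic groups were labelled but does not list them.
TOPIC_LABELS = ["news", "sport", "music", "drama", "weather", "competitions"]


def dominant_label(doc_topic_dist, labels=TOPIC_LABELS):
    """Return the label of the highest-weighted topic for one tweet.

    doc_topic_dist is assumed to be one row of an LDA document-topic matrix:
    a sequence of per-topic probabilities for a single tweet. Ties go to the
    first topic with the maximum weight.
    """
    best = max(range(len(doc_topic_dist)), key=lambda k: doc_topic_dist[k])
    return labels[best]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Assumed macro-averaged F1: unweighted mean of per-class F1 scores
    over every class that appears in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Per-class F1 = 2TP / (2TP + FP + FN), guarded against division by zero.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Usage: label two tweets from made-up topic distributions.
dists = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.10, 0.10, 0.10],
]
weak_labels = [dominant_label(d) for d in dists]
```

In this sketch the dominant-topic labels serve as weak training targets for the supervised classifiers; the classifiers themselves are omitted because the record gives no training details.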
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/document/10220553 |
en_US |
dc.relation.uri |
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10220553 |
en_US |
dc.relation.uri |
https://icabcd.org/2023/ |
en_US |
dc.source |
2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023 |
en_US |
dc.subject |
Topic modelling |
en_US |
dc.subject |
Machine learning |
en_US |
dc.subject |
Natural Language Processing |
en_US |
dc.subject |
Twitter |
en_US |
dc.subject |
Topic classification |
en_US |
dc.title |
Topic classification of tweets in the broadcasting domain using machine learning methods |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
6 |
en_US |
dc.description.note |
This is the preprint version of the paper. |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Data Science |
en_US |
dc.identifier.apacitation |
Sefara, T. J., & Rangata, M. R. (2023). Topic classification of tweets in the broadcasting domain using machine learning methods. http://hdl.handle.net/10204/13087 |
en_ZA |
dc.identifier.chicagocitation |
Sefara, Tshephisho J, and Mapitsi R Rangata. "Topic classification of tweets in the broadcasting domain using machine learning methods." 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023 (2023): http://hdl.handle.net/10204/13087 |
en_ZA |
dc.identifier.vancouvercitation |
Sefara TJ, Rangata MR. Topic classification of tweets in the broadcasting domain using machine learning methods; 2023. http://hdl.handle.net/10204/13087 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Sefara, Tshephisho J
AU - Rangata, Mapitsi R
AB - Twitter is a microblogging site with millions of daily users. Broadcast companies use Twitter to share short messages that engage audiences or share opinions about a particular topic or product. Given the large number of conversations on Twitter, it is difficult to identify the topic categories in the broadcasting domain. This paper proposes using unsupervised learning, specifically the latent Dirichlet allocation (LDA) method, to generate topics from unlabelled tweet data sets in the broadcasting domain. Approximately six groups of topics were generated, and each group was assigned a label or category. These labels were then used to annotate the data, taking the dominant label in each tweet as its main category. Six machine learning models were trained with supervised learning: multinomial logistic regression, XGBoost, decision trees, random forest, support vector machines, and multilayer perceptron (MLP). The models learned to predict the category of each tweet in the test data and were evaluated using accuracy and the F1 score. The linear support vector machine and the MLP obtained better classification results than the other trained models.
DA - 2023-08
DB - ResearchSpace
DP - CSIR
J1 - 2023 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 3-4 August 2023
KW - Topic modelling
KW - Machine learning
KW - Natural Language Processing
KW - Twitter
KW - Topic classification
LK - https://researchspace.csir.co.za
PY - 2023
SM - 979-8-3503-1480-9
T1 - Topic classification of tweets in the broadcasting domain using machine learning methods
TI - Topic classification of tweets in the broadcasting domain using machine learning methods
UR - http://hdl.handle.net/10204/13087
ER -
|
en_ZA |
dc.identifier.worklist |
27061 |
en_US |