Topic models are unsupervised techniques that extract likely topics from text corpora by creating probabilistic word-topic and topic-document associations. Evaluating topic models is challenging because (a) they are often applied to unlabelled data, for which no ground truth exists, and (b) state-of-the-art topic models create "soft" (probabilistic) document clusters, which complicates comparisons even when ground-truth labels are available. Perplexity has often been used as a performance measure, but it can only be used for fixed vocabularies and feature sets. The authors turn to an alternative performance measure for topic models, topic stability, and compare its behaviour with that of perplexity when the vocabulary size is varied. They then evaluate two topic models, latent Dirichlet allocation (LDA) and Gamma-Poisson (GaP), using topic stability. They also use labelled data to test topic stability on these two models, and show that topic stability has significant potential for evaluating topic models on both labelled and unlabelled corpora.
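To make the two measures concrete, the following is a minimal Python sketch, not the authors' implementation. Perplexity is computed in its standard form, the exponential of the negative mean log-likelihood per token, which shows why it depends on the vocabulary over which token probabilities are defined. The abstract does not define topic stability, so the `stability` function is an assumption for illustration only: it scores two training runs by the mean Pearson correlation between their topic-document weight vectors after finding the best one-to-one topic matching by brute force. The function names and input layout are hypothetical.

```python
import math
from itertools import permutations


def perplexity(token_probs):
    """Standard perplexity: exp of the negative mean log-probability per token.

    token_probs holds the model's probability for each held-out token.
    """
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def stability(run_a, run_b):
    """ASSUMED stability score, for illustration only.

    run_a and run_b each hold K topic vectors, one per topic, where each
    vector lists that topic's weight in every document. Topics from the two
    runs are matched one-to-one so that the summed correlation is maximal
    (brute force over permutations, so only practical for small K), and the
    mean correlation of the matched pairs is returned.
    """
    k = len(run_a)
    best_total = max(
        sum(pearson(run_a[i], run_b[perm[i]]) for i in range(k))
        for perm in permutations(range(k))
    )
    return best_total / k
```

Under this sketch, a model that assigns probability 0.25 to each of four held-out tokens has a perplexity of about 4, and two runs that find the same topics in a different order receive a stability close to 1. For larger numbers of topics, the brute-force matching would need to be replaced by an assignment algorithm such as the Hungarian method.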
Reference:
De Waal, A. and Barnard, E. 2008. Evaluating topic models with stability. Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, 27-28 November 2008, pp. 79-84. http://hdl.handle.net/10204/3016