Evaluating topic models with stability

dc.contributor.author De Waal, A
dc.contributor.author Barnard, E
dc.date.accessioned 2009-02-18T08:33:20Z
dc.date.available 2009-02-18T08:33:20Z
dc.date.issued 2008-11
dc.identifier.citation De Waal, A and Barnard, E. 2008. Evaluating topic models with stability. Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, 27-28 November 2008, pp 79-84 en
dc.identifier.isbn 9780799223507
dc.identifier.uri http://hdl.handle.net/10204/3016
dc.description Pattern Recognition Association of South Africa (PRASA) 2008 en
dc.description.abstract Topic models are unsupervised techniques that extract likely topics from text corpora by creating probabilistic word-topic and topic-document associations. Evaluating topic models is a challenge because (a) topic models are often applied to unlabelled data, so no ground truth exists, and (b) state-of-the-art topic models create "soft" (probabilistic) document clusters, which complicates comparisons even when ground-truth labels are available. Perplexity has often been used as a performance measure, but it can only be applied to fixed vocabularies and feature sets. The authors turn to an alternative performance measure for topic models, topic stability, and compare its behaviour with that of perplexity when the vocabulary size is varied. They then evaluate two topic models, LDA and GaP, using topic stability. They also use labelled data to test topic stability on these two models, and show that topic stability has significant potential for evaluating topic models on both labelled and unlabelled corpora. en
dc.language.iso en en
dc.publisher PRASA 2008 en
dc.subject Topic models en
dc.subject Topic stability en
dc.subject Topic model LDA en
dc.subject Topic model GaP en
dc.title Evaluating topic models with stability en
dc.type Conference Presentation en
dc.identifier.apacitation De Waal, A., & Barnard, E. (2008). Evaluating topic models with stability. PRASA 2008. http://hdl.handle.net/10204/3016 en_ZA
dc.identifier.chicagocitation De Waal, A, and E Barnard. "Evaluating topic models with stability." (2008): http://hdl.handle.net/10204/3016 en_ZA
dc.identifier.vancouvercitation De Waal A, Barnard E. Evaluating topic models with stability; PRASA 2008; 2008. http://hdl.handle.net/10204/3016. en_ZA
dc.identifier.ris TY - Conference Presentation
AU - De Waal, A
AU - Barnard, E
AB - Topic models are unsupervised techniques that extract likely topics from text corpora by creating probabilistic word-topic and topic-document associations. Evaluating topic models is a challenge because (a) topic models are often applied to unlabelled data, so no ground truth exists, and (b) state-of-the-art topic models create "soft" (probabilistic) document clusters, which complicates comparisons even when ground-truth labels are available. Perplexity has often been used as a performance measure, but it can only be applied to fixed vocabularies and feature sets. The authors turn to an alternative performance measure for topic models, topic stability, and compare its behaviour with that of perplexity when the vocabulary size is varied. They then evaluate two topic models, LDA and GaP, using topic stability. They also use labelled data to test topic stability on these two models, and show that topic stability has significant potential for evaluating topic models on both labelled and unlabelled corpora.
DA - 2008-11
DB - ResearchSpace
DP - CSIR
KW - Topic models
KW - Topic stability
KW - Topic model LDA
KW - Topic model GaP
LK - https://researchspace.csir.co.za
PY - 2008
SM - 9780799223507
T1 - Evaluating topic models with stability
TI - Evaluating topic models with stability
UR - http://hdl.handle.net/10204/3016
ER - en_ZA
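
A note on the method: the abstract describes topic stability only at a high level, and the exact measure is not given in this record. The Python sketch below is an illustration under assumptions, not the authors' implementation: it takes stability to be the average best-match cosine similarity between topic-word distributions from two independently initialised runs of the same model, with scikit-learn's LDA standing in for the paper's LDA and GaP models; the data, function names, and parameters are all hypothetical.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def topic_stability(topics_a, topics_b):
        # Normalise each topic-word vector to unit length, then score each
        # topic from run A by the cosine similarity of its best match in run B.
        a = topics_a / np.linalg.norm(topics_a, axis=1, keepdims=True)
        b = topics_b / np.linalg.norm(topics_b, axis=1, keepdims=True)
        sims = a @ b.T                  # pairwise cosine similarities
        return sims.max(axis=1).mean()  # average greedy best-match score

    docs = [
        "topic models extract likely topics from text corpora",
        "perplexity measures fit on a fixed vocabulary",
        "stability compares topics across independent runs",
    ]
    counts = CountVectorizer().fit_transform(docs)

    # Two runs of the same model that differ only in random initialisation.
    run_a, run_b = (
        LatentDirichletAllocation(n_components=2, random_state=seed)
        .fit(counts)
        .components_
        for seed in (0, 1)
    )
    print("topic stability:", topic_stability(run_a, run_b))

Under this reading, a model whose topics reappear under re-estimation scores near 1, while an unstable model scores lower; the paper compares this behaviour with perplexity as the vocabulary size is varied.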

