dc.contributor.author | De Waal, A | |
dc.contributor.author | Barnard, E | |
dc.date.accessioned | 2009-02-18T08:33:20Z | |
dc.date.available | 2009-02-18T08:33:20Z | |
dc.date.issued | 2008-11 | |
dc.identifier.citation | De Waal, A and Barnard, E. 2008. Evaluating topic models with stability. Nineteenth Annual Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, South Africa, 27-28 November 2008, pp 79-84 | en
dc.identifier.isbn | 9780799223507 | |
dc.identifier.uri | http://hdl.handle.net/10204/3016 | |
dc.description | Pattern Recognition Association of South Africa (PRASA) 2008 | en
dc.description.abstract | Topic models are unsupervised techniques that extract likely topics from text corpora, by creating probabilistic word-topic and topic-document associations. Evaluation of topic models is a challenge because (a) topic models are often employed on unlabelled data, so that a ground truth does not exist and (b) "soft" (probabilistic) document clusters are created by state-of-the-art topic models, which complicates comparisons even when ground truth labels are available. Perplexity has often been used as a performance measure, but can only be used for fixed vocabularies and feature sets. The authors turn to an alternative performance measure for topic models - topic stability - and compare its behaviour with perplexity when the vocabulary size is varied. They then evaluate two topic models, LDA and GaP, using topic stability. They also use labelled data to test topic stability on these two models, and show that topic stability has significant potential to evaluate topic models on both labelled and unlabelled corpora. | en
dc.language.iso | en | en
dc.publisher | PRASA 2008 | en
dc.subject | Topic models | en
dc.subject | Topic stability | en
dc.subject | Topic model LDA | en
dc.subject | Topic model GaP | en
dc.title | Evaluating topic models with stability | en
dc.type | Conference Presentation | en
dc.identifier.apacitation | De Waal, A., & Barnard, E. (2008). Evaluating topic models with stability. PRASA 2008. http://hdl.handle.net/10204/3016 | en_ZA
dc.identifier.chicagocitation | De Waal, A, and E Barnard. "Evaluating topic models with stability." (2008): http://hdl.handle.net/10204/3016 | en_ZA
dc.identifier.vancouvercitation | De Waal A, Barnard E. Evaluating topic models with stability; PRASA 2008; 2008. http://hdl.handle.net/10204/3016. | en_ZA
dc.identifier.ris |
TY - Conference Presentation
AU - De Waal, A
AU - Barnard, E
AB - Topic models are unsupervised techniques that extract likely topics from text corpora, by creating probabilistic word-topic and topic-document associations. Evaluation of topic models is a challenge because (a) topic models are often employed on unlabelled data, so that a ground truth does not exist and (b) "soft" (probabilistic) document clusters are created by state-of-the-art topic models, which complicates comparisons even when ground truth labels are available. Perplexity has often been used as a performance measure, but can only be used for fixed vocabularies and feature sets. The authors turn to an alternative performance measure for topic models - topic stability - and compare its behaviour with perplexity when the vocabulary size is varied. They then evaluate two topic models, LDA and GaP, using topic stability. They also use labelled data to test topic stability on these two models, and show that topic stability has significant potential to evaluate topic models on both labelled and unlabelled corpora.
DA - 2008-11
DB - ResearchSpace
DP - CSIR
KW - Topic models
KW - Topic stability
KW - Topic model LDA
KW - Topic model GaP
LK - https://researchspace.csir.co.za
PY - 2008
SM - 9780799223507
T1 - Evaluating topic models with stability
TI - Evaluating topic models with stability
UR - http://hdl.handle.net/10204/3016
ER -
| en_ZA