dc.contributor.author |
Kleynhans, N
|
|
dc.date.accessioned |
2015-03-12T10:20:02Z |
|
dc.date.available |
2015-03-12T10:20:02Z |
|
dc.date.issued |
2014-11 |
|
dc.identifier.citation |
Kleynhans, N. 2014. Unsupervised topic modelling on South African parliament audio data. Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014 |
en_US |
dc.identifier.uri |
http://hdl.handle.net/10204/7947
|
|
dc.description |
Copyright: Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014. |
en_US |
dc.description.abstract |
Using a speech recognition system to convert spoken audio to text can enable large collections of spoken audio data to be structured. A convenient means of summarising or clustering spoken data is to identify the topic under discussion. Once audio has been converted to text, many text-based topic modelling and identification techniques become available. These approaches allow spoken audio data to be managed and presented in a more structured way. In this work, an accurate spoken topic identification system was developed to identify the dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and then applying latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics derived from speech recogniser transcriptions, compared against the Hansard transcriptions of National Assembly sessions of the South African Parliament. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Pattern Recognition Association of South Africa |
en_US |
dc.relation.ispartofseries |
Workflow;14051 |
|
dc.subject |
Speech recognition systems |
en_US |
dc.subject |
Spoken audio data |
en_US |
dc.subject |
South African parliament audio data |
en_US |
dc.subject |
CMU Sphinx word recognisers |
en_US |
dc.subject |
Hansard transcriptions |
en_US |
dc.title |
Unsupervised topic modelling on South African parliament audio data |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Kleynhans, N. (2014). Unsupervised topic modelling on South African parliament audio data. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7947 |
en_ZA |
dc.identifier.chicagocitation |
Kleynhans, N. "Unsupervised topic modelling on South African parliament audio data." (2014): http://hdl.handle.net/10204/7947 |
en_ZA |
dc.identifier.vancouvercitation |
Kleynhans N, Unsupervised topic modelling on South African parliament audio data; Pattern Recognition Association of South Africa; 2014. http://hdl.handle.net/10204/7947. |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Kleynhans, N
AB - Using a speech recognition system to convert spoken audio to text can enable large collections of spoken audio data to be structured. A convenient means of summarising or clustering spoken data is to identify the topic under discussion. Once audio has been converted to text, many text-based topic modelling and identification techniques become available. These approaches allow spoken audio data to be managed and presented in a more structured way. In this work, an accurate spoken topic identification system was developed to identify the dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and then applying latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics derived from speech recogniser transcriptions, compared against the Hansard transcriptions of National Assembly sessions of the South African Parliament.
DA - 2014-11
DB - ResearchSpace
DP - CSIR
KW - Speech recognition systems
KW - Spoken audio data
KW - South African parliament audio data
KW - CMU Sphinx word recognisers
KW - Hansard transcriptions
LK - https://researchspace.csir.co.za
PY - 2014
T1 - Unsupervised topic modelling on South African parliament audio data
TI - Unsupervised topic modelling on South African parliament audio data
UR - http://hdl.handle.net/10204/7947
ER -
|
en_ZA |