ResearchSpace

Unsupervised topic modelling on South African parliament audio data

dc.contributor.author Kleynhans, N
dc.date.accessioned 2015-03-12T10:20:02Z
dc.date.available 2015-03-12T10:20:02Z
dc.date.issued 2014-11
dc.identifier.citation Kleynhans, N. 2014. Unsupervised topic modelling on South African parliament audio data. Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014 en_US
dc.identifier.uri http://hdl.handle.net/10204/7947
dc.description Copyright: Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014. en_US
dc.description.abstract Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many text-based topic modelling and identification techniques that become available once the audio-to-text conversion has occurred. These approaches allow the management and presentation of spoken audio data in a more structured way. In this work, an accurate spoken topic identification system was developed to identify a dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics, derived from speech recogniser transcriptions and compared to the Hansard transcriptions of National Assembly sessions of the South African Parliament. en_US
dc.language.iso en en_US
dc.publisher Pattern Recognition Association of South Africa en_US
dc.relation.ispartofseries Workflow;14051
dc.subject Speech recognition systems en_US
dc.subject Spoken audio data en_US
dc.subject South African parliament audio data en_US
dc.subject CMU Sphinx word recognisers en_US
dc.subject Hansard transcriptions en_US
dc.title Unsupervised topic modelling on South African parliament audio data en_US
dc.type Conference Presentation en_US
dc.identifier.apacitation Kleynhans, N. (2014). Unsupervised topic modelling on South African parliament audio data. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7947 en_ZA
dc.identifier.chicagocitation Kleynhans, N. "Unsupervised topic modelling on South African parliament audio data." (2014): http://hdl.handle.net/10204/7947 en_ZA
dc.identifier.vancouvercitation Kleynhans N, Unsupervised topic modelling on South African parliament audio data; Pattern Recognition Association of South Africa; 2014. http://hdl.handle.net/10204/7947 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Kleynhans, N AB - Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many text-based topic modelling and identification techniques that become available once the audio-to-text conversion has occurred. These approaches allow the management and presentation of spoken audio data in a more structured way. In this work, an accurate spoken topic identification system was developed to identify a dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics, derived from speech recogniser transcriptions and compared to the Hansard transcriptions of National Assembly sessions of the South African Parliament. DA - 2014-11 DB - ResearchSpace DP - CSIR KW - Speech recognition systems KW - Spoken audio data KW - South African parliament audio data KW - CMU Sphinx word recognisers KW - Hansard transcriptions LK - https://researchspace.csir.co.za PY - 2014 T1 - Unsupervised topic modelling on South African parliament audio data TI - Unsupervised topic modelling on South African parliament audio data UR - http://hdl.handle.net/10204/7947 ER - en_ZA

