Unsupervised topic modelling on South African parliament audio data

Kleynhans, N

dc.contributor.author	Kleynhans, N
dc.date.accessioned	2015-03-12T10:20:02Z
dc.date.available	2015-03-12T10:20:02Z
dc.date.issued	2014-11
dc.identifier.citation	Kleynhans, N. 2014. Unsupervised topic modelling on South African parliament audio data. Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014	en_US
dc.identifier.uri	http://hdl.handle.net/10204/7947
dc.description	Copyright: Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, Cape Town, South Africa, 27-28 November 2014.	en_US
dc.description.abstract	Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many text-based topic modelling and identification techniques that become available once the audio to text conversion has occurred. These approaches allow the management and presentation of spoken audio data in a more structured way. In this work, an accurate spoken topic identification system was developed to identify a dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics, derived from speech recogniser transcriptions and compared to the Hansard transcriptions of National Assembly sessions of the South African Parliament.	en_US
dc.language.iso	en	en_US
dc.publisher	Pattern Recognition Association of South Africa	en_US
dc.relation.ispartofseries	Workflow;14051
dc.subject	Speech recognition systems	en_US
dc.subject	Spoken audio data	en_US
dc.subject	South African parliament audio data	en_US
dc.subject	CMU Sphinx word recognisers	en_US
dc.subject	Hansard transcriptions	en_US
dc.title	Unsupervised topic modelling on South African parliament audio data	en_US
dc.type	Conference Presentation	en_US
dc.identifier.apacitation	Kleynhans, N. (2014). Unsupervised topic modelling on South African parliament audio data. Pattern Recognition Association of South Africa. http://hdl.handle.net/10204/7947	en_ZA
dc.identifier.chicagocitation	Kleynhans, N. "Unsupervised topic modelling on South African parliament audio data." (2014): http://hdl.handle.net/10204/7947	en_ZA
dc.identifier.vancouvercitation	Kleynhans N, Unsupervised topic modelling on South African parliament audio data; Pattern Recognition Association of South Africa; 2014. http://hdl.handle.net/10204/7947 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Kleynhans, N AB - Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many text-based topic modelling and identification techniques that become available once the audio to text conversion has occurred. These approaches allow the management and presentation of spoken audio data in a more structured way. In this work, an accurate spoken topic identification system was developed to identify a dominant topic discussed in a South African parliamentary session. This was achieved by using CMU Sphinx word recognisers to convert the conversations to word representations and latent Dirichlet allocation topic modelling techniques. The best topic identification accuracy of 92.3% was obtained on 40 topics, derived from speech recogniser transcriptions and compared to the Hansard transcriptions of National Assembly sessions of the South African Parliament. DA - 2014-11 DB - ResearchSpace DP - CSIR KW - Speech recognition systems KW - Spoken audio data KW - South African parliament audio data KW - CMU Sphinx word recognisers KW - Hansard transcriptions LK - https://researchspace.csir.co.za PY - 2014 T1 - Unsupervised topic modelling on South African parliament audio data TI - Unsupervised topic modelling on South African parliament audio data UR - http://hdl.handle.net/10204/7947 ER -	en_ZA

Files in this item

Name: Kleynhans3_2014.pdf

Size: 162.3Kb

Format: PDF


This item appears in the following Collection(s)

Conference Publications
