Resource development and experiments in automatic SA broadcast news transcription

Kamper, H; De Wet, Febe; Hain, T; Niesler, T

http://hdl.handle.net/10204/5893

Abstract:

We describe the development and evaluation of a first South African broadcast news transcription system. We also describe the resources collected for system development in the resource-scarce South African environment: a 20-hour corpus of South African English (SAE) broadcast news, a 109M-word corpus of South African newspaper text for language modelling, and a 60k-word SAE pronunciation dictionary. The system design follows similar state-of-the-art broadcast news transcription systems and uses cross-word triphone HMMs, MF-PLP features, per-segment cepstral mean normalisation and per-bulletin cepstral variance normalisation. Our final system achieves a word error rate of 24.6%. We find that performance on the spontaneous and telephone speech in our test data is reasonable. Finally, we consider the recognition of MP3-compressed audio and show that performance deteriorates only at low bit rates.
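The abstract applies cepstral mean normalisation per segment but cepstral variance normalisation per bulletin. The difference between these two scopes is easier to see in code. The sketch below is a minimal, illustrative version of that two-level scheme and is not the authors' implementation. It assumes that each bulletin arrives as a list of segments, each segment as a list of frames, and each frame as a list of cepstral coefficients. The function names and the variance floor are hypothetical.

```python
import math
from typing import List

Frame = List[float]       # one cepstral feature vector (e.g. MF-PLP coefficients)
Segment = List[Frame]     # all frames of one speech segment
Bulletin = List[Segment]  # all segments of one news bulletin


def segment_cmn(segment: Segment) -> Segment:
    """Subtract the per-dimension mean computed over this segment only."""
    n = len(segment)
    dim = len(segment[0])
    means = [sum(frame[d] for frame in segment) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in segment]


def bulletin_cvn(bulletin: Bulletin, floor: float = 1e-8) -> Bulletin:
    """Divide each dimension by a standard deviation pooled over the bulletin.

    Assumes every segment is already mean-normalised, so the pooled mean is
    zero and the pooled variance is simply the mean of the squared values.
    The variance floor (a hypothetical choice) avoids division by zero.
    """
    frames = [frame for segment in bulletin for frame in segment]
    n = len(frames)
    dim = len(frames[0])
    stds = [
        math.sqrt(max(sum(f[d] ** 2 for f in frames) / n, floor))
        for d in range(dim)
    ]
    return [[[f[d] / stds[d] for d in range(dim)] for f in seg] for seg in bulletin]


def normalise_bulletin(bulletin: Bulletin) -> Bulletin:
    """Per-segment mean normalisation followed by per-bulletin variance normalisation."""
    return bulletin_cvn([segment_cmn(seg) for seg in bulletin])
```

With this split, each segment's mean is removed using that segment alone, which handles short-term channel offsets. The variance estimate is pooled over the whole bulletin, which gives it far more frames than a single short segment would.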

Reference:

Kamper, H, De Wet, F, Hain, T and Niesler, T. Resource development and experiments in automatic SA broadcast news transcription. SLTU 2012. 3rd Workshop on Spoken Language Technologies for Under-Resourced Languages, Cape Town, South Africa, 7-9 May 2012

Keywords:

Broadcast news transcription
South African English
Under-resourced languages
English accents

Files in this item

DeWet2012.pdf

This item appears in the following Collection(s)

Conference Publications
