Generation of segmented isiZulu text

Mkhwanazi, Sthembiso; Marais, Laurette

https://doi.org/10.55492/dhasa.v5i1.5034
http://hdl.handle.net/10204/13749

Abstract:

The complex morphology, conjunctive orthography and widespread occurrence of morphophonological alternation in the Nguni languages have given rise to several efforts towards morphological segmentation of tokens of Nguni languages. For supervised methods, annotated data is required, which currently exists as canonically segmented data in the NCHLT corpus and surface segmented data in the Ukwabelana corpus. In this paper, we present a method and segmentation strategy based on a computational grammar for isiZulu. The grammar, which itself has some limitations in processing speed and robustness to unexpected input, is used to create a new set of segmentations for the tokens of the Ukwabelana corpus.

Reference:

Mkhwanazi, S. & Marais, L. 2024. Generation of segmented isiZulu text. Journal of the Digital Humanities Association of Southern Africa, 5(1). http://hdl.handle.net/10204/13749

Date: Feb 2024

Keywords: Nguni languages; Agglutinative languages; Morphological segmentation; Language models; Segmented isiZulu text

Source

Journal of the Digital Humanities Association of Southern Africa, 5(1)
