Text-based language identification of multilingual names

Giwa, O; Davel, MH

dc.contributor.author	Giwa, O
dc.contributor.author	Davel, MH
dc.date.accessioned	2017-09-08T07:05:39Z
dc.date.available	2017-09-08T07:05:39Z
dc.date.issued	2015-11
dc.identifier.citation	Giwa, O and Davel, MH. 2015. Text-based language identification of multilingual names. 2015 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 26-27 November 2015, Port Elizabeth, South Africa, 6pp.	en_US
dc.identifier.uri	http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7359517
dc.identifier.uri	http://hdl.handle.net/10204/9548
dc.description	Copyright: 2015 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text, kindly consult the publisher's website.	en_US
dc.description.abstract	Text-based language identification (T-LID) of isolated words has been shown to be useful for various speech processing tasks, including pronunciation modelling and data categorisation. When the words to be categorised are proper names, the task becomes more difficult: not only do proper names often have idiosyncratic spellings, they are also often considered to be multilingual. We, therefore, investigate how an existing T-LID technique can be adapted to perform multilingual word classification. That is, given a proper name, which may be either mono- or multilingual, we aim to determine how accurately we can predict how many possible source languages the word has, and what they are. Using a Joint Sequence Model-based approach to T-LID and the SADE corpus – a newly developed proper names corpus of South African names – we experiment with different approaches to multilingual T-LID. We compare posterior-based and likelihood-based methods and obtain promising results on a challenging task.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.relation.ispartofseries	Worklist;16491
dc.subject	Text-based language identification	en_US
dc.subject	T-LID	en_US
dc.subject	Multilingual names	en_US
dc.subject	Speech technologies	en_US
dc.title	Text-based language identification of multilingual names	en_US
dc.type	Conference Presentation	en_US
dc.identifier.apacitation	Giwa, O., & Davel, M. (2015). Text-based language identification of multilingual names. IEEE. http://hdl.handle.net/10204/9548	en_ZA
dc.identifier.chicagocitation	Giwa, O, and MH Davel. "Text-based language identification of multilingual names." (2015): http://hdl.handle.net/10204/9548	en_ZA
dc.identifier.vancouvercitation	Giwa O, Davel M, Text-based language identification of multilingual names; IEEE; 2015. http://hdl.handle.net/10204/9548 .	en_ZA
dc.identifier.ris	TY - Conference Presentation AU - Giwa, O AU - Davel, MH AB - Text-based language identification (T-LID) of isolated words has been shown to be useful for various speech processing tasks, including pronunciation modelling and data categorisation. When the words to be categorised are proper names, the task becomes more difficult: not only do proper names often have idiosyncratic spellings, they are also often considered to be multilingual. We, therefore, investigate how an existing T-LID technique can be adapted to perform multilingual word classification. That is, given a proper name, which may be either mono- or multilingual, we aim to determine how accurately we can predict how many possible source languages the word has, and what they are. Using a Joint Sequence Model-based approach to T-LID and the SADE corpus – a newly developed proper names corpus of South African names – we experiment with different approaches to multilingual T-LID. We compare posterior-based and likelihood-based methods and obtain promising results on a challenging task. DA - 2015-11 DB - ResearchSpace DP - CSIR KW - Text-based language identification KW - T-LID KW - Multilingual names KW - Speech technologies LK - https://researchspace.csir.co.za PY - 2015 T1 - Text-based language identification of multilingual names TI - Text-based language identification of multilingual names UR - http://hdl.handle.net/10204/9548 ER -	en_ZA
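
The abstract contrasts posterior-based and likelihood-based decisions for assigning one or more source languages to a name. The record does not include the paper's actual decision rules, so the following is only a minimal sketch under stated assumptions: each language model is assumed to have already produced a log-likelihood for the name, the priors are assumed uniform, and the function names, the threshold, the margin, and the language codes are all hypothetical choices made for illustration.

```python
import math


def posteriors(log_likelihoods, priors=None):
    """Turn per-language log-likelihoods into posteriors via Bayes' rule.

    Assumption: uniform priors unless a prior dict is supplied.
    """
    langs = list(log_likelihoods)
    if priors is None:
        priors = {lang: 1.0 / len(langs) for lang in langs}
    scores = {lang: log_likelihoods[lang] + math.log(priors[lang]) for lang in langs}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {lang: math.exp(score - top) for lang, score in scores.items()}
    total = sum(unnorm.values())
    return {lang: value / total for lang, value in unnorm.items()}


def select_by_posterior(log_likelihoods, threshold=0.2):
    """Hypothetical rule: keep every language whose posterior reaches the threshold."""
    post = posteriors(log_likelihoods)
    chosen = [lang for lang, p in post.items() if p >= threshold]
    return sorted(chosen, key=lambda lang: -post[lang])


def select_by_likelihood(log_likelihoods, margin=1.0):
    """Hypothetical rule: keep every language within `margin` nats of the best score."""
    best = max(log_likelihoods.values())
    chosen = [lang for lang, ll in log_likelihoods.items() if best - ll <= margin]
    return sorted(chosen, key=lambda lang: -log_likelihoods[lang])


# Made-up scores for one name under four hypothetical language models.
scores = {"afr": -10.0, "eng": -10.5, "zul": -14.0, "sot": -15.0}
print(select_by_posterior(scores))
print(select_by_likelihood(scores))
```

In both rules the number of predicted languages follows from how many candidates pass the cut, so a single threshold or margin controls whether a name is treated as mono- or multilingual.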

Files in this item

Name: Davel_16491_2015.pdf

Size: 59.03Kb

Format: PDF

This item appears in the following Collection(s)

Conference Publications

