dc.contributor.author |
Giwa, O
|
|
dc.contributor.author |
Davel, MH
|
|
dc.date.accessioned |
2017-09-08T07:05:39Z |
|
dc.date.available |
2017-09-08T07:05:39Z |
|
dc.date.issued |
2015-11 |
|
dc.identifier.citation |
Giwa, O and Davel, MH. 2015. Text-based language identification of multilingual names. 2015 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), 26-27 November 2015, Port Elizabeth, South Africa, 6pp. |
en_US |
dc.identifier.uri |
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7359517
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/9548
|
|
dc.description |
Copyright: 2015 IEEE. Due to copyright restrictions, the attached PDF file only contains the abstract of the full text item. For access to the full text, kindly consult the publisher's website. |
en_US |
dc.description.abstract |
Text-based language identification (T-LID) of isolated words has been shown to be useful for various speech processing tasks, including pronunciation modelling and data categorisation. When the words to be categorised are proper names, the task becomes more difficult: not only do proper names often have idiosyncratic spellings, they are also often considered to be multilingual. We, therefore, investigate how an existing T-LID technique can be adapted to perform multilingual word classification. That is, given a proper name, which may be either mono- or multilingual, we aim to determine how accurately we can predict how many possible source languages the word has, and what they are. Using a Joint Sequence Model-based approach to T-LID and the SADE corpus – a newly developed proper names corpus of South African names – we experiment with different approaches to multilingual T-LID. We compare posterior-based and likelihood-based methods and obtain promising results on a challenging task. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
IEEE |
en_US |
dc.relation.ispartofseries |
Worklist;16491 |
|
dc.subject |
Text-based language identification |
en_US |
dc.subject |
T-LID |
en_US |
dc.subject |
Multilingual names |
en_US |
dc.subject |
Speech technologies |
en_US |
dc.title |
Text-based language identification of multilingual names |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Giwa, O., & Davel, M. (2015). Text-based language identification of multilingual names. IEEE. http://hdl.handle.net/10204/9548 |
en_ZA |
dc.identifier.chicagocitation |
Giwa, O, and MH Davel. "Text-based language identification of multilingual names." (2015): http://hdl.handle.net/10204/9548 |
en_ZA |
dc.identifier.vancouvercitation |
Giwa O, Davel M, Text-based language identification of multilingual names; IEEE; 2015. http://hdl.handle.net/10204/9548 . |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Giwa, O
AU - Davel, MH
AB - Text-based language identification (T-LID) of isolated words has been shown to be useful for various speech processing tasks, including pronunciation modelling and data categorisation. When the words to be categorised are proper names, the task becomes more difficult: not only do proper names often have idiosyncratic spellings, they are also often considered to be multilingual. We, therefore, investigate how an existing T-LID technique can be adapted to perform multilingual word classification. That is, given a proper name, which may be either mono- or multilingual, we aim to determine how accurately we can predict how many possible source languages the word has, and what they are. Using a Joint Sequence Model-based approach to T-LID and the SADE corpus – a newly developed proper names corpus of South African names – we experiment with different approaches to multilingual T-LID. We compare posterior-based and likelihood-based methods and obtain promising results on a challenging task.
DA - 2015-11
DB - ResearchSpace
DP - CSIR
KW - Text-based language identification
KW - T-LID
KW - Multilingual names
KW - Speech technologies
LK - https://researchspace.csir.co.za
PY - 2015
T1 - Text-based language identification of multilingual names
TI - Text-based language identification of multilingual names
UR - http://hdl.handle.net/10204/9548
ER -
|
en_ZA |