dc.contributor.author |
Govender, Avashna
dc.date.accessioned |
2022-10-03T06:52:29Z |
dc.date.available |
2022-10-03T06:52:29Z |
dc.date.issued |
2022-05 |
dc.identifier.citation |
Govender, A. 2022. Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis. http://hdl.handle.net/10204/12496 |
en_ZA |
dc.identifier.uri |
http://hdl.handle.net/10204/12496
dc.description.abstract |
Voice conversion (VC) is an important technique for developing text-to-speech voices when speech resources are scarce. VC converts an audio signal from a source speaker to a specific target speaker while preserving the linguistic content. Because VC requires only a small amount of target data, it makes it possible to build high-quality text-to-speech voices from limited speech data. In this work, we implement VC using a Melspectrogram Generative Adversarial Network called MelGAN-VC. This technique does not require parallel data and has been shown to work with as little as one hour of target speech data. The aim of this work was to build child voices by extending the original one-to-one MelGAN-VC model to a many-to-many model and to determine whether such a model offers any gain. We found that, with only 24 minutes of speech data, the many-to-many model outperforms the baseline one-to-one model in both speaker similarity and naturalness of the output speech. |
en_US |
dc.format |
Fulltext |
en_US |
dc.language.iso |
en |
en_US |
dc.relation.uri |
http://www.ist-africa.org/Conference2022/default.asp?page=schedule-print&schedule.id=4569&schedule.expanded=yes |
en_US |
dc.source |
IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022 |
en_US |
dc.subject |
Voice conversion |
en_US |
dc.subject |
Text-to-speech voices |
en_US |
dc.subject |
MelGAN-VC |
en_US |
dc.subject |
Melspectrogram Generative Adversarial Network
en_US |
dc.subject |
Speech data |
en_US |
dc.title |
Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.description.pages |
7 |
en_US |
dc.description.note |
Paper presented at the IST-Africa 2022 Conference, Virtual, South Africa, 16-20 May 2022 |
en_US |
dc.description.cluster |
Next Generation Enterprises & Institutions |
en_US |
dc.description.impactarea |
Voice Computing |
en_US |
dc.identifier.apacitation |
Govender, A. (2022). Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis. http://hdl.handle.net/10204/12496 |
en_ZA |
dc.identifier.chicagocitation |
Govender, Avashna. "Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis." IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022 (2022): http://hdl.handle.net/10204/12496 |
en_ZA |
dc.identifier.vancouvercitation |
Govender A. Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis; 2022. http://hdl.handle.net/10204/12496 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Govender, Avashna
AB - Voice conversion (VC) is an important technique for developing text-to-speech voices when speech resources are scarce. VC converts an audio signal from a source speaker to a specific target speaker while preserving the linguistic content. Because VC requires only a small amount of target data, it makes it possible to build high-quality text-to-speech voices from limited speech data. In this work, we implement VC using a Melspectrogram Generative Adversarial Network called MelGAN-VC. This technique does not require parallel data and has been shown to work with as little as one hour of target speech data. The aim of this work was to build child voices by extending the original one-to-one MelGAN-VC model to a many-to-many model and to determine whether such a model offers any gain. We found that, with only 24 minutes of speech data, the many-to-many model outperforms the baseline one-to-one model in both speaker similarity and naturalness of the output speech.
DA - 2022-05
DB - ResearchSpace
DP - CSIR
J1 - IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022
KW - Voice conversion
KW - Text-to-speech voices
KW - MelGAN-VC
KW - Melspectrogram Generative Adversarial Network
KW - Speech data
LK - https://researchspace.csir.co.za
PY - 2022
T1 - Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis
TI - Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis
UR - http://hdl.handle.net/10204/12496
ER -
en_ZA |
dc.identifier.worklist |
25900 |
en_US |