ResearchSpace

Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis


dc.contributor.author Govender, Avashna
dc.date.accessioned 2022-10-03T06:52:29Z
dc.date.available 2022-10-03T06:52:29Z
dc.date.issued 2022-05
dc.identifier.citation Govender, A. 2022. Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis. http://hdl.handle.net/10204/12496 . en_ZA
dc.identifier.uri http://hdl.handle.net/10204/12496
dc.description.abstract Voice conversion (VC) is an important technique for developing text-to-speech voices when speech resources are scarce. VC converts an audio signal from a source speaker to a specific target speaker while preserving the linguistic content. The benefit of VC is that only a small amount of target data is required, which makes it possible to build high-quality text-to-speech voices from limited speech data. In this work, we implement VC using a Mel-spectrogram Generative Adversarial Network called MelGAN-VC. This technique does not require parallel data and has proven successful with as little as one hour of target speech data. The aim of this work was to build child voices by extending the original one-to-one MelGAN-VC model to a many-to-many model and to determine whether such a model offers any gain. We found that the many-to-many model outperforms the baseline one-to-one model in terms of speaker similarity and naturalness of the output speech when using only 24 minutes of speech data. en_US
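As context for the abstract above, MelGAN-VC-style models operate on mel spectrograms rather than raw waveforms. The minimal Python sketch below shows how log-mel features could be extracted from a source utterance with librosa; the sampling rate, FFT size, hop length and number of mel bands are illustrative assumptions, not the settings reported in this work.

import librosa
import numpy as np

def extract_log_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    # Assumed front-end parameters for illustration only, not the paper's settings.
    y, _ = librosa.load(path, sr=sr)  # load and resample the utterance
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )  # power mel spectrogram
    return librosa.power_to_db(mel, ref=np.max)  # log-compressed, shape (n_mels, n_frames)

# Hypothetical usage with a source-speaker recording:
# source_mel = extract_log_mel("source_utterance.wav")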
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri http://www.ist-africa.org/Conference2022/default.asp?page=schedule-print&schedule.id=4569&schedule.expanded=yes en_US
dc.source IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022 en_US
dc.subject Voice conversion en_US
dc.subject Text-to-speech voices en_US
dc.subject MelGAN-VC en_US
dc.subject Mel-spectrogram Generative Adversarial Network en_US
dc.subject Speech data en_US
dc.title Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis en_US
dc.type Conference Presentation en_US
dc.description.pages 7 en_US
dc.description.note Paper presented at the IST-Africa 2022 Conference, Virtual, South Africa, 16-20 May 2022 en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Voice Computing en_US
dc.identifier.apacitation Govender, A. (2022). Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis. http://hdl.handle.net/10204/12496 en_ZA
dc.identifier.chicagocitation Govender, Avashna. "Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis." <i>IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022</i> (2022): http://hdl.handle.net/10204/12496 en_ZA
dc.identifier.vancouvercitation Govender A, Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis; 2022. http://hdl.handle.net/10204/12496 . en_ZA
dc.identifier.ris TY - Conference Presentation AU - Govender, Avashna AB - Voice conversion (VC) is an important technique for developing text-to-speech voices when speech resources are scarce. VC converts an audio signal from a source speaker to a specific target speaker while preserving the linguistic content. The benefit of VC is that only a small amount of target data is required, which makes it possible to build high-quality text-to-speech voices from limited speech data. In this work, we implement VC using a Mel-spectrogram Generative Adversarial Network called MelGAN-VC. This technique does not require parallel data and has proven successful with as little as one hour of target speech data. The aim of this work was to build child voices by extending the original one-to-one MelGAN-VC model to a many-to-many model and to determine whether such a model offers any gain. We found that the many-to-many model outperforms the baseline one-to-one model in terms of speaker similarity and naturalness of the output speech when using only 24 minutes of speech data. DA - 2022-05 DB - ResearchSpace DP - CSIR J1 - IST-Africa 2022 Conference Proceedings, Virtual, South Africa, 16-20 May 2022 KW - Voice conversion KW - Text-to-speech voices KW - MelGAN-VC KW - Mel-spectrogram Generative Adversarial Network KW - Speech data LK - https://researchspace.csir.co.za PY - 2022 T1 - Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis TI - Multi-MelGAN voice conversion for the creation of under-resourced child speech synthesis UR - http://hdl.handle.net/10204/12496 ER - en_ZA
dc.identifier.worklist 25900 en_US

