Gender identification is the task of identifying a speaker's gender from an audio signal. Most gender identification systems are developed using datasets from well-resourced languages, and little attention has been paid to building such systems for under-resourced African languages. This paper presents a gender identification system developed using a 55.7-hour Sepedi speech dataset comprising 30776 male and 28337 female recordings. We build the system using three machine learning models: a multilayer perceptron (MLP), a convolutional neural network (CNN), and a long short-term memory (LSTM) network. Mid-term features are computed from time-domain, frequency-domain and cepstral-domain features, and are normalised using Z-score normalisation. XGBoost is used for feature selection to retain the most important features. On data with seen speakers, the MLP achieved an F-score and accuracy of 94%, while the LSTM and CNN each achieved an F-score and accuracy of 97%. We further evaluated the models on data with unseen speakers, where all the models achieved good F-scores and accuracies.
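The abstract does not specify how the mid-term features are formed or how the Z-score statistics are computed. The sketch below illustrates one common convention under stated assumptions, not the authors' exact pipeline. Frame-level (short-term) feature vectors are grouped into fixed windows, and each window is summarised by the per-feature mean and standard deviation. The resulting vectors are then Z-score normalised, z = (x - mu) / sigma, using statistics fitted on training data only. The function names `mid_term_features`, `fit_zscore` and `apply_zscore`, the window/step parameters and the mean/std summary are all hypothetical. XGBoost feature selection and the MLP/CNN/LSTM classifiers are not shown.

```python
import statistics


def mid_term_features(short_term, win, step):
    """Hypothetical helper: aggregate short-term frames into mid-term vectors.

    short_term: list of frames, each a list of floats (e.g. time-domain,
    frequency-domain and cepstral features for one frame). For every window
    of `win` frames, advanced by `step` frames, emit the per-feature means
    followed by the per-feature standard deviations (an assumed choice).
    """
    n_feats = len(short_term[0])
    mids = []
    for start in range(0, len(short_term) - win + 1, step):
        window = short_term[start:start + win]
        # Transpose the window so each list holds one feature over time.
        cols = [[frame[j] for frame in window] for j in range(n_feats)]
        means = [statistics.fmean(c) for c in cols]
        stds = [statistics.pstdev(c) for c in cols]
        mids.append(means + stds)
    return mids


def fit_zscore(train):
    """Per-feature mean and standard deviation, fitted on training rows only."""
    cols = list(zip(*train))
    mu = [statistics.fmean(c) for c in cols]
    # A constant feature has sigma 0; substitute 1.0 to avoid division by zero.
    sigma = [statistics.pstdev(c) or 1.0 for c in cols]
    return mu, sigma


def apply_zscore(rows, mu, sigma):
    """Z-score each feature: z = (x - mu) / sigma."""
    return [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in rows]


# Usage: four frames with two short-term features, windows of two frames.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 20.0], [7.0, 20.0]]
train_mids = mid_term_features(frames, win=2, step=2)
mu, sigma = fit_zscore(train_mids)
print(apply_zscore(train_mids, mu, sigma))
```

Fitting `mu` and `sigma` on the training split and reusing them for test data (including recordings from unseen speakers) keeps evaluation data from influencing the normalisation. This is a standard precaution, assumed here rather than stated in the abstract.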
Reference:
Sefara, T.J. & Mokgonyane, T. 2021. Gender identification in Sepedi speech corpus. http://hdl.handle.net/10204/12120
2021 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Durban, South Africa, 5-6 August 2021