ResearchSpace

A comparative study of over-sampling techniques as applied to seismic events

dc.contributor.author Mokoatle, Mpho
dc.contributor.author Coleman, Toshka
dc.contributor.author Mokilane, Paul M
dc.date.accessioned 2024-01-04T06:42:46Z
dc.date.available 2024-01-04T06:42:46Z
dc.date.issued 2023-12
dc.identifier.citation Mokoatle, M., Coleman, T. & Mokilane, P.M. 2023. A comparative study of over-sampling techniques as applied to seismic events. http://hdl.handle.net/10204/13445 . en_ZA
dc.identifier.isbn 978-3-031-49001-9
dc.identifier.issn 1865-0937
dc.identifier.uri https://doi.org/10.1007/978-3-031-49002-6
dc.identifier.uri http://hdl.handle.net/10204/13445
dc.description.abstract The likelihood that an earthquake will occur in a specific location, within a specific time frame, and with ground motion intensity greater than a specific threshold is known as a seismic hazard. Predicting these hazards is crucial because it can enable early warnings, which can lessen the negative effects. Research is currently being conducted in the field of machine learning to predict seismic events based on previously recorded incidents. However, because these events happen so infrequently, they present a class imbalance problem to machine learning and deep learning models. This study therefore compared the performance of popular over-sampling techniques that seek to even out class imbalance in seismic events data. Specifically, this work applied SMOTE, SMOTENC, SMOTEN, BorderlineSMOTE, SVMSMOTE, and ADASYN to an open-source Seismic Bumps dataset, then trained several machine learning classifiers with stratified K-fold cross-validation for seismic hazard detection. The SVMSMOTE algorithm was the best over-sampling method, as it produced classifiers with the highest overall accuracy, F1 score, recall, and precision, each reaching 100%, whereas the ADASYN over-sampling method showed the lowest performance on all the reported metrics across all the models. To our knowledge, no prior research has compared the effectiveness of these over-sampling techniques for tasks involving seismic events. en_US
dc.format Fulltext en_US
dc.language.iso en en_US
dc.relation.uri https://2023.sacair.org.za/programme_overview/ en_US
dc.relation.uri https://link.springer.com/chapter/10.1007/978-3-031-49002-6_22 en_US
dc.source The Southern African Conference on AI Research (SACAIR 2023), Muldersdrift, Gauteng, 4-8 December 2023 en_US
dc.subject Seismic events en_US
dc.subject Machine learning en_US
dc.subject Oversampling en_US
dc.subject SVMSMOTE en_US
dc.subject SMOTE en_US
dc.subject ADASYN en_US
dc.title A comparative study of over-sampling techniques as applied to seismic events en_US
dc.type Conference Presentation en_US
dc.description.pages 15 en_US
dc.description.note This is the preprint version of the paper. The published version can be obtained via https://link.springer.com/chapter/10.1007/978-3-031-49002-6_22 en_US
dc.description.cluster Next Generation Enterprises & Institutions en_US
dc.description.impactarea Data Science en_US
dc.identifier.apacitation Mokoatle, M., Coleman, T., & Mokilane, P. M. (2023). A comparative study of over-sampling techniques as applied to seismic events. http://hdl.handle.net/10204/13445 en_ZA
dc.identifier.chicagocitation Mokoatle, Mpho, Toshka Coleman, and Paul M Mokilane. "A comparative study of over-sampling techniques as applied to seismic events." The Southern African Conference on AI Research (SACAIR 2023), Muldersdrift, Gauteng, 4-8 December 2023 (2023): http://hdl.handle.net/10204/13445 en_ZA
dc.identifier.vancouvercitation Mokoatle M, Coleman T, Mokilane PM. A comparative study of over-sampling techniques as applied to seismic events; 2023. http://hdl.handle.net/10204/13445 . en_ZA
dc.identifier.ris TY - Conference Presentation
AU - Mokoatle, Mpho
AU - Coleman, Toshka
AU - Mokilane, Paul M
AB - The likelihood that an earthquake will occur in a specific location, within a specific time frame, and with ground motion intensity greater than a specific threshold is known as a seismic hazard. Predicting these hazards is crucial because it can enable early warnings, which can lessen the negative effects. Research is currently being conducted in the field of machine learning to predict seismic events based on previously recorded incidents. However, because these events happen so infrequently, they present a class imbalance problem to machine learning and deep learning models. This study therefore compared the performance of popular over-sampling techniques that seek to even out class imbalance in seismic events data. Specifically, this work applied SMOTE, SMOTENC, SMOTEN, BorderlineSMOTE, SVMSMOTE, and ADASYN to an open-source Seismic Bumps dataset, then trained several machine learning classifiers with stratified K-fold cross-validation for seismic hazard detection. The SVMSMOTE algorithm was the best over-sampling method, as it produced classifiers with the highest overall accuracy, F1 score, recall, and precision, each reaching 100%, whereas the ADASYN over-sampling method showed the lowest performance on all the reported metrics across all the models. To our knowledge, no prior research has compared the effectiveness of these over-sampling techniques for tasks involving seismic events.
DA - 2023-12
DB - ResearchSpace
DP - CSIR
J1 - The Southern African Conference on AI Research (SACAIR 2023), Muldersdrift, Gauteng, 4-8 December 2023
KW - Seismic events
KW - Machine learning
KW - Oversampling
KW - SVMSMOTE
KW - SMOTE
KW - ADASYN
LK - https://researchspace.csir.co.za
PY - 2023
SM - 978-3-031-49001-9
SM - 1865-0937
T1 - A comparative study of over-sampling techniques as applied to seismic events
TI - A comparative study of over-sampling techniques as applied to seismic events
UR - http://hdl.handle.net/10204/13445
ER - en_ZA
dc.identifier.worklist 27370 en_US
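
The abstract describes over-sampling the rare class and evaluating with stratified K-fold cross-validation. The following is a minimal sketch of that general workflow using only the Python standard library. It is not the authors' code: it implements only the basic SMOTE interpolation step (not SMOTENC, SMOTEN, BorderlineSMOTE, SVMSMOTE, or ADASYN), and the function names `smote`, `stratified_folds`, and `oversampled_training_sets` are hypothetical. It also assumes a binary problem with numeric feature tuples, and it over-samples only the training part of each split so that synthetic points never reach the held-out fold; the record does not state whether the study did the same.

```python
import math
import random
from collections import defaultdict


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def stratified_folds(labels, n_splits, rng=None):
    """Assign sample indices to folds so each fold keeps roughly the class ratio."""
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # deal each class round-robin across the folds
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


def oversampled_training_sets(X, y, n_splits=5, minority_label=1, seed=0):
    """Yield (X_train, y_train, test_indices), balancing the training part only."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, n_splits, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_train, y_train) if lab == minority_label]
        # binary case: majority count minus minority count
        deficit = len(y_train) - 2 * len(minority)
        new_points = smote(minority, max(deficit, 0), rng=rng)
        yield (X_train + new_points,
               y_train + [minority_label] * len(new_points),
               test_idx)
```

Each yielded training set can be passed to any classifier, with the untouched `test_indices` used for scoring. Keeping the held-out fold free of synthetic samples is a design choice of this sketch, made so that reported metrics reflect only real observations.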


