ResearchSpace

Predicting incomplete gene microarray data with the use of supervised learning algorithms

dc.contributor.author Twala, B
dc.contributor.author Phorah, M
dc.date.accessioned 2010-10-22T08:30:59Z
dc.date.available 2010-10-22T08:30:59Z
dc.date.issued 2010-10
dc.identifier.citation Twala, B and Phorah, M. 2010. Predicting incomplete gene microarray data with the use of supervised learning algorithms. Pattern Recognition Letters, Vol. 31(13), pp 2061–2069 en
dc.identifier.issn 0167-8655
dc.identifier.uri http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V15-502977W-1-1P&_cdi=5665&_user=958262&_pii=S0167865510001467&_origin=search&_coverDate=10/01/2010&_sk=999689986&view=c&wchp=dGLbVzb-zSkWA&md5=9de5840e663d3d130591d3c4244e42b9&ie=/sdarticle.pdf
dc.identifier.uri http://hdl.handle.net/10204/4485
dc.description Copyright: 2010 Elsevier. This is the post print version of the work. The definitive version is published in Pattern Recognition Letters, Vol. 31(13), pp 2061–2069 en
dc.description.abstract With the wealth of sequence data and the huge amount of data generated from molecular technologies, the issue of gene classification/prediction has become a central challenge in the field of microarray data analysis. This has led to the application of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non-cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain instances. However, in some research situations we have to classify instances given incomplete vectors, which can affect the predictive accuracy of learned classifiers. The task of learning an accurate incomplete data classifier from instances raises a number of new issues, some of which have not been properly addressed by bioinformatics research. Thus, an effective missing value estimation method is required for improving predictive accuracy. The essence of the approach is the proposal that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value, determined from those instances at a given node of a decision tree (PDT) that have specified values. The proposed approach exploits the total probability theorem and Bayes’ theorem, and it has three versions. We compare our approach with other supervised learning techniques, including C5.0, classification and regression trees (CART), k-nearest neighbour (k-NN), linear discrimination (LD), naïve Bayes classifier (NBC), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and support vector machines (SVMs), in terms of how well they tolerate incomplete test data. Eight cancer-related gene expression datasets are utilized for this task. Experimental results are provided to illustrate the efficiency and the robustness of the proposed algorithm. en
dc.language.iso en en
dc.publisher Elsevier en
dc.relation.ispartofseries Journal Article en
dc.subject Supervised learning en
dc.subject Microarray data en
dc.subject Incomplete data en
dc.subject Prediction en
dc.title Predicting incomplete gene microarray data with the use of supervised learning algorithms en
dc.type Article en
dc.identifier.apacitation Twala, B., & Phorah, M. (2010). Predicting incomplete gene microarray data with the use of supervised learning algorithms. http://hdl.handle.net/10204/4485 en_ZA
dc.identifier.chicagocitation Twala, B, and M Phorah "Predicting incomplete gene microarray data with the use of supervised learning algorithms." (2010) http://hdl.handle.net/10204/4485 en_ZA
dc.identifier.vancouvercitation Twala B, Phorah M. Predicting incomplete gene microarray data with the use of supervised learning algorithms. 2010; http://hdl.handle.net/10204/4485. en_ZA
dc.identifier.ris TY - Article
AU - Twala, B
AU - Phorah, M
AB - With the wealth of sequence data and the huge amount of data generated from molecular technologies, the issue of gene classification/prediction has become a central challenge in the field of microarray data analysis. This has led to the application of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non-cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain instances. However, in some research situations we have to classify instances given incomplete vectors, which can affect the predictive accuracy of learned classifiers. The task of learning an accurate incomplete data classifier from instances raises a number of new issues, some of which have not been properly addressed by bioinformatics research. Thus, an effective missing value estimation method is required for improving predictive accuracy. The essence of the approach is the proposal that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value, determined from those instances at a given node of a decision tree (PDT) that have specified values. The proposed approach exploits the total probability theorem and Bayes’ theorem, and it has three versions. We compare our approach with other supervised learning techniques, including C5.0, classification and regression trees (CART), k-nearest neighbour (k-NN), linear discrimination (LD), naïve Bayes classifier (NBC), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and support vector machines (SVMs), in terms of how well they tolerate incomplete test data. Eight cancer-related gene expression datasets are utilized for this task. Experimental results are provided to illustrate the efficiency and the robustness of the proposed algorithm.
DA - 2010-10
DB - ResearchSpace
DP - CSIR
KW - Supervised learning
KW - Microarray data
KW - Incomplete data
KW - Prediction
LK - https://researchspace.csir.co.za
PY - 2010
SM - 0167-8655
T1 - Predicting incomplete gene microarray data with the use of supervised learning algorithms
TI - Predicting incomplete gene microarray data with the use of supervised learning algorithms
UR - http://hdl.handle.net/10204/4485
ER - en_ZA
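
The abstract describes handling missing values probabilistically at a decision-tree node, using a priori probabilities estimated from the instances whose values are specified, together with the total probability and Bayes' theorems. The record does not give the algorithm itself, so the following is only a minimal sketch of that general idea, not the authors' PDT method or any of its three versions. The function names (`fit_node`, `predict_proba`, `branch_posterior`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def fit_node(values, labels):
    """Estimate branch priors P(v) and class distributions P(c | v) at one
    node, using only training instances whose split value is specified."""
    known = [(v, c) for v, c in zip(values, labels) if v is not None]
    n = len(known)
    branch_counts = Counter(v for v, _ in known)
    class_counts = defaultdict(Counter)
    for v, c in known:
        class_counts[v][c] += 1
    priors = {v: k / n for v, k in branch_counts.items()}
    cond = {v: {c: k / branch_counts[v] for c, k in cc.items()}
            for v, cc in class_counts.items()}
    return priors, cond

def predict_proba(value, priors, cond):
    """Class probabilities for a test instance. If the split value is
    missing, apply total probability: P(c) = sum_v P(v) * P(c | v)."""
    if value is not None:
        return dict(cond[value])
    out = defaultdict(float)
    for v, p in priors.items():
        for c, q in cond[v].items():
            out[c] += p * q
    return dict(out)

def branch_posterior(label, priors, cond):
    """Bayes' theorem: P(v | c) = P(c | v) * P(v) / P(c)."""
    joint = {v: priors[v] * cond[v].get(label, 0.0) for v in priors}
    total = sum(joint.values())
    return {v: j / total for v, j in joint.items()}

# Hypothetical toy data: a discretised expression level and a class label.
values = ["high", "high", "low", "low", "low", None]
labels = ["cancer", "cancer", "normal", "normal", "cancer", "normal"]

priors, cond = fit_node(values, labels)
missing_pred = predict_proba(None, priors, cond)
posterior = branch_posterior("cancer", priors, cond)
print(missing_pred)
print(posterior)
```

In this sketch the training instance with an unspecified value is simply ignored when the priors are estimated, and a test instance with a missing value receives a mixture of the branch-level class distributions rather than a single imputed value.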

