Predicting incomplete gene microarray data with the use of supervised learning algorithms

Twala, B; Phorah, M

dc.contributor.author	Twala, B
dc.contributor.author	Phorah, M
dc.date.accessioned	2010-10-22T08:30:59Z
dc.date.available	2010-10-22T08:30:59Z
dc.date.issued	2010-10
dc.identifier.citation	Twala, B and Phorah, M. 2010. Predicting incomplete gene microarray data with the use of supervised learning algorithms. Pattern Recognition Letters, Vol. 31(13), pp 2061–2069	en
dc.identifier.issn	0167-8655
dc.identifier.uri	http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V15-502977W-1-1P&_cdi=5665&_user=958262&_pii=S0167865510001467&_origin=search&_coverDate=10/01/2010&_sk=999689986&view=c&wchp=dGLbVzb-zSkWA&md5=9de5840e663d3d130591d3c4244e42b9&ie=/sdarticle.pdf
dc.identifier.uri	http://hdl.handle.net/10204/4485
dc.description	Copyright: 2010 Elsevier. This is the post print version of the work. The definitive version is published in Pattern Recognition Letters, Vol. 31(13), pp 2061–2069	en
dc.description.abstract	With the wealth of sequence data and the huge amount of data generated from molecular technologies, the issue of gene classification/prediction has become a central challenge in the field of microarray data analysis. This has led to the application of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non-cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain instances. However, in some research situations we often have to classify instances given incomplete vectors, which can affect the predictive accuracy of learned classifiers. The task of learning an accurate incomplete data classifier from instances raises a number of new issues, some of which have not been properly addressed by bioinformatics research. Thus, an effective missing value estimation method is required for improving predictive accuracy. Results: The essence of the approach is the proposal that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value determined from the instances at that node of a decision tree (PDT) that have specified values. The proposed approach exploits the total probability and Bayes’ theorems, and it has three versions. We evaluate our approach against other supervised learning techniques including C5.0, classification and regression trees (CART), k-nearest neighbour (k-NN), linear discrimination (LD), naïve Bayes classifier (NBC), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and support vector machines (SVMs), from the point of view of their effect or tolerance of incomplete test data. Eight cancer-related gene expression datasets are utilized for this task. Experimental results are provided to illustrate the efficiency and the robustness of the proposed algorithm.	en
dc.language.iso	en	en
dc.publisher	Elsevier	en
dc.relation.ispartofseries	Journal Article	en
dc.subject	Supervised learning	en
dc.subject	Microarray data	en
dc.subject	Incomplete data	en
dc.subject	Prediction	en
dc.title	Predicting incomplete gene microarray data with the use of supervised learning algorithms	en
dc.type	Article	en
dc.identifier.apacitation	Twala, B., & Phorah, M. (2010). Predicting incomplete gene microarray data with the use of supervised learning algorithms. http://hdl.handle.net/10204/4485	en_ZA
dc.identifier.chicagocitation	Twala, B, and M Phorah. "Predicting incomplete gene microarray data with the use of supervised learning algorithms." (2010). http://hdl.handle.net/10204/4485	en_ZA
dc.identifier.vancouvercitation	Twala B, Phorah M. Predicting incomplete gene microarray data with the use of supervised learning algorithms. 2010; http://hdl.handle.net/10204/4485.	en_ZA
dc.identifier.ris	TY - Article AU - Twala, B AU - Phorah, M AB - With the wealth of sequence data and the huge amount of data generated from molecular technologies, the issue of gene classification/prediction has become a central challenge in the field of microarray data analysis. This has led to the application of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non-cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain instances. However, in some research situations we often have to classify instances given incomplete vectors, which can affect the predictive accuracy of learned classifiers. The task of learning an accurate incomplete data classifier from instances raises a number of new issues, some of which have not been properly addressed by bioinformatics research. Thus, an effective missing value estimation method is required for improving predictive accuracy. Results: The essence of the approach is the proposal that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value determined from the instances at that node of a decision tree (PDT) that have specified values. The proposed approach exploits the total probability and Bayes’ theorems, and it has three versions. We evaluate our approach against other supervised learning techniques including C5.0, classification and regression trees (CART), k-nearest neighbour (k-NN), linear discrimination (LD), naïve Bayes classifier (NBC), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and support vector machines (SVMs), from the point of view of their effect or tolerance of incomplete test data. Eight cancer-related gene expression datasets are utilized for this task. Experimental results are provided to illustrate the efficiency and the robustness of the proposed algorithm. DA - 2010-10 DB - ResearchSpace DP - CSIR KW - Supervised learning KW - Microarray data KW - Incomplete data KW - Prediction LK - https://researchspace.csir.co.za PY - 2010 SM - 0167-8655 T1 - Predicting incomplete gene microarray data with the use of supervised learning algorithms TI - Predicting incomplete gene microarray data with the use of supervised learning algorithms UR - http://hdl.handle.net/10204/4485 ER -	en_ZA
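
Illustrative note: the abstract describes handling a missing value at a decision tree node by using the a priori probability of each attribute value, estimated from the instances at that node whose value is specified, combined through the total probability theorem. The record does not give the algorithm itself, so the following is only a minimal sketch of that general idea for a single node, not the authors' PDT method or any of its three versions. The function names `fit_node` and `predict_proba`, the toy attribute values ("high"/"low") and the class labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_node(rows):
    """Estimate P(value) and P(label | value) at one tree node.

    rows is a list of (value, label) pairs; value is None when missing.
    Only rows with a specified value contribute to the estimates.
    """
    known = [(v, y) for v, y in rows if v is not None]
    value_counts = Counter(v for v, _ in known)
    total = sum(value_counts.values())
    # A priori probability of each attribute value at this node.
    prior = {v: c / total for v, c in value_counts.items()}
    # Class distribution conditional on each attribute value.
    label_counts = defaultdict(Counter)
    for v, y in known:
        label_counts[v][y] += 1
    cond = {
        v: {y: c / value_counts[v] for y, c in counts.items()}
        for v, counts in label_counts.items()
    }
    return prior, cond


def predict_proba(value, prior, cond):
    """Class probabilities for one test instance at this node.

    If the value is missing, marginalise over the possible values:
    P(label) = sum over v of P(v) * P(label | v)  (total probability).
    A value never seen in training raises KeyError in this sketch.
    """
    if value is not None:
        return dict(cond[value])
    out = defaultdict(float)
    for v, p_v in prior.items():
        for y, p_y in cond[v].items():
            out[y] += p_v * p_y
    return dict(out)


# Hypothetical toy data: discretised expression level and diagnosis class.
rows = [
    ("high", "cancer"), ("high", "cancer"), ("high", "normal"),
    ("low", "normal"), ("low", "normal"),
    (None, "cancer"),  # missing value: ignored when estimating the prior
]
prior, cond = fit_node(rows)
probs = predict_proba(None, prior, cond)
```

In this toy case the test instance with a missing value is not assigned a single imputed value; its class probabilities are a weighted mix of the per-value class distributions, with weights taken from the observed values at the node.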

Files in this item

Name: Phorah_2010.pdf

Size: 620.5Kb

Format: PDF

This item appears in the following Collection(s)

Journal Articles
