Multiclass cancer classification using gene expression profiling and probabilistic neural networks.

Daniel P. Berrar, C. Stephen Downes, Werner Dubitzky

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

51 Citations (Scopus)

Abstract

Gene expression profiling by microarray technology has been successfully applied to classification and diagnostic prediction of cancers. Various machine learning and data mining methods are currently used for classifying gene expression data. However, these methods have not been developed to address the specific requirements of gene microarray analysis. First, microarray data are characterized by a high-dimensional feature space often exceeding the sample space dimensionality by a factor of 100 or more. In addition, microarray data exhibit a high degree of noise. Most of the discussed methods do not adequately address the problem of dimensionality and noise. Furthermore, although machine learning and data mining methods are based on statistics, most such techniques do not address the biologist's requirement for sound mathematical confidence measures. Finally, most machine learning and data mining classification methods fail to incorporate misclassification costs, i.e., they are indifferent to the costs associated with false positive and false negative classifications. In this paper, we present a probabilistic neural network (PNN) model that addresses all these issues. The PNN model provides sound statistical confidences for its decisions, and it is able to model asymmetrical misclassification costs. Furthermore, we demonstrate the performance of the PNN for multiclass gene expression data sets. Here, we compare the performance of the PNN with two machine learning methods, a decision tree and a neural network. To assess and evaluate the performance of the classifiers, we use a lift-based scoring system that allows a fair comparison of different models. The PNN clearly outperformed the other models. The results demonstrate the successful application of the PNN model for multiclass cancer classification.
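For readers unfamiliar with the model class named in the abstract, the following is a minimal sketch of a generic probabilistic neural network, not the authors' implementation. It assumes the textbook formulation: one Gaussian kernel per training sample, a per-class average of kernel responses as the density estimate, class priors taken from training frequencies, and a minimum-expected-cost decision to illustrate asymmetrical misclassification costs. The function names, the shared smoothing width `sigma`, the toy data, and the cost values are all hypothetical.

```python
import math


def pnn_posteriors(train_x, train_y, x, sigma=1.0):
    """Estimate class posteriors for x from Gaussian Parzen-window densities.

    sigma is a shared smoothing width (an assumed, hand-picked value here).
    """
    scores = {}
    for c in sorted(set(train_y)):
        members = [xi for xi, yi in zip(train_x, train_y) if yi == c]
        # Pattern layer: one Gaussian kernel per training sample of class c.
        # The Gaussian normalising constant is identical for every class
        # (shared sigma), so it cancels when the scores are normalised.
        kernels = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * sigma ** 2))
            for xi in members
        ]
        # Summation layer: the average kernel response estimates p(x | c).
        density = sum(kernels) / len(members)
        prior = len(members) / len(train_y)
        scores[c] = prior * density
    total = sum(scores.values())
    if total == 0.0:
        # All kernels underflowed; fall back to a uniform posterior.
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def min_cost_class(posteriors, cost):
    """Pick the prediction with the lowest expected cost.

    cost[true][pred] is the cost of predicting pred when the truth is true.
    """
    def expected_cost(pred):
        return sum(p * cost[true][pred] for true, p in posteriors.items())

    return min(posteriors, key=expected_cost)


# Hypothetical toy data: three classes, two features per sample.
train_x = [(0, 0), (0, 1), (4, 4), (4, 5), (8, 0), (8, 1)]
train_y = ["A", "A", "B", "B", "C", "C"]

posteriors = pnn_posteriors(train_x, train_y, (0.2, 0.5))
zero_one = {t: {p: 0.0 if t == p else 1.0 for p in "ABC"} for t in "ABC"}
print(posteriors, min_cost_class(posteriors, zero_one))
```

The normalised scores play the role of the confidence values mentioned in the abstract. Replacing the 0-1 cost table with an asymmetrical one can move the decision away from the most probable class whenever one kind of error is much more expensive than the others.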

Original language: English
Title of host publication: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Pages: 5-16
Number of pages: 12
Publication status: Published - 2003
Externally published: Yes

Cite this

Berrar, D. P., Downes, C. S., & Dubitzky, W. (2003). Multiclass cancer classification using gene expression profiling and probabilistic neural networks. In Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing (pp. 5-16)

@inbook{0c1008781a3043e3bbbc51aa70672b6f,
title = "Multiclass cancer classification using gene expression profiling and probabilistic neural networks.",
author = "Berrar, {Daniel P.} and Downes, {C. Stephen} and Werner Dubitzky",
year = "2003",
language = "English",
pages = "5--16",
booktitle = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",

}

TY - CHAP

T1 - Multiclass cancer classification using gene expression profiling and probabilistic neural networks.

AU - Berrar, Daniel P.

AU - Downes, C. Stephen

AU - Dubitzky, Werner

PY - 2003

Y1 - 2003

UR - http://www.scopus.com/inward/record.url?scp=0041627900&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0041627900&partnerID=8YFLogxK

M3 - Chapter

C2 - 12603013

AN - SCOPUS:0041627900

SP - 5

EP - 16

BT - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

ER -