Gene expression profiling by microarray technology has been successfully applied to the classification and diagnostic prediction of cancers. Various machine learning and data mining methods are currently used for classifying gene expression data. However, these methods were not developed to address the specific requirements of gene microarray analysis. First, microarray data are characterized by a high-dimensional feature space whose dimensionality often exceeds the number of samples by a factor of 100 or more. In addition, microarray data exhibit a high degree of noise. Most of these methods do not adequately address the problems of dimensionality and noise. Furthermore, although machine learning and data mining methods are based on statistics, most such techniques do not address the biologist's requirement for sound mathematical confidence measures. Finally, most machine learning and data mining classification methods fail to incorporate misclassification costs, i.e., they are indifferent to the costs associated with false positive and false negative classifications. In this paper, we present a probabilistic neural network (PNN) model that addresses all of these issues. The PNN model provides sound statistical confidences for its decisions, and it is able to model asymmetrical misclassification costs. Furthermore, we demonstrate the performance of the PNN on multiclass gene expression data sets, comparing it with two machine learning methods, a decision tree and a neural network. To assess and evaluate the performance of the classifiers, we use a lift-based scoring system that allows a fair comparison of different models. The PNN clearly outperformed the other models. The results demonstrate the successful application of the PNN model to multiclass cancer classification.
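To make the two properties claimed for the PNN concrete (class-membership confidences and asymmetric misclassification costs), the following is a minimal sketch of a generic Parzen-window PNN with a minimum-expected-cost decision rule. It is not the implementation used in the paper: the function names, the smoothing parameter `sigma`, the toy data, and the cost values are all assumptions chosen for illustration.

```python
import numpy as np

def pnn_posteriors(X_train, y_train, X_test, sigma=1.0):
    """Estimate class posteriors with Gaussian Parzen windows (generic PNN sketch)."""
    classes = np.unique(y_train)
    # Squared Euclidean distance from every test point to every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shift each row so its smallest distance is 0. This rescales all kernels
    # of a test point by the same factor, so the posteriors are unchanged,
    # but it avoids underflow when there are thousands of features.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summing the kernels of one class gives (empirical prior x class density)
    # up to a constant shared by all classes.
    scores = np.stack(
        [kernel[:, y_train == c].sum(axis=1) for c in classes], axis=1
    )
    return classes, scores / scores.sum(axis=1, keepdims=True)

def min_cost_decision(posteriors, cost):
    """Pick the class index with the lowest expected misclassification cost.

    cost[i, j] is the cost of predicting class j when the true class is i.
    """
    expected_cost = posteriors @ cost
    return expected_cost.argmin(axis=1)

# Hypothetical toy data: two features, two classes (0 and 1).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.0, 0.5], [1.5, 2.0]])

# Assumed cost matrices: symmetric 0-1 loss, and one where missing
# class 1 (a false negative) is 50 times worse than a false positive.
zero_one = np.array([[0.0, 1.0], [1.0, 0.0]])
asymmetric = np.array([[0.0, 1.0], [50.0, 0.0]])

classes, post = pnn_posteriors(X_train, y_train, X_test, sigma=1.0)
print(post)
print(classes[min_cost_decision(post, zero_one)])
print(classes[min_cost_decision(post, asymmetric)])
```

In this sketch the posteriors act as the confidence measure, and the cost matrix only affects the final decision: a borderline sample can be assigned to the costlier-to-miss class even when its posterior for that class is low.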
|Title of host publication||Pacific Symposium on Biocomputing|
|Number of pages||12|
|Publication status||Published - 2003|