Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)

Daniel Berrar, Peter Flach

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

The receiver operating characteristic (ROC) has emerged as the gold standard for assessing and comparing the performance of classifiers in a wide range of disciplines including the life sciences. ROC curves are frequently summarized in a single scalar, the area under the curve (AUC). This article discusses the caveats and pitfalls of ROC analysis in clinical microarray research, particularly in relation to (i) the interpretation of AUC (especially a value close to 0.5); (ii) model comparisons based on AUC; (iii) the differences between ranking and classification; (iv) effects due to multiple hypotheses testing; (v) the importance of confidence intervals for AUC; and (vi) the choice of the appropriate performance metric. With a discussion of illustrative examples and concrete real-world studies, this article highlights critical misconceptions that can profoundly impact the conclusions about the observed performance.

Original language: English
Article number: bbr008
Pages (from-to): 83-97
Number of pages: 15
Journal: Briefings in Bioinformatics
Volume: 13
Issue number: 1
DOI: 10.1093/bib/bbr008
Publication status: Published, Jan 2012
Externally published: Yes

Keywords

  • Area under the curve
  • Microarrays
  • Model evaluation
  • Multiple testing
  • Receiver operating characteristic

ASJC Scopus subject areas

  • Molecular Biology
  • Information Systems

Cite this

Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). / Berrar, Daniel; Flach, Peter.

In: Briefings in Bioinformatics, Vol. 13, No. 1, bbr008, 01.2012, p. 83-97.

@article{c8d0d91e046641fb884818c7f097ee5a,
title = "Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)",
abstract = "The receiver operating characteristic (ROC) has emerged as the gold standard for assessing and comparing the performance of classifiers in a wide range of disciplines including the life sciences. ROC curves are frequently summarized in a single scalar, the area under the curve (AUC). This article discusses the caveats and pitfalls of ROC analysis in clinical microarray research, particularly in relation to (i) the interpretation of AUC (especially a value close to 0.5); (ii) model comparisons based on AUC; (iii) the differences between ranking and classification; (iv) effects due to multiple hypotheses testing; (v) the importance of confidence intervals for AUC; and (vi) the choice of the appropriate performance metric. With a discussion of illustrative examples and concrete real-world studies, this article highlights critical misconceptions that can profoundly impact the conclusions about the observed performance.",
keywords = "Area under the curve, Microarrays, Model evaluation, Multiple testing, Receiver operating characteristic",
author = "Daniel Berrar and Peter Flach",
year = "2012",
month = jan,
doi = "10.1093/bib/bbr008",
language = "English",
volume = "13",
pages = "83--97",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford University Press",
number = "1",
}

TY - JOUR
T1 - Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)
AU - Berrar, Daniel
AU - Flach, Peter
PY - 2012/1
Y1 - 2012/1

N2 - The receiver operating characteristic (ROC) has emerged as the gold standard for assessing and comparing the performance of classifiers in a wide range of disciplines including the life sciences. ROC curves are frequently summarized in a single scalar, the area under the curve (AUC). This article discusses the caveats and pitfalls of ROC analysis in clinical microarray research, particularly in relation to (i) the interpretation of AUC (especially a value close to 0.5); (ii) model comparisons based on AUC; (iii) the differences between ranking and classification; (iv) effects due to multiple hypotheses testing; (v) the importance of confidence intervals for AUC; and (vi) the choice of the appropriate performance metric. With a discussion of illustrative examples and concrete real-world studies, this article highlights critical misconceptions that can profoundly impact the conclusions about the observed performance.

KW - Area under the curve
KW - Microarrays
KW - Model evaluation
KW - Multiple testing
KW - Receiver operating characteristic
UR - http://www.scopus.com/inward/record.url?scp=84855679008&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84855679008&partnerID=8YFLogxK
U2 - 10.1093/bib/bbr008
DO - 10.1093/bib/bbr008
M3 - Article
VL - 13
SP - 83
EP - 97
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
SN - 1467-5463
IS - 1
M1 - bbr008
ER -