Knowledge discovery in biology and biotechnology texts: A review of techniques, evaluation strategies, and applications

J. Natarajan, D. Berrar, C. J. Hack, W. Dubitzky

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

Original language: English
Pages (from-to): 31-52
Number of pages: 22
Journal: Critical Reviews in Biotechnology
Volume: 25
Issue number: 1-2
DOIs: 10.1080/07388550590935571
Publication status: Published - 2005
Externally published: Yes

Keywords

  • Bioinformatics
  • Information extraction
  • Information retrieval
  • Knowledge discovery in text
  • Text mining

ASJC Scopus subject areas

  • Biotechnology

Cite this

Knowledge discovery in biology and biotechnology texts: A review of techniques, evaluation strategies, and applications. / Natarajan, J.; Berrar, D.; Hack, C. J.; Dubitzky, W.

In: Critical Reviews in Biotechnology, Vol. 25, No. 1-2, 2005, p. 31-52.


@article{fe11c7331f1d4ccbaf3ee7849969ebf3,
title = "Knowledge discovery in biology and biotechnology texts: A review of techniques, evaluation strategies, and applications",
abstract = "Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.",
keywords = "Bioinformatics, Information extraction, Information retrieval, Knowledge discovery in text, Text mining",
author = "J. Natarajan and D. Berrar and Hack, {C. J.} and W. Dubitzky",
year = "2005",
doi = "10.1080/07388550590935571",
language = "English",
volume = "25",
pages = "31--52",
journal = "Critical Reviews in Biotechnology",
issn = "0738-8551",
publisher = "Informa Healthcare",
number = "1-2",

}

TY - JOUR

T1 - Knowledge discovery in biology and biotechnology texts

T2 - A review of techniques, evaluation strategies, and applications

AU - Natarajan, J.

AU - Berrar, D.

AU - Hack, C. J.

AU - Dubitzky, W.

PY - 2005

Y1 - 2005

N2 - Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

AB - Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

KW - Bioinformatics

KW - Information extraction

KW - Information retrieval

KW - Knowledge discovery in text

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=19944403928&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19944403928&partnerID=8YFLogxK

U2 - 10.1080/07388550590935571

DO - 10.1080/07388550590935571

M3 - Article

C2 - 15999851

AN - SCOPUS:19944403928

VL - 25

SP - 31

EP - 52

JO - Critical Reviews in Biotechnology

JF - Critical Reviews in Biotechnology

SN - 0738-8551

IS - 1-2

ER -