TY - JOUR
T1 - Knowledge discovery in biology and biotechnology texts
T2 - A review of techniques, evaluation strategies, and applications
AU - Natarajan, J.
AU - Berrar, D.
AU - Hack, C. J.
AU - Dubitzky, W.
PY - 2005/6/13
Y1 - 2005/6/13
AB - Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.
KW - Bioinformatics
KW - Information extraction
KW - Information retrieval
KW - Knowledge discovery in text
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=19944403928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19944403928&partnerID=8YFLogxK
U2 - 10.1080/07388550590935571
DO - 10.1080/07388550590935571
M3 - Review article
C2 - 15999851
AN - SCOPUS:19944403928
VL - 25
SP - 31
EP - 52
JO - Critical Reviews in Biotechnology
JF - Critical Reviews in Biotechnology
SN - 0738-8551
IS - 1-2
ER -