Towards data warehousing and mining of protein unfolding simulation data

Daniel Berrar, Frederic Stahl, Candida Silva, J. Rui Rodrigues, Rui M M Brito, Werner Dubitzky

Research output: Contribution to journal (Article)

17 Citations (Scopus)

Abstract

Objectives. The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data.

Methods. To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis.

Results. To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse.

Conclusions. Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Original language: English
Pages (from-to): 307-317
Number of pages: 11
Journal: Journal of Clinical Monitoring and Computing
Volume: 19
Issue number: 4-5
DOI: https://doi.org/10.1007/s10877-005-0676-z
Publication status: Published - Oct 2005
Externally published: Yes

Keywords

  • Data mining
  • Data warehousing
  • Grid
  • Molecular dynamics simulation
  • Protein unfolding
  • Transthyretin

ASJC Scopus subject areas

  • Anesthesiology and Pain Medicine
  • Health Informatics
  • Health Information Management

Cite this

Berrar, D., Stahl, F., Silva, C., Rodrigues, J. R., Brito, R. M. M., & Dubitzky, W. (2005). Towards data warehousing and mining of protein unfolding simulation data. Journal of Clinical Monitoring and Computing, 19(4-5), 307-317. https://doi.org/10.1007/s10877-005-0676-z

@article{330b3cc67fdc4c9bbb0f205898fad101,
title = "Towards data warehousing and mining of protein unfolding simulation data",
abstract = "Objectives. The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. Methods. To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. Results. To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. Conclusions. Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.",
keywords = "Data mining, Data warehousing, Grid, Molecular dynamics simulation, Protein unfolding, Transthyretin",
author = "Daniel Berrar and Frederic Stahl and Candida Silva and Rodrigues, {J. Rui} and Brito, {Rui M M} and Werner Dubitzky",
year = "2005",
month = "10",
doi = "10.1007/s10877-005-0676-z",
language = "English",
volume = "19",
pages = "307--317",
journal = "Journal of Clinical Monitoring and Computing",
issn = "1387-1307",
publisher = "Springer Netherlands",
number = "4-5",

}

TY - JOUR

T1 - Towards data warehousing and mining of protein unfolding simulation data

AU - Berrar, Daniel

AU - Stahl, Frederic

AU - Silva, Candida

AU - Rodrigues, J. Rui

AU - Brito, Rui M M

AU - Dubitzky, Werner

PY - 2005/10

Y1 - 2005/10

AB - Objectives. The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. Methods. To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. Results. To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. Conclusions. Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

KW - Data mining

KW - Data warehousing

KW - Grid

KW - Molecular dynamics simulation

KW - Protein unfolding

KW - Transthyretin

UR - http://www.scopus.com/inward/record.url?scp=30344471278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=30344471278&partnerID=8YFLogxK

U2 - 10.1007/s10877-005-0676-z

DO - 10.1007/s10877-005-0676-z

M3 - Article

VL - 19

SP - 307

EP - 317

JO - Journal of Clinical Monitoring and Computing

JF - Journal of Clinical Monitoring and Computing

SN - 1387-1307

IS - 4-5

ER -