Introduction

In a paper published in the International Journal of Digital Curation, researchers from the University of Illinois at Urbana-Champaign outline their approach to curating scientific workflows for biomolecular Nuclear Magnetic Resonance (bioNMR) spectroscopy. The authors, Douglas D. Heintz and Michael R. Gryk, describe how to improve the reproducibility and reusability of bioNMR data by using a workflow management system and adopting a new data model for digital preservation.

Curating Scientific Workflows for Biomolecular Nuclear Magnetic Resonance Spectroscopy

BioNMR is an indispensable tool in structural biology, offering detailed insights into the molecular structures of biological macromolecules. However, the complexity of bioNMR data and the intricacies of its acquisition, analysis, and interpretation make it difficult to ensure that research is reproducible and data are reusable. The work by Heintz and Gryk, detailed in their paper (DOI: 10.2218/ijdc.v13i1.657), responds to the need for robust curation methodologies in the bioNMR community.

CONNJUR Workflow Builder and the NMRbox Platform

One of the key components of the curation efforts detailed in the paper is the CONNJUR Workflow Builder (CWB), a management system designed to streamline bioNMR workflows. CWB enables researchers to orchestrate bioNMR experiments, analyze spectra, and manage bioNMR data more efficiently. It allows for the integration of different software used in the bioNMR pipeline, facilitating a seamless flow of data through the various stages of analysis.
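The idea of passing data through a chain of processing stages while keeping a record of what was done can be illustrated with a minimal Python sketch. This is not the CWB API: the `Step` and `Workflow` classes and the two toy processing functions are hypothetical names chosen for illustration, and real bioNMR processing would call external software rather than operate on short Python lists.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One named stage of a workflow (hypothetical, not a CWB class)."""
    name: str
    func: Callable[[List[float]], List[float]]


@dataclass
class Workflow:
    """Runs steps in order and logs the name of each step it applied."""
    steps: List[Step] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def run(self, data: List[float]) -> List[float]:
        for step in self.steps:
            data = step.func(data)
            self.log.append(step.name)
        return data


def scale_first_point(data: List[float]) -> List[float]:
    """Toy stand-in for a processing stage: halve the first data point."""
    return [data[0] * 0.5] + data[1:] if data else data


def zero_fill(data: List[float]) -> List[float]:
    """Toy stand-in for zero filling: double the length by appending zeros."""
    return data + [0.0] * len(data)


workflow = Workflow([
    Step("scale_first_point", scale_first_point),
    Step("zero_fill", zero_fill),
])
result = workflow.run([2.0, 1.0])
```

The ordered `log` is the part that matters for curation: it records which operations produced the final data, which is the kind of information a provenance record would need to capture.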

The team’s advancements are integrated into the NMRbox cloud-computing platform, which has been developed as a shared resource for the bioNMR scientific community. NMRbox provides an accessible virtual desktop environment equipped with a comprehensive set of bioNMR software tools, supporting a more collaborative and reproducible research ecosystem.

Adoption of the PREMIS Data Model for Digital Preservation

To further enhance the curation process, the paper discusses the refactoring of the workflow data model based on the PREMIS (PREservation Metadata: Implementation Strategies) standard for digital preservation. By incorporating PREMIS, the curation model can systematically capture data provenance information, which is essential for understanding the origin, context, and history of the bioNMR data.

The PREMIS model is designed to ensure that the metadata needed to preserve digital information remains intact. This allows future users to interpret the scientific data accurately, regardless of advances in technology or shifts in domain expertise.
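To make the notion of a provenance record concrete, the following Python sketch builds a simplified PREMIS-style event element describing one processing step. It is an illustration under stated assumptions, not the data model from the paper: the `build_event` helper, the identifier values, and the event description are hypothetical, and the output is not validated against the PREMIS schema.

```python
import xml.etree.ElementTree as ET

# PREMIS version 3 namespace; the prefix is registered for readable output.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a PREMIS tag name."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_id, event_type, date_time, detail, object_id):
    """Build a simplified PREMIS event linked to one object (hypothetical helper)."""
    event = ET.Element(q("event"))

    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id

    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = date_time

    info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(info, q("eventDetail")).text = detail

    # Link the event to the data object it acted on.
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_id
    return event


# All values below are made up for illustration.
event = build_event(
    event_id="event-001",
    event_type="modification",
    date_time="2018-01-01T12:00:00Z",
    detail="Fourier transform applied to time-domain data",
    object_id="spectrum-001",
)
xml_text = ET.tostring(event, encoding="unicode")
```

Each processing step in a workflow could yield one such event, so that the sequence of events documents how a final spectrum was derived from the raw data.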

Provenance, Packaging, and Portability of BioNMR Data

A significant contribution by Heintz and Gryk is the implementation of a new file structure that bundles the original binary NMR data files with the PREMIS XML records. By packaging these components together using a standardized archival utility, the authors ensure that both the data and its provenance are linked and can be preserved jointly.

This bundling technique fosters data portability and makes sharing of bioNMR datasets across disparate systems practical and efficient, hence improving reproducibility. It is a pivotal step towards meeting the data sharing requirements set forth by journals and funding agencies alike.
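The bundling described above can be sketched in a few lines of Python. The summary does not say which archival utility or directory layout the authors use, so the ZIP format, the `data/` and `metadata/` folder names, the checksum manifest, and the `bundle` function are all assumptions made for illustration.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle(data_path, premis_xml: str, archive_path) -> str:
    """Package a binary data file with its PREMIS XML record (hypothetical layout).

    Returns the checksum of the data file so a recipient can verify it.
    """
    data_path = Path(data_path)
    checksum = sha256_of(data_path)
    arcname = f"data/{data_path.name}"
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(data_path, arcname=arcname)              # original binary data
        zf.writestr("metadata/premis.xml", premis_xml)    # provenance record
        zf.writestr("manifest-sha256.txt", f"{checksum}  {arcname}\n")
    return checksum
```

A call such as `bundle("fid", premis_xml_text, "experiment.zip")` would produce a single file that carries both the data and its provenance, where `fid` and `experiment.zip` are placeholder file names. The manifest is an optional addition in this sketch that lets a recipient confirm the data file arrived unchanged.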

Navigating Benefits and Limitations: A Discussion

Heintz and Gryk discuss the benefits of their approach, particularly in terms of how it sustains the scientific integrity of bioNMR data and facilitates the retention of knowledge within the scientific community. However, they also address the limitations, such as the investment needed to adopt new technologies and the need for training researchers to use these advanced curation systems.

Despite these challenges, the authors underscore the criticality of these advancements for the progression of structural biology research and the greater scientific dialogue. The curated workflows not only assist current researchers but also serve as an educational resource for the next generation of scientists.

Future Directions

As the research community continues to embrace open science and the principles of FAIR (Findable, Accessible, Interoperable, and Reusable) data, initiatives like those described in this paper will become increasingly important. Plans for refining existing systems and developing new technologies are highlighted, with a long-term goal of creating an environment where the reproducibility and validity of scientific results are guaranteed.

In their conclusion, Heintz and Gryk invite further collaboration within the bioNMR field and the broader scientific community. The interplay between technology, methodology, and domain science heralds an era where scientific discovery is not just about individual findings but also about the collective pursuit of knowledge that is thoroughly curated and preserved for future generations.

Keywords

1. BioNMR data curation
2. Scientific workflow management
3. Reproducibility in bioNMR
4. Preservation metadata PREMIS
5. NMRbox bioNMR platform

References

1. Heintz, D. D., & Gryk, M. R. (2018). Curating Scientific Workflows for Biomolecular Nuclear Magnetic Resonance Spectroscopy. International Journal of Digital Curation, 13(1), 286–293. https://doi.org/10.2218/ijdc.v13i1.657

2. Bowers, S., & Ludäscher, B. (2005). Actor-Oriented Design of Scientific Workflows. In L. Delcambre, C. Kop, H. C. Mayr, J. Mylopoulos, & O. Pastor (Eds.), Conceptual Modeling – ER 2005. Lecture Notes in Computer Science, vol 3716 (pp. 369–384). https://doi.org/10.1007/11568322_24

3. Ellis, H. J. C., Nowling, R. J., Vyas, J., Martyn, T. O., & Gryk, M. R. (2011). Iterative Development of an Application to Support Nuclear Magnetic Resonance Data Analysis of Proteins. Proceedings of the International Conference on Information Technology: New Generations, 2011, 1014–1020. https://doi.org/10.1109/ITNG.2011.215

4. Fenwick, M., Weatherby, G., Vyas, J., Sesanker, C., Martyn, T. O., Ellis, H. J. C., & Gryk, M. R. (2015). CONNJUR Workflow Builder: A Software Integration Environment for Spectral Reconstruction. Journal of Biomolecular NMR, 62, 313–326. https://doi.org/10.1007/s10858-015-9946-3

5. Maciejewski, M. W., Schuyler, A. D., Gryk, M. R., Moraru, I. I., Romero, P. R., Ulrich, E. L., Eghbalnia, H. R., Livny, M., Delaglio, F., & Hoch, J. C. (2017). NMRbox: A Resource for Biomolecular NMR Computation. Biophysical Journal, 112(8), 1529–1534. https://doi.org/10.1016/j.bpj.2017.03.011

6. Stodden, V., McNutt, M., Bailey, D. H., Deelman, E., Gil, Y., Hanson, B., Heroux, M. A., Ioannidis, J. P. A., & Taufer, M. (2016). Enhancing reproducibility for computational methods. Science, 354, 1240–1241. https://doi.org/10.1126/science.aah6168

7. Ulrich, E. L., Akutsu, H., Doreleijers, J. F., Harano, Y., Ioannidis, Y. E., Lin, J., Livny, M., Mading, S., Miller, Z., Nakatani, E., Schulte, C. F., Tolmie, D. E., Wenger, R. K., Yao, H., & Markley, J. L. (2008). BioMagResBank. Nucleic Acids Research, 36, D402–D408. https://doi.org/10.1093/nar/gkm957

8. Verdi, K. K., Ellis, H. J. C., & Gryk, M. R. (2007). Conceptual-level workflow modeling of scientific experiments using NMR as a case study. BMC Bioinformatics, 8, 31. https://doi.org/10.1186/1471-2105-8-31

9. Willoughby, C., & Frey, J. G. (2017). Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source. International Journal of Digital Curation, 12, 72–87. https://doi.org/10.2218/ijdc.v12i1.532