Publication details
Onedata4Sci: Life-science experimental datasets management system
Authors | |
---|---|
Year of publication | 2023 |
Type | Conference abstract |
MU Faculty or unit | |
Citation | |
Description | In many scientific disciplines, especially the life sciences, expensive equipment such as cryoEM devices and optical microscopes is now commonly shared. Users (scientists) request specific experiments from facilities, which perform the experiments on their behalf. The outcome of such an experiment is a dataset, which is often large (tens of gigabytes to terabytes). The data are then processed and interpreted to draw scientific conclusions, and the results are published. Today, however, increasing emphasis is placed on sharing the primary data itself, not only to allow verification of scientific findings but also to enable reuse of the dataset in future research. This requires automatic or manual annotation with appropriate metadata, storage or archiving of the dataset, assignment of DOIs, and subsequent publication of the dataset in disciplinary metadata catalogues or data repositories. To address these challenges, we are designing and developing Onedata4Sci, a system that automates the acquisition, sharing, and publication of data produced by specialized scientific devices. The proposed solution automatically makes experimental data available to the scientific community in a predefined way. It is particularly useful for on-the-fly processing in local or remote data centers, real-time analysis, or archiving to permanent storage according to a defined quality of service (e.g., data distribution). The solution includes a web-based system for managing emerging datasets and annotating them with metadata, either extracted automatically from the data produced by the instruments or entered manually by users according to defined templates. The system makes it easy to automate the individual steps of dataset preparation, checking of compliance with FAIR principles, and publication of the dataset to the scientific community. The development of the system is guided by the FAIR principles and national EOSC-CZ activities. |