CCES Unicamp

Reproducibility and Reuse of Experiments in eScience: Workflows, Ontologies and Scripts

Date: 14/12/2018 – 14:00

Candidate: Lucas Augusto Montalvao Carvalho

Advisor: Prof. Claudia Bauzer Medeiros

Institute of Computing

Abstract:

Scripts and Scientific Workflow Management Systems (SWfMSs) are common approaches for automating the execution flow of processes and data analysis in scientific (computational) experiments. Although widely used in many disciplines, scripts are hard to understand, adapt, reuse, and reproduce. For this reason, several solutions have been proposed to aid experiment reproducibility in script-based environments. However, they neither allow the experiment to be fully documented nor help when third parties want to reuse just part of the code. SWfMSs, on the other hand, help documentation and reuse by supporting scientists in the design and execution of their experiments, which are specified and run as interconnected (reusable) workflow components (a.k.a. building blocks).

While workflows are better than scripts for understandability and reuse, they still require additional documentation. During experiment design, scientists frequently create workflow variants, e.g., by changing workflow components. Reuse and reproducibility require understanding and tracking variant provenance, a time-consuming task. This thesis aims to support reproducibility and reuse of computational experiments.

To meet these challenges, we address two research problems: (1) understanding a computational experiment, and (2) extending a computational experiment. Our work on these problems led us to adopt workflows and ontologies as the basis for solving both.

The main contributions of this thesis are thus:

(i) to present the requirements for the conversion of scripts to reproducible research;
(ii) to propose a methodology that guides scientists through the conversion of script-based experiments into reproducible workflow research objects;
(iii) to design and implement features for quality assessment of computational experiments;
(iv) to design and implement W2Share, a framework to support the conversion methodology, which exploits tools and standards that have been developed by the scientific community to promote reuse and reproducibility;
(v) to design and implement OntoSoft-VFF, a framework for capturing information about software and workflow components to help scientists manage workflow exploration and evolution.

Our work is showcased via use cases in Molecular Dynamics, Bioinformatics and Weather Forecasting.
