CCES Unicamp

Converting scripts into reproducible Workflow Research Objects

Full Article URL:…

Scientific discovery and analysis are increasingly computational and data-driven. Although most scientists choose scripting languages such as Python, R and Perl to encode and run their data analyses, scripts are generally not amenable to reuse or reproducibility. They are rarely reused, or even shared with third-party scientists. We argue in this paper that the reproducibility of scripts can be promoted by converting them into workflow research objects. A workflow research object encodes a script as a production (executable) workflow that is accompanied by annotations, example datasets and provenance traces of its execution. This allows third-party users to understand the data analysis encoded by the original script, run the associated workflow on the same or a different dataset, or even repurpose it for a different analysis. To this end, we present a methodology for converting scripts into workflow research objects in a principled manner, guided by requirements that we elicited for this purpose. The methodology exploits tools and standards developed by the community, in particular YesWorkflow, Research Objects and W3C PROV. It is showcased using a real-world use case from the field of molecular dynamics.
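
As a rough illustration of how a script's structure can be exposed before conversion, the sketch below marks up a small script with YesWorkflow-style comment annotations (`@begin`, `@in`, `@out`, `@end`) and lists them in order. The script content, the names `analyze_trajectory`, `compute_mean`, `energies` and `mean_energy`, and the helpers `extract_annotations` and `list_blocks` are hypothetical and are not taken from the paper. The helpers are a simplified stand-in for the YesWorkflow tool, not its actual implementation.

```python
import re

# Hypothetical analysis script, annotated with YesWorkflow-style comments.
# The block and variable names are illustrative only.
ANNOTATED_SCRIPT = '''
# @begin analyze_trajectory
# @in energies
# @out mean_energy

# @begin compute_mean
# @in energies
# @out mean_energy
mean_energy = sum(energies) / len(energies)
# @end compute_mean

# @end analyze_trajectory
'''

# Matches comment lines of the form "# @keyword name".
ANNOTATION = re.compile(r"^\s*#\s*@(begin|end|in|out)\s+(\S+)", re.MULTILINE)


def extract_annotations(source):
    """Return (keyword, name) pairs in the order they appear in the source."""
    return ANNOTATION.findall(source)


def list_blocks(source):
    """Return the names of all blocks opened with @begin."""
    return [name for keyword, name in extract_annotations(source)
            if keyword == "begin"]


if __name__ == "__main__":
    for keyword, name in extract_annotations(ANNOTATED_SCRIPT):
        print(keyword, name)
```

Keeping the annotations in comments leaves the script itself unchanged and still runnable, while the extracted blocks and their inputs and outputs give a starting point for describing the corresponding workflow steps.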
