Aug 5th 2019
FAPESP Sao Paulo Advanced School on Learning from Data
Claudia Bauzer Medeiros
The term “Open Science” refers to the wide dissemination of all material associated with scientific discovery, thereby contributing to the advancement of science. Examples of such material include specifications of experiments and equipment, documents, methods, algorithms, and all kinds of data, in particular digital data. Data, here, should be considered in a broader context, including a wide variety of digital content, such as code, models, video, sound, spreadsheets, and executable workflows. The adequate sharing of such data has become a key issue in the Open Science movement, and is now considered part of good scientific practice. Data sharing demands appropriate planning of the data lifecycle, from data collection to storage and preservation. This, in turn, poses many scientific challenges, for computer scientists but also for the scientists whose data have to be managed. Here, data volume is just one issue to be tackled; data heterogeneity, quality and curation, preservation and retrieval are also of foremost importance, thus opening new research fronts for data scientists. This lecture will discuss some of the challenges in Research Data Management, in particular those posed by the Open Science movement, which requires that data be shareable and accessible, while at the same time preserving integrity and privacy.