Edson Borin
Instituto de Computação e Centro de Computação em Engenharia e Ciências – Universidade Estadual de Campinas, Campinas, SP.
Historically, humanity has relied on the scientific method to advance science. This method, which involves observing, formulating hypotheses, and performing experiments to corroborate or refute them, long required the construction of expensive prototypes on which to run the experiments. With the emergence of computer systems and the development of numerical methods for solving computational models, however, many experiments came to be run as computer simulations, which significantly reduced the cost and time of experimentation. Consequently, computation and computational modeling have become a mainstay of modern science.
Simulation, by CCES researchers, of the interaction between proteins in the human body and hormones.
Computational models replicate behavior observed in the physical world. Thus, to perform an experiment, one simply models the problem of interest on the computer and runs simulation programs to solve the model. For example, a scientist can model proteins in the human body and simulate the interaction of these proteins with hormones. The figure above illustrates a simulation of the interaction of proteins in the human body with hormones performed by researchers at the Center for Computation in Engineering and Science (CCES) at the University of Campinas.
The more detailed or complex the computational model, the greater the number of computational operations that must be performed to complete the simulation. For example, simulating the interactions of a drug with a protein involves tracking how the positions of hundreds of thousands of atoms vary over time, including those of the drug, the protein, and the surrounding water or cell membrane, and requires far more computational operations than simulating the interaction of proteins in the human body with a hormone. Consequently, more sophisticated experiments require high-performance computing (HPC), in which special computing systems are used to accelerate the simulation of computational models. These computing systems, also called supercomputers, are usually composed of multiple computers connected through a high-speed network so that they behave as a single high-performance system. The Kahuna supercomputer (http://cces.unicamp.br/computing-resources/) is one of the supercomputers maintained by the University of Campinas (UNICAMP) to support the research conducted by CCES researchers.
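As a rough, self-contained illustration of why the operation count explodes with model detail (the particle counts and the toy inverse-square interaction below are arbitrary choices for the example, not CCES code), a naive pairwise-interaction step scales quadratically with the number of simulated particles:

```python
# Illustrative sketch only: a naive O(N^2) pairwise-interaction step, showing how
# the operation count grows with the number of simulated particles.
import random

def pairwise_step(positions):
    """Accumulate a toy inverse-square interaction between every pair of particles."""
    n = len(positions)
    forces = [0.0] * n
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = positions[i] - positions[j]
            f = 1.0 / (d * d + 1e-9)  # softened to avoid division by zero
            forces[i] += f
            forces[j] -= f
            ops += 1
    return forces, ops

for n in (500, 1000, 2000):
    positions = [random.random() for _ in range(n)]
    _, ops = pairwise_step(positions)
    print(f"{n} particles -> {ops:,} pair interactions per time step")
```

Doubling the number of particles roughly quadruples the work of a single time step, and a realistic simulation repeats such steps millions of times.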
IBM Blue Gene/P supercomputer at Argonne National Laboratory
(Source: Argonne National Laboratory’s Flickr page)
Acquiring and maintaining a high-performance computing system is costly, and few research groups have exclusive access to this type of technology. In fact, many of the world’s supercomputers are purchased and maintained by government organizations that provide access via the Internet for shared use by scientists from a variety of educational and research institutions. This model spares scientists the burden of acquiring and maintaining the system; however, competition for the computational resources can make access slow, turning the cycle of experimentation and analysis of results into a lengthy process. The photo above shows the IBM Blue Gene/P supercomputer, maintained by Argonne National Laboratory for shared use by scientists from different research groups.
Companies such as Amazon, Google, and Microsoft operate large data processing centers, or data centers, to run the programs that support their businesses. Amazon, one of the largest online stores in the world, has several data centers so that, during periods of high purchase demand such as the Christmas and Thanksgiving holidays, the system can meet the demand without degrading the user experience. On ordinary days, when demand is lower, most of the computers in its data centers sit idle.
Recent advances in computer systems allow companies that own large data centers to rent out idle computers to customers. In this business model, known as cloud computing, the customer accesses the data center computers via the Internet and pays only for usage, not for the purchase or maintenance of the system. Amazon, Google, and Microsoft already rent out the computers in their data centers around the world under this model. Since the amount of resources and computing power in these data centers is very large, this business model can be used to offer on-demand high-performance computing services, in which scientists who need to run computationally demanding simulations access the system over the Internet and pay only for the processing time of the simulation.
The advantages of high-performance computing in the cloud are numerous; when this model is compared with the supercomputer-acquisition model and the shared-system model, the following stand out: (a) the user pays only for usage, with no need to purchase and maintain costly computing systems or the supporting infrastructure, such as cooling and high-power transformers; (b) getting access to the system is quick and there is no competition for computing resources, which eliminates the waiting time for these resources; and (c) the computing infrastructure is varied and can be tailored to the needs of the application or of the computational model.
Despite the advantages mentioned above, data center computing systems are not designed solely for high-performance computing, and using them for this purpose still presents challenges. Researchers at CCES have therefore been researching and developing methods and technologies to enable high-performance computing in the cloud. One of the technologies developed by CCES scientists is the SPITS (Scalable Partially Idempotent Task System) programming model and the PY-PITS execution system, which facilitate the use of cloud resources for high-performance processing. This technology has already been used to develop tools for processing large seismic datasets to support oil and gas exploration activities in Brazil.
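The SPITS interface itself is not reproduced here; the sketch below is only a loose, hypothetical Python illustration of the underlying idea: a job is split into independent tasks that may be re-executed after a failure, and each result is committed at most once, so re-execution does not change the final answer. All names in the sketch (job_manager, worker, committer) are ours for illustration, not the PY-PITS API.

```python
# Hypothetical sketch of the idea behind partially idempotent tasks; NOT the
# SPITS/PY-PITS API. Tasks are independent, may be re-executed after a failure,
# and each result is committed at most once.
from concurrent.futures import ThreadPoolExecutor
import random

def job_manager(n_tasks):
    """Generate independent task descriptions (here, just chunk ids)."""
    return list(range(n_tasks))

def worker(task_id):
    """Process one task; a real worker would process a chunk of seismic data."""
    if random.random() < 0.2:                 # simulate a transient failure
        raise RuntimeError(f"task {task_id} failed, will be retried")
    return task_id, task_id ** 2              # (task id, partial result)

def committer(results, committed):
    """Commit each task's result exactly once, so re-execution is harmless."""
    for task_id, value in results:
        committed.setdefault(task_id, value)

committed = {}
pending = job_manager(16)
while pending:                                # retry until every task is committed
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(worker, task_id) for task_id in pending]
        done = []
        for fut in futures:
            try:
                done.append(fut.result())
            except RuntimeError:
                pass                          # failed tasks remain pending
    committer(done, committed)
    pending = [t for t in pending if t not in committed]

print(f"committed {len(committed)} results, total:", sum(committed.values()))
```

Because a task can be repeated without corrupting the final result, workers can be added, removed, or lost at any time, which is exactly the elasticity that cloud resources offer.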
In a recent experiment, researchers at the Unicamp Center for Petroleum Studies used this technology to process a large seismic dataset on 40 high-performance computers from Amazon’s data centers, totaling 2504 computational cores and 313 Nvidia Tesla V100 GPUs. The processing took 40 minutes and cost approximately US$ 190. The computational power offered by this system, approximately 2.33 PetaFLOP/s of peak performance calculated from the system specifications, is roughly twice that of the Santos Dumont supercomputer, one of the largest supercomputers in Latin America, whose peak performance reported on its information portal (https://sdumont.lncc.br/) is approximately 1.1 PetaFLOP/s. Furthermore, acquiring an equivalent system, with 313 Nvidia Tesla V100 GPUs, would cost at least 3.1 million dollars, since each of these GPUs costs approximately 10 thousand dollars.
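To make the comparison concrete, the back-of-envelope arithmetic below simply restates the figures quoted above (the script and its variable names are ours, for illustration only):

```python
# Back-of-envelope arithmetic using only the figures quoted in the text above.
gpus           = 313
gpu_unit_price = 10_000      # US$ per Nvidia Tesla V100 (approximate)
run_cost       = 190         # US$ for the 40-minute cloud run
cloud_peak     = 2.33        # PetaFLOP/s, estimated from system specifications
santos_dumont  = 1.1         # PetaFLOP/s, reported peak

acquisition_cost = gpus * gpu_unit_price
print(f"GPUs alone: US$ {acquisition_cost:,}")                          # ~US$ 3.13 million
print(f"Cloud runs for the same money: {acquisition_cost // run_cost:,.0f}")
print(f"Peak-performance ratio vs Santos Dumont: {cloud_peak / santos_dumont:.1f}x")
```

In other words, the quoted GPU price alone would pay for thousands of runs of this size in the cloud, before even counting the cost of the host machines, networking, and facilities.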
The speed of access to computing resources and the cost difference between processing in the cloud and purchasing a computing system, illustrated by the previous experiment, show that cloud computing has the potential to broaden access to high-performance computing technologies. In this context, open-source technologies developed by CCES scientists, such as SPITS, are expected to play an important role in popularizing access to high-performance computing and advancing science.
Related work
[1] BORIN, E.; RODRIGUES, I. L.; NOVO, A. T.; SACRAMENTO, J. D.; BRETERNITZ JR., M.; TYGEL, M. Efficient and Fault Tolerant Computation of Partially Idempotent Tasks. In: 14th International Congress of the Brazilian Geophysical Society & EXPOGEF, Rio de Janeiro, Brazil, 3-6 August 2015, p. 367-372.
[2] BORIN, E.; BENEDICTO, C.; RODRIGUES, I. L.; PISANI, F.; TYGEL, M.; BRETERNITZ JR., M. PY-PITS: A Scalable Python Runtime System for the Computation of Partially Idempotent Tasks. In: 5th Workshop on Parallel Programming Models, Los Angeles, USA, 2016.
[3] OKITA, N. T.; COIMBRA, T. A. A.; RODAMILANS, C. B.; TYGEL, M.; BORIN, E. Optimizing the execution costs of high-performance geophysics software on the cloud. In: 81st EAGE Annual Conference + Exhibition, London, UK, 2019.
[4] OKITA, N. T.; COIMBRA, T. A. A.; TYGEL, M.; BORIN, E. A heuristic to optimize the execution cost of distributed seismic processing programs on the cloud. In: Society of Exploration Geophysicists Annual Meeting (SEG’19), 2019.
[5] OKITA, N. T.; COIMBRA, T. A. A.; RODAMILANS, C. B.; TYGEL, M.; BORIN, E. Using SPITS to optimize the cost of high-performance geophysics processing on the cloud. In: EAGE Workshop on High Performance Computing for Upstream, Santander, Colombia, 2018.