Selecting efficient VM types to train deep learning models on Amazon SageMaker

The cloud has become a popular environment for running Deep Learning (DL) applications. Public cloud providers charge by the amount of time the resources are actually used, with the price per hour depending on the configuration of the chosen cloud instance. Instances are usually provided in the form of a VM that gives access to a certain hardware configuration, and may also come with a pre-configured software environment. More advanced, and theoretically faster, VMs are usually more expensive, but may not necessarily provide the best performance for all applications. Therefore, in order to choose the best instance (or VM type), users must consider the relative performances (and consequent cost) of different VMs when running their specific target application. Taking this into account, we propose a model to estimate the relative performance and cost of training deep learning applications running in different VM instances. This model is built upon observations derived from the performance profile of executions of three different DL applications, on 12 different public cloud instances. We argue that this model is a valuable tool for cloud users looking for optimal VM types to train their deep learning applications on the cloud.
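The trade-off described in the abstract, where training cost is the hourly price multiplied by the time the instance is in use, can be illustrated with a minimal sketch. It is not the estimation model proposed in the paper. The VM names, prices, and training times below are hypothetical placeholders chosen only to show that the fastest instance is not necessarily the cheapest one for a given job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMOption:
    """One candidate instance; all values here are hypothetical."""
    name: str
    price_per_hour: float  # USD per hour (made-up figure)
    train_hours: float     # measured or estimated training time

    @property
    def cost(self) -> float:
        # Billing is proportional to usage time: cost = price * time.
        return self.price_per_hour * self.train_hours


def cheapest(options):
    """Return the option with the lowest total training cost."""
    return min(options, key=lambda o: o.cost)


def fastest(options):
    """Return the option with the shortest training time."""
    return min(options, key=lambda o: o.train_hours)


# Illustrative numbers only, not measurements from the paper.
options = [
    VMOption("vm_small", price_per_hour=1.0, train_hours=10.0),
    VMOption("vm_medium", price_per_hour=3.0, train_hours=4.0),
    VMOption("vm_large", price_per_hour=12.0, train_hours=2.0),
]

if __name__ == "__main__":
    for o in options:
        print(f"{o.name}: {o.train_hours} h, ${o.cost:.2f}")
    print("fastest:", fastest(options).name)
    print("cheapest:", cheapest(options).name)
```

In this made-up scenario the most expensive instance finishes first but costs the most overall, which is why per-application time estimates are needed before picking a VM type.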

R. K. Tesser, A. Marques and E. Borin, “Selecting efficient VM types to train deep learning models on Amazon SageMaker,” 2021 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2021, pp. 20-27. https://doi.org/10.1109/SBAC-PADW53941.2021.00014
