CCES Unicamp

Automatic Offloading to FPGA Accelerators

Date: 10/17/2018, 14:00

Candidate: Ciro Luiz Araujo Ceissler

Advisor: Guido Costa Souza de Araújo

IC 3 Auditorium – UNICAMP

Abstract:

The sheer amount of computing resources required to run modern cloud workloads has put considerable pressure on the design of power-efficient cluster nodes. Deep integration of hardware, software, network, and systems for each application is necessary to increase computational usage effectiveness, but combining them requires substantial development time and in-depth expertise. To address this problem, several vendors have proposed integrated CPU-FPGA architectures, e.g., Intel HARP and Microsoft Catapult, that offer flexibility and efficient power-performance execution. Unfortunately, integrating FPGA-accelerated applications with software remains a challenging endeavour because there is no seamless programming model for it. This dissertation discusses HardCloud in detail: an extension of the OpenMP API that eases the task of offloading computation to FPGA accelerators. The OpenMP 4.0 specification introduced directives that enable the transfer of computation to heterogeneous computing devices (e.g., GPUs, cryptography co-processors, or DSPs), but the specification does not provide all the information necessary to offload to an FPGA. A modification to the OpenMP runtime code generation in the Clang/LLVM compiler extends this programming model to support offloading to the Intel HARP 2 platform by specifying a pre-generated FPGA application. Furthermore, a hardware interface abstraction provides a simple way to connect the IP core to the OpenMP runtime and to read or write shared variables seamlessly, while managing access to the FPGA interfaces. Because FPGA bitstream generation is CPU- and memory-intensive and takes many hours, the developer can select the simulator instead of the platform for rapid evaluation. Additionally, an automatic mechanism for hardware verification and validation, which compares the values of the hardware and software output variables, reduces the effort of developing a test environment.
Experimental results using the Clang/LLVM compiler and the Intel HARP 2 architecture show that HardCloud considerably simplifies this task while producing good speed-ups for a set of well-known applications.
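For readers unfamiliar with the OpenMP 4.0 offloading model the abstract builds on, the sketch below shows a plain `#pragma omp target` region with `map` clauses, plus a software reference and an element-by-element output comparison in the spirit of the validation mechanism described above. This is standard OpenMP only, not HardCloud's extended syntax: it does not show how HardCloud selects the HARP 2 device or names the pre-generated FPGA application, and the function names (`vadd_target`, `vadd_reference`, `outputs_match`) are hypothetical. Without an attached accelerator (or when compiled without `-fopenmp`), the target region simply runs on the host.

```c
#include <stdbool.h>

/* Hypothetical kernel offloaded with standard OpenMP 4.0 target directives.
 * map(to:) copies inputs to the device; map(from:) copies the result back.
 * If no device is available, the region falls back to host execution. */
void vadd_target(const int *a, const int *b, int *c, int n) {
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Plain software version used as the golden reference. */
void vadd_reference(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Compare offloaded output with the software reference, element by element. */
bool outputs_match(const int *offloaded, const int *reference, int n) {
    for (int i = 0; i < n; i++) {
        if (offloaded[i] != reference[i]) {
            return false;
        }
    }
    return true;
}
```

A test harness would call both versions on the same inputs and check `outputs_match`; in HardCloud the offloaded side would be the FPGA (or its simulator) rather than a host fallback.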
