Date: Oct 12, 2020
Candidate: Nusrat Jahan Lisa
Prof. Dr. Wolfgang Lehner and Prof. Dr. Torben Bach Pedersen,
Instituto de Computação
With an increasingly large amount of data being collected in numerous application areas, the importance of online analytical processing (OLAP) workloads is constantly increasing. OLAP queries typically access only a small number of columns but a large number of rows and are thus most efficiently executed by column-stores. With the significant developments in the main memory domain, even large datasets can be held entirely in main memory. Thus, main memory column-stores have been established as the state of the art for OLAP scenarios. In these systems, all values of every column are encoded as a sequence of integer values, and query processing is therefore done entirely on these integer sequences. To improve query processing, vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm is a state-of-the-art technique. Aside from vectorization, lightweight integer compression algorithms also play an important role in reducing the required memory space. Unfortunately, there is no single best lightweight integer compression algorithm, and the choice of algorithm depends primarily on the data characteristics. Nevertheless, vectorization and integer compression complement each other, and their combined use improves query performance. However, the benefits of vectorization are limited on modern x86 processors due to predefined and fixed SIMD instruction set extensions. Field Programmable Gate Arrays (FPGAs) offer a novel opportunity here because their hardware is reconfigurable. For example, on an FPGA we can use a processor word of arbitrary length, which leads to higher performance; we can build custom pipeline-based database accelerators; and we can develop embedded systems that use such accelerators. Moreover, modern hybrid CPU-FPGA systems have a direct data communication channel between main memory and the FPGA, which is useful for throughput acceleration.
Based on these advantages, this thesis examines the use of FPGAs for main memory column-stores. The examination is two-fold: first, we investigate the column scan on compressed data as an important operation, and second, we systematically look at lightweight integer compression. Both aspects are considered from the hardware perspective to guarantee a certain level of query performance acceleration. In particular, this thesis explores different embedded design options and proposes an adaptive lightweight integer compression system. Based on a comprehensive evaluation, we identify the optimal design constraints per implementation mechanism for the column scan and for lightweight integer compression. Finally, we conclude the thesis by outlining our upcoming research activities.