REDUCER: Elimination of Repetitive Codes for Accelerated Iterative Compilation

keywords: Iterative compilation, code redundancy, LLVM, IR, big data
Low Level Virtual Machine (LLVM) is a widely adopted open source compiler providing numerous optimization opportunities. The discovery of the best optimization sequence in this large space is done via iterative compilation, which incurs substantial overheads, especially for big data applications operating on high volume and variety datasets. The large search space is mostly comprised of identical codes generated via different optimizations. However, no mechanism is implemented inside the LLVM compiler to suppress the redundant testings. In this regard, this paper proposes REDUCER for eliminating the identical code executions by performing Intermediate Representation (IR) level comparisons. REDUCER has been tested using the well-accepted MiCOMP technique in LLVM 3.8 and 9.0 compiler, with embedded (cBench) and big data workloads. In comparison to MiCOMP 19.5 k experiments, REDUCER lowers the experiment count up to 327, i.e. 98 %, and on average to 4 375, i.e. 77 %, for cBench (LLVM-3.8). Similarly, for LLVM-9.0 the reductions are up to 1 931, i.e. 90 %, and on average 5 863, i.e. 69.9 %. Due to the significant experiment reduction, for embedded workloads, the iterative compilation is up to 58.6× and on average 4.1× faster with REDUCER (LLVM-3.8) than MiCOMP, whereas, with REDUCER (LLVM-9.0) the compilation is up to 8.5× and on average 2.9× faster. Moreover, REDUCER is found to be scalable and efficient for big data workloads where the iterative compilation is reduced to few days, as code is compared one time only for a single application tested on multiple datasets.
reference: Vol. 40, 2021, No. 3, pp. 543–574