Nested-Loops Tiling for Parallelization and Locality Optimization

keywords: Nested loops parallelization, loop tiling, data locality, parallel\break computing
Data locality improvement and nested loops parallelization are two complementary and competing approaches for optimizing loop nests that constitute a large portion of computation times in scientific and engineering programs. While there are effective methods for each one of these, prior studies have paid less attention to address these two simultaneously. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate locality conscious loop transformation to partition the loop iteration space into outer parallel tiled loops. The approach is based on the polyhedral model to achieve a multidimensional affine scheduling as a transformation that result the largest groups of tilable loops with maximum coarse grain parallelism, as far as possible. Furthermore, tiles will be scheduled on processor cores to exploit maximum data reuse through scheduling tiles with high volume of data sharing on the same core consecutively or on different cores with shared cache at around the same time.
mathematics subject classification 2000: 68N19
reference: Vol. 36, 2017, No. 3, pp. 566–596