Parallel Tiled Code Generation with Loop Permutation within Tiles

keywords: Optimizing compilers, tiling, loop permutation, transitive closure, dependence graph, code locality, automatic parallelization
An approach to generating tiled code with an arbitrary order of loops within tiles is presented. It is based on the transitive closure of the program dependence graph and is derived by combining the Polyhedral and Iteration Space Slicing frameworks. The approach is explained by means of a working example. Details of its implementation in the TRACO compiler are outlined. The performance gain that tiled programs obtain from loop permutation within tiles is illustrated on real-life programs from the NAS Parallel Benchmark suite. An analysis of the speed-up and scalability of parallel tiled code with loop permutation is presented.
mathematics subject classification 2000: 68N20, 65Y05, 52Bxx, 97E60, 05-XX
reference: Vol. 36, 2017, No. 6, pp. 1261–1282
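
As a rough illustration of what "loop permutation within tiles" means, the following C sketch tiles a small two-dimensional loop nest and runs the two intra-tile loops in either order. It is not output of TRACO and does not use transitive closure: the kernel, the sizes N and T, and all function names are hypothetical, and the kernel was chosen to have no loop-carried dependences so that both intra-tile orders are trivially legal. The paper's approach is aimed at the harder case where dependences must be analyzed first.

```c
#define N 64   /* problem size (hypothetical) */
#define T 16   /* tile size (hypothetical) */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Reference loop nest: b is the scaled transpose of a.
   Every iteration writes a distinct element of b, so there are
   no loop-carried dependences. */
void kernel_ref(int a[N][N], int b[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            b[j][i] = 2 * a[i][j];
}

/* Tiled version. The outer loops (ii, jj) enumerate tiles; the inner
   loops enumerate the iterations inside one tile. If permute is 0 the
   intra-tile order is (i, j); otherwise it is (j, i). For this kernel
   the tiles are independent, so the ii loop could also be run in
   parallel (for example with OpenMP); that is left out here. */
void kernel_tiled(int a[N][N], int b[N][N], int permute) {
    for (int ii = 0; ii < N; ii += T) {
        for (int jj = 0; jj < N; jj += T) {
            int i_end = min_int(ii + T, N);
            int j_end = min_int(jj + T, N);
            if (!permute) {
                /* a is read with unit stride, b is written with stride N */
                for (int i = ii; i < i_end; i++)
                    for (int j = jj; j < j_end; j++)
                        b[j][i] = 2 * a[i][j];
            } else {
                /* permuted: b is written with unit stride instead */
                for (int j = jj; j < j_end; j++)
                    for (int i = ii; i < i_end; i++)
                        b[j][i] = 2 * a[i][j];
            }
        }
    }
}

/* Returns 1 if both intra-tile orders reproduce the reference result. */
int permuted_tiles_match_reference(void) {
    static int a[N][N], ref[N][N], t_ij[N][N], t_ji[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * N + j;

    kernel_ref(a, ref);
    kernel_tiled(a, t_ij, 0);
    kernel_tiled(a, t_ji, 1);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (ref[i][j] != t_ij[i][j] || ref[i][j] != t_ji[i][j])
                return 0;
    return 1;
}
```

Calling `permuted_tiles_match_reference()` checks that the two intra-tile orders compute the same values as the untiled loop nest. Which order is faster depends on the memory access pattern of the actual kernel, which is why being able to choose the order freely matters for code locality.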