Although modern compilers implement many loop parallelization techniques, they typically apply them only to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g., reductions). These restrictions preclude the parallelization of many computationally intensive DOACROSS loops. In such loops, either the compiler finds at least one loop-carried dependence, or it cannot prove at compile time that the loop is free of such dependences, even though they might never show up at runtime. In either case, most compilers end up not parallelizing DOACROSS loops. This paper makes three contributions to address this problem. First, it integrates three algorithms (TLS, DOAX, and BDX) into a simple OpenMP clause that lets the programmer select the best algorithm for a given loop. Second, it proposes an annotation approach that separates the sequential components of a loop, thus exposing the remaining components to parallelization. Finally, it shows that loop-carried probability is an effective metric for deciding whether to use TLS or a non-speculative technique (e.g., DOAX or BDX) to parallelize a DOACROSS loop. Experimental results reveal that, for certain loops, slowdowns can be turned into 2× speedups by quickly selecting the appropriate algorithm.
Luis Mattos, Divino César Lucas, Juan Salamanca, João P. L. de Carvalho, Márcio M. Perreira and Guido Araujo, “DOACROSS Parallelization based on Component Annotation and Loop-carried Probability”, International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2018), Lyon, France
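To make the loop categories named in the abstract concrete, the following C sketch contrasts a DOALL loop, a reduction, and a loop whose dependences are only known at runtime. It is an illustration only and is not taken from the paper: the function names, the index arrays `r` and `w`, and the helper `carried_fraction` are hypothetical. The fraction that helper computes is one plausible reading of "loop-carried probability" (the share of iterations that read a value written by an earlier iteration), not necessarily the paper's definition. The paper's own OpenMP clause is not shown, since its syntax is not given here; only the standard `parallel for` and `reduction` constructs appear.

```c
#include <assert.h>

/* DOALL: each iteration writes a distinct element and reads nothing
   written by another iteration, so iterations may run in any order. */
void doall_scale(const int *in, int *out, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = 2 * in[i];
}

/* Reduction: `sum` is carried across iterations, but the pattern is
   structured, so the standard OpenMP reduction clause handles it. */
long reduce_sum(const int *in, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += in[i];
    return sum;
}

/* DOACROSS candidate (hypothetical example): iteration i reads a[r[i]]
   and writes a[w[i]]. Whether iteration i depends on an earlier one is
   decided by the contents of r and w, which are unknown at compile time,
   so the loop is left sequential here. */
void may_depend_update(int *a, const int *r, const int *w, int n) {
    for (int i = 0; i < n; i++)
        a[w[i]] = a[r[i]] + 1;
}

/* Hypothetical helper: fraction of iterations that read a location
   written by some earlier iteration (flow dependences only; anti and
   output dependences are ignored in this sketch). */
double carried_fraction(const int *r, const int *w, int n) {
    assert(n > 0);
    int carried = 0;
    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (w[j] == r[i]) {   /* iteration j wrote what i reads */
                carried++;
                break;
            }
        }
    }
    return (double)carried / n;
}
```

For example, with `r = {0, 0, 2, 3, 3, 5, 6, 6}` and `w = {0, 1, 2, 3, 4, 5, 6, 7}`, iterations 1, 4, and 7 each read a value written by the preceding iteration, so `carried_fraction` returns 3/8 = 0.375. A low value of this kind suggests that most iterations are independent, which is the situation where speculation is more likely to pay off; a high value suggests that a non-speculative scheme may be the safer choice.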