CCES Unicamp

Using Hardware-Transactional-Memory Support to Implement Thread-Level Speculation

When a compiler cannot prove that a loop can be executed in parallel, but can estimate with high probability that its iterations will be independent at runtime, it can schedule the loop for speculative parallel execution. A mechanism is then necessary to detect when a dependence does occur at runtime and to re-execute the loop iterations that were compromised. This technique is called Thread-Level Speculation (TLS). For performance, TLS requires hardware mechanisms that support four key features: conflict detection, speculative storage, in-order commit of transactions, and transaction roll-back. To date, however, no off-the-shelf processor provides direct support for TLS. Speculative execution is supported, nonetheless, in the form of Hardware Transactional Memory (HTM), available in processors such as the Intel Core and the IBM POWER8. HTM implements three of the four key features required by TLS: conflict detection, speculative storage, and transaction roll-back. This paper presents a detailed analysis of the application of HTM support to loop parallelization with TLS, and a careful evaluation of the implementation of TLS on the HTM extensions available in these machines. The sample implementation of TLS over HTM described in the paper also provides evidence that the programming effort required to implement TLS on top of HTM is non-trivial. The paper therefore also describes an extension to OpenMP that both makes TLS more accessible to OpenMP programmers and allows for easy tuning of TLS parameters. The results support several important claims about the performance of TLS over HTM on the Intel Core and IBM POWER8 architectures. Experimental results reveal that, by implementing TLS on top of HTM, speed-ups of up to 3.8x can be obtained for some loops.
Full Article URL:
