Hugo Guiroux - Understanding the performance of mutual exclusion algorithms on modern multicore machines

Monday, Dec 2018, 09:00

Thesis defence

Place: Campus - Amphi MJK

Speaker: Hugo Guiroux

Teams: ERODS

Keywords:

scalability
multicore architecture
performance
mutual exclusion
lock

Jury:

Tim Harris, Principal Engineer, Amazon Cambridge, reviewer
Gaël Thomas, Professor, Telecom SudParis, reviewer
Andrzej Duda, Professor, Grenoble INP, examiner
Pascal Felber, Professor, Université de Neuchâtel, examiner
Vivien Quéma, Professor, Grenoble INP, thesis supervisor
Renaud Lachaize, Associate Professor, Université Grenoble Alpes, thesis co-supervisor

A plethora of optimized mutual exclusion lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and synchronization. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications that considers different performance metrics, such as energy efficiency and tail latency. In this thesis, we perform a thorough and practical analysis, with the goal of giving software developers enough information to achieve fast, scalable and energy-efficient synchronization in their systems. First, we provide a performance study of 28 state-of-the-art mutex lock algorithms on 40 applications and four different multicore machines. We consider not only throughput (traditionally the main performance metric) but also energy efficiency and tail latency, which are becoming increasingly important. Second, we present an in-depth analysis in which we summarize our findings for all the studied applications. In particular, we describe nine different lock-related performance bottlenecks and propose six guidelines to help software developers choose a lock algorithm according to the different lock properties and the application characteristics.

From our detailed analysis, we make a number of observations about locking algorithms and application behavior, several of which had not been reported before: (i) applications stress not only the lock/unlock interface but also the full locking API (e.g., trylocks, condition variables), (ii) the memory footprint of a lock can directly affect application performance, (iii) for many applications, the interaction between locks and scheduling is an important performance factor, (iv) lock tail latencies may or may not affect application tail latency, (v) no single lock is systematically the best, (vi) choosing the best lock is difficult, as it depends on many factors such as the workload and the machine, and (vii) energy efficiency and throughput go hand in hand in the context of lock algorithms.
These findings highlight that locking involves more considerations than the simple lock/unlock interface, and they call for further research on low-memory-footprint adaptive locks that efficiently support the complete lock interface and take all performance metrics into account.