MINOR: plock: use an ARMv8 instruction barrier for the pause instruction
author Your Name <you@example.com>
Sat, 28 Nov 2020 15:37:14 +0000 (15:37 +0000)
committer Christopher Faulet <cfaulet@haproxy.com>
Mon, 14 Dec 2020 10:36:34 +0000 (11:36 +0100)
As suggested by @AGSaidi in issue #958, on ARMv8 it's convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get the lock in 2
seconds. This never happened again with it. In addition, the performance
increased by slightly more than 5% thanks to the reduced contention.
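
For illustration only, here is a minimal sketch of how a relax macro of this
shape is typically used inside a spin loop. The names demo_lock_t, demo_lock(),
demo_unlock() and demo_counter_run() are hypothetical and are not part of
plock, and the test-and-set loop is a generic stand-in rather than plock's
actual locking algorithm. The macro is spelled with __asm__ only so that the
sketch also builds under strict -std=c11.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same shape as the patched macro: "isb" on AArch64, and an empty
 * asm statement acting as a plain compiler barrier elsewhere.
 */
#if defined(__aarch64__)
#define pl_cpu_relax() do {                             \
		__asm__ volatile("isb" ::: "memory");   \
	} while (0)
#else
#define pl_cpu_relax() do {                             \
		__asm__ volatile("");                   \
	} while (0)
#endif

/* Hypothetical test-and-set lock, used only to show where the relax
 * call sits; this is NOT plock's implementation.
 */
typedef struct {
	atomic_flag held;
} demo_lock_t;

static void demo_lock(demo_lock_t *l)
{
	/* Spin until the flag was previously clear, relaxing the CPU
	 * between attempts so that other waiters get a chance.
	 */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		pl_cpu_relax();
}

static void demo_unlock(demo_lock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Single-threaded smoke test: take and release the lock <iterations>
 * times and return how many times the critical section was entered.
 */
static int demo_counter_run(int iterations)
{
	demo_lock_t l = { ATOMIC_FLAG_INIT };
	int counter = 0;
	int i;

	for (i = 0; i < iterations; i++) {
		demo_lock(&l);
		counter++;
		demo_unlock(&l);
	}
	assert(counter == iterations);
	return counter;
}
```

The sketch only shows the placement of pl_cpu_relax() in the retry path. It
does not reproduce the fairness or performance effects described above, which
were observed under real contention with 16 threads.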

This should be backported as far as 2.2, possibly even 2.0.

(cherry picked from commit 1e237d037b3a45ec92d1dfa80dfd2c6bd7fc3af9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 367c1dbed1e3c5493c22e974fa01cef0f5238ebc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 909dc3e911d1fba2317b2a1f77895932e3a7da60)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

include/import/atomic-ops.h

index 0081f9a..5c312ad 100644
 #else
 /* generic implementations */
 
+#if defined(__aarch64__)
+
+/* This was shown to improve fairness on modern ARMv8 such as Neoverse N1 */
+#define pl_cpu_relax() do {                            \
+               asm volatile("isb" ::: "memory");       \
+       } while (0)
+
+#else
+
 #define pl_cpu_relax() do {             \
                asm volatile("");       \
        } while (0)
 
+#endif
+
 /* full memory barrier */
 #define pl_mb() do {                    \
                __sync_synchronize();   \