From: Your Name <you@example.com>
Date: Sat, 28 Nov 2020 15:37:14 +0000 (+0000)
Subject: MINOR: plock: use an ARMv8 instruction barrier for the pause instruction
X-Git-Tag: v2.1.11~46
X-Git-Url: http://git.haproxy.org/?a=commitdiff_plain;h=eb38345fd10f08da9aa67a44d6d0da12169b5811;p=haproxy-2.1.git

MINOR: plock: use an ARMv8 instruction barrier for the pause instruction

As suggested by @AGSaidi in issue #958, on ARMv8 it's convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to take the lock within 2
seconds. This never happened again with it. In addition, the performance
increased by slightly more than 5% thanks to the reduced contention.

This should be backported as far as 2.2, possibly even 2.0.

(cherry picked from commit 1e237d037b3a45ec92d1dfa80dfd2c6bd7fc3af9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 367c1dbed1e3c5493c22e974fa01cef0f5238ebc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 909dc3e911d1fba2317b2a1f77895932e3a7da60)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
---
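
Note for reviewers: a minimal sketch of how a relax macro like this is
typically used inside a spin-wait loop. It assumes a plain test-and-set
lock built on C11 atomics; demo_cpu_relax(), demo_lock, demo_lock_take()
and demo_lock_release() are hypothetical names for illustration and are
not plock's actual implementation.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for pl_cpu_relax(), mirroring the two branches
 * of the patch: "isb" on aarch64, an empty asm statement elsewhere.
 */
#if defined(__aarch64__)
#define demo_cpu_relax() do {                         \
		__asm__ volatile("isb" ::: "memory"); \
	} while (0)
#else
#define demo_cpu_relax() do {                         \
		__asm__ volatile("");                 \
	} while (0)
#endif

/* Hypothetical test-and-set spinlock (assumption, not plock's layout) */
typedef struct {
	atomic_int locked;
} demo_lock;

static inline void demo_lock_take(demo_lock *l)
{
	/* atomic_exchange returns the previous value: 0 means we own it */
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire)) {
		/* wait using loads only, relaxing the CPU on each retry */
		while (atomic_load_explicit(&l->locked, memory_order_relaxed))
			demo_cpu_relax();
	}
}

static inline void demo_lock_release(demo_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The relax call sits in the inner read-only loop, so a waiting thread
backs off between polls instead of retrying the exchange right away.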

diff --git a/include/import/atomic-ops.h b/include/import/atomic-ops.h
index 0081f9a..5c312ad 100644
--- a/include/import/atomic-ops.h
+++ b/include/import/atomic-ops.h
@@ -524,10 +524,21 @@
 #else
 /* generic implementations */
 
+#if defined(__aarch64__)
+
+/* This was shown to improve fairness on modern ARMv8 such as Neoverse N1 */
+#define pl_cpu_relax() do {				\
+		asm volatile("isb" ::: "memory");	\
+	} while (0)
+
+#else
+
 #define pl_cpu_relax() do {             \
 		asm volatile("");       \
 	} while (0)
 
+#endif
+
 /* full memory barrier */
 #define pl_mb() do {                    \
 		__sync_synchronize();   \