As suggested by @AGSaidi in issue #958, on ARMv8 it's convenient to use
an "isb" instruction in pl_cpu_relax() to improve fairness. Without it
I've met a few watchdog conditions on valid locks with 16 threads,
indicating that some threads couldn't manage to get the lock within 2
seconds. It never happened again with it. In addition, the performance
increased by slightly more than 5% thanks to the reduced contention.
This should be backported as far as 2.2, possibly even 2.0.
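
For illustration only, here is a minimal sketch of where such a relax
primitive sits in a spin-wait loop. The names demo_cpu_relax(), struct
demo_lock, demo_lock_take() and demo_lock_drop() are hypothetical and are
not the actual plock code; only the per-architecture asm statements mirror
the patch. The waiter spins on a plain load and calls the relax macro on
each iteration, which is the place where the "isb" gets executed on
aarch64.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for pl_cpu_relax(), same arch split as the patch. */
#if defined(__aarch64__)
#define demo_cpu_relax() do {                          \
		__asm__ volatile("isb" ::: "memory");  \
	} while (0)
#else
#define demo_cpu_relax() do {                          \
		__asm__ volatile("");                  \
	} while (0)
#endif

/* Hypothetical test-and-test-and-set lock, NOT the plock implementation. */
struct demo_lock {
	atomic_int held; /* 0 = free, 1 = taken */
};

static inline void demo_lock_take(struct demo_lock *l)
{
	for (;;) {
		/* try to grab the lock; the previous value tells us who won */
		if (!atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
			return;
		/* lost the race: wait on loads only, relaxing between them */
		while (atomic_load_explicit(&l->held, memory_order_relaxed))
			demo_cpu_relax();
	}
}

static inline void demo_lock_drop(struct demo_lock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

In this sketch the relax call only affects how quickly a waiter retries;
it plays no role in the lock's correctness.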
(cherry picked from commit 1e237d037b3a45ec92d1dfa80dfd2c6bd7fc3af9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 367c1dbed1e3c5493c22e974fa01cef0f5238ebc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 909dc3e911d1fba2317b2a1f77895932e3a7da60)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
#else
/* generic implementations */
+#if defined(__aarch64__)
+
+/* This was shown to improve fairness on modern ARMv8 such as Neoverse N1 */
+#define pl_cpu_relax() do { \
+ asm volatile("isb" ::: "memory"); \
+ } while (0)
+
+#else
+
#define pl_cpu_relax() do { \
asm volatile(""); \
} while (0)
+#endif
+
/* full memory barrier */
#define pl_mb() do { \
__sync_synchronize(); \