From 787dc20952aecdfb9322fb511f07d1d21b5733c7 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Wed, 8 Jul 2020 19:45:50 +0200 Subject: [PATCH] BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD() When running multi-threaded tests on my NanoPI-Fire3 (8 A53 cores), I managed to occasionally get either a bus error or a segfault in the scheduler, but only when running at a high connection rate (injecting on a tcp-request connection reject rule). The bug is rare and happens around once per million connections. I could never reproduce it with less than 4 threads nor on A72 cores. Haproxy 2.1.0 would also fail there but not 2.1.7. Every time the crash happened with the TL_URGENT task list corrupted, though it was not immediately after the LIST_SPLICE() call, indicating background activity survived the MT_LIST_BEHEAD() operation. This queue is where the shared runqueue is transferred, and the shared runqueue gets fast inter-thread tasklet wakeups from idle conn takeover and new connections. Comparing the MT_LIST_BEHEAD() and MT_LIST_DEL() implementations, it's quite obvious that a few barriers are missing from the former, and these will simply fail on weakly ordered caches. Two store barriers were added before the break() on failure, to match what is done on the normal path. Missing them almost always results in a segfault which is quite rare but consistent (after ~3M connections). The 3rd one before updating n->prev seems intuitively needed though I coudln't make the code fail without it. It's present in MT_LIST_DEL so better not be needlessly creative. The last one is the most important one, and seems to be the one that helps a concurrent MT_LIST_ADDQ() detect a late failure and try again. With this, the code survives at least 30M connections. Interestingly the exact same issue was addressed in 2.0-dev2 for MT_LIST_DEL with commit 690d2ad4d ("BUG/MEDIUM: list: add missing store barriers when updating elements and head"). This fix must be backported to 2.1 as MT_LIST_BEHEAD() is also used there. It's only tagged as medium because it will only affect entry-level CPUs like Cortex A53 (x86 are not affected), and requires load levels that are very hard to achieve on such machines to trigger it. In practice it's unlikely anyone will ever hit it. --- include/haproxy/list.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/haproxy/list.h b/include/haproxy/list.h index 98d9f69..1e45b63 100644 --- a/include/haproxy/list.h +++ b/include/haproxy/list.h @@ -349,6 +349,7 @@ continue; \ if (_p == (lh)) { \ (lh)->prev = _p; \ + __ha_barrier_store(); \ _n = NULL; \ break; \ } \ @@ -361,12 +362,15 @@ if (_n == (lh)) { \ (lh)->next = _n; \ (lh)->prev = _p; \ + __ha_barrier_store(); \ _n = NULL; \ break; \ } \ (lh)->next = (lh); \ (lh)->prev = (lh); \ + __ha_barrier_store(); \ _n->prev = _p; \ + __ha_barrier_store(); \ _p->next = NULL; \ __ha_barrier_store(); \ break; \ -- 1.7.10.4