From: Willy Tarreau
Date: Mon, 30 Nov 2020 13:58:53 +0000 (+0100)
Subject: BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link
X-Git-Url: http://git.haproxy.org/?a=commitdiff_plain;h=a5632fbddfffd02207a476fc7466222f711a133e;p=haproxy-2.1.git

BUG/MEDIUM: task: close a possible data race condition on a tasklet's list link

In issue #958, Ashley Penney reported intermittent crashes on AWS's ARM
nodes which would not happen on x86 nodes. After investigation it turned
out that the Neoverse N1 CPU cores used in the Graviton2 CPU are much
more aggressive regarding memory ordering than the usual Cortex
A53/A72/A55 or any x86.

The issue that was triggered there is that if a tasklet_wakeup() call is
made on a tasklet scheduled to run on a foreign thread while that
tasklet is just being dequeued to be processed, there can be a race in
two places:
  - if MT_LIST_TRY_ADDQ() happens between MT_LIST_BEHEAD() and
    LIST_SPLICE_END_DETACHED() while the tasklet is alone in the list,
    because the emptiness test matches;

  - if MT_LIST_TRY_ADDQ() happens during LIST_DEL_INIT() in
    run_tasks_from_lists(), then depending on how LIST_DEL_INIT() ends
    up being implemented, it may even corrupt the adjacent nodes while
    they're being reused for the in-tree storage.

This issue was introduced in 2.2 when support for waking up remote
tasklets was added. Initially, the attachment of a tasklet to a list was
enough to know its status, and this used to be stable information. It is
no longer sufficient to rely on this, so we need to use a different
piece of information.

This patch solves this by adding a new task flag, TASK_IN_LIST, which is
atomically set before attaching a tasklet to a list, and is only removed
after the tasklet is detached from a list. It is checked by
tasklet_wakeup_on() so that the insertion may only be done while the
tasklet is out of any list, and it is cleared during the state switch
when calling the tasklet. Note that the flag is not set for pure tasks
as it's not needed.

However, this introduces a new special case: the function
tasklet_remove_from_tasklet_list() needs to keep both states in sync and
cannot check both the state and the attachment to a list at the same
time. This function is already limited to being used by the thread
owning the tasklet, so in this case the test remains reliable. However,
just like its predecessors, this function is wrong by design and it
should probably be replaced with a stricter one, a lazy one, or be
totally removed (it's only used in checks to avoid calling a possibly
scheduled event, and when freeing a tasklet). Regardless, for now the
function exists, so the flag is removed only if the deletion could be
done, which covers all cases we're interested in regarding the
insertion. This removal is safe against a concurrent tasklet_wakeup_on()
since MT_LIST_DEL() guarantees the atomic test, and will ultimately
clear the flag only if the task could be deleted, so the flag will
always reflect the last state.

This should be carefully backported as far as 2.2 after some observation
period.

This patch depends on previous patch "MINOR: task: remove
__tasklet_remove_from_tasklet_list()".
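To make the protocol above easier to see in isolation, here is a
minimal, self-contained C11 sketch of the idea (not HAProxy code). The
names mini_tasklet, mini_wakeup, mini_run_one, enqueue and dequeue are
hypothetical stand-ins, standard C11 atomics replace the _HA_ATOMIC_*
macros, and the per-thread queues, barriers, sleeping-thread wakeup and
the tasks_run_queue counter are left out. Only the two orderings that
matter are modelled: the flag is set with a CAS before the node is
queued, and it is cleared only after the node has been unlinked.

    #include <stdatomic.h>
    #include <stdio.h>

    #define TASK_IN_LIST 0x0040

    struct mini_tasklet {
        _Atomic unsigned short state;
        /* the real tasklet also carries its list node; omitted here */
    };

    /* stand-ins for the real list operations (MT_LIST_ADDQ/MT_LIST_DEL) */
    static void enqueue(struct mini_tasklet *tl) { (void)tl; puts("queued"); }
    static void dequeue(struct mini_tasklet *tl) { (void)tl; puts("dequeued"); }

    /* producer side: only the thread that wins the CAS may queue the node */
    static void mini_wakeup(struct mini_tasklet *tl)
    {
        unsigned short state = atomic_load(&tl->state);

        do {
            /* do nothing if someone else already added it */
            if (state & TASK_IN_LIST)
                return;
        } while (!atomic_compare_exchange_weak(&tl->state, &state,
                                               state | TASK_IN_LIST));

        /* we won the CAS: we are the only ones allowed to queue it */
        enqueue(tl);
    }

    /* consumer side: unlink first, only then clear the flag, so a
     * concurrent mini_wakeup() can never re-queue a still-linked node
     */
    static void mini_run_one(struct mini_tasklet *tl)
    {
        dequeue(tl);
        atomic_fetch_and(&tl->state, (unsigned short)~TASK_IN_LIST);
        /* from here on, a new wakeup may legitimately queue it again */
    }

    int main(void)
    {
        struct mini_tasklet tl = { .state = 0 };

        mini_wakeup(&tl);   /* queues the node and sets TASK_IN_LIST */
        mini_wakeup(&tl);   /* no-op: the flag is already set */
        mini_run_one(&tl);  /* unlinks the node, then clears the flag */
        mini_wakeup(&tl);   /* allowed to queue it again */
        return 0;
    }

The key property is that TASK_IN_LIST is owned by whoever wins the CAS:
a second waker backs off without touching the list node, and the runner
makes the node safe to reuse before publishing that fact by clearing
the flag.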
(cherry picked from commit 4d6c594998a47d2c62ff74fba36f5798bea1a228)
Signed-off-by: Christopher Faulet
(cherry picked from commit 3027cdc621d8fe6926361fccb1ea3e33a1c30853)
[wt: been in 2.3 for 3 versions, take it now; minor adaptations around
 MT_LIST API changes (ADDQ/TRY_ADDQ/ADDQ_NOCHECK)]
Signed-off-by: Willy Tarreau
(cherry picked from commit 543be72e34b660d9bf1f2ecc176400c1cb3c4c7c)
[wt: contrary to the initial belief, 2.1 is affected and easily crashes on
 high connect rates with many threads (16 is enough). tasklet_wakeup() is
 simpler in 2.1 (single queue); single LIST_DEL location in
 process_runnable_tasks(); now tested and survives at least 3000 times
 longer]
Signed-off-by: Willy Tarreau
---

diff --git a/include/proto/task.h b/include/proto/task.h
index c4e9ea1..24f9392 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -242,23 +242,28 @@ static inline struct task *task_unlink_rq(struct task *t)
 
 static inline void tasklet_wakeup(struct tasklet *tl)
 {
+        unsigned short state = tl->state;
+
+        do {
+                /* do nothing if someone else already added it */
+                if (state & TASK_IN_LIST)
+                        return;
+        } while (!_HA_ATOMIC_CAS(&tl->state, &state, state | TASK_IN_LIST));
+
+        /* at this point we're the first ones to add this task to the list */
+
         if (likely(tl->tid < 0)) {
                 /* this tasklet runs on the caller thread */
-                if (LIST_ISEMPTY(&tl->list)) {
-                        LIST_ADDQ(&task_per_thread[tid].task_list, &tl->list);
-                        _HA_ATOMIC_ADD(&tasks_run_queue, 1);
-                }
+                LIST_ADDQ(&task_per_thread[tid].task_list, &tl->list);
         } else {
                 /* this tasklet runs on a specific thread */
-                if (MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list) == 1) {
-                        _HA_ATOMIC_ADD(&tasks_run_queue, 1);
-                        if (sleeping_thread_mask & (1UL << tl->tid)) {
-                                _HA_ATOMIC_AND(&sleeping_thread_mask, ~(1UL << tl->tid));
-                                wake_thread(tl->tid);
-                        }
+                MT_LIST_ADDQ(&task_per_thread[tl->tid].shared_tasklet_list, (struct mt_list *)&tl->list);
+                if (sleeping_thread_mask & (1UL << tl->tid)) {
+                        _HA_ATOMIC_AND(&sleeping_thread_mask, ~(1UL << tl->tid));
+                        wake_thread(tl->tid);
                 }
         }
-
+        _HA_ATOMIC_ADD(&tasks_run_queue, 1);
 }
 
 /* Insert a tasklet into the tasklet list. If used with a plain task instead,
@@ -272,11 +277,16 @@ static inline void tasklet_insert_into_tasklet_list(struct tasklet *tl)
 
 /* Try to remove a tasklet from the list. This call is inherently racy and may
  * only be performed on the thread that was supposed to dequeue this tasklet.
+ * This way it is safe to call MT_LIST_DEL without first removing the
+ * TASK_IN_LIST bit, which must absolutely be removed afterwards in case
+ * another thread would want to wake this tasklet up in parallel.
  */
 static inline void tasklet_remove_from_tasklet_list(struct tasklet *t)
 {
-        if (MT_LIST_DEL((struct mt_list *)&t->list))
+        if (MT_LIST_DEL((struct mt_list *)&t->list)) {
+                _HA_ATOMIC_AND(&t->state, ~TASK_IN_LIST);
                 _HA_ATOMIC_SUB(&tasks_run_queue, 1);
+        }
 }
 
 /*
diff --git a/include/types/task.h b/include/types/task.h
index bc516f1..fae8b32 100644
--- a/include/types/task.h
+++ b/include/types/task.h
@@ -36,6 +36,7 @@
 #define TASK_QUEUED       0x0004  /* The task has been (re-)added to the run queue */
 #define TASK_SHARED_WQ    0x0008  /* The task's expiration may be updated by other
                                    * threads, must be set before first queue/wakeup */
+#define TASK_IN_LIST      0x0040  /* tasklet is in a tasklet list */
 
 #define TASK_WOKEN_INIT   0x0100  /* woken up for initialisation purposes */
 #define TASK_WOKEN_TIMER  0x0200  /* woken up because of expired timer */
diff --git a/src/task.c b/src/task.c
index a7bbb32..6061ada 100644
--- a/src/task.c
+++ b/src/task.c
@@ -403,10 +403,11 @@ void process_runnable_tasks()
 
                 _HA_ATOMIC_SUB(&tasks_run_queue, 1);
                 t = (struct task *)LIST_ELEM(task_per_thread[tid].task_list.n, struct tasklet *, list);
+                LIST_DEL_INIT(&((struct tasklet *)t)->list);
+                __ha_barrier_store();
                 state = (t->state & TASK_SHARED_WQ) | TASK_RUNNING;
                 state = _HA_ATOMIC_XCHG(&t->state, state);
                 __ha_barrier_atomic_store();
-                LIST_DEL_INIT(&((struct tasklet *)t)->list);
 
                 ti->flags &= ~TI_FL_STUCK; // this thread is still running
                 activity[tid].ctxsw++;