BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler()
Since 2.0 with commit
e6a02fa65a ("MINOR: threads: add a "stuck" flag
to the thread_info struct"), the TH_FL_STUCK flag was set by the
debugger to flag that a thread was stuck and report it in the output.
However, two commits later (
2bfefdbaef "MAJOR: watchdog: implement a
thread lockup detection mechanism"), this flag was used to detect that
a thread had already been reported as stuck. The problem is that it
seldom happens that a "show threads" command instantly crashes because
it calls debug_handler(), which sets the flag, and if the watchdog timer
was about to trigger before going back to the scheduler, the watchdog
believes that the thread has been stuck for a while and will kill the
process.
The issue was magnified in 3.1 with the lower-delay warning, because
it's possible for a thread to die on the next wakeup after the first
warning (which calls debug_handler() hence sets the STUCK flag).
One good approach would have been to use two distinct flags, one for
"stuck" as reported by the debug handler, and one for "stuck" as seen
by the watchdog. However, one could also argue that since the second
commit, given that the wdt monitors the threads, there's no point any
more for the debug handler to set the flag itself. Removing this code
means that two consecutive "show threads" will not report "stuck" until
the watchdog sets it, which aligns better with expectations.
This can be backported to all stable releases. This code has changed a
bit over time, the "if" block and the harmless variables just need to
be removed.
(cherry picked from commit
1151fe68186cf862882f147de208c509c25d525e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>