From bd773295ae61210befceba7a8af7784b04706c5c Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Wed, 16 Oct 2024 18:08:39 +0200
Subject: [PATCH] BUG/MEDIUM: queue: make sure never to queue when there's no more served conns

Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend
when redistributing the last server"), we've got two reports again still
showing the theoretically impossible condition in pendconn_add(),
including a single threaded one.

Thanks to the traces, the issue could be tracked down to the redispatch
part. In fact, in non-determinist LB algorithms (RR, LC, FAS), we don't
perform the LB if there are pending connections in the backend, since it
indicates that previous attempts already failed, so we directly return
SRV_STATUS_FULL. And contrary to a previous belief, it is possible to
meet this condition with be->served==0 when redispatching (and likely
with maxconn not greater than the number of threads). The problem is
that in this case, the entry is queued and then the
pendconn_must_try_again() function checks if any connections are
currently being served to detect if we missed a race, and tries again,
but that situation is not caused by a concurrent thread and will never
fix itself, resulting in the loop.

All that part around pendconn_must_try_again() is still quite brittle,
and a safer approach would involve a sequence counter to detect new
arrivals and dequeues during the pendconn_add() call. But it's more
sensitive work, probably for a later fix.

This fix must be backported wherever the fix above was backported.
Thanks to Patrick Hemmer, as well as Damien Claisse and Basha
Mougamadou from Criteo for their help on tracking this one!

(cherry picked from commit ca275d99ce02e72d707fc87da133d739cdda5146)
Signed-off-by: Christopher Faulet
---
 src/backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend.c b/src/backend.c
index ff698fe..4fb4cc1 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -681,7 +681,7 @@ int assign_server(struct stream *s)
 	/* if there's some queue on the backend, with certain algos we
 	 * know it's because all servers are full.
 	 */
-	if (s->be->queue.length && s->be->queue.length != s->be->beconn &&
+	if (s->be->queue.length && s->be->served && s->be->queue.length != s->be->beconn &&
 	    (((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_FAS)||   // first
 	     ((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_RR) ||   // roundrobin
 	     ((s->be->lbprm.algo & (BE_LB_KIND|BE_LB_NEED|BE_LB_PARM)) == BE_LB_ALGO_SRR))) { // static-rr
-- 
1.7.10.4
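
Editor's note: the retry loop described above can be illustrated with a
simplified, self-contained model. This is a sketch only; the names
(struct backend, try_assign, must_retry_after_queuing) are hypothetical
and do not correspond to HAProxy's real structures or functions (unlike
assign_server(), pendconn_must_try_again() and SRV_STATUS_FULL cited in
the message). It only shows why the queue short-cut must also require
served > 0: with served == 0 and a non-empty backend queue, nothing can
ever dequeue the entry, so "try again" never succeeds.

#include <stdio.h>

struct backend {
	int queue_length;  /* pending connections queued on the backend */
	int served;        /* connections currently being served */
	int beconn;        /* total connections handled by the backend */
};

/* Returns 1 when the request may be dispatched to a server, 0 when it is
 * queued (the equivalent of returning SRV_STATUS_FULL in the description).
 * When require_served is set, the short-cut is skipped if nothing is
 * currently served, which models the fix in this patch.
 */
static int try_assign(const struct backend *be, int require_served)
{
	if (be->queue_length &&
	    (!require_served || be->served) &&
	    be->queue_length != be->beconn)
		return 0;
	return 1;
}

/* Model of the "missed race" check: when nothing is being served, no other
 * connection will ever dequeue us, so the caller un-queues and retries,
 * assuming the FULL status was caused by a race that has since passed.
 */
static int must_retry_after_queuing(const struct backend *be)
{
	return be->served == 0;
}

int main(void)
{
	/* State observed during redispatch: one queued entry remains but
	 * nothing is being served any more.
	 */
	struct backend be = { .queue_length = 1, .served = 0, .beconn = 2 };
	int tries = 0;

	/* Old behaviour: every attempt queues, every retry is allowed, and
	 * the state never changes, so the loop would spin forever; it is
	 * capped at 5 iterations here purely for demonstration.
	 */
	while (!try_assign(&be, 0) && must_retry_after_queuing(&be) && ++tries < 5)
		;
	printf("old check: assigned=%d after %d retries\n", try_assign(&be, 0), tries);

	/* New behaviour: with served == 0 the queue short-cut is skipped,
	 * so load balancing is attempted instead of queuing again.
	 */
	printf("new check: assigned=%d\n", try_assign(&be, 1));
	return 0;
}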