From 1c759956112996245eaccbf20e2506b9c9cbceb2 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Tue, 10 Dec 2019 18:38:09 +0100 Subject: [PATCH] BUG/MAJOR: dns: add minimalist error processing on the Rx path It was reported in bug #399 that the DNS sometimes enters endless loops after hours working fine. The issue is caused by a lack of error processing in the DNS's recv() path combined with an exclusive recv OR send in the UDP layer, resulting in some errors causing CPU loops that will never stop until the process is restarted. The basic cause is that the FD_POLL_ERR and FD_POLL_HUP flags are sticky on the FD, and contrary to a stream socket, receiving an error on a datagram socket doesn't indicate that this socket cannot be used anymore. Thus the Rx code must at least handle this situation and flush the error otherwise it will constantly be reported. In theory this should not be a big issue but in practise it is due to another bug in the UDP datagram handler which prevents the send() callback from being called when Rx readiness was reported, so the situation cannot go away. It happens way more easily with threads enabled, so that there is no dead time between the moment the FD is disabled and another recv() is called, such as in the example below where the request was sent to a closed port on the loopback provoking an ICMP unreachable to be sent back: [pid 20888] 18:26:57.826408 sendto(29, ";\340\1\0\0\1\0\0\0\0\0\1\0031wt\2eu\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 35, 0, NULL, > [pid 20893] 18:26:57.826566 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 ECONNREFUSED (Connection refused) [pid 20889] 18:26:57.826601 recvfrom(29, 0x7f97c76182f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20892] 18:26:57.826630 recvfrom(29, 0x7f97c5cf02f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20891] 18:26:57.826684 recvfrom(29, 0x7f97c66162f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20895] 18:26:57.826716 recvfrom(29, 0x7f97bffda2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20894] 18:26:57.826747 recvfrom(29, 0x7f97c4cee2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20888] 18:26:58.419838 recvfrom(29, 0x7ffcc8712c20, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) [pid 20893] 18:26:58.419900 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) (... hundreds before next sendto() ...) This situation was handled by clearing HUP and ERR when recv() returns <0. A second case was handled, there was a control for a missing dgram handler, but it does nothing, causing the FD to ring again if this situation ever happens. After looking at the rest of the code, it doesn't seem possible to face such a situation because these handlers are registered during startup, but at least we need to handle it properly. A third case was handled, that's mainly a small optimization. With threads and massive responses, due to the large lock around the loop, it's likely that some threads will have seen fd_recv_ready() and will wait at the lock(). But if they wait here, chances are that other threads will have eliminated pending data and issued fd_cant_recv(). In this case, better re-check fd_recv_ready() before performing the recv() call to avoid the huge amounts of syscalls that happen on massively threaded setups. This patch must be backported as far as 1.6 (the atomic AND just needs to be turned to a regular AND). --- src/dns.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/dns.c b/src/dns.c index c52f1ca..c131f08 100644 --- a/src/dns.c +++ b/src/dns.c @@ -1537,18 +1537,25 @@ static void dns_resolve_recv(struct dgram_conn *dgram) return; /* no need to go further if we can't retrieve the nameserver */ - if ((ns = dgram->owner) == NULL) + if ((ns = dgram->owner) == NULL) { + _HA_ATOMIC_AND(&fdtab[fd].ev, ~(FD_POLL_HUP|FD_POLL_ERR)); + fd_stop_recv(fd); return; + } resolvers = ns->resolvers; HA_SPIN_LOCK(DNS_LOCK, &resolvers->lock); /* process all pending input messages */ - while (1) { + while (fd_recv_ready(fd)) { /* read message received */ memset(buf, '\0', resolvers->accepted_payload_size + 1); if ((buflen = recv(fd, (char*)buf , resolvers->accepted_payload_size + 1, 0)) < 0) { - /* FIXME : for now we consider EAGAIN only */ + /* FIXME : for now we consider EAGAIN only, but at + * least we purge sticky errors that would cause us to + * be called in loops. + */ + _HA_ATOMIC_AND(&fdtab[fd].ev, ~(FD_POLL_HUP|FD_POLL_ERR)); fd_cant_recv(fd); break; } -- 1.7.10.4