Willy Tarreau [Wed, 11 Dec 2019 15:29:10 +0000 (16:29 +0100)]
BUILD/MINOR: unix sockets: silence an absurd gcc warning about strncpy()
Apparently gcc developers decided that strncpy() semantics are no longer
valid and now deserve a warning, especially if used exactly as designed.
This results in issue #304. Let's just remove one from the target size to
please her majesty gcc, the God of C Compilers, who tries hard to make
users completely eliminate any use of string.h and reimplement it by
themselves at much higher risks. Pfff....
This can be backported to stable version, the fix is harmless since it
ignores the last zero that is already set on next line.
(cherry picked from commit
719e07c989be48a69fcfcaa404d12d7478de8a1b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
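For readers unfamiliar with the idiom discussed in the commit above, here
is a minimal sketch of it. The helper name set_unix_path() is made up for
illustration and is not HAProxy's code; only the "size minus one, then
force the last zero" pattern comes from the commit message.

```c
#include <string.h>
#include <sys/un.h>

/* Hypothetical helper: copy <path> into addr->sun_path and guarantee
 * that the result is always zero-terminated.
 */
void set_unix_path(struct sockaddr_un *addr, const char *path)
{
	/* copy at most size-1 bytes, which is what silences the warning */
	strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
	/* the last byte is set on the next line, so a truncated path
	 * remains a valid string
	 */
	addr->sun_path[sizeof(addr->sun_path) - 1] = 0;
}
```

Since strncpy() pads the remainder with zeroes, a short path ends up
terminated either way; the explicit store only matters on truncation.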
Willy Tarreau [Wed, 11 Dec 2019 14:51:37 +0000 (15:51 +0100)]
BUG/MINOR: listener: fix off-by-one in state name check
As reported in issue #380, the state check in listener_state_str() is
invalid as it allows state value 9 to report crap. We don't use such
a state value so the issue should never happen unless the memory is
already corrupted, but better clean this now while it's harmless.
This should be backported to all maintained branches.
(cherry picked from commit
fec56c6a76463d40be3e15eee297aa8d2b67362a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
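A sketch of the kind of bound check the commit above refers to. The table
contents and the names state_names, NB_STATES and state_str() are
assumptions made for this example; the point is only that a table of 9
entries must reject index 9, which requires ">=" and not ">".

```c
#include <stddef.h>

/* Illustrative table: 9 entries, so valid indexes are 0..8. */
const char *const state_names[] = {
	"NEW", "INI", "ASS", "PAU", "ZOM", "LIS", "RDY", "FUL", "LIM",
};

#define NB_STATES (sizeof(state_names) / sizeof(state_names[0]))

const char *state_str(unsigned int st)
{
	/* using '>' here would let st == 9 read one entry past the array */
	if (st >= NB_STATES)
		return "INVALID";
	return state_names[st];
}
```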
Willy Tarreau [Wed, 11 Dec 2019 14:43:45 +0000 (15:43 +0100)]
BUG/MINOR: server: make "agent-addr" work on default-server line
As reported in issue #408, "agent-addr" doesn't work on default-server
lines. This is due to the transcription of the old "addr" option in commit
6e5e0d8f9e ("MINOR: server: Make 'default-server' support 'addr' keyword.")
which correctly assigns it to the check.addr and agent.addr fields, but
which also copies the default check.addr into both the check's and the
agent's addr fields. Thus the default agent's address is never used.
This fix makes sure to copy the check from the check and the agent from
the agent. However it's worth noting that if "addr" is specified on the
server line, it will still overwrite both the check and the agent's
addresses.
This must be backported as far as 1.8.
(cherry picked from commit
2444108f16868ccde928d97ffa3db847ddad89fb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 11 Dec 2019 14:06:30 +0000 (15:06 +0100)]
BUG/MINOR: listener: do not immediately resume on transient error
The listener supports a "transient error" situation, which corresponds
to those situations where accept fails badly but poll() reports an event.
This happens for example when a listener is paused, or on out of FD. The
same mechanism is used when facing a maxconn or maxsessrate limitation.
When this happens, the listener is disabled for up to 100ms and put back
into the global listener queue so that it automatically wakes up again
as soon as the conditions change from an existing connection releasing
one resource, or the system recovers from a transient issue.
The listener_accept() function has a bug in its exit path causing a
freshly limited listener to be immediately enabled again because all
the conditions are met (connection count < max). It doesn't take into
account the fact that the listener might have been queued and must
first wait for the timeout to expire before doing so. The impact is
that upon certain errors, the faulty process will busy loop on the
accept code without sleeping. This is the scenario reported and
diagnosed by @hedong0411 in issue #382.
This commit fixes it by verifying that the global queue's delay is
at least expired before deciding to resume the listener. Another
approach could consist in having an extra state like LI_DELAY for
situations where only a delay is acceptable, but this would probably
not bring anything except more complex code.
This issue was introduced with the lock-free listener accept code
(commits 3f0d02b and 82c9789a) that were backported to 1.8.20+ and
1.9.7+, so this fix must be backported to the relevant branches.
(cherry picked from commit
cdcba115b8a6d3773d5bd3c0fe6f8c239d356eab)
Signed-off-by: Willy Tarreau <w@1wt.eu>
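The exit-path condition described in the commit above can be sketched as
follows. All names (delay_expired(), may_resume() and their parameters)
are hypothetical and the real listener_accept() code is more involved;
this only shows "connection count below max" being combined with "the
queue's delay has expired".

```c
/* Wrap-safe test that tick <expire> has been reached at time <now>.
 * Both are millisecond counters that are allowed to overflow.
 */
int delay_expired(unsigned int now, unsigned int expire)
{
	return (int)(now - expire) >= 0;
}

/* Returns non-zero if a limited listener may be re-enabled on exit. */
int may_resume(unsigned int conns, unsigned int maxconn, int queued,
               unsigned int now, unsigned int queue_expire)
{
	if (conns >= maxconn)
		return 0;       /* still limited by the connection count */
	if (queued && !delay_expired(now, queue_expire))
		return 0;       /* queued: the delay must expire first */
	return 1;
}
```

Without the second test, a listener queued because of a transient error
satisfies the first one immediately and is resumed without sleeping.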
Willy Tarreau [Wed, 11 Dec 2019 13:24:07 +0000 (14:24 +0100)]
BUG/MINOR: mworker: properly pass SIGTTOU/SIGTTIN to workers
If a new process is started with -sf and it fails to bind, it may send
a SIGTTOU to the master process in hope that it will temporarily unbind.
Unfortunately this one doesn't catch it and stops to background instead
of forwarding the signal to the workers. The same is true for SIGTTIN.
This commit simply implements an extra signal handler for the master to
deal with such signals that must be passed down to the workers. It must
be backported as far as 1.8, though there the code differs in that it's
entirely in haproxy.c and doesn't require an extra sig handler.
(cherry picked from commit
d26c9f9465de24d2414f4a46653fc20fd2871ac4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 11 Dec 2019 11:05:39 +0000 (12:05 +0100)]
BUG/MINOR: log: fix minor resource leaks on logformat error path
As reported by Ilya in issue #392, Coverity found that we're leaking
allocated strings on error paths in parse_logformat(). Let's use a
proper exit label for failures instead of seeding return 0 everywhere.
This should be backported to all supported versions.
(cherry picked from commit
51013e82d4931c4f0ce6f7fc99788a39cc6960ed)
Signed-off-by: Willy Tarreau <w@1wt.eu>
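The "proper exit label" pattern mentioned in the commit above looks
roughly like this. parse_item() and dup_part() are hypothetical and have
nothing to do with the real parse_logformat() code; the sketch only
shows every failure going through one label that releases what was
allocated so far.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate <len> bytes of <s> into a new zero-terminated string. */
char *dup_part(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = 0;
	}
	return p;
}

/* Split "name:arg" into two allocated strings. Returns 1 on success,
 * 0 on error. On error nothing is leaked and outputs are untouched.
 */
int parse_item(const char *str, char **name, char **arg)
{
	char *n = NULL, *a = NULL;
	const char *colon = strchr(str, ':');

	if (!colon || colon == str)
		goto fail;
	n = dup_part(str, (size_t)(colon - str));
	if (!n)
		goto fail;
	if (!colon[1])
		goto fail;      /* empty argument: <n> must be released */
	a = dup_part(colon + 1, strlen(colon + 1));
	if (!a)
		goto fail;
	*name = n;
	*arg = a;
	return 1;
 fail:
	free(a);                /* free(NULL) is a no-op */
	free(n);
	return 0;
}
```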
Willy Tarreau [Wed, 11 Dec 2019 10:55:52 +0000 (11:55 +0100)]
DOC: remove references to the outdated architecture.txt
As mentioned in bug #405 we continue to reference architecture.txt from
places in the doc despite this file not being packaged for many years.
Better drop the reference if it's confusing.
(cherry picked from commit
9ef75ecea129e3f77aaeb38cf5c163d0eb73742e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Julien Pivotto [Tue, 10 Dec 2019 12:11:17 +0000 (13:11 +0100)]
DOC: proxies: HAProxy only supports 3 connection modes
The 4th one (forceclose) has been deprecated and deleted from the
documentation in 10c6c16cde0b0b473a1ab16e958a7d6b61ed36fc
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
(cherry picked from commit
21ad31531601299eb52a56b50f90f491f46e4a88)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 11 Dec 2019 08:11:58 +0000 (09:11 +0100)]
BUG/MINOR: tasks: only requeue a task if it was already in the queue
Commit 0742c314c3 ("BUG/MEDIUM: tasks: Make sure we switch wait queues
in task_set_affinity().") had a slight side effect on expired timeouts:
when it is called before a timeout is updated, it causes an existing
task to be requeued earlier than its expected timeout, resulting in the
next poll wakeup timeout being too early, or even immediate if the
previous wakeup was caused by a timeout. This is
visible in strace when health checks are enabled because there are two
poll calls, one of which has a short or zero delay. The correct solution
is to only requeue a task if it was already in the queue.
This can be backported to all branches having the fix above.
(cherry picked from commit
440d09b2448fcddea6877054300a95ba5b55dac7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
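The rule stated in the commit above ("only requeue a task if it was
already in the queue") can be sketched as below. The structure and the
function are simplified stand-ins invented for this example; the real
task and wait queue code uses trees and locks that are not shown.

```c
/* Simplified stand-in for a task; not HAProxy's struct task. */
struct sketch_task {
	unsigned long thread_mask; /* threads allowed to run the task */
	int in_wq;                 /* non-zero while linked in a wait queue */
	int wq_owner;              /* which queue holds it (illustration) */
};

void sketch_set_affinity(struct sketch_task *t, unsigned long mask,
                         int new_owner)
{
	int was_queued = t->in_wq;

	if (was_queued)
		t->in_wq = 0;           /* leave the old wait queue */
	t->thread_mask = mask;
	if (was_queued) {
		/* requeue only what was queued before the call, so a task
		 * whose timeout is about to be updated is left alone
		 */
		t->wq_owner = new_owner;
		t->in_wq = 1;
	}
}
```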
Willy Tarreau [Wed, 11 Dec 2019 05:38:14 +0000 (06:38 +0100)]
DOC: listeners: add a few missing transitions
Some disable() transitions were missing, and the distinction between
multi-threaded and single-threaded transitions was not mentioned.
(cherry picked from commit
4ac36d691ab40a29ec4a1fb43a3270f31114feda)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 10 Dec 2019 17:12:04 +0000 (18:12 +0100)]
BUG/MEDIUM: proto_udp/threads: recv() and send() must not be exclusive.
This is a complement to previous fix for bug #399. The exclusion between
the recv() and send() calls prevents send handlers from being called if
rx readiness is reported. The DNS code can trigger this situation with
threads where the fd_recv_ready() flag disappears between the test in
dgram_fd_handler() and the second test in dns_resolve_recv() while a
thread calls fd_cant_recv(), and this situation can sustain itself for
a while. With 8 threads and an error in the socket queue, placing a
printf on the return statement in dns_resolve_recv() scrolls very fast.
Simply removing the "else" in dgram_fd_handler() addresses the issue.
This fix must be backported as far as 1.6.
(cherry picked from commit
d7f76a0a50f4ac6b32d2317c675b3752133ef6d2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
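To picture what removing the "else" changes in the commit above, here is
a toy handler. The sketch_* names and the counters are assumptions used
only to make the control flow observable; this is not the body of
dgram_fd_handler().

```c
/* Toy datagram endpoint with readiness flags and call counters. */
struct sketch_dgram {
	int rx_ready, tx_ready;
	int rx_calls, tx_calls;
};

void sketch_recv(struct sketch_dgram *d) { d->rx_calls++; }
void sketch_send(struct sketch_dgram *d) { d->tx_calls++; }

void sketch_fd_handler(struct sketch_dgram *d)
{
	if (d->rx_ready)
		sketch_recv(d);
	/* no "else" here: the send side must be served even when rx
	 * readiness was reported during the same call
	 */
	if (d->tx_ready)
		sketch_send(d);
}
```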
Willy Tarreau [Tue, 10 Dec 2019 17:38:09 +0000 (18:38 +0100)]
BUG/MAJOR: dns: add minimalist error processing on the Rx path
It was reported in bug #399 that the DNS sometimes enters endless loops
after hours working fine. The issue is caused by a lack of error
processing in the DNS's recv() path combined with an exclusive recv OR
send in the UDP layer, resulting in some errors causing CPU loops that
will never stop until the process is restarted.
The basic cause is that the FD_POLL_ERR and FD_POLL_HUP flags are sticky
on the FD, and contrary to a stream socket, receiving an error on a
datagram socket doesn't indicate that this socket cannot be used anymore.
Thus the Rx code must at least handle this situation and flush the error
otherwise it will constantly be reported. In theory this should not be a
big issue but in practice it is due to another bug in the UDP datagram
handler which prevents the send() callback from being called when Rx
readiness was reported, so the situation cannot go away. It happens way
more easily with threads enabled, so that there is no dead time between
the moment the FD is disabled and another recv() is called, such as in
the example below where the request was sent to a closed port on the
loopback provoking an ICMP unreachable to be sent back:
[pid 20888] 18:26:57.826408 sendto(29, ";\340\1\0\0\1\0\0\0\0\0\1\0031wt\2eu\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 35, 0, NULL, >
[pid 20893] 18:26:57.826566 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 ECONNREFUSED (Connection refused)
[pid 20889] 18:26:57.826601 recvfrom(29, 0x7f97c76182f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20892] 18:26:57.826630 recvfrom(29, 0x7f97c5cf02f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20891] 18:26:57.826684 recvfrom(29, 0x7f97c66162f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20895] 18:26:57.826716 recvfrom(29, 0x7f97bffda2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20894] 18:26:57.826747 recvfrom(29, 0x7f97c4cee2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20888] 18:26:58.419838 recvfrom(29, 0x7ffcc8712c20, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 20893] 18:26:58.419900 recvfrom(29, 0x7f97c54ef2f0, 513, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
(... hundreds before next sendto() ...)
This situation was handled by clearing HUP and ERR when recv()
returns <0.
A second case was handled: there was a check for a missing dgram
handler, but it did nothing, causing the FD to ring again if this
situation ever happens. After looking at the rest of the code, it
doesn't seem possible to face such a situation because these handlers
are registered during startup, but at least we need to handle it
properly.
A third case was handled, that's mainly a small optimization. With
threads and massive responses, due to the large lock around the loop,
it's likely that some threads will have seen fd_recv_ready() and will
wait at the lock(). But if they wait here, chances are that other
threads will have eliminated pending data and issued fd_cant_recv().
In this case, better re-check fd_recv_ready() before performing the
recv() call to avoid the huge amounts of syscalls that happen on
massively threaded setups.
This patch must be backported as far as 1.6 (the atomic AND just
needs to be turned to a regular AND).
(cherry picked from commit
1c759956112996245eaccbf20e2506b9c9cbceb2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
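The first case of the commit above (clearing the sticky HUP and ERR bits
when recv() fails) amounts to an atomic AND on the FD's event mask. The
sketch below uses C11 atomics and made-up flag values; the SK_* names
and sketch_rx_done() are assumptions, not the real FD_POLL_* definitions
or the DNS receive function.

```c
#include <stdatomic.h>

/* Made-up bit values standing in for the FD_POLL_* flags. */
#define SK_POLL_IN  0x01u
#define SK_POLL_ERR 0x08u
#define SK_POLL_HUP 0x10u

/* Called with the return value of recv(). On failure, flush the sticky
 * error bits so that the poller does not report them forever.
 */
void sketch_rx_done(atomic_uint *ev, long ret)
{
	if (ret < 0)
		atomic_fetch_and(ev, ~(SK_POLL_ERR | SK_POLL_HUP));
}
```

On a branch without threads, the same operation would be a regular
"&=" on the mask, as noted in the backport remark.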
Olivier Houchard [Tue, 10 Dec 2019 17:22:55 +0000 (18:22 +0100)]
BUG/MEDIUM: kqueue: Make sure we report read events even when no data.
When we have an EVFILT_READ event, an optimization was made, and the FD was
not reported as ready to receive if there were no data available. That way,
if the socket was closed by our peer (the EV_EOF flag was set), and there were
no remaining data to read, we would just close(), and avoid doing a recv().
However, it may be fine for a TCP socket, but it is not for UDP.
If we send data via UDP, and we receive an error, the only way to detect it
is to attempt a recv(). However, in this case, kevent() will report a read
event, but with no data, so we'd just ignore that read event, nothing would be
done about it, and the poller would be woken up by it over and over.
To fix this, report read events if either we have data, or the EV_EOF flag
is not set.
This should be backported to 2.1, 2.0, 1.9 and 1.8.
(cherry picked from commit
eaefc3c5032506e89cceb6ad5fdd1c5955c4ea66)
Signed-off-by: Willy Tarreau <w@1wt.eu>
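The condition given in the commit above fits in one expression. In this
sketch, SK_EV_EOF is a local stand-in for the EV_EOF flag from
<sys/event.h> so that the example builds on systems without kqueue, and
sketch_report_read() is a hypothetical name; <data> plays the role of
the "data" field of the returned kevent.

```c
/* Stand-in for EV_EOF, defined locally to keep the sketch portable. */
#define SK_EV_EOF 0x8000u

/* Returns non-zero if a read event must be reported to the upper layer:
 * either there are data to read, or the EOF flag is not set.
 */
int sketch_report_read(long data, unsigned int flags)
{
	return data > 0 || !(flags & SK_EV_EOF);
}
```

The only combination that is still skipped is "no data and EOF", which
is the TCP shortcut the original optimization was aiming at.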
Willy Tarreau [Tue, 10 Dec 2019 15:06:53 +0000 (16:06 +0100)]
DOC: document the listener state transitions
This was done by reading all the code affecting a listener's state,
hopefully it will save some time in the future.
(cherry picked from commit
977afab3f824b8322853e27abbaaa752f601a504)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 10 Dec 2019 08:30:05 +0000 (09:30 +0100)]
BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept()
Recent fix 4c044e274c ("BUG/MEDIUM: listener/thread: fix a race when
pausing a listener") is insufficient and moves the race slightly farther.
What now happens is that if we're limiting a listener due to a transient
error such as an accept() error for example, or because the proxy's
maxconn was reached, another thread might in the mean time have switched
again to LI_READY and at the end of the function we'll disable polling on
this FD, resulting in a listener that never accepts anything anymore. It
can more easily happen when sending SIGTTOU/SIGTTIN to temporarily pause
the listeners to let another process bind next to them.
What this patch does instead is to move all enable/disable operations to
the end of the function and make them conditional on the state. The listener's
state is checked under the lock and the FD's polling state adjusted
accordingly so that the listener's state and the FD always remain 100%
synchronized. It was verified with 16 threads that the cost of taking
that lock is not measurable so that's fine.
This should be backported to the same branches the patch above is
backported to.
(cherry picked from commit
92079934a913a330a57e6d84eba3dca68c0cde8e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 10 Dec 2019 07:42:21 +0000 (08:42 +0100)]
BUG/MINOR: listener: also clear the error flag on a paused listener
When accept() fails because a listener is temporarily paused, the
FD might have both FD_POLL_HUP and FD_POLL_ERR bits set. While we do
not exploit FD_POLL_ERR here it's better to clear it because it is
reported on "show fd" and is confusing.
This may be backported to all versions.
(cherry picked from commit
20aeb1c7cd38907d704a4d769695b9ea264fa4c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 10 Dec 2019 07:37:04 +0000 (08:37 +0100)]
BUG/MINOR: listener/threads: always use atomic ops to clear the FD events
There was a leftover of the single-threaded era when removing the
FD_POLL_HUP flag from the listeners. By not using an atomic operation
to clear the flag, another thread acting on the same listener might
have lost some events, though this would have resulted in that thread
reprocessing them immediately on the next loop pass.
This should be backported as far as 1.8.
(cherry picked from commit
7cdeb61701729ef7b4d2ed97e590860478ad718d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 10 Dec 2019 06:11:35 +0000 (07:11 +0100)]
BUG/MINOR: proxy: make soft_stop() also close FDs in LI_PAUSED state
The proxies' soft_stop() function closes the FDs in all opened states
except LI_PAUSED. This means that a transient error on a listener might
cause it to turn back to the READY state if it happens exactly when a
reload signal is received.
This must be backported to all supported versions.
(cherry picked from commit
67878d7bdcb88683bdbf6fba851901a31f348eb8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 6 Dec 2019 15:20:49 +0000 (16:20 +0100)]
BUG/MEDIUM: mux-fcgi: Handle cases where the HTX EOM block cannot be inserted
During the HTTP response parsing, if there is not enough space in the channel's
buffer, it is possible to fail to add the HTX EOM block while all data in the
rxbuf were consumed. As for the h1 mux, we must notify the conn-stream the
buffer is full to have a chance to add the HTX EOM block later. In this case, we
must also be careful not to report a server abort by setting the
CS_FL_EOS flag on the conn-stream too early.
To do so, the FCGI_SF_APPEND_EOM flag must be set on the FCGI stream to know the
HTX EOM block is missing.
This patch must be backported to 2.1.
(cherry picked from commit
f950c2e97e999f2d4728389dbab2d0955a5cc993)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 6 Dec 2019 14:59:05 +0000 (15:59 +0100)]
BUG/MINOR: mux-h1: Be sure to set CS_FL_WANT_ROOM when EOM can't be added
During the message parsing, when the HTX buffer is full and only the HTX EOM
block cannot be added, it is important to notify the conn-stream that some
processing must still be done but it is blocked because there is not enough room
in the buffer. The way to do so is to set the CS_FL_WANT_ROOM flag on the
conn-stream. Otherwise, because all data are received and consumed, the mux is
not called anymore to add this last block, leaving the message unfinished from
the HAProxy point of view. The only way to unblock it is to receive a shutdown
for reads or to hit a timeout.
This patch must be backported to 2.1 and 2.0. The 1.9 does not seem to be
affected.
(cherry picked from commit
7aae858001f99dd4a80e3f533284cda5702d501a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Olivier Houchard [Fri, 29 Nov 2019 15:18:51 +0000 (16:18 +0100)]
BUG/MEDIUM: checks: Make sure we set the task affinity just before connecting.
In process_chk_conn(), make sure we set the task affinity to the current
thread as soon as we're attempting a connection (and reset the affinity to
"any thread" if we detect a failure).
We used to only set the task affinity if connect_conn_chk() returned
SF_ERR_NONE, however for TCP checks, SF_ERR_UP is returned, so for those
checks, the task could still run on any thread, and this could lead to a
race condition where the connection runs on one thread, while the task runs
on another one, which could create random memory corruption and/or crashes.
This may fix github issue #369.
This should be backported to 2.1, 2.0 and 1.9.
(cherry picked from commit
aebeff74fc7eaef12728b1fc15b2d42d93a7767a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Olivier Houchard [Thu, 5 Dec 2019 14:11:19 +0000 (15:11 +0100)]
BUG/MEDIUM: tasks: Make sure we switch wait queues in task_set_affinity().
In task_set_affinity(), leave the wait_queue if any before changing the
affinity, and re-enter a wait queue once it is done. If we don't do that,
the task may stay in the wait queue of another thread, and we later may
end up modifying that wait queue while holding no lock, which could lead
to memory corruption.
This should be backported to 2.1, 2.0 and 1.9.
(cherry picked from commit
0742c314c35c2c96b72e42076c76d6a6786045ba)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 5 Dec 2019 11:30:55 +0000 (12:30 +0100)]
BUG/MINOR: mux-h1: Fix conditions to know whether or not we may receive data
The h1_recv_allowed() function is inherited from the h2 multiplexer. But for the
h1, conditions to know if we may receive data are less complex because there is
no multiplexing and because data are not parsed when received. So now, the
following rules are respected:
* if an error or a shutdown for reads was detected on the connection we must
not attempt to receive
* if the input buffer failed to be allocated or is full, we must not try to
receive
* if the input processing is busy waiting for the output side, we may attempt
to receive
* otherwise we may not attempt to receive
This patch must be backported as far as 1.9.
(cherry picked from commit
2545a0b352ffb49e68e8945c9b8ce7e633d7e8b0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 5 Dec 2019 10:18:31 +0000 (11:18 +0100)]
BUG/MINOR: mux-h1: Don't rely on CO_FL_SOCK_RD_SH to set H1C_F_CS_SHUTDOWN
The CO_FL_SOCK_RD_SH flag is only set when a read0 is received. So we must not
rely on it to set the H1 connection in shutdown state (H1C_F_CS_SHUTDOWN). In
fact, it is sufficient to set the connection in shutdown state when the shutdown
for writes is forwarded to the sock layer.
This patch must be backported as far as 1.9.
(cherry picked from commit
7b109f2f8b9cb493d9f6c01f1613bc54a6f71ba3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 5 Dec 2019 09:23:37 +0000 (10:23 +0100)]
BUG/MEDIUM: mux-h1: Never reuse H1 connection if a shutw is pending
On the server side, when a H1 stream is detached from the connection, if the
connection is not reusable but some outgoing data remain, the connection is not
immediately released. In this case, the connection is not inserted in any idle
connection list. But it is still attached to the session. Because of that, it
can be erroneously reused. h1_avail_streams() always reports a free slot if no
stream is attached to the connection, independently of the connection's
state. It is obviously a bug. If a second request is handled by the same session
(it happens with H2 connections on the client side), this connection is reused
before we close it.
There is a small window to hit the bug, but it may lead to very strange
behaviors. For instance, if a first h2 request is quickly aborted by the client
while it is blocked in the mux on the server side (so before any response is
received), a second request can be processed and sent to the server. Because the
connection was not closed, the possible reply to the first request will be
interpreted as a reply to the second one. It is probably the bug described by
Peter Fröhlich in the issue #290.
To fix the bug, a new flag has been added to know if an H1 connection is idle or
not. So now, H1C_F_CS_IDLE is set when a connection is idle and useable to
handle a new request. If it is set, we try to add the connection in an idle
connection list. And h1_avail_streams() only relies on this flag
now. Concretely, this flag is set when a K/A stream is detached and both the
request and the response are in DONE state. It is exclusive to other H1C_F_CS
flags.
This patch must be backported as far as 1.9.
(cherry picked from commit
aaa67bcef299486f1cdb75ef28b3ec6c39713ae6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emmanuel Hocdet [Wed, 6 Nov 2019 15:05:34 +0000 (16:05 +0100)]
BUG/MINOR: ssl: certificate choice can be unexpected with openssl >= 1.1.1
It's a regression from 9f9b0c6 "BUG/MEDIUM: ECC cert should work with
TLS < v1.2 and openssl >= 1.1.1". A wildcard EC certificate could be selected
at the expense of a specific RSA certificate.
In any case, a specific certificate should always be selected first, then a
wildcard. Reflect this rule in a loop to avoid any bug in certificate
selection changes.
Fix issue #394.
It should be backported as far as 1.8.
(cherry picked from commit
3777e3ad14f2ce54b6662fd0db56413dde9ec9fa)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
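One way to express the "specific first, then wildcard" rule from the
commit above as a loop is sketched here. Everything in it is an
assumption made for illustration: the sk_* names, the flat table, and
the convention that a wildcard entry is stored under its suffix
(".example.com"). The real code works on HAProxy's SNI trees.

```c
#include <stddef.h>
#include <string.h>

struct sk_cert {
	const char *name;     /* exact name, or ".suffix" for a wildcard */
	const char *key_type; /* "ECDSA" or "RSA" */
};

const struct sk_cert *sk_lookup(const struct sk_cert *tab, size_t nb,
                                const char *name, const char *key_type)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		if (strcmp(tab[i].name, name) == 0 &&
		    strcmp(tab[i].key_type, key_type) == 0)
			return &tab[i];
	}
	return NULL;
}

const struct sk_cert *sk_pick_cert(const struct sk_cert *tab, size_t nb,
                                   const char *servername,
                                   int client_has_ecdsa)
{
	const char *names[2];
	const struct sk_cert *c;
	int i;

	names[0] = servername;              /* pass 0: specific name */
	names[1] = strchr(servername, '.'); /* pass 1: wildcard, may be NULL */

	for (i = 0; i < 2; i++) {
		if (!names[i])
			continue;
		/* the key type preference only applies within one pass */
		if (client_has_ecdsa) {
			c = sk_lookup(tab, nb, names[i], "ECDSA");
			if (c)
				return c;
		}
		c = sk_lookup(tab, nb, names[i], "RSA");
		if (c)
			return c;
	}
	return NULL;
}
```

With a table holding a wildcard ECDSA entry and a specific RSA entry,
the specific RSA one wins for its own name even when the client
supports ECDSA, which is the behaviour the commit asks for.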
Willy Tarreau [Thu, 5 Dec 2019 06:40:32 +0000 (07:40 +0100)]
BUG/MEDIUM: listener/thread: fix a race when pausing a listener
There exists a race in the listener code where a thread might disable
receipt on a listener's FD then turn it to LI_PAUSED while at the same
time another one faces EAGAIN on accept() and enables it again via
fd_cant_recv(). The result is that the FD is in LI_PAUSED state with
its polling still enabled. listener_accept() does not do anything then
and doesn't disable the FD either, resulting in a thread eating all the
CPU as reported in issue #358. A solution would be to take the listener's
lock to perform the fd_cant_recv() call and do it only if the FD is still
in LI_READY state, but this would be totally overkill while in practice
the issue only happens during shutdown.
Instead what is done here is that when leaving we recheck the state and
disable polling if the listener is not in LI_READY state, which never
happens except when being limited. In the worst case there could be one
extra check per thread for the time required to converge, which is
absolutely nothing.
This fix was successfully tested, and should be backported to all
versions using the lock-free listeners, which means all those containing
commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in
listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.
(cherry picked from commit
4c044e274c16fde42863c476449895b0fd603818)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 4 Dec 2019 14:33:01 +0000 (15:33 +0100)]
BUG/MINOR: ssl/cli: don't overwrite the filters variable
When a crt-list line using an already used ckch_store does not contain
filters, it will overwrite the ckchs->filters variable with 0.
This problem will generate all sni_ctx of this ckch_store without
filters. Filters generation mustn't be allowed in any case.
Must be backported in 2.1.
(cherry picked from commit
920b0352389be2f615494e6c2b1327b11bfd1dda)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Dec 2019 17:13:04 +0000 (18:13 +0100)]
BUG/MINOR: stream-int: avoid calling rcv_buf() when splicing is still possible
In si_cs_recv(), we can end up with a partial splice() call that will be
followed by an attempt to use rcv_buf(). Sometimes this works and places
data into the buffer, which then prevents splicing from being used, and
this causes splice() and recvfrom() calls to alternate. Better simply
refrain from calling rcv_buf() when there are data in the pipe and still
data to be forwarded. Usually this indicates that we've eaten everything
available and that we still want to use splice() on subsequent calls.
This should be backported to 2.1 and 2.0.
(cherry picked from commit
c640ef1a7de5d13504599f85ca3cf3c282128a05)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Dec 2019 17:08:45 +0000 (18:08 +0100)]
BUG/MEDIUM: stream-int: don't subscribed for recv when we're trying to flush data
If we cannot splice incoming data using rcv_pipe() due to remaining data
in the buffer, we must not subscribe to the mux but instead tag the
stream-int as blocked on missing Rx room. Otherwise when data are
flushed, calling si_chk_rcv() will have no effect because the WAIT_EP
flag remains present, and we'll end in an rx timeout. This case is very
hard to reproduce, and requires an inversion of the polling side in the
middle of a transfer. This can only happen when the client and the server
are using similar links and when splicing is enabled. It typically takes
hundreds of MB to GB for the problem to happen, and tends to be magnified
by the use of option contstats which causes process_stream() to be called
every 5s and to try again to recv.
This fix must be backported to 2.1, 2.0, and possibly 1.9.
(cherry picked from commit
1ac5f208042ff571c9341aed0380ca52c084a261)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Dec 2019 07:29:22 +0000 (08:29 +0100)]
DOC: move the "group" keyword at the right place
It looks like "hard-stop-after", "h1-case-adjust" and "h1-case-adjust-file"
were added before "group", breaking alphabetical ordering.
(cherry picked from commit
11770ce64ba29b91178e61ce6dc11c7713b7469d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Julien Pivotto [Wed, 27 Nov 2019 14:49:54 +0000 (15:49 +0100)]
DOC: Fix ordered list in summary
Section 6 about the cache was placed between 7 and 8. This should
be backported to 2.1.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
(cherry picked from commit
6ccee41ae8bbd27c7df6388c0ece890fa84c269f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Mathias Weiersmueller [Mon, 2 Dec 2019 08:43:40 +0000 (09:43 +0100)]
DOC: clarify matching strings on binary fetches
Add clarification and example to string matching on binary samples,
as comparison stops at first null byte due to strncmp behaviour.
Backporting all the way down to 1.5 is suggested as it might save
from headaches.
(cherry picked from commit
cb250fc9843a335fffe44ed6b15570e5b7cd2a35)
Signed-off-by: Willy Tarreau <w@1wt.eu>
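The strncmp() behaviour that the commit above documents is easy to
reproduce in isolation. The two helper names below are made up; the
sketch only contrasts a string comparison, which stops at the first
null byte, with a byte comparison over the full length.

```c
#include <string.h>

/* String comparison: stops at the first null byte in the inputs. */
int sk_str_equal(const void *a, const void *b, size_t len)
{
	return strncmp(a, b, len) == 0;
}

/* Binary comparison: always looks at all <len> bytes. */
int sk_bin_equal(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}
```

Two 5-byte samples such as "ab\0cd" and "ab\0xy" are equal for the first
helper and different for the second one.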
Tim Duesterhus [Wed, 27 Nov 2019 21:35:27 +0000 (22:35 +0100)]
DOC: Clarify behavior of server maxconn in HTTP mode
In HTTP mode the number of concurrent requests is limited, not the
number of actual connections.
(cherry picked from commit
cefbbd98116cc97b43711b784638789c5557e7e6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 3 Dec 2019 12:32:54 +0000 (13:32 +0100)]
BUG/MINOR: ssl/cli: 'ssl cert' cmd only usable w/ admin rights
The 3 commands 'set ssl cert', 'abort ssl cert' and 'commit ssl cert'
must be only usable with admin rights over the CLI.
Must be backported in 2.1.
(cherry picked from commit
230662a0dd66b97ff46e7e3304a69f95ebccbcb8)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Christopher Faulet [Mon, 2 Dec 2019 10:29:04 +0000 (11:29 +0100)]
BUG/MINOR: stats: Fix HTML output for the frontends heading
Since the flag STAT_SHOWADMIN was removed, the frontends heading in the HTML
output appears unaligned because the space reserved for the checkbox (not
displayed for frontends) is not inserted.
This patch fixes the issue #390. It must be backported to 2.1.
(cherry picked from commit
bc271ec113ab25b2617da73a8b11fce1a56675d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 2 Dec 2019 09:33:31 +0000 (10:33 +0100)]
BUG/MINOR: fcgi-app: Make the directive pass-header case insensitive
The header name configured by the directive "pass-header", in the "fcgi-app"
section, must be case insensitive. For now, it must be in lowercase to match a
header. Internally, header names are in lowercase but there is no reason to
impose this syntax in the configuration.
This patch must be backported to 2.1.
(cherry picked from commit
bc96c90614b5de59bd8c9d58913111157ed079e7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
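A minimal sketch of a case-insensitive header name match, for readers
who want to see what the commit above implies. It relies on the POSIX
strcasecmp() function; sk_header_name_eq() is a hypothetical helper and
the actual patch may well normalize the configured name at parsing time
instead.

```c
#include <strings.h>

/* Returns non-zero if the configured name designates header <hdr>,
 * whatever the case used in the configuration.
 */
int sk_header_name_eq(const char *configured, const char *hdr)
{
	return strcasecmp(configured, hdr) == 0;
}
```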
Emmanuel Hocdet [Thu, 24 Oct 2019 16:33:10 +0000 (18:33 +0200)]
BUG/MINOR: ssl: fix SSL_CTX_set1_chain compatibility for openssl < 1.0.2
Commit 1c65fdd5 "MINOR: ssl: add extra chain compatibility" really implements
SSL_CTX_set0_chain. Since ckch can be used to init more than one ctx with
openssl < 1.0.2 (commit 89f58073 for X509_chain_up_ref compatibility),
SSL_CTX_set1_chain compatibility is required.
This patch must be backported to 2.1.
(cherry picked from commit
140b64fb562fb08cecf93ca6bec99822f7d556fb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Fri, 29 Nov 2019 15:48:43 +0000 (16:48 +0100)]
DOC: ssl/cli: set/commit/abort ssl cert
Document the "set/commit/abort ssl cert" CLI commands in management.txt.
Must be backported in 2.1.
(cherry picked from commit
6ab08b3fd468b62ef08d0b6a71e5e7bf6f88e64b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 29 Nov 2019 10:18:45 +0000 (11:18 +0100)]
BUG/MINOR: http-htx: Don't make http_find_header() fail if the value is empty
http_find_header() is used to find the next occurrence of a header matching on
its name. When found, the matching header is returned with the corresponding
value. This value may be empty. Unfortunately, because of a bug, an empty value
makes the function fail.
This patch must be backported to 2.1, 2.0 and 1.9.
(cherry picked from commit
f3ad62996fa3bab907429ce25a2ada10186b4242)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Wed, 27 Nov 2019 14:41:31 +0000 (15:41 +0100)]
BUILD/MINOR: trace: fix use of long type in a few printf format strings
Building on a 32-bit platform produces these warnings in trace code:
src/stream.c: In function 'strm_trace':
src/stream.c:226:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " req=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
^
src/stream.c:229:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 9 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " res=(%p .fl=0x%08x .ana=0x%08x .exp(r,w,a)=(%u,%u,%u) .o=%lu .tot=%llu .to_fwd=%u)",
^
src/mux_fcgi.c: In function 'fcgi_trace':
src/mux_fcgi.c:443:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " - VAL=%lu", *val);
^
src/mux_h1.c: In function 'h1_trace':
src/mux_h1.c:290:29: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka const unsigned int}' [-Wformat=]
chunk_appendf(&trace_buf, " - VAL=%lu", *val);
^
Let's just cast the type to long. This should be backported to 2.1.
(cherry picked from commit e18f53e01c927273e65374f43ba3417cea136d01)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 27 Nov 2019 13:00:51 +0000 (14:00 +0100)]
BUG/MINOR: h1: Don't test the host header during response parsing
During the H1 message parsing, the host header is tested to be sure it matches
the request's authority, if defined. When there are multiple host headers, we
also take care they are all the same. Of course, these tests must only be
performed on the requests. A host header in a response has no special meaning.
This patch must be backported to 2.1.
(cherry picked from commit bc7c03eba39ec9f0b94734399853bbece1e1a250)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
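A rough Python sketch of the rule described above, under simplifying
assumptions: host_headers_ok() is a hypothetical helper, values are compared
as exact strings (the real parser may normalize them), and the message is
reduced to a list of (name, value) pairs.

```python
from typing import List, Optional, Tuple


def host_headers_ok(headers: List[Tuple[str, str]],
                    authority: Optional[str],
                    is_response: bool) -> bool:
    """Check Host headers against the authority, for requests only."""
    if is_response:
        # A host header in a response has no special meaning: no test.
        return True
    hosts = [value for name, value in headers if name.lower() == "host"]
    if not hosts:
        return True
    # Multiple host headers must all carry the same value.
    if any(h != hosts[0] for h in hosts[1:]):
        return False
    # When an authority is defined, the host header must match it.
    return authority is None or hosts[0] == authority
```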
William Dauchy [Tue, 26 Nov 2019 11:56:26 +0000 (12:56 +0100)]
BUG/MINOR: contrib/prometheus-exporter: decode parameter and value only
We were decoding the whole substring and then parsing it; this could lead to
'&' and '=' characters produced by the decoding being treated as delimiters
when they should not be.
This patch reverses the order by first parsing and then decoding each key
and value separately.
We also stop parsing after a number sign (#).
This patch should be backported to 2.1 and 2.0.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
(cherry picked from commit c65f656d75091db3087a752dbc956159392fc8f2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
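The "parse first, decode second" ordering described above can be sketched in
Python. This is an illustration of the technique only, not the exporter's C
code; parse_query() is a hypothetical name and urllib's unquote() stands in
for the exporter's own percent-decoding.

```python
from typing import List, Tuple
from urllib.parse import unquote


def parse_query(query: str) -> List[Tuple[str, str]]:
    """Split a query-string on its raw delimiters, then percent-decode each
    key and each value separately."""
    # Stop parsing at the number sign (#).
    query = query.split("#", 1)[0]
    params = []
    for pair in query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Decoding happens last, so an encoded '&' (%26) or '=' (%3D)
        # can no longer be mistaken for a delimiter.
        params.append((unquote(key), unquote(value)))
    return params
```

Decoding the whole string first would turn "k=a%26b" into "k=a&b", which
then splits into two bogus parameters; with the order above it stays a
single parameter whose value is "a&b".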
Willy Tarreau [Mon, 25 Nov 2019 18:47:40 +0000 (19:47 +0100)]
[RELEASE] Released version 2.1.0
Released version 2.1.0 with the following main changes :
- BUG/MINOR: init: fix set-dumpable when using uid/gid
- MINOR: init: avoid code duplication while setting identify
- BUG/MINOR: ssl: ssl_pkey_info_index ex_data can store a dereferenced pointer
- BUG/MINOR: ssl: fix crt-list neg filter for openssl < 1.1.1
- MINOR: peers: Alway show the table info for disconnected peers.
- MINOR: peers: Add TX/RX heartbeat counters.
- MINOR: peers: Add debugging information to "show peers".
- BUG/MINOR: peers: Wrong null "server_name" data field handling.
- MINOR: ssl/cli: 'abort ssl cert' deletes an on-going transaction
- BUG/MEDIUM: mworker: don't fill the -sf argument with -1 during the reexec
- BUG/MINOR: peers: "peer alive" flag not reset when deconnecting.
- BUILD/MINOR: ssl: fix compiler warning about useless statement
- BUG/MEDIUM: stream-int: Don't loose events on the CS when an EOS is reported
- MINOR: contrib/prometheus-exporter: filter exported metrics by scope
- MINOR: contrib/prometheus-exporter: Add a param to ignore servers in maintenance
- BUILD: debug: Avoid warnings in dev mode with -02 because of some BUG_ON tests
- BUG/MINOR: mux-h1: Fix tunnel mode detection on the response path
- BUG/MINOR: http-ana: Properly catch aborts during the payload forwarding
- DOC: Update http-buffer-request description to remove the part about chunks
- BUG/MINOR: stream-int: Fix si_cs_recv() return value
- DOC: internal: document the init calls
- MEDIUM: dns: Add resolve-opts "ignore-weight"
- MINOR: ssl: ssl_sock_prepare_ctx() return an error code
- MEDIUM: ssl/cli: apply SSL configuration on SSL_CTX during commit
- MINOR: ssl/cli: display warning during 'commit ssl cert'
- MINOR: version: report the version status in "haproxy -v"
- MINOR: version: emit the link to the known bugs in output of "haproxy -v"
- DOC: Add documentation about the use-service action
- MINOR: ssl: fix possible null dereference in error handling
- BUG/MINOR: ssl: fix curve setup with LibreSSL
- BUG/MINOR: ssl: Stop passing dynamic strings as format arguments
- CLEANUP: ssl: check if a transaction exists once before setting it
- BUG/MINOR: cli: fix out of bounds in -S parser
- MINOR: ist: add ist_find_ctl()
- BUG/MAJOR: h2: reject header values containing invalid chars
- BUG/MAJOR: h2: make header field name filtering stronger
- BUG/MAJOR: mux-h2: don't try to decode a response HEADERS frame in idle state
- MINOR: h2: add a function to report H2 error codes as strings
- MINOR: mux-h2/trace: report the connection and/or stream error code
- SCRIPTS: create-release: show the correct origin name in suggested commands
- SCRIPTS: git-show-backports: add "-s" to proposed cherry-pick commands
- BUG/MEDIUM: trace: fix a typo causing an incorrect startup error
- BUILD: reorder the objects in the makefile
- DOC: mention in INSTALL haproxy 2.1 is a stable stable version
- MINOR: version: indicate that this version is stable
Willy Tarreau [Mon, 25 Nov 2019 18:27:55 +0000 (19:27 +0100)]
MINOR: version: indicate that this version is stable
Also indicate that it will get fixes till ~Q1 2021.
Willy Tarreau [Mon, 25 Nov 2019 18:19:24 +0000 (19:19 +0100)]
DOC: mention in INSTALL haproxy 2.1 is a stable stable version
Let's switch back to the stable wording now.
Willy Tarreau [Mon, 25 Nov 2019 18:03:59 +0000 (19:03 +0100)]
BUILD: reorder the objects in the makefile
After a number of reorganizations, the addition of fcgi and the removal of
the legacy mode, some late files ended up being slow to build and were
slowing down the parallel build. Let's reorder them based on the build
time. Full build went down from 8.3-9.2s to 6.8s.
Willy Tarreau [Mon, 25 Nov 2019 18:43:31 +0000 (19:43 +0100)]
BUG/MEDIUM: trace: fix a typo causing an incorrect startup error
Since commit 88ebd40 ("MINOR: trace: add allocation of buffer-sized
trace buffers") we have a trace buffer allocated at boot time. But
there was a copy-paste error there making the test verify that the
trash was allocated instead of the trace buffer. The result is that
depending on the link order either the test will succeed or fail,
preventing haproxy from starting at all.
No backport is needed.
Willy Tarreau [Mon, 25 Nov 2019 14:51:47 +0000 (15:51 +0100)]
SCRIPTS: git-show-backports: add "-s" to proposed cherry-pick commands
Since we're using signed-off-by tags for backports, let's add -s to
the command so that we can finally copy-paste it!
Willy Tarreau [Mon, 25 Nov 2019 14:49:31 +0000 (15:49 +0100)]
SCRIPTS: create-release: show the correct origin name in suggested commands
create-release shows the next steps at the end and suggests using
"git push origin master" but on my machine it's not "origin" so let's
determine it using git config and only use origin as a fallback.
Willy Tarreau [Sun, 24 Nov 2019 13:57:00 +0000 (14:57 +0100)]
MINOR: mux-h2/trace: report the connection and/or stream error code
We were missing the error code when tracing a call to h2s_error() or
h2c_error(), let's report it when it's set.
Willy Tarreau [Sun, 24 Nov 2019 13:56:03 +0000 (14:56 +0100)]
MINOR: h2: add a function to report H2 error codes as strings
Just like we have frame type to string, let's have error to string to
improve debugging and traces.
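For reference, the mapping such a function provides can be sketched in Python
using the error codes defined by RFC 7540 section 7. The function name
h2_error_name() and the "UNKNOWN" fallback are assumptions made for this
sketch, not the names used in the patch.

```python
# HTTP/2 error codes as listed in RFC 7540, section 7.
H2_ERROR_NAMES = {
    0x0: "NO_ERROR",
    0x1: "PROTOCOL_ERROR",
    0x2: "INTERNAL_ERROR",
    0x3: "FLOW_CONTROL_ERROR",
    0x4: "SETTINGS_TIMEOUT",
    0x5: "STREAM_CLOSED",
    0x6: "FRAME_SIZE_ERROR",
    0x7: "REFUSED_STREAM",
    0x8: "CANCEL",
    0x9: "COMPRESSION_ERROR",
    0xA: "CONNECT_ERROR",
    0xB: "ENHANCE_YOUR_CALM",
    0xC: "INADEQUATE_SECURITY",
    0xD: "HTTP_1_1_REQUIRED",
}


def h2_error_name(code: int) -> str:
    """Return the symbolic name of an H2 error code, for traces and logs."""
    return H2_ERROR_NAMES.get(code, "UNKNOWN")
```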
Willy Tarreau [Sun, 24 Nov 2019 13:57:53 +0000 (14:57 +0100)]
BUG/MAJOR: mux-h2: don't try to decode a response HEADERS frame in idle state
Christopher found another issue in the H2 backend implementation that
results from a miss in the H2 spec: the processing of a HEADERS frame
is always permitted in IDLE state, but this doesn't make sense on the
response path! And here when facing such a frame, we try to decode it
while we didn't allocate any stream, so we end up trying to fill the
idle stream's buffer (read-only) and crash.
What we're doing here is that if we get a HEADERS frame in IDLE state
from a server, we terminate the connection with a PROTOCOL_ERROR. No
such transition seems to be permitted by the spec but it seems to be
the only sane solution.
This fix must be backported as far as 1.9. Note that in 2.0 and earlier
there's no h2_frame_check_vs_state() function, instead the check is
inlined in h2_process_demux().
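The added rule can be summarized by a short Python sketch. It only models
this single check and is not h2_frame_check_vs_state(); the function name and
its boolean parameters are hypothetical, while the numeric values come from
RFC 7540 (HEADERS frame type 0x1, PROTOCOL_ERROR code 0x1).

```python
from typing import Optional

H2_FT_HEADERS = 0x1           # HEADERS frame type (RFC 7540, section 6.2)
H2_ERR_PROTOCOL_ERROR = 0x1   # PROTOCOL_ERROR code (RFC 7540, section 7)


def check_idle_frame(frame_type: int, stream_is_idle: bool,
                     from_server: bool) -> Optional[int]:
    """Return the connection error code to report, or None when the frame
    is acceptable with respect to the rule modeled here."""
    if stream_is_idle and from_server and frame_type == H2_FT_HEADERS:
        # A response HEADERS frame on an idle stream makes no sense:
        # terminate the connection instead of trying to decode it.
        return H2_ERR_PROTOCOL_ERROR
    return None
```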
Willy Tarreau [Sun, 24 Nov 2019 09:34:39 +0000 (10:34 +0100)]
BUG/MAJOR: h2: make header field name filtering stronger
Tim Düsterhus found that the amount of sanitization we perform on HTTP
header field names received in H2 is insufficient. Currently we reject
upper case letters as mandated by RFC7540#8.1.2, but section 10.3 also
requires that intermediaries translating streams to HTTP/1 further
refine the filtering to also reject invalid names (which means any name
that doesn't match a token). There is a small trick here which is that
the colon character used to start pseudo-header names doesn't match a
token, so pseudo-header names fall into that category, thus we have to
swap the pseudo-header name lookup with this check so that we only check
from the second character (past the ':') in case of pseudo-header names.
Another possibility could have been to perform this check only in the
HTX-to-H1 transcoder but doing so would still expose the configured rules
and logs to such header names.
This fix must be backported as far as 1.8 since this bug could be
exploited and serve as the base for an attack. In 2.0 and earlier,
functions h2_make_h1_request() and h2_make_h1_trailers() must also
be adapted to sanitize requests coming in legacy mode.
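The name check described above can be sketched in Python as follows. This is
an illustration under stated assumptions, not the H2 decoder's code: the
function name is hypothetical, and the token alphabet is the "tchar" set from
RFC 7230 section 3.2.6.

```python
import string

# RFC 7230 "tchar": characters allowed in a token.
TOKEN_CHARS = frozenset(string.ascii_letters + string.digits
                        + "!#$%&'*+-.^_`|~")


def is_valid_h2_header_name(name: str) -> bool:
    """Accept only lowercase token names; for pseudo-headers, check from
    the second character (past the ':')."""
    body = name[1:] if name.startswith(":") else name
    if not body:
        return False
    for ch in body:
        if ch not in TOKEN_CHARS:
            return False  # not a token character (space, ':', CTL, ...)
        if ch in string.ascii_uppercase:
            return False  # upper case is rejected per RFC7540#8.1.2
    return True
```

Note that this sketch only validates the syntax of the name; whether a
pseudo-header name is a known one is a separate lookup.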
Willy Tarreau [Fri, 22 Nov 2019 15:02:43 +0000 (16:02 +0100)]
BUG/MAJOR: h2: reject header values containing invalid chars
Tim Düsterhus reported an annoying problem in the H2 decoder related to
an ambiguity in the H2 spec. The spec says in section 10.3 that HTTP/2
allows header field values that are not valid (since they're binary) and
at the same time that an H2 to H1 gateway must be careful to reject headers
whose values contain \0, \r or \n.
Till now, and for the sake of the ability to maintain end-to-end binary
transparency in H2-to-H2, the H2 mux wouldn't reject this since it does
not know what version will be used on the other side.
In theory we should in fact perform such a check when converting an HTX
header to H1. But this causes a problem as it means that all our rule sets,
sample fetches, captures, logs or redirects may still find an LF in a header
coming from H2. Also in 2.0 and older in legacy mode, the frames are instantly
converted to H1 and HTX couldn't help there. So this means that in practice
we must refrain from delivering such a header upwards, regardless of any
outgoing protocol consideration.
Applying such a lookup on all headers leaving the mux comes with a
significant performance hit, especially for large ones. A first attempt
was made at placing this into the HPACK decoder to refrain from learning
invalid literals but error reporting becomes more complicated. Additional
tests show that doing this within the HTX transcoding loop benefits from
the hot L1 cache, and that by skipping up to 8 bytes per iteration the
CPU cost remains within noise margin, around ~0.5%.
This patch must be backported as far as 1.8 since this bug could be
exploited and serve as the base for an attack. In 2.0 and earlier the
fix must also be added to functions h2_make_h1_request() and
h2_make_h1_trailers() to handle legacy mode. It relies on previous patch
"MINOR: ist: add ist_find_ctl()" to speed up the control bytes lookup.
All credits go to Tim for his detailed bug report and his initial patch.
Willy Tarreau [Fri, 22 Nov 2019 14:58:53 +0000 (15:58 +0100)]
MINOR: ist: add ist_find_ctl()
This new function looks for the first control character in a string (a
char whose value is between 0x00 and 0x1F included) and returns it, or
NULL if there is none. It is optimized for quickly evicting non-matching
strings and scans ~0.43 bytes per cycle. It can be used as an accelerator
when it's needed to look up several of these characters (e.g. CR/LF/NUL).
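The accelerator pattern mentioned above can be sketched in Python. This is a
byte-by-byte illustration only: the real function is optimized C returning a
pointer or NULL, whereas the hypothetical find_ctl() below returns an index
or -1, and has_forbidden_char() is an assumed caller checking for NUL, CR
and LF.

```python
FORBIDDEN = b"\x00\r\n"  # NUL, CR, LF


def find_ctl(data: bytes) -> int:
    """Return the index of the first control character (0x00..0x1F
    included) in data, or -1 if there is none."""
    for idx, byte in enumerate(data):
        if byte <= 0x1F:
            return idx
    return -1


def has_forbidden_char(value: bytes) -> bool:
    """Use find_ctl() as a fast pre-filter: most values contain no control
    character at all and are accepted after a single scan."""
    pos = find_ctl(value)
    if pos < 0:
        return False
    # Only the tail starting at the first control character needs the
    # more precise lookup.
    return any(byte in FORBIDDEN for byte in value[pos:])
```

A value containing only a tab, for instance, is reported by find_ctl() but
is not rejected by has_forbidden_char().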
William Lallemand [Mon, 25 Nov 2019 08:58:37 +0000 (09:58 +0100)]
BUG/MINOR: cli: fix out of bounds in -S parser
An out-of-bounds access occurs when the number of arguments is greater than
MAX_LINE_ARGS.
Fix issue #377.
Must be backported in 2.0 and 1.9.
William Dauchy [Sun, 24 Nov 2019 14:04:20 +0000 (15:04 +0100)]
CLEANUP: ssl: check if a transaction exists once before setting it
trivial patch to fix issue #351
Fixes: bc6ca7ccaa72 ("MINOR: ssl/cli: rework 'set ssl cert' as 'set/commit'")
Reported-by: Илья Шипицин <chipitsine@gmail.com>
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Tim Duesterhus [Sat, 23 Nov 2019 22:52:30 +0000 (23:52 +0100)]
BUG/MINOR: ssl: Stop passing dynamic strings as format arguments
gcc complains rightfully:
src/ssl_sock.c: In function ‘ssl_sock_prepare_all_ctx’:
src/ssl_sock.c:5507:3: warning: format not a string literal and no format arguments [-Wformat-security]
ha_warning(errmsg);
^
src/ssl_sock.c:5509:3: warning: format not a string literal and no format arguments [-Wformat-security]
ha_alert(errmsg);
^
src/ssl_sock.c: In function ‘cli_io_handler_commit_cert’:
src/ssl_sock.c:10208:3: warning: format not a string literal and no format arguments [-Wformat-security]
chunk_appendf(trash, err);
Introduced in 8b453912ce9a4e1a3b1329efb2af04d1e470852e.
Lukas Tribus [Sun, 24 Nov 2019 17:20:40 +0000 (18:20 +0100)]
BUG/MINOR: ssl: fix curve setup with LibreSSL
Since commit 9a1ab08 ("CLEANUP: ssl-sock: use HA_OPENSSL_VERSION_NUMBER
instead of OPENSSL_VERSION_NUMBER") we restrict LibreSSL to the OpenSSL
1.0.1 API, to avoid breaking LibreSSL every minute. We set
HA_OPENSSL_VERSION_NUMBER to 0x1000107fL if LibreSSL is detected and
only allow curves to be configured if HA_OPENSSL_VERSION_NUMBER is at
least 0x1000200fL.
However all relevant LibreSSL releases actually support setting curves,
which is now broken. Fix this by always allowing curve configuration when
using LibreSSL.
Reported on GitHub in issue #366.
Fixes: 9a1ab08 ("CLEANUP: ssl-sock: use HA_OPENSSL_VERSION_NUMBER instead
of OPENSSL_VERSION_NUMBER").
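The version arithmetic behind this bug can be checked with a few lines of
Python. The two constants are the ones quoted in the message above; the
function names and the is_libressl flag are hypothetical and only model the
condition, not the actual preprocessor tests.

```python
LIBRESSL_PINNED_VERSION = 0x1000107F  # value forced when LibreSSL is detected
CURVES_MIN_VERSION = 0x1000200F       # minimum version allowing curve setup


def curves_allowed_before_fix(ha_version: int) -> bool:
    """Original condition: only the version number is considered."""
    return ha_version >= CURVES_MIN_VERSION


def curves_allowed_after_fix(ha_version: int, is_libressl: bool) -> bool:
    """Fixed condition: curve configuration is always allowed with LibreSSL."""
    return is_libressl or ha_version >= CURVES_MIN_VERSION
```

Since 0x1000107f is lower than 0x1000200f, the original condition could
never be true for LibreSSL, which is why curve configuration was broken.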
William Dauchy [Sat, 23 Nov 2019 20:14:33 +0000 (21:14 +0100)]
MINOR: ssl: fix possible null dereference in error handling
Recent commit 8b453912ce9a ("MINOR: ssl: ssl_sock_prepare_ctx() return an
error code") converted all error handling; in that patch `err` is always
tested, except in three places. I did not find a plausible explanation for
that.
This should fix issue #374.
Fixes: 8b453912ce9a ("MINOR: ssl: ssl_sock_prepare_ctx() return an error code")
Reported-by: Илья Шипицин <chipitsine@gmail.com>
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Christopher Faulet [Fri, 22 Nov 2019 14:34:17 +0000 (15:34 +0100)]
DOC: Add documentation about the use-service action
The use-service action may be used in tcp-request and http-request rules. It was
added to customize HAProxy's reply to a client using an applet (initially a Lua
applet). But the documentation was missing.
This patch may be backported as far as 1.6.
Willy Tarreau [Thu, 21 Nov 2019 17:48:20 +0000 (18:48 +0100)]
MINOR: version: emit the link to the known bugs in output of "haproxy -v"
The link to the known bugs page for the current version is built and
reported there. When it is a development version (less than 2 dots),
instead a link to github open issues is reported as there's no way to
be sure about the current situation in this case and it's better that
users report their trouble there.
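The "less than 2 dots" rule can be sketched in Python. The function names
and the returned labels are hypothetical; the sketch deliberately returns a
description instead of a URL, since the exact links are not given here.

```python
def is_development_version(version: str) -> bool:
    """A development version has fewer than two dots, e.g. "2.1-dev5"."""
    return version.count(".") < 2


def bug_report_target(version: str) -> str:
    """Tell which kind of link would be reported for this version."""
    if is_development_version(version):
        return "github open issues"
    return "known bugs page for " + version
```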
Willy Tarreau [Thu, 21 Nov 2019 17:07:30 +0000 (18:07 +0100)]
MINOR: version: report the version status in "haproxy -v"
As discussed on Discourse here:
https://discourse.haproxy.org/t/haproxy-branch-support-lifetime/4466
it's not always easy for end users to know the lifecycle of the version
they are using. This patch introduces a "Status" line in the output of
"haproxy -vv" indicating whether it's a development, stable, long-term
supported version, possibly with an estimated end of life for the branch
when it can be anticipated (e.g. for stable versions). This field should
be adjusted when creating a major release to reflect the new status.
It may make sense to backport this to other branches to clarify the
situation.
William Lallemand [Thu, 21 Nov 2019 15:41:07 +0000 (16:41 +0100)]
MINOR: ssl/cli: display warning during 'commit ssl cert'
Display the warnings on the CLI during a commit of the certificates.
William Lallemand [Thu, 21 Nov 2019 15:30:34 +0000 (16:30 +0100)]
MEDIUM: ssl/cli: apply SSL configuration on SSL_CTX during commit
Apply the configuration of the ssl_bind_conf on the generated SSL_CTX.
It's a little bit hacky at the moment because the ssl_sock_prepare_ctx()
function was made for the configuration parsing, not for being used at
runtime. Only the 'verify' bind keyword seems to cause a file access so
we prevent it before calling the function.
William Lallemand [Thu, 21 Nov 2019 14:48:10 +0000 (15:48 +0100)]
MINOR: ssl: ssl_sock_prepare_ctx() return an error code
Rework ssl_sock_prepare_ctx() so it fills a buffer with the error
messages instead of using ha_alert()/ha_warning(). Also returns an error
code (ERR_*) instead of the number of errors.
Daniel Corbett [Sun, 17 Nov 2019 14:48:56 +0000 (09:48 -0500)]
MEDIUM: dns: Add resolve-opts "ignore-weight"
It was noted in #48 that there are times when a configuration
may use the server-template directive with SRV records and
simultaneously want to control weights using an agent-check or
through the runtime api. This patch adds a new option
"ignore-weight" to the "resolve-opts" directive.
When specified, any weight indicated within an SRV record will
be ignored. This is for both initial resolution and ongoing
resolution.
Willy Tarreau [Wed, 20 Nov 2019 15:45:15 +0000 (16:45 +0100)]
DOC: internal: document the init calls
INITCALLs are used a lot in the code now and were not documented, resulting
in each user having to grep for functions in other files. This doc is not
perfect but aims at improving the situation. It documents what's been
available since 1.9 and may be backported there if it helps though it's
unlikely to be needed as it's mostly aimed at developers.
Christopher Faulet [Wed, 20 Nov 2019 15:42:00 +0000 (16:42 +0100)]
BUG/MINOR: stream-int: Fix si_cs_recv() return value
The previous patch on this function (36b536d6c "BUG/MEDIUM: stream-int: Don't
loose events on the CS when an EOS is reported") contains a bug. The return
value is based on the conn-stream's flags. But they may be reset if the CS is
closed. Ironically it was exactly the purpose of this patch...
This patch must be backported to 2.0 and 1.9.
Christopher Faulet [Tue, 19 Nov 2019 15:27:25 +0000 (16:27 +0100)]
DOC: Update http-buffer-request description to remove the part about chunks
The limitation on the first chunk for chunked requests was true for the legacy
HTTP mode. But, it does not exist with the HTX. Because the legacy HTTP mode was
removed in 2.1, this limitation does not exist anymore.
Christopher Faulet [Fri, 15 Nov 2019 10:29:40 +0000 (11:29 +0100)]
BUG/MINOR: http-ana: Properly catch aborts during the payload forwarding
When no data filters are registered on a channel, if the message length is known,
the HTTP payload is infinitely forwarded to save calls to process_stream(). When
we finally fall back again in XFER_BODY analyzers, we detect the end of the
message by checking channel flags. If CF_EOI or CF_SHUTR is set, we switch the
message in DONE state. For CF_EOI, it is relevant. But not for CF_SHUTR. A
shutdown for reads without the end of input must be interpreted as an abort for
messages with a known length.
Because of this bug, some aborts are not properly handled and reported. Instead,
we interpret it as a legitimate shutdown.
This patch must be backported to 2.0.
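The corrected decision can be sketched in Python for messages with a known
length. The flag values and the returned state names are placeholders chosen
for the illustration; only the relative priority of CF_EOI and CF_SHUTR is
taken from the description above.

```python
from enum import IntFlag


class ChanFlag(IntFlag):
    """Hypothetical stand-ins for the channel flags; values are arbitrary."""
    NONE = 0
    EOI = 0x1    # end of input reached
    SHUTR = 0x2  # shutdown for reads


def classify_known_length_forwarding(flags: ChanFlag) -> str:
    """Decide what happened once the XFER_BODY analyzer runs again."""
    if flags & ChanFlag.EOI:
        return "DONE"        # the whole message was received
    if flags & ChanFlag.SHUTR:
        return "ABORT"       # read side closed before the end of input
    return "FORWARDING"      # nothing special, keep forwarding
```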
Christopher Faulet [Fri, 15 Nov 2019 10:14:23 +0000 (11:14 +0100)]
BUG/MINOR: mux-h1: Fix tunnel mode detection on the response path
There are two issues with the way tunnel mode is detected on the response
path. First, when a response with an unknown content length is handled, the
request is also switched in tunnel mode. It is obviously wrong. Because it was
done on the server side only (so not during the request parsing), it has no
noticeable effects.
The second issue is about the way protocol upgrades are handled. The request is
switched in tunnel mode from the time the 101 response is processed. So an
unfinished request may be switched in tunnel mode too early. It is not a common
use, but a protocol upgrade on a POST is allowed. Thus, parsing of the payload
may be hijacked. It is especially bad for chunked payloads.
Now, conditions to switch the request in tunnel mode reflect what should be
done. Especially for the second issue. We wait for the end of the request to switch
it in tunnel mode.
This patch must be backported to 2.0 and 1.9. Note that these versions are only
affected by the second issue but the patch cannot be easily split.
Christopher Faulet [Mon, 18 Nov 2019 14:50:25 +0000 (15:50 +0100)]
BUILD: debug: Avoid warnings in dev mode with -02 because of some BUG_ON tests
Some BUG_ON() tests emit a warning because of a potential null pointer
dereference on an HTX block. In fact, it should never happen, but now, GCC is
happy.
This patch must be backported to 2.0.
Christopher Faulet [Tue, 19 Nov 2019 13:18:24 +0000 (14:18 +0100)]
MINOR: contrib/prometheus-exporter: Add a param to ignore servers in maintenance
By passing the parameter "no-maint" in the query-string, it is now possible to
ignore servers in maintenance. It means that the metrics for servers in this
state will not be exported.
Christopher Faulet [Mon, 18 Nov 2019 13:47:08 +0000 (14:47 +0100)]
MINOR: contrib/prometheus-exporter: filter exported metrics by scope
Now, the prometheus exporter parses the HTTP query-string to filter or to adapt
the exported metrics. In this first version, it is only possible to select the
scopes of metrics to export. To do so, one or more parameters with "scope" as
name must be passed in the query-string, with one of those values: global,
frontend, backend, server or '*' (means all). A scope parameter with no value
means to filter out all scopes (nothing is returned). The scope parameters are
parsed in their appearance order in the query-string. So an empty scope will
reset all scopes already parsed. But it can be overridden by following scope
parameters in the query-string. By default everything is exported.
The filtering can also be done on prometheus scraping configuration, but general
aim is to optimise the source of data to improve load and scraping time. This is
particularly true for huge configurations with thousands of backends and servers.
Also note that this configuration was possible on the previous official haproxy
exporter but with even more parameters to select the needed metrics. Here we
thought it was sufficient to simply avoid a given type of metric. However, more
filters are still possible.
Thanks to William Dauchy. This patch is based on his work.
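The scope selection rules described above can be sketched in Python. This is
one reading of the commit message, not the exporter's code: select_scopes()
is a hypothetical name, it is assumed that the first explicit scope replaces
the "everything" default, and rejecting unknown values with an exception is
also an assumption.

```python
from urllib.parse import unquote

ALL_SCOPES = frozenset({"global", "frontend", "backend", "server"})


def select_scopes(query: str) -> frozenset:
    """Return the set of scopes to export for a given query-string."""
    selected = set(ALL_SCOPES)  # by default everything is exported
    seen_scope = False
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if unquote(key) != "scope":
            continue
        value = unquote(value)
        if not seen_scope:
            # Assumption: the first explicit scope replaces the default.
            selected = set()
            seen_scope = True
        if value == "":
            selected = set()             # empty scope resets everything
        elif value == "*":
            selected = set(ALL_SCOPES)   # '*' means all scopes
        elif value in ALL_SCOPES:
            selected.add(value)          # later scopes add to the selection
        else:
            raise ValueError("unknown scope: " + value)
    return frozenset(selected)
```

For example, "scope=global&scope=&scope=server" ends up exporting only the
server scope, because the empty scope resets what was parsed before it.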
Christopher Faulet [Wed, 20 Nov 2019 10:56:33 +0000 (11:56 +0100)]
BUG/MEDIUM: stream-int: Don't loose events on the CS when an EOS is reported
In si_cs_recv(), when a shutdown for reads is handled, the conn-stream may be
closed. It happens when the output channel is closed for writes or if
SI_FL_NOHALF is set on the stream-interface. In this case, conn-stream's flags
are reset. Thus, if an error (CS_FL_ERROR) or an end of input (CS_FL_EOI) is
reported by the mux, the event is lost. si_cs_recv() does not report these
events by itself. It relies on si_cs_process() to report them to the
stream-interface and/or the channel.
For instance, if CS_FL_EOS and CS_FL_EOI are set by the H1 multiplexer during a
call to si_cs_recv() on the server side, if the conn-stream is closed (read0 +
SI_FL_NOHALF), the CS_FL_EOI flag is lost. Thus, this may lead the stream to
interpret it as a server abort.
Now, conn-stream's flags are processed at the end of si_cs_recv(). The function
is responsible for setting the right flags on the stream-interface and/or the
channel. Due to this patch, the function is now almost linear. Except for some
early checks at the beginning, there is only one return statement. It also fixes a
potential bug because of an inconsistency between the splicing and the buffered
receipt. In the first case, CS_FL_EOS is handled before errors on the connection
or the conn-stream. In the second one, it is the opposite.
This patch must be backported to 2.0 and 1.9.
Eric Salama [Wed, 20 Nov 2019 10:33:40 +0000 (11:33 +0100)]
BUILD/MINOR: ssl: fix compiler warning about useless statement
There is a compiler warning after commit
a9363eb6 ("BUG/MEDIUM: ssl:
'tune.ssl.default-dh-param' value ignored with openssl > 1.1.1"):
src/ssl_sock.c: In function 'ssl_sock_prepare_ctx':
src/ssl_sock.c:4481:4: error: statement with no effect [-Werror=unused-value]
Fix it by adding a (void)
Frédéric Lécaille [Wed, 20 Nov 2019 10:17:30 +0000 (11:17 +0100)]
BUG/MINOR: peers: "peer alive" flag not reset when deconnecting.
The peer flags (->flags member of peer struct) are reset by __peer_session_deinit()
function. PEER_F_ALIVE flag which is used by the heartbeat part of the peer protocol
to mark a peer as being alive was not reset by this function. This simple patch adds
the statement to reset it.
Note that, at this time, there was no identified issue due to this missing reset.
Must be backported to 2.0.
William Lallemand [Tue, 19 Nov 2019 16:04:18 +0000 (17:04 +0100)]
BUG/MEDIUM: mworker: don't fill the -sf argument with -1 during the reexec
Upon a reexec_on_failure, if the process tried to exit after the
initialization of the process structure but before it was filled with a
PID, the PID in the mworker_proc structure is set to -1.
In this particular case the -sf argument is filled with -1 and haproxy
will exit with the usage message because of that argument.
Should be backported in 2.0.
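The effect of the fix can be sketched in Python: PIDs that were never filled
in (-1) are left out when the argument list is built. build_sf_args() is a
hypothetical helper, and omitting "-sf" entirely when no valid PID remains is
an assumption of this sketch.

```python
from typing import Iterable, List


def build_sf_args(pids: Iterable[int]) -> List[str]:
    """Build the "-sf <pid>..." arguments for the re-executed process,
    ignoring entries whose PID was never set (-1)."""
    valid = [str(pid) for pid in pids if pid > 0]
    if not valid:
        return []
    return ["-sf"] + valid
```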
William Lallemand [Tue, 19 Nov 2019 14:51:51 +0000 (15:51 +0100)]
MINOR: ssl/cli: 'abort ssl cert' deletes an on-going transaction
This patch introduces the new CLI command 'abort ssl cert' which aborts
an on-going transaction and frees its content.
This command takes the filename of the transaction as an
argument.
Frédéric Lécaille [Wed, 13 Nov 2019 16:50:34 +0000 (17:50 +0100)]
BUG/MINOR: peers: Wrong null "server_name" data field handling.
As the peers protocol expects to parse at least one encoded integer value for
each stick-table data field even when not configured on the local side,
about the "server_name" data field we must emit something even if it has
not been set (no server was configured for instance).
As this data field is made of first one encoded integer which is the length
of the remaining data (the dictionary cache entry), we encode the length 0
when emitting such an absent dictionary cache entry.
On the remote side, when we decode such an integer with 0 as value, we stop
parsing the data field and that's it.
Must be backported to 2.0.
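The "length 0 means absent" convention can be sketched in Python. This is a
simplified model, not the peers wire format: the length is written as a
single byte here (an assumption that only holds for small values), the
payload is reduced to a raw name, and both function names are hypothetical.

```python
from typing import Optional, Tuple


def encode_server_name(name: Optional[bytes]) -> bytes:
    """Always emit a length, even when no server name is set."""
    if not name:
        return bytes([0])  # absent entry: encode the length 0
    if len(name) > 239:
        raise ValueError("name too long for this single-byte sketch")
    return bytes([len(name)]) + name


def decode_server_name(buf: bytes,
                       offset: int = 0) -> Tuple[Optional[bytes], int]:
    """Return (name or None, offset of the next data field)."""
    length = buf[offset]
    offset += 1
    if length == 0:
        return None, offset  # stop parsing this data field, that's it
    return buf[offset:offset + length], offset + length
```

Because something is always emitted, the remote side stays aligned on the
next data field whether or not a server name was set.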
Frédéric Lécaille [Thu, 7 Nov 2019 14:22:33 +0000 (15:22 +0100)]
MINOR: peers: Add debugging information to "show peers".
This patch adds three counters to help in debugging peers protocol issues
to "peer" struct:
->no_hbt counts the number of reconnection periods without receiving a heartbeat.
->new_conn counts the number of reconnections after ->reconnect timeout expirations.
->proto_err counts the number of protocol errors.
Frédéric Lécaille [Wed, 6 Nov 2019 10:51:26 +0000 (11:51 +0100)]
MINOR: peers: Add TX/RX heartbeat counters.
Add RX/TX heartbeat counters to "peer" struct to have an idea about which
peer is alive or not.
Dump these counters values on the CLI via "show peers" command.
Frédéric Lécaille [Wed, 6 Nov 2019 09:41:03 +0000 (10:41 +0100)]
MINOR: peers: Alway show the table info for disconnected peers.
This patch enables us to dump the stick-table information of remote or local peers
without an already opened peer session. This may also be the case for the local peer
during synchronizations with an old process (reload).
Emmanuel Hocdet [Mon, 4 Nov 2019 14:49:46 +0000 (15:49 +0100)]
BUG/MINOR: ssl: fix crt-list neg filter for openssl < 1.1.1
Certificate selection in client_hello_cb (openssl >= 1.1.1) correctly
handles crt-list neg filter. Certificate selection for openssl < 1.1.1
has not been touched for a while: crt-list neg filter is not the same
as its counterpart and is wrong. Fix it to mimic the same behavior
as its counterpart.
It should be backported as far as 1.6.
Emmanuel Hocdet [Mon, 4 Nov 2019 17:19:32 +0000 (18:19 +0100)]
BUG/MINOR: ssl: ssl_pkey_info_index ex_data can store a dereferenced pointer
With CLI cert update, sni_ctx can be removed at runtime. ssl_pkey_info_index
ex_data is filled with one of the sni_ctx.kinfo pointers but SSL_CTX can be shared
between sni_ctx. Removing and freeing a sni_ctx can lead to a segfault when
ssl_pkey_info_index ex_data is used (in ssl_sock_get_pkey_algo). Removing the
dependency on ssl_pkey_info_index ex_data is the easiest way to fix the issue.
William Dauchy [Sun, 17 Nov 2019 14:47:16 +0000 (15:47 +0100)]
MINOR: init: avoid code duplication while setting identify
since the introduction of mworker, the setuid/setgid was duplicated in
two places; try to improve that by creating a dedicated function.
this patch does not introduce any functional change.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
William Dauchy [Sun, 17 Nov 2019 14:47:15 +0000 (15:47 +0100)]
BUG/MINOR: init: fix set-dumpable when using uid/gid
in mworker mode used with uid/gid settings, it was not possible to get
a coredump despite the set-dumpable option.
indeed prctl(2) manual page specifies the dumpable attribute is reverted
to `/proc/sys/fs/suid_dumpable` in a few conditions such as process
effective user and group are changed.
this patch moves the whole set-dumpable logic before the polling code in
order to catch all possible cases where we could have changed the
uid/gid. It however does not cover the possible segfault at startup.
this patch should be backported in 2.0.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Willy Tarreau [Fri, 15 Nov 2019 17:49:37 +0000 (18:49 +0100)]
[RELEASE] Released version 2.1-dev5
Released version 2.1-dev5 with the following main changes :
- BUG/MEDIUM: ssl/cli: don't alloc path when cert not found
- BUG/MINOR: ssl/cli: unable to update a certificate without bundle extension
- BUG/MINOR: ssl/cli: fix an error when a file is not found
- MINOR: ssl/cli: replace the default_ctx during 'commit ssl cert'
- DOC: fix date and http_date keywords syntax
- MINOR: peers: Add "log" directive to "peers" section.
- BUG/MEDIUM: mux-h1: Disable splicing for chunked messages
- BUG/MEDIUM: stream: Be sure to support splicing at the mux level to enable it
- MINOR: flt_trace: Rename macros to print trace messages
- MINOR: trace: Add a set of macros to trace events if HA is compiled with debug
- MEDIUM: stream/trace: Register a new trace source with its events
- MINOR: doc: http-reuse connection pool fix
- BUG/MEDIUM: stream: Be sure to release allocated captures for TCP streams
- MINOR: http-ana: Remove the unused function http_reset_txn()
- BUG/MINOR: action: do-resolve now use cached response
- BUG: dns: timeout resolve not applied for valid resolutions
- DOC: management: fix typo on "cache_lookups" stats output
- BUG/MINOR: stream: init variables when the list is empty
- BUG/MEDIUM: tasks: Make tasklet_remove_from_tasklet_list() no matter the tasklet.
- BUG/MINOR: queue/threads: make the queue unlinking atomic
- BUG/MEDIUM: Make sure we leave the session list in session_free().
- CLEANUP: session: slightly simplify idle connection cleanup logic
- MINOR: memory: also poison the area on freeing
- CLEANUP: cli: use srv_shutdown_streams() instead of open-coding it
- CLEANUP: stats: use srv_shutdown_streams() instead of open-coding it
- BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition
- BUILD: contrib/da: remove an "unused" warning
- BUG/MEDIUM: filters: Don't call TCP callbacks for HTX streams
- MEDIUM: filters: Adapt filters API to allow again TCP filtering on HTX streams
- MINOR: freq_ctr: Make the sliding window sums thread-safe
- MINOR: stream: Remove the lock on the proxy to update time stats
- MINOR: counters: Add fields to store the max observed for {q,c,d,t}_time
- MINOR: stats: Report max times in addition of the averages for sessions
- MINOR: contrib/prometheus-exporter: Report metrics about max times for sessions
- BUG/MINOR: contrib/prometheus-exporter: Rename some metrics
- MINOR: contrib/prometheus-exporter: report the number of idle conns per server
- DOC: Add missing stats fields in the management manual
- BUG/MINOR: mux-h1: Properly catch parsing errors on payload and trailers
- BUG/MINOR: mux-h1: Don't set CS_FL_EOS on a read0 when receiving data to pipe
- MINOR: mux-h1: Set EOI on the conn-stream when EOS is reported in TUNNEL state
- MINOR: sink: Set the default max length for a message to BUFSIZE
- MINOR: ring: make the parse function automatically set the handler/release
- BUG/MINOR: log: make "show startup-log" use a ring buffer instead
- MINOR: stick-table: allow sc-set-gpt0 to set value from an expression
Cédric Dufour [Wed, 6 Nov 2019 17:38:53 +0000 (18:38 +0100)]
MINOR: stick-table: allow sc-set-gpt0 to set value from an expression
Allow the sc-set-gpt0 action to set GPT0 to a value dynamically evaluated from
its <expr> argument (in addition to the existing static <int> alternative).
Willy Tarreau [Fri, 15 Nov 2019 14:16:57 +0000 (15:16 +0100)]
BUG/MINOR: log: make "show startup-log" use a ring buffer instead
The copy of the startup logs used to rely on a memory area re-allocated
on the fly, which the CLI would attempt to deliver at once. But if it's
too large (too many warnings), it will take time to start up, and
may not even show up on the CLI as it doesn't fit in a buffer.
The ring buffer infrastructure solves all this with no more code, let's
switch to this instead. It simply requires a parsing function to attach
the ring via ring_attach_cli() and all the rest is automatically handled.
Initially this was imagined as a code cleanup, until a test with a config
involving 100k backends and just one occurrence of
"load-server-state-from-file global" in the defaults section took approx
20 minutes to parse due to the O(N^2) cost of concatenating the warnings
resulting in ~1 TB of data to be copied, while it took only 0.57s with
the ring.
Ideally this patch should be backported to 2.0 and 1.9, though it relies
on the ring infrastructure which will then also need to be backported.
Configs able to trigger the bug are uncommon, so another workaround for
older versions without backporting the rings would consist in simply
limiting the size of the error message in print_message() to something
always printable, which will only return the first errors.
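The quadratic cost mentioned above can be approximated with a small cost model. This is only a sketch under stated assumptions: one warning per backend, about 200 bytes per warning (a guess, not a figure from the commit), and a full copy of the accumulated area on every append. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes copied when each append copies the whole accumulated area
 * (worst case of growing a buffer by re-allocation on every message).
 */
static unsigned long long concat_copy_cost(size_t count, size_t msg_len)
{
    unsigned long long total = 0;
    unsigned long long cur = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        cur += msg_len;  /* area size once this message is appended */
        total += cur;    /* old contents plus the new message get copied */
    }
    return total;
}

/* Bytes copied when each message is written exactly once, as with a ring. */
static unsigned long long ring_copy_cost(size_t count, size_t msg_len)
{
    return (unsigned long long)count * msg_len;
}
```

Under these assumptions, concat_copy_cost(100000, 200) is about 10^12 bytes, in line with the ~1 TB reported above, while ring_copy_cost(100000, 200) is 20 MB.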
Willy Tarreau [Fri, 15 Nov 2019 14:07:21 +0000 (15:07 +0100)]
MINOR: ring: make the parse function automatically set the handler/release
ring_attach_cli() is called by the keyword parsing function to dump a
ring to the CLI. It can only work with a specific handler and release
function. Let's make it set them appropriately instead of having the
caller know these functions. This way adding a command to dump a ring
is as simple as declaring a parsing function calling ring_attach_cli().
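The design can be sketched as follows. All types and names here (struct cli_ctx, sketch_ring_attach_cli(), the handler and release functions) are simplified, hypothetical stand-ins and do not reflect HAProxy's actual structures or signatures. The point is that the attach function installs the handler and release callbacks itself, so the keyword parser only needs one call.

```c
#include <stddef.h>

/* Simplified stand-in for a ring; the real type is more complex. */
struct ring {
    int dummy;
};

/* Hypothetical CLI context, not HAProxy's actual layout. */
struct cli_ctx {
    int  (*io_handler)(struct cli_ctx *ctx);
    void (*io_release)(struct cli_ctx *ctx);
    struct ring *ring;
};

/* The only handler/release pair able to dump a ring in this sketch. */
static int ring_dump_handler(struct cli_ctx *ctx)
{
    (void)ctx;
    return 1; /* pretend the dump completed */
}

static void ring_dump_release(struct cli_ctx *ctx)
{
    ctx->ring = NULL;
}

/* Attach the ring and install the matching callbacks in one place. */
static int sketch_ring_attach_cli(struct ring *ring, struct cli_ctx *ctx)
{
    if (!ring || !ctx)
        return -1;
    ctx->ring = ring;
    ctx->io_handler = ring_dump_handler;
    ctx->io_release = ring_dump_release;
    return 0;
}

/* A keyword parser then reduces to a single call. */
static int cli_parse_show_example(struct ring *ring, struct cli_ctx *ctx)
{
    return sketch_ring_attach_cli(ring, ctx);
}
```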
Christopher Faulet [Fri, 15 Nov 2019 14:10:12 +0000 (15:10 +0100)]
MINOR: sink: Set the default max length for a message to BUFSIZE
It was set to MAX_SYSLOG_LEN (1K). It is a bit short to print debug
traces, especially when part of a buffer is dumped. Now, the maximum length is
set to BUFSIZE (16K).
Christopher Faulet [Fri, 15 Nov 2019 08:50:22 +0000 (09:50 +0100)]
MINOR: mux-h1: Set EOI on the conn-stream when EOS is reported in TUNNEL state
It could help to distinguish client/server aborts from legitimate shutdowns for
reads.
Christopher Faulet [Fri, 15 Nov 2019 08:41:32 +0000 (09:41 +0100)]
BUG/MINOR: mux-h1: Don't set CS_FL_EOS on a read0 when receiving data to pipe
This is mandatory to process input one more time to add the EOM in the HTX
message and to set CS_FL_EOI on the conn-stream. Otherwise, in the stream, a
SHUTR will be reported on the corresponding channel without the EOI. It may be
erroneously interpreted as an abort.
This patch must be backported to 2.0 and 1.9.
Christopher Faulet [Fri, 15 Nov 2019 08:36:28 +0000 (09:36 +0100)]
BUG/MINOR: mux-h1: Properly catch parsing errors on payload and trailers
Errors during the payload or the trailers parsing are reported with the
HTX_FL_PARSING_ERROR flag on the HTX message and not a negative return
value. This change was introduced when the functions to convert an H1 message to
an HTX one were moved to a dedicated file. But the h1 mux was not fully updated
accordingly.
No backport needed except if the commits about file h1_htx.c are backported.
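The class of bug can be illustrated with a minimal sketch. The parser, the structure and the flag value below are hypothetical (only the HTX_FL_PARSING_ERROR name comes from the message above). The parser never returns a negative value, so the caller has to test the flag on the message rather than the return value.

```c
#include <stddef.h>

/* Illustrative value only; the real flag is HTX_FL_PARSING_ERROR. */
#define SKETCH_HTX_FL_PARSING_ERROR 0x1u

/* Hypothetical, minimal stand-in for an HTX message. */
struct sketch_htx {
    unsigned int flags;
};

/* Hypothetical parser: returns the number of bytes consumed and reports
 * errors only through the message flags. A NUL byte is treated as invalid.
 */
static size_t sketch_parse_payload(struct sketch_htx *htx,
                                   const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0') {
            htx->flags |= SKETCH_HTX_FL_PARSING_ERROR;
            break;
        }
    }
    return i;
}

/* Caller side: test the flag, since the return value is never negative. */
static int sketch_mux_process(struct sketch_htx *htx, const char *buf,
                              size_t len, size_t *consumed)
{
    *consumed = sketch_parse_payload(htx, buf, len);
    if (htx->flags & SKETCH_HTX_FL_PARSING_ERROR)
        return -1;
    return 0;
}
```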
Christopher Faulet [Fri, 8 Nov 2019 14:27:27 +0000 (15:27 +0100)]
DOC: Add missing stats fields in the management manual
The following fields were missing: srv_icur, srv_ilim, qtime_max, ctime_max,
rtime_max and ttime_max.
Christopher Faulet [Fri, 8 Nov 2019 14:24:32 +0000 (15:24 +0100)]
MINOR: contrib/prometheus-exporter: report the number of idle conns per server
This adds two extra metrics per server, one for the current number of idle
connections and one for the configured limit :
* haproxy_server_idle_connections_current
* haproxy_server_idle_connections_limit
Christopher Faulet [Fri, 8 Nov 2019 14:12:29 +0000 (15:12 +0100)]
BUG/MINOR: contrib/prometheus-exporter: Rename some metrics
The following metrics have been renamed without the "_http" part :
* http_queue_time_average_seconds => queue_time_average_seconds
* http_connect_time_average_seconds => connect_time_average_seconds
* http_response_time_average_seconds => response_time_average_seconds
* http_total_time_average_seconds => total_time_average_seconds
These metrics are reported per backend and per server and are not specific to
HTTP sessions.