Repositories - haproxy-3.1.git/log

BUG/MINOR: server: fix the "server-template" prefix memory leak

The srv->tmpl_info.prefix was not freed in srv_free_params().

This could be backported to all stable versions.

(cherry picked from commit 0ae7a5d672f61cd4a949bf081b61857f6bbad476)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: server: properly initialize PROXY v2 TLVs

The PROXY v2 TLVs were not properly initialized when defined with
"set-proxy-v2-tlv-fmt" keyword, which could have caused a crash when
validating the configuration or malfunction (e.g. when used in
combination with "server-template" and/or "default-server").

The issue was introduced with commit 6f4bfed3a ("MINOR: server: Add
parser support for set-proxy-v2-tlv-fmt").

This should be backported up to 2.9.

(cherry picked from commit 6838fe43a320cf090892451ee907967666b626e5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h2: always trim leading and trailing LWS in header values

Annika Wickert reported some occasional disconnections between haproxy
and varnish when communicating over HTTP/2, with varnish complaining
about protocol errors while captures looked apparently normal. Nils
Goroll managed to reproduce this on varnish by injecting the capture of
the outgoing haproxy traffic and noticed that haproxy was forwarding a
header value containing a trailing space, which is now explicitly
forbidden since RFC9113.

It turns out that the only way for such a header to pass through haproxy
is to arrive in h2 and not be edited, in which case it will arrive in
HTX with its undesired spaces. Since the code dealing with HTX headers
always trims spaces around them, these are not observable in dumps, but
only when started in debug mode (-d). Conversions to/from h1 also drop
the spaces.

With this patch we trim LWS both on input and on output. This way we
always present clean headers in the whole stack, and even if some are
manually crafted by the configuration or Lua, they will be trimmed on
the output.

This must be backported to all stable versions.

Thanks to Annika for the helpful capture and Nils for the help with
the analysis on the varnish side!

(cherry picked from commit bbf824933f71eca90b5f07a51fa93f4fa7ac2256)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper

The stream dump function is called from signal handlers (warning, show
threads, panic). It makes use of read_freq_ctr() which might possibly
block if it tries to access a locked freq_ctr in the process of being
updated, e.g. by the current thread.

Here we're relying on the non-blocking API instead. It may return incorrect
values (typically smaller ones after resetting the curr counter) but at
least it will not block.

This needs to be backported to stable versions along with the previous
commit below:

MINOR: freq_ctr: provide non-blocking read functions

At least 3.1 is concerned as the warnings tend to increase the risk of
this situation appearing.

(cherry picked from commit 3c22fa315bcb2945d588cb64302f6ba5c89b382e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: freq_ctr: provide non-blocking read functions

Some code called by the debug handlers in the context of a signal handler
accesses to some freq_ctr and occasionally ends up on a locked one from
the same thread that is dumping it. Let's introduce a non-blocking version
that at least allows to return even if the value is in the process of being
updated, it's less problematic than hanging.

(cherry picked from commit 29e246a84ce27af63779d98b305ad53877ae9acc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stream: never allocate connection addresses from signal handler

In __strm_dump_to_buffer(), we call conn_get_src()/conn_get_dst() to try
to retrieve the connection's IP addresses. But this function may be called
from a signal handler to dump a currently running stream, and if the
addresses were not allocated yet, a poll_alloc() will be performed while
we might possibly already be running pools code, resulting in pool list
corruption.

Let's just make sure we don't call these sensitive functions there when
called from a signal handler.

This must be backported at least to 3.1 and ideally all other versions,
along with this previous commit:

MINOR: tinfo: add a new thread flag to indicate a call from a sig handler

(cherry picked from commit 84d4c948fce4b31220bb9a30cbf676613bbbf4f2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: tinfo: add a new thread flag to indicate a call from a sig handler

Signal handlers must absolutely not change anything, but some long and
complex call chains may look innocuous at first glance, yet result in
some subtle write accesses (e.g. pools) that can conflict with a running
thread being interrupted.

Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when
entering a signal handler and cleared when leaving them. Note, we're
speaking about real signal handlers (synchronous ones), not deferred
ones. This will allow some sensitive call places to act differently
when detecting such a condition, and possibly even to place a few new
BUG_ON().

(cherry picked from commit ddd173355c9c7452ff6ec317c8be6195d25dba2a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()

This function may be called from a signal handler during a warning,
a panic or a show thread. We need to be more cautious about what may
or may not be dereferenced since an h1s is not necessarily fully
initialized. Loops of "show threads" sometimes manage to crash when
dereferencing a null h1s->sd, so let's guard it and add a comment
remining about the unusual call place.

This can be backported to the relevant versions.

(cherry picked from commit a56dfbdcb4cb3eb9ffd3db641efb3e5605a6c3f0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()

co_data() was instrumented to detect cases where c->output > data and
emits a warning if that's not correct. The problem is that it happens
quite a bit during "show threads" if it interrupts traffic anywhere,
and that in some environments building with -DDEBUG_STRICT_ACTION=3,
it will kill the process.

Let's just open-code the channel functions that make access to co_data(),
there are not that many and the operations remain very simple.

This can be backported to 3.1. It didn't trigger in earlier versions
because they didn't have this CHECK_IF_HOT() test.

(cherry picked from commit 9d5bd47634707963ac6be4fc90e64feded174d0a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: clock: always use atomic ops for global_now_ms

global_now_ms is shared between threads so we must give hint to the
compiler that read/writes operations should be performed atomically.

Everywhere global_now_ms was used, atomic ops were used, except in
clock_update_global_date() where a read was performed without using
atomic op. In practise it is not an issue because on most systems
such reads should be atomic already, but to prevent any confusion or
potential bug on exotic systems, let's use an explicit _HA_ATOMIC_LOAD
there.

This may be backported up to 2.8

(cherry picked from commit 97a19517ffe3438562f80c314f5b6f3f27df7668)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers

When the connection for sink_forward_{oc}_applet fails or a previous one
is destroyed, the sft->appctx is instantly released.

However process_sink_forward_task(), which may run at any time, iterates
over all known sfts and tries to create sessions for orphan ones.

It means that instantly after sft->appctx is destroyed, a new one will
be created, thus a new connection attempt will be made.

It can be an issue with tcp log-servers or sink servers, because if the
server is unavailable, process_sink_forward() will keep looping without
any temporisation until the applet survives (ie: connection succeeds),
which results in unexpected CPU usage on the threads responsible for
that task.

Instead, we add a tempo logic so that a delay of 1second is applied
between two retries. Of course the initial attempt is not delayed.

This could be backported to all stable versions.

(cherry picked from commit 9561b9fb6964af325a10e7128b563114f144a3cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: log: fix outgoing abns address family

While reviewing the code in an attempt to fix GH #2875, I stumbled
on another case similar to aac570c ("BUG/MEDIUM: uxst: fix outgoing
abns address family in connect()") that caused abns(z) addresses to
fail when used as log targets.

The underlying cause is the same as aac570c, which is the rework of the
unix socket families in order to support custom addresses for different
adressing schemes, where a real_family() was overlooked before passing
a haproxy-internal address struct to socket-oriented syscall.

To fix the issue, we first copy the target's addr, and then leverage
real_family() to set the proper low-level address family that is passed
to sendmsg() syscall.

It should be backported in 3.1

(cherry picked from commit c9d41927266849208508b144ef15809a3a15c6cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: uxst: fix outgoing abns address family in connect()

Since we reworked the unix socket families in order to support custom
addresses for different addressing schemes, we've been using extra
values for the ss_family field in sockaddr_storage. These ones have
to be adjusted before calling bind() or connect(). It turns out that
after the abns/abnsz updates in 3.1, the connect() code was not adjusted
to take care of the change, resulting in AF_CUST_ABNS or AF_CUST_ABNSZ
to be placed in the address that was passed to connect().

The right approach is to locally copy the address, get its length,
fixup the family and use the fixed value and length for connect().

This must be backported to 3.1. Many thanks for @Mewp for reporting
this issue in github issue #2875.

(cherry picked from commit aac570cd0384f7fb942e7eb1eeb8c7d0605f50e2)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers

When "peers" keyword is followed by more than one argument and it's the first
"peers" section in the config, cfg_parse_peers() detects it and exits with
"ERR_ALERT|ERR_FATAL" err_code.

So, upper layer parser, parse_cfg(), continues and parses the next keyword
"peer" and then he tries to check the global cfg_peers, which should contain
"my_cluster". The global cfg_peers is still NULL, because after alerting a user
in alertif_too_many_args, cfg_parse_peers() exited.

peers my_cluster __some_wrong_data__
peer haproxy1 1.1.1.1 1000

In order to fix this, let's add ERR_ABORT, if "peers" keyword is followed by
more than one argument. Like this parse_cfg() will stops immediately and
terminates haproxy with "too many args for peers my_cluster..." alert message.

It's more reliable, than add checks "if (cfg_peers !=NULL)" in "peer"
subparser, as we may have many "peers" sections.

peers my_another_cluster
peer haproxy1 1.1.1.2 1000

peers my_cluster __some_wrong_data__
peer haproxy1 1.1.1.1 1000

In addition, for the example above, parse_cfg() will parse all configuration
until the end and only then terminates haproxy with the alert
"too many args...". Peer haproxy1 will be wrongly associated with
my_another_cluster.

This fixes the issue #2872.
This should be backported in all stable versions.

(cherry picked from commit 390df282c1ac4605273abfeb82c97fad205b7294)
Signed-off-by: Willy Tarreau <w@1wt.eu>

[RELEASE] Released version 3.1.5

Released version 3.1.5 with the following main changes :
- BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR is applet is waiting for room
- BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK

BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK

In the SPOP protocol, ACK frame with empty payload are allowed. However, in
that case, because only the payload is transferred, there is no data to
return to the SPOE applet. Only the end of input is reported. Thus the
applet is never woken up. It means that the SPOE filter will be blocked
during the processing timeout and will finally return an error.

To workaournd this issue, a NOOP action is introduced with the value 0. It
is only an internal action for now. It does not exist in the SPOP
protocol. When an ACK frame with an empy payload is received, this noop
action is transferred to the SPOE applet, instead of nothing. Thanks to this
trick, the applet is properly notified. This works because unknown actions
are ignored by the SPOE filter.

This patch must be backported to 3.1.

(cherry picked from commit 851e52b551f80f319b0de1817fc91b7f3949fc67)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR is applet is waiting for room

The commit 7214dcd52 ("BUG/MEDIUM: applet: Don't pretend to have more data
to handle EOI/EOS/ERROR") introduced a regression. Because of this patch, it
was possible to handle EOI/EOS/ERROR applet flags too early while the applet
was waiting for more room to transfer the last output data.

This bug can be encountered with any applet using its own buffers (cache and
stats for instance). And depending on the configuration and the timing, the
data may be truncated or the stream may be blocked, infinitely or not.
Streams blocked infinitely were observed with the cache applet and the HTTP
compression enabled.

For the record, it is important to detect EOI/EOS/ERROR applet flags to be
able to report the corresponding event on the SE and by transitivity on the
SC. Most of time, this happens when some data should be transferred to the
stream. The .rcv_buf callback function is called and these flags are
properly handled. However, some applets may also report them spontaneously,
outside of any data transfer. In that case, the .rcv_buf callback is not
called.

It is the purpose of this patch (and the one above). Being able to detect
pending EOI/EOS/ERROR applet flags. However, we must be sure to not handle
them too early at this place. When these flags are set, it means no more
data will be produced by the applet. So we must only wait to have
transferred everything to the stream. And this happens when the applet is no
longer waiting for more room.

This patch must be backported to 3.1 with the one above.

(cherry picked from commit efc46de29457b93f75ae4b581f70a9df8362d6ca)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.1.4

Released version 3.1.4 with the following main changes :
    - BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
    - BUG/MEDIUM: debug: close a possible race between thread dump and panic()
    - BUG/MINOR: quic: reserve length field for long header encoding
    - BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
    - BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
    - BUG/MINOR: mworker: post_section_parser for the last section in discovery
    - BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
    - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
    - BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
    - BUG/MEDIUM: cli: Be sure to drop all input data in END state
    - BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
    - BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
    - BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
    - BUG/MINOR: quic: prevent crash on conn access after MUX init failure
    - BUG/MINOR: mux-quic: prevent crash after MUX init failure
    - REGTESTS: Fix truncated.vtc to send 0-CRLF
    - BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
    - BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
    - MINOR: quic: rename pacing_rate cb to pacing_inter
    - MINOR: mux-quic: increment pacing retry counter on expired
    - MEDIUM: quic: implement credit based pacing
    - MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
    - MEDIUM: quic: use dynamic credit for pacing
    - MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
    - MINOR: quic: adapt credit based pacing to BBR
    - MINOR: epoll: permit to mask certain specific events
    - BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
    - DEBUG: fd: add a counter of takeovers of an FD since it was last opened
    - MINOR: fd: add a generation number to file descriptors
    - DEBUG: epoll: store and compare the FD's generation count with reported event
    - MEDIUM: epoll: skip reports of stale file descriptors
    - BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
    - DOC: htx: clarify <mark> parameter for htx_xfer_blks()
    - BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
    - BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
    - BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
    - BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
    - BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
    - BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
    - MINOR: flt-spoe: Report end of input immediately after applet init
    - MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
    - MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
    - BUG/MINOR: cli: Don't set SE flags from the cli applet
    - BUG/MINOR: cli: Fix memory leak on error for _getsocks command
    - BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
    - BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
    - BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
    - BUG/MINOR: flt-trace: Support only one name option
    - BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
    - DOC: option redispatch should mention persist options

DOC: option redispatch should mention persist options

"option redispatch" remains vague in which cases a session would persist;
let's mention "option persist" and "force-persist" as an example so folks
don't draw the conclusion that this may be default.

Should be backported to stable branches.

(cherry picked from commit 5926fb78233ab4ba95b07db3815ee62f5cbc5082)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer

A JSON integer is defined in the range [-(2**53)+1, (2**53)-1]. Macro are used
to define the minimum and the maximum value, The minimum one is defined using
the maximum one. So JSON_INT_MAX must be defined as a signed integer value to
avoid wrong cast of JSON_INT_MIN.

It was reported by Coverity in #2841: CID 1587769.

This patch could be backported to all stable versions.

(cherry picked from commit d48b5add889db1bf2f0fae4721abb46413303d33)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: flt-trace: Support only one name option

When a trace filter is defined, only one 'name' option is expected. But it
was not tested. Thus it was possible to set several names leading to a
memory leak.

It is now tested, and it is not allowed to redefine the trace filter name.

It was reported by Coverity in #2841: CID 1587768.

This patch could be backported to all stable versions.

(cherry picked from commit b20e2c96cfea06776db3200614f6d94441f15c2d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: auth: Fix a leak on error path when parsing user's groups

In a userlist section, when a user is parsed, if a specified group is not
found, an error is reported. In this case we must take care to release the
alredy built groups list.

It was reported by Coverity in #2841: CID 1587770.

This patch could be backported to all stable versions.

(cherry picked from commit a7f513af9187444680e3e054e0a13f818aed3307)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive

When a group is defined in a userlist section, only one 'users' option is
expected. But it was not tested. Thus it was possible to set several options
leading to a memory leak.

It is now tested, and it is not allowed to redefine the users option.

It was reported by Coverity in #2841: CID 1587771.

This patch could be backported to all stable versions.

(cherry picked from commit a1e14d2a8272511c29d9225a61c74dc89847287d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()

In _getsocks() functuoin, when we failed to set the unix socket in
non-blocking mode, a goto to "out" label led to loop infinitly. To fix the
issue, we must only let the function exit.

This patch should be backported to all stable versions.

(cherry picked from commit 75e8c8ed330f2ad1b6b33630efebb5041af3d5e9)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: Fix memory leak on error for _getsocks command

Some errors in parse function of _getsocks commands were not properly handled
and immediately returned, leading to a memory leak on cmsgbuf and tmpbuf
buffers.

To fix the issue, instead of immediately return with -1, we jump to "out"
label. Returning 1 intead of -1 in that case is valid.

This was reported by Coverity in #2841: CIDs 1587773 and 1587772.

This patch should be backported as far as 2.4.

(cherry picked from commit 372cc696d44e6853b9f7920f1c2d965736029764)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: Don't set SE flags from the cli applet

Since the CLI was updated to use the new applet API, it should no longer set
directly the SE flags. Instead, the corresponding applet flags must be set,
using the applet API (appet_set_*). It is true for the CLI I/O handler but also
for the commands parse function and I/O callback function.

This patch should be backported as far as 3.0.

(cherry picked from commit 7e927243b9e86cee13ea9c16077a8523e79e41bb)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only

The SPOP_CF_ERROR flag is now set on connection error only. It was also set
on some demux failures. But it is not mandatory because the connection is
closed anyway. And it is handy to have a flag dedicated to tcp connection
error. It was the original purpose of this flag.

This patch could be backported to 3.1 to ease future backports.

(cherry picked from commit 514a912a4d15c834a034be8ea7675c3ae0822080)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream

The spop stream now reports the end of input when the ACK is transferred to
the SPOE applet. To do so, the flag SPOP_SF_ACK_RCVD was added. It is set on
the SPOP stream when its ACK is received by the SPOP connection.

In addition when SPOP stream flags are propagated to the SE, the error is
now reported if end of input was not reached instead of testing the
connection error code. It is more accurate.

This patch should be backported to 3.1.

(cherry picked from commit d16c534511ca9e75437839bf654d45faf2992e0e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: flt-spoe: Report end of input immediately after applet init

The SPOE applet forwards the message that must be sent to agent during its
init stage. So just after it is created. When it is performed, the end of
input must be reported because no more data will be forwarded. However, it
was performed after receiving the ACK response. It is harmless, but there is
no reason to delay the EOI. It is now fixed.

This patch must be backported to 3.1.

(cherry picked from commit f7e571859629b40defb37ce3d4331a571ba619ec)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet

The previous fix ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed an issue with the way the SPOE applet was
reporting the end of stream, leading to never shut the applet down.

In fact, there is two bug in one. The first one is about the applet
shutdown. Since the fix above, the applet is no longer closed. Before, it
was closed because it was reported in error. But now, it is just delayed
because the applet and the SPOP stream are declared to support half close
connections. So the applet is only closed when the SPOP connection is
closed. To fix this bug, both side are now stating that half close
connections are not supported.

The second bug is about the way the end of stream is reported. It is
reported when the ACK response is received. But it is too early, because the
parent stream must process the response first. So now, we take care to have
processed the ACK from the parent applet before reporting an end of stream.

This patch must be backported with the commit above to 3.1.

(cherry picked from commit 38aac2c7bcfcce040eabd5e6df9b5c2f783a821a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR

The way appctx EOI/EOS/ERROR flags were reported for applets using the new
API were to state the applet had more data to deliver. But it was not
correct and for APPCTX_FL_EOS, this led to report an error on the SE because
it is not expected. More data to deliver and an end of stream is an
impossible situation.

This was added as a fix by commit b8ca114031 ("BUG/MEDIUM: applet: State
appctx have more data if its EOI/EOS/ERROR flag is set"), mainly to make the
SPOE applet work.

When an applet set one of these flags, it really means it has no more data
to deliver. So we must not try to trigger a new receive to handle these
flags. Instead we must handle them directly in task_process_applet()
function and only if the corresponding SE flags were not already set.

This patch must be backported to 3.1.

(cherry picked from commit 7214dcd52d52ab7fe6ef391aa8039fb3ec40a040)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler

The SPOE applet is using the new applet API. Thus end of input, end of
stream and errors must be reported using the applet flags, not the SE
flags. This was not the case. So let's fix it.

It seems this bug is harmless for now.

This patch must be backported to 3.1.

(cherry picked from commit db504fbdbee0afde435910510507212a5a751cba)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it

When a GET/HEAD/OPTIONS/DELETE healthcheck request was formatted, we claimed
there was a "content-length" header set even when there was no payload,
leading to actually send a "content-length: 0" header to the server. It was
unexpected and could be rejected by servers.

When a healthcheck request is sent we must take care to state there is a
"content-length" header when it is explicitly added.

This patch should fix the issue #2851. It must be backported as far as 2.9.

(cherry picked from commit fad68cb16d7f99acd2b327ff2f8a4d9ab88a68d8)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval

When the tcp-response content ruleset evaluation is delayed because of an
ACL condition, the close forwarding on the client side is not explicitly
blocked. So it is possible to close the client side before the end of the
response evaluation.

To fix the issue, this is now done in all cases where some data are
missing. Concretely, channel_dont_close() is called in "missing_data" goto
label.

Note it is only a theorical bug (or pending bug). It is not possible to
trigger it for now because an ACL cannot wait for more data when a close was
received. But the code remains a bit weak. It is safer this way. It is
especially mandatory for the "force yield" option that should be added soon.

This patch could be backported to all stable versions.

(cherry picked from commit 04bbfa4354800b893f03294bd02475a16d90ec85)
[wt: trivial ctx adjustment]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records

A Read0 event could be ignored by the FCGI multiplexer if it is blocked on a
partial record. Instead of handling the event, it remained blocked, waiting
for the end of the record.

To fix the issue, the same solution than the H2 multiplexer is used. Two
flags are introduced. The first one, FCGI_CF_END_REACHED, is used to
acknowledge a read0. This flag is set when a read0 was received AND the FCGI
multiplexer must handle it. The second one, FCGI_CF_DEM_SHORT_READ, is set
when the demux is interrupted on a partial record. A short read and a read0
lead to set the FCGI_CF_END_REACHED flag.

With these changes, the FCGI mux should be able to properly handle read0 on
partial records.

This patch should be backported to all stable versions after a period of
observation.

(cherry picked from commit 5f927f603aae4614f82c73ff6c2e3324ecfdf506)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: htx: clarify <mark> parameter for htx_xfer_blks()

Clarify the fact that the first <mark> block is transferred before
stopping when using htx_xfer_blks()

(cherry picked from commit c17e02923256fef61f74b986045b7eb8d1e126c7)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()

When transfering blocks from an src to another dst htx representation,
htx_xfer_blks() decreases the size of each block removed from the <count>
value passed in parameter, so it can't transfer more than <count>. The
size must also contains the metadata, represented by a simple
sizeof(struct htk_blk).

However, the code was doing a sizeof(dstblk) instead of a
sizeof(*dstblk) which as the consequence of removing only a size_t from
count. Fortunately htx_blk size is 64bits, so that does not provoke any
problem in 64bits. But on 32bits architecture, the count value is not
decreased correctly and the function could try to transfer more blocks
than allowed by the count parameter.

Must be backported in every stable release.

(cherry picked from commit c6390cdf9ce4e74540544207d6e3cfb31581eaea)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MEDIUM: epoll: skip reports of stale file descriptors

Now that we can see that some events are reported for older instances
of a file descriptor, let's skip these ones instead of reporting
dangerous events on them. It might possibly qualify as a bug if it
helps fixing strange issues in certain environments, in which case it
can make sense to backport it along with the following recent patches:

  DEBUG: fd: add a counter of takeovers of an FD since it was last opened
  MINOR: fd: add a generation number to file descriptors
  DEBUG: epoll: store and compare the FD's generation count with reported event

(cherry picked from commit 8235a24782e528b9bf8ca9dd69c0a147556dfcb5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: epoll: store and compare the FD's generation count with reported event

There have been some reported cases where races between threads in epoll
were causing wrong reports of close or error events. Since the epoll_event
data is 64 bits, we can store the FD's generation counter in the upper
bits to verify if we're speaking about the same instance of the FD as the
current one or a stale one. If the generation number does not match, then
we classify these into 3 conditions and increment the relevant COUNT_IF()
counters (stale report for closed FD, stale report of harmless event on
reopened FD, stale report of HUP/ERR on reopened FD). Tests have shown that
with heavy concurrency, a very small maxconn (typically 1 per thread),
http-reuse always and a server closing connections first but randomly
(httpterm with /C=2r), such events can happen at a pace of a few per second
for the closed FDs, and a few per minute for the other ones, so there's value
in leaving this accessible for troubleshooting. E.g after a few minutes:

  Count     Type Location function(): "condition" [comment]
  5541       CNT ev_epoll.c:296 _do_poll(): "1" [epoll report of event on a just closed fd (harmless)]
  10         CNT ev_epoll.c:294 _do_poll(): "1" [epoll report of event on a closed recycled fd (rare)]
  42         CNT ev_epoll.c:289 _do_poll(): "1" [epoll report of HUP on a stale fd reopened on the same thread (suspicious)]
  212        CNT ev_epoll.c:279 _do_poll(): "1" [epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)]
  1          CNT mux_h1.c:3911 h1_send(): "b_data(&h1c->obuf)" [connection error (send) with pending output data]

This one with the following setup, whicih abuses threads contention by
starting 64 threads on two cores:
- config:
    global
        nbthread 64
        stats socket /tmp/sock1 level admin
        stats timeout 1h
    defaults
        timeout client 5s
        timeout server 5s
        timeout connect 5s
        mode http
    listen p2
        bind :8002
        http-reuse always
        server s1 127.0.0.1:8000 maxconn 4

- haproxy forcefully started on 2C4T:

    $ taskset -c 0,1,4,5 ./haproxy -db -f epoll-dbg.cfg

- httpterm on port 8000, cpus 2,3,6,7 (2C4T)

- h1load with responses larger than a single buffer, and randomly
  closing/keeping alive:

    $ taskset -c 2,3,6,7 h1load -e -t 4 -c 256 -r 1 0:8002/?s=19k/C=2r

(cherry picked from commit 5012b6c6d9e2278ae247e3e372765be3c11af86c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: fd: add a generation number to file descriptors

This patch adds a counter of close() on file descriptors in the fdtab.
The goal is to better detect if reported events concern the current or
a previous file descriptor. For now the counter is only added, and is
showed in "show fd" as "gen". We're reusing unused space at the end of
the struct. If it's needed for something more important later, this
patch can be reverted.

(cherry picked from commit d155924efe453faa0ea5b5e62e6a3e2bf7ee7358)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: fd: add a counter of takeovers of an FD since it was last opened

That's essentially in order to help with debugging strange cases like
the occasional epoll issues/races, by keeping a counter of how many
times an FD was taken over since last inserted. The room is available
so let's use it. If it's needed later, this patch can easily be reverted.
The counter is also reported in "show fd" as "tkov".

(cherry picked from commit 44ac7a7e731537c9388a141b69cff7074afe2376)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing

Late in 3.1 we've added an integrity check to make sure we didn't keep
trash objects allocated before resizing the trash with commit 0bfd36e7b8
("MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()"), but
it turns out that the counter that is being checked includes the number
of objects left in local thread caches. As such it can trigger despite
no object being allocated. This precisely happens when setting
tune.memory.hot-size to a few megabytes because some temporarily used
trash objects will remain in cache.

In order to address this, let's first flush the pool before running
the check. That was previously done by pool_destroy() but the check
had to be inserted before it. So now we first flush the trash pool,
then verify it's no longer used, and finally we can destroy it.

This needs to be backported to 3.1. Thanks to Christian Ruppert for
reporting this bug.

(cherry picked from commit bd7a688b8b38901037cd2f9856533a788ebdbdf7)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: epoll: permit to mask certain specific events

A few times in the past we've seen cases where epoll was caught reporting
a wrong event that caused trouble (e.g. spuriously reporting HUP or RDHUP
after a successful connect()). The new tune.epoll.mask-events directive
permits to mask events such as ERR, HUP and RDHUP and convert them to IN
events that are processed by the regular receive path. This should help
better diagnose and troubleshoot issues such as this one, as well as rule
out such a cause when similar issues are reported:

https://github.com/haproxy/haproxy/issues/2368
https://www.spinics.net/lists/netdev/msg876470.html

It should be harmless to backport this if necessary.

(cherry picked from commit 7fa70da06db3e69a61e4451a82a4f980b40b4d0c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: quic: adapt credit based pacing to BBR

Credit based pacing has been further refined to be able to calculate
dynamically burst size based on congestion parameter. However, BBR
algorithm already provides pacing rate and burst size (labelled as
send_quantum) for 1ms of emission.

Adapt quic_pacing_reload() to use BBR values to compute pacing credit.
This is done via pacing_burst callback which is now only defined for
BBR. For other algorithms, determine the burst size over 1ms with the
congestion window size and RTT.

This should be backported up to 3.1.

(cherry picked from commit 42bac9339cc3e3eed99ae1664a4f3633183ab640)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path

Pacing burst size is now dynamic. As such, configuration value has been
removed and related fields in bind_conf and quic_cc_path structures can
be safely removed.

This should be backported up to 3.1.

(cherry picked from commit 7896edccdcdcdde7d7d196205b83e4b3b36f3852)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: quic: use dynamic credit for pacing

Major improvements have been introduced in pacing recently. Most
notably, QMUX schedules emission on a millisecond resolution, which
allow to use passive wait to be much CPU friendly.

However, an issue remains with the pacing max credit. Unless BBR is
used, it is fixed to the configured value from quic-cc-algo bind
statement. This is not practical as if too low, it may drastically
reduce performance due to 1ms sleep resolution. If too high, some
clients will suffer from too much packet loss.

This commit fixes the issue by implementing a dynamic maximum credit
value based on the network condition specific to each clients.
Calculation is done to fix a maximum value which should allow QMUX
current tasklet context to emit enough data to cover the delay with the
next tasklet invokation. As such, avg_loop_us is used to detect the
process load. If too small, 1.5ms is used as minimal value, to cover the
extra delay incurred by the system which will happen for a default 1ms
sleep.

This should be backported up to 3.1.

(cherry picked from commit cb91ccd8a8b548d85bf5155b1a8ce759f1480a4e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: mux-quic: reduce pacing CPU usage with passive wait

Pacing algorithm has been revamped in the previous commit to implement a
credit based solution. This is a far more adaptative solution, in
particular which allow to catch up in case pause between pacing emission
was longer than expected.

This allows QMUX to remove the active loop based on tasklet wake-up.
Instead, a new task is used when emission should be paced. The main
advantage is that CPU usage is drastically reduced.

New pacing task timer is reset each time qcc_io_send() is invoked. Timer
will be set only if pacing engine reports that emission must be
interrupted. In this case timer is set via qcc_wakeup_pacing() to the
delay reported by congestion algorithm, or 1ms if delay is too short. At
the end of qcc_io_cb(), pacing task is queued if timer has been set.

Pacing task execution is simple enough : it immediately wakes up QCC I/O
handler.

Note that to have decent performance, it requires to have a large enough
burst defined in configuration of quic-cc-algo. However, this value is
common to every listener clients, which may cause too much loss under
network conditions. This will be address in a future patch.

This should be backported up to 3.1.

(cherry picked from commit 8098be1fdc71fd548ade4aa0530918bf7cc2a8f4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: quic: implement credit based pacing

Implement a new method for QUIC pacing emission based on credit. This
represents the number of packets which can be emitted in a single burst.
After emission, decrement from the credit the number of emitted packets.
Several emission can be conducted in the same sequence until the credit
is completely decremented.

When a new emission sequence is initiated (i.e. under a new QMUX tasklet
invokation), credit is refilled according to the delay which occured
between the last and current emission context.

This new mechanism main advantage is that it allows to conduct several
emission in the same task context without having to wait between each
invokation. Wait is only forced if pacing is expired, which is now
equivalent to having a null credit.

Furthermore, if delay between two emissions sequence would have been
smaller than expected, credit is only partially refilled. This allows to
restart emission without having to wait for the whole credit to be
available.

On the implementation side, a new field <credit> is avaiable in
quic_pacer structure. It is automatically decremented on
quic_pacing_sent_done() invokation. Also, a new function
quic_pacing_reload() must be used by QUIC MUX when a new emission
sequence is initiated to refill credit. <next> field from quic_pacer has
been removed.

For the moment, credit is based on the burst configured via quic-cc-algo
keyword, or directly reported by BBR.

This should be backported up to 3.1.

(cherry picked from commit 4489a61585283803726b10c5b081c26f24b7d8dd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: increment pacing retry counter on expired

A field <paced_sent_ctr> from quic_pacer structure is used to report the
number of occurences where emission has been interrupted due to pacing.
However, it was not incremented when QUIC MUX had to pause immediately
emission as pacing was still not yet expired.

Fix this by incrementing <paced_sent_ctr> in qcc_io_send() prior to
emission if pacing is expired. Note that incrementation is only done
once if the tasklet is then repeatdely woken up until the timer is
expired.

This should be backported up to 3.1.

(cherry picked from commit 9d8589f0dee050352ae874be6a4fdeecbb428992)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: rename pacing_rate cb to pacing_inter

Rename one of the congestion algorithms pacing callback from pacing_rate
to pacing_inter. This better reflects that this function returns a delay
(in nanoseconds) which should be applied between each packet emission to
fill the congestion window with a perfectly smoothed emission.

This should be backported up to 3.1.

(cherry picked from commit 7c0820892f47642cfa27e49aa1218dfaf640e3b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types

Some actions such as "sc0_get_gpc0" (using smp_fetch_sc_stkctr()
internally) can take an optional table name as parameter to perform the
lookup on a different table from the tracked one but using the key from
the tracked entry. It is done by leveraging the stktable_lookup() function
which was originally meant to perform intra-table lookups.

Calling sc0_get_gpc0() with a different table name will result in
stktable_lookup() being called to perform lookup using a stktsess from
a different table. While it is theorically fine, it comes with a pitfall:
both tables (the one from where the stktsess originates and the actual
target table) should rely on the exact same key type and length.

Failure to do so actually results in undefined behavior, because the key
type and/or length from one table is used to perform the lookup in
another table, while the underlying lookup API expects explicit type and
key length.

For instance, consider the below example:

  peers testpeers
    bind 127.0.0.1:10001
    server localhost

    table test type binary len 1 size 100k expire 1h store gpc0
    table test2 type string size 100k expire 1h store gpc0

  listen test_px
    mode http
    bind 0.0.0.0:8080
    http-request track-sc0 bin(AA) table testpeers/test
    http-request track-sc1 str(ok) table testpeers/test2
    log-format "%[sc0_get_gpc0(testpeers/test2)]"
    log stdout format raw local0

    server s1 git.haproxy.org:80

Performing a curl request to localhost:8080 will cause unitialized reads
because string "ok" from test2 table will be compared as a string against
"AA" binary sample which is not NULL terminated:

==2450742== Conditional jump or move depends on uninitialised value(s)
==2450742==    at 0x484F238: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2450742==    by 0x27BCE6: stktable_lookup (stick_table.c:539)
==2450742==    by 0x281470: smp_fetch_sc_stkctr (stick_table.c:3580)
==2450742==    by 0x283083: smp_fetch_sc_get_gpc0 (stick_table.c:3788)
==2450742==    by 0x2A805C: sample_process (sample.c:1376)

So let's prevent that by adding some comments in stktable_set_entry()
func description, and by adding a check in smp_fetch_sc_stkctr() to ensure
both source stksess and target table share the same key properties.

While it could be relevant to backport this in all stable versions, it is
probably safer to wait for some time before doing so, to ensure that no
existing configs rely on this ambiguity because the fact that the target
table and source stksess entry need to share the same key type and length
is not explicitly documented.

(cherry picked from commit 5bbdd14f56a2a672e8fef4449e4143b0f0642812)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut

On shut, truncated HTX messages were not properly handled by the H2
multiplexer. Depending on how data were emitted, a chunked HTX message
without the 0-CRLF could be considered as full and an empty data with ES
flag set could be emitted instead of a RST_STREAM(CANCEL) frame.

In the H2 multiplexer, when a shut is performed, an HTX message is
considered as truncated if more HTX data are still expected. It is based on
the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream.
However, this flag is set or unset depending on the HTX extra field
value. This field is used to state how much data that must still be
transferred, based on the announced data length. For a message with a
content-length, this assumption is valid. But for a chunked message, it is
not true. Only the length of the current chunk is announced. So we cannot
rely on this field in that case to know if a message is full or not.

Instead, we must rely on the HTX start-line flags to know if more HTX data
are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag
is set on the HTX start-line), it means that more data are always expected,
until the end of message is reached (the HTX_FL_EOM flag is set on the HTX
message). This is true for bodyless message because the end of message is
reported with the end of headers. This is also true for tunneled messages
because the end of message is received before switching the H2 stream in
tunnel mode.

This patch must be backported as far as 2.8.

(cherry picked from commit b70921f2c19a0612a4f1e31ef16cf8c8b216fdf9)
Signed-off-by: Willy Tarreau <w@1wt.eu>

REGTESTS: Fix truncated.vtc to send 0-CRLF

When a chunked messages is sent, the 0-CRLF must be explicitely sent. Since
the begining, it is missing. Just add it.

(cherry picked from commit b93e419750aa8921e76e1e58c57a9bf8eefea969)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-quic: prevent crash after MUX init failure

qmux_init() may fail for several reasons. In this case, connection
resources are freed and underlying and a CONNECTION_CLOSE will be
emitted via its quic_conn instance.

In case of qmux_init() failure, qcc_release() is used to clean up
resources, but QCC <conn> member is first resetted to NULL, as
connection released must be delayed. Some cleanup operations are thus
skipped, one of them is the resetting of <ctx> connection member to
NULL. This may cause a crash as <ctx> is a dangling pointer after QCC
release. One of the possible reproducer is to activate QMUX traces,
which will cause a segfault on the qmux_init() error leave trace.

To fix this, simply reset <ctx> to NULL manually on qmux_init() failure.

This must be backported up to 3.0.

(cherry picked from commit 2715dbe9d065d8700a8fba6e2605a451cfbb72b8)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: prevent crash on conn access after MUX init failure

Initially, QUIC-MUX was responsible to reset quic_conn <conn> member to
NULL when MUX was released. This was performed via qcc_release().

However, qcc_release() is also used on qmux_init() failure. In this
case, connection must be freed via its session, so QCC <conn> member is
resetted to NULL prior to qcc_release(), which prevents quic_conn <conn>
member to also be resetted. As the connection is freed soon after,
quic_conn <conn> is a dangling pointer, which may cause crashes.

This bug should be very rare as first it implies that QUIC-MUX
initialization has failed (for example due to a memory alloc error).
Also, <conn> member is rarely used by quic_conn instance. In fact, the
only reproducible crash was done with QUIC traces activated, as in this
case connection is accessed via quic_conn under __trace_enabled()
function.

To fix this, detach connection from quic_conn via the XPRT layer instead
of the MUX. More precisely, this is performed via quic_close(). This
should ensure that it will always be conducted, either on normal
connection closure, but also after special conditions such as MUX init
failure.

This should be backported up to 2.6.

(cherry picked from commit 2cdc4695cb82fce46d67cef17300ec7cf978906e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: fcgi: Don't set the status to 302 if it is already set

When a "Location" header was found in a FCGI response, the status code was
forced to 302. But it should only be performed if no status code was set
first.

So now, we take care to not override an already defined status code when the
"Location" header is found.

This patch should fix the issue #2865. It must backported to all stable
versions.

(cherry picked from commit ca79ed5eefaa65fc82e1a8c1ec4308eaaadaebd1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: filters: Handle filters registered on data with no payload callback

An HTTP filter with no http_payload callback function may be registered on
data. In that case, this filter is obviously not called when some data are
received but it remains important to update its internal state to be sure to
keep it synchronized on the stream, especially its offet value. Otherwise,
the wrong calculation on the global offset may be performed in
flt_http_end(), leading to an integer overflow when data are moved from
input to output. This overflow triggers a BUG_ON() in c_adv().

The same is true for TCP filters with no tcp_payload callback function.

This patch must be backport to all stable versions.

(cherry picked from commit 34542d5ec29c89ec45b63107f9330f185f0bfd40)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker

On reload, the new worker requests bound FDs to the old one. The old worker
sends them in message of at most 252 FDs. Each message is acknowledged by
the new worker. All messages sent or received by the old worker are handled
manually via sendmsg/recv syscalls. So the old worker must be sure consume
all the ACK replies. However, the last one was never consumed. So it was
considered as a command by the CLI applet. This issue was hidden since
recently. But it was the root cause of the issue #2862.

Note this last ack is also the first one when there are less than 252 FDs to
transfer.

This patch must be backported to all stable versions.

(cherry picked from commit 49b7bcf583261efedabad5ba15c4026f2e713c61)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: cli: Be sure to drop all input data in END state

Commit 7214dcd ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed a bug with the CLI applet. Pending input
data when the applet is in CLI_ST_END state were never consumed or dropped,
leading to a wakeup loop.

The CLI applet implements its own snd_buf callback function. It is important
it consumes all pending input data. Otherwise, the applet is woken up in
loop until it empties the request buffer. Another way to fix the issue would
be to report an error. But in that case, it seems reasonnable to drop these
data.

The issue can be observed on reload, in master/worker mode, because of issue
about the last ACK message which was never consummed by the _getsocks()
command.

This patch should fix the issue #2862. It must be backported to 3.1 with the
commit above.

(cherry picked from commit 972ce87676f40daa76fff61539fb5b0a66b56f4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals

1d3c8223 ("MINOR: ssl: allow to change the server signature algorithm")
mplemented the sigals keyword in the crt-list but never the dump of the
keyword over the CLI.

Must be backported as far as 2.8.

(cherry picked from commit 5a7cbb8d8115770c4776836fdf727b26d490d2ad)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals

b6ae2aafde43 ("MINOR: ssl: allow to change the signature algorithm for
client authentication") implemented the client-sigals keyword in the
crt-list but never the dump of the keyword over the CLI.

Must be backported as far as 2.8.

(cherry picked from commit 037d2e5498917d2323a9ad748b9d97aaa688f351)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED

The crappy epoll API stroke again with reloads and transferred FDs.
Indeed, when listening sockets are retrieved by a new worker from a
previous one, and the old one finally stops listening on them, it
closes the FDs. But in this case, since the sockets themselves were
not closed, epoll will not unregister them and will continue to
report new activity for these in the old process, which can only
observe, count an fd_poll_drop event and not unregister them since
they're not reachable anymore.

The unfortunate effect is that long-lasting old processes are woken
up at the same rate as the new process when accepting new connections,
and can waste a lot of CPU. Accept rates divided by 8 were observed on
a small test involving a slow transfer on 10 connections facing a reload
every second so that 10 processes were busy dealing with them while
another process was hammering the service with new connections.

Fortunately, years ago we implemented a flag FD_CLONED exactly for
similar purposes. Let's simply mark transferred FDs with FD_CLONED
so that the process knows that these ones require special treatment
and have to be manually unregistered before being closed. This does
the job fine, now old processes correctly unregister the FD before
closing it and no longer receive accept events for the new process.

This needs to be backported to all stable versions. It only affects
epoll, as usual, and this time in combination with transferred FDs
(typically reloads in master-worker mode). Thanks to Damien Claisse
for providing all detailed measurements and statistics allowing to
understand and reproduce the problem.

(cherry picked from commit 561319bd1ccaab5dd69bac0c19387670b02d484f)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: post_section_parser for the last section in discovery

Previous patch 2c270a05f ("BUG/MINOR: mworker: section ignored in
discovery after a post_section_parser") needs an adjustment for the last
section of the file.

Indeed the post_section_parser of the last section must not be called in
discovery mode.

Must be backported in 3.1.

(cherry picked from commit 313eeae7dbcafac37ea6710c92de79c992c17d4c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: section ignored in discovery after a post_section_parser

When a new section is discovered, the post_section_parser of the
previous section is called. However in the new master-worker mode the
discovery mode will skip the post_section_parser. But instead of
trying to parse the current section keyword after that, it would skip
completely the current line.

This is a minor bug since there isn't a lot of section with
post_section_parser, and not a lot of section to parse in discovery
mode.

But this could be reproduced like this:

global
        expose-deprecated-directives

resolvers res
parse-resolv-conf

program foo
        command sleep 10

program bar
       command sleep 10

Ths 'resolvers' section has a post_section_parser which will be ignored
in discovery mode with the consequence of ignoring the first program
section.

This must be backported in 3.1.

(cherry picked from commit 2c270a05f06ff6ea2c23b369692a12a7780291b7)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding

Function max_stream_data_size() is used to determine the payload length
of a CRYPTO frame. It takes into account that the CRYPTO length field is
a variable length integer.

Implemented calcul was incorrect as it reserved too much space as a
frame header. This error is mostly due because max_stream_data_size()
reuses max_available_room() which also reserve space for a variable
length integer. This results in CRYPTO frames shorter of 1 to 2 bytes
than the maximum achievable value, which produces in the end datagram
shorter than the MTU.

Fix max_stream_data_size() implementation. It is now merely a wrapper on
max_available_room(). This ensures that CRYPTO frame encoding is now
properly optimized to use the MTU available.

This should be backported up to 2.6.

(cherry picked from commit e6a223542ae87880b3e8261daffbe362730f9e55)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: reserve length field for long header encoding

Long header packets have a mandatory Length field, which contains the
size of Packet number and payload, encoded as a variable-length integer.
Its value can thus only be determined after the payload size is known,
which depends on the remaining buffer space after this variable-length
field.

Packet payload are encoded in two steps. First, a list of input frames
is processed until the packet buffer is full. CRYPTO and STREAM frames
payload can be splitted if need to fill the buffer. Real encoding is
then performed as a second stage operation, first with Length field,
then with the selected frames themselves.

Before this patch, no space was reserved in the buffer for Length field
when attaching the frames to the packet. This could result in a error as
the packet payload would be too large for the remaining space.

In practice, this issue was rarely encounted, mostly as a side-effect
from another issue linked to CRYPTO frame encoding. Indeed, a wrong
calculation is performed on CRYPTO splitting, which results in frame
payload shorter by a few bytes than expected. This however ensured there
would be always enough room for the Length field and payload during
encoding. As CRYPTO frames are the only big enough content emitted with
a Long header packet, this renders the current issue mostly non
reproducible.

Fix the original issue by reserving some space for Length field prior to
frame payload calculation, using a maximum value based on the remaining
room space. Packet length is then reduced if needed when encoding is
performed, which ensures there is always enough room for the selected
frames.

Note that the other issue impacting CRYPTO frame encoding is not yet
fixed. This could result in datagrams with Long header packets not
completely extended to the full MTU. The issue will be addressed in
another patch.

This should be backported up to 2.6.

(cherry picked from commit 63747452a39cff72a6bc7610da079afe01fb6dcc)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: debug: close a possible race between thread dump and panic()

The rework of the thread dumping mechanism in 2.8 with commit 9a6ecbd590
("MEDIUM: debug: simplify the thread dump mechanism") opened a small
race, which is that a thread in the process of dumping other ones may
block the other one from panicing while it's looping at the end of
ha_thread_dump_fill(), or any other sequence involving the currently
dumped one.

This was emphasized in 3.1 with commit 148eb5875f ("DEBUG: wdt: better
detect apparently locked up threads and warn about them") that allowed
to emit warnings about long-stuck threads, because in this case, what
happens is that sometimes a thread starts to emit a warning (or a set
of warnings), and while the warning is being awaited for, a panic
finally happens and interrupts either the dumping thread, which never
finishes and waits for the target's pointer to become NULL which will
never happen since it was supposed to do it itself, or the currently
dumped thread which could wait for the dumping thread to become ready
while this one has not released the former.

In order to address this, first we now make sure never to dump a thread
that is already in the process of dumping another one. We're adding a
new thread flag to know this situation, that is set in ha_thread_dump_fill()
and cleared in ha_thread_dump_done(). And similarly, we don't trigger
the watchdog on a thread waiting for another one to finish its dump,
as it's likely a case of warning (and maybe even a panic) that makes
them wait for each other and we don't want such cases to be reentrant.
Finally, we check in the main polling loop that the flag never accidentally
leaked (e.g. wrong flag manipulation) as this would be difficult to spot
with bad consequences.

This should be backported at least to 2.8, and should resolve github
issue #2860. Thanks to Chris Staite for the very informative backtrace
that exhibited the problem.

(cherry picked from commit 7ddcdff33fbe62c1f0a342c64152833897c7647e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3

The clienthello callback was written when TLSv1.3 was not yet out, and
signatures algorithm changed since then.

With TLSv1.2, the least significant byte was used to determine the
SignatureAlgorithm, which could be rsa(1), dsa(2), ecdsa(3).
https://datatracker.ietf.org/doc/html/rfc5246#section-7.4.1.4.1

This was used to chose which type of certificate to push to the client.

But TLSv1.3 changed that, and introduced new RSA-PSS algorithms that
does not have the least sinificant byte to 1.
https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.3

This would result in chosing the wrong certificate when an RSA an ECDSA
ones are in the configuration for the same SNI or default entry.

This patch fixes the issue by parsing bothe hash and signature field to
check the RSA-PSS signature scheme.

This must fix issue #2852.

This must be backported in every stable versions. The code was moved
from ssl_sock.c to ssl_clienthello in recent versions.

(cherry picked from commit 3912780b1ee656b6d7327c34027f7b5d9db61731)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

[RELEASE] Released version 3.1.3

Released version 3.1.3 with the following main changes :
    - BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()
    - BUG/MINOR: quic: reject NEW_TOKEN frames from clients
    - BUG/MEDIUM: stktable: fix missing lock on some table converters
    - BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters
    - BUG/MAJOR: quic: reject too large CRYPTO frames
    - BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()
    - BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro
    - BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours
    - MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount
    - BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null
    - BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data
    - BUG/MINOR: quic: do not increase congestion window if app limited
    - BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES
    - BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"
    - CLEANUP: quic: remove unused prototype
    - BUILD: ssl: allow to build without the renegotiation API of WolfSSL
    - BUILD: ssl: more cleaner approach to WolfSSL without renegotiation

BUILD: ssl: more cleaner approach to WolfSSL without renegotiation

Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834

When building Wolfssl without renegotiation options, WolfSSL still
defines the macros about it, which warns during the build.

This patch completes the previous one by undefining the macros so
haproxy could build without any warning.

(cherry picked from commit b43e5d8c1692a0f15db4e621e3cff41158a47167)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

BUILD: ssl: allow to build without the renegotiation API of WolfSSL

In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was
suggested to push --enable-haproxy within --enable-distro.

WolfSSL does not want to include the renegotiation support in
--enable-distro.

To achieve this, let haproxy build without SSL_renegotiate_pending()
when wolfssl does not define HAVE_SECURE_RENEGOCIATION or
HAVE_SERVER_RENEGOCIATION_INFO.

(cherry picked from commit c6a8279cdfc3272e34feb256ed9e4601e0a104db)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

CLEANUP: quic: remove unused prototype

Remove undefined quic_pacing_send() function prototype from quic_pacing
module.

This should be backported up to 3.1.

(cherry picked from commit 2178bf1192650da0196dc38987e64dfafd74de43)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"

shutdown-backup-sessions action for on-marked-up directive does not work anymore
since the stream_shutdown() function was modified to be async-safe.

When stream_shutdown() was modified to be async-safe, dedicated task events were
added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to
TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was
performed by process_stream() to shut the stream with the appropriate reason.

However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a
stream down because a preferred server became available, was not mapped in the
same way. So since commit b8e3b0a18d ("BUG/MEDIUM: stream: make
stream_shutdown() async-safe"), this action is ignored and does not work
anymore.

To fix an issue, and being able to bakcport the fix, a third task event was
added. TASK_F_EVT3 is now mapped on SF_ERR_UP.

This patch should fix the issue #2848. It must be backported as far as 2.6.

(cherry picked from commit 0a52a75ef7151a187b42cae6d49a75b79380f211)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES

ssl_sock_load_ca and ssl_sock_free_ca definitions are compiled only, if
SSL_NO_GENERATE_CERTIFICATES is not set. In case, when we set this define and
build haproxy, linker throws an error. So, let's fix this.

This should be backported in all stable versions.

(cherry picked from commit c987f30245023f3bdf4dbe5296ed39f2d8faa98b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: do not increase congestion window if app limited

Previously, congestion window was increased any time each time a new
acknowledge was received. However, it did not take into account the
window filling level. In a network condition with negligible loss, this
will cause the window to be incremented until the maximum value (by
default 480k), even though the application does not have enough data to
fill it.

In most cases, this issue is not noticeable. However, it may lead to
excessive memory consumption when a QUIC connection is suddendly
interrupted, as in this case haproxy will fill the window with
retransmission. It even has caused OOM crash when thousands of clients
were interrupted at once on a local network benchmark.

Fix this by first checking window level prior to every incrementation
via a new helper function quic_cwnd_may_increase(). It was arbitrarily
decided that the window must be at least 50% full when the ACK is
handled prior to increment it. This value is a good compromise to keep
window in check while still allowing fast increment when needed.

Note that this patch only concerns cubic and newreno algorithm. BBR has
already its notion of application limited which ensures the window is
only incremented when necessary.

This should be backported up to 2.6.

(cherry picked from commit bbaa7aef7ba25480b842cda274e57d820680bb57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data

It is possible to have front H1 connections waiting for the client timeout
while they should be closed because a conneciton error was reported before
sebding an error message to the client. It is not a leak because the
connections are closed when the timeout expires but it is a waste of
ressources, especially if the client timeout is high.

When an early error message must be sent to the client, if an error was
already detected, no data are sent and the output buffer is released. At
this stage, the H1 connection is in CLOSING state and it must be
released. But because of a bug, this is not performed. The client timeout is
rearmed and the H1 connection is only closed when it expires.

To fix the issue, the condition to close a H1C must also be evaluated when
an error is detected before sending data.

It is only an issue with idle client connections, because there is no H1
stream in that case and the error message is generated by the mux itself.

This patch must be backported as far as 2.8.

(cherry picked from commit b18e988e0dabbb0c382d2dfb5b8701670a386dde)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null

Some new compilers warn that <oldest_lost> variable can be null even this cannot be
the case as mentioned by the comment about an already present ASSUME_NONNULL()
call comment as follows:

src/quic_loss.c: In function ‘qc_release_lost_pkts’:
src/quic_loss.c:307:86: error: potential null pointer dereference [-Werror=null-dereference]
307 | unsigned int period = newest_lost->time_sent_ms - oldest_lost->time_sent_ms;
| ~~~~~~~~~~~^~~~~~~~~~~~~~

Move up this ASSUME_NONNULL() statement to please these compiler.

Must be backported as far as 2.6 to easy any further backport around this code part.

(cherry picked from commit 1f099db7e2ca978b467f0d524261af1d588d1d0a)
[cf: ALREADY_CHECKED() is moved because ASSUME_NONNULL() does not exist]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount

This is definitively a bug to call quic_tx_packet_refdec() to decrement the reference
counter of a TX packet calling quic_tx_packet_refdec(), and possibly to release its
memory when it is negative or null.

This counter is incremented when a TX frm is attached to it with some allocated memory
and when the packet is inserted into a data structure, if needed (list or tree).

Should be easily backported as far as 2.6 to ease any further backport around
this code part.

(cherry picked from commit 4f38c4bfd8c07331258b4b13111a07c7b3013021)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours

Reset ->prev and ->next fields of a coalesced TX packet to ensure it cannot access
several times its neighbours after it is supposed to be detached from them calling
quic_tx_packet_dgram_detach().

There are two cases where a packet can be coalesced to another previous built one:
this is when it is built into the same datagrame without GSO (and flagged flag with
QUIC_FL_TX_PACKET_COALESCED) or when sent from the same sendto() syscall with GOS
(not flagged with QUIC_FL_TX_PACKET_COALESCED).

This fix may be in relation with GH #2839.

Must be backported as far as 2.6.

(cherry picked from commit cb729fb64d18815fd73e48d24f0d58a830aee63c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro

This environment variable was added by commit d4c0be6b20 ("MINOR: startup:
HAPROXY_STARTUP_VERSION contains the version used to start"). However, it's
set from the macro that is passed during the build process instead of being
set from the variable that's kept up to date in version.c. The difference
is visible only during debugging/bisecting because only changed files and
version.o are rebuilt, but not necessarily haproxy.o, which is where the
environment variable is set. This means that the version exposed in the
environment is not necessarily the same as the one presented in
"haproxy -v" during such debugging sessions.

This should be backported to 2.8. It has no impact at all on regularly
built binaries.

(cherry picked from commit 9e61cf67909936036e8c22be5ea340b9d717cd88)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()

sink_new_from_srv() leverages sink_new_buf() with the server id as name,
sink_new_buf() then calls __sink_new() with the provided name.

Unfortunately sink_new() is designed in such a way that it will first look
up in the list of existing sinks to check if a sink already exists with
given name, in which case the existing sink is returned. While this
behavior may be error-prone, it is actually up to the caller to ensure
that the provided name is unique if it really expects a unique sink
pointer.

Due to this bug in sink_new_from_srv(), multiple tcp servers with the same
name defined in distinct log backends would end up sharing the same sink,
which means messages sent to one of the servers would also be forwarded to
all servers with the same name across all log backend sections defined in
the config, which is obviously an issue and could even raise security
concerns.

Example:

  defaults
    log backend@log-1 local0

  backend log-1
    mode log
    server s1 127.0.0.1:514
  backend log-2
    mode log
    server s1 127.0.0.1:5114

With the above config, logs sent to log-1/s1 would also end up being sent
to log-2/s1 due to server id "s1" being used for tcp servers in distinct
log backends.

To fix the issue, we now prefix the sink ame with the backend name:
back_name/srv_id combination is known to be unique (backend name
serves as a namespace)

This bug was reported by GH user @landon-lengyel under #2846.

UDP servers (with udp@ prefix before the address) are not affected as they
don't make use of the sink facility.

As a workaround, one should manually ensure that all tcp servers across
different log backends (backend with "mode log" enabled) use unique names

This bug was introduced in e58a9b4 ("MINOR: sink: add sink_new_from_srv()
function") thus it exists since the introduction of log backends in 2.9,
which means this patch should be backported up to 2.9.

(cherry picked from commit bfa493d4be6b390c1b0915971b42cb98418f0c78)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: quic: reject too large CRYPTO frames

Received CRYPTO frames are inserted in a ncbuf to handle out-of-order
reception via ncb_add(). They are stored on the position relative to the
frame offset, minus a base offset which corresponds to the in-order data
length already handled.

Previouly, no check was implemented on the frame offset value prior to
ncb_add(), which could easily trigger a crash if relative offset was too
large. Fix this by ensuring first that the frame can be stored in the
buffer before ncb_add() invokation. If this is not the case, connection
is closed with error CRYPTO_BUFFER_EXCEEDED, as required by QUIC
specification.

This should fix github issue #2842.

This must be backported up to 2.6.

(cherry picked from commit c3a4a4d1660866457ffb83699bd091621aae680d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters

When backends extra counters are dumped, the wrong pointer was used in the
promex context to retrieve the stats module. p[1] must be used instead of
p[2]. Because of this typo, a infinite loop could be experienced if the
output buffer is full during this stage. But in all cases an overflow is
possible leading to a memory corruption.

This patch may be related to issue #2831. It must be backported as far as
3.0.

(cherry picked from commit 91578212d7cef7405f3631db43a086ac9f9ca162)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stktable: fix missing lock on some table converters

In 819fc6f563
("MEDIUM: threads/stick-tables: handle multithreads on stick tables"),
sample fetch and action functions were properly guarded with stksess
read/write locks for read and write operations respectively, but the
sample_conv_table functions leveraged by "table_*" converters were
overlooked.

This bug was not known to cause issues in existing deployments yet (at
least it was not reported), but due to its nature it can theorically lead
to inconsistent values being reported by "table_*" converters if the value
is being updated by another thread in parallel.

It should be backported to all stable versions.

[ada: for versions < 3.0, glitch_cnt and glitch_rate samples should be
ignored as they first appeared in 3.0]

(cherry picked from commit 8919a80da9391e348aa325b44fdae40a35f48dcf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: reject NEW_TOKEN frames from clients

As specified by RFC 9000, reject NEW_TOKEN frames emitted by clients.
Close the connection with error code PROTOCOL_VIOLATION.

This must be backported up to 2.6.

(cherry picked from commit 4a5d82a97d9269eb17f9b92af6c8a9cd904705cd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()

When smp_to_stkey() deals with SINT samples, since stick-tables deals with
32 bits integers while SINT sample is 64 bit integer, inplace conversion
was done in smp_to_stkey. For that the 64 bit integer was truncated before
the key would point to it. Unfortunately this only works on little endian
architectures because with big endian ones, the key would point to the
wrong 32bit range.

To fix the issue and make the conversion endian-proof, let's re-assign
the sample as 32bit integer before the key points to it.

Thanks to Willy for having spotted the bug and suggesting the above fix.

It should be backported to all stable versions.

(cherry picked from commit b59d1fd911045c37b3fc88795a1f2c1947e6e59d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.1.2

Released version 3.1.2 with the following main changes :
    - MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)
    - BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)
    - BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.
    - BUG/MINOR: quic: wrong bbr_target_inflight() implementation
    - BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
    - BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()
    - BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
    - MINOR: quic: reduce the private data size of QUIC cc algos
    - CLEANUP: quic: remove a wrong comment about ->app_limited (drs)
    - BUG/MINOR: quic: fix the wrong tracked recovery start time value
    - BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)
    - BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines
    - REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings
    - BUG/MINOR: quic: missing Startup accelerating probing bw states
    - CLEANUP: quic: Rename some BBR functions in relation with bw probing
    - BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date
    - BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving
    - BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
    - BUILD: debug: only dump/reset glitch counters when really defined
    - BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing
    - CLEANUP: mux-quic: remove unused qcc member send_retry_list
    - MINOR: quic: add traces
    - MINOR: mux-quic: refactor wait-for-handshake support
    - MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption
    - MEDIUM/OPTIM: mux-quic: implement purg_list
    - MINOR: mux-quic: extract code to build STREAM frames list
    - MINOR: mux-quic: split STREAM and RS/SS emission
    - MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
    - MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb
    - MINOR: trace: implement tracing disabling API
    - MINOR: mux-quic: hide traces when woken up on pacing only
    - DOC: config: add example for server "track" keyword
    - DOC: config: reorder "tune.lua.*" keywords by alphabetical order
    - DOC: config: add "tune.lua.burst-timeout" to the list of global parameters
    - MINOR: hlua: add option to preserve bool type from smp to lua
    - REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool
    - BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA
    - CLEANUP: mux-quic: remove dead err label in qcc_build_frms()
    - BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse
    - MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"
    - BUG/MEDIUM: queue: Make process_srv_queue return the number of streams
    - BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"
    - DOC: config: add missing "track-sc0" in action keywords matrix
    - BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission
    - BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()
    - Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"
    - BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission
    - BUG/MEDIUM: mux-quic: do not attach on already closed stream
    - BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action
    - MINOR: config: Alert about extra arguments for errorfile and errorloc
    - BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()
    - MINOR: mux-quic: add traces on sd attach
    - MINOR: mux-quic: change return value of qcs_attach_sc()
    - BUG/MINOR: mux-quic: handle closure of uni-stream
    - BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined
    - BUG/MEDIUM: h1-htx: Properly handle bodyless messages

BUG/MEDIUM: h1-htx: Properly handle bodyless messages

During h1 parsing, there are some postparsing checks to detect bodyless
messages and switch the parsing in DONE state. However, a case was not
properly handled. Responses to HEAD requests with a "transfer-encoding"
header. The response parser remained blocked waiting for the response body.

To fix the issue, the postparsing was sliglty modified. Instead of trying to
handle bodyless messages in a common way between the request and the
response, it is now performed in the dedicated postparsing functions. It is
easier to enumerate all cases, especially because there is already a test
for responses to HEAD requests.

This patch should fix the issue #2836. It must be backported as far as 2.9.

(cherry picked from commit b9cc361b35e66c1f2d26a9b703f8759f70cbc03c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined

A 'resolvers' section may be defined without any nameserver. In that case,
we must take care to not dump corresponding Prometheus metrics. However
there is an issue that could lead to a crash or a strange infinite loop
because we are looping on an empty list and, at some point, we are
dereferencing an invalid pointer.

There is an issue because the loop on the nameservers of a resolvers section
is performed via callback functions and not the standard list_for_each_entry
macro. So we must take care to properly detect end of the list and empty
lists for nameservers. But the fix is not so simple because resolvers
sections with and without nameservers may be mixed.

To fix the issue, in rslv_promex_start_ts() and rslv_promex_next_ts(), when the
next resolvers section must be evaluated, a loop is now used to properly skip
empty sections.

This patch is related to #2831. Not sure it fixes it. It must be backported
as far as 3.0.

(cherry picked from commit 892eb2bb2ceba2c0e80d51eed69dfc9494a56988)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: handle closure of uni-stream

This commit is a direct follow-up to the previous one. As already
described, a previous fix was merged to prevent streamdesc attach
operation on already completed QCS instances scheduled for purging. This
was implemented by skipping app proto decoding.

However, this has a bad side-effect for remote uni-directional stream.
If receiving a FIN stream frame on such a stream, it will considered as
complete because streamdesc are never attached to a uni stream. Due to
the mentionned new fix, this prevent analysis of this last frame for
every uni stream.

To fix this, do not skip anymore app proto decoding for completed QCS.
Update instead qcs_attach_sc() to transform it as a noop function if QCS
is already fully closed before streamdesc instantiation. However,
success return value is still used to prevent an invalid decoding error
report.

The impact of this bug should be minor. Indeed, HTTP3 and QPACK uni
streams are never closed by the client as this is invalid due to the
spec. The only issue was that this prevented QUIC MUX to close the
connection with error H3_ERR_CLOSED_CRITICAL_STREAM.

This must be backported along the previous patch, at least to 3.1, and
eventually to 2.8 if mentionned patches are merged there.

(cherry picked from commit 801e39e1ccfc7404354d6870e779f83351593df0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: change return value of qcs_attach_sc()

A recent fix was introduced to ensure that a streamdesc instance won't
be attached to an already completed QCS which is eligible to purging.
This was performed by skipping application protocol decoding if a QCS is
in such a state. Here is the patch responsible for this change.
caf60ac696a29799631a76beb16d0072f65eef12
BUG/MEDIUM: mux-quic: do not attach on already closed stream

However, this is too restrictive, in particular for unidirection stream
where no streamdesc is never attached. To fix this behavior, first
qcs_attach_sc() API has been modified. Instead of returning a streamdesc
instance, it returns either 0 on success or a negative error code.

There should be no functional changes with this patch. It is only to be
able to extend qcs_attach_sc() with the possibility of skipping
streamdesc instantiation while still keeping a success return value.

This should be backported wherever the above patch has been merged. For
the record, it was scheduled for immediate backport on 3.1, plus merging
on older releases up to 2.8 after a period of observation.

(cherry picked from commit af00be8e0f7fe3942fac93ac32bfe7ae7f5b78c9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: add traces on sd attach

Add traces into qcs_attach_sc(). This function is called when a request
is received on a QCS stream and a streamdesc instance is attached. This
will be useful to facilitate debugging.

(cherry picked from commit 4a997e5a932ef35982f4df3ac3ba5871798be2ef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()

The following patch was a major refactoring of QUIC MUX. It removes
pacing specific code path. In particular, qcc_wakeup() utility function
was removed and replaced by its tasklet_wakup() usage.
41f0472d967b2deb095d5adc8a167da973fbee3d
MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb

However, an incorrect substitution was performed in qcc_set_error(). As
such, there was no explicit wakeup in case an error is detected by QUIC
MUX or the app protocol layer. This may lead to missing error reporting
to clients.

Fix this by re-add tasklet_wakup() usage into qcc_set_error().

This must be backported up to 3.1 where above patch is scheduled.

(cherry picked from commit 4f2554903b1449a31bb30097311cb6fa86632338)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: config: Alert about extra arguments for errorfile and errorloc

errorfile and errorloc directives expect excatly two arguments. But extra
arguments were just ignored while an error should be emitted. It is now
fixed.

This patch could be backported as far as 2.2 if necessary.

(cherry picked from commit f578811c4e8965f243cd16a6f569909a2a07498c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action

The do-log action does not accept argument for now. But an error was
triggered if any extra arguments was found, preventing the use of if/unless
conditionnals.

When an action is parsed, expected arguments must be tested to detect
missing ones but not unexpected extra arguments because this should be
performed by the conditionnal parser. So just removing the test in the
do-log parser function is enough to fix the issue.

This patch must be backported to 3.1.

(cherry picked from commit a785a20befeb858d47c558378e830e0f62060d21)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-quic: do not attach on already closed stream

Due to QUIC packet reordering, a stream may be opened via a new
RESET_STREAM or STOP_SENDING frame. This would cause either Tx or Rx
channel to be immediately closed.

This can cause an issue with current QUIC MUX implementation with QCS
purging. QCS are inserted into QCC purge list when transfer could be
considered as completed. In most cases, this happens after full
request/response exchange. However, it can also happens after request
reception if RESET_STREAM/STOP_SENDING are received first.

A BUG_ON() crash will occur if a STREAM frame is received after. In this
case, streamdesc instance will be attached via qcs_attach_sc() to handle
the new request. However, QCS is already considered eligible to purging.
It could cause it to be released while its streamdesc instance remains.
A BUG_ON() crash detects this problem in qcc_purge_streams().

To fix this, extend qcc_decode_qcs() to skip app proto rcv_buf
invokation if QCS is considered completed. A similar condition was
already implemented when read was previously aborted after a
STOP_SENDING emission by QUIC MUX.

This crash was reproduced on haproxy.org. Here is the output of the
backtrace :
Core was generated by `./haproxy-dev -db -f /etc/haproxy/haproxy-current.cfg -sf 16495'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
2661                    BUG_ON_HOT(!qcs_is_completed(qcs));
[Current thread is 1 (LWP 1457)]
[ ## gdb ## ] bt
#0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
#1  0x00000000004e4db7 in qcc_io_process (qcc=0x774cca0) at src/mux_quic.c:2744
#2  0x00000000004e5a54 in qcc_io_cb (t=0x7f71193940c0, ctx=0x774cca0, status=573504) at src/mux_quic.c:2886
#3  0x0000000000b4f792 in run_tasks_from_lists (budgets=0x7ffdcea1e670) at src/task.c:603
#4  0x0000000000b5012f in process_runnable_tasks () at src/task.c:883
#5  0x00000000007de4a3 in run_poll_loop () at src/haproxy.c:2771
#6  0x00000000007deb9f in run_thread_poll_loop (data=0x1335a00 <ha_thread_info>) at src/haproxy.c:2985
#7  0x00000000007dfd8d in main (argc=6, argv=0x7ffdcea1e958) at src/haproxy.c:3570

This BUG_ON() crash can only happen since 3.1 refactoring. Indeed, purge
list was only implemented on this version. As such, please backport it
on 3.1 immediately. However, a logic issue remains for older version as
a stream could be attached on a fully closed QCS. Thus, it should be
backported up to 2.8, this time after a period of observation.

(cherry picked from commit caf60ac696a29799631a76beb16d0072f65eef12)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission

Properly fix BUG_ON() occurence when QUIC MUX emits only empty STREAM
frames. This was addressed by a previous patch but it causes another
regression so a revert was needed.

BUG_ON() on qcc_build_frms() return value is invalid. Indeed,
qcc_build_frms() may return 0, but this does not imply that frame list
is empty, as encoded frames can have a zero length payload. As such,
simply remove this invalid BUG_ON().

This must be backported up to 3.1.

(cherry picked from commit ddfd8031f895b448604cbc259f74e6cb7b086b0b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"

This reverts commit 98064537423fafe05b9ddd97e81cedec8b6b278d.

Above patch tried to fix a BUG_ON() occurence when MUX only emitted
empty STREAM frames via qcc_build_frms(). Return value of qcs_send() was
changed from the payload STREAM frame to the whole frame length.
However, this is invalid as this return value is used to ensure
connection flow-control is not exceeded on sending retry. This causes
occurence of BUG_ON() crash in qcc_io_send() as send-list is not
properly purged after QCS emission.

Reverts this incorrect fix. The original issue will be properly dealt in
the next commit.

This commit must be backported to 3.1 if reverted commit was already
applied on it.

(cherry picked from commit 85e27f1e92d0d6fd4dd0588992d3da3039a065f9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()

When data was copied from RX buffers to the channel buffer, more data than
expected could be moved because amount of data copied was never decremented
from the limit. This could lead to a stream dead lock when the compression
filter was inuse.

The issue was introduced by commit 4eb3ff1 ("MAJOR: mux-h2: make streams use
the connection's buffers") but revealed by 3816c38 ("MAJOR: mux-h2: permit a
stream to allocate as many buffers as desired").

Because a h2 stream can now have several RX buffers, in h2_rcv_buf(), we
loop on these buffers to fill the channel buffer. However, we must still
take care to respect the limit to not copy to much data. However, the
"count" variable was never decremented to reflect amount of data already
copied. So, it was possible to exceed the limit.

It was an issue when the compression filter was inuse because the channel
buffer could be fully filled, preventing the compression to be
performed. When this happened, the stream was infinitly blocked because the
compression filter was asking for some space but nothing was scheduled to be
forwarded.

This patch should fix the issue #2826. It must be backported to 3.1.

(cherry picked from commit 22f8d2c99e71599f12f03829f3c62150b1289389)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission

A BUG_ON() is present in qcc_io_send() to ensure that encoded frame list
is empty if qcc_build_frms() previously returned 0.

This BUG_ON() may be triggered if empty STREAM frame is encoded for
standalone FIN. Indeed, qcc_build_frms() returns the sum of all STREAM
payload length. In case only empty STREAM frames are generated, return
value will be 0, despite new frames encoded and inserted into frame
list.

To fix this, change return value of qcs_send(). This now returns the
whole STREAM frame length, both header and payload included. This
ensures that qcc_build_frms() won't return a nul value if new frames are
encoded, even empty ones.

This must be backported up to 3.1.

(cherry picked from commit 98064537423fafe05b9ddd97e81cedec8b6b278d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>