Repositories - haproxy-3.1.git/log

BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals

1d3c8223 ("MINOR: ssl: allow to change the server signature algorithm")
mplemented the sigals keyword in the crt-list but never the dump of the
keyword over the CLI.

Must be backported as far as 2.8.

(cherry picked from commit 5a7cbb8d8115770c4776836fdf727b26d490d2ad)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals

b6ae2aafde43 ("MINOR: ssl: allow to change the signature algorithm for
client authentication") implemented the client-sigals keyword in the
crt-list but never the dump of the keyword over the CLI.

Must be backported as far as 2.8.

(cherry picked from commit 037d2e5498917d2323a9ad748b9d97aaa688f351)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED

The crappy epoll API stroke again with reloads and transferred FDs.
Indeed, when listening sockets are retrieved by a new worker from a
previous one, and the old one finally stops listening on them, it
closes the FDs. But in this case, since the sockets themselves were
not closed, epoll will not unregister them and will continue to
report new activity for these in the old process, which can only
observe, count an fd_poll_drop event and not unregister them since
they're not reachable anymore.

The unfortunate effect is that long-lasting old processes are woken
up at the same rate as the new process when accepting new connections,
and can waste a lot of CPU. Accept rates divided by 8 were observed on
a small test involving a slow transfer on 10 connections facing a reload
every second so that 10 processes were busy dealing with them while
another process was hammering the service with new connections.

Fortunately, years ago we implemented a flag FD_CLONED exactly for
similar purposes. Let's simply mark transferred FDs with FD_CLONED
so that the process knows that these ones require special treatment
and have to be manually unregistered before being closed. This does
the job fine, now old processes correctly unregister the FD before
closing it and no longer receive accept events for the new process.

This needs to be backported to all stable versions. It only affects
epoll, as usual, and this time in combination with transferred FDs
(typically reloads in master-worker mode). Thanks to Damien Claisse
for providing all detailed measurements and statistics allowing to
understand and reproduce the problem.

(cherry picked from commit 561319bd1ccaab5dd69bac0c19387670b02d484f)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: post_section_parser for the last section in discovery

Previous patch 2c270a05f ("BUG/MINOR: mworker: section ignored in
discovery after a post_section_parser") needs an adjustment for the last
section of the file.

Indeed the post_section_parser of the last section must not be called in
discovery mode.

Must be backported in 3.1.

(cherry picked from commit 313eeae7dbcafac37ea6710c92de79c992c17d4c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: section ignored in discovery after a post_section_parser

When a new section is discovered, the post_section_parser of the
previous section is called. However in the new master-worker mode the
discovery mode will skip the post_section_parser. But instead of
trying to parse the current section keyword after that, it would skip
completely the current line.

This is a minor bug since there isn't a lot of section with
post_section_parser, and not a lot of section to parse in discovery
mode.

But this could be reproduced like this:

global
        expose-deprecated-directives

resolvers res
parse-resolv-conf

program foo
        command sleep 10

program bar
       command sleep 10

Ths 'resolvers' section has a post_section_parser which will be ignored
in discovery mode with the consequence of ignoring the first program
section.

This must be backported in 3.1.

(cherry picked from commit 2c270a05f06ff6ea2c23b369692a12a7780291b7)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding

Function max_stream_data_size() is used to determine the payload length
of a CRYPTO frame. It takes into account that the CRYPTO length field is
a variable length integer.

Implemented calcul was incorrect as it reserved too much space as a
frame header. This error is mostly due because max_stream_data_size()
reuses max_available_room() which also reserve space for a variable
length integer. This results in CRYPTO frames shorter of 1 to 2 bytes
than the maximum achievable value, which produces in the end datagram
shorter than the MTU.

Fix max_stream_data_size() implementation. It is now merely a wrapper on
max_available_room(). This ensures that CRYPTO frame encoding is now
properly optimized to use the MTU available.

This should be backported up to 2.6.

(cherry picked from commit e6a223542ae87880b3e8261daffbe362730f9e55)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: reserve length field for long header encoding

Long header packets have a mandatory Length field, which contains the
size of Packet number and payload, encoded as a variable-length integer.
Its value can thus only be determined after the payload size is known,
which depends on the remaining buffer space after this variable-length
field.

Packet payload are encoded in two steps. First, a list of input frames
is processed until the packet buffer is full. CRYPTO and STREAM frames
payload can be splitted if need to fill the buffer. Real encoding is
then performed as a second stage operation, first with Length field,
then with the selected frames themselves.

Before this patch, no space was reserved in the buffer for Length field
when attaching the frames to the packet. This could result in a error as
the packet payload would be too large for the remaining space.

In practice, this issue was rarely encounted, mostly as a side-effect
from another issue linked to CRYPTO frame encoding. Indeed, a wrong
calculation is performed on CRYPTO splitting, which results in frame
payload shorter by a few bytes than expected. This however ensured there
would be always enough room for the Length field and payload during
encoding. As CRYPTO frames are the only big enough content emitted with
a Long header packet, this renders the current issue mostly non
reproducible.

Fix the original issue by reserving some space for Length field prior to
frame payload calculation, using a maximum value based on the remaining
room space. Packet length is then reduced if needed when encoding is
performed, which ensures there is always enough room for the selected
frames.

Note that the other issue impacting CRYPTO frame encoding is not yet
fixed. This could result in datagrams with Long header packets not
completely extended to the full MTU. The issue will be addressed in
another patch.

This should be backported up to 2.6.

(cherry picked from commit 63747452a39cff72a6bc7610da079afe01fb6dcc)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: debug: close a possible race between thread dump and panic()

The rework of the thread dumping mechanism in 2.8 with commit 9a6ecbd590
("MEDIUM: debug: simplify the thread dump mechanism") opened a small
race, which is that a thread in the process of dumping other ones may
block the other one from panicing while it's looping at the end of
ha_thread_dump_fill(), or any other sequence involving the currently
dumped one.

This was emphasized in 3.1 with commit 148eb5875f ("DEBUG: wdt: better
detect apparently locked up threads and warn about them") that allowed
to emit warnings about long-stuck threads, because in this case, what
happens is that sometimes a thread starts to emit a warning (or a set
of warnings), and while the warning is being awaited for, a panic
finally happens and interrupts either the dumping thread, which never
finishes and waits for the target's pointer to become NULL which will
never happen since it was supposed to do it itself, or the currently
dumped thread which could wait for the dumping thread to become ready
while this one has not released the former.

In order to address this, first we now make sure never to dump a thread
that is already in the process of dumping another one. We're adding a
new thread flag to know this situation, that is set in ha_thread_dump_fill()
and cleared in ha_thread_dump_done(). And similarly, we don't trigger
the watchdog on a thread waiting for another one to finish its dump,
as it's likely a case of warning (and maybe even a panic) that makes
them wait for each other and we don't want such cases to be reentrant.
Finally, we check in the main polling loop that the flag never accidentally
leaked (e.g. wrong flag manipulation) as this would be difficult to spot
with bad consequences.

This should be backported at least to 2.8, and should resolve github
issue #2860. Thanks to Chris Staite for the very informative backtrace
that exhibited the problem.

(cherry picked from commit 7ddcdff33fbe62c1f0a342c64152833897c7647e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3

The clienthello callback was written when TLSv1.3 was not yet out, and
signatures algorithm changed since then.

With TLSv1.2, the least significant byte was used to determine the
SignatureAlgorithm, which could be rsa(1), dsa(2), ecdsa(3).
https://datatracker.ietf.org/doc/html/rfc5246#section-7.4.1.4.1

This was used to chose which type of certificate to push to the client.

But TLSv1.3 changed that, and introduced new RSA-PSS algorithms that
does not have the least sinificant byte to 1.
https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.3

This would result in chosing the wrong certificate when an RSA an ECDSA
ones are in the configuration for the same SNI or default entry.

This patch fixes the issue by parsing bothe hash and signature field to
check the RSA-PSS signature scheme.

This must fix issue #2852.

This must be backported in every stable versions. The code was moved
from ssl_sock.c to ssl_clienthello in recent versions.

(cherry picked from commit 3912780b1ee656b6d7327c34027f7b5d9db61731)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

[RELEASE] Released version 3.1.3

Released version 3.1.3 with the following main changes :
    - BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()
    - BUG/MINOR: quic: reject NEW_TOKEN frames from clients
    - BUG/MEDIUM: stktable: fix missing lock on some table converters
    - BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters
    - BUG/MAJOR: quic: reject too large CRYPTO frames
    - BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()
    - BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro
    - BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours
    - MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount
    - BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null
    - BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data
    - BUG/MINOR: quic: do not increase congestion window if app limited
    - BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES
    - BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"
    - CLEANUP: quic: remove unused prototype
    - BUILD: ssl: allow to build without the renegotiation API of WolfSSL
    - BUILD: ssl: more cleaner approach to WolfSSL without renegotiation

BUILD: ssl: more cleaner approach to WolfSSL without renegotiation

Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834

When building Wolfssl without renegotiation options, WolfSSL still
defines the macros about it, which warns during the build.

This patch completes the previous one by undefining the macros so
haproxy could build without any warning.

(cherry picked from commit b43e5d8c1692a0f15db4e621e3cff41158a47167)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

BUILD: ssl: allow to build without the renegotiation API of WolfSSL

In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was
suggested to push --enable-haproxy within --enable-distro.

WolfSSL does not want to include the renegotiation support in
--enable-distro.

To achieve this, let haproxy build without SSL_renegotiate_pending()
when wolfssl does not define HAVE_SECURE_RENEGOCIATION or
HAVE_SERVER_RENEGOCIATION_INFO.

(cherry picked from commit c6a8279cdfc3272e34feb256ed9e4601e0a104db)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

CLEANUP: quic: remove unused prototype

Remove undefined quic_pacing_send() function prototype from quic_pacing
module.

This should be backported up to 3.1.

(cherry picked from commit 2178bf1192650da0196dc38987e64dfafd74de43)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions"

shutdown-backup-sessions action for on-marked-up directive does not work anymore
since the stream_shutdown() function was modified to be async-safe.

When stream_shutdown() was modified to be async-safe, dedicated task events were
added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to
TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was
performed by process_stream() to shut the stream with the appropriate reason.

However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a
stream down because a preferred server became available, was not mapped in the
same way. So since commit b8e3b0a18d ("BUG/MEDIUM: stream: make
stream_shutdown() async-safe"), this action is ignored and does not work
anymore.

To fix an issue, and being able to bakcport the fix, a third task event was
added. TASK_F_EVT3 is now mapped on SF_ERR_UP.

This patch should fix the issue #2848. It must be backported as far as 2.6.

(cherry picked from commit 0a52a75ef7151a187b42cae6d49a75b79380f211)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: ssl: put ssl_sock_load_ca under SSL_NO_GENERATE_CERTIFICATES

ssl_sock_load_ca and ssl_sock_free_ca definitions are compiled only, if
SSL_NO_GENERATE_CERTIFICATES is not set. In case, when we set this define and
build haproxy, linker throws an error. So, let's fix this.

This should be backported in all stable versions.

(cherry picked from commit c987f30245023f3bdf4dbe5296ed39f2d8faa98b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: do not increase congestion window if app limited

Previously, congestion window was increased any time each time a new
acknowledge was received. However, it did not take into account the
window filling level. In a network condition with negligible loss, this
will cause the window to be incremented until the maximum value (by
default 480k), even though the application does not have enough data to
fill it.

In most cases, this issue is not noticeable. However, it may lead to
excessive memory consumption when a QUIC connection is suddendly
interrupted, as in this case haproxy will fill the window with
retransmission. It even has caused OOM crash when thousands of clients
were interrupted at once on a local network benchmark.

Fix this by first checking window level prior to every incrementation
via a new helper function quic_cwnd_may_increase(). It was arbitrarily
decided that the window must be at least 50% full when the ACK is
handled prior to increment it. This value is a good compromise to keep
window in check while still allowing fast increment when needed.

Note that this patch only concerns cubic and newreno algorithm. BBR has
already its notion of application limited which ensures the window is
only incremented when necessary.

This should be backported up to 2.6.

(cherry picked from commit bbaa7aef7ba25480b842cda274e57d820680bb57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h1: Properly close H1C if an error is reported before sending data

It is possible to have front H1 connections waiting for the client timeout
while they should be closed because a conneciton error was reported before
sebding an error message to the client. It is not a leak because the
connections are closed when the timeout expires but it is a waste of
ressources, especially if the client timeout is high.

When an early error message must be sent to the client, if an error was
already detected, no data are sent and the output buffer is released. At
this stage, the H1 connection is in CLOSING state and it must be
released. But because of a bug, this is not performed. The client timeout is
rearmed and the H1 connection is only closed when it expires.

To fix the issue, the condition to close a H1C must also be evaluated when
an error is detected before sending data.

It is only an issue with idle client connections, because there is no H1
stream in that case and the error message is generated by the mux itself.

This patch must be backported as far as 2.8.

(cherry picked from commit b18e988e0dabbb0c382d2dfb5b8701670a386dde)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUILD: quic: Move an ASSUME_NONNULL() for variable which is not null

Some new compilers warn that <oldest_lost> variable can be null even this cannot be
the case as mentioned by the comment about an already present ASSUME_NONNULL()
call comment as follows:

src/quic_loss.c: In function ‘qc_release_lost_pkts’:
src/quic_loss.c:307:86: error: potential null pointer dereference [-Werror=null-dereference]
307 | unsigned int period = newest_lost->time_sent_ms - oldest_lost->time_sent_ms;
| ~~~~~~~~~~~^~~~~~~~~~~~~~

Move up this ASSUME_NONNULL() statement to please these compiler.

Must be backported as far as 2.6 to easy any further backport around this code part.

(cherry picked from commit 1f099db7e2ca978b467f0d524261af1d588d1d0a)
[cf: ALREADY_CHECKED() is moved because ASSUME_NONNULL() does not exist]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount

This is definitively a bug to call quic_tx_packet_refdec() to decrement the reference
counter of a TX packet calling quic_tx_packet_refdec(), and possibly to release its
memory when it is negative or null.

This counter is incremented when a TX frm is attached to it with some allocated memory
and when the packet is inserted into a data structure, if needed (list or tree).

Should be easily backported as far as 2.6 to ease any further backport around
this code part.

(cherry picked from commit 4f38c4bfd8c07331258b4b13111a07c7b3013021)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours

Reset ->prev and ->next fields of a coalesced TX packet to ensure it cannot access
several times its neighbours after it is supposed to be detached from them calling
quic_tx_packet_dgram_detach().

There are two cases where a packet can be coalesced to another previous built one:
this is when it is built into the same datagrame without GSO (and flagged flag with
QUIC_FL_TX_PACKET_COALESCED) or when sent from the same sendto() syscall with GOS
(not flagged with QUIC_FL_TX_PACKET_COALESCED).

This fix may be in relation with GH #2839.

Must be backported as far as 2.6.

(cherry picked from commit cb729fb64d18815fd73e48d24f0d58a830aee63c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: init: set HAPROXY_STARTUP_VERSION from the variable, not the macro

This environment variable was added by commit d4c0be6b20 ("MINOR: startup:
HAPROXY_STARTUP_VERSION contains the version used to start"). However, it's
set from the macro that is passed during the build process instead of being
set from the variable that's kept up to date in version.c. The difference
is visible only during debugging/bisecting because only changed files and
version.o are rebuilt, but not necessarily haproxy.o, which is where the
environment variable is set. This means that the version exposed in the
environment is not necessarily the same as the one presented in
"haproxy -v" during such debugging sessions.

This should be backported to 2.8. It has no impact at all on regularly
built binaries.

(cherry picked from commit 9e61cf67909936036e8c22be5ea340b9d717cd88)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: log/sink: possible sink collision in sink_new_from_srv()

sink_new_from_srv() leverages sink_new_buf() with the server id as name,
sink_new_buf() then calls __sink_new() with the provided name.

Unfortunately sink_new() is designed in such a way that it will first look
up in the list of existing sinks to check if a sink already exists with
given name, in which case the existing sink is returned. While this
behavior may be error-prone, it is actually up to the caller to ensure
that the provided name is unique if it really expects a unique sink
pointer.

Due to this bug in sink_new_from_srv(), multiple tcp servers with the same
name defined in distinct log backends would end up sharing the same sink,
which means messages sent to one of the servers would also be forwarded to
all servers with the same name across all log backend sections defined in
the config, which is obviously an issue and could even raise security
concerns.

Example:

  defaults
    log backend@log-1 local0

  backend log-1
    mode log
    server s1 127.0.0.1:514
  backend log-2
    mode log
    server s1 127.0.0.1:5114

With the above config, logs sent to log-1/s1 would also end up being sent
to log-2/s1 due to server id "s1" being used for tcp servers in distinct
log backends.

To fix the issue, we now prefix the sink ame with the backend name:
back_name/srv_id combination is known to be unique (backend name
serves as a namespace)

This bug was reported by GH user @landon-lengyel under #2846.

UDP servers (with udp@ prefix before the address) are not affected as they
don't make use of the sink facility.

As a workaround, one should manually ensure that all tcp servers across
different log backends (backend with "mode log" enabled) use unique names

This bug was introduced in e58a9b4 ("MINOR: sink: add sink_new_from_srv()
function") thus it exists since the introduction of log backends in 2.9,
which means this patch should be backported up to 2.9.

(cherry picked from commit bfa493d4be6b390c1b0915971b42cb98418f0c78)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: quic: reject too large CRYPTO frames

Received CRYPTO frames are inserted in a ncbuf to handle out-of-order
reception via ncb_add(). They are stored on the position relative to the
frame offset, minus a base offset which corresponds to the in-order data
length already handled.

Previouly, no check was implemented on the frame offset value prior to
ncb_add(), which could easily trigger a crash if relative offset was too
large. Fix this by ensuring first that the frame can be stored in the
buffer before ncb_add() invokation. If this is not the case, connection
is closed with error CRYPTO_BUFFER_EXCEEDED, as required by QUIC
specification.

This should fix github issue #2842.

This must be backported up to 2.6.

(cherry picked from commit c3a4a4d1660866457ffb83699bd091621aae680d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex: Use right context pointers to dump backends extra-counters

When backends extra counters are dumped, the wrong pointer was used in the
promex context to retrieve the stats module. p[1] must be used instead of
p[2]. Because of this typo, a infinite loop could be experienced if the
output buffer is full during this stage. But in all cases an overflow is
possible leading to a memory corruption.

This patch may be related to issue #2831. It must be backported as far as
3.0.

(cherry picked from commit 91578212d7cef7405f3631db43a086ac9f9ca162)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stktable: fix missing lock on some table converters

In 819fc6f563
("MEDIUM: threads/stick-tables: handle multithreads on stick tables"),
sample fetch and action functions were properly guarded with stksess
read/write locks for read and write operations respectively, but the
sample_conv_table functions leveraged by "table_*" converters were
overlooked.

This bug was not known to cause issues in existing deployments yet (at
least it was not reported), but due to its nature it can theorically lead
to inconsistent values being reported by "table_*" converters if the value
is being updated by another thread in parallel.

It should be backported to all stable versions.

[ada: for versions < 3.0, glitch_cnt and glitch_rate samples should be
ignored as they first appeared in 3.0]

(cherry picked from commit 8919a80da9391e348aa325b44fdae40a35f48dcf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: reject NEW_TOKEN frames from clients

As specified by RFC 9000, reject NEW_TOKEN frames emitted by clients.
Close the connection with error code PROTOCOL_VIOLATION.

This must be backported up to 2.6.

(cherry picked from commit 4a5d82a97d9269eb17f9b92af6c8a9cd904705cd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stktable: fix big-endian compatiblity in smp_to_stkey()

When smp_to_stkey() deals with SINT samples, since stick-tables deals with
32 bits integers while SINT sample is 64 bit integer, inplace conversion
was done in smp_to_stkey. For that the 64 bit integer was truncated before
the key would point to it. Unfortunately this only works on little endian
architectures because with big endian ones, the key would point to the
wrong 32bit range.

To fix the issue and make the conversion endian-proof, let's re-assign
the sample as 32bit integer before the key points to it.

Thanks to Willy for having spotted the bug and suggesting the above fix.

It should be backported to all stable versions.

(cherry picked from commit b59d1fd911045c37b3fc88795a1f2c1947e6e59d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.1.2

Released version 3.1.2 with the following main changes :
    - MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)
    - BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)
    - BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.
    - BUG/MINOR: quic: wrong bbr_target_inflight() implementation
    - BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
    - BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()
    - BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)
    - MINOR: quic: reduce the private data size of QUIC cc algos
    - CLEANUP: quic: remove a wrong comment about ->app_limited (drs)
    - BUG/MINOR: quic: fix the wrong tracked recovery start time value
    - BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)
    - BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines
    - REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings
    - BUG/MINOR: quic: missing Startup accelerating probing bw states
    - CLEANUP: quic: Rename some BBR functions in relation with bw probing
    - BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date
    - BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving
    - BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().
    - BUILD: debug: only dump/reset glitch counters when really defined
    - BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing
    - CLEANUP: mux-quic: remove unused qcc member send_retry_list
    - MINOR: quic: add traces
    - MINOR: mux-quic: refactor wait-for-handshake support
    - MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption
    - MEDIUM/OPTIM: mux-quic: implement purg_list
    - MINOR: mux-quic: extract code to build STREAM frames list
    - MINOR: mux-quic: split STREAM and RS/SS emission
    - MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send
    - MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb
    - MINOR: trace: implement tracing disabling API
    - MINOR: mux-quic: hide traces when woken up on pacing only
    - DOC: config: add example for server "track" keyword
    - DOC: config: reorder "tune.lua.*" keywords by alphabetical order
    - DOC: config: add "tune.lua.burst-timeout" to the list of global parameters
    - MINOR: hlua: add option to preserve bool type from smp to lua
    - REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool
    - BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA
    - CLEANUP: mux-quic: remove dead err label in qcc_build_frms()
    - BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse
    - MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"
    - BUG/MEDIUM: queue: Make process_srv_queue return the number of streams
    - BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"
    - DOC: config: add missing "track-sc0" in action keywords matrix
    - BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission
    - BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()
    - Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"
    - BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission
    - BUG/MEDIUM: mux-quic: do not attach on already closed stream
    - BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action
    - MINOR: config: Alert about extra arguments for errorfile and errorloc
    - BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()
    - MINOR: mux-quic: add traces on sd attach
    - MINOR: mux-quic: change return value of qcs_attach_sc()
    - BUG/MINOR: mux-quic: handle closure of uni-stream
    - BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined
    - BUG/MEDIUM: h1-htx: Properly handle bodyless messages

BUG/MEDIUM: h1-htx: Properly handle bodyless messages

During h1 parsing, there are some postparsing checks to detect bodyless
messages and switch the parsing in DONE state. However, a case was not
properly handled. Responses to HEAD requests with a "transfer-encoding"
header. The response parser remained blocked waiting for the response body.

To fix the issue, the postparsing was sliglty modified. Instead of trying to
handle bodyless messages in a common way between the request and the
response, it is now performed in the dedicated postparsing functions. It is
easier to enumerate all cases, especially because there is already a test
for responses to HEAD requests.

This patch should fix the issue #2836. It must be backported as far as 2.9.

(cherry picked from commit b9cc361b35e66c1f2d26a9b703f8759f70cbc03c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined

A 'resolvers' section may be defined without any nameserver. In that case,
we must take care to not dump corresponding Prometheus metrics. However
there is an issue that could lead to a crash or a strange infinite loop
because we are looping on an empty list and, at some point, we are
dereferencing an invalid pointer.

There is an issue because the loop on the nameservers of a resolvers section
is performed via callback functions and not the standard list_for_each_entry
macro. So we must take care to properly detect end of the list and empty
lists for nameservers. But the fix is not so simple because resolvers
sections with and without nameservers may be mixed.

To fix the issue, in rslv_promex_start_ts() and rslv_promex_next_ts(), when the
next resolvers section must be evaluated, a loop is now used to properly skip
empty sections.

This patch is related to #2831. Not sure it fixes it. It must be backported
as far as 3.0.

(cherry picked from commit 892eb2bb2ceba2c0e80d51eed69dfc9494a56988)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: handle closure of uni-stream

This commit is a direct follow-up to the previous one. As already
described, a previous fix was merged to prevent streamdesc attach
operation on already completed QCS instances scheduled for purging. This
was implemented by skipping app proto decoding.

However, this has a bad side-effect for remote uni-directional stream.
If receiving a FIN stream frame on such a stream, it will considered as
complete because streamdesc are never attached to a uni stream. Due to
the mentionned new fix, this prevent analysis of this last frame for
every uni stream.

To fix this, do not skip anymore app proto decoding for completed QCS.
Update instead qcs_attach_sc() to transform it as a noop function if QCS
is already fully closed before streamdesc instantiation. However,
success return value is still used to prevent an invalid decoding error
report.

The impact of this bug should be minor. Indeed, HTTP3 and QPACK uni
streams are never closed by the client as this is invalid due to the
spec. The only issue was that this prevented QUIC MUX to close the
connection with error H3_ERR_CLOSED_CRITICAL_STREAM.

This must be backported along the previous patch, at least to 3.1, and
eventually to 2.8 if mentionned patches are merged there.

(cherry picked from commit 801e39e1ccfc7404354d6870e779f83351593df0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: change return value of qcs_attach_sc()

A recent fix was introduced to ensure that a streamdesc instance won't
be attached to an already completed QCS which is eligible to purging.
This was performed by skipping application protocol decoding if a QCS is
in such a state. Here is the patch responsible for this change.
caf60ac696a29799631a76beb16d0072f65eef12
BUG/MEDIUM: mux-quic: do not attach on already closed stream

However, this is too restrictive, in particular for unidirection stream
where no streamdesc is never attached. To fix this behavior, first
qcs_attach_sc() API has been modified. Instead of returning a streamdesc
instance, it returns either 0 on success or a negative error code.

There should be no functional changes with this patch. It is only to be
able to extend qcs_attach_sc() with the possibility of skipping
streamdesc instantiation while still keeping a success return value.

This should be backported wherever the above patch has been merged. For
the record, it was scheduled for immediate backport on 3.1, plus merging
on older releases up to 2.8 after a period of observation.

(cherry picked from commit af00be8e0f7fe3942fac93ac32bfe7ae7f5b78c9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: add traces on sd attach

Add traces into qcs_attach_sc(). This function is called when a request
is received on a QCS stream and a streamdesc instance is attached. This
will be useful to facilitate debugging.

(cherry picked from commit 4a997e5a932ef35982f4df3ac3ba5871798be2ef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: fix wakeup on qcc_set_error()

The following patch was a major refactoring of QUIC MUX. It removes
pacing specific code path. In particular, qcc_wakeup() utility function
was removed and replaced by its tasklet_wakup() usage.
41f0472d967b2deb095d5adc8a167da973fbee3d
MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb

However, an incorrect substitution was performed in qcc_set_error(). As
such, there was no explicit wakeup in case an error is detected by QUIC
MUX or the app protocol layer. This may lead to missing error reporting
to clients.

Fix this by re-add tasklet_wakup() usage into qcc_set_error().

This must be backported up to 3.1 where above patch is scheduled.

(cherry picked from commit 4f2554903b1449a31bb30097311cb6fa86632338)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: config: Alert about extra arguments for errorfile and errorloc

errorfile and errorloc directives expect excatly two arguments. But extra
arguments were just ignored while an error should be emitted. It is now
fixed.

This patch could be backported as far as 2.2 if necessary.

(cherry picked from commit f578811c4e8965f243cd16a6f569909a2a07498c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: log: Allow to use if/unless conditionnals for do-log action

The do-log action does not accept argument for now. But an error was
triggered if any extra arguments was found, preventing the use of if/unless
conditionnals.

When an action is parsed, expected arguments must be tested to detect
missing ones but not unexpected extra arguments because this should be
performed by the conditionnal parser. So just removing the test in the
do-log parser function is enough to fix the issue.

This patch must be backported to 3.1.

(cherry picked from commit a785a20befeb858d47c558378e830e0f62060d21)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-quic: do not attach on already closed stream

Due to QUIC packet reordering, a stream may be opened via a new
RESET_STREAM or STOP_SENDING frame. This would cause either Tx or Rx
channel to be immediately closed.

This can cause an issue with current QUIC MUX implementation with QCS
purging. QCS are inserted into QCC purge list when transfer could be
considered as completed. In most cases, this happens after full
request/response exchange. However, it can also happens after request
reception if RESET_STREAM/STOP_SENDING are received first.

A BUG_ON() crash will occur if a STREAM frame is received after. In this
case, streamdesc instance will be attached via qcs_attach_sc() to handle
the new request. However, QCS is already considered eligible to purging.
It could cause it to be released while its streamdesc instance remains.
A BUG_ON() crash detects this problem in qcc_purge_streams().

To fix this, extend qcc_decode_qcs() to skip app proto rcv_buf
invokation if QCS is considered completed. A similar condition was
already implemented when read was previously aborted after a
STOP_SENDING emission by QUIC MUX.

This crash was reproduced on haproxy.org. Here is the output of the
backtrace :
Core was generated by `./haproxy-dev -db -f /etc/haproxy/haproxy-current.cfg -sf 16495'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
2661                    BUG_ON_HOT(!qcs_is_completed(qcs));
[Current thread is 1 (LWP 1457)]
[ ## gdb ## ] bt
#0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
#1  0x00000000004e4db7 in qcc_io_process (qcc=0x774cca0) at src/mux_quic.c:2744
#2  0x00000000004e5a54 in qcc_io_cb (t=0x7f71193940c0, ctx=0x774cca0, status=573504) at src/mux_quic.c:2886
#3  0x0000000000b4f792 in run_tasks_from_lists (budgets=0x7ffdcea1e670) at src/task.c:603
#4  0x0000000000b5012f in process_runnable_tasks () at src/task.c:883
#5  0x00000000007de4a3 in run_poll_loop () at src/haproxy.c:2771
#6  0x00000000007deb9f in run_thread_poll_loop (data=0x1335a00 <ha_thread_info>) at src/haproxy.c:2985
#7  0x00000000007dfd8d in main (argc=6, argv=0x7ffdcea1e958) at src/haproxy.c:3570

This BUG_ON() crash can only happen since 3.1 refactoring. Indeed, purge
list was only implemented on this version. As such, please backport it
on 3.1 immediately. However, a logic issue remains for older version as
a stream could be attached on a fully closed QCS. Thus, it should be
backported up to 2.8, this time after a period of observation.

(cherry picked from commit caf60ac696a29799631a76beb16d0072f65eef12)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: mux-quic: properly fix BUG_ON on empty STREAM emission

Properly fix BUG_ON() occurence when QUIC MUX emits only empty STREAM
frames. This was addressed by a previous patch but it causes another
regression so a revert was needed.

BUG_ON() on qcc_build_frms() return value is invalid. Indeed,
qcc_build_frms() may return 0, but this does not imply that frame list
is empty, as encoded frames can have a zero length payload. As such,
simply remove this invalid BUG_ON().

This must be backported up to 3.1.

(cherry picked from commit ddfd8031f895b448604cbc259f74e6cb7b086b0b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

Revert "BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission"

This reverts commit 98064537423fafe05b9ddd97e81cedec8b6b278d.

Above patch tried to fix a BUG_ON() occurence when MUX only emitted
empty STREAM frames via qcc_build_frms(). Return value of qcs_send() was
changed from the payload STREAM frame to the whole frame length.
However, this is invalid as this return value is used to ensure
connection flow-control is not exceeded on sending retry. This causes
occurence of BUG_ON() crash in qcc_io_send() as send-list is not
properly purged after QCS emission.

Reverts this incorrect fix. The original issue will be properly dealt in
the next commit.

This commit must be backported to 3.1 if reverted commit was already
applied on it.

(cherry picked from commit 85e27f1e92d0d6fd4dd0588992d3da3039a065f9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Count copied data when looping on RX bufs in h2_rcv_buf()

When data was copied from RX buffers to the channel buffer, more data than
expected could be moved because amount of data copied was never decremented
from the limit. This could lead to a stream dead lock when the compression
filter was inuse.

The issue was introduced by commit 4eb3ff1 ("MAJOR: mux-h2: make streams use
the connection's buffers") but revealed by 3816c38 ("MAJOR: mux-h2: permit a
stream to allocate as many buffers as desired").

Because a h2 stream can now have several RX buffers, in h2_rcv_buf(), we
loop on these buffers to fill the channel buffer. However, we must still
take care to respect the limit to not copy to much data. However, the
"count" variable was never decremented to reflect amount of data already
copied. So, it was possible to exceed the limit.

It was an issue when the compression filter was inuse because the channel
buffer could be fully filled, preventing the compression to be
performed. When this happened, the stream was infinitly blocked because the
compression filter was asking for some space but nothing was scheduled to be
forwarded.

This patch should fix the issue #2826. It must be backported to 3.1.

(cherry picked from commit 22f8d2c99e71599f12f03829f3c62150b1289389)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: mux-quic: fix BUG_ON on empty STREAM emission

A BUG_ON() is present in qcc_io_send() to ensure that encoded frame list
is empty if qcc_build_frms() previously returned 0.

This BUG_ON() may be triggered if empty STREAM frame is encoded for
standalone FIN. Indeed, qcc_build_frms() returns the sum of all STREAM
payload length. In case only empty STREAM frames are generated, return
value will be 0, despite new frames encoded and inserted into frame
list.

To fix this, change return value of qcs_send(). This now returns the
whole STREAM frame length, both header and payload included. This
ensures that qcc_build_frms() won't return a nul value if new frames are
encoded, even empty ones.

This must be backported up to 3.1.

(cherry picked from commit 98064537423fafe05b9ddd97e81cedec8b6b278d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add missing "track-sc0" in action keywords matrix

In d54e8f8107 ("DOC: config: reorganize actions into their own section"),
"track-sc0" keyword was properly documented but the keyword was not placed
in the action keywords matrix alongside other track-sc* statements. It
was probably overlooked, so let's fix that.

Could be backported up to 2.9 with d54e8f8107.

(cherry picked from commit f94c63021b9d46a5a77540b6b95cf6e60dcbf5e0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"

Since b3d5708 ("MINOR: stats: remove implicit static trash_chunk usage")
a segfault can occur when issuing "show schema json" on the stats socket.

Indeed, now the dumping functions don't rely on trash_chunk anymore, but
instead they rely on the appctx->chunk buffer. However, unlike other
stats dumping commands, the "show schema json" only have an io handler,
and no parse function. With other command, the parse function is
responsible for pre-setting some data, including applet ctx reservation.

Thus due to "show schema json" lacking parsing function, the applet ctx is
used uninitialized, which is a bug obviously.

To fix the issue we simply add a parse function for "show schema json",
although all it does for now is calling applet_reserve_svcctx() for the
current applet ctx.

This issue was reported by @dsuch in GH #2825. It must be backported up
to 3.0.

(cherry picked from commit ac1f413590932d4eb3b78ca5a536845f162b7866)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queue: Make process_srv_queue return the number of streams

Make process_srv_queue() return the number of streams unqueued, as
pendconn_grab_from_px() did, as that number is used by
srv_update_status() to generate logs.

This should be backported up to 2.6 with
111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8

(cherry picked from commit 5b8899b6ccc7dab3a54a51dcb8ba1512bd0c886c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: hlua: rename "tune.lua.preserve-smp-bool" to "tune.lua.bool-sample-conversion"

A better name was found for the option implemented in ec74438
("MINOR: hlua: add option to preserve bool type from smp to lua")

Indeed, "tune.lua.preserve-smp-bool {on | off}" wasn't explicit enough
nor did it encourage the adoption of the new "fixed" behavior (vs
historical behavior which is now considered as a bug).

Thus it becomes "tune.lua.bool-sample-conversion { normal | pre-3.1-bug }"
which actively encourage users to switch the new behavior after having
patched in-use Lua script if needed. From a technical point of view,
the logic remains the same, as the option currently defaults to
"pre-3.1-bug" to prevent script breakage, and a warning is emitted if
the option isn't set explicily and Lua is used.

Documentation and regtests were updated.

Must be backported in 3.1 with ec74438 and f2838f5 ("REGTESTS: fix
lua-based regtests using tune.lua.smp-preserve-bool")

(cherry picked from commit 29b6d8af16884bb8f0b5cd83a3752980df8548d0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse

Dedicated HTTP/2 stats proxy counters are available for current and
total number of HTTP/2 connection on both frontend and backend sides.
Both counters are simply incremented into h2_init().

This causes issues when using reverse HTTP. First, increment is not
performed on the expected side, as it is triggered before
h2_conn_reverse() which switches a connection from frontend to backend
or vice versa. For example on active revers side, h2_total_connections
is incremented on the backend only even after connection is reversed and
attached to a listener for the remainder of its lifetime.

h2_open_connections suffers from a similar but arguably worst behavior
as it is also decremented. If increment and decrement operations are not
performed on the same proxy side, which happens for every connection
which has been successfully reversed, it causes an invalid counter
value, possibly with an integer overflow.

To fix this, delay increment operations on reverse HTTP from h2_init()
to h2_conn_reverse(). Both counters are updated only after reverse has
completed, thus using the expected frontend or backend side.

To prevent overflow on h2_open_connections, ensure h2_release()
decrement is not performed if a connection is freed before achieving its
reversal, as in this case it would not have been accounted by H2
counters.

This should be backported up to 2.9.

This should fix github issue #2821.

(cherry picked from commit 8633446337cd322f010ce7b3ca9c257d1ce7848c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: mux-quic: remove dead err label in qcc_build_frms()

STREAM frames emission in qcc_build_frms() has been splitted from
RESET_STREAM/STOP_SENDING into qcc_emit_rs_ss(). Now, the former cannot
fail, as such err label can be removed as it is unreachable.

This should be backported up to 3.1.

This should fix github issue #2824.

(cherry picked from commit 4490df57a6d6d603077c1ecbb69c67a07bc9e3d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-quic: prevent BUG_ON() by refreshing frms on MAX_DATA

QUIC MUX emission has been optimized recently by recycling STREAM frames
list between emission cycles. This is done via qcc frms list member. If
new data is available, frames list must be cleared before the next
emission to force the encoding of new STREAM frames.

If a refresh frames list is missed, it would lead to incomplete data
emission on the next transfer. In most cases, this is detected via a
BUG_ON() inside qcc_io_send(), as qcs instances remains in send_list
after a qcc_send_frames() full emission.

A bug was recently found which causes this BUG_ON() crash. This is
directly related to flow control. Indeed, when sending credit is
increased on the connection or a stream, frames list should be cleared
as new larger STREAM frames could be encoded. This was already performed
on MAX_DATA/MAX_STREAM_DATA reception but only if flow-control limit was
unblocked. However this is not the proper condition and it may lead to
insufficient frames refresh and thus this BUG_ON() crash.

Fix this by adjusting the condition for frames refresh on flow control
credit increase. Now, frames list is cleared if real offset is not
blocked and soft offset was equal or greater to the previous limit.
Indeed, this is the only case in which frames refreshing is necessary as
it would result in bigger encoded STREAM frames.

This bug was detected on QUIC interop with go-x-net client. It can also
be reproduced, albeit not systematically, using the following command :
  $ ngtcp2-client -q --no-quic-dump --no-http-dump \
    --exit-on-all-streams-close --max-data 10 \
    127.0.0.1 20443 -n10 "http://127.0.0.1:20443/?s=10k"

This bug appeared with the following patch. As it is scheduled for 3.1
backporting, the current fix should be backported with it.
  14710b5e6bf76834343d58db22e00b72590b16fe
  MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send

(cherry picked from commit 7edb2ffae7e899af58a6b4fe546ac0665f5d3c9d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: fix lua-based regtests using tune.lua.smp-preserve-bool

Because of the previous commit, configs making use of lua script without
setting "tune.lua.smp-preserve-bool" explicitly now raise a warning.

However, since 6f746af91 ("REGTESTS: use -dW by default on every
reg-tests"), regtests are not allowed to raise warnings anymore.

Because of this the CI now fails for every tests that relies on Lua.
To fix this, let's explicitly set the "tune.lua.smp-preserve-bool" for
all tests involving Lua. Here we set the value to "on" because we know
it is safe to do so, and this way it will be future-proof.

If ec7443827 ("MINOR: hlua: add option to preserve bool type from smp to
lua") is backported, then this patch must be backported with it (if it
is not trivial to backport, then simply follow this rule: grep for
"lua-load" in reg-tests directory, then for each match, make sure to set
the tune.smp-preserve-bool tunable in the global section.

(cherry picked from commit f2838f5172f163842c8b4d1bd2f55c9138e84573)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: hlua: add option to preserve bool type from smp to lua

As discussed in GH #2814, there is an ambiguity in hlua implementation
that causes haproxy smp boolean type to be pushed as an integer on the
Lua stack. On the other hand, when doing Lua to haproxy smp conversion,
the boolean type is properly perserved. Of course this situation is not
desirable and can lead to unexpected results. However we cannot simply
fix the behavior because in Lua boolean and integer types are not
are completely distinct types and cannot be used interchangeably. So in
order to prevent breaking existing scripts logic, in this patch we add a
dedicated lua tunable named "tune.lua.smp-preserve-bool" which can take
the following values:

  - "on" : when converting haproxy smp to lua, boolean type is preserved
  - "off": when converting haproxy smp to lua, boolean is converted to
           integer (legacy behavior)

For now, the tunable defaults to "off" to preserve historical behavior.
However, when the option isn't set explicitly and lua is used, a warning
will be emitted in order to raise user's awareness about this ambiguity.
It is expected that the tunable could default to "on" in future versions,
thus it is recommended to avoid setting it to "off" except when using
existing Lua scripts that still rely on the old behavior regarding boolean
smp to Lua conversion, and that they cannot be fixed easily.

This should solve issue GH #2814. It may be relevant to backport this in
haproxy 3.1.

(cherry picked from commit ec744382739d24681b7ccd08dd57ee9606870bc5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add "tune.lua.burst-timeout" to the list of global parameters

"tune.lua.burst-timeout" was properly defined but not listed in the list
of global parameters as it was overlooked in 58e36e5b1 ("MEDIUM: hlua:
introduce tune.lua.burst-timeout")

(cherry picked from commit 67e3270c59f4cb997f2078d395d365e0189326ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: reorder "tune.lua.*" keywords by alphabetical order

Effort was made to properly organize "tune.*" keywords by alphabetical
order, but "tune.lua" keywords didn't follow that rule with care.

Let's fix that.

(cherry picked from commit 985a45d9c77568798b5c0c77cae7ddba219f3b63)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add example for server "track" keyword

As requested on GH #2325, "track" server keyword could benefit from a
simple config example to show how to make use of it.

That's what we're doing in this commit, thanks to GH user @HAkmiller
for the suggestion.

(cherry picked from commit 48545113f4db86ad43a99ba01742a9811410e6dc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: hide traces when woken up on pacing only

Previous commit aligned default and pacing emission. This is a cleaner
and more robust code. However, it may disrupt traces analysis when
pacing is rescheduled until timer expiration.

Hide traces when qcc_io_cb() is woken up only due to pacing and timer is
not yet expired. This is implemented by using special TASK_WOKEN_IO for
pacing.

This should be backported up to 3.1.

(cherry picked from commit 53db43aff231d83ea6f6bcaf99f5bdb5a0330e7b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: trace: implement tracing disabling API

Define a set of functions to temporarily disable/reactivate tracing for
the current thread. This could be useful when wanting to quickly remove
tracing output for some code parts.

The API relies on a disable/resume set of functions, with a thread-local
counter. This counter is tested under __trace_enabled(). It is a
cumulative value so that the same count of resume must be issued after
several disable usage. There is also the possibility to force reset the
counter to 0 before restoring the old value.

This should be backported up to 3.1.

(cherry picked from commit 9d155ca7064290d3280621c85443221fa13334b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: mux-quic: remove pacing specific code on qcc_io_cb

Pacing was recently implemented by QUIC MUX. Its tasklet is rescheduled
until next emission timer is reached. To improve performance, an
alternate execution of qcc_io_cb was performed when rescheduled due to
pacing. This was implemented using TASK_F_USR1 flag.

However, this model is fragile, in particular when several events
happened alongside pacing scheduling. This has caused some issue
recently, most notably when MUX is subscribed on transport layer on
receive for handshake completion while pacing emission is performed in
parallel. MUX qcc_io_cb() would not execute the default code path, which
means the reception event is silently ignored.

Recent patches have reworked several parts of qcc_io_cb. The objective
was to improve performance with better algorithm on send and receive
part. Most notable, qcc frames list is only cleared when new data is
available for emission. With this, pacing alternative code is now mostly
unneeded. As such, this patch removes it. The following changes are
performed :

* TASK_F_USR1 is now not used by QUIC MUX. As such, tasklet_wakeup()
  default invokation can now replace obsolete wrappers
  qcc_wakeup/qcc_wakeup_pacing

* qcc_purge_sending is removed. On pacing rescheduling, all qcc_io_cb()
  is executed. This is less error-prone, in particular when pacing is
  mixed with other events like receive handling. This renders the code
  less fragile, as it completely solves the described issue above.

This should be backported up to 3.1.

(cherry picked from commit 41f0472d967b2deb095d5adc8a167da973fbee3d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM/OPTIM: mux-quic: do not rebuild frms list on every send

A newly introduced frames list member has been defined into QCC instance
with pacing implementation. This allowed to preserve STREAM frames built
between different emission scheduled by pacing, without having to
regenerate it if no new QCS data is available.

Generalize this principle outside of pacing scheduling. Now, the frames
list will be reused accross several qcc_io_send() usage. Frames list is
only cleared when necessary. This will force its refreshing in the next
qcc_io_send() via qcc_build_frms_list().

Frames list refreshing is performed in the following cases :
* on successful transfer from stream snd_buf / done_ff / shut
* on stream reset or read abort
* on max_data/max_stream_data reception with window increase

Note that the two first cases are in fact covered directly due to
qcc_send_stream() usage when QCS is (re)inserted into the send_list.

The main objective of this patch will be to remove QUIC MUX pacing
specific code path. It could also provide better performance as emission
of large frames may often be rescheduled due to transport layer, either
on congestion or full socket buffer. When QUIC MUX is rescheduled, no
new data is available and frames list can be reuse as-is, avoiding an
unecessary loop over send_list.

This should be backported up to 3.1.

(cherry picked from commit 14710b5e6bf76834343d58db22e00b72590b16fe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: split STREAM and RS/SS emission

This commit is a follow-up of the previous one which defines function
qcc_build_frms(). This function implements looping over qcc send_list,
to both encode and send individually any STOP_SENDING and RESET_STREAM,
but also encode STREAM frames as a preparator step. STREAM frames were
then sent as a list outside of qcc_build_frms() via qcc_send_frames().

Extract STOP_SENDING/RESET_STREAM encoding and emission step into a new
function qcc_emit_rs_ss(). The code is thus cleaner. In particular it
highlights that an error during STOP_SENDING/RESET_STREAM emission stage
is fatal and prevent any STREAM frames processing.

This should be backported up to 3.1.

(cherry picked from commit 9ecc1a8e578c67e8653ba7d1e9c5441f5ad2eca8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: extract code to build STREAM frames list

Extracts code responsible to generate STREAM, RESET_STREAM and
STOP_SENDING frames for each qcs instances registered in qcc send_list.
It is moved from qcc_io_send() to its owned new function
qcc_build_frms().

This commit does not bring functional change. It is a preparatory step
to adapt QUIC MUX send mechanism to allow reusing of qcc frms list
accross qcc_io_send() invokation.

As a side change, qcc_tx_frms_free() is renamed to qcc_clear_frms().
This better highlights its relationship with qcc_build_frms().

This should be bkacported up to 3.1.

(cherry picked from commit 244dc00b09868f52864d50fc14af8d981c81424b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM/OPTIM: mux-quic: implement purg_list

This commit is part of the current serie which aims to refactor and
improve overall performance of QUIC MUX I/O handler.

qcc_io_process() is responsible to perform some internal operations on
QUIC MUX after I/O completion. It is notably called on every qcc_io_cb()
tasklet handler.

The most intensive work on it is the purging of QCS instances after
transfer completion. This was implemented by looping on QCC streams tree
and inspecting the state of every QCS. The purpose of this commit is to
optimize this processing.

A new purg_list QCC member is defined. It is responsible to list every
QCS instances whose transfer has been completed. It is thus safe to
reuse <el_send> QCS list attach point. Stream purging will thus only
loop on purg_list instead of every known QCS.

This should be backported up to 3.1.

(cherry picked from commit e296585ae9eb86f898614aed2b04929018340ee7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption

This commit is part of the current serie which aims to refactor and
improve overall performance of QUIC MUX I/O handler.

Define a recv_list element into qcc structure. This is used to
registered every instance of qcs which are currently blocked on
demuxing, which happen on no more space in <rx.appbuf>.

The purpose of this patch is to reduce qcc_io_recv() CPU usage. Now,
only recv_list iteration is performed, instead of the previous looping
over every qcs instances. This is useful as qcc_io_recv() is called each
time qcc_io_cb() is scheduled, even if only sending condition was the
wakeup origin.

A qcs is not inserted into recv_list immediately after blocking on demux
full buffer. Instead, this is only done after unblocking via stream
rcv_buf callback, which ensure that new buffer space is available.

This should be backported up to 3.1.

(cherry picked from commit 4b42dd4ae0d65c62d1979791b50dadd0d42774bb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: refactor wait-for-handshake support

This commit refactors wait-for-handshake support from QUIC MUX. The flag
logic QC_CF_WAIT_HS is inverted : it is now positionned only if MUX is
instantiated before handshake completion. When the handshake is
completed, the flag is removed.

The flag is now set directly on initialization via qmux_init(). Removal
via qcc_wait_for_hs() is moved from qcc_io_process() to qcc_io_recv().
This is deemed more logical as QUIC MUX is scheduled on RECV to be
notify by the transport layer about handshake termination. Moreover,
qcc_wait_for_hs() is now called if recv subscription is still active.

This commit is the first of a serie which aims to refactor QUIC MUX I/O
handler and improves its overall performance. The ultimate objective is
to be able to stream qcc_io_cb() by removing pacing specific code path
via qcc_purge_sending().

This should be backported up to 3.1.

(cherry picked from commit 0a53a008d032b69377869c8caaec38f81bdd5bd6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: add traces

Add some traces to better follow QUIC MUX scheduling, in particular with
pacing interaction.

This should be backported up to 3.1.

(cherry picked from commit 9dcd2369e210f617f738d68f6764ccc229309528)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: mux-quic: remove unused qcc member send_retry_list

Remove unused fields send_retry_list from qcc and its corresponding
attach element el from qcs.

This should be backported up to 3.1.

(cherry picked from commit 17bfe93768a1f579de35875cc4cf0e35dd7b6d35)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-quic: do not mix qcc_io_send() return codes with pacing

With pacing implementation, qcc_send_frames() return code has been
extended to report emission interruption due to pacing limitation. This
is used only in qcc_io_send().

However, its invokation may be skipped using 'sent_done' label. This
happens on emission failure of a STOP_SENDING or RESET_STREAM (either
memory allocation failure, or transport layer rejection). In this case,
return values are mixed as qcs_send() is wrongly compared against pacing
interruption condition. This value corresponds to the length of the last
built STREAM frames.

If by mischance the last frame was 1 byte long, qcs_send() return value
is equal to pacing interruption condition. This has several effects. If
pacing is activated, it may lead to unneeded wakeup on QUIC MUX. Worst,
if pacing is not used, a BUG_ON() crash will be triggered.

Fix this by using a different variable dedicated to qcc_send_frames()
return value. By default it is initialized to 0. This ensures that
pacing code won't be activated in case qcc_send_frames() is not used.

This must be backported up to 3.1.

(cherry picked from commit 2e3542bec6235482e4136ddd1e23e8f7cd4b5a26)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUILD: debug: only dump/reset glitch counters when really defined

If neither DEBUG_GLITCHES nor DEBUG_STRICT is set, we end up with
no dbg_cnt section, resulting in debug_parse_cli_counters not
building due to __stop_dbg_cnt and __start_dbg_cnt not being defined.
Let's just condition the end of the function to these conditions.
An alternate approach (less elegant) is to always declare a dummy
entry of type DBG_COUNTER_TYPES in debug.c.

This must be backported to 3.1 since it was brought with glitches.

(cherry picked from commit 4710ab5604298eefddb36887098d5fc388dcd3cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().

pendconn_grab_from_px() was called when a server was brought back up, to
get some streams waiting in the proxy's queue and get them to run on the
newly available server. It is very similar to process_srv_queue(),
except it only goes through the proxy's queue, which can be a problem,
because there is a small race condition that could lead us to add more
streams to the server queue just as it's going down. If that happens,
the server would just be ignored when back up by new streams, as its
queue is not empty, and it would never try to process its queue.
The other problem with pendconn_grab_from_px() is that it is very
liberal with how it dequeues streams, and it is not very good at
enforcing maxconn, it could lead to having 3*maxconn connections.
For both those reasons, just get rid of pendconn_grab_from_px(), and
just use process_srv_queue().
Both problems are easy to reproduce, especially on a 64 threads machine,
set a maxconn to 100, inject in H2 with 1000 concurrent connections
containing up to 100 streams each, and after a few seconds/minutes the
max number of concurrent output streams will be much higher than
maxconn, and eventually the server will stop processing connections.

It may be related to github issue #2744. Note that it doesn't totally
fix the problem, we can occasionally see a few more connections than
maxconn, but the max that have been observed is 4 more connections, we
no longer get multiple times maxconn.

have more outgoing connections than maxconn,
This should be backported up to 2.6.

(cherry picked from commit 111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving

In stream_free(), make sure we call process_srv_queue() each time we
call sess_change_server(), otherwise a server may end up not dequeuing
any stream when it could do so. In some extreme cases it could lead to
an infinite loop, as the server would appear to be available, as its
"served" parameter would be < maxconn, but would end up not being used,
as there are elements still in its queue.

This should be backported up to 2.6.

(cherry picked from commit dc9ce9c26469e00ab71fe6387dbd13010d4930f0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date

In sc_notify(), it remained a case where it was possible to set an
expiration date on the stream in the past, leading to a crash because of a
BUG_ON(). This must never happen of course.

In sc_notify(), The stream's expiration may be updated in case no wakeup
conditions are encoutered. In that case, we must take care to never set an
expiration date in the past. However, it appeared there was still a
condition to do so. This code is based on an implicit postulate: the
stream's expiration date must always be set when we leave
process_stream(). It was true since the 2.9. But in 3.0, the buffer
allocation mechanism was improved and on an alloc failure in
process_stream(), the stream is inserted in a wait-list and its expiration
date is set to TICK_ETERNITY. With the good timing, and an analysis
expiration date set on a channel, it is possible to set the stream's
expiration date in past.

After analysis, it appeared that the proper way to fix the issue is to only
evaluate I/O timers (read and write timeout) and not stream's timers
(analase_exp or conn_exp) because only I/O timers may have changed since the
last process_stream() call.

This patch must be backported as far as 3.0 to fix the issue. But it is
probably a good idea to also backported it as far as 2.8.

(cherry picked from commit 4f32d03360e42a3e7fe4ed71012ec90339b9731d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: quic: Rename some BBR functions in relation with bw probing

Rename bbr_is_probing_bw() to bbr_is_in_a_probe_state() and
bbr_is_accelerating_probing_bw() to bbr_is_probing_bw() to match
the function names of the BBR v3 internet draft.

Must be backported to 3.1 to ease any further backport to come.

(cherry picked from commit 949bc18f660d860b9102360d2b98d0c96e60266c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: missing Startup accelerating probing bw states

Startup state is also a probing with acceleration bandwidth state.
This modification should have come with this previous one:

BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)

Must be backported to 3.1.

(cherry picked from commit 0dc0c890ea20f3af76ae4f4f7422718bcd8f6605)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings

User tried to update a PEM, generated automatically. Part of this PEM has LF
line endings, and another part (CA certificate), added by some API, has CRLF
line endings. This has revealed a bug in cli_snd_buf(), see more
details in issue GitHUB #2818. So, let's add an example of such PEM in our
SSL regtest.

(cherry picked from commit ea4a148a7d9ead2132757bffd4699146e80da2df)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines

cli_snd_buf() analyzez input line by line. Before this patch it has always
scanned a given line for the presence of '\r' followed by '\n'.

This is only needed for strings, that contain the commands itself like
"show ssl cert\n", "set ssl cert test.pem <<\n".

In case of strings, which contain the command's payload, like
"-----BEGIN CERTIFICATE-----\r\n", '\r\n' should be preserved
as is.

This patch fixes the GitHub issue #2818.

This patch should be backported in v3.1 and in v3.0.

(cherry picked from commit d60c893991699af1103bcdcd33c915885c056559)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: too permissive exit condition for high loss detection in Startup (BBR)

This bug fixes the 3rd condition used by bbr_check_startup_high_loss() to decide
it has detected some high loss as mentioned by the BBR v3 RFC draft:

   4.3.1.3. Exiting Startup Based on Packet Loss
   ...
   There are at least BBRStartupFullLossCnt=6 discontiguous sequence ranges lost in that round trip.

where a <= operator was used in place of <.

Must be backported to 3.1.

(cherry picked from commit 178109f608ae1e446d9ea898bf5c83e4e8313bb5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: fix the wrong tracked recovery start time value

bbr_congestion_event() role is to track the start time of recovery periods.
This was done using <ts> passed as parameter. But this parameter is the
time the newest lost packet has been sent.
The timestamp value to store in ->recovery_start_ts is <now_ms>.

Must be backported to 3.1.

(cherry picked from commit e61b4189078f97d036952371c7fc9e5ceefdfff0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: quic: remove a wrong comment about ->app_limited (drs)

->app_limited quic_drs struct member is not a boolean. This is
the index of the last transmitted packet marked as application-limited, or 0 if
the connection is not currently application-limited (see C.app_limited
definition in BBR v3 draft).

(cherry picked from commit e1d25cdbddfa344ceee7dcecd6a84eff4a209122)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: reduce the private data size of QUIC cc algos

After these commits:

BUG/MINOR: quic: remove max_bw filter from delivery rate sampling
BUG/MINOR: quic: fix BBB max bandwidth oscillation issue

where some members were removed from bbr struct, the private data
size of QUIC cc algorithms may be reduced from 160 to 144 uint32_t.

Should be easily backported to 3.1 alonside the commits mentioned above.

(cherry picked from commit eeaeb412dc978bdc9663061d9d2542dc1f7310cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: reduce packet losses at least during ProbeBW_CRUISE (BBR)

Upon congestion events (for a instance packet loss),
bbr_adapt_lower_bounds_from_congestion() role is to adapt some BBR internal
variables in relation with the estimated bandwidth (BBR.bw).

According to the BBR v3 draft, this function should do nothing
if BBRIsProbingBW() pseudo-code returns true. That said, this function
is not defined by the BBR v3 draft. But according to this part mentioned before
defining the pseudo-code for BBRAdaptLowerBoundsFromCongestion():

4.5.10.3. When not Probing for Bandwidth
When not explicitly accelerating to probe for bandwidth (Drain, ProbeRTT,
ProbeBW_DOWN, ProbeBW_CRUISE), BBR responds to loss by slowing down to some extent.
This is because loss suggests that the available bandwidth and safe volume of
in-flight data may have decreased recently, and the flow needs to adapt, slowing
down toward the latest delivery process. BBR flows implement this response by
reducing the short-term model parameters, BBR.bw_lo and BBR.inflight_lo.

BBRIsProbingBW() should concern the accelerating probe for bandwidth states
which are BBR_ST_PROBE_BW_REFILL and BBR_ST_PROBE_BW_UP.

Adapt the code to match this latter assumption. At least this reduce
drastically the packet loss volumes at least during ProbeBW_CRUISE.

As an example, on a 100MBits/s internet link with ~94ms as RTT, before
this patch, 4329640 sent packets were needed with 1617119 lost packets (!!!) to
download a 3GB object. After this patch, 2843952 sent packets vs 144134 lost packets
are needed. There may be some packet loss issue. I suspect the maximum bandwidth
which may be overestimated. More this is the case, more the packet loss is big.
That said, at this time, it remains below 5% depending on the size of the objects,
5% being for more than 2GB objects.

Must be backported to 3.1.

(cherry picked from commit 9813de0537882355514794eed7b738bed1806954)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: underflow issue for bbr_inflight_hi_from_lost_packet()

Add a test to ensure that values of a local variable used by
bbr_inflight_hi_from_lost_packet() is not be impacted by underflow issues
when subtracting too big numbers and make this function return a correct value.

Must be backported to 3.1.

(cherry picked from commit ebfc301d5d8dbae0c07a9d937bef18d5c3f841c6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: remove max_bw filter from delivery rate sampling

This filter is no more needed after this commit:

BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.

Indeed, one added this filter at delivery rate sampling level to filter
the BBR max bandwidth estimations and was inspired from ngtcp2 code source when
trying to fix the oscillation issue. But this BBR max bandwidth oscillation issue
was fixed by the aforementioned commit.

Furthermore this code tends to always increment the BBR max bandwidth. From my point
of view, this is not a good idea at all.

Must be backported to 3.1.

(cherry picked from commit 22ab45a3a84d0a9a61cc90c6efedd893dd1cbe4a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: wrong bbr_target_inflight() implementation

This bug arrived with this commit:

6404b7a18a BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value

This patch partially reverts after having checked the BBR v3 draft.
This bug was invisible when testing long BBR flows.

Must be backported to 3.1.

(cherry picked from commit 2bcd5b4cba4e9f123860560de0807312b909b306)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: fix BBB max bandwidth oscillation issue.

Remove the code in relation with BBR.ack_phase as per this commit:
https://github.com/ietf-wg-ccwg/draft-ietf-ccwg-bbr/pull/5/commits/ee98c12ad6f0e93153656218a7df1b1ef92618d7

I do now kwow at this time why such a request was pushed on GH for the BBR v3 draft
pseudo-code. That said, the use of such an ack phase seemed confusing, adding much
more information about a BBR flow state than needed. Indeed, the ack phase
state is modified several times in the BBR draft pseudo-code but only used to
decide if the max bandwidth filter virtual clock had to be incremented by
BBRAdvanceMaxBwFilter().

In addition to this, when discussing about haproxy BBR implementation with
Neal Cardwell on the BBR development google group about an oscillation issue
of the max bandwidth (BBR.max_bw), I concluded that this was due to the fact
that its filter virutal clock was too often update, due to the ack phase wich
was stalled in BBR_ACK_PHASE_ACKS_PROBE_STOPPING state for too long. This is
where Neal asked me to test the aforementioned commit. This definitively
makes the max bandwidth (BBR.max_bw) oscillation issue disappear.

Another solution would have been to add a new ack phase enum afer
BBR_ACK_PHASE_ACKS_PROBE_STOPPING. BBR_ACK_PHASE_ACKS_PROBE_STOPPED
would have been a good candidate.

Remove the code in relation with BBR.ack_phase.

Must be backported to 3.1.

(cherry picked from commit b47e1e65df1b468bcd84f02fbc1eb4e1fd7e2bc9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: wrong logical statement in in_recovery_period() (BBR)

A && logical operator was badly replaced by a || in this function which decides
if BBR is in a recovery period.

Must be backported to 3.1.

(cherry picked from commit 1dbf6b8bedadd5d287ab44e1a387a89e2fc6c70b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR)

The windowed filters are used only the BBR implementation for QUIC to filter
the maximum bandwidth samples for its estimation over a virtual time interval
tracked by counting the cyclical progression through ProbeBW cycles. ngtcp2
and quiche use such windowed filters in their BBR implementation. But in a
slightly different way. When updating the 2nd or 3rd filter samples, this
is done based on their values in place of the time they have been sampled.
It seems more logical to rely on the sample timestamps even if this has no
implication because when a sample is updated using another sample because it
has the same value, they have both the same timestamps!

This patch modifies two statements which compare two consecutive filter samples
based on their values (smp[]->v) by statements which compare them based on the
virtual time they have been sampled (smp[]->t). This fully complies which the
code used by the Linux kernel in lib/win_minmax.c.

Alo take the opportunity of this patch to shorten some statements using <smp>
local variable value to update smp[2] sample in place of initializing its two
members with the <smp> member values.

This patch SHOULD be easily backported to 3.1 where BBR was first implemented.

(cherry picked from commit a9a2f98f86a93dbefca636959234df161b572c21)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.1.1

Released version 3.1.1 with the following main changes :
    - BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
    - MINOR: proxy: Add support of 421-Misdirected-Request in retry-on status
    - BUG/MINOR: log: fix lf_text() behavior with empty string
    - BUG/MINOR: improve BBR throughput on very fast links
    - BUG/MEDIUM: event_hdl: fix uninitialized value in async mode when no data is provided
    - BUG/MEDIUM: quic: prevent stream freeze on pacing
    - BUG/MEDIUM: http-ana: Reset request flag about data sent to perform a L7 retry
    - BUG/MINOR: h1-htx: Use default reason if not set when formatting the response
    - BUILD: quic: fix a build error about an non initialized timestamp
    - BUG/MINOR: signal: register default handler for SIGINT in signal_init()
    - BUG/MINOR: startup: close pidfd and free global.pidfile in handle_pidfile()
    - BUG/MINOR: startup: fix pidfile creation
    - BUG/MINOR: quic: fix bbr_inflight() calls with wrong gain value
    - BUG/MEDIUM: init: make sure only daemonized processes change their session
    - BUG/MINOR: init: do not call fork_poller() for non-forked processes
    - BUG/MEDIUM: mux-quic: remove pacing status when everything is sent
    - BUG/MINOR: quic: remove startup alert if conn socket-owner unsupported
    - BUG/MINOR: quic: remove startup alert if GSO unsupported
    - BUG/MEDIUM: mux-h2: make sure not to touch dummy streams when sending WU
    - BUG/MINOR: config: Fix parsing of accept-invalid-http-{request,response}
    - DOC: config: fix confusing init-state examples
    - BUG/MINOR: debug: COUNT_IF() should return true/false
    - BUILD: debug: fix build issues in COUNT_IF() with -Wunused-value
    - MINOR: mux-h2/traces: add a missing trace on negative initial window size
    - CLEANUP: mux-h2/traces: reword certain ambiguous traces
    - MINOR: mux-h2/glitches: add a description to the H2 glitches
    - BUG/MINOR: mux-h2: fix expression when detecting excess of CONTINUATION frames
    - BUG/MINOR: mworker: don't save program PIDs in oldpids
    - BUG/MINOR: mworker: fix -D -W -sf/-st modes
    - BUG/MINOR: startup: fix error path for master, if can't open pidfile
    - BUG/MEDIUM: startup: don't daemonize if started with -c
    - BUG/MEDIUM: startup: report status if daemonized process fails
    - BUG/MEDIUM: mworker: report status, if daemonized master fails
    - BUG/MINOR: namespace: handle a possible strdup() failure
    - BUG/MINOR: ssl_crtlist: handle a possible strdup() failure
    - BUG/MINOR: resolvers: handle a possible strdup() failure
    - BUG/MINOR: stats: decrement srv refcount on stats-file release
    - MINOR: list: define a watcher type
    - BUG/MEDIUM: stats/server: use watcher to track server during stats dump
    - BUG/MINOR: http-fetch: Ignore empty argument string for query()
    - BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks
    - BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency

BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency

Since 9c91b30 ("MINOR: server: remove prev_deleted server list"), hlua
server pair iterator may use and return invalid (stale) server pointer
if multiple servers were deleted between two iterations.

Indeed, the server refcount mechanism (using srv_take()) is no longer
sufficient as the prev_deleted mitigation was removed.

To ensure server pointer consistency between two yields, the new watcher
mechanism must be used (as it already the case for stats dumping).

Thus in this patch we slightly change the server iteration logic:
hlua_server_list_iterator_context struct now stores the next valid server
pointer, and a watcher is added to ensure this pointer is never stale.

Then in hlua_listable_servers_pairs_iterator(), this next pointer is used
to create the Lua server object, and the next valid pointer is obtained by
leveraging watcher_next().

No backport needed unless 9c91b30 ("MINOR: server: remove prev_deleted
server list") is. Please note that dynamic servers were not supported in
Lua prior to 2.8, so it doesn't make sense to backport this patch further
than 2.8.

(cherry picked from commit 358166ae6a3e98d36b378c7eeab1673cc8b4a4dd)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks

"hold.timeout" was used as expiration date for srvrq_check tasks. But it is
not accurrate. The expiration date must be based on the resolution timeouts
instead (resolve and retry).

The purpose of srvrq_check task is to clean up the server resolution status
when outdated info are inherited from the state file. Using "hold.timeout"
is not accurrate here because hold timeouts concern the resolution response
items not the resolution status of servers. It may be set to a huge value or
0. The expiration date of these tasks must be based on the resolution
timeouts instead.

So now the ("timeout resolve" + resolve_retries * "timeout retry") value is
used.

This patch should fix the issue #2816. It must be backported to all stable
versions.

(cherry picked from commit 647a2906626e9c3d9c3349d338a35798325496f2)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: http-fetch: Ignore empty argument string for query()

query() sample fetch function takes an optional argument string. During
configuration parsing, empty string must be ignored. It is especially
important when the sample is used with empty parenthesis. The argument is
optional and it is a list of options to configure the behavior of the sample
fetch. So it is logical to ignore empty strings.

This patch should fix the issue #2815. It must be backported to 3.1.

(cherry picked from commit e1525e7b8fa15c59c1c6fc4063b5218d134b7afb)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: stats/server: use watcher to track server during stats dump

If a server A is deleted while a stats dump is currently on it, deletion
is delayed thanks to reference counting. Server A is nonetheless removed
from the proxy list. However, this list is a single linked list. If the
next server B is deleted and freed immediately, server A would still
point to it. This problem has been solved by the prev_deleted list in
servers.

This model seems correct, but it is difficult to ensure completely its
validity. In particular, it implies when stats dump is resumed, server A
elements will be accessed despite the server being in a half-deleted
state.

Thus, it has been decided to completely ditch the refcount mechanism for
stats dump. Instead, use the watcher element to register every stats
dump currently tracking a server instance. Each time a server is deleted
on the CLI, each stats dump element which may points to it are updated
to access the next server instance, or NULL if this is the last server.
This ensures that a server which was deleted via CLI but not completely
freed is never accessed on stats dump resumption.

Currently, no race condition related to dynamic servers and stats dump
is known. However, as described above, the previous model is deemed too
fragile, as such this patch is labelled as bug-fix. It should be
backported up to 2.6, after a reasonable period of observation. It
relies on the following patch :
MINOR: list: define a watcher type

(cherry picked from commit 071ae8ce3d1a318d2227fad2ebf63e78a05815f0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: list: define a watcher type

Define a new watcher type into list module. This type is similar to bref
and can be used to register an element which is currently tracking a
dynamic target. Contrary to bref, if the target is freed, every watcher
element are updated to point to a next valid entry or NULL.

This type will simplify handling of dynamic servers deletion, in
particular while stats dump are performed.

This patch is not a bug-fix. However, it is mandatory to fix a race
condition in dynamic servers. Thus, it should be backported along the
next commit up to 2.6.

(cherry picked from commit eafa8a32bb19ee567cb0a60a4042a8ef436e1e94)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stats: decrement srv refcount on stats-file release

Servers instance may be removed at runtime. This can occurs during a
stat dump which currently references this server instance. This case is
protected by server refcount to prevent the server immediate release.

CLI output may be interrupted prior to stats dump completion, for
example if client CLI has been disconnected before the full response
transfer. As such, srv_drop() must be called in every stats dump release
callback.

srv_drop() was missing for stats-file dump release callback. This could
cause a race condition which would prevent a server instance to be fully
removed. Fix this by adding srv_drop() invokation into
cli_io_handler_release_dump_stat_file().

This should be backported up to 3.0.

(cherry picked from commit 2199179461957c86f57e39c4d97c37f3e4d10935)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: resolvers: handle a possible strdup() failure

This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.

(cherry picked from commit 193c94a53988f77b827e5f8bb5121d3dc3b2b5ed)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl_crtlist: handle a possible strdup() failure

This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.

(cherry picked from commit ce30bc17305e4ac2dec7d641eedcb301a237d863)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: namespace: handle a possible strdup() failure

This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to all supported branches.

(cherry picked from commit abee5468509a77537015a7e5dc37268d4d564e03)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mworker: report status, if daemonized master fails

As daemonization fork happens now very early and before the master-worker
fork, if master or worker processes fail during the initialization, some
critical errors can't be reported to stdout. The launching (parent) process in
such cases exits with 0. This makes an impression, that master and his worker
have successfully started at background, which really complicates the
operations.

In the previous commit a pipe was added to make daemonized child communicate
with his parent. Let's add the same logic to master-worker mode. Up to
receiving the READY message from the worker, master will "forward" it via the
pipe to the launching process. Launching process can obtain master's exit
status, if the master fails to start and nothing has been written in the pipe.

This fix should be backported only in 3.1.

(cherry picked from commit 97aaf7671654ceba16727f8375104064538c40e6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: startup: report status if daemonized process fails

Due to master-worker rework, daemonization fork happens now before parsing
and applying the configuration. This makes impossible to report correctly all
warnings and alerts to shell's stdout. Daemonzied process fails, while being
already in background, exit code reported by shell via '$?' equals to 0, as
it's the exit code of his parent.

To fix this, let's create a pipe between parent and daemonized child. The
child will send into this pipe a "READY" message, when it finishes his
initialization. The parent will wait on the "read" end of the pipe until
receiving something. If read() fails, parent obtains the status of the
exited child with waitpid(). So, the parent can correctly report the error to
the stdout and he can exit with child's exitcode.

This fix should be backported only in 3.1.

(cherry picked from commit 663d75e7a00a9c50185cca322cbc64de934b6631)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: startup: don't daemonize if started with -c

Due to master-worker refactoring, daemonization fork happens now very early,
before parsing and verifying the configuration. For the moment there is no
any specific syntax, which needs for the daemon mode to be really applied in
order to perform the tests.

So, it's better not to do the daemonization fork, if 'daemon' keyword is
presented in the config (or -D option), when we started with -c (MODE_CHECK).
Like this, during the config verification, the process will always stay in
foreground and all warning or errors will be delivered to the stdout.

This fix should be backported only in 3.1.

(cherry picked from commit 5f94e98d8958ff0003bfbb736427070d16270d6e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: startup: fix error path for master, if can't open pidfile

If master process can't open a pidfile, there is no sense to send SIGTTIN to
oldpids, as it will exit. So, old workers will terminate as well. It's better
to send the last alert to the log about unrecoverable error, because master is
already in its polling loop.

For the standalone mode we should keep the previous logic in this case: send
SIGTTIN to old process and unbind listeners for the new one. So, it's better
to put this error path in main(), as it's done when other configuration settings
can't be applied.

This patch should be backported only in 3.1.

(cherry picked from commit cd0b58e23e12f12369bc1a6e65c6a97409ee8cd6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: fix -D -W -sf/-st modes

When a new master process is launched like below:

./haproxy -W -D -p ha.pid -sf $(cat ha.pid)...

The old master process and its workers do not stop. Since the master-worker
refactoring, the code, which sends USR1/TERM to old pids from -sf, is called
only for the standalone mode. In master-worker mode we should receive the READY
message from the newly forked worker at first, in order to be able to terminate
the previous master.

So, to fix this, let's terminate the previous master in _send_status(), where
we parse the READY message from the newly forked worker. And let's continue to
use oldpids array, as it was in 3.0, in order to stop the workers, launched
before the reload.

This patch should be backported only in 3.1.

(cherry picked from commit ee111d2004ca653387515f340c504d18b71191c3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: don't save program PIDs in oldpids

After reload, previously launched programs are stopped explicitly in
mworker_ext_launch_all(). So, there is no longer need to save their PIDs in
oldpids array before the master reexec().

This also prepares the fix of "-D -W -sf/-st" modes, as we will need to
loop over this array in the master process context, in order to stop the
previous master, when the new one is ready.

This patch should be backported only in 3.1.

(cherry picked from commit 1fead6c0ca8f8ad2f354fb4dfab241ec5281737c)
Signed-off-by: Willy Tarreau <w@1wt.eu>