Repositories - haproxy-3.0.git/log

MINOR: debug: also add a pointer to struct global to post_mortem

The pointer to struct global is also an important element to have in
post_mortem given that it's used a lot to take decisions in the code.
Let's just add it. It's worth noting that we could get rid of argc/argv
at this point since they're also present in the global struct, but they
don't cost much there anyway.

(cherry picked from commit 2f04ebe14aca91f4a0fafcd03a0f310d98d97aaf)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: do not limit backtraces to stuck threads

Historically for size limitation reasons, we would only dump the
backtrace of stuck threads. The problem is that when triggering
a panic or other reasons, we have no backtrace, which effectively
limits it to the watchdog timer. It's also visible in "show threads"
which used to report backtraces for all threads in 2.4 and displays
none nowadays, making its use much more limited.

A first approach could be to just dump the thread that triggers the
panic (in addition to stuck threads). But that remains quite limited
since "show threads" would still display nothing. This patch takes a
better approach consisting in dumping all non-idle threads. This way
the output is less polluted that with the older approach (no need to
dump all those waiting in the poller), and all active threads are
visible, in panics as well as in "show threads". As such, the CLI
command "debug dev panic" now dmups backtraces again. This is already
a benefit which will ease testing of various locations against the
ability to resolve useful symbols.

(cherry picked from commit 4adb2d864d7e3ca9df1e39beabf7b2ffa5aee35c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: print gdb hints when crashing

To make bug reporting easier for users, when crashing, let's suggest
what to do. Typically when a BUG_ON() matches, only the current thread
is useful the vast majority of the time, while when the watchdog
triggers, all threads are interesting.

The messages are printed at the end after the dump. We may adjust these
with wiki links in the future is more detailed instructions are relevant.

(cherry picked from commit 8f204fa8aeadef3faea4471ba9cfd93d9d168960)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name

These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).

The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.

(cherry picked from commit 601b34fe7bd50c733a437f26817580bbd56c8d56)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: rawsock: set connection error codes when returning from recv/send/splice

For a long time the errno values returned by recv/send/splice() were not
translated to connection error codes. There are not that many eligible
and having them would help a lot when debugging some complex issues where
logs disagree with network traces. Let's add them now.

(cherry picked from commit 822d82caf4165f0f6da681737c7e3db17d01f599)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: connection: add more connection error codes to cover common errno

While we get reports of connection setup errors in fc_err/bc_err, we
don't have the equivalent for the recv/send/splice syscalls. Let's
add provisions for new codes that cover the common errno values that
recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT,
EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case
for when the poller reported the error itself. It's worth noting that
EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code
and should not be reported.

The only thing is that it's quite hard to forcefully (and reliably)
trigger these errors in automated tests as the timing is critical.
Using iptables to manually reset established connections in the
middle of large transfers at least permits to see some ECONNRESET
and/or EPIPE, but the other ones are harder to trigger.

(cherry picked from commit 00c383ff65c6378327382d2c055f66efb098498d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stats: Fix the name for the total number of streams created

Because of a copy/paste error, CurrStreams was reused by mistake. It should
be "CumStreams"

No backports needed.

(cherry picked from commit 131b877565db423930909f0c26f25e000cbd6e3b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: stream/stats: Expose the total number of streams ever created in stats

A shared counter is added in the thread context to track the total number of
streams created on the thread. This number is then reported in stats. It
will be a useful information to diagnose some bugs.

(cherry picked from commit 273d322b6fa8117423bbdc9b818002563d4fd3a3)
[wt: ctx adj in tinfo-t]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: stream/stats: Expose the current number of streams in stats

A shared counter is added in the thread context to track the current number
of streams. This number is then reported in stats. It will be a useful
information to diagnose some bugs.

(cherry picked from commit 18ee22ff766bd7399947af3be2b512ac5827b3c8)
[wt: adj ctx in tinfo-t]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: cli/debug: show dev: add cmdline and version

'show dev' command is very convenient to obtain haproxy debugging information,
while process is run in container. Let's extend its output with version and
cmdline. cmdline is useful in a way, as it shows absolute binary path and its
arguments, because sometimes the person, who is debugging failing container is
not the same, who has created and deployed it.

argc and argv are stored in the exported global structure, because
feed_post_mortem() is added as a post check function callback in the
post_check_list. So we can't simply change the signature of
feed_post_mortem(), without breaking other post check callbacks APIs.

Parsers are not supposed to modify argv, so we can safely bypass its pointer
to debug_parse_cli_show_dev(), without copying all argument stings somewhere
in the heap or on stack.

(cherry picked from commit 0d79c9bedfa564e3c032c1e910c29949f5133d91)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: fix malformed probing packet building

This bug arrived with this commit:

cdfceb10a MINOR: quic: refactor qc_prep_pkts() loop

which prevents haproxy from sending PING only packets/datagrams (some
packets/datagrams with only PING frame as ack-eliciting frames inside).
Such packets/datagrams are useful in rare cases during retransmissions
when one wants to probe the peer without exceeding the anti-amplification
limit.

Modify the condition passed to qc_build_pkt() to add padding to the current
datagram. One does not want to do that when probing the peer without ack-eliciting
frames passed as <frms> parameter. Indeed qc_build_pkt() calls qc_do_build_pkt()
which supports this case: if <probe> is true (probing required), qc_do_build_pkt()
handles the case where some padding must be added to a PING only packet/datagram.
This is the case when probing with an empty <frms> frame list of ack-eliciting
frames without exceeding the anti-amplification limit from qc_dgrams_retransmit().

Add some comments to qc_build_pkt() and qc_do_build_pkt() to clarify this
as this code is easy to break!

Thank you for @Tristan971 for having reported this issue in GH #2709.

Must be backported to 3.0.

(cherry picked from commit 217e467e89d15f3c22e11fe144458afbf718c8a8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry

It was the only one prefixed with "CO_ERR_", making it harder to batch
process and to look up. It was added in 2.5 by commit 61944f7a73 ("MINOR:
ssl: Set connection error code in case of SSL read or write fatal failure")
so it can be backported as far as 2.6 if needed to help integrate other
patches.

(cherry picked from commit 393957908bf492ff6660fba239106f0da7988fe8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: document connection error 44 (reverse connect failure)

It was missing from commit ac1164de7c ("MINOR: connection: define error
for reverse connect"), and can be backported to 3.0 and 2.9.

(cherry picked from commit abed9e0426c2f24522e0053452435082870e3afc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex: Fix dump of extra counters

When extra counters are dumped for an entity (frontend, backend, server or
listener), there is a filter on capabilities. Some extra counters are not
available for all entities and must be ignored. However, when this was
performed, the field number, used as an index to dump the metric value, was
still incremented while it should not and leads to an overflow or a stats
mix-up.

This patch must be backported to 3.0.

(cherry picked from commit d1adfd9fe41b0f9f67944eec07348213a7debbf3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: stream: Save last evaluated rule on invalid yield

When an action yields while it is not allowed, an internal error is
reported. This interrupts the processing. So info about the last evaluated
rule must be filled.

This patch may be bakcported if needed. If so, the commit ("MINOR: stream:
Save last evaluated rule on invalid yield") must be backported first.

(cherry picked from commit 0b7605491e4ccb66a0468c219306adf354355e0d)
[cf: Of course the mentionned commit to be backported with this one is wrong. It
must be "BUG/MINOR: http-ana: Report internal error if an action yields on
a final eval"].
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-ana: Report internal error if an action yields on a final eval

This was already performed for tcp actions at content level, but not for
HTTP actions. It is always a bug, so it must be reported accordingly.

This patch may be backported to all stable versions.

(cherry picked from commit 65ea29dcf85c6553e6dd0613a9c6c506fe22b9ac)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections

There were several flaws in the way the different timeouts were applied on
H1 connections. First, the H1C task handling timeouts was not created if no
client/server timeout was specified. But there are other timeouts to
consider. First, the client-fin/server-fin timeouts. But for frontend
connections, http-keey-alive and http-request timeouts may also be used. And
finally, on soft-stop, the close-spread-time value must be considered too.

So at the end, it is probably easier to always create a task to manage H1C
timeouts. Especially since the client/server timeouts are most often set.

Then, when the expiration date of the H1C's task must only be updated if the
considered timeout is set. So tick_add_ifset() must be used instead of
tick_add(). Otherwise, if a timeout is undefined, the taks may expire
immediately while it should in fact never expire.

Finally, the idle expiration date must only be considered for idle
connections.

This patch should be backported in all stable versions, at least as far as
2.6. On the 2.4, it will have to be slightly adapted for the idle_exp
part. On 2.2 and 2.0, the patch will have to be rewrite because
h1_refresh_timeout() is quite different.

(cherry picked from commit 3c09b34325a073e2c110e046f9705b2fddfa91c5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add missing glitch_{cnt,rate} sample definitions

Following previous commit, when glitch_cnt and glitch_rate data types were
implemented in c9c6b683f ("MEDIUM: stick-tables: add a new stored type for
glitch_cnt and glitch_rate"), newly exposed samples such as
table_glitch_cnt(), table_glitch_rate, src_glitch_cnt() and
src_glitch_rate() were documented but their definitions was missing in
supported keywords list.

It should be backported in 3.0 with c9c6b683f

(cherry picked from commit 0686fd8cfccd7ff12211b8253bf2446d62c90a18)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add missing glitch_{cnt,rate} data types

When glitch_cnt and glitch_rate data types were implemented in
c9c6b683f ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and
glitch_rate"), the data types list for "stick-table" keyword documentation
was overlooked.

This was reported by Nick Ramirez.

It should be backported in 3.0 with c9c6b683f.

(cherry picked from commit 9a6fc2d474511ead2fe8c39524d23b156d640ef8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly

Since commit 089c13850f ("MEDIUM: ssl: ssl-load-extra-del-ext work
only with .crt"), the 'set ssl cert' CLI command does not check
correctly if the transaction you are trying to update is the right one.

The consequence is that you could commit accidentaly a transaction on
the wrong certificate.

The fix introduces the check again in case you are not using
ssl-load-extra-del-ext.

This must be backported in all stable versions.

(cherry picked from commit 984d2cfb61744bed29ce92cdc5360155cbd8ca44)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: trace: stop rewriting argv with -dt

When using trace with -dt, the trace_parse_cmd() function is doing a
strtok which write \0 into the argv string.

When using the mworker mode, and reloading, argv was modified and the
trace won't work anymore because the first : is replaced by a '\0'.

This patch fixes the issue by allocating a temporary string so we don't
modify the source string directly. It also replace strtok by its
reentrant version strtok_r.

Must be backported as far as 2.9.

(cherry picked from commit 596db3ef86844617565a0b4b4ce8358fe6537d87)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: cli: remove non-printable characters from 'debug dev fd'

When using 'debug dev fd', the output of laddr and raddr can contain
some garbage.

This patch replaces any control or non-printable character by a '.'.

(cherry picked from commit 944a224358ab2865a3a1c0bf700aba38550b19cc)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: store important pointers in post_mortem

Dealing with a core and a stripped executable is a pain when it comes
to finding pools, proxies or thread contexts. Let's put a pointer to
these heads and arrays in the post_mortem struct for easier location.
Other critical lists like this could possibly benefit from being added
later.

Here we now have:
  - tgroup_info
  - thread_info
  - tgroup_ctx
  - thread_ctx
  - pools
  - proxies

Example:
  $ objdump -h haproxy|grep post
   34 _post_mortem  000014b0  0000000000cfd400  0000000000cfd400  008fc400  2**8

  (gdb) set $pm=(struct post_mortem*)0x0000000000cfd400

  (gdb) p $pm->tgroup_ctx[0]
  $8 = {
    threads_harmless = 254,
    threads_idle = 254,
    stopping_threads = 0,
    timers = {
      b = {0x0, 0x0}
    },
    niced_tasks = 0,
    __pad = 0xf5662c <ha_tgroup_ctx+44> "",
    __end = 0xf56640 <ha_tgroup_ctx+64> ""
  }

  (gdb) info thr
    Id   Target Id                         Frame
  * 1    Thread 0x7f9e7706a440 (LWP 21169) 0x00007f9e76a9c868 in raise () from /lib64/libc.so.6
    2    Thread 0x7f9e76a60640 (LWP 21175) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    3    Thread 0x7f9e7613d640 (LWP 21176) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    4    Thread 0x7f9e7493a640 (LWP 21179) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    5    Thread 0x7f9e7593c640 (LWP 21177) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    6    Thread 0x7f9e7513b640 (LWP 21178) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    7    Thread 0x7f9e6ffff640 (LWP 21180) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
    8    Thread 0x7f9e6f7fe640 (LWP 21181) 0x00007f9e76b343c7 in wait4 () from /lib64/libc.so.6
  (gdb) p/x $pm->thread_info[0].pth_id
  $12 = 0x7f9e7706a440
  (gdb) p/x $pm->thread_info[1].pth_id
  $13 = 0x7f9e76a60640

  (gdb) set $px = *$pm->proxies
  while ($px != 0)
     printf "%#lx %s served=%u\n", $px, $px->id, $px->served
     set $px = ($px)->next
  end

  0x125eda0 GLOBAL served=0
  0x12645b0 stats served=0
  0x1266940 comp served=0
  0x1268e10 comp_bck served=0
  0x1260cf0 <OCSP-UPDATE> served=0
  0x12714c0 <HTTPCLIENT> served=0

(cherry picked from commit e5fccfe0b6397ec2b14ebc3a0d09646442b2018d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: place the post_mortem struct in its own section.

Placing it in its own section will ease its finding, particularly in
gdb which is too dumb to find anything in memory. Now it will be
sufficient to issue this:

  $ gdb -ex "info files" -ex "quit" ./haproxy core 2>/dev/null |grep _post_mortem
  0x0000000000cfd300 - 0x0000000000cfe780 is _post_mortem

or this:

   $ objdump -h haproxy|grep post
    34 _post_mortem  00001480  0000000000cfd300  0000000000cfd300  008fc300  2**8

to spot the symbol's address. Then it can be read this way:

   (gdb) p *(struct post_mortem *)0x0000000000cfd300

(cherry picked from commit 93c3f2a0b4da77c0317496b8585192fb64ef400f)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: place a magic pattern at the beginning of post_mortem

In order to ease finding of the post_mortem struct in core dumps, let's
make it start with a recognizable pattern of exactly 32 chars (to
preserve alignment):

  "POST-MORTEM STARTS HERE+7654321\0"

It can then be found like this from gdb:

  (gdb) find 0x000000012345678, 0x0000000100000000, 'P','O','S','T','-','M','O','R','T','E','M'
  0xcfd300 <post_mortem>
  1 pattern found.

Or easier with any other more practical tool (who as ever used "find" in
gdb, given that it cannot iterate over maps and is 100% useless?).

(cherry picked from commit 989b02e1930d7ecd1a728c3d18ccfba095cdd636)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: pools: export the pools variable

We want it to be accessible from debuggers for inspection and it's
currently unavailable. Let's start by exporting it as a first step.

(cherry picked from commit fba48e1c40287f1abb4066935f2436bd0b8cd7a4)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUILD: debug: silence a build warning with threads disabled

Commit 091de0f9b2 ("MINOR: debug: slightly change the thread_dump_pointer
signification") caused the following warning to be emitted when threads
are disabled:

  src/debug.c: In function 'ha_thread_dump_one':
  src/debug.c:359:9: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Let's just disguise the pointer to silence it. It should be backported
where the patch above was backported, since it was part of a series aiming
at making thread dumps more exploitable from core dumps.

(cherry picked from commit f163cbfb7f893a06d158880a753cad01908143d8)
[wt: s/MT_LIST_FOR_EACH_ENTRY_LOCKED/mt_list_for_each_entry_safe/ with
     two backup elements in 3.0]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: server: fix race on servers_list during server deletion

Each server is inserted in a global list named servers_list on
new_server(). This list is then only used to finalize servers
initialization after parsing.

On dynamic server creation, there is no issue as new_server() is under
thread isolation. However, when a server is deleted after its refcount
reached zero, srv_drop() removes it from servers_list without lock
protection. In the longterm, this can cause list corruption and crashes,
especially if multiple adjacent servers are removed in parallel.

To fix this, convert servers_list to a mt_list. This should not impact
performance as servers_list is not used during runtime outside of server
creation/deletion.

This should fix github issue #2733. Thanks to Chris Staite who first
found the issue here.

This must be backported up to 2.6.

(cherry picked from commit 7a02fcaf20dbc19db36052bbc7001bcea3912ab5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side

There is no reason to disable the 0-copy data forwarding if an end-of-stream
was reported on the consumer side. Indeed, the consumer will send data in
this case. So there is no reason to check the read side here.

This patch may be backported as far as 2.9.

(cherry picked from commit 362de90f3e4ddd0c15331c6b9cb48b671a6e2385)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding

When the response forwarding is aborted, we must not report a client abort
if a EOS was seen on client side. On abort performed by the stream must be
considered.

This bug was introduced when the SHUTR was splitted in 2 flags.

This patch must be backported as far as 2.8.

(cherry picked from commit 5970c6abec3e0ee4ac44364e999cae2cc852f4c8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error

When some data must be sent to the endpoint but an error was previously
reported, nothing is performed and we leave. But, in this case, the SC is not
notified the sends are blocked.

It is indeed an issue if the endpoint reports an error after consuming all
data from the SC. In the endpoint the outgoing data are trashed because of
the error, but on the SC, everything was sent, even if an error was also
reported.

Because of this bug, it is possible to have outgoing data blocked at the SC
level but without any write timeout armed. In some cases, this may lead to
blocking conditions where the stream is never closed.

So now, when outgoing data cannot be sent because an previous error was
triggered, a blocked send is reported. This way, it is possible to report a
write timeout.

This patch should fix the issue #2754. It must be backported as far as 2.8.

(cherry picked from commit fbc3de6e9e59679d2e9ece3984ce31b6a7dd418f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server: fix dynamic server leak with check on failed init

If a dynamic server is added with check or agent-check, its refcount is
incremented after server keyword parsing. However, if add server fails
at a later stage, refcount is only decremented once, which prevented the
server to be fully released.

This causes a leak with a server which is detached from most of the
lists but still exits in the system.

This bug is considered minor as only a few conditions may cause a
failure in add server after check/agent-check initialization. This is
the case if there is a naming collision or the dynamic ID cannot be
generated.

To fix this, simply decrement server refcount on add server error path
if either check and/or agent-check are flagged as activated.

This bug is related to github issue #2733. Thanks to Chris Staite who
first found the leak.

This must be backported up to 2.6.

(cherry picked from commit 116178563c2fb57e28a76838cf85c4858b185b76)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: activity/memprofile: show per-DSO stats

On systems where many libs are loaded, it's hard to track suspected
leaks. Having a per-DSO summary makes it more convenient. That's what
we're doing here by summarizing all calls per DSO before showing the
total.

(cherry picked from commit 401fb0e87a2cea7171e4d37da6094755eb10a972)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: activity/memprofile: always return "other" bin on NULL return address

It was found in a large "show profiling memory" output that a few entries
have a NULL return address, which causes confusion because this address
will be reused by the next new allocation caller, possibly resulting in
inconsistencies such as "free() ... pool=trash" which makes no sense. The
cause is in fact that the first caller had an entry->info pointing to the
trash pool from a p_alloc/p_free with a NULL return address, and the second
had a different type and reused that entry.

Let's make sure undecodable stacks causing an apparent NULL return address
all lead to the "other" bin.

While this is not exactly a bug, it would make sense to backport it to the
recent branches where the feature is used (probably at least as far as 2.8).

(cherry picked from commit 5091f90479ab4d963b55cb725cee8201d93521d9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families

As described in GH #2765, there were situations where http connections
would be re-used for requests to different endpoints, which is obviously
unexpected. In GH #2765, this occured with httpclient and UNIX socket
combination, but later code analysis revealed that while disabling http
reuse on httpclient proxy helped, it didn't fix the underlying issue since
it was found that conn_calculate_hash_sockaddr() didn't take into account
families such as AF_UNIX or AF_CUST_SOCKPAIR, and because of that the
sock_addr part of the connection wasn't hashed.

To properly fix the issue, let's explicly handle UNIX (both regular and
ABNS) and AF_CUST_SOCKPAIR families, so that the destination address is
properly hashed. To prevent this bug from re-appearing: when the family
isn't known, instead of doing nothing like before, let's fall back to a
generic (unoptimal) hashing which hashes the whole sockaddr_storage struct

As a workaround, http-reuse may be disabled on impacted proxies.
(unfortunately this doesn't help for httpclient since reuse policy
defaults to safe and cannot be modified from the config)

It should be backported to all stable versions.

Shout out to @christopherhibbert for having reported the issue and
provided a trivial reproducer.

[ada: prior to 3.0, ctx adjt is required because conn_hash_update()'s
prototype is slightly different]

(cherry picked from commit b5b40a9843e505ed84153327ab897ca0e8d9a571)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF

When data are sent via the zero-copy data forwarding, in h2_done_ff, we must
be sure to remove the H2 stream from the send list if something is send. It
was only performed if no blocking condition was encountered. But we must
also do it if something is sent. Otherwise the transfer may be blocked till
timeout.

This patch must be backported as far as 2.9.

(cherry picked from commit ded28f6e5c210b49ede7edb25cd4b39163759366)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF

During the zero-copy data forwarding, the caller specify the maximum amount
of data the producer may push. However, the HTML stats applet does not use
it and can fill all the free space in the buffer. It is especially an issue
when the consumer is limited by a flow control, like the H2. Because we may
emit too large DATA frame in this case. It is especially visible with big
buffer (for instance 32k).

In the early age or zero-copy data forwarding, the caller was responsible to
pass a properly resized buffer. And during the different refactoring steps,
this has changed but the HTML stats applet was not updated accordingly.

To fix the bug, the buffer used to dump the HTML page is resized to be sure
not too much data are dumped.

This patch should solve the issue #2757. It must be backported to 3.0.

(cherry picked from commit 529e4f36a353bca292196e1344a79b8cd4ba143c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent

A stream may be shut without any HTX EOM reported to report a proper
closure. This is the case for QCS instances flagged with
QC_SF_UNKNOWN_PL_LENGTH. Shut is performed with an empty FIN emission
instead of a RESET_STREAM. This has been implemented since the following
patch :

24962dd1784dd22babc8da09a5fc8769617f89e3
BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length

However, in case of HTTP/3, an empty FIN should only be done after a
full message is emitted, which requires at least a HEADERS frame. If an
empty FIN is emitted without it, client may interpret this as invalid
and close the connection. To prevent this, fallback to a RESET_STREAM
emission if no data were emitted on the stream.

This was reproduced using ngtcp2-client with 10% loss (-r 0.1) on a
remote host, with httpterm request "/?s=100k&C=1&b=0&P=400". An error
ERR_H3_FRAME_UNEXPECTED is returned by ngtcp2-client when the bug
occurs.

Note that this change is incomplete. The message validity depends solely
on the application protocol in use. As such, a new app_ops callback
should be implemented to ensure the stream is closed accordingly.
However, this first patch ensures that at least HTTP/3 case is valid
while keeping a minimal backport process.

This should be backported up to 2.8.

(cherry picked from commit 68c8c910238f0b759d75b4da2128370abf184cd1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mworker: fix mworker-max-reloads parser

Before this patch, when wrong argument was provided in the configuration for
mworker-max-reloads keyword, parser shows these errors below on the stderr:

[WARNING]  (1820317) : config : parsing [haproxy.cfg:154] : (null)parsing [haproxy.cfg:154] : 'mworker-max-reloads' expects an integer argument.

In a case, when by mistake two arguments were provided instead of one, this has
also triggered a buggy error message:

[ALERT]    (1820668) : config : parsing [haproxy.cfg:154] : 'mworker-max-reloads' cannot handle unexpected argument '45'.
[WARNING]  (1820668) : config : parsing [haproxy.cfg:154] : (null)

So, as 'mworker-max-reloads' is parsed in discovery mode by master process
let's align now its parser with all others, which could be called for this
mode. Like this in cases, when there are too many args or argument isn't a
valid integer we return proper error codes to global section parser and
messages are formated properly.

This fix should be backported in all stable versions.

(cherry picked from commit af1d170122369094a1f3869791fb34fb7286e31e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: fix rfc7239 forwarded typo in desc

replace specicy with specify in rfc7239 forwarded option description.
Multiple occurences were found.

May be backported in 2.8.

(cherry picked from commit 45cbbdc84551e51cdaf0046e1371e8495d053fb5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: quic: avoid freezing 0RTT connections

This issue came with this commit:

f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT

and could be easily reproduced with picoquic QUIC client with -Q option
which splits a big ClientHello TLS message into two Initial datagrams.
A second condition must be fulfilled to reprodue this issue: picoquic
must not send the token provided by haproxy (NEW_TOKEN). To do that,
haproxy must be patched to prevent it to send such tokens.

Under these conditions, if haproxy has enough time to reply to the first Initial
datagrams, when it receives the second Initial datagram it sends a Retry paquet.
Then the client ignores the Retry paquet as mentionned by RFC 9000:

17.2.5.2. Handling a Retry Packet
    A client MUST accept and process at most one Retry packet for each connection
    attempt. After the client has received and processed an Initial or Retry packet
    from the server, it MUST discard any subsequent Retry packets that it receives.

On its side, haproxy has closed the connection. When it receives the second
Initial datagram, it open a new connection but with Initial packets it
cannot decrypt (wrong ODCID) leaving the client without response.

To fix this, as the aim of the token (NEW_TOKEN) sent by haproxy is to validate
the peer address, in place of closing the connection when no token was received
for a 0RTT connection, one leaves this validation to the handshake process.
Indeed, the peer adress is validated during the handshake when a valid handshake
packet is received by the listener. But as one does not want haproxy to process
0RTT data when no token was received, one does not accept the connection before
the successful handshake completion. In addition to this, the 0RTT packets
are not released after successful handshake completion when no token was received
to leave a chance to haproxy to process these 0RTT data in such case (see
quic_conn_io_cb()).

Must be backported as far as 2.9.

(cherry picked from commit b1af5dabf0c4af1eda3a520a90332df1f4c12dcf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: avoid leaking post handshake frames

This bug came with this commit:
f627b92 BUG/MEDIUM: quic: always validate sender address on 0-RTT
If an error happens in quic_build_post_handshake_frames() during the
code exexuted for th NEW_TOKEN frame allocation, some could leak because
of the wrong label used to interrupt this function asap.
Replace the "goto leave" by "goto err" to deallocated such frames to fix
this issue.

Must be backported as far as 2.9.

(cherry picked from commit 19aa320f640f701544c3441787da1577a2479590)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: Never reuse server connection in http-messaging/truncated.vtc

A "Connection: close" header is added to responses to avoid any connection
reuse. This should avoid errors on the client side.

(cherry picked from commit e7be13da87f8ec00470ef60bb43b85f0480fd85d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter

When a filter is registered on the data, it means it may change the payload
length by rewritting data. It means consumers of the message cannot trust the
expected length of payload as announced by the producer. The commit 8bd835b2d2
("MEDIUM: filters/htx: Don't rely on HTX extra field if payload is filtered")
was pushed to solve this issue. When the HTTP payload of a message is filtered,
the extra field is set to 0 to be sure it will never be used by error by any
consumer. However, it is not enough.

Indeed, the filters must be called before fowarding some data. They cannot be
by-passed. But if a consumer is unable to flush the HTX message, some outgoing
data can remain blocked in the channel's buffer. If some new data are then
pushed because there is some room in the channel's buffe, the producer will set
the HTX extra field. At this stage, if the consumer is unblocked and can send
again data, it is possible to call it to forward outgoing data blocked in the
channel's buffer before waking the stream up to filter new input data. It is the
purpose of the data fast-forwarding. In this case, the HTX extra field will be
seen by the consumer. It is unexpected and leads to undefined behavior.

One consequence of this bug is to perform a wrong chunking on compressed
messages, leading to processing errors at the end of the message, reported as
"ID--" in logs.

To fix the bug, a HTX flag is added to state the payload of the current HTX
message is altered. When this flag is set (HTX_FL_ALTERED_PAYLOAD), the HTX
extra field must not be trusted. And to keep things simple, when this flag is
set, the HTX extra field is automatically set to 0 when the HTX message is
loaded, in htxbuf() function.

It is probably the less intrusive way to fix the bug for now. But this part must
be reviewed to save meta-info of the HTX message outside of the message itself.

This commit should solve the issue #2741. It must be backported as far as 2.9.

(cherry picked from commit 52a3d807fc332b57b62f5e30aa6f697636a22695)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()

In sc_notify() function, the consumer side of the SC is tested to verify if
we must perform a shutdown on the endpoint. To do so, no output data must be
present in the buffer and in the iobuf. However, there is a bug here, the
iobuf of the opposite SC is tested instead of the one of the current SC. So
a shutdown can be performed on the endpoint while there are still output
data in the iobuf that must be sent. Concretely, it can only be data blocked
in a pipe.

Because of this bug, data blocked in the pipe will be never sent. I've not
tested but I guess this may block the stream during the client or server
timeout.

This patch must be backported as far as 2.9.

(cherry picked from commit 0fcfed9e231f2bc3963fe6085598970db2174af1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid

If a parsing error is reported by the mux on the response payload, a proxy
error (PRXCOND) must be reported instead of a server abort (SRVCL). Because
of this bug, inavlid response may are reported as "SD--" or "SL--" in logs
instead of "PD--" or "PL--".

This patch must be backported to all stable versions.

(cherry picked from commit 6790067e79566b2ca5943e72200361c40001bde2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send

When a send attempt is performed on the opposite side from sc_notify() and
all outgoing data are sent while a shut was scheduled, the SE is shut down
because we consider all data were sent and no more are expected. However,
here we must also be carefull to have sent all pending data in the
iobuf. Indeed, some spliced data may be blocked. In this case, if the SE is
shut down, these data may be lost.

This patch should fix the original bug reported in #2749. It must be
backported as far as 2.9.

(cherry picked from commit 48f1e2b6fe8457bb5b9d8db9447157c244d871b7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()

Latest patches on the mworker rework skipped the httpclient_proxy
creation by accident. This is not supposed to happen because haproxy is
supposed to stop when the proxy creation failed, but it shows a flaw in
the API.

When the httpclient_proxy or the proxy used in parameter of
httpclient_new_from_proxy() is NULL, it will be dereferenced and cause a
crash.

The patch only returns a NULL when doing an httpclient_new() if the
proxy is not available.

Must be backported as far as 2.7.

(cherry picked from commit e7b7072943d658702eba3651d66c6093f1a79fa8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queue: make sure never to queue when there's no more served conns

Since commit 53f52e67a0 ("BUG/MEDIUM: queue: always dequeue the backend when
redistributing the last server"), we've got two reports again still showing
the theoretically impossible condition in pendconn_add(), including a single
threaded one.

Thanks to the traces, the issue could be tracked down to the redispatch part.
In fact, in non-determinist LB algorithms (RR, LC, FAS), we don't perform the
LB if there are pending connections in the backend, since it indicates that
previous attempts already failed, so we directly return SRV_STATUS_FULL. And
contrary to a previous belief, it is possible to meet this condition with
be->served==0 when redispatching (and likely with maxconn not greater than
the number of threads).

The problem is that in this case, the entry is queued and then the
pendconn_must_try_again() function checks if any connections are currently
being served to detect if we missed a race, and tries again, but that
situation is not caused by a concurrent thread and will never fix itself,
resulting in the loop.

All that part around pendconn_must_try_again() is still quite brittle, and
a safer approach would involve a sequence counter to detect new arrivals
and dequeues during the pendconn_add() call. But it's more sensitive work,
probably for a later fix.

This fix must be backported wherever the fix above was backported. Thanks
to Patrick Hemmer, as well as Damien Claisse and Basha Mougamadou from
Criteo for their help on tracking this one!

(cherry picked from commit ca275d99ce02e72d707fc87da133d739cdda5146)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests

If a small request is received on QUIC MUX frontend, it can be
transmitted directly with the FIN on attach operation. rcv_buf is
skipped by the stream layer. Thus, it is necessary to ensure that there
is similar behavior when FIN is reported either on attach or rcv_buf.

One difference was that se_expect_data() was called only for rcv_buf but
not on attach. This most obvious effect is that stream timeout was
deactivated for this request : client timeout was disabled on EOI but
server one not armed due to previous se_expect_no_data(). This prevents
the early closure of too long requests.

To fix this, add an invokation of se_expect_data() on attach operation.

This bug can simply be detected using httpterm with delay request (for
example /?t=10000) and using smaller client/server timeouts. The bug is
present if the request is not aborted on timeout but instead continue
until its proper HTTP 200 termination.

This has been introduced by the following commit :
85eabfbf672c57e4ed082da1b96c95348b331320
MEDIUM: mux-quic: Don't expect data from server as long as request is unfinished

This must be backported up to 2.8.

(cherry picked from commit 232083c3e5ca3f23a44fa64def6a88dd257c3b23)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()

To execute sample fetches and converters from lua. hlua API leverages the
sample API. Prior to executing the sample func, the arg checker is called
from hlua_run_sample_{fetch,conv}() to detect potential errors.

However, hlua_run_sample_{fetch,conv}() both pass NULL as <err> argument,
but it is wrong for two reasons. First we miss an opportunity to report
precise error messages to help the user know what went wrong during the
check.. and more importantly, some val check functions consider that the
<err> pointer is never NULL. This is the case for example with
check_crypto_hmac(). Because of this, when such val check functions
encounter an error, they will crash the process because they will try
to de-reference NULL.

This bug was discovered and reported by GH user @JB0925 on #2745.

Perhaps val check functions should make sure that the provided <err>
pointer is != NULL prior to de-referencing it. But since there are
multiple occurences found in the code and the API isn't clear about that,
it is easier to fix the hlua part (caller) for now.

To fix the issue, let's always provide a valid <err> pointer when
leveraging val_arg() check function pointer, and make use of it in case
or error to report relevant message to the user before freeing it.

It should be backported to all stable versions.

(cherry picked from commit f88f162868df9053ca71e3be0628221c36153d9a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: hlua: make hlua_ctx_renew() safe

hlua_ctx_renew() is called from unsafe places where the caller doesn't
expect it to LJMP.. however hlua_ctx_renew() makes use of Lua library
function that could potentially raise errors, such as lua_newthread(),
and it does nothing to catch errors. Because of this, haproxy could
unexpectedly crash. This was discovered and reported by GH user
@JB0925 on #2745.

To fix the issue, let's simply make hlua_ctx_renew() safe by applying
the same logic implemented for hlua_ctx_init() or hlua_ctx_destroy(),
which is catching Lua errors by leveraging SET_SAFE_LJMP_PARENT() helper.

It should be backported to all stable versions.

(cherry picked from commit d0e01051813bde5cb06bebe88102f2b2885b3dea)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: server: server stuck in maintenance after FQDN change

Pierre Bonnat reported that SRV-based server-template recently stopped
to work properly.

After reviewing the changes, it was found that the regression was caused
by a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part of MAINT")

Indeed, HMAINT is not a regular maintenance flag. It was implemented in
b418c122 a4d04c6 ("BUG/MINOR: server: make sure the HMAINT state is part
of MAINT"). This flag is only set (and never removed) when the server FQDN
is changed from its initial config-time value. This can happen with "set
server fqdn" command as well as SRV records updates from the DNS. This
flag should ideally belong to server flags.. but it was stored under
srv_admin enum because cur_admin is properly exported/imported via server
state-file while regular server's flags are not.

Due to a4d04c6, when a server FQDN changes, the server is considered in
maintenance, and since the HMAINT flag is never removed, the server is
stuck in maintenance.

To fix the issue, we partially revert a4d04c6. But this latter commit is
right on one point: HMAINT flag was way too confusing and mixed-up between
regular MAINT flags, thus there's nothing to blame about a4d04c6 as it was
error-prone anyway.. To prevent such kind of bugs from happening again,
let's rename HMAINT to something more explicit (SRV_ADMF_FQDN_CHANGED) and
make it stand out under srv_admin enum so we're not tempted to mix it with
regular maintenance flags anymore.

Since a4d04c6 was set to be backported in all versions, this patch must
be backported there as well.

(cherry picked from commit 85298189bf4c268b15c33aea95e0cc35113e25f0)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MEDIUM: debug: on panic, make the target thread automatically allocate its buf

One main problem with panic dumps is that they're filling the dumping
thread's trash, and that the global thread_dump_buffer is too small to
catch enough of them.

Here we're proceeding differently. When dumping threads for a panic, we're
passing the magic value 0x2 as the buffer, and it will instruct the target
thread to allocate its own buffer using get_trash_chunk() (which is signal
safe), so that each thread dumps into its own buffer. Then the thread will
wait for the buffer to be consumed, and will assign its own thread_dump_buffer
to it. This way we can simply dump all threads' buffers from gdb like this:

  (gdb) set $t=0
        while ($t < global.nbthread)
          printf "%s\n", ha_thread_ctx[$t].thread_dump_buffer.area
          set $t=$t+1
        end

For now we make it wait forever since it's only called on panic and we
want to make sure the thread doesn't leave and continues to use that trash
buffer or do other nasty stuff. That way the dumping thread will make all
of them die.

This would be useful to backport to the most recent branches to help
troubleshooting. It backports well to 2.9, except for some trivial
context in tinfo-t.h for an updated comment. 2.8 and older would also
require TAINTED_PANIC. The following previous patches are required:

   MINOR: debug: make mark_tainted() return the previous value
   MINOR: chunk: drop the global thread_dump_buffer
   MINOR: debug: split ha_thread_dump() in two parts
   MINOR: debug: slightly change the thread_dump_pointer signification
   MINOR: debug: make ha_thread_dump_done() take the pointer to be used
   MINOR: debug: replace ha_thread_dump() with its two components

(cherry picked from commit 278b9613a32ddb0b5ffa1d66e6c25f434758c65a)
[wt: ctx updt in tinfo-t for comment]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: replace ha_thread_dump() with its two components

At the few places we were calling ha_thread_dump(), now we're
calling separately ha_thread_dump_fill() and ha_thread_dump_done()
once the data are consumed.

(cherry picked from commit afeac4bc026e64e27d99e50480bff5bc1ee60cb9)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: make ha_thread_dump_done() take the pointer to be used

This will allow the caller to decide whether to definitely clear the
pointer and release the thread, or to leave it unlocked so that it's
easy to analyse from the struct (the goal will be to use that in panic()
so that cores are easy to analyse).

(cherry picked from commit d7c34ba479f79ed42c709c64679b4e3d1310cddd)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: slightly change the thread_dump_pointer signification

Now the thread_dump_pointer is returned ORed with 1 once done, or NULL
when cancelled (for now noone cancels). The goal will be to permit
the callee to provide its own pointer.

The ha_thread_dump_fill() function now returns the buffer pointer that
was used (without OR 1) or NULL, for ease of use from the caller.

(cherry picked from commit 091de0f9b2553463660a11c56598c4970d6b1066)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: split ha_thread_dump() in two parts

We want to have a function to trigger the dump and another one to wait
for it to be completed. This will be important to permit panic dumps to
be done on local threads. For now this does not change anything, as the
function still calls the two new functions one after the other.

(cherry picked from commit 2036f5bba1637c8dad3eca577aa3bed4cc53caaf)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: chunk: drop the global thread_dump_buffer

This variable is not very useful and is confusing anyway. It was mostly
used to detect that a panic dump was still in progress, but we can now
check mark_tainted() for this. The pointer was set to one of the dumping
thread's trash chunks. Let's temporarily continue to copy the dumps to
that trash, we'll remove it later.

(cherry picked from commit a6698304e03847f3e114b22b4229b382f3ddffd1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: make mark_tainted() return the previous value

Since mark_tainted() uses atomic ops to update the tainted status, let's
make it return the prior value, which will allow the caller to detect
if it's the first one to set it or not.

(cherry picked from commit 8e048603d1bba0721433a5ae0480fbf1ab6f4897)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade

If a request is waiting for a protocol upgrade but it is not finished, the
data fast-forwarding is disabled. Otherwise, the request analyzers will miss
the end of the message.

This case is possible since the commit 01fb1a54 ("BUG/MEDIUM: mux-h1/mux-h2:
Reject upgrades with payload on H2 side only"). Indeed, before, a protocol
upgrade was not allowed for request with payload. But it is now possible and
this comes with a side-effect. It is not really satisfying but for now there
is no other way to sync the muxes and the applicative stream. It seems to be
a reasonnable fix for now, waiting for a deeper refactoring.

This patch must be backported with the commit above.

(cherry picked from commit cea1379cf1fcd5fd9c3b3b104f4c5374bc44bce2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding

During zero-copy data forwarding, the producer must set the EOI flag on the SE
when end of the message is reached. It is already done but there is a case where
this flag is set while it should not. When a request wants to perform a protocol
upgrade and it is waiting for the server response, the flag must not be set
because the HTTP message is finished but some data are possibly still expected,
depending on the server response. On a 101-switching-protocol, more data will be
sent because the producer is switch to TUNNEL state.

So, now, the right condition is used. In DONE state, SE_FL_EOI flag is set on the sedesc iff:

  - it is the response
  - it is the request and the response is also in DONNE state
  - it is a request but no a protocol upgrade nor a CONNECT

This patch must be backported as far as 2.9.

(cherry picked from commit 6b39e245e1d58698310fbb06b8122423232be9a4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server

An interesting bug was revealed by commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()"). When shutting
down a server to redistribute its connections, no check is made on the
backend's queue. If we're turning off the last server and the backend
has pending connections, these ones will wait there till the queue
timeout. But worse, since the commit above, we can enter an endless loop
in the following situation:

  - streams are present in the backend's queue
  - streams are purged on the last server via srv_shutdown_streams()
  - that one calls pendconn_redistribute(srv) which does not purge
    the backend's pendconns
  - a stream performs some load balancing and enters assign_server_and_queue()
  - assign_server() is called in turn
  - the LB algo is non-deterministic and there are entries in the
    backend's queue. The function notices it and returns SRV_STATUS_FULL
  - assign_server_and_queue() calls pendconn_add() to add the connection
    to the backend's queue
  - on return, pendconn_must_try_again() is called, it figures there's
    no stream served anymore on the server nor the proxy, so it removes
    the pendconn from the queue and returns 1
  - assign_server_and_queue() loops back to the beginning to try again,
    while the conditions have not changed, resulting in an endless loop.

Ideally a change count should be used in the queues so that it's possible
to detect that some dequeuing happened and/or that a last stream has left.
But that wouldn't completely solve the problem that is that we must never
ever add to a queue when there's no server streams to dequeue the new
entries.

The current solution consists in making pendconn_redistribute() take care
of the proxy after the server in case there's no more server available on
the proxy. It at least ensures that no pending streams are left in the
backend's queue when shutting streams down or when the last server goes
down. The try_again loop remains necessary to deal with inevitable races
during pendconn additions. It could be limited to a few rounds, though,
but it should never trigger if the conditions are sufficient to permit
it to converge.

One way to reproduce the issue is to run a config with a single server
with maxconn 1 and plenty of threads, then run in loops series of:

"disable server px/s;shutdown sessions server px/s;
  wait 100ms server-removable px/s; show servers conn px;
  enable server px/s"

on the CLI at ~10/s while injecting with around 40 concurrent conns at
40-100k RPS. In this case in 10s - 1mn the crash can appear with a
backtrace like this one for at least 1 thread:

  #0  pendconn_add (strm=strm@entry=0x17f2ce0) at src/queue.c:487
  #1  0x000000000064797d in assign_server_and_queue (s=s@entry=0x17f2ce0) at src/backend.c:1064
  #2  0x000000000064a928 in srv_redispatch_connect (s=s@entry=0x17f2ce0) at src/backend.c:1962
  #3  0x000000000064ac54 in back_handle_st_req (s=s@entry=0x17f2ce0) at src/backend.c:2287
  #4  0x00000000005ae1d5 in process_stream (t=t@entry=0x17f4ab0, context=0x17f2ce0, state=<optimized out>) at src/stream.c:2336

It's worth noting that other threads may often appear waiting after the
poller and one in server_atomic_sync() waiting for isolation, because
the event that is processed when shutting the server down is consumed
under isolation, and having less threads available to dequeue remaining
requests increases the probability to trigger the problem, though it is
not at all necessary (some less common traces never show them).

This should carefully be backported wherever the commit above was
backported.

(cherry picked from commit 53f52e67a08fc2cd5ae64caf4148a5de884f9959)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()

When shutting down server sessions, the queue was not considered, which
is a problem if some element reached the queue at the moment the server
was going down, because there will be no more requests to kick them out
of it. Let's always make sure we scan the queue to kick these streams
out of it and that they can possibly find a more suitable server. This
may make a difference in the time it takes to shut down a server on the
CLI when lots of servers are in the queue.

It might be interesting to backport this to 3.0 but probably not much
further.

(cherry picked from commit 1d403caf8aa59c9070f30ea16017261cab679fe2)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: queue: make sure that maintenance redispatches server queue

Turning a server to maintenance currently doesn't redispatch the server
queue unless there's an explicit "option redispatch" and no "option
persist", while the former has never really been the purpose of this
test. Better refine this so that forced maintenance also causes the
queue to be flushed, and possibly redispatched unless the proxy has
option persist. This way now when turning a server to maintenance,
the queue is immediately flushed and streams can decide what to do.

This can be backported, though there's no need to go far since it was
never directly reported and only noticed as part of debugging some
rare "shutdown sessions" strangeness, which it might participate to.

(cherry picked from commit 1385e33eb089093dbc970dbc2759d2969ae533c5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: stream: make stream_shutdown() async-safe

The solution found in commit b500e84e24 ("BUG/MINOR: server: shut down
streams under thread isolation") to deal with inter-thread stream
shutdown doesn't work fine because there exists code paths involving
a server lock which can then deadlock on thread_isolate(). A better
solution then consists in deferring the shutdown to the stream itself
and just wake it up for that.

The only thing is that TASK_WOKEN_OTHER is a bit too generic and we
need to pass at least 2 types of events (SF_ERR_DOWN and SF_ERR_KILLED),
so we're now leveraging the new TASK_F_UEVT1 and _UEVT2 flags on the
task's state to convey these info. The caller only needs to wake the
task up with these flags set, and the stream handler will then finish
the job locally using stream_shutdown_self().

This needs to be carefully backported to all branches affected by the
dequeuing issue and containing any of the 5541d4995d ("BUG/MEDIUM:
queue: deal with a rare TOCTOU in assign_server_and_queue()"), and/or
b11495652e ("BUG/MEDIUM: queue: implement a flag to check for the
dequeuing").

(cherry picked from commit b8e3b0a18d59b4f52b4ecb7ae61cef0b8b2402a0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG

TASK_WOKEN_MSG only says "someone sent you a message" but doesn't convey
any info about the message. TASK_WOKEN_OTHER says "you're woken for another
reason" but doesn't tell which one. Most often they're used as-is by the
task handlers to report very specific situations.

For some important control notifications, having the ability to modulate
the message a little bit is useful, so let's define two user event types
UEVT1 and UEVT2 to be used in conjunction with TASK_WOKEN_MSG or _OTHER
so that the application can know that a specific condition was explicitly
requested. It will be used this way:

task_wakeup(s->task, TASK_WOKEN_MSG | TASK_F_UEVT1);
or:
task_wakeup(s->task, TASK_WOKEN_OTHER | TASK_F_UEVT2);

Since events are cumulative, keep in mind not to consider a 3rd value
as the combination of EVT1+EVT2; these really mean that the two events
appeared (though in unspecified order).

(cherry picked from commit b5281283bb00248152ba361f8d7306b37febbee0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: tools: do not attempt to use backtrace() on linux without glibc

The function is provided by glibc. Nothing prevents us from using our
own outside of glibc there (tested on aarch64 with musl). We still do
not enable it by default as we don't yet know if all archs work well,
but it's sufficient to pass USE_BACKTRACE=1 when building with musl to
verify it's OK.

(cherry picked from commit 7caf073faa6962de15a18dadcaf200df95ce7889)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUILD: tools: only include execinfo.h for the real backtrace() function

No need to include this possibly non-existing file when using our own
backtrace() implementation, it's only needed for the libc-provided one.
Because of this it's currently not possible to build musl with backtrace
enabled.

(cherry picked from commit 1c4776dbc3e7d0a73bd62bae78d509c3c54045d6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cfgparse-global: fix allowed args number for setenv

Keywords setenv and presetenv take 2 arguments: variable name and value.
So, the total number, that should be passed to alertif_too_many_args is 2
("setenv <name> <value>") instead of 3. For alertif_too_many_args the first
argument index is 0.

This should be backported in all stable versions.

(cherry picked from commit df68f7ec96c0c9d0e8f50643b96ec9062b7aa658)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server: make sure the HMAINT state is part of MAINT

In 1.8 when adding "set server fqdn" with commit b418c1228c ("MINOR:
server: cli: Add server FQDNs to server-state file and stats socket."),
the HMAINT flag was not made part of the MAINT ones, so technically
speaking when changing the FQDN, the server is not completely considered
as in maintenance mode.

In its defense, the code location around that was completely messy, with
the aggregator flag being hidden between other values and purposely but
discretely ignoring one of the flags, so the comments were updated to
make the intent clearer (particularly regarding CMAINT which looked like
it was also forgotten while it was on purpose).

This can be backported anywhere.

(cherry picked from commit a4d04c649af448cf2a728ebb91837ed69eea6217)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: cli: Deadlock when setting frontend maxconn

The proxy lock state isn't passed down to relax_listener
through dequeue_proxy_listeners, which causes a deadlock
in relax_listener when it tries to get that lock.

Backporting: Older versions didn't have relax_listener and directly called
resume_listener in dequeue_proxy_listeners. lpx should just be passed directly
to resume_listener then.

The bug was introduced in commit 001328873c352e5e4b1df0dcc8facaf2fc1408aa

[cf: This patch should fix the issue #2726. It must be backported as far as
2.4]

(cherry picked from commit a889413f5ec5e2c0c6ed30debfd16f666dbc0f99)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: cli: Be sure to catch immediate client abort

A client abort while nothing was sent is properly handled except when this
immediately happens after the connection was accepted. The read0 event is
caught before the CLI applet is created. In that case, the shutdown is not
handled and the applet is no longer wakeup. In that case, the stream remains
blocked and no timeout are armed.

The bug was due to the fact that when the applet I/O handler was called for
the first time, the applet context was initialized and nothing more was
performed. A shutdown, if any, would be handled on the next call. In that
case, it was too late.

Now, afet the init step, we loop to eval the first command. There is no
command here but the shutdown will be tested.

This patch should fix the issue #2727. It must be backported to 3.0.

(cherry picked from commit 14a413033c15aa4622368348ac6b142e7ec2c989)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux-quic: report glitches to session

Glitch counter was implemented for QUIC/HTTP3. The counter is stored in
the QCC MUX connection instance. However, this is never reported at the
session level which is necessary if glitch counter is tracked via a
stick-table.

To fix this, use session_add_glitch_ctr() in various QUIC MUX functions
which may increment glitch counter.

This should be backported up to 3.0.

(cherry picked from commit fcd6d29acf108c55e1e1c17c04aeae621327adff)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: shorten a bit the delay for the h1/h2 upgrade test

Commit d6c4ed9a96 ("REGTESTS: h1/h2: Update script testing H1/H2
protocol upgrades") introduced a 0.5 second delay which is higher
than those of most other tests (usually 0.05 or 0.2) and triggers
timeouts on my side. Let's just shorten it to 0.2 since its goal
is only to send data separately.

Note: maybe a barrier approach would be possible, though not
studied.
(cherry picked from commit 33deb4babe83b361f9f7423030cd8cc8318ca2bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades

"http-messaging/protocol_upgrade.vtc" script was updated to test upgrades
for requests with a payload. It should fail when the request is sent to a H2
server. When sent to a H1 server, it should succeed, except if the server
replies before the end of the request.

(cherry picked from commit d6c4ed9a960d5125b3ca0b879eb127353a0b45c0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only

Since 1d2d77b27 ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade
requests with a body"), it is no longer possible to perform a protocol
upgrade for requests with a payload. The main reason was to be able to
support protocol upgrade for H1 client requesting a H2 server. In that case,
the upgrade request is converted to a CONNECT request. So, it is not
possible to convey a payload in that case.

But, it is a problem for anyone wanting to perform upgrades on H1 server
using requests with a payload. It is uncommon but valid. So, now, it is the
H2 multiplexer responsibility to reject upgrade requests, on server side, if
there is a payload. An INTERNAL_ERROR is returned for the H2S in that
case. On H1 side, the upgrade is now allowed, but only if the server waits
for the end of the request to return the 101-Switching-protocol
response. Indeed, it is quite hard to synchronise the frontend side and the
backend side in that case. Asking to servers to fully consume the request
payload before returned the response seems reasonable.

This patch should fix the issue #2684. It could be backported after a period
of observation, as far as 2.4 if possible. But only if it is not too
hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both
side are in DONE state".

(cherry picked from commit 001fb1a5488b5059ca4c26a82dd4f00e9850f1b4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state

For now, this case is already handled for all requests except for those
waiting for a tunnel establishment (CONNECT and protocol upgrades). It is
not an issue because only bodyless requests are supported in these cases. So
the request is always finished at the end of headers and therefore before
the response.

However, to relax conditions for full H1 protocol upgrades (H1 client and
server), this case will be necessary. Indeed, the idea is to be able to
perform protocol upgrades for requests with a payload. Today, the "Upgrade:"
header is removed before sending the request to the server. But to support
this case, this patch is required to properly finish transaction when the
server does not perform the upgrade.

(cherry picked from commit ad1ef94612fb6cba5b68f7d19c4646491b6da4af)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h2: reject extended connect for h2c protocol

This commit prevents forwarding of an HTTP/2 Extended CONNECT when "h2c"
or "h2" token is set as targetted protocol. Contrary to the previous
commit which deals with HTTP/1 mux, this time the request is rejected
and a RESET_STREAM is reported to the client.

This must be backported up to 2.4 after a period of observation.

(cherry picked from commit 7a5a30d28a3ee100c62603f7212cc1b313c53311)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h1: do not forward h2c upgrade header token

haproxy supports tunnel establishment through HTTP Upgrade mechanism.
Since the following commit, extended CONNECT is also supported for
HTTP/2 both on frontend and backend side.

commit 9bf957335e2c385b74901481f7a89c9565dfce53
MEDIUM: mux_h2: generate Extended CONNECT from htx upgrade

As specified by HTTP/2 rfc, "h2c" can be used by an HTTP/1.1 client to
request an upgrade to HTTP/2. In haproxy, this is not supported so it
silently ignores this. However, Connection and Upgrade headers are
forwarded as-is on the backend side.

If using HTTP/1 on the backend side and the server supports this upgrade
mechanism, haproxy won't be able to parse the HTTP response. If using
HTTP/2, mux backend tries to incorrectly convert the request to an
Extended CONNECT with h2c protocol, which may also prevent the response
to be transmitted.

To fix this, flag HTTP/1 request with "h2c" or "h2" token in an upgrade
header. On converting the header list to HTX, the upgrade header is
skipped if any of this token is present and the H1_MF_CONN_UPG flag is
removed.

This issue can easily be reproduced using curl --http2 argument to
connect to an HTTP/1 frontend.

This must be backported up to 2.4 after a period of observation.

(cherry picked from commit 7b89aa5b1943c1537a844cc375f7e8a86bac7942)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: connection: No longer include stconn type header in connection-t.h

It is a small change, but it is cleaner to no include stconn-t.h header in
connection-t.h, mainly to avoid circular definitions.

The related issue is #2502.

(cherry picked from commit 4b8098bf4831c0dfca4a058bd3170a5ed7ae8bbf)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>

[RELEASE] Released version 3.0.5

Released version 3.0.5 with the following main changes :
    - BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
    - BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
    - BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
    - BUILD: mux-pt: Use the right name for the sedesc variable
    - BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
    - BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
    - BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
    - BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
    - BUG/MEDIUM: http-ana: Report error on write error waiting for the response
    - BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
    - BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
    - BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
    - BUG/MINOR: fcgi-app: handle a possible strdup() failure
    - DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
    - BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
    - BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
    - BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
    - BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
    - BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
    - BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
    - BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
    - BUG/MINOR: proto_tcp: keep error msg if listen() fails
    - MINOR: channel: implement ci_insert() function
    - BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
    - REGTESTS: mcli: test the pipelined commands on master CLI
    - BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
    - BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
    - BUG/MINOR: h3: properly reject too long header responses
    - BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
    - BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
    - DOC: config: correct the table for option tcplog
    - BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
    - BUILD: quic: 32bits build broken by wrong integer conversions for printf()
    - BUG/MEDIUM: clock: also update the date offset on time jumps
    - MINOR: tools: Implement ipaddrcpy().
    - MINOR: quic: Implement quic_tls_derive_token_secret().
    - MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
    - MINOR: quic: Token for future connections implementation.
    - BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
    - MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
    - MINOR: quic: Implement qc_ssl_eary_data_accepted().
    - MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
    - BUG/MEDIUM: quic: always validate sender address on 0-RTT
    - BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
    - BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
    - DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
    - REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
    - BUG/MEDIUM: clock: detect and cover jumps during execution
    - BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
    - BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
    - BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
    - BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
    - BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
    - MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
    - BUG/MINOR: polling: fix time reporting when using busy polling
    - BUG/MINOR: clock: make time jump corrections a bit more accurate
    - BUG/MINOR: clock: validate that now_offset still applies to the current date
    - BUG/MEDIUM: queue: implement a flag to check for the dequeuing
    - BUG/MINOR: peers: local entries updates may not be advertised after resync
    - DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
    - BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
    - BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
    - BUG/MEDIUM: promex: Wait to have the request before sending the response
    - BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
    - MINOR: quic: convert qc_stream_desc release field to flags
    - MINOR: quic: implement function to check if STREAM is fully acked
    - BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
    - BUG/MINOR: quic: prevent freeze after early QCS closure

BUG/MINOR: quic: prevent freeze after early QCS closure

A connection freeze may occur if a QCS is released before transmitting
any data. This can happen when an error is detected early by the stream,
for example during HTTP response headers encoding, forcing the whole
connection closure.

In this case, a connection error is registered by the QUIC MUX to the
lower layer. MUX is then release and xprt layer is notified to prepare
CONNECTION_CLOSE emission. However, this is prevented because quic_conn
streams tree is not empty as it contains the qc_stream_desc previously
attached to the failed QCS instance. The connection will freeze until
QUIC idle timeout.

This situation is caused by an omission during qc_stream_desc release
operation. In the described situation, qc_stream_desc current buffer is
empty and can thus by removed, which is the purpose of this patch. This
unblocks this previously failed situation, with qc_stream_desc removal
from quic_conn tree.

This issue can be reproduced by modifying H3/QPACK code to return an
early error during HEADERS response processing.

This must be backported up to 2.6, after a period of observation.

(cherry picked from commit 3ef1ee477d3fef54633465b603206dd00bfefeba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

STREAM frames have dedicated handling on retransmission. A special check
is done to remove data already acked in case of duplicated frames, thus
only unacked data are retransmitted.

This handling is faulty in case of an empty STREAM frame with FIN set.
On retransmission, this frame does not cover any unacked range as it is
empty and is thus discarded. This may cause the transfer to freeze with
the client waiting indefinitely for the FIN notification.

To handle retransmission of empty FIN STREAM frame, qc_stream_desc layer
have been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by MUX QUIC
when FIN has been transmitted. If set, it prevents qc_stream_desc to be
freed until FIN is acknowledged. On retransmission side,
qc_stream_frm_is_acked() has been updated. It now reports false if
FIN bit is set on the frame and qc_stream_desc has QC_SD_FL_WAIT_FOR_FIN
set.

This must be backported up to 2.6. However, this modifies heavily
critical section for ACK handling and retransmission. As such, it must
be backported only after a period of observation.

This issue can be reproduced by using the following socat command as
server to add delay between the response and connection closure :
$ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;'

On the client side, ngtcp2 can be used to simulate packet drop. Without
this patch, connection will be interrupted on QUIC idle timeout or
haproxy client timeout with ERR_DRAINING on ngtcp2 :
$ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o"

Alternatively to ngtcp2 random loss, an extra haproxy patch can also be
used to force skipping the emission of the empty STREAM frame :

diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h
index efbdfe687..1ff899acd 100644
--- a/include/haproxy/quic_tx-t.h
+++ b/include/haproxy/quic_tx-t.h
@@ -26,6 +26,8 @@ extern struct pool_head *pool_head_quic_cc_buf;
/* Flag a sent packet as being probing with old data */
#define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5)

+#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6)
+
/* Structure to store enough information about TX QUIC packets. */
struct quic_tx_packet {
/* List entry point. */
diff --git a/src/quic_tx.c b/src/quic_tx.c
index 2f199ac3c..2702fc9b9 100644
--- a/src/quic_tx.c
+++ b/src/quic_tx.c
@@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
tmpbuf.size = tmpbuf.data = dglen;

TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc);
- if (!skip_sendto) {
+ if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) {
int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso);
if (ret < 0) {
if (gso && ret == -EIO) {
@@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
qc->cntrs.sent_bytes_gso += ret;
}
}
+ first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO;

b_del(buf, dglen + QUIC_DGRAM_HEADLEN);
qc->bytes.tx += tmpbuf.data;
@@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char *pos, const unsigned char *end,
continue;
}

+ switch (cf->type) {
+ case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F:
+ if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) {
+ TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc);
+ pkt->flags |= QUIC_FL_TX_PACKET_SKIP_SENDTO;
+ }
+ break;
+ default:
+ break;
+ }
+
quic_tx_packet_refinc(pkt);
cf->pkt = pkt;
}

(cherry picked from commit e177cf341cf3665e340855312714fe002688b2db)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: implement function to check if STREAM is fully acked

When a STREAM frame is retransmitted, a check is performed to remove
range of data already acked from it. This is useful when STREAM frames
are duplicated and splitted to cover different data ranges. The newly
retransmitted frame contains only unacked data.

This process is performed similarly in qc_dup_pkt_frms() and
qc_build_frms(). Refactor the code into a new function named
qc_stream_frm_is_acked(). It returns true if frame data are already
fully acked and retransmission can be avoided. If only a partial range
of data is acknowledged, frame content is updated to only cover the
unacked data.

This patch does not have any functional change. However, it simplifies
retransmission for STREAM frames. Also, it will be reused to fix
retransmission for empty STREAM frames with FIN set from the following
patch :
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.

(cherry picked from commit 714009b7bcf921836c2df7fc0d26b2ad257c8307)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: convert qc_stream_desc release field to flags

qc_stream_desc had a field <release> used as a boolean. Convert it with
a new <flags> field and QC_SD_FL_RELEASE value as equivalent.

The purpose of this patch is to be able to extend qc_stream_desc by
adding newer flags values. This patch is required for the following
patch
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM

As such, it must be backported prior to it.

(cherry picked from commit bb9ac256a1e5468535b8242dc762e7bb0d9a8bf3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: cfgparse-listen: fix option httpslog override warning message

"option httpslog" override warning messaged used to be reported as
"option httplog", probably as a result of copy paste without adjusting
the context. Let's fix that to prevent emitting confusing warning messages

The issue exists since 98b930d ("MINOR: ssl: Define a default https log
format"), thus it should be backported up to 2.6

(cherry picked from commit 17e52c922b577e1b677098b34e47cd0a85f31e8b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: promex: Wait to have the request before sending the response

It is similar to the previous fix about the stats applet ("BUG/MEDIUM:
cache/stats: Wait to have the request before sending the response").
However, for promex, there is no crash and no obvious issue. But it depends
on the filter. Indeed, the request is used by promex, independantly if it
was considered as forwarded or not. So if it is modified by the filter,
modification are just ignored.

Same bug, same fix. We now wait the request was forwarded before processing
it and produce the response.

(cherry picked from commit bb2a2bc5f2f6d864cd8770cbd2533d3df1878ad1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: cache/stats: Wait to have the request before sending the response

It seems obvious. On a classical workflow, the request headers analysis is
finished when these applets are woken up for the first time. So they don't
take care to really have the request to start to process it and to send the
response. But with a filter, it is possible to stop the request analysis
after the applet creation.

If this happens for the stats applet, this leads to a crash because we
retrieve the request start-line without checking if it is available. For the
cache applet, the response is just immediatly sent. And here it is a problem
if the compression is enabled. In that case too, this may lead to a crash
because the compression may be enabled but not initialized.

For a true server, there is no issue because the connection cannot be
established. The server is chosen only after the request analysis. The issue
with applets is that once created, an applet is quickly switched to the
established state. So it is probably a point that must be carefully reviewed
and probably reworked.

In the mean time, as a fix, in the cache and the stats applet, we just take
care to have the request before sending the response. This will do the
trick.

The patch must be backported as far as 2.6. On 2.6, the patch must be adapted.

(cherry picked from commit afc50f2445e02c778095cf4d7392c505f5af5278)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send

On a synchronous send from the stream to an applet, if some data were sent,
we must take care to wake the applet up. It is important because if
everything was sent at this stage, there is no other chance to wake the
applet up, mainly because SE_FL_WAIT_DATA flag is set on the applet's sedesc
in sc_update_tx() at the end of process_stream(). This flag prevent any
wakeup of the applet for a send event.

It is not necessary for a mux because the mux stream is called when a
syncrhonous send from the stream is performed. So it is reponsible to wake
the mux connection if necessary.

This patch must be backport to 3.0.

(cherry picked from commit 5fc12b0afd97ac68ef77335b9b5b9a0288f439f5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options

Time to time, new exceptions are added in the HTTP parsing (most of time H1)
to not reject some invalid messages sent by legacy applications. But the
documentation of accept-invalid-http-request and
accept-invalid-http-response options is not pretty clear. So, now, there is
an explicit list of relaxing rules for both options.

(cherry picked from commit 0f4fad5291027a7dfc8109fbbe2acd0bac8affd0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: peers: local entries updates may not be advertised after resync

Since commit 864ac3117 ("OPTIM: stick-tables: check the stksess without
taking the read lock"), when entries for a local table are learned from
another peer upon resynchro, and this is the only peer haproxy speaks to,
local updates on such entries are not advertised to the peer anymore,
until they eventually expire and can be recreated upon local updates.

This is due to the fact that ts->seen is always set to 0 when creating
new entry, and also when touch_remote is performed on the entry.

Indeed, while 864ac3117 attempts to avoid useless updates, it didn't
consider entries learned from a remote peer. Such entries are exclusively
learned in peer_treat_updatemsg(): once the entry is created (or updated)
with new data, touch_remote is used to commit the change. However, unlike
touch_local, entries committed using touch_remote will not be advertised
to the peer from which the entry was just learned (otherwise we would
enter a looping situation). Due to the above patch, once an entry is
learned from the (unique) remote peer, 'seen' will be stuck to 0 so it
will never be advertised for its whole lifetime.

Instead, when entries are learned from a peer, we should consider that
the peer that taught us the entry has seen it.

To do this, let's set seen=1 in peer_treat_updatemsg() after calling
touch_remote(). This way, if we happen to perform updates on this entry,
it will be properly advertized to relevant peers. This patch should not
affect the performance gain documented in 864ac3117 given that the test
scenario didn't involved entries learned by remote peers, but solely
locally created entries advertised to remote peers upon updates.

This should be backported in 3.0 with 864ac3117.

(cherry picked from commit 1e0920f85542d63755cb8824d1000c3a6f22bb9c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queue: implement a flag to check for the dequeuing

As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()") does have some
side effects in that it can occasionally cause an endless loop.

As Christopher analysed it, the problem is that process_srv_queue(),
which uses a trylock in order to leave only one thread in charge of
the dequeueing process, can lose the lock race against pendconn_add().
If this happens on the last served request, then there's no more thread
to deal with the dequeuing, and assign_server_and_queue() will loop
forever on a condition that was initially exepected to be extremely
rare (and still is, except that now it can become sticky). Previously
what was happening is that such queued requests would just time out
and since that was very rare, nobody would notice.

The root of the problem really is that trylock. It was added so that
only one thread dequeues at a time but it doesn't offer only that
guarantee since it also prevents a thread from dequeuing if another
one is in the process of queuing. We need a different criterion.

What we're doing now is to set a flag "dequeuing" in the server, which
indicates that one thread is currently in the process of dequeuing
requests. This one is atomically tested, and only if no thread is in
this process, then the thread grabs the queue's lock and dequeues.
This way it will be serialized with pendconn_add() and no request
addition will be missed.

It is not certain whether the original race covered by the fix above
can still happen with this change, so better keep that fix for now.

Thanks to @Yenya (Jan Kasprzak) for the precise and complete report
allowing to spot the problem.

This patch should be backported wherever the patch above was backported.

(cherry picked from commit b11495652e724d71f1f4247332f060fe48577664)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: clock: validate that now_offset still applies to the current date

We want to make sure that now_offset is still valid for the current
date: another thread could very well have updated it by detecting a
backwards jump, and at the very same moment the time got fixed again,
that we retrieve and add to the new offset, which results in a larger
jump. Normally, for this to happen, it would mean that before_poll
was also affected by the jump and was detected before and bounded
within 2 seconds, resulting in max 2 seconds perturbations.

Here we try to detect this situation and fall back to re-adjusting the
offset instead.

It's more of a strengthening of what's done by commit e8b1ad4c2b
("BUG/MEDIUM: clock: also update the date offset on time jumps") than a
pure fix, in that the issue was not direclty observed but it's visibly
possible by reading the code, so this should be backported along with
the patch above. This is related to issue GH #2704.

Note that this could be simplified in terms of operations by migrating
the deadlines to nanoseconds, but this was the path to least intrusive
changes.

(cherry picked from commit adaba6f904d11424327f90c3ab4df085c26fc8c4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: clock: make time jump corrections a bit more accurate

Since commit e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset
on time jumps") we try to update the now_offet based on the last known
valid date. But if it's off compared to the global_now_ns date shared
by other threads, we'll get the time off a little bit. When this happens,
we should consider the most recent of these dates so that if the global
date was already known to be more recent, we should use it and stick to
it. This will avoid setting too large an offset that could in turn provoke
a larger jump on another thread.

This is related to issue GH #2704.

This can be backported to other branches having the patch above.

(cherry picked from commit af48e4cc6b315cadb80d6980586af5c3c8368826)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: polling: fix time reporting when using busy polling

Since commit beb859abce ("MINOR: polling: add an option to support
busy polling") the time and status passed to clock_update_local_date()
were incorrect. Indeed, what is considered is the before_poll date
related to the configured timeout which does not correspond to what
is passed to the poller. That's not correct because before_poll+the
syscall's timeout will be crossed by the current date 100 ms after
the start of the poller. In practice it didn't happen when the poller
was limited to 1s timeout but at one minute it happens all the time.

That's particularly visible when running a multi-threaded setup with
busy polling and only half of the threads working (bind ... thread even).
In this case, the fixup code of clock_update_local_date() is executed
for each round of busy polling. The issue was made really visible
starting with recent commit e8b1ad4c2b ("BUG/MEDIUM: clock: also
update the date offset on time jumps") because upon a jump, the
shared offset is reset, while it should not be in this specific
case.

What needs to be done instead is to pass the configured timeout of
the poller (and not of the syscall), and always pass "interrupted"
set so as to claim we got an event (which is sort of true as it just
means the poller returned instantly). In this case we can still
detect backwards/forward jumps and will use a correct boundary
for the maximum date that covers the whole loop.

This can be backported to all versions since the issue was introduced
with busy-polling in 1.9-dev8.

(cherry picked from commit ad98edd00accdf5529da77eb79b72cf9fc025a7d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option

Since the 2.6, A parsing error is reported when the chunked encoding is
found twice. As stated in RFC9112, A sender must not apply the chunked
transfer coding more than once to a message body. It means only one chunked
coding must be found. In addition, empty values are also rejected becaues it
is forbidden by RFC9110.

However, in both cases, it may be useful to relax the rules for trusted
legacy servers when accept-invalid-http-response option is set. Especially
because it was accepted on 2.4 and older. In addition, T-E header is now
sanitized before sending it. It is not a problem Because it is a hop-by-hop
header

Note that it remains invalid on client side because there is no good reason
to relax the parsing on this side. We can argue a server is trusted so we
can decide to support some legacy behavior. It is not true on client side
and it is highly suspicious if a client is sending an invalid T-E header.

Note also we continue to reject unsupported T-E values (so all codings except
"chunked"). Because the "TE" header is sanitized and cannot contain other value
than "Trailers", there is absolutely no reason for a server to use something
else.

This patch should fix the issue #2677. It could probably be backported as
far as 2.6 if necessary.

(cherry picked from commit 1900ca475fa005b8000c4652457fedf282fa864d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: pattern: do not leave a leading comma on "set" error messages

Commit 4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by
coverity") dropped the condition to concatenate error messages and as
such introduced a leading comma in front of all of them. Then commit
911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found")
changed the behavior to stop at the first error anyway, so all the
mechanics dedicated to the concatenation of error messages is no longer
needed and we can simply return the error as-is, without inserting any
comma.

This should be backported where the patches above are backported.

(cherry picked from commit 9f8d9c9e8bdef8e7b8f7d1d641987b4d0dcae988)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established

This reverts commit 225a4d02e1f6a12c0b4f3584949fad3339d71708.

When a 200-OK response is replied to a CONNECT request or a
101-Switching-protocol, a tunnel is considered as established between the
client and the server. However, we must not declare the reponse as
bodyless. Of course, there is no payload, but tunneled data are expected.

Because of this bug, the zero-copy forwarding is disabled on the server
side.

This patch must be backported as far as 2.9.

(cherry picked from commit a99d58819f8184fb6f500807ddaf2371c5f1c110)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state

When the mux is woken up on I/O events, if the zero-copy forwarding is
enabled, receives are blocked. In this case, the SC is woken up to be able
to perform 0-copy forwarding to the other side. This works well, except for
the H1C in CLOSING state.

Indeed, in that case, in h1_process(), the SC is not woken up because only
RUNNING H1 connections are considered. As consequence, the mux will ignore
connection closure. The H1 connection remains blocked, waiting for the
shutdown timeout. If no timeout is configured, the H1 connection is never
closed leading to a leak.

This patch should fix leak reported by Damien Claisse in the issue #2697. It
should be backported as far as 2.8.

(cherry picked from commit f6e193f1b05bc3a73090b1efeb3f0440a0c776b0)
Signed-off-by: Willy Tarreau <w@1wt.eu>