Willy Tarreau [Thu, 20 Mar 2025 13:25:58 +0000 (14:25 +0100)]
[RELEASE] Released version 3.1.6
Released version 3.1.6 with the following main changes :
- BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers
- BUG/MEDIUM: uxst: fix outgoing abns address family in connect()
- BUG/MINOR: log: fix outgoing abns address family
- BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers
- MINOR: clock: always use atomic ops for global_now_ms
- BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()
- BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()
- MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
- BUG/MEDIUM: stream: never allocate connection addresses from signal handler
- MINOR: freq_ctr: provide non-blocking read functions
- BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper
- BUG/MINOR: h2: always trim leading and trailing LWS in header values
- BUG/MEDIUM: server: properly initialize PROXY v2 TLVs
- BUG/MINOR: server: fix the "server-template" prefix memory leak
- BUG/MINOR: h3: do not report transfer as aborted on preemptive response
- CLEANUP: h3: fix documentation of h3_rcv_buf()
- BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty
- BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header
- TESTS: ist: fix wrong array size
- CI: github: fix h2spec.config proxy names
- CLEANUP: log: removing "log-balance" references
- BUG/MINOR: log: set proper smp size for balance log-hash
- BUG/MEDIUM: startup: return to initial cwd only after check_config_validity()
- BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server
- BUG/MINOR: cfgparse/peers: properly handle ignored local peer case
- BUG/MINOR: server: don't return immediately from parse_server() when skipping checks
- MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines
- BUG/MINOR: stream: fix age calculation in "show sess" output
- MINOR: stream/cli: rework "show sess" to better consider optional arguments
- MINOR: stream/cli: make "show sess" support filtering on front/back/server
- BUG/MINOR: cfgparse-tcp: relax namespace bind check
- MINOR: startup: adjust alert messages, when capabilities are missed
- BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
- MINOR: compiler: add a simple macro to concatenate resolved strings
- MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
- DOC: management: rename some last occurrences from domain "dns" to "resolvers"
- BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
- MINOR: tools: use only opportunistic symbols resolution
- BUILD: tools: silence a build warning when USE_THREAD=0
- BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if fd_hard_limit=0
- BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
- BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was already sent
- IMPORT: plock: give higher precedence to W than S
- IMPORT: plock: lower the slope of the exponential back-off
- IMPORT: plock: use cpu_relax() for a shorter time in EBO
- MINOR: tinfo: split the signal handler report flags into 3
- BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler
- MINOR: cli: export cli_io_handler() to ease symbol resolution
- MINOR: tools: improve symbol resolution without dl_addr
- MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
- MINOR: tools: teach resolve_sym_name() a few more common symbols
- BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
Willy Tarreau [Fri, 14 Mar 2025 17:28:32 +0000 (18:28 +0100)]
BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
A build warning is emitted with gcc-4.8 in tools.c since commit
e920d73f59 ("MINOR: tools: improve symbol resolution without dl_addr")
because the compiler doesn't see that <size> is necessarily initialized.
Let's just preset it.
(cherry picked from commit
ed75148ca0479c7f87e1c149f0b5f0c8dd50eb3c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 13 Mar 2025 16:29:16 +0000 (17:29 +0100)]
MINOR: tools: teach resolve_sym_name() a few more common symbols
This adds run_poll_loop, run_tasks_from_lists, process_runnable_tasks,
ha_dump_backtrace and cli_io_handler, which are fairly common in
backtraces. This means fewer symbols will be reported as relative
offsets when dladdr is not usable.
(cherry picked from commit
4e09789644efa6c0dacfbd278618594590019a6e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 13 Mar 2025 15:46:10 +0000 (16:46 +0100)]
MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
Let's have a macro that declares both the symbol and its name, it will
avoid the risk of introducing typos, and encourages adding more when
needed. The macro also takes an optional second argument to permit an
inline declaration of an extern symbol.
(cherry picked from commit
a3582a77f73a1c2ba84018ac361b4a763582ac44)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 13 Mar 2025 16:21:24 +0000 (17:21 +0100)]
MINOR: tools: improve symbol resolution without dl_addr
When dl_addr is not usable or fails, better fall back to the closest
symbol among the known ones instead of providing everything relative
to main. Most often, the location of the function will give some hints
about what it can be. Thus now we can emit fct+0xXXX in addition to
main+0xXXX or main-0xXXX. We keep a margin of +256kB maximum after a
function for a match, which is around the maximum size met in an object
file, otherwise it becomes pointless again.
(cherry picked from commit
e920d73f598bd770a5dc861074268d1b2db4de59)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 13 Mar 2025 16:28:12 +0000 (17:28 +0100)]
MINOR: cli: export cli_io_handler() to ease symbol resolution
It's common to meet this function in backtraces, it's a bit annoying
that it's not resolved, so let's export it so that it becomes resolvable.
(cherry picked from commit
1e99efccef21394f0b5e5940001aaee8b3cd6b4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 24 Feb 2025 10:43:15 +0000 (11:43 +0100)]
BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler
In issue #2861, Jarosław Rzeszótko reported another issue with
"show threads", this time in relation to the conversion of a stream's
accept date to local time. Indeed, if the libc was interrupted in this
same function, it could have been interrupted with a lock held, in which
case it's no longer possible to dump the date and we face a deadlock.
This is easy to reproduce with logging enabled.
Let's detect we come from a signal handler and do not try to resolve
the time to localtime in this case.
(cherry picked from commit
2e0bac90da0013513969224db94661dda88d7b98)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 24 Feb 2025 12:37:52 +0000 (13:37 +0100)]
MINOR: tinfo: split the signal handler report flags into 3
While signals are not recursive, one signal (e.g. wdt) may interrupt
another one (e.g. debug). The problem this causes is that when leaving
the inner handler, it removes the outer's flag, hence the protection
that comes with it. Let's just have 3 distinct flags for regular signals,
debug signal and watchdog signal. We add a 4th definition which is an
aggregate of the 3 to ease testing.
(cherry picked from commit
fb7874c286dbe1594837fc75b9e96783dfe160f5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Feb 2025 16:33:49 +0000 (17:33 +0100)]
IMPORT: plock: use cpu_relax() for a shorter time in EBO
Tests have shown that on modern CPUs it's interesting to wait a bit less
in cpu_relax(). Till now we were looping down to 60 iterations and then
switching to just barriers. Increasing the threshold to 90 iterations
left before getting out of the loop improved the average and max time
to grab a write lock by a few percent (e.g. 10% at 1us, 20% at 256ns
or lower). Higher values tend to progressively lose that gain so let's
stick to this one. This was measured on an EPYC 74F3 like previous
measurements that initially led to this value, and the value might
possibly depend on the mask applied to the loop counter.
This is plock commit
74ca0a7307fa6aec3139f27d3b7e534e1bdb748e.
(cherry picked from commit
b957e2f3ef63e1846988bed9d2bbb42213b29fbf)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Feb 2025 16:20:48 +0000 (17:20 +0100)]
IMPORT: plock: lower the slope of the exponential back-off
Along many tests involving both haproxy's scheduler and forwarded
traffic, various exponents and algorithms were attempted for the EBO
and their effects were measured. It was found that a growth in 1.25^N
limited to 128k cycles consistently gives a better latency than 1.5^N
limited to 256k cycles, without degrading general performance. The
measures of the time to grab a write lock on a 48-thread EPYC show
that the number of occurrences of low times was roughly multiplied by
2-3 while the number of occurrences of times above 64us was reduced
by similar factors, to even reach 300 at 64us and limiting the maximum
time by a factor of 4.
The other variants that were experimented with are:
m = ((m + (m >> 1)) + 2) & 0x3ffff; // original
m = ((m + (m >> 1) + (m >> 3)) + 2) & 0x3ffff;
m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x3ffff;
m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x1ffff;
m = ((m + (m >> 1) + (m >> 4)) + 1) & 0x1ffff;
m = ((m + (m >> 2) + (m >> 4)) + 1) & 0x1ffff; // lowest CPU on pl_wr test + good perf
m = ((m + (m >> 2)) + 1) & 0x1ffff; // even lower cpu usage, lowest max
m = ((m + (m >> 1) + (m >> 2)) + 1) & 0x1ffff; // correct but slightly higher maxes
m = ((m + (m >> 1) + (m >> 3)) + 1) & 0x1ffff; // less good than m+m>>2
m = ((m + (m >> 2) + (m >> 3)) + 1) & 0x1ffff; // better but not as good as m+m>>2
m = ((m + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good, lower rates on small counts.
m = ((m + (m >> 2) + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good as well
m = ((m & 0x7fff) + (m >> 1) + (m >> 4)) + 2;
m = ((m & 0xffff) + (m >> 1) + (m >> 4)) + 2;
This is plock commit
dddd9ee01c522da33c353e2e4d4fd743d8336ec3.
(cherry picked from commit
253fba01a7adf20303ec65cbe256681df588f065)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Feb 2025 15:57:28 +0000 (16:57 +0100)]
IMPORT: plock: give higher precedence to W than S
It was noticed in haproxy that in certain extreme cases, a write lock
subject to EBO may fail for a very long time in front of a large set
of readers constantly trying to upgrade to the S state. The reason is
that among many readers, one will succeed in its upgrade, and this
situation can last for a very long time with many readers upgrading
in turn, while the writer waits longer and longer before trying again.
Here we're taking a reasonable approach which is that the write lock
should have a higher precedence in its attempt to grab the lock. What
is done is that instead of fully rolling back in case of conflict with
a pure S lock, the writer will only release its read part in order to
let the S upgrade to W if needed, and finish its operations. This
guarantees no other seek/read/write can enter. Once the conflict is
resolved, the writer grabs the read part again and waits for readers
to be gone (in practice it could even return without waiting since we
know that any possible wanderers would leave or even not be there at
all, but it avoids a complicated loop code that wouldn't improve the
practical situation but inflate the code).
Thanks to this change, the maximum write lock latency on a 48 threads
AMD with a heavily loaded scheduler went down from 256 to 64 ms, and the
number of occurrences of 32ms or more was divided by 300, while all
occurrences of 1ms or less were multiplied by up to 3 (3 for the 4-16ns
cases).
This is plock commit
b6a28366d156812f59c91346edc2eab6374a5ebd.
(cherry picked from commit
9dd56da73072f88e1b4a81923616b5276013b186)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 17 Mar 2025 15:26:35 +0000 (16:26 +0100)]
BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was already sent
On the frontend side, when a stream is shut while the response was already
fully sent, it was cancelled by sending a RST_STREAM(CANCEL) frame. However,
this is not accurate: the CANCEL error code must only be used if the response
headers were sent, but not the full response. As stated in RFC 9113, when the
response has been fully sent, a RST_STREAM with an error code of NO_ERROR
must be sent to stop the request.
This patch should solve the issue #1219. It must be backported to all stable
versions.
(cherry picked from commit
e87397bc7d3b386c95d1489d19f29e6d5f1f1482)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Wed, 19 Mar 2025 15:41:08 +0000 (16:41 +0100)]
BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
Recent commit
e5e36ce09 ("BUG/MEDIUM: hlua/cli: Fix lua CLI commands
to work with applet's buffers") revealed a bug in hlua cli applet handling.
Indeed, playing with Willy's lua tetris script on the cli, a segfault
would be encountered when forcefully closing the session by sending a
CTRL+C on the terminal.
In fact the crash was caused by a UAF: while the cli applet was already
freed, the lua task responsible for waking it up would still point to it.
Thus hlua_applet_wakeup() could be called even if the applet didn't exist
anymore.
To fix the issue, in hlua_cli_io_release_fct() we must also free the hlua
task linked to the applet, like we already do for
hlua_applet_tcp_release() and hlua_applet_http_release().
While this bug exists on stable versions (where it should be backported
too as a precaution), it only seems to be triggered starting with 3.0.
(cherry picked from commit
21601f4a27c4a1c8da0dbbfa22329ec1f927670e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Valentine Krasnobaeva [Tue, 18 Mar 2025 15:33:54 +0000 (16:33 +0100)]
BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if fd_hard_limit=0
'global.fd_hard_limit' stays uninitialized if haproxy is started with -m
(global.rlimit_memmax). 'remain' is the MAX between the soft and hard process
fd limits, so in this case it is always bigger than 'global.fd_hard_limit' (0).
Thus, if we unconditionally cap 'remain' to 'global.fd_hard_limit', the
calculated 'maxconn' may even become negative and DEFAULT_MAXCONN (100) is
then used as the 'ideal_maxconn'.
During the 'global.maxconn' calculation in set_global_maxconn(), if the
provided 'global.rlimit_memmax' is quite big, the system will refuse the
'global.maxconn' calculated from it and we fall back to the 'ideal_maxconn',
which is 100.
The same problem occurs for configs with SSL frontends and backends.
This fixes the issue #2899.
This should be backported to v3.1.0.
(cherry picked from commit
060f441199aa97d9735dd553bafd231ca615f723)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 12 Mar 2025 17:11:14 +0000 (18:11 +0100)]
BUILD: tools: silence a build warning when USE_THREAD=0
The dladdr_lock that was added to avoid re-entering into dladdr is
conditioned by threads, but the way it's declared causes a build
warning if threads are disabled, due to the insertion of a lone semi-colon
in the variables block. Let's switch to __decl_thread_var()
for this.
This can be backported wherever commit
eb41d768f9 ("MINOR: tools:
use only opportunistic symbols resolution") is backported. It relies
on these previous two commits:
bb4addabb7 ("MINOR: compiler: add a simple macro to concatenate resolved strings")
69ac4cd315 ("MINOR: compiler: add a new __decl_thread_var() macro to declare local variables")
(cherry picked from commit
b61ed9babe879fd5603af4643c1d60c8ab48d096)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 21 Feb 2025 14:01:13 +0000 (15:01 +0100)]
MINOR: tools: use only opportunistic symbols resolution
As seen in issue #2861, dladdr_and_size() can be quite expensive and
will often hold a mutex in the underlying library. It becomes a real
problem when issuing lots of "show threads" or wdt warnings in parallel
because threads will queue up waiting for each other to finish, adding
to their existing latency that possibly caused the warning in the first
place.
Here we're taking a different approach. If the thread is not isolated
and not panicking, it's doing unimportant stuff like showing threads
or warnings. In this case we try to grab a lock, and if we fail because
another thread is already there, we just pretend we cannot resolve the
symbol. This is not critical because then we fall back to the already
used case which consists in writing "main+<offset>". In practice this
will almost never happen except in bad situations which could have
otherwise degenerated.
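For illustration only, the opportunistic path can be sketched as below; names and details are simplified and this is not the literal haproxy code (the isolated/panic path would still take the lock normally):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>

    static pthread_mutex_t dladdr_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Returns 0 (as if unresolved) instead of waiting when another thread is
     * already inside dladdr(); the caller then falls back to "main+<offset>".
     */
    static int dladdr_opportunistic(void *addr, Dl_info *info)
    {
        int ret;

        if (pthread_mutex_trylock(&dladdr_lock) != 0)
            return 0;
        ret = dladdr(addr, info);
        pthread_mutex_unlock(&dladdr_lock);
        return ret;
    }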
(cherry picked from commit
eb41d768f954d5c7360fd19ec69f9d707b900532)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Thu, 13 Mar 2025 10:09:00 +0000 (11:09 +0100)]
BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
Performing a diff on stats output before vs after commit
66152526
("MEDIUM: stats: convert counters to new column definition") revealed
that some metrics were not properly ported to the new API. Namely,
"lbtot", "cli_abrt" and "srv_abrt" are now exposed on frontends and
listeners while it was not the case before.
Also, "hrsp_other" is exposed even when "mode http" wasn't set on the
proxy.
In this patch we restore original behavior by fixing the capabilities
and hide settings.
As this could be considered a minor regression (looking at the commit
message it doesn't seem intended), better tag this as a bug. It should be
backported to 3.0 with
66152526.
(cherry picked from commit
8311be5ac60c10fc4af56e3df79031358236bc14)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Thu, 13 Mar 2025 09:51:57 +0000 (10:51 +0100)]
DOC: management: rename some last occurrences from domain "dns" to "resolvers"
This is a complementary patch to
cf913c2f9 ("DOC: management: rename show
stats domain cli "dns" to "resolvers"). The doc still refered to the
legacy "dns" domain filter for stat command. Let's rename those occurences
to "resolvers".
It may be backported to all stable versions.
(cherry picked from commit
4c3eb60e7019d12734501ceb9358d2714bce8922)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 12 Mar 2025 17:08:12 +0000 (18:08 +0100)]
MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
__decl_thread() already exists but is more suited for struct members.
When using it in a variables block, it appends the final trailing
semi-colon which is a statement that ends the variable block. Better
clean this up and have one precisely for variable blocks. In this
case we can simply define an unused enum value that will consume the
semi-colon. That's what the new macro __decl_thread_var() does.
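One plausible shape of the idea, as a purely illustrative sketch (the actual definition lives in haproxy's headers and may differ), using the CONCAT() macro from the companion commit:

    #ifdef USE_THREAD
    #  define __decl_thread_var(decl)  decl
    #else
       /* expands to an unused enum so that the caller's trailing ';' terminates
        * a declaration instead of forming a lone statement in the variable block */
    #  define __decl_thread_var(decl)  enum { CONCAT(_dummy_var_, __LINE__) }
    #endif

    /* usage in a variables block:
     *   __decl_thread_var(static pthread_mutex_t dladdr_lock = PTHREAD_MUTEX_INITIALIZER);
     */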
(cherry picked from commit
69ac4cd315f05c69f3c34f824ef4e7f3221966cf)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 12 Mar 2025 17:06:55 +0000 (18:06 +0100)]
MINOR: compiler: add a simple macro to concatenate resolved strings
It's often useful to be able to concatenate strings after resolving
them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro
to do that, which calls _CONCAT() with the same arguments to make
sure the contents are resolved before being concatenated.
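The usual two-level pattern for this looks like the following minimal sketch (not necessarily the exact definition committed):

    #define _CONCAT(a, b)  a ## b          /* pastes the tokens as-is             */
    #define CONCAT(a, b)   _CONCAT(a, b)   /* resolves a and b first, then pastes */

    /* example: CONCAT(sym_at_, __LINE__) expands to sym_at_42 on line 42 */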
(cherry picked from commit
bb4addabb742f2305ad6667ed42ebb12f5df2af3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 12 Mar 2025 14:54:36 +0000 (15:54 +0100)]
BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
A bug was uncovered by the work on NUMA. It only triggers in the CI
with libmusl due to a race condition. What happens is that the call
to set_thread_cpu_affinity() is done very early in the polling loop,
and that it relies on ha_pthread[tid] instead of pthread_self(). The
problem is that ha_pthread[tid] is only set by the return from
pthread_create(), which might happen later depending on the number of
CPUs available to run the starting thread.
Let's just use pthread_self() here. ha_pthread[] is only used to send
signals between threads, there's no point in using it here.
This can be backported to 2.6.
(cherry picked from commit
12383fd9f5b3614dbffec6260b82659c3c5fd0df)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Valentine Krasnobaeva [Fri, 7 Mar 2025 12:42:27 +0000 (13:42 +0100)]
MINOR: startup: adjust alert messages, when capabilities are missed
CAP_SYS_ADMIN support was added in order to access sockets in namespaces. So
let's adjust the alert at startup, where we check preserved capabilities from
global.last_checks, and mention cap_sys_admin there as well.
(cherry picked from commit
7d427134fe01e8af56dfa48c6d9e6ecc5defe562)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Damien Claisse [Fri, 20 Dec 2024 13:36:34 +0000 (13:36 +0000)]
BUG/MINOR: cfgparse-tcp: relax namespace bind check
Commit 5cbb278 introduced cap_sys_admin support, and enforced checks for
both binds and servers. However, when binding into a namespace, the bind
is done before dropping privileges. Hence, checking that we have
cap_sys_admin capability set in this case is not needed (and it would
decrease security to add it).
For users starting haproxy with a user other than root and without
cap_sys_admin, the bind should have already failed.
As a consequence, relax the runtime check for binds into a namespace.
(cherry picked from commit
f0a07f834c001c5b505e84b0f0b103e530e87d1b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Mar 2025 09:34:19 +0000 (10:34 +0100)]
MINOR: stream/cli: make "show sess" support filtering on front/back/server
With "show sess", particularly "show sess all", we're often missing the
ability to inspect only streams attached to a frontend, backend or server.
Let's just add these filters to the command. Only one at a time may be set.
One typical use case could be to dump streams attached to a server after
issuing "shutdown sessions server XXX" to figure why any wouldn't stop
for example.
(cherry picked from commit
5e558c172791978d41a2e3f07519bc49ee2337e3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Mar 2025 09:21:21 +0000 (10:21 +0100)]
MINOR: stream/cli: rework "show sess" to better consider optional arguments
The "show sess" CLI command parser is getting really annoying because
several options were added in an exclusive mode as the single possible
argument. Recently some cumulable options were added ("show-uri") but
the older ones were not yet adapted. Let's just make sure that the
various filters such as "older" and "age" now belong to the options
and leave only <id>, "all", and "help" for the first ones. The doc was
updated and it's now easier to find these options.
(cherry picked from commit
2bd7cf53cb7b95622c2451a4e7bd46a267463617)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 6 Mar 2025 17:56:13 +0000 (18:56 +0100)]
BUG/MINOR: stream: fix age calculation in "show sess" output
The "show sess" output reports an age that's based on the last byte of
the HTTP request instead of the stream creation date, due to a confusion
between logs->request_ts and the request_date sample fetch function. Most
of the time these are equal except when the request is not yet full for
any reason (e.g. wait-body). This explains why a few "show sess" could
report a few new streams aged by 99 days for example.
Let's perform the correct request timestamp calculation like the sample
fetch function does, by adding t_idle and t_handshake to the accept_ts.
Now the stream's age is correct and can be correctly used with the
"show sess older <age>" variant.
This issue was introduced in 2.9 and the fix can be backported to 3.0.
(cherry picked from commit
1cdf2869f6757946546a2ef102ce822e95de78f8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Fri, 7 Mar 2025 08:30:47 +0000 (09:30 +0100)]
MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines
Invalid (incomplete) "server" or "peer" lines under peers section are now
properly ignored. For completeness, in this patch we add some reports so
that the user knows that incomplete lines were ignored.
For an incomplete server line, since it is tolerated (see GH #565), we
only emit a diag warning.
For an incomplete peer line, we report a real warning, as it is not
expected to have a peer line without an address:port specified.
Also, 'newpeer == curpeers->local' check could be simplified since
we already have the 'local_peer' variable which tells us that the
parsed line refers to a local peer.
(cherry picked from commit
dbb25720dd7157e0f180d17486f10340f80a9fda)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Fri, 7 Mar 2025 08:11:21 +0000 (09:11 +0100)]
BUG/MINOR: server: don't return immediately from parse_server() when skipping checks
If parse_server() is called under peers section parser, and the address
needs to be parsed but it is missing, we directly return from the function.
However, since
0fc136ce5b ("REORG: server: use parsing ctx for server
parsing"), parse_server() uses parsing ctx to emit warning/errors, and
the ctx must be reset before returning from the function, yet this early
return was overlooked. Because of that, any ha_{warning,alert..} message
reported after early return from parse_server() could cause messages to
have an extra "parsing [file:line]" info.
We fix that by ensuring parse_server() doesn't return without resetting
the parsing context.
It should be backported up to 2.6
(cherry picked from commit
a76b5358f0400568b641dc373cbd582875cd6aa6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Thu, 6 Mar 2025 08:29:05 +0000 (09:29 +0100)]
BUG/MINOR: cfgparse/peers: properly handle ignored local peer case
In
8ba10fea6 ("BUG/MINOR: peers: Incomplete peers sections should be
validated."), some checks were relaxed in parse_server(), and extra logic
was added in the peers section parser in an attempt to properly ignore
incomplete "server" or "peer" statement under peers section.
This was done in response to GH #565, the main intent was that haproxy
should already complain about incomplete peers section (ie: missing
localpeer).
However,
8ba10fea69 explicitly skipped the peer cleanup upon missing
srv association for local peers. This is wrong because later haproxy
code always assumes that peer->srv is valid. Indeed, we got reports
that the (invalid) config below would cause segmentation fault on
all stable versions:
    global
        localpeer 01JM0TEPAREK01FQQ439DDZXD8
    peers my-table
        peer 01JM0TEPAREK01FQQ439DDZXD8
    listen dummy
        bind localhost:8080
To fix the issue, instead of by-passing some cleanup for the local
peer, handle this case specifically by doing the regular peer cleanup
and reset some fields set on the curpeers and curpeers proxy because
of the invalid local peer (do as if the peer was not declared).
It should still comply with requirements from #565.
This patch should be backported to all stable versions.
(cherry picked from commit
054443dfb908521e30aa57335dbcb5f9cd6f7218)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Thu, 6 Mar 2025 08:05:23 +0000 (09:05 +0100)]
BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server
In the "peers" section parser, right after parse_server() is called, we
used to check whether the curpeers->peers_fe->srv pointer was set or not
to know if parse_server() successfully added a server to the peers proxy,
a server that we can then associate to the new peer.
However the check is wrong: as curpeers->peers_fe->srv points to the
last added server, if a server was successfully added before the
failing one, we cannot detect that the last parse_server() didn't
add a server. This is known to cause bugs with bad "peer"/"server"
statements.
To fix the issue, we save a pointer to the last known
curpeers->peers_fe->srv before parse_server() is called, and we then
compare the saved value with the pointer after parse_server(): if the value
didn't change, then parse_server() didn't add a server. This makes
the check consistent in all situations.
It should be backported to all stable versions.
(cherry picked from commit
2560ab892f355e958007b287946f787b578d3131)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Valentine Krasnobaeva [Tue, 4 Mar 2025 10:04:01 +0000 (11:04 +0100)]
BUG/MEDIUM: startup: return to initial cwd only after check_config_validity()
In check_config_validity() we evaluate some sample fetch expressions
(log-format, server rules, etc). These expressions may use external files like
maps.
If some particular 'default-path' was set in the global section before, it's no
longer applied to resolve file paths in check_config_validity(). parse_cfg()
at the end of config parsing switches back to the initial cwd.
This fixes the issue #2886.
This patch should be backported to all stable versions since 2.4.0, including
2.4.0.
(cherry picked from commit
e900ef987e4167cf0921e97b09059d757f2c0ea5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Wed, 5 Mar 2025 11:01:34 +0000 (12:01 +0100)]
BUG/MINOR: log: set proper smp size for balance log-hash
result.data.u.str.size was set to size+1 to take into account the terminating
NULL byte as per the comment. But this is wrong because the caller is free
to set size to just the right amount of bytes (without the terminating NULL
byte). In fact all smp API functions will not read past str.data so there
is no risk of uninitialized reads, but this leaves an ambiguity for
converters that may use all the smp size to perform transformations, and
since we don't know about the "message" memory origin, we cannot assume
that its allocated size is greater than size. So we cap it to size just to
be safe.
This bug was not known to cause any issue, it was spotted during code
review. It should be backported in 2.9 with b30bd7a ("MEDIUM: log/balance:
support for the "hash" lb algorithm")
(cherry picked from commit
94a9b0f5deabd49020c8ff535a3404494345b399)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Wed, 5 Mar 2025 10:33:06 +0000 (11:33 +0100)]
CLEANUP: log: removing "log-balance" references
This is a complementary patch to
0e1f389fe9 ("DOC: config: removing
"log-balance" references"): we properly removed all log-balance
references in the doc but some remained in the code; let's fix that.
It could be backported to 2.9 with
0e1f389fe9
(cherry picked from commit
ddf66132f4d77e00c52c977d8a9e5c829965e7c7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 4 Mar 2025 10:44:03 +0000 (11:44 +0100)]
CI: github: fix h2spec.config proxy names
The h2spec.config config file emitted a warning because the frontend
has the same name as the backend.
(cherry picked from commit
588237ca6e6624f7c1162289a6a00cab3f10ac61)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 4 Mar 2025 09:47:08 +0000 (10:47 +0100)]
TESTS: ist: fix wrong array size
test_istzero() and test_istpad() have a wrong buf[] array size which
lacks the space for the '\0'.
Could be backported to every stable branch.
(cherry picked from commit
ddd2c82a3521ad61398d24fa10f7483cd6518de8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 3 Mar 2025 02:58:46 +0000 (03:58 +0100)]
BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header
As reported in issue #2882, using "no-send-proxy-v2" on a server line does
not properly disable the use of proxy-protocol if it was enabled in a
default-server directive in combination with other PP options. The reason
for this is that the sending of a proxy header is determined by a test on
srv->pp_opts without any distinction, so disabling PPv2 while leaving other
options results in a PPv1 header being sent.
Let's fix this by explicitly testing for the presence of either send-proxy
or send-proxy-v2 when deciding to send a proxy header.
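As a rough illustration of the intended logic, with made-up flag names rather than haproxy's real SRV_PP_* constants:

    #define PP_OPT_V1  0x0001   /* "send-proxy" explicitly enabled     */
    #define PP_OPT_V2  0x0002   /* "send-proxy-v2" explicitly enabled  */

    /* only an explicitly enabled v1 or v2 option may trigger a PROXY header;
     * other PP-related options (e.g. TLV settings) must not, on their own. */
    static int must_send_proxy_hdr(unsigned int pp_opts)
    {
        return (pp_opts & (PP_OPT_V1 | PP_OPT_V2)) != 0;
    }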
This can be backported to all versions. Thanks to Andre Sencioles (@asenci)
for reporting the issue and testing the fix.
(cherry picked from commit
730641f7cad32bfff97875716efe4bd784bb006b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 28 Feb 2025 15:07:00 +0000 (16:07 +0100)]
BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty
Don't reserve space for the HTX overhead on receive if the demux buffer is
not empty. Otherwise, the demux buffer may be erroneously reported as full
and this may block record processing. Because of this bug, a ping-pong loop
between data reception and the demux process can be observed until the timeout.
This bug was introduced by the commit
5f927f603 ("BUG/MEDIUM: mux-fcgi:
Properly handle read0 on partial records"). To fix the issue, if the demux
buffer is not empty when we try to receive more data, all free space in the
buffer can now be used. However, if the demux buffer is empty, we still try
to keep it aligned with the HTX.
This patch must be backported to 3.1.
(cherry picked from commit
0e08252294d5a7389ad42b51b4b931fab2e66f31)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 27 Feb 2025 10:28:07 +0000 (11:28 +0100)]
CLEANUP: h3: fix documentation of h3_rcv_buf()
The return value of h3_rcv_buf() is incorrectly documented. Indeed, it may
return a positive value to indicate that input bytes were converted into
HTX. This is especially important, as the caller uses this value to consume
the reported data amount in the QCS Rx buffer.
This should be backported up to 2.6. Note that on 2.8, h3_rcv_buf() was
named h3_decode_qcs().
(cherry picked from commit
0aa35289b3b51e09a5757c9991212ec416d281f2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 27 Feb 2025 15:38:39 +0000 (16:38 +0100)]
BUG/MINOR: h3: do not report transfer as aborted on preemptive response
The HTTP/3 specification allows a server to emit the entire response even if
only a partial request was received. In particular, this happens when
request STREAM FIN is delayed and transmitted in an empty payload frame.
In this case, qcc_abort_stream_read() was used by HTTP/3 layer to emit a
STOP_SENDING. Remaining received data were not transmitted to the stream
layer as they were simply discarded. However, this prevents FIN
transmission to the stream layer. This causes the transfer to be
considered as prematurely closed, resulting in a cL-- log line status.
This is misleading to users, who could interpret it as if the response
was not sent.
To fix this, disable STOP_SENDING emission on full preemptive response
emission. The Rx channel is kept open until the client closes it with
either a FIN or a RESET_STREAM. This ensures that the FIN signal can be
relayed to the stream layer, which allows the transfer to be reported as
completed.
This should be backported up to 2.9.
(cherry picked from commit
f6648d478b632bbd243ab374e24c02d566a4112b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Dragan Dosen [Wed, 26 Feb 2025 21:56:41 +0000 (22:56 +0100)]
BUG/MINOR: server: fix the "server-template" prefix memory leak
The srv->tmpl_info.prefix was not freed in srv_free_params().
This could be backported to all stable versions.
(cherry picked from commit
0ae7a5d672f61cd4a949bf081b61857f6bbad476)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Dragan Dosen [Wed, 26 Feb 2025 18:13:31 +0000 (19:13 +0100)]
BUG/MEDIUM: server: properly initialize PROXY v2 TLVs
The PROXY v2 TLVs were not properly initialized when defined with
"set-proxy-v2-tlv-fmt" keyword, which could have caused a crash when
validating the configuration or malfunction (e.g. when used in
combination with "server-template" and/or "default-server").
The issue was introduced with commit
6f4bfed3a ("MINOR: server: Add
parser support for set-proxy-v2-tlv-fmt").
This should be backported up to 2.9.
(cherry picked from commit
6838fe43a320cf090892451ee907967666b626e5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Mon, 24 Feb 2025 08:17:22 +0000 (09:17 +0100)]
BUG/MINOR: h2: always trim leading and trailing LWS in header values
Annika Wickert reported some occasional disconnections between haproxy
and varnish when communicating over HTTP/2, with varnish complaining
about protocol errors while captures looked apparently normal. Nils
Goroll managed to reproduce this on varnish by injecting the capture of
the outgoing haproxy traffic and noticed that haproxy was forwarding a
header value containing a trailing space, which is now explicitly
forbidden since RFC9113.
It turns out that the only way for such a header to pass through haproxy
is to arrive in h2 and not be edited, in which case it will arrive in
HTX with its undesired spaces. Since the code dealing with HTX headers
always trims spaces around them, these are not observable in dumps, but
only when started in debug mode (-d). Conversions to/from h1 also drop
the spaces.
With this patch we trim LWS both on input and on output. This way we
always present clean headers in the whole stack, and even if some are
manually crafted by the configuration or Lua, they will be trimmed on
the output.
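Conceptually the trimming boils down to the generic sketch below (haproxy works on its ist strings; a plain pointer/length pair is used here for illustration):

    #include <stddef.h>

    /* strips leading and trailing spaces/tabs (LWS) from a header value */
    static void trim_lws(const char **val, size_t *len)
    {
        while (*len && (**val == ' ' || **val == '\t')) {
            (*val)++;
            (*len)--;
        }
        while (*len && ((*val)[*len - 1] == ' ' || (*val)[*len - 1] == '\t'))
            (*len)--;
    }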
This must be backported to all stable versions.
Thanks to Annika for the helpful capture and Nils for the help with
the analysis on the varnish side!
(cherry picked from commit
bbf824933f71eca90b5f07a51fa93f4fa7ac2256)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 17:23:23 +0000 (18:23 +0100)]
BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper
The stream dump function is called from signal handlers (warning, show
threads, panic). It makes use of read_freq_ctr() which might possibly
block if it tries to access a locked freq_ctr in the process of being
updated, e.g. by the current thread.
Here we're relying on the non-blocking API instead. It may return incorrect
values (typically smaller ones after resetting the curr counter) but at
least it will not block.
This needs to be backported to stable versions along with the previous
commit below:
MINOR: freq_ctr: provide non-blocking read functions
At least 3.1 is concerned as the warnings tend to increase the risk of
this situation appearing.
(cherry picked from commit
3c22fa315bcb2945d588cb64302f6ba5c89b382e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 17:21:56 +0000 (18:21 +0100)]
MINOR: freq_ctr: provide non-blocking read functions
Some code called by the debug handlers in the context of a signal handler
accesses some freq_ctr and occasionally ends up on one locked by
the same thread that is dumping it. Let's introduce a non-blocking version
that at least allows returning even if the value is in the process of being
updated; it's less problematic than hanging.
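The general idea can be sketched as follows; this is a simplified stand-alone illustration, not the actual freq_ctr layout or API:

    #include <stdatomic.h>
    #include <stdint.h>

    #define CTR_UPDATE_BIT  0x80000000u   /* assumed "update in progress" marker */

    /* A blocking reader would spin while CTR_UPDATE_BIT is set; the non-blocking
     * variant just masks it off and returns a possibly slightly stale value.
     */
    static uint32_t ctr_read_nonblock(_Atomic uint32_t *ctr)
    {
        return atomic_load_explicit(ctr, memory_order_acquire) & ~CTR_UPDATE_BIT;
    }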
(cherry picked from commit
29e246a84ce27af63779d98b305ad53877ae9acc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 15:45:18 +0000 (16:45 +0100)]
BUG/MEDIUM: stream: never allocate connection addresses from signal handler
In __strm_dump_to_buffer(), we call conn_get_src()/conn_get_dst() to try
to retrieve the connection's IP addresses. But this function may be called
from a signal handler to dump a currently running stream, and if the
addresses were not allocated yet, a pool_alloc() will be performed while
we might possibly already be running pools code, resulting in pool list
corruption.
Let's just make sure we don't call these sensitive functions there when
called from a signal handler.
This must be backported at least to 3.1 and ideally all other versions,
along with this previous commit:
MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
(cherry picked from commit
84d4c948fce4b31220bb9a30cbf676613bbbf4f2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 15:26:24 +0000 (16:26 +0100)]
MINOR: tinfo: add a new thread flag to indicate a call from a sig handler
Signal handlers must absolutely not change anything, but some long and
complex call chains may look innocuous at first glance, yet result in
some subtle write accesses (e.g. pools) that can conflict with a running
thread being interrupted.
Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when
entering a signal handler and cleared when leaving them. Note, we're
speaking about real signal handlers (synchronous ones), not deferred
ones. This will allow some sensitive call places to act differently
when detecting such a condition, and possibly even to place a few new
BUG_ON().
(cherry picked from commit
ddd173355c9c7452ff6ec317c8be6195d25dba2a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 10:31:36 +0000 (11:31 +0100)]
BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info()
This function may be called from a signal handler during a warning,
a panic or a show thread. We need to be more cautious about what may
or may not be dereferenced since an h1s is not necessarily fully
initialized. Loops of "show threads" sometimes manage to crash when
dereferencing a null h1s->sd, so let's guard it and add a comment
reminding about the unusual call place.
This can be backported to the relevant versions.
(cherry picked from commit
a56dfbdcb4cb3eb9ffd3db641efb3e5605a6c3f0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 16:18:00 +0000 (17:18 +0100)]
BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer()
co_data() was instrumented to detect cases where c->output > data and
emits a warning if that's not correct. The problem is that it happens
quite a bit during "show threads" if it interrupts traffic anywhere,
and that in some environments building with -DDEBUG_STRICT_ACTION=3,
it will kill the process.
Let's just open-code the channel functions that make access to co_data(),
there are not that many and the operations remain very simple.
This can be backported to 3.1. It didn't trigger in earlier versions
because they didn't have this CHECK_IF_HOT() test.
(cherry picked from commit
9d5bd47634707963ac6be4fc90e64feded174d0a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Wed, 19 Feb 2025 10:42:04 +0000 (11:42 +0100)]
MINOR: clock: always use atomic ops for global_now_ms
global_now_ms is shared between threads so we must give a hint to the
compiler that read/write operations should be performed atomically.
Everywhere global_now_ms was used, atomic ops were used, except in
clock_update_global_date() where a read was performed without using an
atomic op. In practice it is not an issue because on most systems
such reads should be atomic already, but to prevent any confusion or
potential bug on exotic systems, let's use an explicit _HA_ATOMIC_LOAD
there.
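In generic C11 terms the change amounts to something like the sketch below (haproxy uses its own _HA_ATOMIC_LOAD wrapper rather than <stdatomic.h> directly, and the variable name here is a stand-in):

    #include <stdatomic.h>

    extern _Atomic unsigned int shared_now_ms;   /* stand-in for global_now_ms */

    static unsigned int read_shared_now_ms(void)
    {
        /* explicit atomic load instead of relying on a plain read being atomic */
        return atomic_load_explicit(&shared_now_ms, memory_order_relaxed);
    }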
This may be backported up to 2.8
(cherry picked from commit
97a19517ffe3438562f80c314f5b6f3f27df7668)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Fri, 21 Feb 2025 09:51:01 +0000 (10:51 +0100)]
BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers
When the connection for sink_forward_{oc}_applet fails or a previous one
is destroyed, the sft->appctx is instantly released.
However process_sink_forward_task(), which may run at any time, iterates
over all known sfts and tries to create sessions for orphan ones.
It means that instantly after sft->appctx is destroyed, a new one will
be created, thus a new connection attempt will be made.
It can be an issue with tcp log-servers or sink servers, because if the
server is unavailable, process_sink_forward() will keep looping without
any temporisation until the applet survives (ie: connection succeeds),
which results in unexpected CPU usage on the threads responsible for
that task.
Instead, we add a tempo logic so that a delay of 1 second is applied
between two retries. Of course the initial attempt is not delayed.
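A minimal stand-alone sketch of such a tempo, assuming a 1-second delay and a hypothetical per-target state (not the actual sft fields):

    #include <stdbool.h>
    #include <time.h>

    struct retry_state {
        time_t last_attempt;              /* 0 means no attempt was made yet */
    };

    /* returns true when a (re)connection attempt may be made now */
    static bool may_attempt_connect(struct retry_state *rs)
    {
        time_t now = time(NULL);

        if (rs->last_attempt != 0 && now - rs->last_attempt < 1)
            return false;                 /* wait at least 1s between two retries */
        rs->last_attempt = now;
        return true;
    }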
This could be backported to all stable versions.
(cherry picked from commit
9561b9fb6964af325a10e7128b563114f144a3cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Fri, 21 Feb 2025 10:03:39 +0000 (11:03 +0100)]
BUG/MINOR: log: fix outgoing abns address family
While reviewing the code in an attempt to fix GH #2875, I stumbled
on another case similar to aac570c ("BUG/MEDIUM: uxst: fix outgoing
abns address family in connect()") that caused abns(z) addresses to
fail when used as log targets.
The underlying cause is the same as aac570c, which is the rework of the
unix socket families in order to support custom addresses for different
addressing schemes, where a real_family() call was overlooked before passing
a haproxy-internal address struct to a socket-oriented syscall.
To fix the issue, we first copy the target's addr, and then leverage
real_family() to set the proper low-level address family that is passed
to sendmsg() syscall.
It should be backported in 3.1
(cherry picked from commit
c9d41927266849208508b144ef15809a3a15c6cb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 21 Feb 2025 06:53:21 +0000 (07:53 +0100)]
BUG/MEDIUM: uxst: fix outgoing abns address family in connect()
Since we reworked the unix socket families in order to support custom
addresses for different addressing schemes, we've been using extra
values for the ss_family field in sockaddr_storage. These ones have
to be adjusted before calling bind() or connect(). It turns out that
after the abns/abnsz updates in 3.1, the connect() code was not adjusted
to take care of the change, resulting in AF_CUST_ABNS or AF_CUST_ABNSZ
being placed in the address that was passed to connect().
The right approach is to locally copy the address, get its length,
fixup the family and use the fixed value and length for connect().
This must be backported to 3.1. Many thanks to @Mewp for reporting
this issue in github issue #2875.
(cherry picked from commit
aac570cd0384f7fb942e7eb1eeb8c7d0605f50e2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Valentine Krasnobaeva [Thu, 20 Feb 2025 14:00:38 +0000 (15:00 +0100)]
BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers
When "peers" keyword is followed by more than one argument and it's the first
"peers" section in the config, cfg_parse_peers() detects it and exits with
"ERR_ALERT|ERR_FATAL" err_code.
So, upper layer parser, parse_cfg(), continues and parses the next keyword
"peer" and then he tries to check the global cfg_peers, which should contain
"my_cluster". The global cfg_peers is still NULL, because after alerting a user
in alertif_too_many_args, cfg_parse_peers() exited.
peers my_cluster __some_wrong_data__
peer haproxy1 1.1.1.1 1000
In order to fix this, let's add ERR_ABORT if the "peers" keyword is followed by
more than one argument. This way parse_cfg() stops immediately and terminates
haproxy with the "too many args for peers my_cluster..." alert message.
It's more reliable than adding "if (cfg_peers != NULL)" checks in the "peer"
subparser, as we may have many "peers" sections.
    peers my_another_cluster
        peer haproxy1 1.1.1.2 1000
    peers my_cluster __some_wrong_data__
        peer haproxy1 1.1.1.1 1000
In addition, for the example above, parse_cfg() will parse all the configuration
until the end and only then terminate haproxy with the "too many args..." alert.
Peer haproxy1 will be wrongly associated with my_another_cluster.
This fixes the issue #2872.
This should be backported in all stable versions.
(cherry picked from commit
390df282c1ac4605273abfeb82c97fad205b7294)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 20 Feb 2025 15:31:17 +0000 (16:31 +0100)]
[RELEASE] Released version 3.1.5
Released version 3.1.5 with the following main changes :
- BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR if applet is waiting for room
- BUG/MEDIUM: spoe/mux-spop: Introduce a NOOP action to deal with empty ACK
Christopher Faulet [Thu, 20 Feb 2025 10:56:24 +0000 (11:56 +0100)]
BUG/MEDIUM: spoe/mux-spop: Introduce a NOOP action to deal with empty ACK
In the SPOP protocol, ACK frames with an empty payload are allowed. However, in
that case, because only the payload is transferred, there is no data to
return to the SPOE applet. Only the end of input is reported. Thus the
applet is never woken up. It means that the SPOE filter will be blocked
during the processing timeout and will finally return an error.
To work around this issue, a NOOP action is introduced with the value 0. It
is only an internal action for now. It does not exist in the SPOP
protocol. When an ACK frame with an empty payload is received, this noop
action is transferred to the SPOE applet, instead of nothing. Thanks to this
trick, the applet is properly notified. This works because unknown actions
are ignored by the SPOE filter.
This patch must be backported to 3.1.
(cherry picked from commit
851e52b551f80f319b0de1817fc91b7f3949fc67)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 20 Feb 2025 09:00:31 +0000 (10:00 +0100)]
BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR if applet is waiting for room
The commit
7214dcd52 ("BUG/MEDIUM: applet: Don't pretend to have more data
to handle EOI/EOS/ERROR") introduced a regression. Because of this patch, it
was possible to handle EOI/EOS/ERROR applet flags too early while the applet
was waiting for more room to transfer the last output data.
This bug can be encountered with any applet using its own buffers (cache and
stats for instance). And depending on the configuration and the timing, the
data may be truncated or the stream may be blocked, infinitely or not.
Streams blocked infinitely were observed with the cache applet and the HTTP
compression enabled.
For the record, it is important to detect EOI/EOS/ERROR applet flags to be
able to report the corresponding event on the SE and by transitivity on the
SC. Most of the time, this happens when some data should be transferred to the
stream. The .rcv_buf callback function is called and these flags are
properly handled. However, some applets may also report them spontaneously,
outside of any data transfer. In that case, the .rcv_buf callback is not
called.
It is the purpose of this patch (and the one above): being able to detect
pending EOI/EOS/ERROR applet flags. However, we must be sure not to handle
them too early at this place. When these flags are set, it means no more
data will be produced by the applet. So we must only wait to have
transferred everything to the stream. And this happens when the applet is no
longer waiting for more room.
This patch must be backported to 3.1 with the one above.
(cherry picked from commit
efc46de29457b93f75ae4b581f70a9df8362d6ca)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Wed, 19 Feb 2025 15:05:39 +0000 (16:05 +0100)]
[RELEASE] Released version 3.1.4
Released version 3.1.4 with the following main changes :
- BUG/MEDIUM: ssl: choosing correct certificate using RSA-PSS with TLSv1.3
- BUG/MEDIUM: debug: close a possible race between thread dump and panic()
- BUG/MINOR: quic: reserve length field for long header encoding
- BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
- BUG/MINOR: mworker: section ignored in discovery after a post_section_parser
- BUG/MINOR: mworker: post_section_parser for the last section in discovery
- BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
- BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
- BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
- BUG/MEDIUM: cli: Be sure to drop all input data in END state
- BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
- BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
- BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
- BUG/MINOR: quic: prevent crash on conn access after MUX init failure
- BUG/MINOR: mux-quic: prevent crash after MUX init failure
- REGTESTS: Fix truncated.vtc to send 0-CRLF
- BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
- BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
- MINOR: quic: rename pacing_rate cb to pacing_inter
- MINOR: mux-quic: increment pacing retry counter on expired
- MEDIUM: quic: implement credit based pacing
- MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
- MEDIUM: quic: use dynamic credit for pacing
- MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
- MINOR: quic: adapt credit based pacing to BBR
- MINOR: epoll: permit to mask certain specific events
- BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
- DEBUG: fd: add a counter of takeovers of an FD since it was last opened
- MINOR: fd: add a generation number to file descriptors
- DEBUG: epoll: store and compare the FD's generation count with reported event
- MEDIUM: epoll: skip reports of stale file descriptors
- BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
- DOC: htx: clarify <mark> parameter for htx_xfer_blks()
- BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
- BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
- BUG/MINOR: http-check: Don't pretend a C-L header is set before adding it
- BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
- BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
- BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
- MINOR: flt-spoe: Report end of input immediately after applet init
- MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
- MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
- BUG/MINOR: cli: Don't set SE flags from the cli applet
- BUG/MINOR: cli: Fix memory leak on error for _getsocks command
- BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
- BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
- BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
- BUG/MINOR: flt-trace: Support only one name option
- BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
- DOC: option redispatch should mention persist options
Lukas Tribus [Wed, 5 Feb 2025 07:42:15 +0000 (07:42 +0000)]
DOC: option redispatch should mention persist options
"option redispatch" remains vague in which cases a session would persist;
let's mention "option persist" and "force-persist" as an example so folks
don't draw the conclusion that this may be default.
Should be backported to stable branches.
(cherry picked from commit
5926fb78233ab4ba95b07db3815ee62f5cbc5082)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 16:13:50 +0000 (17:13 +0100)]
BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
A JSON integer is defined in the range [-(2**53)+1, (2**53)-1]. Macros are used
to define the minimum and the maximum values. The minimum one is defined using
the maximum one. So JSON_INT_MAX must be defined as a signed integer value to
avoid a wrong cast of JSON_INT_MIN.
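For illustration, defining the bounds with an explicitly signed maximum avoids the bad cast when negating (a sketch of the intent, not necessarily the exact macros):

    /* JSON safe integer range: [-(2^53)+1, (2^53)-1] */
    #define JSON_INT_MAX  ((long long)((1ULL << 53) - 1))
    #define JSON_INT_MIN  (-JSON_INT_MAX)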
It was reported by Coverity in #2841: CID 1587769.
This patch could be backported to all stable versions.
(cherry picked from commit
d48b5add889db1bf2f0fae4721abb46413303d33)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 16:01:08 +0000 (17:01 +0100)]
BUG/MINOR: flt-trace: Support only one name option
When a trace filter is defined, only one 'name' option is expected. But it
was not tested. Thus it was possible to set several names leading to a
memory leak.
It is now tested, and it is not allowed to redefine the trace filter name.
It was reported by Coverity in #2841: CID 1587768.
This patch could be backported to all stable versions.
(cherry picked from commit
b20e2c96cfea06776db3200614f6d94441f15c2d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 15:52:17 +0000 (16:52 +0100)]
BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
In a userlist section, when a user is parsed, if a specified group is not
found, an error is reported. In this case we must take care to release the
already built groups list.
It was reported by Coverity in #2841: CID 1587770.
This patch could be backported to all stable versions.
(cherry picked from commit
a7f513af9187444680e3e054e0a13f818aed3307)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 15:21:20 +0000 (16:21 +0100)]
BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive
When a group is defined in a userlist section, only one 'users' option is
expected. But it was not tested. Thus it was possible to set several options
leading to a memory leak.
It is now tested, and it is not allowed to redefine the users option.
It was reported by Coverity in #2841: CID 1587771.
This patch could be backported to all stable versions.
(cherry picked from commit
a1e14d2a8272511c29d9225a61c74dc89847287d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 14:37:52 +0000 (15:37 +0100)]
BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
In the _getsocks() function, when we fail to set the unix socket in
non-blocking mode, a goto to the "out" label led to an infinite loop. To fix
the issue, we must simply let the function exit.
This patch should be backported to all stable versions.
(cherry picked from commit
75e8c8ed330f2ad1b6b33630efebb5041af3d5e9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 14:30:30 +0000 (15:30 +0100)]
BUG/MINOR: cli: Fix memory leak on error for _getsocks command
Some errors in the parse function of the _getsocks command were not properly
handled and returned immediately, leading to a memory leak on the cmsgbuf
and tmpbuf buffers.
To fix the issue, instead of immediately returning with -1, we jump to the
"out" label. Returning 1 instead of -1 in that case is valid.
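As a purely generic illustration of the pattern (the buffer names come from
the message above; everything else is an assumption):

    #include <stdlib.h>

    /* sketch only: a single exit label guarantees that cmsgbuf and tmpbuf
     * are always released; returning 1 instead of -1 on this path is valid */
    static int getsocks_parse_sketch(void)
    {
        char *cmsgbuf = NULL, *tmpbuf = NULL;
        int ret = 1;

        cmsgbuf = malloc(256);
        tmpbuf = malloc(256);
        if (!cmsgbuf || !tmpbuf)
            goto out;               /* previously: return -1, which leaked */

        /* ... parse and use the buffers here ... */
    out:
        free(cmsgbuf);
        free(tmpbuf);
        return ret;
    }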
This was reported by Coverity in #2841: CIDs 1587773 and 1587772.
This patch should be backported as far as 2.4.
(cherry picked from commit
372cc696d44e6853b9f7920f1c2d965736029764)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Thu, 6 Feb 2025 14:15:27 +0000 (15:15 +0100)]
BUG/MINOR: cli: Don't set SE flags from the cli applet
Since the CLI was updated to use the new applet API, it should no longer set
the SE flags directly. Instead, the corresponding applet flags must be set,
using the applet API (applet_set_*). This applies to the CLI I/O handler,
but also to the commands' parse functions and I/O callback functions.
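A hedged sketch of the difference (helper names follow the applet API
naming; treat the exact calls as an assumption):

    /* sketch only: report terminal events through the applet-level helpers
     * rather than touching the SE descriptor directly */
    applet_set_eoi(appctx);    /* instead of se_fl_set(appctx->sedesc, SE_FL_EOI)   */
    applet_set_eos(appctx);    /* instead of se_fl_set(appctx->sedesc, SE_FL_EOS)   */
    applet_set_error(appctx);  /* instead of se_fl_set(appctx->sedesc, SE_FL_ERROR) */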
This patch should be backported as far as 3.0.
(cherry picked from commit
7e927243b9e86cee13ea9c16077a8523e79e41bb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 09:58:35 +0000 (10:58 +0100)]
MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
The SPOP_CF_ERROR flag is now set on connection errors only. It was also set
on some demux failures, but that is not necessary because the connection is
closed anyway, and it is handy to have a flag dedicated to TCP connection
errors, which was the original purpose of this flag.
This patch could be backported to 3.1 to ease future backports.
(cherry picked from commit
514a912a4d15c834a034be8ea7675c3ae0822080)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 09:53:20 +0000 (10:53 +0100)]
MINOR: mux-spop: Report EOI on the SE when an ACK is received for a stream
The spop stream now reports the end of input when the ACK is transferred to
the SPOE applet. To do so, the flag SPOP_SF_ACK_RCVD was added. It is set on
the SPOP stream when its ACK is received by the SPOP connection.
In addition, when SPOP stream flags are propagated to the SE, the error is
now reported if the end of input was not reached, instead of testing the
connection error code, which is more accurate.
This patch should be backported to 3.1.
(cherry picked from commit
d16c534511ca9e75437839bf654d45faf2992e0e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 09:46:28 +0000 (10:46 +0100)]
MINOR: flt-spoe: Report end of input immediately after applet init
The SPOE applet forwards the message that must be sent to the agent during
its init stage, i.e. just after it is created. Once this is done, the end of
input must be reported because no more data will be forwarded. However, it
was reported only after receiving the ACK response. This is harmless, but
there is no reason to delay the EOI. It is now fixed.
This patch must be backported to 3.1.
(cherry picked from commit
f7e571859629b40defb37ce3d4331a571ba619ec)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 17:05:33 +0000 (18:05 +0100)]
BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
The previous fix ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed an issue with the way the SPOE applet was
reporting the end of stream, leading to the applet never being shut down.
In fact, there are two bugs in one. The first one is about the applet
shutdown. Since the fix above, the applet is no longer closed. Before, it
was closed because it was reported in error. But now, it is just delayed
because the applet and the SPOP stream are declared to support half-closed
connections. So the applet is only closed when the SPOP connection is
closed. To fix this bug, both sides now state that half-closed connections
are not supported.
The second bug is about the way the end of stream is reported. It is
reported when the ACK response is received. But it is too early, because the
parent stream must process the response first. So now, we take care to have
processed the ACK from the parent applet before reporting an end of stream.
This patch must be backported with the commit above to 3.1.
(cherry picked from commit
38aac2c7bcfcce040eabd5e6df9b5c2f783a821a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 08:20:36 +0000 (09:20 +0100)]
BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR
The way appctx EOI/EOS/ERROR flags were reported for applets using the new
API was to state that the applet had more data to deliver. But this was not
correct and, for APPCTX_FL_EOS, it led to reporting an error on the SE
because it is not expected: having more data to deliver together with an end
of stream is an impossible situation.
This was added as a fix by commit
b8ca114031 ("BUG/MEDIUM: applet: State
appctx have more data if its EOI/EOS/ERROR flag is set"), mainly to make the
SPOE applet work.
When an applet sets one of these flags, it really means it has no more data
to deliver. So we must not try to trigger a new receive to handle these
flags. Instead, we must handle them directly in the task_process_applet()
function, and only if the corresponding SE flags were not already set.
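A minimal sketch of the intended propagation (flag and helper names follow
HAProxy conventions; the exact code is an assumption):

    /* sketch only: propagate applet flags to the SE without faking "more
     * data", and only when the SE flag is not already set */
    if ((appctx->flags & APPCTX_FL_EOI) && !se_fl_test(appctx->sedesc, SE_FL_EOI))
        se_fl_set(appctx->sedesc, SE_FL_EOI);
    if ((appctx->flags & APPCTX_FL_EOS) && !se_fl_test(appctx->sedesc, SE_FL_EOS))
        se_fl_set(appctx->sedesc, SE_FL_EOS);
    if ((appctx->flags & APPCTX_FL_ERROR) && !se_fl_test(appctx->sedesc, SE_FL_ERROR))
        se_fl_set(appctx->sedesc, SE_FL_ERROR);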
This patch must be backported to 3.1.
(cherry picked from commit
7214dcd52d52ab7fe6ef391aa8039fb3ec40a040)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 4 Feb 2025 09:42:19 +0000 (10:42 +0100)]
BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O handler
The SPOE applet is using the new applet API. Thus, end of input, end of
stream and errors must be reported using the applet flags, not the SE flags.
This was not the case, so let's fix it.
This bug seems harmless for now.
This patch must be backported to 3.1.
(cherry picked from commit
db504fbdbee0afde435910510507212a5a751cba)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 3 Feb 2025 17:36:17 +0000 (18:36 +0100)]
BUG/MINOR: http-check: Don't pretend a C-L header is set before adding it
When a GET/HEAD/OPTIONS/DELETE healthcheck request was formatted, we claimed
there was a "content-length" header set even when there was no payload,
leading us to actually send a "content-length: 0" header to the server. It
was unexpected and could be rejected by servers.
When a healthcheck request is sent, we must only state that a
"content-length" header is present when one is explicitly added.
This patch should fix the issue #2851. It must be backported as far as 2.9.
(cherry picked from commit
fad68cb16d7f99acd2b327ff2f8a4d9ab88a68d8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 3 Feb 2025 14:31:57 +0000 (15:31 +0100)]
BUG/MINOR: tcp-rules: Don't forward close during tcp-response content rules eval
When the tcp-response content ruleset evaluation is delayed because of an
ACL condition, the close forwarding on the client side is not explicitly
blocked. So it is possible to close the client side before the end of the
response evaluation.
To fix the issue, this is now done in all cases where some data are
missing. Concretely, channel_dont_close() is called in the "missing_data"
goto label, as sketched below.
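For illustration, a hedged sketch of where the call lands (the channel
variable name is an assumption):

     missing_data:
        /* sketch only: keep the close from being forwarded while the
         * ruleset is still waiting for more response data */
        channel_dont_close(rep);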
Note it is only a theoretical bug (or a pending bug). It is not possible to
trigger it for now because an ACL cannot wait for more data when a close was
received. But the code remains a bit weak, so it is safer this way. It is
especially mandatory for the "force yield" option that should be added soon.
This patch could be backported to all stable versions.
(cherry picked from commit
04bbfa4354800b893f03294bd02475a16d90ec85)
[wt: trivial ctx adjustment]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 27 Jan 2025 14:18:14 +0000 (15:18 +0100)]
BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
A read0 event could be ignored by the FCGI multiplexer if it was blocked on
a partial record. Instead of handling the event, it remained blocked,
waiting for the end of the record.
To fix the issue, the same solution as in the H2 multiplexer is used. Two
flags are introduced. The first one, FCGI_CF_END_REACHED, is used to
acknowledge a read0. This flag is set when a read0 was received AND the FCGI
multiplexer must handle it. The second one, FCGI_CF_DEM_SHORT_READ, is set
when the demux is interrupted on a partial record. A short read combined
with a read0 leads to setting the FCGI_CF_END_REACHED flag.
With these changes, the FCGI mux should be able to properly handle a read0
on partial records.
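A hedged sketch of that combination (the two flags come from the message;
the surrounding test is an assumption):

    /* sketch only: acknowledge a pending read0 even when the demuxer is
     * stuck on a partial record, so the event is not lost */
    if ((fconn->flags & FCGI_CF_DEM_SHORT_READ) &&
        conn_xprt_read0_pending(fconn->conn))
        fconn->flags |= FCGI_CF_END_REACHED;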
This patch should be backported to all stable versions after a period of
observation.
(cherry picked from commit
5f927f603aae4614f82c73ff6c2e3324ecfdf506)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 31 Jan 2025 14:23:47 +0000 (15:23 +0100)]
DOC: htx: clarify <mark> parameter for htx_xfer_blks()
Clarify the fact that the first <mark> block is transferred before
stopping when using htx_xfer_blks()
(cherry picked from commit
c17e02923256fef61f74b986045b7eb8d1e126c7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 31 Jan 2025 13:41:28 +0000 (14:41 +0100)]
BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
When transferring blocks from a src to a dst htx representation,
htx_xfer_blks() decreases the size of each block removed from the <count>
value passed as parameter, so it can't transfer more than <count>. This size
must also contain the metadata, represented by a simple
sizeof(struct htx_blk).
However, the code was doing a sizeof(dstblk) instead of a sizeof(*dstblk),
which has the consequence of subtracting only the size of a pointer from
count. Fortunately the htx_blk size is 64 bits, so this does not cause any
problem on 64-bit systems. But on 32-bit architectures, the count value is
not decreased correctly and the function could try to transfer more blocks
than allowed by the count parameter.
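The difference can be illustrated with a minimal sketch (not the actual
function body):

    /* sketch only: dstblk is a 'struct htx_blk *', so the two expressions
     * differ on 32-bit systems (pointer size 4 vs. block size 8) */
    count -= sizeof(dstblk);    /* wrong: subtracts the size of the pointer    */
    count -= sizeof(*dstblk);   /* right: subtracts sizeof(struct htx_blk) = 8 */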
Must be backported in every stable release.
(cherry picked from commit
c6390cdf9ce4e74540544207d6e3cfb31581eaea)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 30 Jan 2025 15:32:35 +0000 (16:32 +0100)]
MEDIUM: epoll: skip reports of stale file descriptors
Now that we can see that some events are reported for older instances
of a file descriptor, let's skip them instead of reporting dangerous
events on them. It might possibly qualify as a bug if it helps fix
strange issues in certain environments, in which case it can make sense
to backport it along with the following recent patches:
DEBUG: fd: add a counter of takeovers of an FD since it was last opened
MINOR: fd: add a generation number to file descriptors
DEBUG: epoll: store and compare the FD's generation count with reported event
(cherry picked from commit
8235a24782e528b9bf8ca9dd69c0a147556dfcb5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 30 Jan 2025 15:28:33 +0000 (16:28 +0100)]
DEBUG: epoll: store and compare the FD's generation count with reported event
There have been some reported cases where races between threads in epoll
were causing wrong reports of close or error events. Since the epoll_event
data is 64 bits, we can store the FD's generation counter in the upper
bits to verify if we're speaking about the same instance of the FD as the
current one or a stale one. If the generation number does not match, then
we classify these into 3 conditions and increment the relevant COUNT_IF()
counters (stale report for closed FD, stale report of harmless event on
reopened FD, stale report of HUP/ERR on reopened FD). Tests have shown that
with heavy concurrency, a very small maxconn (typically 1 per thread),
http-reuse always and a server closing connections first but randomly
(httpterm with /C=2r), such events can happen at a pace of a few per second
for the closed FDs, and a few per minute for the other ones, so there's value
in leaving this accessible for troubleshooting. E.g after a few minutes:
Count Type Location function(): "condition" [comment]
5541 CNT ev_epoll.c:296 _do_poll(): "1" [epoll report of event on a just closed fd (harmless)]
10 CNT ev_epoll.c:294 _do_poll(): "1" [epoll report of event on a closed recycled fd (rare)]
42 CNT ev_epoll.c:289 _do_poll(): "1" [epoll report of HUP on a stale fd reopened on the same thread (suspicious)]
212 CNT ev_epoll.c:279 _do_poll(): "1" [epoll report of HUP/ERR on a stale fd reopened on another thread (harmless)]
1 CNT mux_h1.c:3911 h1_send(): "b_data(&h1c->obuf)" [connection error (send) with pending output data]
This one was obtained with the following setup, which abuses thread
contention by starting 64 threads on two cores:
- config:
global
nbthread 64
stats socket /tmp/sock1 level admin
stats timeout 1h
defaults
timeout client 5s
timeout server 5s
timeout connect 5s
mode http
listen p2
bind :8002
http-reuse always
server s1 127.0.0.1:8000 maxconn 4
- haproxy forcefully started on 2C4T:
$ taskset -c 0,1,4,5 ./haproxy -db -f epoll-dbg.cfg
- httpterm on port 8000, cpus 2,3,6,7 (2C4T)
- h1load with responses larger than a single buffer, and randomly
closing/keeping alive:
$ taskset -c 2,3,6,7 h1load -e -t 4 -c 256 -r 1 0:8002/?s=19k/C=2r
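As an illustration of the generation check described above, a hedged sketch
of how the 64-bit epoll data can carry both the fd and its generation (the
exact packing and field names are assumptions):

    /* sketch only: pack the fd's generation into the upper bits of the
     * 64-bit epoll user data when registering the fd... */
    ev.data.u64 = ((uint64_t)fdtab[fd].generation << 32) | (uint32_t)fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_MOD, fd, &ev);

    /* ...then, inside the epoll_wait() report loop, compare it with the
     * current generation and skip stale reports */
    fd = (uint32_t)epoll_events[i].data.u64;
    if ((epoll_events[i].data.u64 >> 32) != fdtab[fd].generation)
        continue;   /* event belongs to a previous instance of this fd */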
(cherry picked from commit
5012b6c6d9e2278ae247e3e372765be3c11af86c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 30 Jan 2025 15:25:40 +0000 (16:25 +0100)]
MINOR: fd: add a generation number to file descriptors
This patch adds a counter of close() calls on file descriptors in the fdtab.
The goal is to better detect whether reported events concern the current or
a previous file descriptor. For now the counter is only added, and is
shown in "show fd" as "gen". We're reusing unused space at the end of
the struct. If it's needed for something more important later, this
patch can be reverted.
(cherry picked from commit
d155924efe453faa0ea5b5e62e6a3e2bf7ee7358)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 30 Jan 2025 14:59:11 +0000 (15:59 +0100)]
DEBUG: fd: add a counter of takeovers of an FD since it was last opened
That's essentially in order to help with debugging strange cases like
the occasional epoll issues/races, by keeping a counter of how many
times an FD was taken over since it was last inserted. The room is
available, so let's use it. If it's needed later, this patch can easily be
reverted. The counter is also reported in "show fd" as "tkov".
(cherry picked from commit
44ac7a7e731537c9388a141b69cff7074afe2376)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 29 Jan 2025 16:51:12 +0000 (17:51 +0100)]
BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
Late in 3.1 we've added an integrity check to make sure we didn't keep
trash objects allocated before resizing the trash with commit
0bfd36e7b8
("MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()"), but
it turns out that the counter that is being checked includes the number
of objects left in local thread caches. As such it can trigger despite
no object being allocated. This precisely happens when setting
tune.memory.hot-size to a few megabytes because some temporarily used
trash objects will remain in cache.
In order to address this, let's first flush the pool before running
the check. That was previously done by pool_destroy() but the check
had to be inserted before it. So now we first flush the trash pool,
then verify it's no longer used, and finally we can destroy it.
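A hedged sketch of the new ordering (the check expression is a placeholder,
not the real accounting code):

    /* sketch only: flush thread-local caches first so the usage check only
     * sees real allocations, then destroy the pool */
    pool_flush(pool_head_trash);
    BUG_ON(trash_pool_in_use());    /* hypothetical name for the check */
    pool_destroy(pool_head_trash);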
This needs to be backported to 3.1. Thanks to Christian Ruppert for
reporting this bug.
(cherry picked from commit
bd7a688b8b38901037cd2f9856533a788ebdbdf7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 27 Jan 2025 14:41:26 +0000 (15:41 +0100)]
MINOR: epoll: permit to mask certain specific events
A few times in the past we've seen cases where epoll was caught reporting
a wrong event that caused trouble (e.g. spuriously reporting HUP or RDHUP
after a successful connect()). The new tune.epoll.mask-events directive
permits masking events such as ERR, HUP and RDHUP and converting them to IN
events that are processed by the regular receive path. This should help
better diagnose and troubleshoot issues such as this one, as well as rule
out such a cause when similar issues are reported:
https://github.com/haproxy/haproxy/issues/2368
https://www.spinics.net/lists/netdev/msg876470.html
It should be harmless to backport this if necessary.
(cherry picked from commit
7fa70da06db3e69a61e4451a82a4f980b40b4d0c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Thu, 23 Jan 2025 14:24:09 +0000 (15:24 +0100)]
MINOR: quic: adapt credit based pacing to BBR
Credit-based pacing has been further refined to be able to dynamically
calculate the burst size based on congestion parameters. However, the BBR
algorithm already provides a pacing rate and a burst size (labelled as
send_quantum) for 1ms of emission.
Adapt quic_pacing_reload() to use the BBR values to compute the pacing
credit. This is done via the pacing_burst callback, which is now only
defined for BBR. For the other algorithms, the burst size over 1ms is
determined from the congestion window size and the RTT.
This should be backported up to 3.1.
(cherry picked from commit
42bac9339cc3e3eed99ae1664a4f3633183ab640)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 23 Jan 2025 16:06:04 +0000 (17:06 +0100)]
MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
The pacing burst size is now dynamic. As such, the configuration value has
been removed and the related fields in the bind_conf and quic_cc_path
structures can be safely removed.
This should be backported up to 3.1.
(cherry picked from commit
7896edccdcdcdde7d7d196205b83e4b3b36f3852)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Wed, 22 Jan 2025 16:26:13 +0000 (17:26 +0100)]
MEDIUM: quic: use dynamic credit for pacing
Major improvements have been introduced in pacing recently. Most notably,
QMUX schedules emission with a millisecond resolution, which allows using
passive wait and is much more CPU friendly.
However, an issue remains with the pacing max credit. Unless BBR is used, it
is fixed to the value configured on the quic-cc-algo bind statement. This is
not practical: if too low, it may drastically reduce performance due to the
1ms sleep resolution; if too high, some clients will suffer from too much
packet loss.
This commit fixes the issue by implementing a dynamic maximum credit value
based on the network conditions specific to each client. The calculation
picks a maximum value which should allow the current QMUX tasklet context to
emit enough data to cover the delay until the next tasklet invocation. As
such, avg_loop_us is used to detect the process load. If it is too small,
1.5ms is used as a minimal value, to cover the extra delay incurred by the
system for a default 1ms sleep.
This should be backported up to 3.1.
(cherry picked from commit
cb91ccd8a8b548d85bf5155b1a8ce759f1480a4e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Wed, 22 Jan 2025 16:31:10 +0000 (17:31 +0100)]
MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
The pacing algorithm has been revamped in the previous commit to implement a
credit-based solution. This is a far more adaptive solution, in particular
because it allows catching up when the pause between pacing emissions was
longer than expected.
This allows QMUX to remove the active loop based on tasklet wake-ups.
Instead, a new task is used when emission should be paced. The main
advantage is that CPU usage is drastically reduced.
The new pacing task timer is reset each time qcc_io_send() is invoked. The
timer will be set only if the pacing engine reports that emission must be
interrupted. In this case the timer is set via qcc_wakeup_pacing() to the
delay reported by the congestion algorithm, or 1ms if the delay is too
short. At the end of qcc_io_cb(), the pacing task is queued if the timer has
been set.
Pacing task execution is simple enough: it immediately wakes up the QCC I/O
handler.
Note that to get decent performance, a large enough burst must be defined in
the quic-cc-algo configuration. However, this value is common to all of a
listener's clients, which may cause too much loss under some network
conditions. This will be addressed in a future patch.
This should be backported up to 3.1.
(cherry picked from commit
8098be1fdc71fd548ade4aa0530918bf7cc2a8f4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Fri, 10 Jan 2025 16:18:54 +0000 (17:18 +0100)]
MEDIUM: quic: implement credit based pacing
Implement a new method for QUIC pacing emission based on credit. This
represents the number of packets which can be emitted in a single burst.
After emission, the number of emitted packets is decremented from the
credit. Several emissions can be conducted in the same sequence until the
credit is completely consumed.
When a new emission sequence is initiated (i.e. under a new QMUX tasklet
invocation), the credit is refilled according to the delay which occurred
between the last and the current emission context.
The main advantage of this new mechanism is that it allows conducting
several emissions in the same task context without having to wait between
each invocation. A wait is only forced if pacing is expired, which is now
equivalent to having a null credit.
Furthermore, if the delay between two emission sequences is smaller than
expected, the credit is only partially refilled. This allows restarting
emission without having to wait for the whole credit to be available.
On the implementation side, a new field <credit> is available in the
quic_pacer structure. It is automatically decremented on
quic_pacing_sent_done() invocation. Also, a new function
quic_pacing_reload() must be used by QUIC MUX when a new emission sequence
is initiated to refill the credit. The <next> field from quic_pacer has been
removed.
For the moment, the credit is based on the burst configured via the
quic-cc-algo keyword, or directly reported by BBR.
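A hedged, self-contained sketch of the credit mechanism as described (only
the <credit> field and the function roles come from the message; the
structure and the refill formula are assumptions):

    #include <stdint.h>

    /* sketch only: every field except <credit> is an assumption */
    struct pacer_sketch {
        uint64_t credit;       /* packets allowed in the current burst   */
        uint64_t max_credit;   /* burst size (quic-cc-algo or BBR)       */
        uint64_t inter_ns;     /* target delay between two packets       */
        uint64_t last_ns;      /* end of the previous emission sequence  */
    };

    /* refill on a new emission sequence, in the spirit of quic_pacing_reload() */
    static void pacing_reload_sketch(struct pacer_sketch *p, uint64_t now_ns)
    {
        uint64_t refill = (now_ns - p->last_ns) / p->inter_ns;

        p->credit = (p->credit + refill > p->max_credit) ? p->max_credit
                                                         : p->credit + refill;
        p->last_ns = now_ns;
    }

    /* spend the credit after emission, as quic_pacing_sent_done() does */
    static void pacing_sent_done_sketch(struct pacer_sketch *p, uint64_t sent)
    {
        p->credit = (sent < p->credit) ? p->credit - sent : 0;
    }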
This should be backported up to 3.1.
(cherry picked from commit
4489a61585283803726b10c5b081c26f24b7d8dd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Fri, 10 Jan 2025 16:20:55 +0000 (17:20 +0100)]
MINOR: mux-quic: increment pacing retry counter on expired
A field <paced_sent_ctr> in the quic_pacer structure is used to report the
number of occurrences where emission has been interrupted due to pacing.
However, it was not incremented when the QUIC MUX had to immediately pause
emission because pacing had not yet expired.
Fix this by incrementing <paced_sent_ctr> in qcc_io_send() prior to
emission if pacing is expired. Note that the incrementation is only done
once if the tasklet is then repeatedly woken up until the timer expires.
This should be backported up to 3.1.
(cherry picked from commit
9d8589f0dee050352ae874be6a4fdeecbb428992)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 23 Jan 2025 09:28:57 +0000 (10:28 +0100)]
MINOR: quic: rename pacing_rate cb to pacing_inter
Rename one of the congestion algorithms' pacing callbacks from pacing_rate
to pacing_inter. This better reflects that this function returns a delay
(in nanoseconds) which should be applied between each packet emission to
fill the congestion window with a perfectly smoothed emission.
This should be backported up to 3.1.
(cherry picked from commit
7c0820892f47642cfa27e49aa1218dfaf640e3b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Tue, 24 Dec 2024 15:28:35 +0000 (16:28 +0100)]
BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table types
Some actions such as "sc0_get_gpc0" (using smp_fetch_sc_stkctr()
internally) can take an optional table name as parameter to perform the
lookup on a different table from the tracked one but using the key from
the tracked entry. It is done by leveraging the stktable_lookup() function
which was originally meant to perform intra-table lookups.
Calling sc0_get_gpc0() with a different table name will result in
stktable_lookup() being called to perform the lookup using a stktsess from
a different table. While this is theoretically fine, it comes with a
pitfall: both tables (the one from which the stktsess originates and the
actual target table) should rely on the exact same key type and length.
Failure to do so actually results in undefined behavior, because the key
type and/or length from one table is used to perform the lookup in
another table, while the underlying lookup API expects explicit type and
key length.
For instance, consider the below example:
peers testpeers
bind 127.0.0.1:10001
server localhost
table test type binary len 1 size 100k expire 1h store gpc0
table test2 type string size 100k expire 1h store gpc0
listen test_px
mode http
bind 0.0.0.0:8080
http-request track-sc0 bin(AA) table testpeers/test
http-request track-sc1 str(ok) table testpeers/test2
log-format "%[sc0_get_gpc0(testpeers/test2)]"
log stdout format raw local0
server s1 git.haproxy.org:80
Performing a curl request to localhost:8080 will cause uninitialized reads
because the string "ok" from the test2 table will be compared as a string
against the "AA" binary sample, which is not NULL terminated:
==2450742== Conditional jump or move depends on uninitialised value(s)
==2450742== at 0x484F238: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2450742== by 0x27BCE6: stktable_lookup (stick_table.c:539)
==2450742== by 0x281470: smp_fetch_sc_stkctr (stick_table.c:3580)
==2450742== by 0x283083: smp_fetch_sc_get_gpc0 (stick_table.c:3788)
==2450742== by 0x2A805C: sample_process (sample.c:1376)
So let's prevent that by adding some comments in the stktable_set_entry()
function description, and by adding a check in smp_fetch_sc_stkctr() to
ensure both the source stksess and the target table share the same key
properties.
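A hedged sketch of the kind of guard described (variable and member names
are assumptions):

    /* sketch only: refuse the cross-table lookup when the source entry's
     * table and the target table do not share the same key definition */
    if (src_table->type != target_table->type ||
        src_table->key_size != target_table->key_size)
        return 0;   /* incompatible keys: skip the lookup */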
While it could be relevant to backport this in all stable versions, it is
probably safer to wait for some time before doing so, to ensure that no
existing configs rely on this ambiguity because the fact that the target
table and source stksess entry need to share the same key type and length
is not explicitly documented.
(cherry picked from commit
5bbdd14f56a2a672e8fef4449e4143b0f0642812)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 17 Feb 2025 17:48:17 +0000 (18:48 +0100)]
BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
On shut, truncated HTX messages were not properly handled by the H2
multiplexer. Depending on how data were emitted, a chunked HTX message
without the 0-CRLF could be considered as full, and an empty DATA frame with
the ES flag set could be emitted instead of a RST_STREAM(CANCEL) frame.
In the H2 multiplexer, when a shut is performed, an HTX message is
considered as truncated if more HTX data are still expected. It is based on
the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream.
However, this flag is set or unset depending on the HTX extra field
value. This field is used to state how much data must still be
transferred, based on the announced data length. For a message with a
content-length, this assumption is valid. But for a chunked message, it is
not true: only the length of the current chunk is announced. So we cannot
rely on this field in that case to know if a message is full or not.
Instead, we must rely on the HTX start-line flags to know if more HTX data
are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag
is set on the HTX start-line), it means that more data are always expected,
until the end of message is reached (the HTX_FL_EOM flag is set on the HTX
message). This is true for bodyless messages because the end of message is
reported with the end of headers. This is also true for tunneled messages
because the end of message is received before switching the H2 stream to
tunnel mode.
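A hedged sketch of the resulting test (the flag names come from the message;
the exact condition used in the mux is an assumption):

    /* sketch only: when the transfer length is known, more data are
     * expected until EOM is seen; otherwise a shut means truncation */
    int more_expected = (sl->flags & HTX_SL_F_XFER_LEN) &&
                        !(htx->flags & HTX_FL_EOM);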
This patch must be backported as far as 2.8.
(cherry picked from commit
b70921f2c19a0612a4f1e31ef16cf8c8b216fdf9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 18 Feb 2025 06:31:51 +0000 (07:31 +0100)]
REGTESTS: Fix truncated.vtc to send 0-CRLF
When a chunked message is sent, the 0-CRLF must be explicitly sent. It has
been missing since the beginning. Just add it.
(cherry picked from commit
b93e419750aa8921e76e1e58c57a9bf8eefea969)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Mon, 17 Feb 2025 09:54:41 +0000 (10:54 +0100)]
BUG/MINOR: mux-quic: prevent crash after MUX init failure
qmux_init() may fail for several reasons. In this case, connection
resources are freed and a CONNECTION_CLOSE will be emitted via the
underlying quic_conn instance.
On qmux_init() failure, qcc_release() is used to clean up resources, but the
QCC <conn> member is first reset to NULL, as the connection release must be
delayed. Some cleanup operations are thus skipped, one of them being the
resetting of the connection's <ctx> member to NULL. This may cause a crash
as <ctx> is a dangling pointer after the QCC release. One possible
reproducer is to activate QMUX traces, which will cause a segfault on the
qmux_init() error leave trace.
To fix this, simply reset <ctx> to NULL manually on qmux_init() failure.
This must be backported up to 3.0.
(cherry picked from commit
2715dbe9d065d8700a8fba6e2605a451cfbb72b8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Mon, 17 Feb 2025 16:15:49 +0000 (17:15 +0100)]
BUG/MINOR: quic: prevent crash on conn access after MUX init failure
Initially, QUIC-MUX was responsible for resetting the quic_conn <conn>
member to NULL when the MUX was released. This was performed via
qcc_release().
However, qcc_release() is also used on qmux_init() failure. In this case,
the connection must be freed via its session, so the QCC <conn> member is
reset to NULL prior to qcc_release(), which prevents the quic_conn <conn>
member from also being reset. As the connection is freed soon after,
quic_conn <conn> is a dangling pointer, which may cause crashes.
This bug should be very rare: first, it implies that the QUIC-MUX
initialization has failed (for example due to a memory allocation error).
Also, the <conn> member is rarely used by the quic_conn instance. In fact,
the only reproducible crash was obtained with QUIC traces activated, as in
this case the connection is accessed via quic_conn under the
__trace_enabled() function.
To fix this, detach the connection from the quic_conn via the XPRT layer
instead of the MUX. More precisely, this is performed via quic_close().
This should ensure that it will always be done, either on a normal
connection closure, or after special conditions such as a MUX init failure.
This should be backported up to 2.6.
(cherry picked from commit
2cdc4695cb82fce46d67cef17300ec7cf978906e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 17 Feb 2025 15:37:47 +0000 (16:37 +0100)]
BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
When a "Location" header was found in a FCGI response, the status code was
forced to 302. But it should only be performed if no status code was set
first.
So now, we take care to not override an already defined status code when the
"Location" header is found.
This patch should fix the issue #2865. It must backported to all stable
versions.
(cherry picked from commit
ca79ed5eefaa65fc82e1a8c1ec4308eaaadaebd1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 17 Feb 2025 14:54:49 +0000 (15:54 +0100)]
BUG/MEDIUM: filters: Handle filters registered on data with no payload callback
An HTTP filter with no http_payload callback function may be registered on
data. In that case, the filter is obviously not called when some data are
received, but it remains important to update its internal state to keep it
synchronized with the stream, especially its offset value. Otherwise, a
wrong calculation of the global offset may be performed in flt_http_end(),
leading to an integer overflow when data are moved from input to output.
This overflow triggers a BUG_ON() in c_adv().
The same is true for TCP filters with no tcp_payload callback function.
This patch must be backported to all stable versions.
(cherry picked from commit
34542d5ec29c89ec45b63107f9330f185f0bfd40)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 17 Feb 2025 14:16:15 +0000 (15:16 +0100)]
BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker
On reload, the new worker requests the bound FDs from the old one. The old
worker sends them in messages of at most 252 FDs. Each message is
acknowledged by the new worker. All messages sent or received by the old
worker are handled manually via sendmsg/recv syscalls. So the old worker
must be sure to consume all the ACK replies. However, the last one was never
consumed, so it was considered as a command by the CLI applet. This issue
had been hidden until recently, but it was the root cause of issue #2862.
Note this last ack is also the first one when there are fewer than 252 FDs
to transfer.
This patch must be backported to all stable versions.
(cherry picked from commit
49b7bcf583261efedabad5ba15c4026f2e713c61)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 17 Feb 2025 13:49:34 +0000 (14:49 +0100)]
BUG/MEDIUM: cli: Be sure to drop all input data in END state
Commit 7214dcd ("BUG/MEDIUM: applet: Don't pretend to have more data to
handle EOI/EOS/ERROR") revealed a bug with the CLI applet. Pending input
data when the applet is in CLI_ST_END state were never consumed or dropped,
leading to a wakeup loop.
The CLI applet implements its own snd_buf callback function. It is important
that it consumes all pending input data. Otherwise, the applet is woken up
in a loop until it empties the request buffer. Another way to fix the issue
would be to report an error, but in that case it seems reasonable to drop
these data.
The issue can be observed on reload, in master/worker mode, because of the
issue with the last ACK message, which was never consumed by the
_getsocks() command.
This patch should fix the issue #2862. It must be backported to 3.1 with the
commit above.
(cherry picked from commit
972ce87676f40daa76fff61539fb5b0a66b56f4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 12 Feb 2025 16:13:03 +0000 (17:13 +0100)]
BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
1d3c8223 ("MINOR: ssl: allow to change the server signature algorithm")
implemented the sigalgs keyword in the crt-list but never added the dump of
the keyword over the CLI.
Must be backported as far as 2.8.
(cherry picked from commit
5a7cbb8d8115770c4776836fdf727b26d490d2ad)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 12 Feb 2025 16:09:21 +0000 (17:09 +0100)]
BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
b6ae2aafde43 ("MINOR: ssl: allow to change the signature algorithm for
client authentication") implemented the client-sigalgs keyword in the
crt-list but never added the dump of the keyword over the CLI.
Must be backported as far as 2.8.
(cherry picked from commit
037d2e5498917d2323a9ad748b9d97aaa688f351)
Signed-off-by: Willy Tarreau <w@1wt.eu>