
MINOR: sample: add the "when" converter to condition some expressions

Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.

Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.

A suggested example would be:

# log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
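
The pass-through behaviour described above can be modelled roughly as
follows. This is only an illustrative Python sketch, not HAProxy code: the
condition names are taken from this message, while the `active` set (the
conditions that currently hold) and the function signature are assumptions
made for the example.

```python
# Illustrative model only: the condition names come from the commit message;
# how HAProxy evaluates each of them internally is not shown here.
CONDITIONS = {"stopping", "error", "normal", "toapplet", "forwarded", "processed"}


def when(sample, condition, active):
    """Return `sample` if the (possibly negated) condition holds, else None.

    `active` is a hypothetical set of the condition names that are true.
    """
    negate = condition.startswith("!")
    name = condition[1:] if negate else condition
    if name not in CONDITIONS:
        raise ValueError(f"unknown condition: {name}")
    holds = name in active
    if negate:
        holds = not holds
    return sample if holds else None
```

With this model, when("debug info", "!normal", {"error"}) lets the sample
through, while the same call with {"normal"} yields nothing. Chaining is
just nesting two calls.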

(cherry picked from commit b74fb1325e60a2573f1ed8a460129bc6c0c9e53f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-h1: Add support of the debug string for logs

Now it is possible to have info about the front and back H1 multiplexers. For instance:

<134>Oct 22 18:10:46 haproxy[3841864]: 127.0.0.1:44280 [22/Oct/2024:18:10:43.265] front-http back-http/www 0/0/-1/-1/3082 503 217 - - SC-- 1/1/0/0/3 0/0 "GET / HTTP/1.1" fs=< h1s=0x13b6f10 h1s.flg=0x14010 .sd.flg=0x50404601 .req.state=MSG_DONE .res.state=MSG_DONE .meth=GET status=503 .sd.flg=0x50404601 .sc.flg=0x00034482 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x0 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x1337d10 .exp=<NEVER> conn.flg=0x80000300> bs=< h1s=0x13bb400 h1s.flg=0x100010 .sd.flg=0x10400001 .req.state=MSG_RQBEFORE .res.state=MSG_RPBEFORE .meth=UNKNOWN status=0 .sd.flg=0x10400001 .sc.flg=0x0003c007 .sc.app=0x11e4c30 .subs=(nil) h1c.flg=0x80000000 .sub=0 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 .task=0x12ba610 .exp=<NEVER> conn.flg=0x5c0300>

To have this log message, the log-format must be set to:

log-format "$HAPROXY_HTTP_LOG_FMT fs=<%[fs.debug_str]> bs=<%[bs.debug_str]>"

(cherry picked from commit ce314cfb395a407484e1b37d76e8af8007c1ff4c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: measure QCS lifetime and its blocking state

Reuse newly defined tot_time structure to measure various values related
to a QCS lifetime.

First, a timer is used to account for the total QCS lifetime. Then, two
other timers are used to account for the total time during which Tx from
the stream layer to the MUX is blocked, either on lack of buffer or due to
flow-control.

These three timers are reported in qmux_dump_qcs_info(). Thus, they are
available in traces and for QUIC MUX debug string sample.

(cherry picked from commit 9f829ea3f34fd15bee50ca91fac8526077f77142)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: time: define tot_time structure

Define a new utility type tot_time. Its purpose is to be able to account
for elapsed time across multiple periods. Functions are defined to easily
start and stop measures, and return the current value.
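
As an illustration of the idea, here is a minimal Python sketch of such an
accumulator. It is not the HAProxy implementation: the class name, method
names and the use of plain integer timestamps are assumptions made for the
example.

```python
class TotTime:
    """Accumulates elapsed time over several start/stop periods (model only)."""

    def __init__(self):
        self.total = 0        # time accumulated over finished periods
        self.started = None   # timestamp of the running period, if any

    def start(self, now):
        # Starting an already running measure is a no-op in this sketch.
        if self.started is None:
            self.started = now

    def stop(self, now):
        if self.started is not None:
            self.total += now - self.started
            self.started = None

    def read(self, now):
        # Current value: finished periods plus the running one, if any.
        running = now - self.started if self.started is not None else 0
        return self.total + running
```

A measure started at 100, stopped at 150, then restarted at 300 reads 70 at
time 320: 50 from the first period plus 20 from the running one.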

(cherry picked from commit a6e2523ca1f3dcc90b050d75af62bb867a2acc07)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: dump quic_conn debug string for logs

Define a new xprt_ops callback named dump_info. This can be used to
extend MUX debug string with infos from the lower layer.

Implement dump_info for the QUIC stack. For now, only minimal info is
reported: bytes in flight and size of the sending window. This should
make it possible to detect whether the congestion controller is fine.
This info is reported via the QUIC MUX debug string sample.

(cherry picked from commit 663416b4ef2fafcffa61c04aa09056853d9674f7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: implement debug string for logs

Implement MUX_SCTL_DBG_STR for QUIC MUX. This returns info for the
current QCS and QCC instances, reusing qmux_dump_qc{c,s}_info functions
already used for traces, and the connection flags.

This stream operation is useful for debug string sample support.

(cherry picked from commit 630fa53c51ddfd3dbdf5e9d2495ca3fb45ff050e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: define dump functions for QCC and QCS

Extract trace code to dump QCC and QCS instances into dedicated
functions named qmux_dump_qc{c,s}_info(). This will allow to easily
print QCC/QCS infos outside of traces.

(cherry picked from commit eb4dfa3b36ae949d4a55782417af08f4689096f4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-h2: implement the debug string for logs

This makes it possible to get the following for a front and a back:

<134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:64860 [30/Jul/2024:19:32:53.732] test2 test2/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/2.0"  h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1e03fb0(.flg=0x00034482 .app=0x1e04020) .sd=0x1e03f30(.flg=0x50405601) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x100e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x80000300
<134>Jul 30 19:32:53 haproxy[24405]: 127.0.0.1:65246 [30/Jul/2024:19:32:53.732] test1 test1/s1 0/0/0/0/0 200 130 - - ---- 2/1/0/0/0 0/0 "GET /blah HTTP/1.1"  h2s.id=1 .st=CLO .flg=0x7003 .rxbuf=0@(nil)+0/0 .sc=0x1dfc7b0(.flg=0x0006d01b .app=0x1c65fe0) .sd=0x1dfc820(.flg=0x1040ca01) .subs=(nil) h2c.st0=FRH .err=0 .maxid=1 .lastid=-1 .flg=0x108e00 .nbst=0 .nbsc=1, .glitches=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=1 .orph_cnt=0 .sub=1 .dsi=1 .dbuf=0@(nil)+0/0 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] .task=(nil) conn.flg=0x000300

Just with this in the front and back proxies respectively:
  log-format "$HAPROXY_HTTP_LOG_FMT %[bs.debug_str(15)]"
  log-format "$HAPROXY_HTTP_LOG_FMT %[fs.debug_str(15)]"

For now the mux only implements muxs, muxc, conn. Xprt is ignored.

(cherry picked from commit 490cb16d3af03e735b25253a1ac3a641b51760b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: stconn: add a new pair of sf functions {bs,fs}.debug_str

These are passed to the underlying mux to retrieve debug information
at the mux level (stream/connection) as a string that's meant to be
added to logs.

The API is quite complex just because we can't pass any info to the
bottom function. So we construct a union and pass the argument as an
int, and expect the callee to fill that with its buffer in return.

Most likely the mux->ctl and ->sctl API should be reworked before
the release to simplify this.

The functions take an optional argument that is a bit mask of the
layers to dump:
  muxs=1
  muxc=2
  xprt=4
  conn=8
  sock=16

The default (0) logs everything available.
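
To make the bit mask concrete, the following Python sketch decodes an
argument into layer names using the values listed above. The function name
is hypothetical and the code only models the mask arithmetic, not what each
mux actually emits.

```python
# Bit values as listed in the commit message.
LAYERS = {"muxs": 1, "muxc": 2, "xprt": 4, "conn": 8, "sock": 16}


def selected_layers(mask=0):
    """Return the layer names selected by `mask`; 0 means all layers."""
    if mask == 0:
        return list(LAYERS)
    return [name for name, bit in LAYERS.items() if mask & bit]
```

For example, an argument of 15 (1+2+4+8) selects muxs, muxc, xprt and conn
but leaves sock out.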

(cherry picked from commit 921e04bf87a94b410287a85d2397a0502520348c)
[cf: The doc was adapted to move debug_str samples in the right section]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: h1-htx: Properly handle bodyless messages

During h1 parsing, there are some postparsing checks to detect bodyless
messages and switch the parsing in DONE state. However, one case was not
properly handled: responses to HEAD requests with a "transfer-encoding"
header. The response parser remained blocked waiting for the response body.

To fix the issue, the postparsing was slightly modified. Instead of trying to
handle bodyless messages in a common way between the request and the
response, it is now performed in the dedicated postparsing functions. It is
easier to enumerate all cases, especially because there is already a test
for responses to HEAD requests.
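
For reference, the generic HTTP rule behind this case can be sketched in
Python as below. This is a simplified model of the RFC 9110 rules for
responses that never carry a body, not the h1-htx postparsing code; the
function name is hypothetical and the CONNECT case is deliberately left out.

```python
def response_has_body(request_method, status):
    """Tell whether a response may carry a body (simplified RFC 9110 rules)."""
    if request_method == "HEAD":
        # Bodyless even when "transfer-encoding" or "content-length" is set.
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True
```

The first test is the one relevant here: the request method alone decides,
whatever framing headers the response announces.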

This patch should fix the issue #2836. It must be backported as far as 2.9.

(cherry picked from commit b9cc361b35e66c1f2d26a9b703f8759f70cbc03c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit becc475a16dea31037617baa91d1a5de01ccdd1e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: promex/resolvers: Don't dump metrics if no nameserver is defined

A 'resolvers' section may be defined without any nameserver. In that case,
we must take care to not dump corresponding Prometheus metrics. However
there is an issue that could lead to a crash or a strange infinite loop
because we are looping on an empty list and, at some point, we are
dereferencing an invalid pointer.

There is an issue because the loop on the nameservers of a resolvers section
is performed via callback functions and not the standard list_for_each_entry
macro. So we must take care to properly detect end of the list and empty
lists for nameservers. But the fix is not so simple because resolvers
sections with and without nameservers may be mixed.

To fix the issue, in rslv_promex_start_ts() and rslv_promex_next_ts(), when the
next resolvers section must be evaluated, a loop is now used to properly skip
empty sections.
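
The skipping logic can be illustrated with the following Python sketch. It
is only a model under stated assumptions: sections are represented as
(name, nameservers) tuples and the helper names are hypothetical, whereas
the real code walks HAProxy lists through promex callbacks.

```python
def iter_nameservers(sections):
    """Yield (section_name, nameserver) pairs, skipping empty sections.

    `sections` is a hypothetical list of (name, [nameservers]) tuples.
    """
    def next_from(sec_idx, ns_idx):
        # A loop, rather than a single step, so that several consecutive
        # empty sections are all skipped.
        while sec_idx < len(sections):
            if ns_idx < len(sections[sec_idx][1]):
                return sec_idx, ns_idx
            sec_idx, ns_idx = sec_idx + 1, 0
        return None

    pos = next_from(0, 0)                      # plays the "start" role
    while pos is not None:
        sec_idx, ns_idx = pos
        name, servers = sections[sec_idx]
        yield name, servers[ns_idx]
        pos = next_from(sec_idx, ns_idx + 1)   # plays the "next" role
```

Sections with and without nameservers may be freely mixed; a configuration
made only of empty sections simply yields nothing.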

This patch is related to #2831. Not sure it fixes it. It must be backported
as far as 3.0.

(cherry picked from commit 892eb2bb2ceba2c0e80d51eed69dfc9494a56988)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 00f01f5e636351e1d91dbf62c80e1d0fea34172f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: config: Alert about extra arguments for errorfile and errorloc

errorfile and errorloc directives expect exactly two arguments. But extra
arguments were just ignored while an error should be emitted. It is now
fixed.
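
The check amounts to counting the words that follow the keyword. A small
Python sketch, where the function name, the example path and the error
texts are all hypothetical and do not reproduce HAProxy's actual messages:

```python
def check_error_directive(words):
    """Return an error string, or None if the argument count is acceptable.

    `words` is the keyword followed by its arguments, for example
    ["errorfile", "503", "/etc/haproxy/errors/503.http"] (hypothetical path).
    """
    keyword, args = words[0], words[1:]
    if len(args) < 2:
        return f"'{keyword}' expects exactly two arguments"
    if len(args) > 2:
        # Previously such trailing words were silently ignored.
        return f"'{keyword}': unexpected extra argument '{args[2]}'"
    return None
```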

This patch could be backported as far as 2.2 if necessary.

(cherry picked from commit f578811c4e8965f243cd16a6f569909a2a07498c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 03d6c648dfc3ab364149aa747c1574c847526193)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add missing "track-sc0" in action keywords matrix

In d54e8f8107 ("DOC: config: reorganize actions into their own section"),
"track-sc0" keyword was properly documented but the keyword was not placed
in the action keywords matrix alongside other track-sc* statements. It
was probably overlooked, so let's fix that.

Could be backported up to 2.9 with d54e8f8107.

(cherry picked from commit f94c63021b9d46a5a77540b6b95cf6e60dcbf5e0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 29b1710c9a86fb41f1c322f68a90d199298696f1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stats: fix segfault caused by uninitialized value in "show schema json"

Since b3d5708 ("MINOR: stats: remove implicit static trash_chunk usage")
a segfault can occur when issuing "show schema json" on the stats socket.

Indeed, now the dumping functions don't rely on trash_chunk anymore, but
instead they rely on the appctx->chunk buffer. However, unlike other
stats dumping commands, the "show schema json" only has an io handler,
and no parse function. With other commands, the parse function is
responsible for pre-setting some data, including applet ctx reservation.

Thus due to "show schema json" lacking a parse function, the applet ctx is
used uninitialized, which is obviously a bug.

To fix the issue we simply add a parse function for "show schema json",
although all it does for now is calling applet_reserve_svcctx() for the
current applet ctx.

This issue was reported by @dsuch in GH #2825. It must be backported up
to 3.0.

(cherry picked from commit ac1f413590932d4eb3b78ca5a536845f162b7866)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit f3a3a8941561e0140ac9c76f7b82942b30ab63a0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queue: Make process_srv_queue return the number of streams

Make process_srv_queue() return the number of streams unqueued, as
pendconn_grab_from_px() did, as that number is used by
srv_update_status() to generate logs.

This should be backported up to 2.6 with
111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8

(cherry picked from commit 5b8899b6ccc7dab3a54a51dcb8ba1512bd0c886c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 70588a16903002709cf3c84255ad8ded73f8e584)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h2/rhttp: fix HTTP2 conn counters on reverse

Dedicated HTTP/2 stats proxy counters are available for current and
total number of HTTP/2 connection on both frontend and backend sides.
Both counters are simply incremented into h2_init().

This causes issues when using reverse HTTP. First, increment is not
performed on the expected side, as it is triggered before
h2_conn_reverse() which switches a connection from frontend to backend
or vice versa. For example on the active reverse side, h2_total_connections
is incremented on the backend only even after connection is reversed and
attached to a listener for the remainder of its lifetime.

h2_open_connections suffers from a similar but arguably worse behavior
as it is also decremented. If increment and decrement operations are not
performed on the same proxy side, which happens for every connection
which has been successfully reversed, it causes an invalid counter
value, possibly with an integer overflow.

To fix this, delay increment operations on reverse HTTP from h2_init()
to h2_conn_reverse(). Both counters are updated only after reverse has
completed, thus using the expected frontend or backend side.

To prevent overflow on h2_open_connections, ensure h2_release()
decrement is not performed if a connection is freed before achieving its
reversal, as in this case it would not have been accounted by H2
counters.

This should be backported up to 2.9.

This should fix github issue #2821.

(cherry picked from commit 8633446337cd322f010ce7b3ca9c257d1ce7848c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit aedc04be5a1fbf72392e68037c4e23b9aa034dc0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add "tune.lua.burst-timeout" to the list of global parameters

"tune.lua.burst-timeout" was properly defined but not listed in the list
of global parameters as it was overlooked in 58e36e5b1 ("MEDIUM: hlua:
introduce tune.lua.burst-timeout")

(cherry picked from commit 67e3270c59f4cb997f2078d395d365e0189326ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 69a0ad6f781a374863af86d87b3abda0a1246773)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: reorder "tune.lua.*" keywords by alphabetical order

Effort was made to properly organize "tune.*" keywords by alphabetical
order, but "tune.lua" keywords didn't follow that rule with care.

Let's fix that.

(cherry picked from commit 985a45d9c77568798b5c0c77cae7ddba219f3b63)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c2c2047b240a33519d57ef53806bd014cd45d6b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: add example for server "track" keyword

As requested on GH #2325, "track" server keyword could benefit from a
simple config example to show how to make use of it.

That's what we're doing in this commit, thanks to GH user @HAkmiller
for the suggestion.

(cherry picked from commit 48545113f4db86ad43a99ba01742a9811410e6dc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit d270bc5864a3cec0b5faf40ae8ca9db0a365da81)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queues: Do not use pendconn_grab_from_px().

pendconn_grab_from_px() was called when a server was brought back up, to
get some streams waiting in the proxy's queue and get them to run on the
newly available server. It is very similar to process_srv_queue(),
except it only goes through the proxy's queue, which can be a problem,
because there is a small race condition that could lead us to add more
streams to the server queue just as it's going down. If that happens,
the server would just be ignored when back up by new streams, as its
queue is not empty, and it would never try to process its queue.
The other problem with pendconn_grab_from_px() is that it is very
liberal with how it dequeues streams, and it is not very good at
enforcing maxconn: it could lead to having 3*maxconn connections.
For both those reasons, just get rid of pendconn_grab_from_px(), and
just use process_srv_queue().
Both problems are easy to reproduce, especially on a 64-thread machine:
set a maxconn to 100, inject in H2 with 1000 concurrent connections
containing up to 100 streams each, and after a few seconds/minutes the
max number of concurrent output streams will be much higher than
maxconn, and eventually the server will stop processing connections.

It may be related to github issue #2744. Note that it doesn't totally
fix the problem: we can occasionally see a few more connections than
maxconn, but the max that has been observed is 4 more connections, and we
no longer get multiple times maxconn.

This should be backported up to 2.6.

(cherry picked from commit 111ea83ed4e13ac3ab028ed5e95201a1b4aa82b8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit ab4ff1b7a6c7685f28fbdea01b38caf7e816fddf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: queues: Make sure we call process_srv_queue() when leaving

In stream_free(), make sure we call process_srv_queue() each time we
call sess_change_server(), otherwise a server may end up not dequeuing
any stream when it could do so. In some extreme cases it could lead to
an infinite loop, as the server would appear to be available, as its
"served" parameter would be < maxconn, but would end up not being used,
as there are elements still in its queue.

This should be backported up to 2.6.

(cherry picked from commit dc9ce9c26469e00ab71fe6387dbd13010d4930f0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 1385e4ca16b3797b0091a959b626935cd7f29b38)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Only consider I/O timers to update stream's expiration date

In sc_notify(), there remained a case where it was possible to set an
expiration date on the stream in the past, leading to a crash because of a
BUG_ON(). This must never happen of course.

In sc_notify(), the stream's expiration may be updated in case no wakeup
conditions are encountered. In that case, we must take care to never set an
expiration date in the past. However, it appeared there was still a
condition to do so. This code is based on an implicit postulate: the
stream's expiration date must always be set when we leave
process_stream(). It was true since 2.9. But in 3.0, the buffer
allocation mechanism was improved and on an alloc failure in
process_stream(), the stream is inserted in a wait-list and its expiration
date is set to TICK_ETERNITY. With the right timing, and an analysis
expiration date set on a channel, it is possible to set the stream's
expiration date in the past.

After analysis, it appeared that the proper way to fix the issue is to only
evaluate I/O timers (read and write timeout) and not stream's timers
(analyse_exp or conn_exp) because only I/O timers may have changed since the
last process_stream() call.

This patch must be backported as far as 3.0 to fix the issue. But it is
probably a good idea to also backport it as far as 2.8.

(cherry picked from commit 4f32d03360e42a3e7fe4ed71012ec90339b9731d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 0510c41910778257fd398309045c9d09bf34ec49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: ssl: add a PEM with mix of LF and CRLF line endings

A user tried to update an automatically generated PEM. Part of this PEM has
LF line endings, and another part (CA certificate), added by some API, has
CRLF line endings. This has revealed a bug in cli_snd_buf(), see more
details in GitHub issue #2818. So, let's add an example of such a PEM in our
SSL regtest.

(cherry picked from commit ea4a148a7d9ead2132757bffd4699146e80da2df)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 296cbefc98f76862111170b8a7b3aeb8d88d107d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: cli: cli_snd_buf: preserve \r\n for payload lines

cli_snd_buf() analyzes input line by line. Before this patch it has always
scanned a given line for the presence of '\r' followed by '\n'.

This is only needed for strings that contain the command itself, like
"show ssl cert\n", "set ssl cert test.pem <<\n".

In case of strings, which contain the command's payload, like
"-----BEGIN CERTIFICATE-----\r\n", '\r\n' should be preserved
as is.
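
The distinction between command lines and payload lines can be sketched in
Python as follows. This is a simplified model, not cli_snd_buf() itself: it
assumes that a command line ending with "<<" opens a payload and that an
empty line closes it, and the function name is hypothetical.

```python
def normalize_cli_lines(data):
    """Drop '\\r' before '\\n' on command lines, keep payload lines as is."""
    out = []
    in_payload = False
    for line in data.split("\n"):
        if in_payload:
            # Payload lines are passed through untouched, '\r' included.
            if line == "":
                in_payload = False    # an empty line ends the payload
        else:
            if line.endswith("\r"):
                line = line[:-1]      # command line: "\r\n" becomes "\n"
            in_payload = line.endswith("<<")
        out.append(line)
    return "\n".join(out)
```

With this model, "show ssl cert\r\n" becomes "show ssl cert\n", whereas a
"-----BEGIN CERTIFICATE-----\r\n" line following a "<<" command keeps its
'\r\n'.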

This patch fixes the GitHub issue #2818.

This patch should be backported in v3.1 and in v3.0.

(cherry picked from commit d60c893991699af1103bcdcd33c915885c056559)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 09ac38f57bb773a05cb4947e85d9adb93856d8b2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY

In clock ticks, 0 is TICK_ETERNITY. Long ago we used to make sure now_ms
couldn't be zero so that it could be assigned to expiration timers, but
it has long changed after functions like tick_add() were instrumented to
make the check. The problem is that aside from the rare few accidental direct
assignments to expiration dates, it's also used to mark the beginning of
an event that's later checked against TICK_ETERNITY to know if it has
already struck. The problem in this case is that certain events may just
be replaced or dropped just because they apparently never appeared. It's
probably the case for stconn's "lra" and "fsb" fields, just like it is
for all those involving tick_add_ifset(), like h2c->idle_start.

The right approach would be to change the type of now_ms to something
else that cannot take direct computations and that represents a timestamp,
forcing to always use the conversion functions. The variables holding such
timestamps would also be distinguished from intervals. At first glance we
could have for timestamps:
  - 0 = never happened (for the past), eternity (for the future)
  - X = date
and for intervals:
  - 0 = not set
  - X = interval

However this requires significant changes. Instead for now, let's just
make sure again that now_ms is never 0 by setting it to 1 when this
happens (1 / 4 billion times, or 1ms every 49.7 days).
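
The arithmetic behind this workaround can be checked with a short Python
sketch. It assumes a 32-bit millisecond counter, as implied by the "1 / 4
billion" figure; the function name is hypothetical and the code only models
the rule, it is not the clock code itself.

```python
TICK_ETERNITY = 0
MS_PER_DAY = 24 * 60 * 60 * 1000


def safe_now_ms(raw_ms):
    """Wrap to 32 bits and never return TICK_ETERNITY (0)."""
    ms = raw_ms & 0xFFFFFFFF
    return ms if ms != TICK_ETERNITY else 1


# A 32-bit millisecond counter wraps every 2**32 ms, about 49.7 days.
wrap_period_days = 2**32 / MS_PER_DAY
```

Only the single value 0 is remapped, so the date is off by 1ms once per
wrap period and is otherwise unchanged.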

This will need to be carefully backported to older versions. Note that
with this patch backported, the previous ones fixing the zero date are
not strictly needed.

(cherry picked from commit 5a3735a155d47786c20774a1492f3ed20cfe4df3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stats/server: use watcher to track server during stats dump

If a server A is deleted while a stats dump is currently on it, deletion
is delayed thanks to reference counting. Server A is nonetheless removed
from the proxy list. However, this list is a singly linked list. If the
next server B is deleted and freed immediately, server A would still
point to it. This problem has been solved by the prev_deleted list in
servers.

This model seems correct, but it is difficult to fully ensure its
validity. In particular, it implies that when stats dump is resumed, server A
elements will be accessed despite the server being in a half-deleted
state.

Thus, it has been decided to completely ditch the refcount mechanism for
stats dump. Instead, use the watcher element to register every stats
dump currently tracking a server instance. Each time a server is deleted
on the CLI, each stats dump element which may point to it is updated
to access the next server instance, or NULL if this is the last server.
This ensures that a server which was deleted via CLI but not completely
freed is never accessed on stats dump resumption.

Currently, no race condition related to dynamic servers and stats dump
is known. However, as described above, the previous model is deemed too
fragile, as such this patch is labelled as bug-fix. It should be
backported up to 2.6, after a reasonable period of observation. It
relies on the following patch :
MINOR: list: define a watcher type

(cherry picked from commit 071ae8ce3d1a318d2227fad2ebf63e78a05815f0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 4e932f79bea653157516a4c41ee26b32f22c470c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: list: define a watcher type

Define a new watcher type in the list module. This type is similar to bref
and can be used to register an element which is currently tracking a
dynamic target. Contrary to bref, if the target is freed, every watcher
element is updated to point to a next valid entry or NULL.
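
The principle can be illustrated with a small Python model. It is a sketch
under stated assumptions rather than the list module API: the Server and
Watcher classes, their fields and delete_server() are all hypothetical
names chosen for the example.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.next = None      # singly linked list of servers
        self.watchers = []    # watchers currently pointing at this server


class Watcher:
    def __init__(self):
        self.target = None

    def attach(self, server):
        """Point this watcher at `server` (or at nothing if None)."""
        if self.target is not None:
            self.target.watchers.remove(self)
        self.target = server
        if server is not None:
            server.watchers.append(self)


def delete_server(head, victim):
    """Unlink `victim`, moving its watchers to the next entry or None.

    Returns the new list head.
    """
    for w in list(victim.watchers):   # copy: attach() mutates the list
        w.attach(victim.next)
    if head is victim:
        return victim.next
    node = head
    while node.next is not victim:
        node = node.next
    node.next = victim.next
    return head
```

A dump that was paused on a server which is then deleted resumes on the
following server, and never dereferences the deleted one.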

This type will simplify handling of dynamic servers deletion, in
particular while stats dumps are performed.

This patch is not a bug-fix. However, it is mandatory to fix a race
condition in dynamic servers. Thus, it should be backported along with the
next commit up to 2.6.

(cherry picked from commit eafa8a32bb19ee567cb0a60a4042a8ef436e1e94)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b4501697a6321e990a94243c0274b4294dcbbd6b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stats: decrement srv refcount on stats-file release

Server instances may be removed at runtime. This can occur during a
stats dump which currently references this server instance. This case is
protected by server refcount to prevent the server's immediate release.

CLI output may be interrupted prior to stats dump completion, for
example if client CLI has been disconnected before the full response
transfer. As such, srv_drop() must be called in every stats dump release
callback.

srv_drop() was missing for stats-file dump release callback. This could
cause a race condition which would prevent a server instance from being
fully removed. Fix this by adding a srv_drop() invocation into
cli_io_handler_release_dump_stat_file().

This should be backported up to 3.0.

(cherry picked from commit 2199179461957c86f57e39c4d97c37f3e4d10935)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 41df9cac1f2092473347ee9c2f51739bf7ad0013)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: stconn: Don't forward shut for SC in connecting state

In connecting state, shutdown must not be forwarded or scheduled because
otherwise this will prevent any connection retries. Indeed, if an EOS is
reported by the mux during the connection establishment, this should be
handled by the stream to eventually retry. If the write side is closed
first, this will not be possible because the stconn will be switched in DIS
state. If the shut is scheduled because pending data are blocked, the same
may happen, depending on the abort-on-close option.

This patch should slowly be backported as far as 2.4. But an observation
period is mandatory. On 2.4, the patch must be adapted to use the
stream-interface API.

(cherry picked from commit 72e529829b8eedebc1891970cf8ac64eaf26d006)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.0.7

Released version 3.0.7 with the following main changes :
    - BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
    - MINOR: quic: notify connection layer on handshake completion
    - BUG/MINOR: stream: unblock stream on wait-for-handshake completion
    - BUG/MEDIUM: quic: support wait-for-handshake
    - MINOR: quic: simplify qc_parse_pkt_frms() return path
    - MINOR: quic: use dynamically allocated frame on parsing
    - MINOR: quic: extend return value of CRYPTO parsing
    - BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
    - CLEANUP: guid: remove global tree export
    - BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete
    - BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
    - BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI
    - BUG/MEDIUM: resolvers: Insert a non-executed resulution in front of the wait list
    - BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID
    - BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state
    - BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
    - BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration
    - BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration
    - BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
    - DOC: config: A a space before ':' for {bs,fs}.aborted and {bs,fs}.rst_code
    - DOC: config: Fix a typo in "1.3.1. The Request line"
    - BUG/MINOR: http_ana: Report -1 for %Tr for invalid response only
    - DOC: config: Slightly improve the %Tr documentation
    - DOC: config: Move wait_end in section about internal samples
    - DOC: config: Move fs.* and bs.* in section about L5 samples
    - DOC: lua: fix yield-dependent methods expected contexts
    - DOC: configuration: explain quotes and spaces in conditional blocks
    - DOC: configuration: wrap long line for "strstr()" conditional expression
    - BUG/MINOR: http-ana: Adjust the server status before the L7 retries
    - BUG/MEDIUM: mux-h2: Increase max number of headers when encoding HEADERS frames
    - BUG/MEDIUM: mux-h2: Check the number of headers in HEADERS frame after decoding
    - BUG/MEDIUM: h3: Properly limit the number of headers received
    - BUG/MEDIUM: h3: Increase max number of headers when sending headers
    - DOC: config: Improve documentation of tune.http.maxhdr directive
    - BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler()
    - BUG/MEDIUM: wdt: fix the stuck detection for warnings
    - BUG/MINOR: activity/memprofile: reinitialize the free calls on DSO summary
    - MINOR: activity/memprofile: offer a function to unregister stale info
    - BUG/MEDIUM: pools/memprofile: always clean stale pool info on pool_destroy()
    - BUG/MAJOR: mux-h1: Properly handle wrapping on obuf when dumping the first-line
    - BUG/MAJOR: quic: fix wrong packet building due to already acked frames
    - DEV: lags/show-sess-to-flags: Properly handle fd state on server side
    - BUG/MEDIUM: http-ana: Don't release too early the L7 buffer
    - BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set
    - MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
    - BUG/MINOR: log: fix lf_text() behavior with empty string
    - BUG/MEDIUM: event_hdl: fix uninitialized value in async mode when no data is provided
    - BUG/MEDIUM: http-ana: Reset request flag about data sent to perform a L7 retry
    - BUG/MINOR: h1-htx: Use default reason if not set when formatting the response
    - BUG/MINOR: signal: register default handler for SIGINT in signal_init()
    - BUG/MINOR: quic: remove startup alert if conn socket-owner unsupported
    - MINOR: mux-h2/traces: add a missing trace on negative initial window size
    - CLEANUP: mux-h2/traces: reword certain ambiguous traces
    - BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks

BUG/MINOR: server-state: Fix expiration date of srvrq_check tasks

"hold.timeout" was used as expiration date for srvrq_check tasks. But it is
not accurate. The expiration date must be based on the resolution timeouts
instead (resolve and retry).

The purpose of srvrq_check task is to clean up the server resolution status
when outdated info is inherited from the state file. Using "hold.timeout"
is not accurate here because hold timeouts concern the resolution response
items, not the resolution status of servers. It may be set to a huge value or
0. The expiration date of these tasks must be based on the resolution
timeouts instead.

So now the ("timeout resolve" + resolve_retries * "timeout retry") value is
used.
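
As a rough illustration of the computation above, here is a minimal sketch.
The function and parameter names are hypothetical (they are not the actual
HAProxy code), and all durations are assumed to be in milliseconds:

```c
#include <assert.h>

/* Hypothetical helper: delay before a srvrq_check task expires, derived
 * from the resolution timeouts rather than from "hold.timeout".
 * All durations are assumed to be expressed in milliseconds.
 */
static inline unsigned int srvrq_check_delay_ms(unsigned int timeout_resolve_ms,
                                                unsigned int timeout_retry_ms,
                                                unsigned int resolve_retries)
{
    return timeout_resolve_ms + resolve_retries * timeout_retry_ms;
}
```

With a 1s resolve timeout, a 1s retry timeout and 3 retries, this sketch
yields a 4s delay, regardless of how large or small "hold.timeout" is.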

This patch should fix the issue #2816. It must be backported to all stable
versions.

(cherry picked from commit 647a2906626e9c3d9c3349d338a35798325496f2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 3746a7d0639ced74bb9f7cff79181be9a0f18e56)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: mux-h2/traces: reword certain ambiguous traces

Some h2 traces were not very clear, let's reword them a bit.

(cherry picked from commit 7c8e9420a23584c7f366aacbfeb308d949f5c7b3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit ff64cbe2092bd5a6a2874d7b44afe322f6348e41)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-h2/traces: add a missing trace on negative initial window size

When a negative initial window size is reported, we're going to close
the connection, so it's important to report a trace to explain why!
This should be backported at least to 3.1 and possibly 3.0 (adapting the
context since there's no glitches there).

(cherry picked from commit 86823c828f983bd986b150542c7a6482d60b291d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 35bb79b797b85f8b89c87b8b925bc50f6f25865c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: remove startup alert if conn socket-owner unsupported

QUIC relies on several advanced network API features from the kernel to
perform optimally. Checks are performed during startup to ensure that
these features are supported. A fallback is automatically performed for
every incompatible feature.

Besides the automatic fallback mechanism, a message is also reported to
the user at the same time. Previously, alert level was used, but it is
incorrect as it is reserved for unrecoverable errors which should
prevent haproxy from starting. Warning level could be used, but this can
annoy users running with zero-warning mode.

This patch removes the alert message when 'socket-owner connection' mode
cannot be activated. Convert the message to a diag level. This allows
users to start without forcing configuration modification to hide a
warning. Besides, several feature fallbacks such as the polling mechanism
do not emit any warning either, so it's better to adopt a similar
behavior for QUIC features.

This must be backported up to 2.8.

(cherry picked from commit 6fed219fd786f3fdca155f686cf2fa2f9e572697)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 24fa1cc97310e436607f64aa1ce3fd4330a26597)
[cf: ctx adjt]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: signal: register default handler for SIGINT in signal_init()

When haproxy is launched in the background and in a subshell (see example below),
according to the POSIX standard (2.11. Signals and Error Handling), it inherits
the SIG_IGN signal handler for SIGINT and SIGQUIT from the subshell.

$ (./haproxy -f env4.cfg &)

So, when haproxy is launched like this, it doesn't stop upon receiving
SIGINT. This can be a root cause of some unexpected timeouts, when haproxy
is started under VTest, as VTest sends SIGINT to the process in order to
terminate it. To fix this, let's explicitly register the default signal
handler for the SIGINT in signal_init() initcall.
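
The idea can be sketched with plain POSIX calls as follows. This is only an
illustration: the helper names are hypothetical and the code is not the
actual signal_init() implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: restore the default disposition for SIGINT, in case
 * SIG_IGN was inherited from a parent subshell.
 * Returns 0 on success, -1 on error.
 */
static inline int restore_default_sigint(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical helper: returns 1 if SIGINT is currently ignored by this
 * process, 0 if it is not, -1 if the disposition could not be queried.
 */
static inline int sigint_is_ignored(void)
{
    struct sigaction cur;

    if (sigaction(SIGINT, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_IGN;
}
```

Calling restore_default_sigint() early at startup means a later SIGINT
terminates the process even when the parent shell had set SIG_IGN.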

This should be backported in all stable versions.

(cherry picked from commit d3c20b02469dea6f46369bb91965d8b4924bb2b7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a78b02f37d70366ef5afd308de48e8e2c4b54b4a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: h1-htx: Use default reason if not set when formatting the response

When the response status line is formatted before sending it to the client,
if there is no reason set, HAProxy should add one that matches the status
code, as stated in the configuration manual. However it is not performed.

It is possible to hit this bug when the response comes from a H2 server,
because there is no reason field in HTTP/2 and above.
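
A minimal sketch of the expected behavior is shown below. The function names,
the reduced status table and the "Other" fallback are assumptions made for
the example; they are not taken from the HAProxy sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately incomplete table of default reason phrases. */
static inline const char *sk_default_reason(unsigned int status)
{
    switch (status) {
    case 200: return "OK";
    case 204: return "No Content";
    case 301: return "Moved Permanently";
    case 404: return "Not Found";
    case 500: return "Internal Server Error";
    case 503: return "Service Unavailable";
    default:  return "Other"; /* placeholder fallback for this sketch */
    }
}

/* Returns <reason> if it is set and non-empty, otherwise a default reason
 * matching <status>. An HTTP/2 response has no reason field, so <reason>
 * is expected to be empty or NULL in that case.
 */
static inline const char *sk_pick_reason(unsigned int status, const char *reason)
{
    if (reason && *reason)
        return reason;
    return sk_default_reason(status);
}
```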

This patch should fix the issue #2798. It should be backported to all stable
versions.

(cherry picked from commit 37487ada739fc86e3acb46c9949196f4f15cc9b1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 736d4e2c3550dc9c56e5f05778457466b3ce13d9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: http-ana: Reset request flag about data sent to perform a L7 retry

It is possible to lose the request after several L7 retries, leading to
crashes, because the request channel flag stating some data were sent is not
properly reset.

When a L7 retry is performed, some flags on different entities must be reset
to be sure a new connection will be properly retried, just like it was the
first one, mainly because there was no connection establishment failure. One
of them, on the request channel, is not reset: the flag stating some data
were already sent. It is annoying because this flag is used during the
connection establishment to know if an error is triggered at the connection
level or at the data level. In the latter case, the error must be handled by
the HTTP response analyzer, to possibly perform another L7 retry.

Because CF_WROTE_DATA flag is not removed when a L7 retry is performed, a
subsequent connection establishment error may be handled as a L7 error while
in fact the request was never sent. It also means the request was never
saved in the buffer used to perform L7 retries. Thus, on the next L7
retry, the request is just lost. This inevitably leads to a bunch of
undefined behavior. One of them is a crash, when the request is used to
perform the load-balancing.

This patch should fix issue #2793. It must be backported to all stable
versions.

(cherry picked from commit 62f37801c881f68060cedb7a74b5b8cb5fcfec81)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit d0129d2c2a408a9dabd486ee129f3ec8b0199270)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: event_hdl: fix uninitialized value in async mode when no data is provided

In _event_hdl_publish(), when we prepare the asynchronous event and no
<data> was provided (set to NULL), we forgot to initialize the _data
event_hdl_async_event struct member to NULL, which leads to uninitialized
reads in event_hdl_async_free_event() when the event is freed:

==1002331== Conditional jump or move depends on uninitialised value(s)
==1002331==    at 0x35D9D1: event_hdl_async_free_event (event_hdl.c:224)
==1002331==    by 0x1CC8EC: hlua_event_runner (hlua.c:9917)
==1002331==    by 0x39AD3F: run_tasks_from_lists (task.c:641)
==1002331==    by 0x39B7B4: process_runnable_tasks (task.c:883)
==1002331==    by 0x314B48: run_poll_loop (haproxy.c:2976)
==1002331==    by 0x315218: run_thread_poll_loop (haproxy.c:3190)
==1002331==    by 0x18061D: main (haproxy.c:3747)

The bug severity was set to MEDIUM because of its nature, and it's best
if this patch can be backported up to 2.8. But in practice it can only be
triggered with events that don't provide optional data: since PAT_REF
events are the first native events making use of this feature, this bug
shouldn't be an issue before f72a66e ("MINOR: pattern: publish event_hdl
events on pat_ref updates")

(cherry picked from commit dd56616067d19060425940f6906cefe6efcd1955)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 5b4381c19fbe87ad2972110330c59e1f231449ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: log: fix lf_text() behavior with empty string

As reported by Baptiste in GH #2797, if a logformat alias leveraging
lf_text() ends up printing nothing (empty string), the whole logformat
evaluation stops, leading to a garbage log message.

This bug was introduced during 3.0 cycle in fcb7e4b ("MINOR: log: add
lf_rawtext{_len}() functions"). At that time I genuinely thought that
if strlcpy2() returned 0, it was due to a lack of space, actually
forgetting that the function may simply be called with an empty string.

Because of that, lf_text() would return NULL if called with an empty
string, and since all lf_*() helpers are expected to return NULL on
error, this explains why the logformat evaluation immediately stops in
this case.

To fix the issue, let's simply consider that strlcpy2() returning 0 is
not an error, like it was already the case before.
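
The intended semantics can be sketched as follows. sketch_lf_text() is a
hypothetical stand-in written for this example, not the real lf_text(), and
its truncation behavior is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a logformat text helper: copies <src> into
 * <dst> (at most <size> bytes including the trailing zero) and returns a
 * pointer to the trailing zero, or NULL on error. An empty <src> is not an
 * error: nothing is copied and <dst> itself is returned.
 */
static inline char *sketch_lf_text(char *dst, const char *src, size_t size)
{
    size_t len;

    if (size == 0)
        return NULL;          /* no room at all, not even for the zero */

    len = strlen(src);
    if (len > size - 1)
        len = size - 1;       /* truncate to what fits */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst + len;         /* equals <dst> when <src> is empty */
}
```

The point is that "zero bytes copied" and "failure" are reported through
different return values, so a caller chaining several helpers keeps going
after an empty field.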

It should be backported in 3.1 and 3.0 with fcb7e4b.

(cherry picked from commit 3e470471b7e0ec113807f6981699fda9538e7ffc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit ef8324f124f1ba0a98648edd49723ee2b8819bbe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: mux-quic: Don't send an empty H3 DATA frame during zero-copy forwarding

It may only happen when there is no data to forward but a last stream frame
must be sent with the FIN bit. It is not invalid, but it is useless to send
an empty H3 DATA frame in that case.

(cherry picked from commit 6697e87ae5e1f569dc87cf690b5ecfc049c4aab0)
[ad: This patch was merely considered as an optimization. However, it
is in fact mandatory as it fixes a bug on QUIC zero-copy
implementation. As such, it must be backported up to 2.9.

This bug can happen when iobuf data is null in done_ff, indicating that
no data were transferred. Despite this, qcc_send_stream() was always
called with data incorrectly incremented to iobuf offset, which is equal
to HTTP/3 frame header length. This could cause garbage data emission by
QUIC MUX. The most visible effect is that it provokes a BUG_ON() crash
when QCS instance is released due to Tx offsets desynchronization.

This bug is related to github issue #2678.]

Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>

BUG/MEDIUM: sock: Remove FD_POLL_HUP during connect() if FD_POLL_ERR is not set

epoll_wait() may return EPOLLHUP and/or EPOLLRDHUP after an asynchronous
connect(), to indicate that the peer accepted the connection then
immediately closed before epoll_wait() returned. When this happens,
sock_conn_check() is called to check whether or not the connection was
correctly established, and after that the receive channel of the socket is assumed to
already be closed. This lets haproxy send the request at best (if RDHUP and
not HUP) then immediately close.

Over the last two years, there were a few reports about this spuriously
happening on connections where network captures proved that the server had
not closed at all (and sometimes even received the request and responded to
it after haproxy had closed). The logs show that a successful connection is
immediately reported on error after the request was sent. After
investigations, it appeared that an EPOLLHUP, or possibly an EPOLLRDHUP, can
be reported by epoll_wait() during the connect() but in sock_conn_check(),
the connect() reports a success. So the connection is validated but the HUP
is handled on the first receive and an error is reported.

The same behavior could be observed on health-checks, leading HAProxy to
consider the server as DOWN while it is not.

The only explanation at this point is that it is a kernel bug, notably
because it does not even match the documentation for connect() nor epoll. In
addition for now it was only observed with Ubuntu kernels 5.4 and 5.15 and
was never reproduced on any other one.

We have no reproducer but here is the typical strace observed:

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 114
fcntl(114, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(114, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(19, EPOLL_CTL_ADD, 114, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=151, u64=151}}, {events=EPOLLIN, data={u32=59, u64=59}}, {events=EPOLLIN|EPOLLRDHUP, data={u32=114, u64=114}}], 200, 0) = 4
epoll_ctl(19, EPOLL_CTL_MOD, 114, {events=EPOLLOUT, data={u32=114, u64=114}}) = 0
epoll_wait(19, [{events=EPOLLOUT, data={u32=114, u64=114}}, {events=EPOLLIN, data={u32=15, u64=15}}, {events=EPOLLIN, data={u32=10, u64=10}}, {events=EPOLLIN, data={u32=165, u64=165}}], 200, 0) = 4
connect(114, {sa_family=AF_INET, sin_port=htons(11000), sin_addr=inet_addr("A.B.C.D")}, 16) = 0
sendto(114, "POST "..., 1009, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 1009
close(114)                              = 0

Some resources about this issue:
  - https://www.spinics.net/lists/netdev/msg876470.html
  - https://github.com/haproxy/haproxy/issues/1863
  - https://github.com/haproxy/haproxy/issues/2368

So, to work around the issue, we have decided to remove the FD_POLL_HUP flag on
the FD during the connection establishment if FD_POLL_ERR is not reported
too in sock_conn_check(). This way, the call to connect() is able to
validate or reject the connection. At the end, if the HUP or RDHUP flags
were valid, either connect() would report the error itself, or the next
recv() would return 0 confirming the closure that the poller tried to
report. EPOLLRDHUP is only an optimization to save a syscall anyway, and
this pattern is so rare that nobody will ever notice the extra call to
recv().
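
The filtering rule described above can be sketched like this. The flag values
and the function name are hypothetical and only serve the illustration; the
real logic lives in sock_conn_check() and uses FD_POLL_HUP / FD_POLL_ERR:

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SK_POLL_ERR 0x01u
#define SK_POLL_HUP 0x02u

/* While a connect() is still being validated, drop a hang-up report that
 * is not accompanied by an error, so that connect() itself (or a later
 * recv() returning 0) decides whether the connection is really closed.
 */
static inline unsigned int sk_filter_connect_events(unsigned int ev)
{
    if ((ev & SK_POLL_HUP) && !(ev & SK_POLL_ERR))
        ev &= ~SK_POLL_HUP;
    return ev;
}
```

A hang-up reported together with an error is left untouched, so genuine
connection failures are still seen immediately.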

Please note that at least one reporter confirmed that using poll() instead
of epoll() also addressed the problem, so that can also be a temporary
workaround for those discovering the problem without the ability to
immediately upgrade.

The event is accounted via a COUNT_IF(), to be able to spot it in future
issues. Just in case.

This patch should fix issues #1863 and #2368. It may be related
to #2751. It should be backported as far as 2.4. In 3.0 and below, the
COUNT_IF() must be removed.

(cherry picked from commit 7262433183f590377ace31ff96b1fafa4525b7c2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit b369bdcddfab9627cc3bacc0e75c9e94ac3b24fa)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: http-ana: Don't release too early the L7 buffer

In some cases, the buffer used to store the request to be able to perform a
L7 retry is released too early, leading to a crash because a retry
is performed with an empty request.

First, there is a test on invalid 101 responses that may be caught by the
"junk-response" retry policy. Then, it is possible to get an error
(empty-response, bad status code...) after an interim response. In both
cases, the L7 buffer is already released while it should not.

To fix the issue, the L7 buffer is now released at the end of the
AN_RES_WAIT_HTTP analyser, but only when a response was successfully
received and processed. In all error cases, the stream is quickly released,
with the L7 buffer. So there is no leak and it is safer this way.

This patch may fix the issue #2793. It must be backported as far as 2.4.

(cherry picked from commit dc15581c02171eeb49ef3ffbab0f583f38482b4c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DEV: flags/show-sess-to-flags: Properly handle fd state on server side

It must be handled as an hexadecimal value.

(cherry picked from commit ceb80aed579bab9d8db38aa87790bc04b5c9767a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: quic: fix wrong packet building due to already acked frames

If a packet build was asked to probe the peer with frames which have just
been acked, the frames build run by qc_build_frms() could be cancelled by
qc_stream_frm_is_acked() whose aim is to check that current frames to
be built have not been already acknowledged. In this case the packet build run
by qc_do_build_pkt() is not interrupted, leading to the build of an empty packet
which should be ack-eliciting.

This is a bug detected by the BUG_ON() statement in qc_do_build_pkt():

    BUG_ON(qel->pktns->tx.pto_probe &&
           !(pkt->flags & QUIC_FL_TX_PACKET_ACK_ELICITING));

Thank you to @Tristan971 for having reported this issue in GH #2709

This is an old bug which must be backported as far as 2.6.

(cherry picked from commit 96b2641fc8ce58eb1875e7b525c57e58e4b794c3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: mux-h1: Properly handle wrapping on obuf when dumping the first-line

The formatting of the first-line, for a request or a response, does not
properly handle the wrapping of the output buffer. This may lead to a data
corruption for the current response or possibly for the previous one.

Utility functions used to format the first-line of the request or the
response rely on the chunk API. So it is not expected to pass a buffer that
wraps. Unfortunately, because of a change performed during the 2.9 dev cycle,
the output buffer was directly used instead of a non-wrapping buffer created
from it with b_make() function. It is not an issue for the request because
its start-line is always the first block formatted in the output buffer. But
for the response, the output may be not empty and may wrap. In that case,
the response start-line is dumped at a random position in the buffer,
corrupting data. AFAIK, it is only an issue if the HTTP request pipelining
is used.

To fix the issue, we now take care to create a non-wrapping buffer from the
output buffer.
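
To illustrate what "non-wrapping" means here, the sketch below computes the
contiguous free region at the tail of a ring buffer, which is the only part
a linear (chunk-style) writer may safely use. The structures and the function
are hypothetical and written for this example; they are not HAProxy's buffer
API nor the b_make() function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring buffer descriptor: <data> bytes are stored starting at
 * offset <head> of an area of <size> bytes, wrapping at the end.
 */
struct sk_ring {
    size_t size;
    size_t head;
    size_t data;
};

/* Describes a contiguous (non-wrapping) region of free space. */
struct sk_window {
    size_t ofs;   /* offset of the first free byte in the area */
    size_t len;   /* number of bytes that can be written linearly */
};

static inline struct sk_window sk_contig_room(const struct sk_ring *r)
{
    struct sk_window w;
    size_t tail;

    assert(r->size > 0 && r->head < r->size && r->data <= r->size);

    tail = (r->head + r->data) % r->size;
    w.ofs = tail;
    if (r->data == r->size)
        w.len = 0;                   /* buffer is full */
    else if (tail >= r->head)
        w.len = r->size - tail;      /* free space runs to the end of the area */
    else
        w.len = r->head - tail;      /* data wrapped: free space stops at head */
    return w;
}
```

A formatter that writes at most w.len bytes starting at w.ofs never crosses
the end of the area, whatever the position of the pending output data.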

This patch should fix issues #2779 and #2996. It must be backported as far as
2.9.

(cherry picked from commit b150ae46dd97caa5050d8abefc1d9b619ab5ab9a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: pools/memprofile: always clean stale pool info on pool_destroy()

There's actually a problem with memprofiles: the pool pointer is stored
in ->info but some pools are replaced during startup, such as the trash
pool, leaving a dangling pointer there, that may randomly report crap or
even crash during "show profiling memory".

Let's make pool_destroy() call memprof_remove_stale_info() added
by previous patch so that these entries are properly unregistered.

This must be backported along with the previous patch (MINOR:
activity/memprofile: offer a function to unregister stale info) as
far as 2.8.

(cherry picked from commit ed3ed358676edf058663bde7ec6098b51f8bc745)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: activity/memprofile: offer a function to unregister stale info

There's actually a problem with memprofiles: the pool pointer is stored
in ->info but some pools are replaced during startup, such as the trash
pool, leaving a dangling pointer there.

Let's complete the API with a new function memprof_remove_stale_info()
that will remove all stale references to this info pointer. It's also
present when USE_MEMORY_PROFILING is not set so as to ease the job on
callers.

(cherry picked from commit 859341c1ec583c586ef36db0b63cd84f3843bfab)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: activity/memprofile: reinitialize the free calls on DSO summary

In commit 401fb0e87a ("MINOR: activity/memprofile: show per-DSO stats")
we added a summary per DSO. However the free calls/tot were not initialized
when creating a new entry because initially they were applied to any entry,
but since we don't update free calls for non-free capable callers, we still
need to reinitialize these entries when reassigning one. Because of this
bug, a "show profiling memory" output can randomly show highly negative
values on the DSO lines if it turns out that the DSO entry was created on
an alloc instead of a realloc/free.

Since the commit above was backported to 2.9, this one must go there as
well.

(cherry picked from commit c42a2b8c945d1b45672a2b1715dfa586daaec657)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: wdt: fix the stuck detection for warnings

If two slow tasks trigger one warning even a few seconds apart, the
watchdog code will mistakenly take this for a definite stuck task and
kill the process. The reason is that since commit 148eb5875f ("DEBUG:
wdt: better detect apparently locked up threads and warn about them")
the updated ctxsw count is not the correct one, instead of updating
the private counter it resets the public one, preventing it from making
progress and making the wdt believe that no progress was made. In
addition the initial value was read from [tid] instead of [thr].

Please note that another fix is needed in debug_handler() otherwise the
watchdog will fire early after the first warning or thread dump.

A simple test for this is to issue several of these commands back-to-back
on the CLI, which crashes an unfixed 3.1 very quickly:

$ socat /tmp/sock1 - <<< "expert-mode on; debug dev loop 1000"

This needs to be backported to 2.9 since the fix above was backported
there. The impact on 3.0 and 2.9 is almost nonexistent since the watchdog
there doesn't apply the shorter warning delay, so the first call already
indicates that the thread is stuck.

(cherry picked from commit 24ce001771a7609b2a3902fc1f851668ef176c59)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: debug: don't set the STUCK flag from debug_handler()

Since 2.0 with commit e6a02fa65a ("MINOR: threads: add a "stuck" flag
to the thread_info struct"), the TH_FL_STUCK flag was set by the
debugger to flag that a thread was stuck and report it in the output.

However, two commits later (2bfefdbaef "MAJOR: watchdog: implement a
thread lockup detection mechanism"), this flag was used to detect that
a thread had already been reported as stuck. The problem is that it
seldom happens that a "show threads" command instantly crashes because
it calls debug_handler(), which sets the flag, and if the watchdog timer
was about to trigger before going back to the scheduler, the watchdog
believes that the thread has been stuck for a while and will kill the
process.

The issue was magnified in 3.1 with the lower-delay warning, because
it's possible for a thread to die on the next wakeup after the first
warning (which calls debug_handler() hence sets the STUCK flag).

One good approach would have been to use two distinct flags, one for
"stuck" as reported by the debug handler, and one for "stuck" as seen
by the watchdog. However, one could also argue that since the second
commit, given that the wdt monitors the threads, there's no point any
more for the debug handler to set the flag itself. Removing this code
means that two consecutive "show threads" will not report "stuck" until
the watchdog sets it, which aligns better with expectations.

This can be backported to all stable releases. This code has changed a
bit over time, the "if" block and the harmless variables just need to
be removed.

(cherry picked from commit 1151fe68186cf862882f147de208c509c25d525e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Improve documentation of tune.http.maxhdr directive

The description was improved to clearly mention that it is applied on received
requests and responses. In addition, a comment was added about the HTTP/2 and
HTTP/3 limitation when messages are encoded to be sent.

(cherry picked from commit e863d8d6814224961724157c605c77ddab85cbae)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: h3: Increase max number of headers when sending headers

In the same way as for H2, the maximum number of headers that can be
encoded when headers are sent must be increased to match the limit imposed
when they are received.

The reasons are the same. On the receive path, the maximum number of headers
accepted must be higher than the configured limit to be able to handle
pseudo headers and cookie headers. On the sending path, the same limit must
be applied because the pseudo headers will consume some extra slots and the
cookie header could be split.

This patch should be backported as far as 2.6.

(cherry picked from commit 3bd9a9e7d7a8d7869015eaf041b3ae7a0761c1d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: h3: Properly limit the number of headers received

The number of headers is limited before the decoding but pseudo headers and
cookie headers consume extra slots. In practice, this lowers the maximum number
of headers that can be received.

To work around this issue, the limit is doubled during the frame decoding to be
sure to have enough extra slots. And the number of headers is tested against the
configured limit after the HTX message was created to be able to report an
error. Unfortunately no parsing error is reported because the QUIC multiplexer
is not able to do so for now.

The same is performed on trailers to be consistent with H2.

This patch should be backported as far as 2.6.

(cherry picked from commit 785e63335374a6db8ef35205cdb36ea726710061)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Check the number of headers in HEADERS frame after decoding

There is no explicit test on the number of headers when a HEADERS frame is
received. It is implicitly limited by the size of the header list. But it
is twice the configured limit to be sure to decode the frame.

So now, a check is performed after the HTX message was created. This way, we
are sure to not exceed the configured limit after the decoding stage. If
there are too many headers, a parsing error is reported.
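
The two-stage approach can be sketched as follows. The helper names are
hypothetical, and whether the real comparison is strict or not is an
assumption of this example; <maxhdr> stands for the value configured with
tune.http.maxhdr.

```c
#include <assert.h>

/* Number of header slots made available while decoding a frame: twice the
 * configured limit, to leave room for pseudo headers and split cookies.
 */
static inline unsigned int sk_hdr_decode_slots(unsigned int maxhdr)
{
    return maxhdr * 2;
}

/* Post-decoding check against the configured limit, once pseudo headers
 * and cookies were reassembled. Returns 1 if the message is acceptable,
 * 0 if a parsing error must be reported.
 */
static inline int sk_hdr_count_ok(unsigned int nb_hdrs, unsigned int maxhdr)
{
    return nb_hdrs <= maxhdr;
}
```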

Note the same is performed on the trailers.

This patch should partially address the issue #2685. It should be backported
to all stable versions.

(cherry picked from commit 63d2760dfa679bea4b7a61a1a8702af23cf26e75)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Increase max number of headers when encoding HEADERS frames

When a HEADERS frame is encoded to be sent, the maximum number of headers
allowed in the frame is lower than on receiving path. This can lead to
report a sending error while the message was accepted. It could be
confusing.

In addition, the start-line is split into pseudo-headers and this way consumes
some header slots, increasing the difference between HEADERS frames
encoding and decoding. It is even more noticeable because when a HEADERS
frame is decoded, a margin is used to be able to handle split cookie
headers. Concretely, on the decoding path, a limit of twice the maximum number of
headers allowed in a message (tune.http.maxhdr * 2) is used. On the encoding
path, the exact limit is used. It is not consistent.

Note that when a frame is decoded, we must use a larger limit because the
pseudo headers are reassembled in the start-line and must count for one. But
also because, most of the time, the cookies are split into several headers
and are reassembled too.

To fix the issue, the same ratio is applied on sending path. A limit must be
defined because a dynamic allocation is not acceptable. Twice the
configured limit should be good enough to support headers manipulation.

This patch should be backported to all stable versions.

(cherry picked from commit e415e3cb7aa1feaac3ed703687656e09dd464eb3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-ana: Adjust the server status before the L7 retries

The server status must be adjusted, if necessary, at each retry. It is
properly performed when the "observe layer4" directive is set. But for the layer
7, only the last attempt was considered.

When the L7 retries were implemented, all retries were added before the
server status adjustment. So only the last attempt was considered. To fix
the issue, we must adjust the server status first, and then try to perform a
L7 retry.

This patch should fix the issue #2679. It must be backported to all stable
versions.

(cherry picked from commit 2a5da31ccef239e21d17ec34430fdc6b51b9cc67)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: configuration: wrap long line for "strstr()" conditional expression

This keyword had too long a description line, let's split it. This can be
backported to 2.8.

(cherry picked from commit 5c15899410c722e2ff4a01f6d70dc40095b43ff5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: configuration: explain quotes and spaces in conditional blocks

Conditional blocks inherit the same tokenizer and argument parser as
the rest of the configuration, but are also silently concatenated
around groups of spaces and tabs. This can lead to subtle failures
for configs containing spaces around commas and parenthesis, where
a string comparison might silently fail for example. Let's better
document this particular case.

Thanks to Valentine for analysing and reporting the problem.

This can be backported to 2.4.

(cherry picked from commit da1620b3175c63b768a8537951667885fef77e8c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: lua: fix yield-dependent methods expected contexts

Contrary to what the doc states, it is not expected (nor relevant) to
use yield-dependent methods such as core.yield() or core.(m)sleep() from
contexts that don't support yielding. Such contexts include body, init,
fetches and converters.

Thus the doc got it wrong since the beginning, because such methods were
never supported from the above contexts, yet it was listed in the list
of compatible contexts (probably the result of a copy-paste), which is
error-prone because it could either cause a Lua runtime error to be
thrown, or be ignored in some other cases.

It should be backported to all stable versions.

(cherry picked from commit 501827ebe0ad8f4121c4397267afbc7968e3d9af)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Move fs.* and bs.* in section about L5 samples

These sample fetch functions were added in the wrong section. Move them in
the section about sample fetch functions at L5 layer.

(cherry picked from commit e68c6852adb7051a30e209c5a0604f192182b42d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Move wait_end in section about internal samples

wait_end is an internal sample fetch function and not a L6 one. So move it
in the corresponding section.

(cherry picked from commit 4ccc3f40488bfeed93f0df7d339444fe6503ee4e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Slightly improve the %Tr documentation

Specify -1 can also be reported for %Tr delay when the response is invalid.

(cherry picked from commit e9021a4ca1d6a70cb647441aae78ec4d35bb7c1a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http_ana: Report -1 for %Tr for invalid response only

The server response time is erroneously reported as -1 when it is
intercepted by HAProxy.

As stated in the documentation, the server response time is reported as -1
when the last response header was never seen. It happens when a server
timeout is triggered before the server managed to process the request. It
also happens if the response is invalid. This may be reported by the mux
during the response parsing, but also by the HTTP analyzers. However, in
this last case, the response time must only be reported as -1 on 502.

This patch must be backported to all stable versions. It should fix the
issue #2384.

(cherry picked from commit 5863d33fce702c46b77c07d4ea82e036b11417a6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Fix a typo in "1.3.1. The Request line"

At the beginning of the last paragraph of this section, HTTP/3 was used
instead of HTTP/2. It is now fixed.

(cherry picked from commit 18de419f9647ad5fe0006900e2c1587bffd49c24)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Add a space before ':' for {bs,fs}.aborted and {bs,fs}.rst_code

A space was missing before the ':' for the sample fetch functions above. It
was an issue for the text to HTML conversion script. So, let's fix it.

(cherry picked from commit 3af2d91b3b6ebe1587bcb17f5fb223436df67253)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration

Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be a reconnect programmed upon signal
receipt at the wrapping date not having a working timeout.
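
The reason tick_add(now_ms, 0) is safe where a direct assignment is not can
be sketched as follows. The names below are hypothetical stand-ins written
for this example (the real helpers live in HAProxy's tick API), and the
sketch only assumes that the value 0 is reserved to mean "never expires":

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the reserved "no expiration" tick value. */
#define SK_TICK_ETERNITY 0u

/* Adds <timeout> to <now> with natural unsigned wrapping, but never
 * returns the reserved eternity value: if the sum lands on it, the result
 * is pushed one tick forward.
 */
static inline unsigned int sk_tick_add(unsigned int now, unsigned int timeout)
{
    now += timeout;
    if (now == SK_TICK_ETERNITY)
        now++;
    return now;
}
```

With such a helper, an expiration computed while the clock reads 0 becomes 1
(one tick late at worst) instead of being mistaken for "never".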

This should be backported where it applies.

(cherry picked from commit ed55ff878d5af35dae70f78023ab2141d36e5866)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration

Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact looks null since the task is also woken up, but better
not leave such tasks in the timer tree anyway.

This should be backported where it applies.

(cherry picked from commit f66bfcff96082ce5c98c635c5da7a9ba157a20af)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration

Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be mailers suddenly stopping.

This should be backported where it applies.

(cherry picked from commit 841be4cdd15b3d0834a478cc95ebda0f47171b4d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration

Now_ms can be zero nowadays, so it's not suitable for direct assignment to
t->expire, as there's a risk that the timer never wakes up once assigned
(TICK_ETERNITY). Let's use tick_add(now_ms, 0) for an immediate wakeup
instead. The impact here might be health checks suddenly stopping.

This should be backported where it applies.

(cherry picked from commit 2f287f14f355e734e512732e35aebf993d000792)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state

L7-retries may be ignored if server aborts are detected during the request
forwarding, when the request is already in DONE state.

When a request was fully processed (so in HTTP_MSG_DONE state) and is
waiting to be forwarded to the server, there is a test to detect server
aborts, to be able to report the error. However, this test must be skipped
if the response was not received yet, to let the response analyzers handle
the abort. It is important to properly handle the retries. This test must
only be performed if the response analysis was finished. It means the
response must be at least in HTTP_MSG_BODY state.

This patch should be backported as far as 2.8.

(cherry picked from commit a930e99f4699676ea72f72ba1fb99c953da0d74e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID

On server side, the H2 stream is first created with an unassigned ID (ID ==
0). Its ID is assigned when the request is emitted, before formatting the
HEADERS frame. However, the session may be aborted during that stage. We
must take care to not emit RST_STREAM frame for this stream, because it does
not exist yet for the server.

It is especially important to do so because, depending on the timing, it may
also happen before the H2 PREFACE was sent.

This patch must be backported to all stable versions. It is related to issue

(cherry picked from commit f065d0009888c394e5f93dfdaa2ae79958b2c2e2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: resolvers: Insert a non-executed resolution in front of the wait list

When a resolver is woken up to process DNS resolutions, it is possible to
trigger an infinite loop on the resolver's wait list because delayed
resolutions are always reinserted at the end of this list. This leads the
watchdog to kill the process. Re-inserting them in front of the list fixes
the bug.

When a resolver tries to send the queries for the resolutions in its wait
list, it may be unable to proceed for a resolution. This may happen because
the resolution must be skipped (no hostname to resolve, a resolution already
in-progress) or when an error occurred. In that case, the resolution is
re-inserted in the resolver's wait list to be retried later, on a next wakeup.

However, the resolution is inserted at the end of the wait list. So it is
immediately reevaluated, in the same execution loop, instead of being
delayed. Most of the time, it is not an issue because the resolution is
considered as not expired on the second run. But it is a problem when the
internal time wraps and is equal to 0. In that case, the resolution
expiration date is badly computed and it is always considered as expired. If
two or more resolutions are in that state, the resolver loops forever on
its wait list, until the process is killed by the watchdog.

So we can argue that the way the resolution expiration date is computed must
be fixed. And it would be true in a perfect world. However, the resolvers
code is so crappy that it is hard to be sure to not introduce regressions. It
is far easier to re-insert delayed resolutions in front of the wait
list. This fixes the issue and at worst, these resolutions will be evaluated
one time too many on the next wakeup and only if now_ms was equal to 0 on
the prior wakeup.
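
The difference between the two insertion points can be sketched with a small
model of a circular doubly-linked wait list walked with a "safe" iterator
(the next element is saved before the current one is processed). All names
below are hypothetical, every element is assumed to be delayed on each visit,
and a step cap stands in for the watchdog. This is not the resolvers code.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_node { struct sk_node *prev, *next; };

static void sk_list_init(struct sk_node *h) { h->prev = h->next = h; }

static void sk_list_del(struct sk_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Insert <n> right after the head (front of the list). */
static void sk_list_insert(struct sk_node *h, struct sk_node *n)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Insert <n> right before the head (end of the list). */
static void sk_list_append(struct sk_node *h, struct sk_node *n)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* One wakeup: each visited element is delayed and re-queued. Returns the
 * number of elements processed, capped at <max_steps>.
 */
static int sk_run_wait_list(struct sk_node *head, bool requeue_at_front, int max_steps)
{
	struct sk_node *cur = head->next, *next;
	int steps = 0;

	while (cur != head && steps < max_steps) {
		next = cur->next;
		steps++;
		sk_list_del(cur);
		if (requeue_at_front)
			sk_list_insert(head, cur);
		else
			sk_list_append(head, cur);
		cur = next;
	}
	return steps;
}

#define SK_MAX_NODES 8

/* Build a list of <count> delayed elements and run one wakeup on it. */
static int sk_demo_steps(int count, bool requeue_at_front, int max_steps)
{
	struct sk_node head, nodes[SK_MAX_NODES];

	assert(count >= 0 && count <= SK_MAX_NODES);
	sk_list_init(&head);
	for (int i = 0; i < count; i++)
		sk_list_append(&head, &nodes[i]);
	return sk_run_wait_list(&head, requeue_at_front, max_steps);
}
```

In this model, two delayed elements re-queued at the end keep the walk going
until the step cap is hit, whereas re-queuing at the front visits each element
exactly once per wakeup.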

This patch should be backported to all stable versions. On 2.2, LIST_ADD()
must be used instead of LIST_INSERT()

(cherry picked from commit 8f28dbeea94e11e2327362755f16d18b301fd153)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI

Before this fix, HAPROXY_CLI and HAPROXY_MASTER_CLI contained, along with
the CLI socket addresses, internal sockpairs which are used only for the
master CLI (reload sockpair and sockpair shared with a worker process). These
internal sockpairs always need to be hidden.

At the moment there is no client that uses sockpair addresses for the
stats listener or in order to connect to the master CLI. So, let's simply not
copy these internal sockpair addresses of MASTER and GLOBAL proxy listeners.

As listeners with sockpairs are skipped and they can be present in the
listeners list in any order, let's add the semicolon separator between
addresses only when some string is already saved in the trash and we are
sure that we are adding a new address to it. Otherwise, we could have such
weird output:

HAPROXY_MASTER_CLI=unix@/tmp/mcli.sock;;
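
The separator rule described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the sk_addr structure, the
sk_join_cli_addrs() helper and the caller-provided output buffer are all
hypothetical stand-ins for the listeners list and the trash buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical listener address: its text form and whether it is an
 * internal sockpair that must stay hidden.
 */
struct sk_addr {
	const char *str;
	bool is_sockpair;
};

/* Join visible addresses with ';' into <out>. The separator is emitted only
 * when something was already written and a new address is about to be added,
 * so skipped entries never leave a dangling ';'. Returns the output length.
 */
static size_t sk_join_cli_addrs(const struct sk_addr *addrs, size_t n,
                                char *out, size_t outsz)
{
	size_t len = 0;

	assert(out != NULL && outsz > 0);
	out[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		size_t alen, need;

		if (addrs[i].is_sockpair)
			continue; /* internal sockpair: not exported */

		alen = strlen(addrs[i].str);
		need = alen + (len ? 1 : 0);
		if (len + need >= outsz)
			break; /* not enough room left, stop here */

		if (len)
			out[len++] = ';';
		memcpy(out + len, addrs[i].str, alen + 1);
		len += alen;
	}
	return len;
}
```

Whatever the position of the sockpair entries in the input, the result holds
the visible addresses with exactly one ';' between two of them.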

This fix needs to be backported to all stable versions.

(cherry picked from commit 113745e6f0c0ef8fe89e89fdfdcc6ed994889d4a)
[cf: ctx adjt]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error

A packet which contains several split and out of order CRYPTO frames
may be parsed multiple times to ensure it can be handled via ncbuf. Only
3 iterations can be performed to prevent excessive CPU usage.

There is a risk of crash if packet parsing is interrupted after the maximum
number of iterations is reached, or if no progress can be made on the ncbuf.
This is because <frm> may be dangling after list_for_each_entry_safe().

The crash occurs on qc_frm_free() invocation, on the error path of
qc_parse_pkt_frms(). To fix it, always reset frm to NULL after
list_for_each_entry_safe() to ensure it is not dangling.

This should fix the new report on github issue #2776. This regression has
been triggered by the following patch :
1767196d5b2d8d1e557f7b3911a940000166ecda
BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As such, it must be backported up to 2.6, after the above patch.

(cherry picked from commit 2975e8805d9e84010bf5199a2365d650923dbb2c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete

Since 3.0, it is possible to assign a GUID to proxies, listeners and
servers. These objects are stored in a global tree guid_tree.

Proxies and listeners are static. However, servers may be added or
deleted at runtime, which implies that guid_tree must be protected. Fix
this by declaring a read-write lock to protect tree access.

For now, only guid_insert() and guid_remove() are protected using a
write lock. Outside of these, GUID tree is not accessed at runtime. If
server CLI commands are extended to support GUID as server identifier,
lookup operation should be extended with a read lock protection.

Note that during stat-file preloading, GUID tree is accessed for lookup.
However, as it is performed on startup which is single threaded, there
is no need for lock here. A BUG_ON() has been added to ensure this
precondition remains true.

This bug could cause a segfault when using dynamic servers with GUID.
However, it has never been reproduced so far.

This must be backported up to 3.0. To avoid a conflict issue, the
previous cleanup patch can be merged before it.

(cherry picked from commit 8e0e7d9d1af5b2dfec2e625d2c19dd034c36eb04)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

CLEANUP: guid: remove global tree export

guid_tree is not directly used outside of functions provided by the guid
module. Remove its export from the include file.

(cherry picked from commit b70880cdc9c01602197fd124c84ab264f6b4ddfb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

A ClientHello may be split across several different CRYPTO frames,
then mixed in a single QUIC packet. This is used notably by clients such
as Chrome to render the first Initial packet opaque to middleboxes.

Each packet frame is handled sequentially. Out-of-order CRYPTO frames
are buffered in a ncbuf, until gaps are filled and data is transferred
to the SSL stack. If CRYPTO frames are heavily split with small
fragments, buffering may fail as ncbuf does not support small gaps. This
causes the whole packet to be rejected and unacknowledged. It could be
solved if the client reemits its ClientHello after remixing its CRYPTO
frames.

This patch is written to improve CRYPTO frame parsing. CRYPTO frames
which cannot be buffered due to the ncbuf limitation are now stored
in a temporary list. Packet parsing is completed until all frames have
been handled. If the temporary list is not empty, reparsing is done on the
stored frames. With the newly buffered CRYPTO frames, the ncbuf insert
operation may succeed this time if the frame now covers a whole gap.
Reparsing will loop until either no progress can be made or it has been
done at most 3 times, to prevent excessive CPU utilization.
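
The bounded reparsing loop can be sketched as below. This is a simplified
model, not the QUIC code: the sk_ names are hypothetical, the ncbuf
limitation is replaced by a stricter stand-in rule (a fragment is accepted
only if it leaves no gap before it), and the first pass is counted among the
3 passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_PASSES 3 /* bound taken from the commit message */

/* Hypothetical CRYPTO fragment: offset and length in the stream. */
struct sk_frag {
	size_t off, len;
	bool done;
};

/* Stand-in for the buffer insertion: refuse any fragment which would leave
 * a gap before it, otherwise extend the contiguous area.
 */
static bool sk_try_insert(size_t *contig, const struct sk_frag *f)
{
	if (f->off > *contig)
		return false; /* cannot be buffered yet: defer it */
	if (f->off + f->len > *contig)
		*contig = f->off + f->len;
	return true;
}

/* Handle all fragments, retrying deferred ones until either no progress is
 * made or SK_MAX_PASSES passes were done. Returns the number of fragments
 * still pending.
 */
static size_t sk_parse_with_retries(struct sk_frag *frags, size_t n, size_t *contig)
{
	size_t pending = n;

	assert(frags != NULL || n == 0);

	for (int pass = 0; pass < SK_MAX_PASSES && pending > 0; pass++) {
		size_t before = pending;

		for (size_t i = 0; i < n; i++) {
			if (!frags[i].done && sk_try_insert(contig, &frags[i])) {
				frags[i].done = true;
				pending--;
			}
		}
		if (pending == before)
			break; /* no progress: give up early */
	}
	return pending;
}
```

With three fragments received in reverse order, each pass unblocks one more
fragment and everything is buffered by the third pass. With four, one
fragment is left pending, which corresponds to the case where the packet
cannot be fully handled.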

This patch should fix github issue #2776.

This should be backported up to 2.6, after a period of observation. Note
that it relies on the following refactor patches :
  MINOR: quic: extend return value of CRYPTO parsing
  MINOR: quic: use dynamically allocated frame on parsing
  MINOR: quic: simplify qc_parse_pkt_frms() return path

(cherry picked from commit 1767196d5b2d8d1e557f7b3911a940000166ecda)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: extend return value of CRYPTO parsing

qc_handle_crypto_frm() is the function used to handle a newly received
CRYPTO frame. Change its API to use a new dedicated return type. This
allows reporting whether the frame was properly handled, ignored because
already parsed previously, or rejected after a fatal error.

This commit does not have any functional changes. However, it simplifies
the qc_handle_crypto_frm() API by removing <fast_retrans> as output
parameter. Also, this patch will be necessary to support multiple
iterations of packet parsing for CRYPTO frames.

(cherry picked from commit d65e782c8cd2f8554404dd1424e2d64f3786edb1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: use dynamically allocated frame on parsing

qc_parse_pkt_frms() is the function responsible for parsing a received QUIC
packet. The payload is decoded and split into individual frames which are
then handled individually. Previously, the frame was allocated locally on
the stack. Change this to work on a dynamically allocated frame.

This commit does not bring any functional changes. However, it will be
useful to extend packet parsing. In particular, it will be necessary to
save some frames during parsing to reparse them after the others.

(cherry picked from commit 190fc97606560568bf4a611d92c1e70aed057843)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: simplify qc_parse_pkt_frms() return path

Change qc_parse_pkt_frms() return path for normal and error cases. Most
notably, it allows removing the local variable ret, as the return value is
now hardcoded on the normal and err labels. This also allows defining a
different trace for the error leaving code.

(cherry picked from commit 498a99a84956535a9ce2a61cb908d0fc81165606)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: quic: support wait-for-handshake

wait-for-handshake http-request action was completely ineffective with
QUIC protocol. This commit implements its support for QUIC.

QUIC MUX layer is extended to support wait-for-handshake. A new function
qcc_handle_wait_for_hs() is executed during qcc_io_process(). It detects
if MUX processing occurs after underlying QUIC handshake completion. If
this is the case, it indicates that early data may be received. As such,
connection is flagged with CO_FL_EARLY_SSL_HS, which is necessary to
block stream processing on wait-for-handshake action.

After this, qcc subscribes on quic_conn layer for RECV notification. This
is used to detect QUIC handshake completion. Thus,
qcc_handle_wait_for_hs() can be reexecuted one last time, to remove
CO_FL_EARLY_SSL_HS and notify every stream flagged as
SE_FL_WAIT_FOR_HS.

This patch must be backported up to 2.6, after a mandatory period of
observation. Note that it relies on the backport of the two previous
patches :
- MINOR: quic: notify connection layer on handshake completion
- BUG/MINOR: stream: unblock stream on wait-for-handshake completion

(cherry picked from commit 0918c41ef63964a986c627d20b8a1324de639cc2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: unblock stream on wait-for-handshake completion

wait-for-handshake is an http-request action which permits to delay the
processing of content received as TLS early data. The action yields
as long as connection handshake is in progress. In the meantime, stconn
is flagged with SE_FL_WAIT_FOR_HS.

When the handshake is finished, MUX layer is responsible for waking up
SE_FL_WAIT_FOR_HS flagged stconn instances to restart the stream
processing. On sc_conn_process(), SE_FL_WAIT_FOR_HS flag is removed and
stream layer is woken up.

However, the stream may remain blocked after MUX notification. sc_conn_recv()
may return 0 due to no new data reception, which prevents
sc_conn_process() execution. The stream is thus blocked until its
timeout.

To fix this, check in sc_conn_recv() for the handshake termination
condition. If true, explicitly return 1 to ensure sc_conn_process()
will be executed.

Note that this bug is not reproducible due to various conditions related
to early data implementation in haproxy. Indeed, connection layer
instantiation is always delayed until SSL handshake completion, which
prevents the handling of early data as expected.

This fix will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with the next commit up to 2.6,
after a mandatory period of observation.

(cherry picked from commit 73031e81cdd5cf5ba889ed4c676a4ae6284f5cf6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: quic: notify connection layer on handshake completion

Wake up connection layer on QUIC handshake completion via
quic_conn_io_cb. Select SUB_RETRY_RECV as this was previously unused by
QUIC MUX layer.

For the moment, QUIC MUX never subscribes for handshake completion.
However, this will be necessary for features such as the delaying of
early data forwarding via wait-for-handshake.

This patch will be necessary to implement wait-for-handshake support for
QUIC. As such, it must be backported with next commits up to 2.6,
after a mandatory period of observation.

(cherry picked from commit 5a5950e42d7060ee311e51438f4f16ad0effefd9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}

Using valgrind when running map_beg or map_str, the following error is
reported:

==242644== Conditional jump or move depends on uninitialised value(s)
==242644==    at 0x2E4AB1: pat_match_str (pattern.c:457)
==242644==    by 0x2E81ED: pattern_exec_match (pattern.c:2560)
==242644==    by 0x343176: sample_conv_map (map.c:211)
==242644==    by 0x27522F: sample_process_cnv (sample.c:1330)
==242644==    by 0x2752DB: sample_process (sample.c:1373)
==242644==    by 0x319917: action_store (vars.c:814)
==242644==    by 0x24D451: http_req_get_intercept_rule (http_ana.c:2697)

In fact, the error is legit, because in pat_match_{beg,str}, we
dereference the buffer on len+1 to check if a value was previously set,
and then decide to force NULL-byte if it wasn't set.

But the approach is no longer compatible with current architecture:
data past str.data is not guaranteed to be initialized in the buffer.
Thus we cannot dereference the value, else we expose ourselves to uninitialized
read errors. Moreover, the check is useless, because we systematically
set the ending byte to 0 when the conditions are met.

Finally, restoring the older value after the lookup is not relevant:
indeed, either the sample is marked as const and in such case it
is already duplicated, or the sample is not const and we forcefully add
a terminating NULL byte outside from the actual string bytes (since we're
past str.data), so as we didn't alter effective string data and that data
past str.data cannot be dereferenced anyway as it isn't guaranteed to be
initialized, there's no point in restoring previous uninitialized data.
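
The resulting approach can be sketched as below: the terminating byte is
written unconditionally when there is room, and the byte past the data is
never read. This is a minimal model under stated assumptions: sk_buf and
sk_match_str() are hypothetical, and the length-bounded comparison used when
the buffer is full is only a stand-in for what the real code does in that
case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string sample: <data> valid bytes in an area of <size> bytes.
 * Bytes past <data> are not guaranteed to be initialized.
 */
struct sk_buf {
	char *area;
	size_t data;
	size_t size;
};

/* Exact string match against a NUL-terminated pattern. */
static bool sk_match_str(struct sk_buf *b, const char *pattern)
{
	assert(b != NULL && pattern != NULL && b->data <= b->size);

	if (b->data < b->size) {
		/* Write-only access: force the trailing zero without ever
		 * looking at what was stored there before, and do not
		 * restore it afterwards.
		 */
		b->area[b->data] = '\0';
		return strcmp(b->area, pattern) == 0;
	}

	/* No room for a terminator (stand-in path): compare by length. */
	return strlen(pattern) == b->data &&
	       memcmp(b->area, pattern, b->data) == 0;
}
```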

It could be backported in all stable versions. But since this was only
detected by valgrind and isn't known to cause issues in existing
deployments, it's probably better to wait a bit before backporting it
to avoid any breakage, although the fix should be theoretically harmless.

(cherry picked from commit 8157c1caf26618d77b32be7906e4b608a8c0729b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

[RELEASE] Released version 3.0.6

Released version 3.0.6 with the following main changes :
    - MINOR: connection: No longer include stconn type header in connection-t.h
    - BUG/MINOR: h1: do not forward h2c upgrade header token
    - BUG/MINOR: h2: reject extended connect for h2c protocol
    - MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
    - BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
    - REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
    - REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
    - BUG/MINOR: mux-quic: report glitches to session
    - BUG/MEDIUM: cli: Be sure to catch immediate client abort
    - BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
    - BUG/MINOR: server: make sure the HMAINT state is part of MAINT
    - BUG/MINOR: cfgparse-global: fix allowed args number for setenv
    - BUILD: tools: only include execinfo.h for the real backtrace() function
    - MINOR: tools: do not attempt to use backtrace() on linux without glibc
    - MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
    - BUG/MEDIUM: stream: make stream_shutdown() async-safe
    - BUG/MINOR: queue: make sure that maintenance redispatches server queue
    - MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
    - BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
    - BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
    - BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
    - MINOR: debug: make mark_tainted() return the previous value
    - MINOR: chunk: drop the global thread_dump_buffer
    - MINOR: debug: split ha_thread_dump() in two parts
    - MINOR: debug: slightly change the thread_dump_pointer signification
    - MINOR: debug: make ha_thread_dump_done() take the pointer to be used
    - MINOR: debug: replace ha_thread_dump() with its two components
    - MEDIUM: debug: on panic, make the target thread automatically allocate its buf
    - BUG/MEDIUM: server: server stuck in maintenance after FQDN change
    - BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
    - BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
    - BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
    - BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
    - BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()
    - BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send
    - BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid
    - BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()
    - BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter
    - REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
    - BUG/MINOR: quic: avoid leaking post handshake frames
    - BUG/MEDIUM: quic: avoid freezing 0RTT connections
    - DOC: config: fix rfc7239 forwarded typo in desc
    - BUG/MINOR: mworker: fix mworker-max-reloads parser
    - BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
    - BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
    - BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
    - BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
    - MINOR: activity/memprofile: always return "other" bin on NULL return address
    - MINOR: activity/memprofile: show per-DSO stats
    - BUG/MINOR: server: fix dynamic server leak with check on failed init
    - BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
    - BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
    - BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
    - BUG/MEDIUM: server: fix race on servers_list during server deletion
    - BUILD: debug: silence a build warning with threads disabled
    - MINOR: pools: export the pools variable
    - MINOR: debug: place a magic pattern at the beginning of post_mortem
    - MINOR: debug: place the post_mortem struct in its own section.
    - MINOR: debug: store important pointers in post_mortem
    - MINOR: cli: remove non-printable characters from 'debug dev fd'
    - BUG/MINOR: trace: stop rewriting argv with -dt
    - BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
    - DOC: config: add missing glitch_{cnt,rate} data types
    - DOC: config: add missing glitch_{cnt,rate} sample definitions
    - BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
    - BUG/MINOR: http-ana: Report internal error if an action yields on a final eval
    - MINOR: stream: Save last evaluated rule on invalid yield
    - BUG/MEDIUM: promex: Fix dump of extra counters
    - DOC: config: document connection error 44 (reverse connect failure)
    - CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
    - BUG/MINOR: quic: fix malformed probing packet building
    - MINOR: cli/debug: show dev: add cmdline and version
    - MINOR: stream/stats: Expose the current number of streams in stats
    - MINOR: stream/stats: Expose the total number of streams ever created in stats
    - BUG/MINOR: stats: Fix the name for the total number of streams created
    - MINOR: connection: add more connection error codes to cover common errno
    - MINOR: rawsock: set connection error codes when returning from recv/send/splice
    - MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name
    - MINOR: debug: print gdb hints when crashing
    - MINOR: debug: do not limit backtraces to stuck threads
    - MINOR: debug: also add a pointer to struct global to post_mortem
    - MINOR: debug: also add fdtab and activity to struct post_mortem
    - MINOR: debug: remove the redundant process.thread_info array from post_mortem
    - MINOR: wdt: move the local timers to a struct
    - MINOR: debug: add a function to dump a stuck thread
    - DEBUG: wdt: better detect apparently locked up threads and warn about them
    - DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
    - DEBUG: wdt: make the blocked traffic warning delay configurable
    - DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
    - BUILD: debug: also declare strlen() in __ABORT_NOW()
    - BUILD: Missing inclusion header for ssize_t type
    - MINOR: debug: move the "recover now" warn message after the optional notes

MINOR: debug: move the "recover now" warn message after the optional notes

At the end of the too long processing warning added by commit 0950778b3a
("MINOR: debug: add a function to dump a stuck thread"), there can be some
optional notes about lua and memory trimming. However it's a bit awkward
that they appear after the "trying to recover now" message. Let's just move
that message after the notes.

(cherry picked from commit 5dcf2012fc035e790c118590a12240e0769fbcaa)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUILD: Missing inclusion header for ssize_t type

Compilation issue detected as follows by gcc:

In file included from src/ncbuf.c:19:
src/ncbuf.c: In function 'ncb_write_off':
include/haproxy/bug.h:144:10: error: unknown type name 'ssize_t'
144 | extern ssize_t write(int, const void *, size_t); \

(cherry picked from commit bc9821fd26b3a118415f579cdfa6e430b03f96da)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUILD: debug: also declare strlen() in __ABORT_NOW()

Previous commit 8f204fa8ae ("MINOR: debug: print gdb hints when crashing")
broke the build on the CI where strlen() isn't known. Let's forward-declare it in
the __ABORT_NOW() functions, just like write(). No backport is needed.

(cherry picked from commit 2d27c80288c0acee85326c0574ed70d0b2e486ef)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info

Every time a warning is issued about traffic being blocked, let's
increment a global counter so that we can check for this situation
in "show info".

(cherry picked from commit 84dd05e7d83eeee4e7b8c64dc656cdd608c78806)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: wdt: make the blocked traffic warning delay configurable

The new global "warn-blocked-traffic-after" allows one to configure
after how much time a warning should be emitted when traffic is blocked.

(cherry picked from commit 6127e5a4e9722c1b47f5a9810fd41892b675557b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: cli: make it possible for "debug dev loop" to trigger warnings

A new argument "warn" allows forcing the emission of a warning while
stuck in the loop by making the internal state inconsistent.

(cherry picked from commit 7337c422247b7af342048cfd48ac0aa2a4b7335e)
[wt: backported only to help testing the watchdog backports]
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: wdt: better detect apparently locked up threads and warn about them

In order to help users detect when threads are behaving abnormally, let's
try to emit a warning when one is no longer making any progress. This will
allow to catch faulty situations more accurately, instead of occasionally
triggering just after the long task. It will also let users know that there
is something wrong with their configuration, and inspect the call trace to
figure whether they're using excessively long rules or Lua for example (the
usual warnings about lua-load vs lua-load-per-thread are still reported).

The warning will only be emitted for threads not yet marked as stuck so
as not to interfere with panic dumps and avoid sending a warning just
before a panic. A tainted flag is set when this happens however (0x2000).

(cherry picked from commit 148eb5875fb7e6c46c0a9eac486dcb7b3bca931d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: add a function to dump a stuck thread

There's currently no way to just emit a warning informing that a thread
is stuck without crashing. This is a problem because sometimes users
would benefit from this info to clean up their configuration (e.g. abuse
of map_regm, lua-load etc).

This commit adds a new function ha_stuck_warning() that will emit a
warning indicating that the designated thread has been stuck for XX
milliseconds, with a number of streams blocked, and will make that
thread dump its own state. The warning will then be sent to stderr,
along with some reminders about the impacts of such situations to
encourage users to fix their configuration.

In order not to disrupt operations, a local 4kB buffer is allocated
in the stack. This should be quite sufficient.

For now the function is not used.

(cherry picked from commit 0950778b3a13fe31ff83223827d6692076cba5e5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: wdt: move the local timers to a struct

Better have a local struct for per-thread timers, as this will allow us
to store extra info that are useful to improve accurate reporting.

(cherry picked from commit 3f4d646849a253f3dc15972e40023495725efe98)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: remove the redundant process.thread_info array from post_mortem

That one is huge and unneeded since we now have the pointer to the
whole thread_info[] array, which does contain the freshest version
of these info and many more. Let's just get rid of it entirely.

(cherry picked from commit 52240680f1d98cc7eb1e762a04becaf54660e96b)
[wt: adjusted ctx in feed_post_mortem_late()]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: also add fdtab and activity to struct post_mortem

These ones are often used as well when trying to analyse sequences of
events, let's add them.

(cherry picked from commit da5cf52173853bcacb12c6ebb045fe395d4b3ba6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: also add a pointer to struct global to post_mortem

The pointer to struct global is also an important element to have in
post_mortem given that it's used a lot to take decisions in the code.
Let's just add it. It's worth noting that we could get rid of argc/argv
at this point since they're also present in the global struct, but they
don't cost much there anyway.

(cherry picked from commit 2f04ebe14aca91f4a0fafcd03a0f310d98d97aaf)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: do not limit backtraces to stuck threads

Historically for size limitation reasons, we would only dump the
backtrace of stuck threads. The problem is that when triggering
a panic or other reasons, we have no backtrace, which effectively
limits it to the watchdog timer. It's also visible in "show threads"
which used to report backtraces for all threads in 2.4 and displays
none nowadays, making its use much more limited.

A first approach could be to just dump the thread that triggers the
panic (in addition to stuck threads). But that remains quite limited
since "show threads" would still display nothing. This patch takes a
better approach consisting in dumping all non-idle threads. This way
the output is less polluted than with the older approach (no need to
dump all those waiting in the poller), and all active threads are
visible, in panics as well as in "show threads". As such, the CLI
command "debug dev panic" now dumps backtraces again. This is already
a benefit which will ease testing of various locations against the
ability to resolve useful symbols.

(cherry picked from commit 4adb2d864d7e3ca9df1e39beabf7b2ffa5aee35c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: print gdb hints when crashing

To make bug reporting easier for users, when crashing, let's suggest
what to do. Typically when a BUG_ON() matches, only the current thread
is useful the vast majority of the time, while when the watchdog
triggers, all threads are interesting.

The messages are printed at the end after the dump. We may adjust these
with wiki links in the future if more detailed instructions are relevant.

(cherry picked from commit 8f204fa8aeadef3faea4471ba9cfd93d9d168960)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name

These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).

The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.

(cherry picked from commit 601b34fe7bd50c733a437f26817580bbd56c8d56)
Signed-off-by: Willy Tarreau <w@1wt.eu>