Repositories - haproxy-3.1.git/log

DOC: config: recommend disabling libc-based resolution with resolvers

Using both libc and haproxy resolvers can lead to hard to diagnose issues
when their bevahiour diverges; recommend using only one type of resolver.

Should be backported to stable versions.

Link: https://www.mail-archive.com/haproxy@formilux.org/msg45663.html
Co-authored-by: Lukas Tribus <lukas@ltri.eu>
(cherry picked from commit 4e20fab7ac85c0018e77a52657f773982f8c1875)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: config: restore default values for resolvers hold directive

Default values for hold directive (resolver context) used to be documented
but this was lost when the keyword description was reworked in 24b319b
("Default value is 10s for "valid", 0s for "obsolete" and 30s for
others.")

Restoring the part that describes the default value.

It may be backported to all stable versions with 24b319b

(cherry picked from commit 4bceca83fc14269ef8f2528294bc3cdb3fcc6c99)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: ring: refer to newer RFC5424

In the ring configuration example we refer to RFC3164 - the original BSD
syslog protocol without support for structured data (SDATA).

Let's refer to RFC5424 instead so SDATA is by default forwarded if
someone copy & pastes from the documentation:

https://discourse.haproxy.org/t/structured-data-lost-when-forwarding-logs-voa-syslog-forwarding-feature/11741/5

Should be backported to 2.6.

(cherry picked from commit 5f9ce99c799eaa8f58518201b58c32d970b10cdf)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: peers: also limit the number of incoming updates

There's a configurable limit to the number of messages sent to a
peer (tune.peers.max-updates-at-once), but this one is not applied to
the receive side. While it can usually be OK with default settings,
setups involving a large tune.bufsize (1MB and above) regularly
experience high latencies and even watchdogs during reloads because
the full learning process sends a lot of data that manages to fill
the entire buffer, and due to the compactness of the protocol, 1MB
of buffer can contain more than 100k updates, meaning taking locks
etc during this time, which is not workable.

Let's make sure the receiving side also respects the max-updates-at-once
setting. For this it counts incoming updates, and refrains from
continuing once the limit is reached. It's a bit tricky to do because
after receiving updates we still have to send ours (and possibly some
ACKs) so we cannot just leave the loop.

This issue was reported on 3.1 but it should progressively be backported
to all versions having the max-updates-at-once option available.

(cherry picked from commit b26f3fe54e0ca5d54cc4fbcd771bfefbe728991c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: sink: detect and warn when using "send-proxy" options with ring servers

using "send-proxy" or "send-proxy-v2" option on a ring server is not
relevant nor supported. Worse, on 2.4 it causes haproxy process to
crash as reported in GH #2965.

Let's be more explicit about the fact that this keyword is not supported
under "ring" context by ignoring the option and emitting a warning message
to inform the user about that.

Ideally, we should do the same for peers and log servers. The proper way
would be to check servers options during postparsing but we currently lack
proper cross-type server postparsing hooks. This will come later and thus
will give us a chance to perform the compatibilty checks for server
options depending on proxy type. But for now let's simply fix the "ring"
case since it is the only one that's known to cause a crash.

It may be backported to all stable versions.

(cherry picked from commit 098a5e5c0b9e6821c03945295dd1350e5d834eb3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DEBUG: mux-spop: Review some trace messages to adjust the message or the level

Some trace messages were not really accurrate, reporting a CLOSED connection
while only an error was reported on it. In addition, an TRACE_ERROR() was
used to report a short read on HELLO/DISCONNECT frames header. But it is not
an error. a TRACE_DEVEL() should be used instead.

This patch could be backported to 3.1 to ease future backports.

(cherry picked from commit e2ae8a74e87d32730e3edba26626518fd33e3a58)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-spop; Don't report a read error if there are pending data

When an read error is detected, no error must be reported on the SPOP
connection is there are still some data to parse. It is important to be sure
to process all data before reporting the error and be sure to not truncate
received frames. However, we must also take care to handle short read case
to not wait data that will never be received.

This patch must be backported to 3.1.

(cherry picked from commit 6e46f0bf934360186aa878d75f8bf81eaae0c141)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: spop-conn: Report short read for partial frames payload

When a frame was not fully received, a short read must be reported on the
SPOP connection to help the demux to handle truncated frames. This was
performed for frames truncated on the header part but not on the payload
part. It is now properly detected.

This patch must be backported to 3.1.

(cherry picked from commit 71feb49a9f0ef1142970fca98ca8eebd516928a3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-spop: Properly handle CLOSING state

The CLOSING state was not handled at all by the SPOP multiplexer while it is
mandatory when a DISCONNECT frame was sent and the mux should wait for the
DISCONNECT frame in reply from the agent. Thanks to this patch, it should be
fixed.

In addition, if an error occurres during the AGENT HELLO frame parsing, the
SPOP connection is no longer switched to CLOSED state and remains in ERROR
state instead. It is important to be able to send the DISCONNECT frame to
the agent instead of closing the TCP connection immediately.

This patch depends on following commits:

  * BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
  * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
  * BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error
  * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

All the series must be backported to 3.1.

(cherry picked from commit ddc5f8d92e3613ec7060b0ee8bdb220866182e86)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state

SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame
parsing, were removed. The demux process now relies on the demux stream ID
to know if it is waiting for the frame header or the frame
payload. Concretly, when the demux stream ID is not set (dsi == -1), the
demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is
waiting for the frame payload. It is especially important to be able to
properly handle DISCONNECT frames sent by the agents.

SPOP_CS_RUNNING state is introduced to know the hello handshake was finished
and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK
frames with the agents.

It depends on the following fixes:

* MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
* BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

This change will be mandatory for the next fix. It must be backported to 3.1
with the commits above.

(cherry picked from commit a3940614c24049d2c1ee9f686e0579e3a2e49b33)
[wt: adj ctx WRT tevt which is absent from 3.1
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing

After the ACK frame was parsed, it is useless to set the SPOP connection
state to SPOP_CS_FRAME_H state because this will be automatically handled by
the demux function. If it is not an issue, but this will simplify changes
for the next commit.

(cherry picked from commit 6b0f7de4e34c7da1ec7ae82e8e8a298823e50d72)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-spop: Don't open new streams for SPOP connection on error

Till now, only SPOP connections fully closed or those with a TCP connection on
error were concerned. But available streams could be reported for SPOP
connections in error or closing state. But in these states, no NOTIFY frames
will be sent and no ACK frames will be parsed. So, no new SPOP streams should be
opened.

This patch should be backported to 3.1.

(cherry picked from commit 197eaaadfd636441ca6cc5d3e35c52197537bbd6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

The demux stream ID of a SPOP connection, used when received frames are
parsed, must be a signed integer because it is set to -1 when the SPOP
connection is initialized. It will be important for the next fix.

This patch must be backported to 3.1.

(cherry picked from commit cbc10b896e647d5c62d9b021f3e2dd391d14dc4f)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-spop: Don't report error for stream if ACK was already received

When a SPOP connection was closed or was in error, an error was
systematically reported on all its SPOP streams. However, SPOP streams that
already received their ACK frame must be excluded. Otherwise if an agent
sends a ACK and close immediately, the ACK will be ignored because the SPOP
stream will handle the error first.

This patch must be backported to 3.1.

(cherry picked from commit 6d68beace5435f24abc4292f22021e4465b4dcee)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-spop: Use the right bitwise operator in spop_ctl()

Becaues of a typo, '||' was used instead of '|' to test the SPOP conneciton
flags and decide if the mux is ready or not. The regression was introduced
in the commit fd7ebf117 ("BUG/MEDIUM: mux-spop: Wait end of handshake to
declare a spop connection ready").

This patch must be backported to 3.1 with the commit above.

(cherry picked from commit 2ccfebcebff254c95b6a70e987f3d960b38499c1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-spop: Wait end of handshake to declare a spop connection ready

A SPOP connection must not be considered as ready while the hello handshake
is not finished with success. In addition, no error or shutdown must have
been reported for the underlying connection. Otherwise a freshly openned
spop connexion may be reused while it is in fact dead, leading to a
connection retry.

This patch must be backported to 3.1.

(cherry picked from commit fd7ebf117bdefdcfbd1c30a079585867eb22a57d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: mux-spop: Respect the negociated max-frame-size value to send frames

When a SPOP connection is opened, the maximum size for frames is negociated.
This negociated size is properly used when a frame is received and if a too
big frame is detected, an error is triggered. However, the same was not
performed on the sending path. No check was performed on frames sent to the
agent. So it was possible to send frames bigger than the maximum size
supported by the the SPOE agent.

Now, the size of NOTIFY and DISCONNECT frames is checked before sending them
to the agent.

Thanks to Miroslav to have reported the issue.

This patch must be backported to 3.1.

(cherry picked from commit ce8c2d359bc338a0ac88f117bca440c93b749f15)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: spoe: Don't report error on applet release if filter is in DONE state

When the SPOE applet was released, if a SPOE filter context was still
attached to it, an error was reported to the filter. However, there is no
reason to report an error if the ACK message was already received. Because
of this bug, if the ACK message is received and the SPOE connection is
immediately closed, this prevents the ACK message to be processed.

This patch should be backported to 3.1.

(cherry picked from commit 1cd30c998bce72166855d6b83a8e6865c33095a1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: hlua: Fix Channel:data() and Channel:line() to respect documentation

When the channel API was revisted, the both functions above was added. An
offset can be passed as argument. However, this parameter could be reported
to be out of range if there was not enough input data was received yet. It
is an issue, especially with a tcp rule, because more data could be
received. If an error is reported too early, this prevent the rule to be
reevaluated later. In fact, an error should only be reported if the offset
is part of the output data.

Another issue is about the conditions to report 'nil' instead of an empty
string. 'nil' was reported when no data was found. But it is not aligned
with the documentation. 'nil' must only be returned if no more data cannot
be received and there is no input data at all.

This patch should fix the issue #2716. It should be backported as far as 2.6.

(cherry picked from commit a5de0e15959a241afc9afb39f1b02a2517894f7b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: threads: fix soft-stop without multithreading support

When thread support is disabled ("USE_THREAD=" or "USE_THREAD=0" when
building), soft-stop doesn't work as haproxy never ends after stopping
the proxies.

This used to work fine in the past but suddenly stopped working with
ef422ced91 ("MEDIUM: thread: make stopping_threads per-group and add
stopping_tgroups") because the "break;" instruction under the stopping
condition is never executed when support for multithreading is disabled.

To fix the issue, let's add an "else" block to run the "break;"
instruction when USE_THREAD is not defined.

It should be backported up to 2.8

(cherry picked from commit 7d057e56af86cdc98c5388986931c247dbafa5b3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl/ckch: always free() the previous entry during parsing

The ckch_conf_parse() function is the generic function which parses
crt-store keywords from the crt-store section, and also from a crt-list.

When having multiple time the same keyword, a leak of the previous value
happens. This patch ensure that the previous value is always freed
before overwriting it.

This patch should be backported as far as 3.0.

(cherry picked from commit 311e0aa5c712f38700b7b185c0d5f1aa33c48613)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: quic: free stream_desc on all data acked

The following patch simplifies qc_stream_desc_ack(). The qc_stream_desc
instance is not freed anymore, even if all data were acknowledged. As
implies by the commit message, the caller is responsible to perform this
cleaning operation.
f4a83fbb14bdd14ed94752a2280a2f40c1b690d2
MINOR: quic: do not remove qc_stream_desc automatically on ACK handling

However, despite the commit instruction, qc_stream_desc_free()
invokation was not moved in the caller. This commit fixes this by adding
it after stream ACK handling. This is performed only when a transfer is
completed : all data is acknowledged and qc_stream_desc has been
released by its MUX stream instance counterpart.

This bug may cause a significant increase in memory usage when dealing
with long running connection. However, there is no memory leak, as every
qc_stream_desc attached to a connection are finally freed when quic_conn
instance is released.

This must be backported up to 3.1.

(cherry picked from commit 3fdb039a99cdee543f4bbbfc983a51eb18f7f1f0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: fix too many args detection for commands

d3f928944 ("BUG/MINOR: cli: Issue an error when too many args are passed
for a command") added a new check to prevent the command to run when
too many arguments are provided. In this case an error is reported.

However it turns out this check (despite marked for backports) was
ineffective prior to 20ec1de21 ("MAJOR: cli: Refacor parsing and
execution of pipelined commands") as 'p' pointer was reset to the end of
the buffer before the check was executed.

Now since 20ec1de21, the check works, but we have another issue: we may
read past initialized bytes in the buffer because 'p' pointer is always
incremented in a while loop without checking if we increment it past 'end'
(This was detected using valgrind)

To fix the issue introduced by 20ec1de21, let's only increment 'p' pointer
if p < end.

For 3.2 this is it, now for older versions, since d3f928944 was marked for
backport, a sligthly different approach is needed:

- conditional p increment must be done in the loop (as in this patch)
- max arg check must moved above "fill unused slots" comment where p is
assigned to the end of the buffer

This patch should be backported with d3f928944.

(cherry picked from commit 976e0bd32f6d87fccf351edf1b5a18f3e0c5c6ab)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: peers: hold the refcnt until updating ts->seen

In peer_treat_updatemsg(), we call stktable_touch_remote() after
releasing the write lock on the TS, asking it to decrement the
refcnt, then we update ts->seen. Unfortunately this is racy and
causes the issue that Christian reported in issue #2959.

The sequence of events is very hard to trigger manually, but what happens
is the following:

T1.  stktable_touch_remote(table, ts, 1);
      -> at this point the entry is in the mt_list, and the refcnt is zero.

      T2.  stktable_trash_oldest() or process_table_expire()
           -> these can run, because the refcnt is now zero.
              The entry is cleanly deleted and freed.

T1.  HA_ATOMIC_STORE(&ts->seen, 1)
      -> we dereference freed memory.

A first attempt at a fix was made by keeping the refcnt held during
all the time the entry is in the mt_list, but this is expensive as
such entries cannot be purged, causing lots of skips during
trash_oldest_data(). This managed to trigger watchdogs, and was only
hiding the real cause of the problem.

The correct approach clearly is to maintain the ref_cnt until we
touch ->seen. That's what this patch does. It does not decrement
the refcnt, while calling stktable_touch_remote(), and does it
manually after touching ->seen. With this the problem is gone.

Note that a reproducer involves the following:
  - a config with 10 stick-ctr tracking the same table with a
    random key between 10M and 100M depending on the machine.
  - the expiration should be between 10 and 20s. http_req_cnt
    is stored and shared with the peers.
  - 4 total processes with such a config on the local machine,
    each corresponding to a different peer. 3 of the peers are
    bound to half of the cores (all threads) and share the same
    threads; the last process is bound to the other half with
    its own threads.
  - injecting at full load, ~256 conn, on the shared listening
    port. After ~2x expiration time to 1 minute the lone process
    should segfault in pools code due to a corrupted by_lru list.

This problem already exists in earlier versions but the race looks
narrower. Given how difficult it is to trigger on a given machine
in its current form, it's likely that it only happens once in a
while on stable branches. The fix must be backported wherever the
code is similar, and there's no hope to reproduce it to validate
the backport.

Thanks again to Christian for his amazing help!

(cherry picked from commit 006a3acbde309f11190a003c4ac1c026480444e4)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: reject invalid max_udp_payload size

Add a checks on received max_udp_payload transport parameters. As
defined per RFC 9000, values below 1200 are invalid, and thus the
connection must be closed with TRANSPORT_PARAMETER_ERROR code.

Prior to this patch, an invalid value was silently ignored.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit 4bc7aa548adcd9ee424c65cd346e94f8749dce64)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: fix TP reject on invalid max-ack-delay

Checks are implemented on some received transport parameter values,
to reject invalid ones defined per RFC 9000. This is the case for
max_ack_delay parameter.

The check was not properly implemented as it only reject values strictly
greater than the limit set to 2^14. Fix this by rejecting values of 2^14
and above. Also, the proper error code TRANSPORT_PARAMETER_ERROR is now
set.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit ffabfb0fc3ad8774024d152fc31a7711a8a9c382)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: use proper error code on invalid received TP value

As per RFC 9000, checks must be implemented to reject invalid values for
received transport parameters. Such values are dependent on the
parameter type.

Checks were already implemented for ack_delay_exponent and
active_connection_id_limit, accordingly with the QUIC specification.
However, the connection was closed with an incorrect error code. Fix
this to ensure that TRANSPORT_PARAMETER_ERROR code is used as expected.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit b60a17aad768369ab7e328949112b50cd78bc987)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: reject retry_source_cid TP on server side

Close the connection on error if retry_source_connection_id transport
parameter is received. This is specified by RFC 9000 as this parameter
must not be emitted by a client. Previously, it was silently ignored.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit 10f1f1adce032742d60fe14ee780871c4e6a1db1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: use proper error code on invalid server TP

This commit is similar to the previous one. It fixes the error code
reported when dealing with invalid received transport parameters. This
time, it handles reception of original_destination_connection_id,
preferred_address and stateless_reset_token which must not be emitted by
the client.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit a54fdd3d926fabfc438dbaedbd3d08814fb99862)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: use proper error code on missing CID in TPs

Handle missing received transport parameter value
initial_source_connection_id / original_destination_connection_id.
Previously, such case would result in an error reported via
quic_transport_params_store(), which triggers a TLS alert converted as
expected as a CONNECTION_CLOSE. The issue is that the error code
reported in the frame was incorrect.

Fix this by returning QUIC_TP_DEC_ERR_INVAL for such conditions. This is
directly handled via quic_transport_params_store() which set the proper
TRANSPORT_PARAMETER_ERROR code for the CONNECTION_CLOSE. However, no
error is reported so the SSL handshake is properly terminated without a
TLS alert. This is enough to ensure that the CONNECTION_CLOSE frame will
be emitted as expected.

This should be backported up to 2.6. Note that is relies on previous
patch "MINOR: quic: extend return value on TP parsing".

(cherry picked from commit df6bd4909e8dfa64337662a78adc39ca702c1dc7)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: quic: extend return value during TP parsing

Extend API used for QUIC transport parameter decoding. This is done via
the introduction of a dedicated enum to report the various error
condition detected. No functional change should occur with this patch,
as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the
connection closure via a TLS alert.

This patch will be necessary to properly reject transport parameters
with the proper CONNECTION_CLOSE error code. As such, it should be
backported up to 2.6 with the following series.

(cherry picked from commit 294bf26c06404e35edd4ad3381ccb26e835bd7a1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: quic: rename min/max fields for congestion window algo

There was some possible confusion between fields related to congestion
window size min and max limit which cannot be exceeded, and the maximum
value previously reached by the window.

Fix this by adopting a new naming scheme. Enforced limit are now renamed
<limit_max>/<limit_min>, while the previously reached max value is
renamed <cwnd_last_max>.

This should be backported up to 3.1.

(cherry picked from commit 2eb1b0cd96f663b9260ee48921612566417b3b8d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: proxy: only use proxy_inc_fe_cum_sess_ver_ctr() with frontends

proxy_inc_fe_cum_sess_ver_ctr() was implemented in 9969adbc
("MINOR: stats: add by HTTP version cumulated number of sessions and
requests")

As its name suggests, it is meant to be called for frontends, not backends

Also, in 9969adbc, when used under h1_init(), a precaution is taken to
ensure that the function is only called with frontends.

However, this precaution was not applied in h2_init() and qc_init().

Due to this, it remains possible to have proxy_inc_fe_cum_sess_ver_ctr()
being called with a backend proxy as parameter. While it did not cause
known issues so far, it is not expected and could result in bugs in the
future. Better fix this by ensuring the function is only called with
frontends.

It may be backported up to 2.8

(cherry picked from commit b39825ee45150415d7ed64b7ce785bb946f727bd)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: connections: Report connection closing in conn_create_mux()

Add an extra parametre to conn_create_mux(), "closed_connection".
If a pointer is provided, then let it know if the connection was closed.
Callers have no way to determine that otherwise, and we need to know
that, at least in ssl_sock_io_cb(), as if the connection was closed we
need to return NULL, as the tasklet was free'd, otherwise that can lead
to memory corruption and crashes.
This should be backported if 9240cd4a2771245fae4d0d69ef025104b14bfc23
is backported too.

(cherry picked from commit b138eab302531bc40248337132ac01a541a111be)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-h1: Fix trace message in h1_detroy() to not relay on connection

h1_destroy() may be called to release a H1C after a multiplexer upgrade. In
that case, the connection is no longer attached to the H1C. It must not be
used in the h1 trace message because the connection context is no longer a H1C.

Because of this bug, when a H1>H2 upgrade is performed, a crash may be
experienced if the H1 traces are enabled.

This patch must be backport to all stable versions.

(cherry picked from commit 7dc4e94830ef8fe9b0ffc2901d63b9f3183ed12c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-h1: Don't pretend connection was released for TCP>H1>H2 upgrade

When an applicative upgrade of the H1 multiplexer is performed, we must not
pretend the connection was released. Indeed, in that case, a H1 stream is
still their with a stream connector attached on it. It must be detached
first before releasing the H1 connection and the underlying connection. So
it is important to not pretend the connection was already released.

Concretely, in that case h1_process() must return 0 instead of -1. It is
minor error because, AFAIK, it is harmless. But it is not correct. So let's
fix it to avoid futur bugs.

To be clear, this happens when a TCP connection is upgraded to H1 connection
and a H2 preface is detected, leading to a second upgrade from H1 to H2.

This patch may be backport to all stable versions.

(cherry picked from commit 2dc334be61a0a9feaa7b844e122c2c4ce37e1b1a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: dns: prevent ds accumulation within dss

when dns session callback (dns_session_release()) is called upon error
(ie: when some pending queries were not sent), we try our best to
re-create the applet in order to preserve the pending queries and give
them a chance to be retried. This is done at the end of
dns_session_release().

However, doing so exposes to an issue: if the error preventing queries
from being sent is still encountered over and over the dns session could
stay there indefinitely. Meanwhile, other dns sessions may be created on
the same dns_stream_server periodically. If previous failing dns sessions
don't terminate but we also keep creating new ones, we end up accumulating
failing sessions on a given dns_stream_server, which can eventually cause
ressource shortage.

This issue was found when trying to address ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers")

To fix it, we track the number of failed consecutive sessions for a given
dns server. When we reach the threshold (set to 100), we consider that the
link to the dns server is broken (at least temporarily) and we force
dns_session_new() to fail, so that we stop creating new sessions until one
of the existing one eventually succeeds.

A workaround for this fix consists in setting the "maxconn" parameter on
nameserver directive (under resolvers section) to a reasonnable value so
that no more than "maxconn" sessions may co-exist on the same server at
a given time.

This may be backported to all stable versions.
("CLEANUP: dns: remove unused dns_stream_server struct member") may be
backported to ease the backport.

(cherry picked from commit 5288b39011b2449bfa896f7932c7702b5a85ee77)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers

As reported by Lukas Tribus on the mailing list [1], trying to connect to
a nameserver with invalid network settings causes haproxy to retry a new
connection attempt immediately which eventually causes unexpected CPU usage
on the thread responsible for the applet (namely 100% on one CPU will be
observed).

This can be reproduced with the test config below:

resolvers default
  nameserver ns1 tcp4@8.8.8.8:53 source 192.168.99.99
listen listen
  mode http
  bind :8080
  server s1 www.google.com resolvers default init-addr none

To fix this the issue, we add a temporisation of one second between a new
connection attempt is retried. We do this in dns_session_create() when we
know that the applet was created in the release callback (when previous
query attempt was unsuccessful), which means initial connection is not
affected.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45665.html

This should fix GH #2909 and may be backported to all stable versions.
This patch depends on ("MINOR: applet: add appctx_schedule() macro")

(cherry picked from commit 27236f2218e57d839197e5b6479dc8e67e484e32)
Signed-off-by: Willy Tarreau <w@1wt.eu>

CLEANUP: dns: remove unused dns_stream_server struct member

dns_stream_server "max_slots" is unused, let's get rid of it

(cherry picked from commit 14ebe95a10e7fdc003e369b58463a4744e88fa8e)
[wt: just to help with next backport]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: applet: add appctx_schedule() macro

Just like task_schedule() but for applets to wakeup an applet at a
specific time, leverages _task_schedule() internally

(cherry picked from commit 1ced5ef2fdf9a4b4ab0941849be0f4627066331f)
[wt: needed for next dns backport]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MAJOR: listeners: transfer connection accounting when switching listeners

Since we made it possible for a bind_conf to listen to multiple thread
groups with shards in 2.8 with commit 9d360604bd ("MEDIUM: listener:
rework thread assignment to consider all groups"), the per-listener
connection count was not properly transferred to the target listener
with the connection when switching to another thread group. This results
in one listener possibly reaching high values and another one possibly
reaching negative values. Usually it's not visible, unless a maxconn is
set on the bind_conf, in which case comparisons will quickly put an end
to the willingness to accept new connections.

This problem only happens when thread groups are enabled, and it seems
very hard to trigger it normally, it only impacts sockets having a single
shard, hence currently the CLI (or any conf with "bind ... shards 1"),
where it can be reproduced with a config having a very low "maxconn" on
the stats socket directive (here, 4), and issuing a few tens of
socat <<< "show activity" in parallel, or sending HTTP connections to a
single-shared listener. Very quickly, haproxy stops accepting connections
and eats CPU in the poller which tries to get its connections accepted.

A BUG_ON(l->nbconn<0) after HA_ATOMIC_DEC() in listener_release() also
helps spotting them better.

Many thanks to Christian Ruppert who once again provided a very accurate
report in GH #2951 with the required data permitting this analysis.

This fix must be backported to 2.8.

(cherry picked from commit f1064c7382e170cd59dd7984808bc436e2b3de7f)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: Issue an error when too many args are passed for a command

When a command is parsed to split it in an array of arguments, by default,
at most 64 arguments are supported. But no warning was emitted when there
were too many arguments. Instead, the arguments above the limit were
silently ignored. It could be an issue for some commands, like "add server",
because there was no way to know some arguments were ignored.

Now an error is issued when too many arguments are passed and the command is
not executed.

This patch should be backported to all stable versions.

(cherry picked from commit d3f92894471dd7306f857156a5820c53c10f8d88)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure

To handle out-of-order received CRYPTO frames, a ncbuf instance is
allocated. This is done via the helper quic_get_ncbuf().

Buffer allocation was improperly checked. In case b_alloc() fails, it
crashes due to a BUG_ON(). Fix this by removing it. The function now
returns NULL on allocation failure, which is already properly handled in
its caller qc_handle_crypto_frm().

This should fix the last reported crash from github issue #2935.

This must be backported up to 2.6.

(cherry picked from commit 4309a6fbf80240b0880c5adf091f0075c3bcd53f)
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>

[RELEASE] Released version 3.1.7

Released version 3.1.7 with the following main changes :
    - MINOR: log: support "raw" logformat node typecast
    - BUG/MINOR: peers: fix expire learned from a peer not converted from ms to ticks
    - BUG/MEDIUM: peers: prevent learning expiration too far in futur from unsync node
    - BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local
    - BUG/MINOR: mux-quic: remove extra BUG_ON() in _qcc_send_stream()
    - BUG/MINOR: log: fix gcc warn about truncating NUL terminator while init char arrays
    - BUG/MINOR: config: silence .notice/.warning/.alert in discovery mode
    - DOC: config: fix two missing "content" in "tcp-request" examples
    - BUILD: compiler: undefine the CONCAT() macro if already defined
    - BUG/MINOR: rhttp: fix incorrect dst/dst_port values
    - BUG/MINOR: backend: do not overwrite srv dst address on reuse
    - BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port
    - BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6
    - DOC: update INSTALL to reflect the minimum compiler version
    - BUG/MEDIUM: stream: Fix a possible freeze during a forced shut on a stream
    - TESTS: Fix build for filltab25.c
    - MINOR: task: add thread safe notification_new and notification_wake variants
    - BUG/MINOR: hlua_fcn: fix potential UAF with Queue:pop_wait()
    - CLEANUP: log: adjust _lf_cbor_encode_byte() comment
    - BUG/MINOR: log: fix CBOR encoding with LOG_VARTEXT_START() + lf_encode_chunk()
    - BUG/MEDIUM: sample: fix risk of overflow when replacing multiple regex back-refs
    - BUG/MINOR: backend: do not use the source port when hashing clientip
    - BUG/MINOR: hlua: fix invalid errmsg use in hlua_init()
    - BUG/MINOR: debug: remove the trailing \n from BUG_ON() statements
    - DOC: config: add the missing "profiling.memory" to the global kw index
    - DOC: config: add the missing "force-cfg-parser-pause" to the global kw index
    - BUG/MINOR: http-ana: Properly detect client abort when forwarding the response
    - BUG/MEDIUM: http-ana: Report 502 from req analyzer only during rsp forwarding
    - MINOR: quic: remove references to burst in quic-cc-algo parsing
    - MINOR: quic: allow BBR testing without pacing
    - MINOR: quic: transform pacing settings into a global option
    - MINOR: quic: define quic_tune
    - BUILD: quic: fix overflow in global tune
    - MINOR: tools: let dump_addr_and_bytes() support dumping before the offset
    - MINOR: debug: in call traces, dump the 8 bytes before the return address, not after
    - MINOR: debug: detect call instructions and show the branch target in backtraces
    - BUG/MEDIUM: h3: trim whitespaces when parsing headers value
    - BUG/MEDIUM: h3: trim whitespaces in header value prior to QPACK encoding
    - BUG/MINOR: h3: filter upgrade connection header
    - BUG/MINOR: h3: reject invalid :path in request
    - BUG/MINOR: h3: reject request URI with invalid characters
    - BUG/MEDIUM: hlua: fix hlua_applet_{http,tcp}_fct() yield regression (lost data)
    - MINOR: compiler: add a __has_builtin() macro to detect features more easily
    - MINOR: compiler: add a new "ASSUME" macro to help the compiler
    - MINOR: compiler: also enable __builtin_assume() for ASSUME()
    - MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid
    - BUILD: makefile: enable backtrace by default on musl
    - BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads
    - BUG/MINOR debug: fix !USE_THREAD_DUMP in ha_thread_dump_fill()
    - BUG/MINOR: wdt/debug: avoid signal re-entrance between debugger and watchdog
    - BUG/MINOR: debug: detect and prevent re-entrance in ha_thread_dump_fill()
    - MINOR: tools: also protect the library name resolution against concurrent accesses
    - MINOR: tools: protect dladdr() against reentrant calls from the debug handler
    - MINOR: debug: protect ha_dump_backtrace() against risks of re-entrance
    - MINOR: tinfo: keep a copy of the pointer to the thread dump buffer
    - MINOR: debug: always reset the dump pointer when done
    - MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one()
    - MINOR: pass a valid buffer pointer to ha_thread_dump_one()
    - MEDIUM: wdt: always make the faulty thread report its own warnings
    - MINOR: debug: make ha_stuck_warning() only work for the current thread
    - MINOR: debug: make ha_stuck_warning() print the whole message at once
    - MINOR: compiler: rely on builtin detection for __builtin_unreachable()
    - BUG/MINOR: mux-h2: prevent past scheduling with idle connections
    - BUG/MINOR: rhttp: fix reconnect if timeout connect unset
    - BUG/MINOR: rhttp: ensure GOAWAY can be emitted after reversal

BUG/MINOR: rhttp: ensure GOAWAY can be emitted after reversal

GOAWAY emission should not be emitted before preface. Thus, max_id field
from h2c acting as a server is initialized to -1, which prevents its
emission until preface is received from the peer. If acting as a client,
max_id is initialized to a valid value on the first h2s emission.

This causes an issue with reverse HTTP on the active side. First, it
starts as a client, so the peer does not emit a preface but instead a
simple SETTINGS frame. As role are switched, max_id is initialized much
later when the first h2s response is emitted. Thus, if the connection
must be terminated before any stream transfer, GOAWAY cannot be emitted.

To fix this, ensure max_id is initialized to 0 on h2_conn_reverse() for
active connect side. Thus, a GOAWAY indicating that no stream has been
handled can be generated.

Note that passive connect side is not impacted, as it max_id is
initialized thanks to preface reception.

This should be backported up to 2.9.

(cherry picked from commit 9e6f8ce3282fc5461e30adae3c76cbceb2bc327e)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: rhttp: fix reconnect if timeout connect unset

Active connect on reverse http relies on connect timeout to detect
connection failure. Thus, if this timeout was unset, connection failure
may not be properly detected.

Fix this by fallback on hardcoded value of 1s for connect if timeout is
unset in the configuration. This is considered as a minor bug, as
haproxy advises against running with timeout unset.

This must be backported up to 2.9.

(cherry picked from commit 2b8da5f9ab8639f4213ee21fc5fd5ca12c53cddb)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-h2: prevent past scheduling with idle connections

While reviewing HTTP/2 MUX timeout, it seems there is a possibility that
MUX task is requeued via h2c_update_timeout() with an already expired
date. This can happens with idle connections on two cases :
* first with shut timeout, as timer is not refreshed if already set
* second with http-request and keep-alive timers, which are based on
idle_start

Queuing an already expired task is an undefined behavior. Fix this by
using task_wakeup() instead of task_queue() at the end of
h2c_update_timeout() if such case occurs.

This should be backported up to 2.6.

(cherry picked from commit 3ebdd3ae509236af1f19d80cab98b385a751426b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compiler: rely on builtin detection for __builtin_unreachable()

Due to __builtin_unreachable() only being associated to gcc 4.5 and
above, it turns out it was not enabled for clang. It's not used *that*
much but still a little bit, so let's enable it now. This reduces the
code size by 0.2% and makes it a bit more efficient.

(cherry picked from commit 41fc18b1d1a37e2a6645ea4a8fb4d2d5f009fc0c)
[wt: backported to shut a gcc warning in debug.c with USE_THREAD=0]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: make ha_stuck_warning() print the whole message at once

It has been noticed quite a few times during troubleshooting and even
testing that warnings can happen in avalanches from multiple threads
at the same time, and that their reporting it interleaved bacause the
output is produced in small chunks. Originally, this code inspired by
the panic code aimed at making sure to log whatever could be emitted
in case it would crash later. But this approach was wrong since writes
are atomic, and performing 5 writes in sequence in each dumping thread
also means that the outputs can be mixed up at 5 different locations
between multiple threads. The output of warnings is never very long,
and the stack-based buffer is 4kB so let's just concatenate everything
in the buffer and emit it at once using a single write(). Now there's
no longer this confusion on the output.

(cherry picked from commit 513397ac8215cdc1c0ab077a4a6809d7bd32f262)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: make ha_stuck_warning() only work for the current thread

Since we no longer call it with a foreign thread, let's simplify its code
and get rid of the special cases that were relying on ha_thread_dump_fill()
and synchronization with a remote thread. We're not only dumping the
current thread so ha_thread_dump_one() is sufficient.

(cherry picked from commit c16d5415a807f5a46d34ee6aac4c35fd42dac0c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MEDIUM: wdt: always make the faulty thread report its own warnings

Warnings remain tricky to deal with, especially for other threads as
they require some inter-thread synchronization that doesn't cope very
well with other parallel activities such as "show threads" for example.

However there is nothing that forces us to handle them this way. The
panic for example is already handled by bouncing the WDT signal to the
faulty thread.

This commit rearranges the WDT handler to make a better used of this
existing signal bouncing feature of the WDT handler so that it's no
longer limited to panics but can also deal with warnings. In order not
to bounce on all wakeups, we only bounce when there is a suspicion,
that is, when the warning timer has been crossed. We'll let the target
thread verify the stuck flag and context switch count by itself to
decide whether or not to panic, warn, or just do nothing and update
the counters.

As a bonus, now all warning traces look the same regardless of the
reporting thread:

   call trace(16):
   |       0x6bc733 <01 00 00 e8 6d e6 de ff]: ha_dump_backtrace+0x73/0x309 > main-0x2570
   |       0x6bd37a <00 00 00 e8 d6 fb ff ff]: ha_thread_dump_fill+0xda/0x104 > ha_thread_dump_one
   |       0x6bd625 <00 00 00 e8 7b fc ff ff]: ha_stuck_warning+0xc5/0x19e > ha_thread_dump_fill
   |       0x7b2b60 <64 8b 3b e8 00 aa f0 ff]: wdt_handler+0x1f0/0x212 > ha_stuck_warning
   | 0x7fd7e2cef3a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
   | 0x7ffc6af9e634 <85 a6 00 00 00 0f 01 f9]: linux-vdso:__vdso_gettimeofday+0x34/0x2b0
   |       0x6bad74 <7c 24 10 e8 9c 01 df ff]: sc_conn_io_cb+0x9fa4 > main-0x2400
   |       0x67c457 <89 f2 4c 89 e6 41 ff d0]: main+0x1cf147
   |       0x67d401 <48 89 df e8 8f ed ff ff]: cli_io_handler+0x191/0xb38 > main+0x1cee80
   |       0x6dd605 <40 48 8b 45 60 ff 50 18]: task_process_applet+0x275/0xce9

(cherry picked from commit a06c215f08703b2a2e303b2e23816d4ad09748f0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: pass a valid buffer pointer to ha_thread_dump_one()

The goal is to let the caller deal with the pointer so that the function
only has to fill that buffer without worrying about locking. This way,
synchronous dumps from "show threads" are produced and emitted directly
without causing undesired locking of the buffer nor risking causing
confusion about thread_dump_buffer containing bits from an interrupted
dump in progress.

It's only the caller that's responsible for notifying the requester of
the end of the dump by setting bit 0 of the pointer if needed (i.e. it's
only done in the debug handler).

(cherry picked from commit b24d7f248e4aeaa154cd9527061c6d98af629c86)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one()

This function was initially designed to dump any threadd into the presented
buffer, but the way it currently works is that it's always called for the
current thread, and uses the distinction between coming from a sighandler
or being called directly to detect which thread is the caller.

Let's simplify all this by replacing thr with tid everywhere, and using
the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc).
The confusing "from_signal" argument is now replaced with "is_caller"
which clearly states whether or not the caller declares being the one
asking for the dump (the logic is inverted, but there are only two call
places with a constant).

(cherry picked from commit 5ac739cd0c9deb52774fdc07cb5330edf668668a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: always reset the dump pointer when done

We don't need to copy the old dump pointer to the thread_dump_pointer
area anymore to indicate a dump is collected. It used to be done as an
artificial way to keep the pointer for the post-mortem analysis but
since we now have this pointer stored separately, that's no longer
needed and it simplifies the mechanim to reset it.

(cherry picked from commit 5646ec4d40cd102733da12a6a2d191434b33060a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: tinfo: keep a copy of the pointer to the thread dump buffer

Instead of using the thread dump buffer for post-mortem analysis, we'll
keep a copy of the assigned pointer whenever it's used, even for warnings
or "show threads". This will offer more opportunities to figure from a
core what happened, and will give us more freedom regarding the value of
the thread_dump_buffer itself. For example, even at the end of the dump
when the pointer is reset, the last used buffer is now preserved.

(cherry picked from commit 6d8a523d145a559d2ba4a14b46d27c73fe914c15)
[wt: dropped the doc which is not present in 3.1]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: debug: protect ha_dump_backtrace() against risks of re-entrance

If a thread is dumping itself (warning, show thread etc) and another one
wants to dump the state of all threads (e.g. panic), it may interrupt the
first one during backtrace() and re-enter it from the signal handler,
possibly triggering a deadlock in the underlying libc. Let's postpone
the debug signal delivery at this point until the call ends in order to
avoid this.

(cherry picked from commit d20e9cad6733d9a525ccfad63ef482d6ea63a676)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: tools: protect dladdr() against reentrant calls from the debug handler

If a thread is currently resolving a symbol while another thread triggers
a thread dump, the current thread may enter the debug handler and call
resolve_sym_addr() again, possibly deadlocking if the underlying libc
uses locking. Let's postpone the debug signal delivery in this area
during the call. This will slow the resolution a little bit but we don't
care, it's not supposed to happen often and it must remain rock-solid.

(cherry picked from commit 2dfb63313b0d2b9d4d636dbd23688d6a20cba2cf)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: tools: also protect the library name resolution against concurrent accesses

This is an extension of eb41d768f ("MINOR: tools: use only opportunistic
symbols resolution"). It also makes sure we're not calling dladddr() in
parallel to dladdr_and_size(), as a preventive measure against some
potential deadlocks in the inner layers of the libc.

(cherry picked from commit 8d0c6336778fed316a174fd278c78a88f7dd6975)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: debug: detect and prevent re-entrance in ha_thread_dump_fill()

In the following trace trying to abuse the watchdog from the CLI's
"debug dev loop" command running in parallel to "show threads" loops,
it's clear that some re-entrance may happen in ha_thread_dump_fill().

A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that
the one from the signal just returns. However the caller should be
made more reliable to serialize all of this, that's for future
work.

Here's an example capture of 7 threads stuck waiting for each other:
  (gdb) bt
  #0  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #1  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #2  0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
  #3  ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
  #4  0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
  #5  0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
  #6  <signal handler called>
  #7  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #8  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #9  0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
  #10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
  #11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
  #12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
  #13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
  #14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
  #15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
  #16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
  #17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
  #18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665

(cherry picked from commit 47f8397afb332882101935d6a9e3738c094bb03a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: wdt/debug: avoid signal re-entrance between debugger and watchdog

As seen in issue #2860, there are some situations where a watchdog could
trigger during the debug signal handler, and where similarly the debug
signal handler may trigger during the wdt handler. This is really bad
because it could trigger some deadlocks inside inner libc code such as
dladdr() or backtrace() since the code will not protect against re-
entrance but only against concurrent accesses.

A first attempt was made using ha_sigmask() but that's not always very
convenient because the second handler is called immediately after
unblocking the signal and before returning, leaving signal cascades in
backtrace. Instead, let's mark which signals to block at registration
time. Here we're blocking wdt/dbg for both signals, and optionally
SIGRTMAX if DEBUG_DEV is used as that one may also be used in this case.

This should be backported at least to 3.1.

(cherry picked from commit ebf1757dc26572578492d421e9905707b5e0ca7d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR debug: fix !USE_THREAD_DUMP in ha_thread_dump_fill()

The function must make sure to return NULL for foreign threads and
the local buffer for the current thread in this case, otherwise panics
(and sometimes even warnings) will segfault when USE_THREAD_DUMP is
disabled. Let's slightly re-arrange the function to reduce the #if/else
since we have to specifically handle the case of !USE_THREAD_DUMP anyway.

This needs to be backported wherever b8adef065d ("MEDIUM: debug: on
panic, make the target thread automatically allocate its buf") was
backported (at least 2.8).

(cherry picked from commit 0b56839455c7f45ae58954e1eb3873a3725899dc)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads

Some signal handlers rely on these to decide about the level of detail to
provide in dumps, so let's properly fill the info about entering/leaving
idle. Note that for consistency with other tests we're using bitops with
t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes
the code more readable and the whole difference is only 88 bytes on a 3MB
executable.

This bug is not important, and while older versions are likely affected
as well, it's not worth taking the risk to backport this in case it would
wake up an obscure bug.

(cherry picked from commit 337017e2f96deea8e762d968820e575a1c2181b6)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUILD: makefile: enable backtrace by default on musl

The reason musl builds was not producing exploitable backtraces was
that the toolchain used appears to automatically omit the frame pointer
at -O2 but leaves it at -O0. This patch just makes sure to always append
-fno-omit-frame-pointer to the BACKTRACE cflags and enables the option
with musl where it now works. This will allow us to finally get
exploitable traces from docker images where core dumps are not always
available.

(cherry picked from commit f499fa3dcd24b5a17ed97842f5e867bd37739754)
[wt: this should be progressively backported to 3.0 or maybe even 2.8
since a few users have already reported hard-to-debug issues in Docker]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid

At plenty of places we have ALREADY_CHECKED() or DISGUISE() on a pointer
just to avoid "possibly null-deref" warnings. These ones have the side
effect of weakening optimizations by passing through an assembly step.
Using ASSUME_NONNULL() we can avoid that extra step. And when the
__builtin_unreachable() builtin is not present, we fall back to the old
method using assembly. The macro returns the input value so that it may
be used both as a declarative way to claim non-nullity or directly inside
an expression like DISGUISE().

(cherry picked from commit 63798088b300688af2fb480c5c7b91a1ca9970c5)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compiler: also enable __builtin_assume() for ASSUME()

Clang apparently has __builtin_assume() which does exactly the same
as our macro, since at least v3.8. Let's enable it, in case it may
even better detect assumptions vs unreachable code.

(cherry picked from commit 2ce63b7b17c45e31ef8c3dec33b9b54aea656154)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compiler: add a new "ASSUME" macro to help the compiler

This macro takes an expression, tests it and calls an unreachable
statement if false. This allows the compiler to know that such a
combination does not happen, and totally eliminate tests that would
be related to this condition. When the statement is not available
in the compiler, we just perform a break from a do {} while loop
so that the expression remains evaluated if needed (e.g. function
call).

(cherry picked from commit efc897484b8b875f908c144d82313923be8c3392)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compiler: add a __has_builtin() macro to detect features more easily

We already have a __has_attribute() macro to detect when the compiler
supports a specific attribute, but we didn't have the equivalent for
builtins. clang-3 and gcc-10 have __has_builtin() for this. Let's just
bring it using the same mechanism as __has_attribute(), which will allow
us to simply define the macro's value for older compilers. It will save
us from keeping that many compiler-specific tests that are incomplete
(e.g. the __builtin_unreachable() test currently doesn't cover clang).

(cherry picked from commit 96cfcb1df36fdf8fb963371ad5e167c733d842b1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: hlua: fix hlua_applet_{http,tcp}_fct() yield regression (lost data)

Jacques Heunis from bloomberg reported on the mailing list [1] that
with haproxy 2.8 up to master, yielding from a Lua tcp service while
data was still buffered inside haproxy would eat some data which was
definitely lost.

He provided the reproducer below which turned out to be really helpful:

  global
      log stdout format raw local0 info
      lua-load haproxy_yieldtest.lua

  defaults
      log global
      timeout connect         10s
      timeout client          1m
      timeout server          1m

  listen echo
      bind *:9090
      mode tcp
      tcp-request content use-service lua.print_input

haproxy_yieldtest.lua:

  core.register_service("print_input", "tcp", function(applet)
      core.Info("Start printing input...")
      while true do
          local inputs = applet:getline()
          if inputs == nil or string.len(inputs) == 0 then
              core.Info("closing input connection")
              return
          end
          core.Info("Received line: "..inputs)
          core.yield()
      end
  end)

And the script below:

  #!/usr/bin/bash
  for i in $(seq 1 9999); do
      for j in $(seq 1 50); do
          echo "${i}_foo_${j}"
      done
      sleep 2
  done

Using it like this:
  ./test_seq.sh | netcat localhost 9090

We can clearly see the missing data for every "foo" burst (every 2
seconds), as they are holes in the numbering.

Thanks to the reproducer, it was quickly found that only versions
>= 2.8 were affected, and that in fact this regression was introduced
by commit 31572229e ("MEDIUM: hlua/applet: Use the sedesc to report and
detect end of processing")

In fact in 31572229e 2 mistakes were made during the refaco.
Indeed, both in hlua_applet_tcp_fct() (which is involved in the reproducer
above) and hlua_applet_http_fct(), the request (buffer) is now
systematically consumed when returning from the function, which wasn't the
case prior to this commit: when HLUA_E_AGAIN is returned, it means a
yield was requested and that the processing is not done yet, thus we
should not consume any data, like we did prior to the refacto.

Big thanks to Jacques who did a great job reproducing and reporting this
issue on the mailing list.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg45778.html

It should be backported up to 2.8 with commit 31572229e

(cherry picked from commit b81ab159a613323a0e8cd625d138955acfcbb666)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: h3: reject request URI with invalid characters

Ensure that the HTX start-line generated after parsing an HTTP/3 request
does not contain any invalid character, i.e. control or whitespace
characters.

Note that for now path is used directly as URI. Thus, the check is
performed directly over it. A patch will change this to generate an
absolute-form URI in most cases, but it won't be backported to avoid
configuration breaking in stable versions.

This must be backported up to 2.6.

(cherry picked from commit 1faa1285aacaaacd2c2e062d87f04aa481843862)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: h3: reject invalid :path in request

RFC 9114 specifies some requirements for :path pseudo-header when using
http or https scheme. This commit enforces this by rejecting a request
if needed. Thus, path cannot be empty, and it must either start with a
'/' character or contains only '*'.

This must be backported up to 2.6.

(cherry picked from commit fc28fe7191701251115e817a7c4a673b88e49f65)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: h3: filter upgrade connection header

As specified in RFC 9114, connection headers required special care in
HTTP/3. When a request is received with connection headers, the stream
is immediately closed. Conversely, when translating the response from
HTX, such headers are not encoded but silently ignored.

However, "upgrade" was not listed in connection headers. This commit
fixes this by adding a check on it both on request parsing and response
encoding.

This must be backported up to 2.6.

(cherry picked from commit 6403bfbce8ea54ba83e23d34c5d52ff10fa7fe22)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: h3: trim whitespaces in header value prior to QPACK encoding

This commit does a similar job than the previous one, but it acts now on
the response path. Any leading or trailing whitespaces characters from a
HTX block header value are removed, prior to the header encoding via
QPACK.

This must be backported up to 2.6.

(cherry picked from commit bd3587574dae43cc019d8e9998e0996c75550de4)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: h3: trim whitespaces when parsing headers value

Remove any leading and trailing whitespace from header field values
prior to inserting a new HTX header block. This is done when parsing a
HEADERS frame, both as headers and trailers.

This must be backported up to 2.6.

(cherry picked from commit a17e5b27c03230cc65ebedc6dc28dbfce5181c79)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: debug: detect call instructions and show the branch target in backtraces

In backtraces, sometimes it's difficult to know what was called by a
given point, because some functions can be fairly long making one
doubt about the correct pointer of unresolved ones, others might
just use a tail branch instead of a call + return, etc. On common
architectures (x86 and aarch64), it's not difficult to detect and
decode a relative call, so let's do it on both of these platforms
and show the branch location after a '>'. Example:

x86_64:
   call trace(19):
   |       0x6bd644 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95 > ha_thread_dump_one
   | 0x7feb3e5383a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
   | 0x7feb3e53748b <c0 b8 03 00 00 00 0f 05]: libpthread:__close+0x3b/0x8b
   |       0x7619e4 <44 89 ff e8 fc 97 d4 ff]: _fd_delete_orphan+0x1d4/0x1d6 > main-0x2130
   |       0x743862 <8b 7f 68 e8 8e e1 01 00]: sock_conn_ctrl_close+0x12/0x54 > fd_delete
   |       0x5ac822 <c0 74 05 4c 89 e7 ff d0]: main+0xff512
   |       0x5bc85c <48 89 ef e8 04 fc fe ff]: main+0x10f54c > main+0xff150
   |       0x5be410 <4c 89 e7 e8 c0 e1 ff ff]: main+0x111100 > main+0x10f2c0
   |       0x6ae6a4 <28 00 00 00 00 ff 51 58]: cli_io_handler+0x31524
   |       0x6aeab4 <7c 24 08 e8 fc fa ff ff]: sc_destroy+0x14/0x2a4 > cli_io_handler+0x31430
   |       0x6c685d <48 89 ef e8 43 82 fe ff]: process_chk_conn+0x51d/0x1927 > sc_destroy

aarch64:
   call trace(15):
   | 0xaaaaad0c1540 <60 6a 60 b8 c3 fd ff 97]: debug_handler+0x9c/0xbc > ha_thread_dump_one
   | 0xffffa8c177ac <c2 e0 3b d5 1f 20 03 d5]: linux-vdso:__kernel_rt_sigreturn
   | 0xaaaaad0b0964 <c0 03 5f d6 d2 ff ff 97]: cli_io_handler+0x28e44 > sedesc_new
   | 0xaaaaad0b22a4 <00 00 80 d2 94 f9 ff 97]: sc_new_from_strm+0x1c/0x54 > cli_io_handler+0x28dd0
   | 0xaaaaad0167e8 <21 00 80 52 a9 6e 02 94]: stream_new+0x258/0x67c > sc_new_from_strm
   | 0xaaaaad0b21f8 <e1 03 13 aa e7 90 fd 97]: sc_new_from_endp+0x38/0xc8 > stream_new
   | 0xaaaaacfda628 <21 18 40 f9 e7 5e 03 94]: main+0xcaca8 > sc_new_from_endp
   | 0xaaaaacfdb95c <42 c0 00 d1 02 f3 ff 97]: main+0xcbfdc > main+0xc8be0
   | 0xaaaaacfdd3f0 <e0 03 13 aa f5 f7 ff 97]: h1_io_cb+0xd0/0xb90 > main+0xcba40
(cherry picked from commit 3cbbf41cd8772e7376d3f8473b171b9f8016079b)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: debug: in call traces, dump the 8 bytes before the return address, not after

In call traces, we're interested in seeing the code that was executed, not
the code that was not yet. The return address is where the CPU will return
to, so we want to see the bytes that precede this location. In the example
below on x86 we can clearly see a number of direct "call" instructions
(0xe8 + 4 bytes). There are also indirect calls (0xffd0) that cannot be
exploited but it gives insights about where the code branched, which will
not always be the function above it if that one used tail branching for
example. Here's an example dump output:

         call ------------,
                          v
       0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
       0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
       0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
       0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
       0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
       0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
       0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
       0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13

(cherry picked from commit 9740f152740b435ed604ecfe9b35aef3e2164ccf)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: tools: let dump_addr_and_bytes() support dumping before the offset

For code dumps, dumping from the return address is pointless, what is
interesting is to dump before the return address to read the machine
code that was executed before branching. Let's just make the function
support negative sizes to indicate that we're dumping this number of
bytes to the address instead of this number from the address. In this
case, in order to distinguish them, we're using a '<' instead of '[' to
start the series of bytes, indicating where the bytes expand and where
they stop. For example we can now see this:

       0x6bd634 <64 8b 38 e8 ac f7 ff ff]: debug_handler+0x84/0x95
0x7fa4ea2593a0 <00 00 00 00 0f 1f 40 00]: libpthread:+0x123a0
       0x752132 <00 00 00 00 00 90 41 55]: htx_remove_blk+0x2/0x354
       0x5b1a2c <4c 89 ef e8 04 07 1a 00]: main+0x10471c
       0x5b5f05 <48 89 df e8 8b b8 ff ff]: main+0x108bf5
       0x60b6f4 <89 ee 4c 89 e7 41 ff d0]: tcpcheck_eval_send+0x3b4/0x14b2
       0x610ded <00 00 00 e8 53 a5 ff ff]: tcpcheck_main+0x7dd/0xd36
       0x6c5ab4 <48 89 df e8 5c ab f4 ff]: wake_srv_chk+0xc4/0x3d7
       0x6c5ddc <48 89 f7 e8 14 fc ff ff]: srv_chk_io_cb+0xc/0x13

(cherry picked from commit 003f5168e47f6755990b8315757ecd71ef684c56)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUILD: quic: fix overflow in global tune

A new global option was recently introduced to disable pacing. However,
the value used (1<<31) caused issue with some compiler as options field
used for storage is declared as int. Move pacing deactivation flag
outside into the newly defined quic_tune to fix this.

This should be backported up to 3.1 after a period of observation. Note
that it relied on the previous patch which defined new quic_tune type.

(cherry picked from commit b849ee5fa35b7a909869db1dfd19f450f3172034)
[ada: patch was manually adapted since a19d9b0486 ("MAJOR: quic: mark
pacing as stable and enable it by default") was not backported]
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: quic: define quic_tune

Define a new structure quic_tune. It will be useful to regroup various
configuration settings and tunable related to QUIC, instead of defining
them into the global structure.

(cherry picked from commit 09e9c7d5b7a6efb4f7d0e491e2d6f46cf5facda7)
[ada: required by the following commit]
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: quic: transform pacing settings into a global option

Pacing support was previously activated on each bind line individually,
via an optional argument of quic-cc-algo keyword. Remove this optional
argument and introduce a global setting to enable/disable pacing. Pacing
activation is still flagged as experimental.

One important change is that previously BBR usage automatically
activated pacing support. This is not the case anymore, so users should
now always explicitely activate pacing if BBR is selected. A new warning
message will be displayed if this is not the case.

Another consequence of this change is that now pacing_inter callback is
always defined for every quic_cc_algo types. As such, QUIC MUX uses
global.tune.options to determine if pacing is required.

This should be backported up to 3.1, after a period of observation.

(cherry picked from commit 0c8b54b2d1cbf56ab696c376e145cdd2e944aadd)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: quic: allow BBR testing without pacing

Pacing is activated per bind line via an optional boolean argument of
quic-cc-algo keyword. Contrary to the default usage, pacing is
automatically activated when BBR is chosen. This is because this
algorithm is expected to run on top of pacing, else its behavior is
undefined.

Previously, pacing argument was thus ignored when BBR was selected.
Change this to support explicit deactivation of pacing with it. This
could be useful to test BBR without pacing when debugging some issues.

This should be backported up to 3.1, after a period of observation.

(cherry picked from commit d04e93bc2eb695f35b66233f0f5b80e2e5c2193a)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: quic: remove references to burst in quic-cc-algo parsing

Pacing activation configuration has been recently revamped. Previously,
pacing related quic-cc-algo argument was used to specify a burst size.
It evolved into a boolean value as burst size is dynamically calculated
now. As such, removes any references to the old burst value in config
parsing code for cleaner code.

This should be backported up to 3.1, after a period of observation.

(cherry picked from commit 6acf391e89853f2e74f67183f327ffa2e15112ac)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: http-ana: Report 502 from req analyzer only during rsp forwarding

A server abort must be handled by the request analyzers only when the
response forwarding was already started. Otherwise, it it the responsability
of the response analyzer to detect this event. L7-retires and conditions to
decide to silently close a client conneciotn are handled by this analyzer.

Because a reused server connections closed too early could be detected at
the wrong place, it was possible to get a 502/SH instead of a silent close,
preventing the client to safely retries its request.

Thanks to this patch, we are able to silently close the client connection in
this case and eventually to perform a L7 retry.

This patch must be backported as far as 2.8.

(cherry picked from commit d160046e2c9caae7deff5d59abc9694b6e664446)
[ada: ctx adjt since COUNT_IF counters are still there in 3.1]
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: http-ana: Properly detect client abort when forwarding the response

During the response payload forwarding, if the back SC is closed, we try to
figure out if it is because of a client abort or a server abort. However,
the condition was not accurrate, especially when abortonclose option is
set. Because of this issue, a server abort may be reported (SD-- in logs)
instead of a client abort (CD-- in logs).

The right way to detect a client abort when we try to forward the response
is to test if the back SC was shut down (SC_FL_SHUT_DOWN flag set) AND
aborted (SC_FL_ABRT_DONE flag set). When these both flags are set, it means
the back connection underwent the shutdown, which should be converted to a
client abort at this stage.

This patch should be backported as far as 2.8. It should fix last strange SD
report in the issue #2749.

(cherry picked from commit c672b2a297158bcd673feab2fd366709f9fc3d4f)
[ada: ctx adjt because COUNT_IF counters are still there in 3.1]
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

DOC: config: add the missing "force-cfg-parser-pause" to the global kw index

It was documented but missing from the index, let's add it. This can be
backported to 3.1.

(cherry picked from commit 9efc60c8872a44e3d456c9912b117c895b4a5b21)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

DOC: config: add the missing "profiling.memory" to the global kw index

It was in the description but not in the index. This can be backported to
all versions where it applies.

(cherry picked from commit 640a6998046c4a94d3fb59469b1cdcccedbeb01a)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: debug: remove the trailing \n from BUG_ON() statements

These ones were added by mistake during the change of the cfgparse
mechanism in 3.1, but they're corrupting the output of "debug counters"
by leaving stray ']' on their own lines. We could possibly check them
all once at boot but it doens't seem worth it.

This should be backported to 3.1.

(cherry picked from commit 23705564ae1bb95d7721bc921d58c508e76c7070)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: hlua: fix invalid errmsg use in hlua_init()

errmsg is used with memprintf and friends, thus it must be NULL
initialized before being passed to memprintf, else invalid read will
occur.

However in hlua_init() the errmsg value isn't initialized, let's fix that

This is really minor because it would only cause issue on error paths,
yet it may be backported to all stable versions, just in case.

(cherry picked from commit ea3c96369f4a5def90888c9207cd88010d473eb4)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: backend: do not use the source port when hashing clientip

The server's "usesrc" keyword supports among other options "client"
and "clientip". The former means we bind to the client's IP and port
to connect to the server, while the latter means we bind to its IP
only. It's done in two steps, first alloc_bind_address() retrieves
the IP address and port, and second, tcp_connect_server() decides
to either bind to the IP only or IP+port.

The problem comes with idle connection pools, which hash all the
parameters: the hash is calculated before (and ideally withouy) calling
tcp_connect_server(), and it considers the whole struct sockaddr_storage
for the hash, except that both client and clientip entirely fill it with
the client's address. This means that both client and clientip make use
of the source port in the hash calculation, making idle connections
almost not reusable when using "usesrc clientip" while they should for
clients coming from the same source. A work-around is to force the
source port to zero using "tcp-request session set-src-port int(0)" but
it's ugly.

Let's fix this by properly zeroing the port for AF_INET/AF_INET6 addresses.

This can be backported to 2.4. Thanks to Sebastien Gross for providing a
reproducer for this problem.

(cherry picked from commit 7b6df86a834883f27f6f7d18d0caa8a6ea128c14)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: sample: fix risk of overflow when replacing multiple regex back-refs

Aleandro Prudenzano of Doyensec and Edoardo Geraci of Codean Labs
reported a bug in sample_conv_regsub(), which can cause replacements
of multiple back-references to overflow the temporary trash buffer.

The problem happens when doing "regsub(match,replacement,g)": we're
replacing every occurrence of "match" with "replacement" in the input
sample, which requires a length check. For this, a max is applied, so
that a replacement may not use more than the remaining length in the
buffer. However, the length check is made on the replaced pattern and
not on the temporary buffer used to carry the new string. This results
in the remaining size to be usable for each input match, which can go
beyond the temporary buffer size if more than one occurrence has to be
replaced with something that's larger than the remaining room.

The fix proposed by Aleandro and Edoardo is the correct one (check on
"trash" not "output"), and is the one implemented in this patch.

While it is very unlikely that a config will replace multiple short
patterns each with a larger one in a request, this possibility cannot
be entirely ruled out (e.g. mask a known, short IP address using
"XXX.XXX.XXX.XXX"). However when this happens, the replacement pattern
will be static, and not be user-controlled, which is why this patch is
marked as medium.

The bug was introduced in 2.2 with commit 07e1e3c93e ("MINOR: sample:
regsub now supports backreferences"), so it must be backported to all
versions.

Special thanks go to Aleandro and Edoardo for reporting this bug with
a simple reproducer and a fix.

(cherry picked from commit 3e3b9eebf871510aee36c3a3336faac2f38c9559)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: log: fix CBOR encoding with LOG_VARTEXT_START() + lf_encode_chunk()

There have been some reports that using %HV logformat alias with CBOR
encoder would produce invalid CBOR payload according to some CBOR
implementations such as "cbor.me". Indeed, with the below log-format:

  log-format "%{+cbor}o %(protocol)HV"

And the resulting CBOR payload:

  BF6870726F746F636F6C7F48485454502F312E31FFFF

cbor.me would complain with: "bytes/text mismatch (ASCII-8BIT != UTF-8) in
streaming string") error message.

It is due to the version string being first announced as text, while CBOR
encoder actually encodes it as byte string later when lf_encode_chunk()
is used.

In fact it affects all patterns combining LOG_VARTEXT_START() with
lf_encode_chunk() which means  %HM, %HU, %HQ, %HPO and %HP are also
affected. To fix the issue, in _lf_encode_bytes() (which is
lf_encode_chunk() helper), we now check if we are inside a VARTEXT (we
can tell it if ctx->in_text is true), in which case we consider that we
already announced the current data as regular text so we keep the same
type to encode the bytes from the chunk to prevent inconsistencies.

It should be backported in 3.0

(cherry picked from commit 9e8444b730e9b24e73f1f78d269f5001a34bd98d)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

CLEANUP: log: adjust _lf_cbor_encode_byte() comment

_lf_cbor_encode_byte() comment was not updated in c33b857df ("MINOR: log:
support true cbor binary encoding") to reflect the new behavior.

Indeed, binary form is now supported. Updating the comment that says
otherwise.

(cherry picked from commit ce6951d6f9259f778c0271da54a615c892184691)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: hlua_fcn: fix potential UAF with Queue:pop_wait()

If Queue:pop_wait() excecuted from a stream context and pop_wait() is
aborted due to a Lua or ressource error, then the waiting object pointing
to the task will still be registered, so if the task eventually dissapears,
Queue:push() may try to wake invalid task pointer..

To prevent this bug from happening, we now rely on notification_* API to
deliver waiting signals. This way signals are properly garbage collected
when a lua context is destroyed.

It should be backported in 2.8 with 86fb22c55 ("MINOR: hlua_fcn: add Queue
class").
This patch depends on ("MINOR: task: add thread safe notification_new and
notification_wake variants")

(cherry picked from commit c6fa061f22e0409a9c1e0dbe9d4bd9a30eff6ba1)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

MINOR: task: add thread safe notification_new and notification_wake variants

notification_new and notification_wake were historically meant to be
called by a single thread doing both the init and the wakeup for other
tasks waiting on the signals.

In this patch, we extend the API so that notification_new and
notification_wake have thread-safe variants that can safely be used with
multiple threads registering on the same list of events and multiple
threads pushing updates on the list.

(cherry picked from commit b77b1a2c3ac70a719a2a06964e56a206ab9cc6ec)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

TESTS: Fix build for filltab25.c

Give a return type to main(), so that filltab25.c compiles with
modern compilers.

(cherry picked from commit 4715c557e98725e28faac7fce449f090e136ac0b)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: stream: Fix a possible freeze during a forced shut on a stream

When a forced shutdown is performed on a stream, it is possible to freeze it
infinitly because it is performed in an unexpected way from process_stream()
point of view, especially when the stream is waiting for a server
connection. The events sequence is a bit complex but at the end the stream
remains blocked in turn-around state and no event are trriggered to unblock
it.

By trying to fix the issue, we considered it was safer to rethink the
feature. The idea is to quickly shutdown a stream to release resources. For
instance to be able to delete a server. So, instead of scheduling a
shutdown, it is more efficient to trigger an error and detach the stream
from the server, if neecessary. The same code than the one used to deal with
connection errors in back_handle_st_cer() is used.

This patch must be slowly backported as far as 2.6.

(cherry picked from commit 51611a5b702e6dbf2e5ac56cbcef211326414282)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

DOC: update INSTALL to reflect the minimum compiler version

The mt_list update in 3.1 mandated the support for c11-like atomics that
arrived with gcc-4.7. As such, older versions are no longer supported.
For special cases in single-threaded environments, mt_lists could be
replaced with regular lists but it doesn't seem worth the hassle. It
was verified that gcc 4.7 to 14 and clang 3.0 and 19 do build fine.
That leaves us with 10 years of coverage of compiler versions, which
remains reasonable assuming that users of old ultra-stable systems are
unlikely to upgrade haproxy without touching the rest of the system.

This should be backported to 3.1.

(cherry picked from commit 1450b44bb9e5cdc1eafe894e1bcaf483edfc2748)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUILD: quic_sock: address a strict-aliasing build warning with gcc 5 and 6

The UDP GSO code emits a build warning with older toolchains (gcc 5 and 6):

  src/quic_sock.c: In function 'cmsg_set_gso':
  src/quic_sock.c:683:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    *((uint16_t *)CMSG_DATA(c)) = gso_size;
    ^

Let's just use the write_u16() function that's made for this purpose.
It was verified that for all versions from 5 to 13, gcc produces the
exact same code with the fix (and without the warning). It arrived in
3.1 with commit 448d3d388a ("MINOR: quic: add GSO parameter on quic_sock
send API") so this can be backported there.

(cherry picked from commit dd900aead8613af12ea23b18a3e77d28ad42729b)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MEDIUM: backend: fix reuse with set-dst/set-dst-port

On backend connection reuse, a hash is calculated from various
parameters, to ensure the selected connection match the requested
parameters. Notably, destination address is one of these parameters.
However, it is only taken into account if using a transparent server
(server address 0.0.0.0).

This may cause issue where an incorrect connection is reused, which is
not targetted to the correct destination address. This may be the case
if a set-dst/set-dst-port is used with a transparent proxy (proxy option
transparent).

The fix is simple enough. Destination address is now always used as
input to the connection reuse hash.

This must be backported up to 2.6. Note that for reverse HTTP to work,
it relies on the following patch, which ensures destination address
remains NULL in this case.

commit e94baf6ca71cb2319610baa74dbf17b9bc602b18
BUG/MINOR: rhttp: fix incorrect dst/dst_port values

(cherry picked from commit 5fda64e87e7963fa65812e0583338191e4cc7c8b)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: backend: do not overwrite srv dst address on reuse

Previously, destination address of backend connection was systematically
always reassigned. However, this step is unnecessary on connection
reuse. Indeed, reuse should only be conducted with connection using the
same destination address matching the stream requirements.

This patch removes this unnecessary assignment. It is now only performed
when reuse cannot be conducted and a new connection is instantiated.

Functionnally speaking, this patch should not change anything in theory,
as reuse is performed in conformance with the destination address.
However, it appears that it was not always properly enforced. The
systematic assignment of the destination address hides these issues, so
it is now remove. The identified bogus cases will then be fixed in the
following patches.would

This should be backported up to all stable versions.

(cherry picked from commit d7fa8e88c4cda6d115b05c65cded5acb4069ee97)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>

BUG/MINOR: rhttp: fix incorrect dst/dst_port values

With a @rhttp server, connect is not possible, transfer is only possible
via idle connection reuse. The server does not have any network address.

Thus, it is unnecessary to allocate the stream destination address prior
to connection reuse. This patch adjusts this by fixing
alloc_dst_address() to take this into account.

Prior to this patch, alloc_dst_address() would incorrectly assimilate a
@rhttp server with a transparent proxy mode. Thus stream destination
address would be copied from the destination address. Connection adress
would then be rewrote with this incorrect value. This did not impact
connect or reuse as destination addr is only used in idle conn hash
calculation for transparent servers. However, it causes incorrect values
for dst/dst_port samples.

This should be backported up to 2.9.

(cherry picked from commit c05bb8c967c395f88666ccc93c00726f59a690b6)
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>