Willy Tarreau [Thu, 26 Sep 2024 16:19:10 +0000 (18:19 +0200)]
MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
TASK_WOKEN_MSG only says "someone sent you a message" but doesn't convey
any info about the message. TASK_WOKEN_OTHER says "you're woken for another
reason" but doesn't tell which one. Most often they're used as-is by the
task handlers to report very specific situations.
For some important control notifications, having the ability to modulate
the message a little bit is useful, so let's define two user event types
UEVT1 and UEVT2 to be used in conjunction with TASK_WOKEN_MSG or _OTHER
so that the application can know that a specific condition was explicitly
requested. It will be used this way:
task_wakeup(s->task, TASK_WOKEN_MSG | TASK_F_UEVT1);
or:
task_wakeup(s->task, TASK_WOKEN_OTHER | TASK_F_UEVT2);
Since events are cumulative, keep in mind not to interpret the combination
of UEVT1+UEVT2 as a third value; it really means that the two events
appeared (though in an unspecified order).
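As a sketch, a task handler consuming these events could look like this
(the TASK_WOKEN_* and TASK_F_UEVT* names come from this commit; the handler
body and helper names are purely hypothetical):
    struct task *my_io_task(struct task *t, void *ctx, unsigned int state)
    {
            /* both events may be present at once: test them independently,
             * never compare for equality against a combined value.
             */
            if (state & TASK_F_UEVT1)
                    handle_requested_stop(ctx);    /* hypothetical */
            if (state & TASK_F_UEVT2)
                    handle_requested_reload(ctx);  /* hypothetical */
            return t;
    }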
(cherry picked from commit
b5281283bb00248152ba361f8d7306b37febbee0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Sun, 29 Sep 2024 07:46:10 +0000 (09:46 +0200)]
MINOR: tools: do not attempt to use backtrace() on linux without glibc
The backtrace() function is provided by glibc. Nothing prevents us from
using our own implementation outside of glibc (tested on aarch64 with
musl). We still do not enable it by default as we don't yet know if all
archs work well, but it's sufficient to pass USE_BACKTRACE=1 when building
with musl to verify it's OK.
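For example, such a test build is expected to look like this (the
linux-musl target already exists in the Makefile):
    $ make TARGET=linux-musl USE_BACKTRACE=1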
(cherry picked from commit
7caf073faa6962de15a18dadcaf200df95ce7889)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Sun, 29 Sep 2024 07:37:16 +0000 (09:37 +0200)]
BUILD: tools: only include execinfo.h for the real backtrace() function
No need to include this possibly non-existent file when using our own
backtrace() implementation; it's only needed for the libc-provided one.
Because of this, it's currently not possible to build with backtrace
enabled on musl.
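The guard may be sketched like this (illustrative only; the exact
conditions used in the tree may differ):
    #if defined(USE_BACKTRACE) && defined(__GLIBC__)
    /* only the libc-provided backtrace() needs this header */
    #include <execinfo.h>
    #endif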
(cherry picked from commit
1c4776dbc3e7d0a73bd62bae78d509c3c54045d6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Valentine Krasnobaeva [Mon, 30 Sep 2024 13:29:47 +0000 (15:29 +0200)]
BUG/MINOR: cfgparse-global: fix allowed args number for setenv
The setenv and presetenv keywords take 2 arguments: the variable name and
its value. So the total number that should be passed to
alertif_too_many_args() is 2 ("setenv <name> <value>") instead of 3, since
for alertif_too_many_args() the first argument has index 0.
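A correct call site thus looks like this (sketch following the usual
cfgparse pattern):
    /* "setenv <name> <value>": args[0] is the keyword itself, so the
     * last valid argument index is 2.
     */
    if (alertif_too_many_args(2, file, linenum, args, &err_code))
            goto out;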
This should be backported in all stable versions.
(cherry picked from commit
df68f7ec96c0c9d0e8f50643b96ec9062b7aa658)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 27 Sep 2024 16:38:35 +0000 (18:38 +0200)]
BUG/MINOR: server: make sure the HMAINT state is part of MAINT
In 1.8 when adding "set server fqdn" with commit
b418c1228c ("MINOR:
server: cli: Add server FQDNs to server-state file and stats socket."),
the HMAINT flag was not made part of the MAINT ones, so technically
speaking when changing the FQDN, the server is not completely considered
as in maintenance mode.
In its defense, the code location around that was completely messy, with
the aggregator flag being hidden between other values and purposely but
discreetly ignoring one of the flags, so the comments were updated to
make the intent clearer (particularly regarding CMAINT, which looked like
it was also forgotten while it was on purpose).
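The intended aggregation can be sketched as follows (flag names from
server-t.h; values elided):
    /* sketch: HMAINT is now part of the MAINT aggregate */
    #define SRV_ADMF_MAINT  (SRV_ADMF_FMAINT | SRV_ADMF_IMAINT | \
                             SRV_ADMF_RMAINT | SRV_ADMF_HMAINT)
    /* SRV_ADMF_CMAINT is purposely excluded: it only remembers that
     * the server was in maintenance in the configuration.
     */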
This can be backported anywhere.
(cherry picked from commit
a4d04c649af448cf2a728ebb91837ed69eea6217)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Oliver Dala [Wed, 25 Sep 2024 09:37:25 +0000 (11:37 +0200)]
BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
The proxy lock state isn't passed down to relax_listener
through dequeue_proxy_listeners, which causes a deadlock
in relax_listener when it tries to get that lock.
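A minimal sketch of the fix (list handling simplified; only the threading
of <lpx> matters here):
    /* pass the caller's proxy-lock state down to relax_listener() */
    void dequeue_proxy_listeners(struct proxy *px, int lpx)
    {
            struct listener *listener;

            while ((listener = MT_LIST_POP(&px->listener_queue,
                                           struct listener *, wait_queue)))
                    relax_listener(listener, lpx, 0);
    }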
Backporting: Older versions didn't have relax_listener and directly called
resume_listener in dequeue_proxy_listeners. lpx should just be passed directly
to resume_listener then.
The bug was introduced in commit
001328873c352e5e4b1df0dcc8facaf2fc1408aa
[cf: This patch should fix the issue #2726. It must be backported as far as
2.4]
(cherry picked from commit
a889413f5ec5e2c0c6ed30debfd16f666dbc0f99)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 24 Sep 2024 15:50:09 +0000 (17:50 +0200)]
BUG/MEDIUM: cli: Be sure to catch immediate client abort
A client abort while nothing was sent is properly handled, except when it
happens immediately after the connection was accepted: the read0 event is
then caught before the CLI applet is created. In that case, the shutdown is
not handled and the applet is never woken up. The stream thus remains
blocked and no timeout is armed.
The bug was due to the fact that when the applet I/O handler was called for
the first time, the applet context was initialized and nothing more was
performed; a shutdown, if any, would only be handled on the next call,
which was too late.
Now, after the init step, we loop to evaluate the first command. There is
no command at this point, but the shutdown will be tested.
This patch should fix the issue #2727. It must be backported to 3.0.
(cherry picked from commit
14a413033c15aa4622368348ac6b142e7ec2c989)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Wed, 18 Sep 2024 13:33:30 +0000 (15:33 +0200)]
BUG/MINOR: mux-quic: report glitches to session
A glitch counter was implemented for QUIC/HTTP3. The counter is stored in
the QCC MUX connection instance. However, it is never reported at the
session level, which is necessary if the glitch counter is tracked via a
stick-table.
To fix this, use session_add_glitch_ctr() in various QUIC MUX functions
which may increment the glitch counter.
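An illustrative call site (the way the session is retrieved from the
connection is assumed):
    struct session *sess = qcc->conn->owner; /* frontend: owner is the session */

    qcc->glitches++;
    session_add_glitch_ctr(sess, 1);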
This should be backported up to 3.0.
(cherry picked from commit
fcd6d29acf108c55e1e1c17c04aeae621327adff)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 10 Sep 2024 08:36:59 +0000 (10:36 +0200)]
REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
Commit
d6c4ed9a96 ("REGTESTS: h1/h2: Update script testing H1/H2
protocol upgrades") introduced a 0.5 second delay which is higher
than those of most other tests (usually 0.05 or 0.2) and triggers
timeouts on my side. Let's just shorten it to 0.2 since its goal
is only to send data separately.
Note: maybe a barrier approach would be possible, though not
studied.
(cherry picked from commit
33deb4babe83b361f9f7423030cd8cc8318ca2bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 6 Sep 2024 12:18:01 +0000 (14:18 +0200)]
REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
"http-messaging/protocol_upgrade.vtc" script was updated to test upgrades
for requests with a payload. It should fail when the request is sent to a H2
server. When sent to a H1 server, it should succeed, except if the server
replies before the end of the request.
(cherry picked from commit
d6c4ed9a960d5125b3ca0b879eb127353a0b45c0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 6 Sep 2024 07:16:16 +0000 (09:16 +0200)]
BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
Since
1d2d77b27 ("MEDIUM: mux-h1: Return a 501-not-implemented for upgrade
requests with a body"), it is no longer possible to perform a protocol
upgrade for requests with a payload. The main reason was to be able to
support protocol upgrades for an H1 client requesting an H2 server. In that
case, the upgrade request is converted to a CONNECT request, so it is not
possible to convey a payload.
But it is a problem for anyone wanting to perform upgrades on an H1 server
using requests with a payload. It is uncommon but valid. So now it is the
H2 multiplexer's responsibility to reject upgrade requests on the server
side if there is a payload. An INTERNAL_ERROR is returned for the H2S in
that case. On the H1 side, the upgrade is now allowed, but only if the
server waits for the end of the request to return the 101-Switching-Protocol
response. Indeed, it is quite hard to synchronise the frontend and backend
sides in that case. Asking servers to fully consume the request payload
before returning the response seems reasonable.
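On the H2 side, the rejection can be sketched as follows (the payload test
is hypothetical; h2s_error() and H2_ERR_INTERNAL_ERROR exist in the tree):
    /* backend side: an upgrade request with a payload cannot be turned
     * into an Extended CONNECT, so reset the stream instead.
     */
    if (upgrade_requested && request_has_payload) { /* hypothetical tests */
            h2s_error(h2s, H2_ERR_INTERNAL_ERROR);
            goto done;
    }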
This patch should fix the issue #2684. It could be backported after a period
of observation, as far as 2.4 if possible. But only if it is not too
hard. It depends on "MINOR: mux-h1: Set EOI on SE during demux when both
side are in DONE state".
(cherry picked from commit
001fb1a5488b5059ca4c26a82dd4f00e9850f1b4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 6 Sep 2024 06:46:04 +0000 (08:46 +0200)]
MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
For now, this case is already handled for all requests except for those
waiting for a tunnel establishment (CONNECT and protocol upgrades). It is
not an issue because only bodyless requests are supported in these cases. So
the request is always finished at the end of headers and therefore before
the response.
However, to relax conditions for full H1 protocol upgrades (H1 client and
server), this case will be necessary. Indeed, the idea is to be able to
perform protocol upgrades for requests with a payload. Today, the "Upgrade:"
header is removed before sending the request to the server. But to support
this case, this patch is required to properly finish the transaction when
the server does not perform the upgrade.
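The added condition can be sketched like this (field names follow the H1
mux conventions; the exact test may differ):
    /* demux path: when both sides are DONE, report end of input on the SE */
    if (h1s->req.state == H1_MSG_DONE && h1s->res.state == H1_MSG_DONE)
            se_fl_set(h1s->sd, SE_FL_EOI);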
(cherry picked from commit
ad1ef94612fb6cba5b68f7d19c4646491b6da4af)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 1 Aug 2024 13:52:56 +0000 (15:52 +0200)]
BUG/MINOR: h2: reject extended connect for h2c protocol
This commit prevents forwarding of an HTTP/2 Extended CONNECT when the
"h2c" or "h2" token is set as the targeted protocol. Contrary to the
previous commit, which deals with the HTTP/1 mux, this time the request is
rejected and a RESET_STREAM is reported to the client.
This must be backported up to 2.4 after a period of observation.
(cherry picked from commit
7a5a30d28a3ee100c62603f7212cc1b313c53311)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 1 Aug 2024 13:35:06 +0000 (15:35 +0200)]
BUG/MINOR: h1: do not forward h2c upgrade header token
haproxy supports tunnel establishment through the HTTP Upgrade mechanism.
Since the following commit, Extended CONNECT is also supported for
HTTP/2 on both the frontend and backend sides.
commit
9bf957335e2c385b74901481f7a89c9565dfce53
MEDIUM: mux_h2: generate Extended CONNECT from htx upgrade
As specified by the HTTP/2 RFC, "h2c" can be used by an HTTP/1.1 client to
request an upgrade to HTTP/2. haproxy does not support this, so it silently
ignores it. However, the Connection and Upgrade headers are forwarded as-is
on the backend side.
If using HTTP/1 on the backend side and the server supports this upgrade
mechanism, haproxy won't be able to parse the HTTP response. If using
HTTP/2, the backend mux incorrectly tries to convert the request to an
Extended CONNECT with the h2c protocol, which may also prevent the response
from being transmitted.
To fix this, flag HTTP/1 requests carrying an "h2c" or "h2" token in an
upgrade header. When converting the header list to HTX, the upgrade header
is skipped if any of these tokens is present, and the H1_MF_CONN_UPG flag
is removed.
This issue can easily be reproduced by using curl's --http2 argument to
connect to an HTTP/1 frontend, for example:
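    $ curl --http2 http://<haproxy_http_frontend>/
curl then adds "Upgrade: h2c" along with the matching Connection and
HTTP2-Settings headers to the request.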
This must be backported up to 2.4 after a period of observation.
(cherry picked from commit
7b89aa5b1943c1537a844cc375f7e8a86bac7942)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 4 Jul 2024 08:09:16 +0000 (10:09 +0200)]
MINOR: connection: No longer include stconn type header in connection-t.h
It is a small change, but it is cleaner not to include the stconn-t.h
header in connection-t.h, mainly to avoid circular definitions.
The related issue is #2502.
(cherry picked from commit
4b8098bf4831c0dfca4a058bd3170a5ed7ae8bbf)
Signed-off-by: William Lallemand <wlallemand@haproxy.com>
Christopher Faulet [Thu, 19 Sep 2024 12:07:01 +0000 (14:07 +0200)]
[RELEASE] Released version 3.0.5
Released version 3.0.5 with the following main changes :
- BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
- BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
- BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
- BUILD: mux-pt: Use the right name for the sedesc variable
- BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
- BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
- BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
- BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
- BUG/MEDIUM: http-ana: Report error on write error waiting for the response
- BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
- BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
- BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
- BUG/MINOR: fcgi-app: handle a possible strdup() failure
- DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
- BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
- BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
- BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
- BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
- BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
- BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
- BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
- BUG/MINOR: proto_tcp: keep error msg if listen() fails
- MINOR: channel: implement ci_insert() function
- BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
- REGTESTS: mcli: test the pipelined commands on master CLI
- BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
- BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
- BUG/MINOR: h3: properly reject too long header responses
- BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
- BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
- DOC: config: correct the table for option tcplog
- BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
- BUILD: quic: 32bits build broken by wrong integer conversions for printf()
- BUG/MEDIUM: clock: also update the date offset on time jumps
- MINOR: tools: Implement ipaddrcpy().
- MINOR: quic: Implement quic_tls_derive_token_secret().
- MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
- MINOR: quic: Token for future connections implementation.
- BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
- MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
- MINOR: quic: Implement qc_ssl_eary_data_accepted().
- MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
- BUG/MEDIUM: quic: always validate sender address on 0-RTT
- BUG/MINOR: quic: Crash from trace dumping SSL early data status (AWS-LC)
- BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
- DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
- REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
- BUG/MEDIUM: clock: detect and cover jumps during execution
- BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
- BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
- BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
- BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
- BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
- MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
- BUG/MINOR: polling: fix time reporting when using busy polling
- BUG/MINOR: clock: make time jump corrections a bit more accurate
- BUG/MINOR: clock: validate that now_offset still applies to the current date
- BUG/MEDIUM: queue: implement a flag to check for the dequeuing
- BUG/MINOR: peers: local entries updates may not be advertised after resync
- DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
- BUG/MEDIUM: sc_strm/applet: Wake applet after a successful synchronous send
- BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
- BUG/MEDIUM: promex: Wait to have the request before sending the response
- BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
- MINOR: quic: convert qc_stream_desc release field to flags
- MINOR: quic: implement function to check if STREAM is fully acked
- BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
- BUG/MINOR: quic: prevent freeze after early QCS closure
Amaury Denoyelle [Wed, 7 Aug 2024 16:01:51 +0000 (18:01 +0200)]
BUG/MINOR: quic: prevent freeze after early QCS closure
A connection freeze may occur if a QCS is released before transmitting
any data. This can happen when an error is detected early by the stream,
for example during HTTP response headers encoding, forcing the whole
connection closure.
In this case, a connection error is registered by the QUIC MUX to the
lower layer. The MUX is then released and the xprt layer is notified to
prepare CONNECTION_CLOSE emission. However, this is prevented because the
quic_conn streams tree is not empty, as it still contains the
qc_stream_desc previously attached to the failed QCS instance. The
connection will freeze until the QUIC idle timeout.
This situation is caused by an omission during the qc_stream_desc release
operation. In the described situation, the qc_stream_desc's current buffer
is empty and can thus be removed, which is the purpose of this patch. This
unblocks the previously stuck situation, with the qc_stream_desc removal
from the quic_conn tree.
This issue can be reproduced by modifying H3/QPACK code to return an
early error during HEADERS response processing.
This must be backported up to 2.6, after a period of observation.
(cherry picked from commit
3ef1ee477d3fef54633465b603206dd00bfefeba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 5 Aug 2024 16:58:49 +0000 (18:58 +0200)]
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
STREAM frames have dedicated handling on retransmission. A special check
is done to remove data already acked in case of duplicated frames, thus
only unacked data are retransmitted.
This handling is faulty in case of an empty STREAM frame with FIN set.
On retransmission, this frame does not cover any unacked range as it is
empty and is thus discarded. This may cause the transfer to freeze with
the client waiting indefinitely for the FIN notification.
To handle retransmission of an empty FIN STREAM frame, the qc_stream_desc
layer has been extended. A new flag QC_SD_FL_WAIT_FOR_FIN is set by the
QUIC MUX when the FIN has been transmitted. If set, it prevents the
qc_stream_desc from being freed until the FIN is acknowledged. On the
retransmission side, qc_stream_frm_is_acked() has been updated. It now
reports false if the FIN bit is set on the frame and the qc_stream_desc
has QC_SD_FL_WAIT_FOR_FIN set.
This must be backported up to 2.6. However, this heavily modifies the
critical section for ACK handling and retransmission. As such, it must
be backported only after a period of observation.
This issue can be reproduced by using the following socat command as a
server to add a delay between the response and the connection closure:
$ socat TCP-LISTEN:<port>,fork,reuseaddr,crlf SYSTEM:'echo "HTTP/1.1 200 OK"; echo ""; sleep 1;'
On the client side, ngtcp2 can be used to simulate packet drops. Without
this patch, the connection will be interrupted on QUIC idle timeout or
haproxy client timeout with ERR_DRAINING on ngtcp2:
$ ngtcp2-client --exit-on-all-streams-close -r 0.3 <host> <port> "http://<host>:<port>/?s=32o"
As an alternative to ngtcp2 random loss, an extra haproxy patch can also be
used to force skipping the emission of the empty STREAM frame:
diff --git a/include/haproxy/quic_tx-t.h b/include/haproxy/quic_tx-t.h
index efbdfe687..1ff899acd 100644
--- a/include/haproxy/quic_tx-t.h
+++ b/include/haproxy/quic_tx-t.h
@@ -26,6 +26,8 @@ extern struct pool_head *pool_head_quic_cc_buf;
/* Flag a sent packet as being probing with old data */
#define QUIC_FL_TX_PACKET_PROBE_WITH_OLD_DATA (1UL << 5)
+#define QUIC_FL_TX_PACKET_SKIP_SENDTO (1UL << 6)
+
/* Structure to store enough information about TX QUIC packets. */
struct quic_tx_packet {
/* List entry point. */
diff --git a/src/quic_tx.c b/src/quic_tx.c
index 2f199ac3c..2702fc9b9 100644
--- a/src/quic_tx.c
+++ b/src/quic_tx.c
@@ -318,7 +318,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
tmpbuf.size = tmpbuf.data = dglen;
TRACE_PROTO("TX dgram", QUIC_EV_CONN_SPPKTS, qc);
- if (!skip_sendto) {
+ if (!skip_sendto && !(first_pkt->flags & QUIC_FL_TX_PACKET_SKIP_SENDTO)) {
int ret = qc_snd_buf(qc, &tmpbuf, tmpbuf.data, 0, gso);
if (ret < 0) {
if (gso && ret == -EIO) {
@@ -354,6 +354,7 @@ static int qc_send_ppkts(struct buffer *buf, struct ssl_sock_ctx *ctx)
qc->cntrs.sent_bytes_gso += ret;
}
}
+ first_pkt->flags &= ~QUIC_FL_TX_PACKET_SKIP_SENDTO;
b_del(buf, dglen + QUIC_DGRAM_HEADLEN);
qc->bytes.tx += tmpbuf.data;
@@ -2066,6 +2067,17 @@ static int qc_do_build_pkt(unsigned char *pos, const unsigned char *end,
continue;
}
+ switch (cf->type) {
+ case QUIC_FT_STREAM_8 ... QUIC_FT_STREAM_F:
+ if (!cf->stream.len && (qc->flags & QUIC_FL_CONN_TX_MUX_CONTEXT)) {
+ TRACE_USER("artificially drop packet with empty STREAM frame", QUIC_EV_CONN_TXPKT, qc);
+ pkt->flags |= QUIC_FL_TX_PACKET_SKIP_SENDTO;
+ }
+ break;
+ default:
+ break;
+ }
+
quic_tx_packet_refinc(pkt);
cf->pkt = pkt;
}
(cherry picked from commit
e177cf341cf3665e340855312714fe002688b2db)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Tue, 6 Aug 2024 15:36:56 +0000 (17:36 +0200)]
MINOR: quic: implement function to check if STREAM is fully acked
When a STREAM frame is retransmitted, a check is performed to remove ranges
of data already acked from it. This is useful when STREAM frames are
duplicated and split to cover different data ranges. The newly
retransmitted frame contains only unacked data.
This process is performed similarly in qc_dup_pkt_frms() and
qc_build_frms(). Refactor the code into a new function named
qc_stream_frm_is_acked(). It returns true if frame data are already
fully acked and retransmission can be avoided. If only a partial range
of data is acknowledged, frame content is updated to only cover the
unacked data.
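A typical call site in a retransmission path then looks like this (sketch;
the surrounding loop is elided):
    if (qc_stream_frm_is_acked(qc, frm)) {
            /* data range fully acknowledged: drop the frame instead
             * of retransmitting it.
             */
            qc_frm_free(qc, &frm);
            continue;
    }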
This patch does not have any functional change. However, it simplifies
retransmission for STREAM frames. Also, it will be reused to fix
retransmission for empty STREAM frames with FIN set by the following
patch:
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
As such, it must be backported prior to it.
(cherry picked from commit
714009b7bcf921836c2df7fc0d26b2ad257c8307)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 5 Aug 2024 16:52:27 +0000 (18:52 +0200)]
MINOR: quic: convert qc_stream_desc release field to flags
qc_stream_desc had a field <release> used as a boolean. Convert it to a
new <flags> field, with the QC_SD_FL_RELEASE value as the equivalent.
The purpose of this patch is to be able to extend qc_stream_desc by
adding newer flags values. This patch is required for the following
patch
BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
As such, it must be backported prior to it.
(cherry picked from commit
bb9ac256a1e5468535b8242dc762e7bb0d9a8bf3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Tue, 17 Sep 2024 13:34:44 +0000 (15:34 +0200)]
BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
"option httpslog" override warning messaged used to be reported as
"option httplog", probably as a result of copy paste without adjusting
the context. Let's fix that to prevent emitting confusing warning messages
The issue exists since 98b930d ("MINOR: ssl: Define a default https log
format"), thus it should be backported up to 2.6
(cherry picked from commit
17e52c922b577e1b677098b34e47cd0a85f31e8b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 16 Sep 2024 20:29:24 +0000 (22:29 +0200)]
BUG/MEDIUM: promex: Wait to have the request before sending the response
It is similar to the previous fix about the stats applet ("BUG/MEDIUM:
cache/stats: Wait to have the request before sending the response").
However, for promex, there is no crash and no obvious issue. But it depends
on the filter. Indeed, the request is used by promex, independently of
whether it was considered as forwarded or not. So if it is modified by the
filter, the modifications are just ignored.
Same bug, same fix. We now wait for the request to be forwarded before
processing it and producing the response.
(cherry picked from commit
bb2a2bc5f2f6d864cd8770cbd2533d3df1878ad1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 16 Sep 2024 17:17:33 +0000 (19:17 +0200)]
BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
It seems obvious. On a classical workflow, the request headers analysis is
finished when these applets are woken up for the first time. So they don't
check that they really have the request before starting to process it and
to send the response. But with a filter, it is possible to stop the request
analysis after the applet creation.
If this happens for the stats applet, this leads to a crash because we
retrieve the request start-line without checking if it is available. For
the cache applet, the response is just immediately sent. And here it is a
problem if the compression is enabled. In that case too, this may lead to a
crash because the compression may be enabled but not initialized.
For a true server, there is no issue because the connection cannot be
established. The server is chosen only after the request analysis. The issue
with applets is that once created, an applet is quickly switched to the
established state. So it is probably a point that must be carefully reviewed
and probably reworked.
In the meantime, as a fix, in the cache and the stats applets, we just take
care to have the request before sending the response. This will do the
trick.
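The guard added in both applets can be sketched like this (the exact state
test and wakeup helper are assumed):
    /* do not start producing the response until the request is there */
    if (unlikely(s->txn->req.msg_state < HTTP_MSG_DONE)) {
            applet_have_more_data(appctx); /* be woken up again later */
            return;
    }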
The patch must be backported as far as 2.6. On 2.6, the patch must be adapted.
(cherry picked from commit
afc50f2445e02c778095cf4d7392c505f5af5278)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 16 Sep 2024 17:12:02 +0000 (19:12 +0200)]
BUG/MEDIUM: sc_strm/applet: Wake applet after a successful synchronous send
On a synchronous send from the stream to an applet, if some data were sent,
we must take care to wake the applet up. It is important because if
everything was sent at this stage, there is no other chance to wake the
applet up, mainly because the SE_FL_WAIT_DATA flag is set on the applet's
sedesc in sc_update_tx() at the end of process_stream(). This flag prevents
any wakeup of the applet for a send event.
It is not necessary for a mux because the mux stream is called when a
synchronous send from the stream is performed. So it is responsible for
waking the mux connection up if necessary.
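A sketch of the fix on the synchronous-send path (the sent-data test is
simplified; appctx_wakeup() and __sc_appctx() exist in the tree):
    /* some data were sent synchronously to the applet: wake it up now,
     * otherwise SE_FL_WAIT_DATA set later in sc_update_tx() would mask
     * the send event for good.
     */
    if (sent > 0)
            appctx_wakeup(__sc_appctx(sc));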
This patch must be backported to 3.0.
(cherry picked from commit
5fc12b0afd97ac68ef77335b9b5b9a0288f439f5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 12 Sep 2024 07:15:37 +0000 (09:15 +0200)]
DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
From time to time, new exceptions are added in the HTTP parsing (most of
the time in H1) to avoid rejecting some invalid messages sent by legacy
applications. But the documentation of the accept-invalid-http-request and
accept-invalid-http-response options was not very clear. So, now, there is
an explicit list of relaxing rules for both options.
(cherry picked from commit
0f4fad5291027a7dfc8109fbbe2acd0bac8affd0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Mon, 16 Sep 2024 09:13:04 +0000 (11:13 +0200)]
BUG/MINOR: peers: local entries updates may not be advertised after resync
Since commit
864ac3117 ("OPTIM: stick-tables: check the stksess without
taking the read lock"), when entries for a local table are learned from
another peer upon resync, and this is the only peer haproxy speaks to,
local updates on such entries are not advertised to the peer anymore,
until they eventually expire and can be recreated upon local updates.
This is due to the fact that ts->seen is always set to 0 when creating a
new entry, and also when touch_remote is performed on the entry.
Indeed, while
864ac3117 attempts to avoid useless updates, it didn't
consider entries learned from a remote peer. Such entries are exclusively
learned in peer_treat_updatemsg(): once the entry is created (or updated)
with new data, touch_remote is used to commit the change. However, unlike
touch_local, entries committed using touch_remote will not be advertised
to the peer from which the entry was just learned (otherwise we would
enter a looping situation). Due to the above patch, once an entry is
learned from the (unique) remote peer, 'seen' will be stuck to 0 so it
will never be advertised for its whole lifetime.
Instead, when entries are learned from a peer, we should consider that
the peer that taught us the entry has seen it.
To do this, let's set seen=1 in peer_treat_updatemsg() after calling
touch_remote(). This way, if we happen to perform updates on this entry,
it will be properly advertised to relevant peers. This patch should not
affect the performance gain documented in 864ac3117 given that the test
scenario didn't involve entries learned by remote peers, but solely
locally created entries advertised to remote peers upon updates.
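The change itself is a one-liner in peer_treat_updatemsg() (sketch; the
exact call arguments may differ):
    stktable_touch_remote(st->table, ts, 1);
    HA_ATOMIC_STORE(&ts->seen, 1); /* the teaching peer has seen it */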
This should be backported in 3.0 with
864ac3117.
(cherry picked from commit
1e0920f85542d63755cb8824d1000c3a6f22bb9c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Wed, 11 Sep 2024 07:37:53 +0000 (09:37 +0200)]
BUG/MEDIUM: queue: implement a flag to check for the dequeuing
As unveiled in GH issue #2711, commit
5541d4995d ("BUG/MEDIUM: queue:
deal with a rare TOCTOU in assign_server_and_queue()") does have some
side effects in that it can occasionally cause an endless loop.
As Christopher analysed it, the problem is that process_srv_queue(),
which uses a trylock in order to leave only one thread in charge of
the dequeueing process, can lose the lock race against pendconn_add().
If this happens on the last served request, then there's no more thread
to deal with the dequeuing, and assign_server_and_queue() will loop
forever on a condition that was initially expected to be extremely
rare (and still is, except that now it can become sticky). Previously
what was happening is that such queued requests would just time out
and since that was very rare, nobody would notice.
The root of the problem really is that trylock. It was added so that
only one thread dequeues at a time but it doesn't offer only that
guarantee since it also prevents a thread from dequeuing if another
one is in the process of queuing. We need a different criterion.
What we're doing now is to set a "dequeuing" flag in the server, which
indicates that one thread is currently in the process of dequeuing
requests. This flag is atomically tested, and only if no thread is in
this process does the thread grab the queue's lock and dequeue.
This way it will be serialized with pendconn_add() and no request
addition will be missed.
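The new criterion can be sketched like this (the flag name and surrounding
helpers are hypothetical):
    /* only one dequeuer at a time, but never block pendconn_add() */
    if (!HA_ATOMIC_XCHG(&sv->dequeuing, 1)) {
            HA_SPIN_LOCK(QUEUE_LOCK, &sv->queue.lock);
            /* ... dequeue pending connections here ... */
            HA_SPIN_UNLOCK(QUEUE_LOCK, &sv->queue.lock);
            HA_ATOMIC_STORE(&sv->dequeuing, 0);
    }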
It is not certain whether the original race covered by the fix above
can still happen with this change, so better keep that fix for now.
Thanks to @Yenya (Jan Kasprzak) for the precise and complete report that
allowed us to spot the problem.
This patch should be backported wherever the patch above was backported.
(cherry picked from commit
b11495652e724d71f1f4247332f060fe48577664)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Mon, 9 Sep 2024 11:56:18 +0000 (13:56 +0200)]
BUG/MINOR: clock: validate that now_offset still applies to the current date
We want to make sure that now_offset is still valid for the current date:
another thread could very well have updated it after detecting a backwards
jump while, at the very same moment, the time got fixed again; we would
then retrieve that date and add it to the new offset, which results in a
larger jump. Normally, for this to happen, it would mean that before_poll
was also affected by the jump and was detected before and bounded within
2 seconds, resulting in at most 2 seconds of perturbation.
Here we try to detect this situation and fall back to re-adjusting the
offset instead.
It's more of a strengthening of what's done by commit
e8b1ad4c2b
("BUG/MEDIUM: clock: also update the date offset on time jumps") than a
pure fix, in that the issue was not directly observed but it's visibly
possible by reading the code, so this should be backported along with
the patch above. This is related to issue GH #2704.
Note that this could be simplified in terms of operations by migrating
the deadlines to nanoseconds, but this was the path to least intrusive
changes.
(cherry picked from commit
adaba6f904d11424327f90c3ab4df085c26fc8c4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Thu, 12 Sep 2024 15:58:57 +0000 (17:58 +0200)]
BUG/MINOR: clock: make time jump corrections a bit more accurate
Since commit
e8b1ad4c2b ("BUG/MEDIUM: clock: also update the date offset
on time jumps") we try to update the now_offet based on the last known
valid date. But if it's off compared to the global_now_ns date shared
by other threads, we'll get the time off a little bit. When this happens,
we should consider the most recent of these dates so that if the global
date was already known to be more recent, we should use it and stick to
it. This will avoid setting too large an offset that could in turn provoke
a larger jump on another thread.
This is related to issue GH #2704.
This can be backported to other branches having the patch above.
(cherry picked from commit
af48e4cc6b315cadb80d6980586af5c3c8368826)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Thu, 12 Sep 2024 15:47:13 +0000 (17:47 +0200)]
BUG/MINOR: polling: fix time reporting when using busy polling
Since commit
beb859abce ("MINOR: polling: add an option to support
busy polling") the time and status passed to clock_update_local_date()
were incorrect. Indeed, what is considered is the before_poll date
related to the configured timeout which does not correspond to what
is passed to the poller. That's not correct because before_poll+the
syscall's timeout will be crossed by the current date 100 ms after
the start of the poller. In practice it didn't happen when the poller
was limited to 1s timeout but at one minute it happens all the time.
That's particularly visible when running a multi-threaded setup with
busy polling and only half of the threads working (bind ... thread even).
In this case, the fixup code of clock_update_local_date() is executed
for each round of busy polling. The issue was made really visible
starting with recent commit
e8b1ad4c2b ("BUG/MEDIUM: clock: also
update the date offset on time jumps") because upon a jump, the
shared offset is reset, while it should not be in this specific
case.
What needs to be done instead is to pass the configured timeout of
the poller (and not of the syscall), and always pass "interrupted"
set so as to claim we got an event (which is sort of true as it just
means the poller returned instantly). In this case we can still
detect backwards/forward jumps and will use a correct boundary
for the maximum date that covers the whole loop.
This can be backported to all versions since the issue was introduced
with busy-polling in 1.9-dev8.
(cherry picked from commit
ad98edd00accdf5529da77eb79b72cf9fc025a7d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 11 Sep 2024 13:02:07 +0000 (15:02 +0200)]
MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
Since 2.6, a parsing error is reported when the chunked encoding is found
twice. As stated in RFC 9112, a sender must not apply the chunked transfer
coding more than once to a message body. It means only one chunked coding
must be found. In addition, empty values are also rejected because this is
forbidden by RFC 9110.
However, in both cases, it may be useful to relax the rules for trusted
legacy servers when the accept-invalid-http-response option is set,
especially because these values were accepted on 2.4 and older. In
addition, the T-E header is now sanitized before being sent. This is not a
problem because it is a hop-by-hop header.
Note that it remains invalid on the client side because there is no good
reason to relax the parsing there. We can argue a server is trusted, so we
can decide to support some legacy behavior. That is not true on the client
side, and it is highly suspicious if a client sends an invalid T-E header.
Note also that we continue to reject unsupported T-E values (so all codings
except "chunked"). Because the "TE" header is sanitized and cannot contain
any value other than "trailers", there is absolutely no reason for a server
to use something else.
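For example, the relaxed parsing is only enabled per backend, for trusted
servers:
    backend legacy_app
        option accept-invalid-http-response   # trusted legacy server only
        server srv1 192.0.2.10:8080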
This patch should fix the issue #2677. It could probably be backported as
far as 2.6 if necessary.
(cherry picked from commit
1900ca475fa005b8000c4652457fedf282fa864d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 10 Sep 2024 06:55:29 +0000 (08:55 +0200)]
BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
Commit
4f2493f355 ("BUG/MINOR: pattern: pat_ref_set: fix UAF reported by
coverity") dropped the condition to concatenate error messages and as
such introduced a leading comma in front of all of them. Then commit
911f4d93d4 ("BUG/MINOR: pattern: pat_ref_set: return 0 if err was found")
changed the behavior to stop at the first error anyway, so all the
mechanics dedicated to the concatenation of error messages are no longer
needed and we can simply return the error as-is, without inserting any
comma.
This should be backported where the patches above are backported.
(cherry picked from commit
9f8d9c9e8bdef8e7b8f7d1d641987b4d0dcae988)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 9 Sep 2024 16:16:49 +0000 (18:16 +0200)]
BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
This reverts commit
225a4d02e1f6a12c0b4f3584949fad3339d71708.
When a 200-OK response is replied to a CONNECT request or a
101-Switching-Protocol, a tunnel is considered as established between the
client and the server. However, we must not declare the response as
bodyless. Of course, there is no payload, but tunneled data are expected.
Because of this bug, the zero-copy forwarding is disabled on the server
side.
This patch must be backported as far as 2.9.
(cherry picked from commit
a99d58819f8184fb6f500807ddaf2371c5f1c110)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 9 Sep 2024 16:19:54 +0000 (18:19 +0200)]
BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
When the mux is woken up on I/O events, if the zero-copy forwarding is
enabled, receives are blocked. In this case, the SC is woken up to be able
to perform 0-copy forwarding to the other side. This works well, except for
the H1C in CLOSING state.
Indeed, in that case, in h1_process(), the SC is not woken up because only
RUNNING H1 connections are considered. As a consequence, the mux will
ignore the connection closure. The H1 connection remains blocked, waiting
for the shutdown timeout. If no timeout is configured, the H1 connection is
never closed, leading to a leak.
This patch should fix the leak reported by Damien Claisse in issue #2697. It
should be backported as far as 2.8.
(cherry picked from commit
f6e193f1b05bc3a73090b1efeb3f0440a0c776b0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Mon, 9 Sep 2024 12:59:19 +0000 (14:59 +0200)]
BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
Since c5959fd ("MEDIUM: pattern: merge same pattern"), a UAF (leading to a
crash) can be experienced if the same pattern file (and match method) is
used in two default sections and the first one is not referenced later in
the config. In this case, the first default section will be cleaned up.
However, due to an unhandled case in the above optimization, the original
expr which the second default section relies on is mistakenly freed.
This issue was discovered while trying to reproduce GH #2708. The issue
was particularly tricky to reproduce given the config and sequence
required to make the UAF happen. Fortunately, GitHub user @asmnek not only
provided useful information, but, since he was able to consistently
trigger the crash in his environment, he was able to nail down the crash
to the use of a pattern file involved with 2 named default sections. Big
thanks to him.
To fix the issue, let's push the logic from c5959fd a bit further. Instead
of relying on the "do_free" variable to know if the expression should be
freed or not (which proved to be insufficient in our case), let's switch
to a simple refcounting logic. This way, no matter who owns the
expression, the last one releasing it will be responsible for freeing it.
The refcount is implemented using a 32-bit value which fills a previous
4-byte structure gap:
int mflags; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int lock; /* 88 8 */
(output from pahole)
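The resulting logic can be sketched like this (the release helper is
hypothetical; the atomic macros exist in the tree):
    /* take a reference when sharing the expression */
    HA_ATOMIC_INC(&expr->refcount);

    /* the last owner releasing it performs the actual free */
    if (HA_ATOMIC_SUB_FETCH(&expr->refcount, 1) == 0)
            release_pattern_expr(expr); /* hypothetical */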
Even though it was not reproduced in 2.6 or below by @asmnek (the bug was
revealed thanks to another bugfix), this issue theoretically affects all
stable versions (up to c5959fd), thus it should be backported to all
stable versions.
(cherry picked from commit
68cfb222b559b909415c9aa92114ca42c9a93459)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Fri, 6 Sep 2024 14:21:02 +0000 (16:21 +0200)]
BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
This is a complementary patch to
a68affeaa ("BUG/MINOR: pattern: a sample
marked as const could be written"). Indeed the same logic from
pat_match_str() is used there, but we lack the check to ensure that the
sample is not const before writing data to it.
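The missing check is the same one used elsewhere (smp_make_rw() exists in
the tree; sketch):
    /* never write into a const sample: make it writable first or give up */
    if (!smp_make_rw(smp))
            return NULL; /* treat as no-match rather than corrupt memory */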
It could be backported to all stable versions.
(cherry picked from commit
3449525a0204841a62a9fa41119ec8c47f21fde8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Sun, 8 Sep 2024 17:15:38 +0000 (19:15 +0200)]
BUG/MEDIUM: clock: detect and cover jumps during execution
After commit
e8b1ad4c2 ("BUG/MEDIUM: clock: also update the date offset
on time jumps"), @firexinghe mentioned that the issue was still present
in their case. In fact it depends on the load, which affects the
probability that the time changes between two poll() calls vs that it
changes during poll(). The time correction code used to only deal with
the latter. But under load if it changes between two poll() calls, what
happens then is that before_poll is off, and after returning from poll(),
the date is within bounds defined by before_poll, so no correction is
applied.
After many tests, it turns out that the most reliable solution without
using CLOCK_MONOTONIC is to prevent before_poll from being earlier than
the previous after_poll (trivial), and to cover forward jumps, we need
to enforce a margin. Given that the watchdog kills a looping task within
2 seconds and that no sane setup triggers it, it seems that 2 seconds
remains a safe enough margin. This means that in the worst case, some
forward jumps of up to 2 seconds will not be corrected, leading to an
apparent fast time and low rates. But this is supposed to be an exceptional
event anyway (typically an admin or crontab running ntpdate).
For future versions, given that we now opportunistically call
now_mono_time() before and after poll(), which returns zero if not
supported, we could imagine relying on it for the thread's local time
when it's non-null.
(cherry picked from commit
ef8d8215de2ed03a08172bbccb1146e5b867ce74)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 9 Sep 2024 17:31:14 +0000 (19:31 +0200)]
REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
This test has an expect rule for syslog that looks for [cC]D, to
indicate a client abort or timeout during the data phase. The purpose
was to say that when it fails it must be this, but the very low timeout
(1ms) still makes it prone to succeeding if the machine is highly loaded.
This has become more visible since commit
e8b1ad4c2b ("BUG/MEDIUM: clock:
also update the date offset on time jumps") because the clock drift
adjustments are more systematic. Since this commit, running 50 such tests
in parallel, at twice the number of CPUs, is sufficient to yield errors
due to some lines appearing as succeeding:
make reg-tests -- --j $((($(nproc)+1)*2)) --vtestparams -n50 reg-tests/log/wrong_ip_port_logging.vtc
Pauses of up to 300ms were observed in epoll_wait() in such circumstances,
and were properly fixed by the time drift detection.
Another approach would consist in increasing the permitted margin during
which we don't fix the clock drift but that would not be logical since the
base time had really been awaited for.
This should be backported to all stable releases since the commit above
will trigger the issue more often.
(cherry picked from commit
036ab62231003e7e7072986626597682725edff8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 6 Sep 2024 05:39:59 +0000 (07:39 +0200)]
DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
When HAPROXY_HTTP_LOG_FMT was added by commit
537b9e7f36 ("MINOR: config:
add environment variables for default log format"), the example was placed
by accident after the clf log format instead of the HTTP log format,
causing a bit of confusion.
This can be backported to 2.8.
(cherry picked from commit
c22fc591d428c55d76b2e18799fca804fafaf558)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Frederic Lecaille [Mon, 5 Aug 2024 11:40:51 +0000 (13:40 +0200)]
BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
This issue was reported by Ilya (@Chipitsine) when building haproxy against
aws-lc in GH #2663 where handshakeloss and handshakecorruption interop tests could
lead haproxy to crash after having built too short datagrams:
FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163
call trace(13):
| 0x55f4ee4dcc02 [ba d9 00 00 00 48 8d 35]: main-0x195bf2
| 0x55f4ee4e3112 [83 3d 2f 16 35 00 00 0f]: qc_send+0x11f3/0x1b5d
| 0x55f4ee4e9ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c
| 0x55f4ee6efa82 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2
| 0x55f4ee6f05d3 [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6
| 0x55f4ee671bb7 [83 3d 86 72 44 00 01 0f]: run_poll_loop+0x6e/0x57b
| 0x55f4ee672367 [48 8b 1d 22 d4 1d 00 48]: main-0x48d
| 0x55f4ee6755e0 [b8 00 00 00 00 e8 08 61]: main+0x2dec/0x335d
This could happen after Handshake packet building failures following a
successful Initial packet in the same datagram. In this case, the datagram
could be emitted with too short a length (<1200 bytes).
To fix this, store the datagram only if the first packet is not an Initial packet
or if its length is big enough (>=1200 bytes).
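The added condition can be sketched as follows (QUIC_INITIAL_PACKET_MINLEN
is the 1200-byte minimum; the storing helper's shape is assumed):
    /* never emit an Initial datagram shorter than 1200 bytes */
    if (first_pkt->type != QUIC_PACKET_TYPE_INITIAL ||
        dlen >= QUIC_INITIAL_PACKET_MINLEN)
            qc_txb_store(buf, dlen, first_pkt);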
Must be backported as far as 2.6.
(cherry picked from commit
eb1a097a6687cd8b93d981eff3d7e257c448ff3e)
[fl: same patch with <wrlen> replaced by <dlen> (datagram length)]
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Mon, 2 Sep 2024 07:36:19 +0000 (09:36 +0200)]
BUG/MINOR: quic: Crash from trace dumping SSL early data status (AWS-LC)
This bug follows this patch:
MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
where a new third variable was added to be dumped from the
QUIC_EV_CONN_IO_CB trace event. The quic_trace() code did not reveal that
there was already another variable passed as the third argument but not
dumped. This led to a crash when dereferencing a pointer to an int in
place of a pointer to an SSL object.
This issue was reproduced only by the handshakecorruption aws-lc interop
test with s2n-quic as client.
Note that this patch must be backported with this one:
BUG/MEDIUM: quic: always validate sender address on 0-RTT
which depends on the commit mentioned above.
(cherry picked from commit
db13df3d6e64e9993b90f048734538491082cbed)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 13:38:54 +0000 (15:38 +0200)]
BUG/MEDIUM: quic: always validate sender address on 0-RTT
Wedl Michael, a student at the University of Applied Sciences St. Poelten,
reported a potential vulnerability in haproxy, as described below.
An attacker could have obtained a TLS session ticket after having
established a connection to an haproxy QUIC listener, using its real IP
address. The attacker does not even have to send an application level
(HTTP/3) request. Then the attacker could open a 0-RTT session with a
spoofed IP address trusted by the QUIC listener, to bypass the IP
allow/block list and send HTTP/3 requests.
To mitigate this vulnerability, it was decided to use a token which can be
provided to the client each time it successfully manages to connect to
haproxy. These tokens may be reused for future connections to validate the
address/path of the remote peer, as is done with the Retry token, which is
used for the current connection, not the next one. Such tokens are
transported by NEW_TOKEN frames, which were not used until now by haproxy.
So, each time a client connects to an haproxy QUIC listener with 0-RTT
enabled, it is provided with such a token, which can be reused for the
next 0-RTT session. If no such token is presented by the client,
haproxy checks if the session is a 0-RTT one, i.e. with early-data
presented by the client. Contrary to the Retry token, the decision to
refuse the connection is made only when the TLS stack has been provided
with enough early-data from the Initial ClientHello TLS message and when
these data have been accepted. Fortunately, this event arrives fast enough
to allow haproxy to kill the connection if some early-data have been
accepted without a token presented by the client.
quic_build_post_handshake_frames() has been modified to build a NEW_TOKEN
frame with this newly implemented token to be transported inside.
quic_tls_derive_retry_token_secret() was renamed to
quic_do_tls_derive_token_secret() and modified to be reused and derive the
secret for the new token implementation.
quic_token_validate() has been implemented to validate both the Retry and
the new token implemented by this patch. When a non-retry token cannot be
validated, the received datagram is marked as requiring a Retry packet to
be sent, and no connection is created.
When the Initial packet does not embed any non-retry token and 0-RTT is
enabled, the connection is marked with a new flag:
QUIC_FL_CONN_NO_TOKEN_RCVD. As soon as the TLS stack detects that some
early-data have been provided by the client and accepted, the connection
is marked to be killed (QUIC_FL_CONN_TO_KILL) from
ha_quic_add_handshake_data(). This is done by calling the new
qc_ssl_eary_data_accepted() function. The TLS handshake is interrupted as
soon as possible by returning 0 from ha_quic_add_handshake_data(). The
connection is also marked as requiring a Retry packet to be sent
(QUIC_FL_CONN_SEND_RETRY) from ha_quic_add_handshake_data(). The handshake
I/O handler (quic_conn_io_cb()) then knows how to behave: kill the
connection after having sent a Retry packet.
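The resulting decision can be summarized as follows (flag and function
names come from this description; the surrounding code is elided):
    /* in ha_quic_add_handshake_data(): early data accepted without token */
    if ((qc->flags & QUIC_FL_CONN_NO_TOKEN_RCVD) &&
        qc_ssl_eary_data_accepted(qc)) {
            qc->flags |= QUIC_FL_CONN_TO_KILL | QUIC_FL_CONN_SEND_RETRY;
            return 0; /* interrupt the handshake as soon as possible */
    }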
Regarding TLS stack compatibility, this patch is supported by aws-lc. It is
disabled for wolfssl, which does not support 0-RTT at this time, thanks to
HAVE_SSL_0RTT_QUIC.
This patch depends on these commits:
MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
MINOR: quic: Implement qc_ssl_eary_data_accepted().
MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
MINOR: quic: Token for future connections implementation.
MINOR: quic: Implement quic_tls_derive_token_secret().
MINOR: tools: Implement ipaddrcpy().
Must be backported as far as 2.6.
(cherry picked from commit
f627b9272bd8ffca6f2f898bfafc6bf0b84b7d46)
[fl: Add ->flags to quic_dgram struct (would arrive with quic_initial feature).
Add QUIC_DGRAM_FL_ quic_dgram flags (would arrive with quic_initial feature).
Modify quic_rx_pkt_retrieve_conn() to fix a compilation issue and correctly
handle the "if (pkt->token_len) {}" else block to do so with quic_initial
feature]
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 13:25:16 +0000 (15:25 +0200)]
MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
Dump the early data status from QUIC_EV_CONN_IO_CB trace event.
This is very helpful to know if the QUIC server has accepted the
early data received from clients.
(cherry picked from commit
8854cef03672235addec6f3baafcc44f0e0441f4)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 13:16:01 +0000 (15:16 +0200)]
MINOR: quic: Implement qc_ssl_eary_data_accepted().
This function is a wrapper around SSL_get_early_data_status() for
OpenSSL-derived stacks and SSL_early_data_accepted() for BoringSSL-derived
stacks like AWS-LC. It returns true for a TLS server if it has accepted
the early data received from a client.
Also implement quic_ssl_early_data_status_str(), which is dedicated to
debugging purposes (traces). This function converts the enum returned by
the two functions mentioned above to a human-readable string.
(cherry picked from commit
609b1245610c8f3e219a1eff12d13559cae109cd)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 12:47:08 +0000 (14:47 +0200)]
MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
Modify the qf_new_token structure to use a static buffer with
QUIC_TOKEN_LEN as its size, as defined by the token for future connections
(quic_token.c). Modify the NEW_TOKEN frame parser accordingly (see
quic_parse_new_token_frame()). Also add comments to denote that the
NEW_TOKEN parser function is used only by clients and that its builder is
used only by servers.
(cherry picked from commit
e926378375bcf579daadea071c600651eb7dce0d)
[fl: remove useless openssl/chacha.h header inclusion when moving
openssl-compat.h at the start of the header inclusions as expected by this
patch]
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 12:25:26 +0000 (14:25 +0200)]
BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
quic_build_new_token_frame() is the function which is called to build a
NEW_TOKEN frame into a buffer. The position pointer for this buffer was
not updated, leading to a malformed NEW_TOKEN frame.
Must be backported as far as 2.6.
(cherry picked from commit
76c80605a6cc7880e4184c52fb1a314367410271)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 07:13:30 +0000 (09:13 +0200)]
MINOR: quic: Token for future connections implementation.
There exist two sorts of tokens used by QUIC, both of which validate the
peer address (path validation). Retry tokens are used for the current
connection the client wants to open. This patch implements the other sort
of token, which, after having been received on a connection, may be
provided for the next connection from the same IP address to validate it
(or validate the network path between the client and the server).
The token generation is implemented by quic_generate_token(), and the
token validation by quic_token_check(). The same method as for Retry
tokens is used to build these tokens to be reused for future connections.
The format is very simple: one byte for the format identifier to
distinguish these new tokens from the Retry ones, followed by a 32-bit
timestamp. As this part is ciphered with an AEAD cryptographic algorithm,
16 bytes are needed for the AEAD tag. 16 more random bytes are added to
this token as a salt to derive the AEAD secret used to cipher the token.
In addition to this salt, the client IP address is also used as AAD to
derive the AEAD secret. So the length of the token is fixed: 37 bytes.
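The fixed 37-byte layout thus breaks down as:
    1 byte    format identifier (distinguishes these tokens from Retry ones)
    4 bytes   32-bit timestamp
    16 bytes  AEAD tag
    16 bytes  random salt (used, together with the client IP address as
              AAD, to derive the AEAD secret)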
(cherry picked from commit
f5b09dc452f582eb876527fd28103bc29c51afad)
[fl: very minor Makefile modif to correctly add quic_token.o object to be compiled]
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
William Lallemand [Wed, 10 Jul 2024 08:28:27 +0000 (10:28 +0200)]
MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
The QUIC crypto uses the EVP_CIPHER API in order to achieve authenticated
encryption; this was the API used with OpenSSL. With libraries that take
inspiration from BoringSSL (LibreSSL and AWS-LC), the AEAD algorithms are
implemented using the EVP_AEAD API.
This patch converts the calls to the EVP_CIPHER API when used in the
context of AEAD cryptography for QUIC.
The patch defines some QUIC_AEAD macros that can be either EVP_CIPHER or
EVP_AEAD depending on the library.
This was mainly done for AWS-LC but this could be useful for other
libraries. This should finally allow to use CHACHA20_POLY1305 with
AWS-LC.
This patch allows to use the following ciphers with the EVP_AEAD API:
- TLS1_3_CK_AES_128_GCM_SHA256
- TLS1_3_CK_AES_256_GCM_SHA384
AWS-LC does not implement TLS1_3_CK_AES_128_CCM_SHA256 and
TLS1_3_CK_CHACHA20_POLY1305_SHA256 requires some hack for headers
protection which will come in another patch.
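For illustration, the macro indirection may look like this (the exact
guard and names are assumptions based on the description):

  /* map QUIC_AEAD_* onto whichever API the TLS library provides */
  #if defined(USE_OPENSSL_AWSLC)
  # define QUIC_AEAD      EVP_AEAD
  # define QUIC_AEAD_CTX  EVP_AEAD_CTX
  #else
  # define QUIC_AEAD      EVP_CIPHER
  # define QUIC_AEAD_CTX  EVP_CIPHER_CTX
  #endif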
(cherry picked from commit
31c831e29b432f0a9958be63948e8f4cb278e9f8)
[fl: required to support NEW_TOKEN which depends on QUIC_AEAD_* definitions]
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 12:14:24 +0000 (14:14 +0200)]
MINOR: quic: Implement quic_tls_derive_token_secret().
This function is similar to quic_tls_derive_retry_token_secret(). Its aim
is to derive the secret used to cipher the token to be used for future
connections.
This patch renames quic_tls_derive_retry_token_secret() to a more generic
one, quic_do_tls_derive_token_secret(), and reuses its code. Two arguments
are added to the latter so that both quic_tls_derive_retry_token_secret()
and the new quic_tls_derive_token_secret() function can be built on top of
quic_do_tls_derive_token_secret().
(cherry picked from commit
74caa0eece1cc3a8b35f1d34674ea5f357819314)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Frederic Lecaille [Fri, 30 Aug 2024 11:56:15 +0000 (13:56 +0200)]
MINOR: tools: Implement ipaddrcpy().
Implement the new ipaddrcpy() function to copy only the IP address from
a sockaddr_storage struct object into a buffer.
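A self-contained sketch of what such a helper can look like (illustrative,
not the committed code):

  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>

  /* Copy only the raw IPv4/IPv6 address bytes from <ss> into <buf>.
   * Returns the number of bytes copied, or 0 on failure. */
  static size_t ipaddrcpy_sketch(const struct sockaddr_storage *ss,
                                 unsigned char *buf, size_t size)
  {
          if (ss->ss_family == AF_INET) {
                  const struct sockaddr_in *in = (const struct sockaddr_in *)ss;

                  if (size < sizeof(in->sin_addr))
                          return 0;
                  memcpy(buf, &in->sin_addr, sizeof(in->sin_addr));
                  return sizeof(in->sin_addr);
          }
          else if (ss->ss_family == AF_INET6) {
                  const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;

                  if (size < sizeof(in6->sin6_addr))
                          return 0;
                  memcpy(buf, &in6->sin6_addr, sizeof(in6->sin6_addr));
                  return sizeof(in6->sin6_addr);
          }
          return 0;
  }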
(cherry picked from commit
fb7a0922038932a6b82f1827a0214c5d2e8da32e)
Signed-off-by: Frederic Lecaille <flecaille@haproxy.com>
Willy Tarreau [Wed, 4 Sep 2024 14:55:43 +0000 (16:55 +0200)]
BUG/MEDIUM: clock: also update the date offset on time jumps
In GH issue #2704, @swimlessbird and @xanoxes reported problems handling
time jumps. Indeed, since 2.7 with commit
4eaf85f5d9 ("MINOR: clock: do
not update the global date too often") we refrain from updating the global
offset in case it didn't change. But there's a catch: in case of a large
time jump, if the poller was interrupted, the local time remains the same
and we return immediately from there without updating the offset. It then
becomes incorrect regarding the "date" value, and upon subsequent call to
the poller, there's no way to detect a jump anymore so we apply the old,
incorrect offset and the date becomes wrong. Worse, going back to the
original time (then in the past), global_now_ns remains higher than the
local time and neither gets updated anymore.
What is missing in practice is to immediately update the offset when
detecting a time jump. In an ideal world, the offset would be updated
upon every call; that's what was being done prior to the commit above, but
it's extremely CPU intensive on large systems. However we can perfectly
afford to update the offset every time we detect a time jump, as it's
not as common.
This needs to be backported as far as 2.8. Thanks to both participants
above for providing very helpful details.
(cherry picked from commit
e8b1ad4c2b3985eb9e826fd279e419719a2c03ce)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Frederic Lecaille [Mon, 26 Aug 2024 09:18:15 +0000 (11:18 +0200)]
BUILD: quic: 32bits build broken by wrong integer conversions for printf()
Since these commits, the 32-bit build is broken due to several errors as follows:
CC src/quic_cli.o
src/quic_cli.c: In function ‘dump_quic_full’:
src/quic_cli.c:285:94: error: format ‘%ld’ expects argument of type ‘long int’,
but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
285 | chunk_appendf(&trash, " [initl] rx.ackrng=%-6zu tx.inflight=%-6zu(%ld%%)\n",
| ~~^
| |
| long int
| %lld
286 | pktns->rx.arngs.sz, pktns->tx.in_flight,
287 | pktns->tx.in_flight * 100 / qc->path->cwnd);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| uint64_t {aka long long unsigned int}
Replace several %ld with %llu, using unsigned long long casts, as printf
conversions in quic_cli.c, and one %ld with %lld, using a (long long) cast,
in quic_cc_cubic.c.
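The portable pattern is illustrated by this standalone example (the patch
itself uses the cast forms described above):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t in_flight = 12345;

          /* '%ld' is wrong on 32-bit targets where uint64_t is
           * 'long long unsigned int'; both forms below are portable: */
          printf("%llu\n", (unsigned long long)in_flight);
          printf("%" PRIu64 "\n", in_flight);
          return 0;
  }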
Thank you to Ilya (@chipitsine) for having reported this issue in GH #2689.
Must be backported to 3.0.
(cherry picked from commit
414e3aa6bc80d66a448dc25d8e50f4e457dc8711)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Wed, 14 Aug 2024 18:05:58 +0000 (20:05 +0200)]
BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
Remove tune.fast-forward from common_kw_list. It was replaced by
'tune.disable-fast-forward' and is no longer present in the "if..else if.."
parser of cfg_parse_global(). Otherwise, it may be shown as the best-match
keyword for some tune options, which is now wrong.
Should be backported to versions 2.9 and 3.0.
(cherry picked from commit
2e6e159ac47468b6b65e9678c9e4d1fe746165e8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Nathan Wehrman [Tue, 13 Aug 2024 17:36:28 +0000 (10:36 -0700)]
DOC: config: correct the table for option tcplog
option tcplog was erroneously reported as usable in the backend section.
This can be backported as needed; it simply corrects the table.
(cherry picked from commit
9788ae1d19ea159f2a87a8ef0a02ff57a480b703)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Mon, 12 Aug 2024 17:21:00 +0000 (19:21 +0200)]
BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
pat_ref_set_elt() returns 0 if we run out of memory or can't parse a new
map value. Any error message emitted by pat_ref_set_elt() is saved in the
err buffer, if provided by the caller. These error messages are accumulated
during the loop.
pat_ref_set() is used to update the values in a map that are associated
with the same given key. If pat_ref_set_elt() fails during the update,
let's return 0 to the caller immediately. We have the same non-unique key
and the same new value in each loop iteration. So it seems quite odd to
accumulate identical error messages and print them in the CLI:
> add map @1 mytest.map <<
+ 1.0.1.11 TestA
+ 1.0.1.11 TESTA
+ 1.0.1.11 test_a
+
> set map mytest.map 1.0.1.11 15
unable to parse '15' unable to parse '15' unable to parse '15'.
cli_parse_set_map(), which calls pat_ref_set() to update the map, will
return only one error message with this patch:
> set map mytest.map 1.0.1.11 15
unable to parse '15'.
hlua_set_map() and http_action_set_map() don't provide an error buffer and
will just exit on the first error.
This should be backported in all stable versions.
(cherry picked from commit
911f4d93d436a9067fb45168f70f79b5916489b0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Mon, 12 Aug 2024 13:32:00 +0000 (15:32 +0200)]
BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
memprintf() performs a realloc and then updates the pointer to the output
buffer where it has written the data. So free() is called on the previous
buffer address, if one was provided.
Both pat_ref_set_elt() and pat_ref_set() use memprintf() to write their
error messages. So, when we re-enter the while loop for the second time and
pat_ref_set_elt() has returned, the *err pointer (the previous value of
*merr) has already been freed by memprintf() from pat_ref_set_elt().
The 'if (!found)' condition is false at this point, because we found a node
in the first loop iteration. So the second memprintf(), written to emit the
error messages, calls free(*err) again.
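For context, here is a simplified model of the memprintf() semantics the
bug hinges on (a sketch, not HAProxy's implementation):

  #include <stdarg.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Format into a freshly allocated buffer, free the previous *out, then
   * store and return the new pointer. Any stale copy of the old *out
   * (such as a previous *merr) dangles after this call. */
  static char *memprintf_model(char **out, const char *fmt, ...)
  {
          va_list ap;
          char *ret;
          int len;

          va_start(ap, fmt);
          len = vsnprintf(NULL, 0, fmt, ap);
          va_end(ap);

          if (len < 0 || (ret = malloc(len + 1)) == NULL)
                  return NULL;

          va_start(ap, fmt);
          vsnprintf(ret, len + 1, fmt, ap);
          va_end(ap);

          free(*out);   /* the old error message is released here */
          *out = ret;
          return ret;
  }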
This should be backported in all stable versions.
(cherry picked from commit
4f2493f3551f8c344485cccf075c3b15f9825181)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 1 Aug 2024 16:20:08 +0000 (18:20 +0200)]
BUG/MINOR: h3: properly reject too long header responses
When encoding HTX to HTTP/3 headers on the response path, a bunch of
ABORT_NOW() statements were used when buffer room was not sufficient. In
most cases this is safe, as the output buffer has just been allocated and
so is empty at the start of the function. However, with a header list
longer than a whole buffer, this would cause an unexpected crash.
Fix this by replacing the ABORT_NOW() statements with a proper error
return path. For the moment, this causes the whole connection to be closed
rather than just the stream. This may be further improved in the future.
Also remove ABORT_NOW() when encoding the frame length at the end of
headers or trailers encoding. Buffer room is sufficient there as it was
already checked earlier in the same function.
This should be backported up to 2.6. Special care should be taken however,
as this code path has changed frequently:
* for 2.9 and older, the following extra statement must be inserted
prior to each newly added goto statement:
h3c->err = H3_INTERNAL_ERROR;
* for 2.6, trailers support is not implemented. As such, related chunks
should just be ignored when backporting.
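For 2.9 and older, the adjusted error path would thus look roughly like
this (a sketch; the buffer name and the condition are illustrative, only
the two statements come from the note above):

  /* on insufficient buffer room, fail cleanly instead of ABORT_NOW() */
  if (b_room(res) < required) {
          h3c->err = H3_INTERNAL_ERROR;   /* 2.9 and older only */
          goto err;
  }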
(cherry picked from commit
48514c118ce881e2ae20aeb1edda929663a8397d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Fri, 9 Aug 2024 09:05:41 +0000 (11:05 +0200)]
BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
This patch is done mostly as a safeguard in order not to trigger the
BUG_ON(fdtab[fd].owner != NULL) check, if listen() fails on a UNIX domain
socket.
In uxst_bind_listener(), pretty much the same logic of closing the socket
on the error path was kept as it was in tcp_bind_listener() before. The use
of fd_delete() was not generalized when support for the UNIX sock_stream
protocol was implemented.
So, let's remove the fd from fdtab on failure, instead of just closing it.
Otherwise, uxst_bind_listener(), which could be called in a loop for each
receiver, will obtain the same fd via socket() for the next receiver. Then,
it will bind it again and try to re-insert it into fdtab.
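The change on the error path can be sketched as follows (an illustrative
fragment; exact flags and labels elided):

  if (listen(fd, listener_backlog(listener)) < 0) {
          err |= ERR_RETRYABLE | ERR_ALERT;
          memprintf(errmsg, "cannot listen to socket");
          /* was: close(fd); fd_delete() also clears fdtab[fd].owner */
          fd_delete(fd);
          goto uxst_return;
  }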
This can be backported to all stable versions.
(cherry picked from commit
eb8235869027fe6f472160febb6edb169f38d1ee)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 8 Aug 2024 10:04:47 +0000 (12:04 +0200)]
BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
QUIC stream IDs are expressed as QUIC variable-length integers, which cover
the range from 0 to 2^62 - 1. As such, it is forbidden to send an ID for a
MAX_STREAMS flow-control frame which would allow this value to be exceeded.
This patch fixes MAX_STREAMS emission to ensure the sent value is valid.
This also ensures that the peer cannot open a stream with an invalid ID, as
this would otherwise cause a flow-control violation.
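In terms of code, the emitted value can be capped as in this sketch (the
macro name follows HAProxy's varint helpers; illustrative):

  #define QUIC_VARINT_8_BYTE_MAX  ((1ULL << 62) - 1)

  /* a stream ID encodes its type in the two low-order bits, so the
   * highest valid stream count is a quarter of the varint range */
  if (max_streams > (QUIC_VARINT_8_BYTE_MAX >> 2))
          max_streams = QUIC_VARINT_8_BYTE_MAX >> 2;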
This must be backported up to 2.6.
(cherry picked from commit
f3c75a52df29247e5d502344127d42efb2c12b82)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Thu, 8 Aug 2024 15:21:12 +0000 (17:21 +0200)]
REGTESTS: mcli: test the pipelined commands on master CLI
A recent fix broke pipelined commands on the master CLI; this reg-test
implements a simple test that allows checking their correct behavior.
This could be backported as far as 2.6.
(cherry picked from commit
fe5ddcc4901e3b43e58f5cf903c500a69d091b57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Thu, 8 Aug 2024 15:09:15 +0000 (17:09 +0200)]
BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
Since commit 3d93ecc ("BUG/MAJOR: cli: Restore non-interactive mode
behavior with pipelined commands") and commit
598c7f16 ("BUG/MEDIUM:
cli: Warn if pipelined commands are delimited by a \n"), the pipelined
commands on the master CLI are either broken or emit warnings depending
on the version.
The reason is that the modes applied on the master CLI are saved in the
current CLI session, and then reinserted for each pipelined command;
however, these commands were inserted as new lines.
For example:
"@1; expert-mode on; debug dev log foo; debug dev log bar"
Would be sent as:
"expert mode on\ndebug dev log foo"
"expert mode on\ndebug dev log bar"
This patch fixes the issue by using the new ci_insert() function, which
inserts a string instead of a newline, and the commands are now suffixed
with ';' upon insertion, allowing a correct pipelined command chain.
This must be backported with the previous commit introducing ci_insert()
in every stable version.
This has been broken since version 3.0, but it emits a warning in every
version below, because
598c7f164 was backported.
(cherry picked from commit
b75edf2f11486bc2f2cc92c2b5273219a5e728c6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Thu, 8 Aug 2024 15:05:45 +0000 (17:05 +0200)]
MINOR: channel: implement ci_insert() function
ci_insert() is a function which inserts a string <str> of size <len> at
position <pos> of the input buffer. It is the equivalent of
ci_insert_line2() but without inserting '\r\n'.
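Based on this description, the prototype presumably looks as follows (the
exact signature may differ):

  /* insert <str> of size <len> at position <pos> of channel <c>'s input
   * buffer; like ci_insert_line2() but without the trailing '\r\n' */
  int ci_insert(struct channel *c, int pos, const char *str, int len);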
(cherry picked from commit
b2a8e8731da82b8bbd9dfff6d5a0d71f25a5ee49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Wed, 7 Aug 2024 17:34:07 +0000 (19:34 +0200)]
BUG/MINOR: proto_tcp: keep error msg if listen() fails
If listen() fails, we need to keep the message about it, which is then
copied into the errmsg buffer on the error path. This buffer is properly
provided by the caller (protocol_bind_all()) and reallocated if needed in
memprintf(), but it was released without being returned.
This can be backported to all stable versions.
(cherry picked from commit
81f48395b325b9875d215ec2743e75f7a56e1e5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Valentine Krasnobaeva [Wed, 7 Aug 2024 17:31:09 +0000 (19:31 +0200)]
BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
If listen() fails, the fd should be deleted from fdtab, not just closed.
Otherwise, sock_inet_bind_receiver(), which is called in a loop for each
receiver, will obtain the same fd via socket() for the next receiver
registered in the receivers list. Then, it will bind it again and try to
re-insert it into fdtab, and fd_insert() will trigger the
BUG_ON(fdtab[fd].owner != NULL) check.
When the tcp_bind_listener() code was implemented, the use of fd_delete()
was not generalized and this one remained overlooked.
This can be backported to all stable versions.
(cherry picked from commit
308c6881c03b6302afd5cc48781d73a11ef994d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 13:22:50 +0000 (15:22 +0200)]
BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
The event emitted by this trace was of type CLOSE instead of NEW, which
would sometimes temporarily pause a started trace.
This can be backported to 3.0, probably 2.6.
(cherry picked from commit
6bf50dfccca992d7f05febb5819e57b601ef94c0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 13:21:00 +0000 (15:21 +0200)]
BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
The test was performed but there's no way to set the option! Let's
just add "qconn" to select the quic conn when the source supports it.
This can be backported at least to 3.0, probably 2.6.
(cherry picked from commit
7a22fbd453d7ef732b72c1d45d0e2c4f89b43fcb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 09:45:54 +0000 (11:45 +0200)]
BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
The doc clearly says that "start <evt>" should leave the trace in pause
mode until the indicated event appears. However it's not what's happening,
the state is not changed until one command uses "now", so it's typically
needed to configure the events with "start <evt>" then enable the waiting
mode using "pause now". This is counter-intuitive and does not match the
doc, so let's fix it so that "start <evt>" switches from stopped to waiting
as long as at least one event is enabled.
This can be backported to all versions.
(cherry picked from commit
0406efe9ad129e91f8a6c93b780064b3c27ccaa0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 09:32:10 +0000 (11:32 +0200)]
BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
When calling TRACE_ENABLED(), which is called by TRACE_PRINTF(), we pass
a NULL plockptr to __trace_enabled(). This argument is used when lockon
is active, and may update the pointer. This is an oversight which also
broke the lockon mechanism, because now, for calls from __trace(), it
dereferences a pointer pointing to NULL and never updates it due to the
broken condition, so that trace() never sets up src->lockon_ptr.
The bug was introduced in 2.8 by commit
8f9a9704bb ("MINOR: trace: add a
TRACE_ENABLED() macro to determine if a trace is active"), so the fix must
be backported there.
(cherry picked from commit
b5df6b5a31b86b4403f00b7e0230c97883eca0f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 09:09:03 +0000 (11:09 +0200)]
BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
These ones were not proposed in the list of trackable elements. Note
that this depends on previous commit:
BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
This should be backported to at least 3.0, maybe even 2.6.
(cherry picked from commit
88a752ca789e6a2f863e16a35ddaa4fade5ccecd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 6 Aug 2024 09:07:13 +0000 (11:07 +0200)]
BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
In __trace_enabled(), a quic_conn was detected, but it was not possible
to derive the connection nor the session from it, which was quite limiting
in terms of ability to track the same instance.
This should be backported to at least 3.0, maybe even 2.6.
(cherry picked from commit
aa1915a9f559724cb3fc2be39a2d928d99556e5a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Wed, 7 Aug 2024 11:51:52 +0000 (13:51 +0200)]
DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
These must be before {bs,fs}.id, not after. Should be backported wherever
068ce2d5d2 ("MINOR: stconn: Add samples to retrieve about stream aborts")
is (normally 3.0).
(cherry picked from commit
b681a9e48813742850299fb5207766ac6f15007d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Ilia Shipitsin [Mon, 5 Aug 2024 18:59:09 +0000 (20:59 +0200)]
BUG/MINOR: fcgi-app: handle a possible strdup() failure
This defect was found by the coccinelle script "unchecked-strdup.cocci".
It can be backported to 2.2.
(cherry picked from commit
aaaacaaf4b558f4e0206b98c787cdcef773b1d76)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 1 Aug 2024 15:09:06 +0000 (17:09 +0200)]
BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
When the peer applet is waiting for a synchronisation with the global sync
task, we must notify that it won't consume data. Otherwise, if some data
are already waiting in the input buffer, the applet will be woken up in a
loop and this will trigger the watchdog. Once synchronized, the applet is
woken up. In that case, the peer applet must indicate it is going to
consume data again.
This patch should fix the issue #2656. It must be backported to 3.0.
(cherry picked from commit
78b8b6003082b54a24fa7b13be954f73248cc9d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 1 Aug 2024 14:22:41 +0000 (16:22 +0200)]
BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
When a stream is explicitly woken up by the H2 connection, if an error
condition is detected, the corresponding error flag is set on the SE:
SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream
was reported or not.
However, there is no attempt to propagate the other termination flags. We
must be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be
able to switch a pending error to a fatal error.
Because of this bug, the SE remains with a pending error and no end of
stream, preventing the applicative stream from truly aborting it. It means
that in some abort scenarios, it is possible to block a stream infinitely.
This patch must be backported at least as far as 2.8. No bug was observed
on older versions while the same code is in use.
(cherry picked from commit
184f16ded7a0274bffe99a4795d0a27f8be7c006)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 1 Aug 2024 14:01:50 +0000 (16:01 +0200)]
BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
For regular H2 messages, the HTX EOM flag is synonymous with the end of
input. So the SE_FL_EOI flag must also be set on the stream-endpoint
descriptor. However, there is an exception. For tunneled streams, the end
of message is reported on the HTX message just after the headers. But in
that case, no end of input is reported on the SE.
But here, there is a bug. The "early" EOM is also reported on HTX messages
when there is no payload (for instance a content-length set to 0). If there
is no ES flag on the H2 HEADERS frame, it is an unexpected case. Because
for the applicative stream, and most probably for the opposite endpoint,
the message is considered as finished. It is switched to its DONE state (or
the equivalent on the endpoint). But, if an extra H2 frame with the ES flag
is received, a TRAILERS frame or an empty DATA frame, an extra EOT HTX
block is pushed to carry the HTX EOM flag. So an extra HTX block is emitted
for a regular HTX message. It is totally invalid; it must never happen.
Because it is undefined behavior, it is difficult to predict the result.
But it definitely prevents the applicative stream from properly handling
aborts and errors because data remain blocked in the channel buffer.
Indeed, the end of the message was seen, so no more data are forwarded.
It seems to be an issue for 2.8 and upper. Harder to evaluate for older
versions.
This patch must be backported as far as 2.4.
(cherry picked from commit
6743e128f34fba297f2cac836a4f11b84acd503a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 1 Aug 2024 13:42:09 +0000 (15:42 +0200)]
BUG/MEDIUM: http-ana: Report error on write error waiting for the response
When we are waiting for the server response, if an error is pending on the
frontend side (a write error on the client), it is handled as an abort and
all regular response analyzers are removed, except the one responsible for
releasing the filters, if any. However, while it is handled as an abort,
the error is not reported, as usual, via the http_reply_and_close()
function. It is an issue because, in that case, the channel buffers are not
reset.
Because of this bug, it is possible to block a stream infinitely. The
request side is waiting for the response side, and the response side is
blocked because filters must be released and this cannot be done because
data remain blocked in the channel buffers.
So, in that case, calling http_reply_and_close() with no message is enough
to unblock the stream.
This patch must be backported as far as 2.8.
(cherry picked from commit
0ba6202796fe24099aeff89a5a4b83af99fc027b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 29 Jul 2024 08:42:50 +0000 (10:42 +0200)]
BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
Received QUIC packets are stored in the quic_conn Rx buffer after header
protection removal in qc_rx_pkt_handle(). These packets are later removed
by the quic_conn IO handler via qc_treat_rx_pkts().
If HP cannot be removed, packets are still copied into the quic_conn Rx
buffer. This can happen if the encryption level TLS keys are not yet
available. The packet remains in the buffer until HP can be removed and
its content processed.
An issue occurs if a client emits a 0-RTT packet but haproxy does not have
the shared secret, for example after a haproxy process restart. In this
case, the packet is copied into the quic_conn Rx buffer but its HP won't
ever be removed. This prevents the buffer from being purged. After some
time, if the client has emitted enough packets, the Rx buffer won't have
any space left and received packets are dropped. This will cause the
connection to freeze.
To fix this, remove any 0-RTT buffered packets on handshake completion. At
this stage, 0-RTT packets are unnecessary. The client is expected to reemit
their content in 1-RTT packets which are properly deciphered.
This can easily be reproduced with HTTP/3 POST requests or by retrieving a
big enough object, which will fill the Rx buffer with ACK frames. Here is a
picoquic command to provoke the issue on haproxy startup:
$ picoquicdemo -Q -v 00000001 -a h3 <hostname> 20443 "/?s=1g"
Note that allow-0rtt must be present on the bind line to trigger the issue.
Otherwise haproxy will reject any 0-RTT packets.
This must be backported up to 2.6.
This could be one of the reason for github issue #2549 but it's unsure
for now.
(cherry picked from commit
bba6baff306820a01ea66fddde5acbad11c601b6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Tue, 30 Jul 2024 12:54:44 +0000 (14:54 +0200)]
BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
Revert patch fcc8255 "MINOR: ssl_sock: Early data disabled during
SSL_CTX switching (aws-lc)". The patch was done in the wrong callback
which is never built for AWS-LC, and applies options on the SSL_CTX
instead of the SSL, which should never be done elsewhere than in the
configuration parsing.
This was probably triggered by successfully linking haproxy against
AWS-LC without using USE_OPENSSL_AWSLC.
The patch also reintroduced SSL_CTX_set_early_data_enabled() in the
ssl_quic_initial_ctx() and ssl_sock_initial_ctx(). So the initial_ctx
does have the right setting, but it still needs to be applied to the
selected SSL_CTX in the clienthello, because we need it on the selected
SSL_CTX.
Must be backported to 3.0. (ssl_clienthello.c part was in ssl_sock.c)
(cherry picked from commit
1889b86561ee67696760111c6df5759c628430dc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Mon, 29 Jul 2024 13:42:47 +0000 (15:42 +0200)]
BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
Then reactivate HAVE_SSL_0RTT and HAVE_SSL_0RTT_QUIC for AWS-LC, which
were wrongly deactivated in
f5353f2c ("MINOR: ssl: add HAVE_SSL_0RTT
constant").
Must be backported to 3.0.
(cherry picked from commit
56eefd6827b42afcefed7cc41d2cc38f5c1a2172)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 30 Jul 2024 15:54:26 +0000 (17:54 +0200)]
BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
The backend depends on the response and the frontend on the request, not
the other way around. In addition, they used to depend on L6 (hence
contents in the channel buffers) while they should only depend on L5
(permanent info known to the mux).
This came in 2.9 with commit
24059615a7 ("MINOR: Add sample fetches to
get the frontend and backend stream ID") so this can be backported there.
(cherry picked from commit
61dd0156c82ea051779e6524cad403871c31fc5a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
376b147ffffef0a1f898d72d1d70f10f07d2e5a4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 30 Jul 2024 08:43:55 +0000 (10:43 +0200)]
BUILD: mux-pt: Use the right name for the sedesc variable
A typo was introduced in
760d26a86 ("BUG/MEDIUM: mux-pt/mux-h1: Release the
pipe on connection error on sending path"). The sedesc variable is 'sd', not
'se'.
This patch must be backported with the commit above.
(cherry picked from commit
d9f41b1d6e811022372ce541e67b047bd18630a9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 30 Jul 2024 06:41:03 +0000 (08:41 +0200)]
BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
When data are sent using kernel splicing, if a connection error occurs, the
pipe must be released. Indeed, in that case, no more data can be sent and
there is no reason not to release the pipe. But it is in fact an issue for
the stream because the channel will appear as not empty. This may prevent
the stream from being released. This happens on 2.8 when a filter is also
attached to it. On 2.9 and upper, there seems to be no issue, but it is
hard to be sure and the current patch remains valid in all cases. On 2.6
and lower, the code is not the same and, AFAIK, there is no issue.
This patch must be backported to 2.8. However, on 2.8, there is no
zero-copy data forwarding. The patch must be adapted: there are no
done_ff/resume_ff callback functions for muxes. The pipe must be released
in sc_conn_send() when an error flag is set on the SE, after the call to
the snd_pipe callback function.
(cherry picked from commit
760d26a8625f3af2b6939037a40f19b5f8063be1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 29 Jul 2024 15:48:16 +0000 (17:48 +0200)]
BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
When a send is performed on a connection, if an SE error (or a pending
error) was already reported earlier, we leave immediately. No send is
performed. However, we must be sure to report the error at the SC level if
necessary. Indeed, the SE error may have been reported during zero-copy
data forwarding, thus during a receive on the opposite side. In that case,
we may have missed the opportunity to report it at the SC level.
The patch must be backported as far as 2.8.
(cherry picked from commit
5dc45445ff18207dbacebf1f777e1f1abcd5065d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Aurelien DARRAGON [Tue, 6 Aug 2024 12:29:56 +0000 (14:29 +0200)]
BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
An issue has been introduced with cd99440 ("BUG/MAJOR: server/addr: fix
a race during server addr:svc_port updates").
Indeed, in the above commit we implemented the atomic_sync task which is
responsible for consuming pending server events to apply the changes
atomically. For now only server's addr updates are concerned.
To prevent the task from causing contention, a budget was assigned to it.
It can be controlled with the global tunable
'tune.events.max-events-at-once': the task may not process more than this
number of events at once.
However, a bug was introduced with this budget logic: each time the task
has to be interrupted because it runs out of budget, we reschedule the task
to finish where it left off, but the current event, which was already
removed from the queue, wasn't processed yet. This means that one pending
event (every 'tune.events.max-events-at-once' events) is effectively lost.
When the atomic_sync task deals with a large number of concurrent events,
this bug has 2 known consequences: first, a server's addr/port update
will be lost every 'tune.events.max-events-at-once' events. This can of
course cause reliability issues, because if the event is not republished
periodically, the server could stay in a stale state for an indefinite
amount of time. This is the case when the DNS server flaps for instance:
some servers may not come back UP after the incident as described in
GH #2666.
Another issue is that the lost event was not cleaned up, resulting in a
small memory leak. So in the end, it means that the bug is likely to
cause more and more degradation over time until haproxy is restarted.
As a workaround, 'tune.events.max-events-at-once' may be set to the
maximum number of events expected per batch. Note however that this value
cannot exceed 10 000, otherwise it could cause the watchdog to trigger due
to the task being busy for too long and preventing other threads from
making any progress. Setting higher values may not be optimal for common
workloads so it should only be used to mitigate the bug while waiting for
this fix.
Since tune.events.max-events-at-once defaults to 100, this bug only
affects configs that involve more than 100 servers whose addr:port
properties are likely to be updated at the same time (batched updates
from cli, lua, dns..)
To fix the bug, we move the budget check to after the current event is
fully handled. For that we went from a basic 'while' to a 'do..while' loop,
as we assume from the config that 'tune.events.max-events-at-once' cannot
be 0.
While at it, we reschedule the task once thread isolation ends (it was not
required to perform the reschedule while under isolation) to hand control
back faster to waiting threads.
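Schematically, the fixed loop has this shape (hypothetical helper names;
only the do..while structure and the budget placement reflect the fix):

  int budget = max_events_at_once;   /* >= 1 per the config constraints */
  struct event *evt;

  do {
          if ((evt = pop_event(&queue)) == NULL)
                  break;
          handle_event(evt);   /* fully process before checking the budget */
          free_event(evt);     /* no leak even when the budget runs out */
  } while (--budget);

  if (!queue_is_empty(&queue))
          task_wakeup(task, TASK_WOKEN_OTHER);   /* resume where we left off */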
This patch should be backported up to 2.9 with cd99440. It should fix
GH #2666.
(cherry picked from commit
8f1fd96d17588fb571959901bd20d4239b1a96af)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Tue, 3 Sep 2024 13:37:09 +0000 (15:37 +0200)]
[RELEASE] Released version 3.0.4
Released version 3.0.4 with the following main changes:
- MINOR: proto: extend connection thread rebind API
- BUILD: listener: silence a build warning about unused value without threads
- BUG/MEDIUM: quic: prevent crash on accept queue full
- CLEANUP: proto: rename TID affinity callbacks
- CLEANUP: quic: rename TID affinity elements
- BUG/MINOR: session: Eval L4/L5 rules defined in the default section
- BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
- DOC: install: don't reference removed CPU arg
- BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
- BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
- DOC: configuration: issuers-chain-path not compatible with OCSP
- DOC: config: improve the http-keep-alive section
- BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
- BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
- BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
- BUG/MINOR: quic: Non optimal first datagram.
- MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
- BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
- BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
- MINOR: quic: Dump TX in flight bytes vs window values ratio.
- MINOR: quic: Add information to "show quic" for CUBIC cc.
- MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
- MINOR: queue: add a function to check for TOCTOU after queueing
- BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
- MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD (take #2)
- BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn
- Revert "MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface"
- MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()
- DOC: quic: fix default minimal value for max window size
- MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
- BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
- BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
- BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
- BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
- BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
- BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
- BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
- MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
- BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
- BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
- BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
Christopher Faulet [Tue, 3 Sep 2024 13:19:51 +0000 (15:19 +0200)]
BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
A regression was introduced by commit
76fa71f7a ("BUG/MEDIUM: mux-pt:
Never fully close the connection on shutdown") because of a typo on the
connection flags. The CO_FL_SOCK_WR_SH flag must be tested to prevent a
call to conn_sock_shutw(), not CO_FL_SOCK_RD_SH.
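In other words (an illustrative fragment, not the exact hunk):

  /* only shut the connection down for writes if not already done */
  if (!(conn->flags & CO_FL_SOCK_WR_SH))   /* was erroneously CO_FL_SOCK_RD_SH */
          conn_sock_shutw(conn, 0);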
Concretely, most of the time it is harmless because the shutdown for writes
is always performed before any shutdown for reads, except in the case
described by the commit above. But it is not clear whether it has an impact
or not.
This patch must be backported with the commit above, so as far as 2.9.
(cherry picked from commit
e1cae428791abf4e4fdf3969761eaaafd45df636)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Frederic Lecaille [Tue, 3 Sep 2024 13:10:25 +0000 (15:10 +0200)]
BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
This bug arrived with this naive commit:
BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
which omitted to consider the case where the Initial packet number space
could be discarded before receiving 0-RTT packets.
To fix this, append/insert the 0-RTT (early-data) packet number space
into the encryption level list depending on the presence or not of
the Initial packet number space.
This issue was revealed when using aws-lc as the TLS stack in GH issue
#2701.
Thank you to @Tristan971 for having reported this issue.
Must be backported where the commit mentioned above is supposed to be
backported: as far as 2.9.
(cherry picked from commit
7e19432fd41e9c0146f0227b43d0dd3dc740e20b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Frederic Lecaille [Fri, 2 Aug 2024 09:05:45 +0000 (11:05 +0200)]
BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
By "aws-lc only", one means that this bug was first revealed by aws-lc stack.
This does not mean it will not appeared for new versions of other TLS stacks which
have never revealed this bug.
This bug was reported by Ilya (@chipitsine) in GH #2657 where some QUIC interop
tests (resumption, zerortt) could lead to crash with haproxy compiled against
aws-lc TLS stack. These crashed were triggered by this BUG_ON() which detects
that too short datagrams with at least one ack-eliciting Initial packet inside
could be built.
<0>2024-07-31T15:13:42.562717+02:00 [01|quic|5|quic_tx.c:739] qc_prep_pkts():
next encryption level : qc@0x61d000041080 idle_timer_task@0x60d000006b80 flags=0x6000058
FATAL: bug condition "first_pkt->type == QUIC_PACKET_TYPE_INITIAL && (first_pkt->flags & (1UL << 0)) && length < 1200" matched at src/quic_tx.c:163
call trace(12):
| 0x563ea447bc02 [ba d9 00 00 00 48 8d 35]: main-0x1958ce
| 0x563ea4482703 [e9 73 fe ff ff ba 03 00]: qc_send+0x17e4/0x1b5d
| 0x563ea4488ab4 [85 c0 0f 85 00 f6 ff ff]: quic_conn_io_cb+0xab1/0xf1c
| 0x563ea468e6f9 [48 c7 c0 f8 55 ff ff 64]: run_tasks_from_lists+0x173/0x9c2
| 0x563ea468f24a [8b 7d a0 29 c7 85 ff 0f]: process_runnable_tasks+0x302/0x6e6
| 0x563ea4610893 [83 3d aa 65 44 00 01 0f]: run_poll_loop+0x6e/0x57b
| 0x563ea4611043 [48 8b 1d 46 c7 1d 00 48]: main-0x48d
| 0x7f64d05fb609 [64 48 89 04 25 30 06 00]: libpthread:+0x8609
| 0x7f64d0520353 [48 89 c7 b8 3c 00 00 00]: libc:clone+0x43/0x5e
That said, everything was correctly done by qc_prep_pkts() to prevent such
a case. But this relied on the hypothesis that the list of encryption
levels it used was always built in the same order as follows for 0-RTT
sessions:
initial, early-data, handshake, application
But this order is determined by the order in which the TLS stack derives
the secrets for these encryption levels. For aws-lc, this order is not the
same but as follows:
initial, handshake, application, early-data
During 0-RTT sessions, the server may have to build three ack-eliciting
packets (with CRYPTO data inside) to reply to the first client packet:
initial, handshake, application. qc_prep_pkts() adds a PADDING frame to the
last packet built for the last encryption level in the list. But after the
application encryption level, there is the early-data encryption level.
This prevented qc_prep_pkts() from building a padded application level last
packet to send a 1200-byte datagram.
To fix this, always insert the early-data encryption level after the
initial encryption level into the encryption levels list when initializing
this encryption level from quic_conn_enc_level_init().
Must be backported as far as 2.9.
(cherry picked from commit
e12620a8a909805755ae1e8b8552c187202a9f3f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 2 Sep 2024 13:18:51 +0000 (15:18 +0200)]
BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
There exists an extremely tricky code path that was revealed in 3.0 by
the glitches feature, though it might theoretically have existed before.
TL;DR: a mux mbuf may be full after successfully sending GOAWAY, and
discard its remaining contents without clearing H2_CF_MUX_MFULL and
H2_CF_DEM_MROOM, then endlessly loop in h2_send(), until the watchdog
takes care of it.
What can happen is the following: Some data are received, h2_io_cb() is
called. h2_recv() is called to receive the incoming data. Then
h2_process() is called and in turn calls h2_process_demux() to process
input data. At some point, a glitch limit is reached and h2c_error() is
called to close the connection. The input frame was incomplete, so some
data are left in the demux buffer. Then h2_send() is called, which in
turn calls h2_process_mux(), which manages to queue the GOAWAY frame,
turning the state to H2_CS_ERROR2. The frame is sent, and h2_process()
calls h2_send() a last time (doing nothing) and leaves. The streams
are all woken up to notify about the error.
Multiple backend streams were waiting to be scheduled and are woken up
in turn, before their parents being notified, and communicate with the
h2 mux in zero-copy-forward mode, request a buffer via h2_nego_ff(),
fill it, and commit it with h2_done_ff(). At some point the mux's output
buffer is full, and gets flags H2_CF_MUX_MFULL.
The io_cb is called again to process more incoming data. h2_send() isn't
called (polled) or does nothing (e.g. TCP socket buffers full). h2_recv()
may or may not do anything (doesn't matter). h2_process() is called since
some data remain in the demux buf. It goes till the end, where it finds
st0 == H2_CS_ERROR2 and clears the mbuf. We're now in a situation where
the mbuf is empty and MFULL is still present.
Then it calls h2_send(), which doesn't call h2_process_mux() due to
MFULL, doesn't enter the for() loop since all buffers are empty, then
keeps sent=0, which doesn't allow to clear the MFULL flag, and since
"done" was not reset, it loops forever there.
Note that the glitches make the issue more reproducible but theoretically
it could happen with any other GOAWAY (e.g. PROTOCOL_ERROR). What makes
it not happen with the data produced on the parsing side is that we
process a single buffer of input at once, and there's no way to amplify
this to 30 buffers of responses (RST_STREAM, GOAWAY, SETTINGS ACK,
WINDOW_UPDATE, PING ACK etc are all quite small), and since the mbuf is
cleared upon every exit from h2_process() once the error was sent, it is
not possible to accumulate response data across multiple calls. And the
regular h2_snd_buf() path checks for st0 >= H2_CS_ERROR so it will not
produce any data there either.
Probably that h2_nego_ff() should check for H2_CS_ERROR before accepting
to deliver a buffer, but this needs to be carefully studied. In the mean
time the real problem is that the MFULL flag was kept when clearing the
buffer, making the two inconsistent.
Since it doesn't seem possible to trigger this sequence without the
zero-copy-forward mechanism, this fix needs to be backported as far as
2.9, along with previous commit "MINOR: mux-h2: try to clear DEM_MROOM
and MUX_MFULL at more places" which will strengthen the consistency
between these checks.
Many thanks to Annika Wickert for her detailed report which allowed us to
diagnose this problem. CVE-2024-45506 was assigned to this problem.
(cherry picked from commit
830e50561c6636be4ada175d03e8df992abbbdcd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 2 Sep 2024 13:14:16 +0000 (15:14 +0200)]
MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
The code leading to H2_CF_MUX_MFULL and H2_CF_DEM_MROOM being cleared
is quite complex and assumptions about its state are extremely difficult
when reading the code. There are indeed long sequences where the mux might
possibly be empty, still having the flag set until it reaches h2_send()
which will clear it after the last send. Even then it's not obvious whether
it's always guaranteed to release the flag when invoked in multiple passes.
Let's just simplify the condition so that h2_send() does not depend on
"sent" anymore and that h2_timeout_task() doesn't leave the flags set on
the buffer on emptiness. While it doesn't seem to fix anything, it will
make the code more robust against future changes.
(cherry picked from commit
e9cdedb39b2020a7eb1ae5d8462b391d4301fb93)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 3 Sep 2024 12:10:29 +0000 (14:10 +0200)]
BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
When a 400/408/500/501 error is returned by the H1 multiplexer, we first
try to get the error message of the proxy before using the default one. It
may be configured to be mapped on /dev/null or on an empty file. In that
case, no message is emitted, as expected. But everything is handled as if
the error was successfully sent.
However, there is a bug here. In the h1_send_error() function, this case is
not properly handled. The H1C_F_ABRTED flag is not set on the H1 connection
as it should be and the h1_close() function is not called, leaving the H1
connection in an undefined state.
It is especially an issue when an "empty" 408-Request-Time-out error is
emitted while there are data blocked in the output buffer. In that case,
the connection remains open until the client closes, and a "cR--"/408 is
logged repeatedly, every time the client timeout is reached.
This patch must be backported as far as 2.8.
(cherry picked from commit
0d4271cdae18780de79e1ce997d562f91eeee316)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Frederic Lecaille [Tue, 3 Sep 2024 08:52:39 +0000 (10:52 +0200)]
BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
qc_prep_hdshk_fast_retrans()'s job is to pick some packets to be
retransmitted from the Initial and Handshake packet number spaces. A packet
may be coalesced with a first one into the same datagram. When a coalesced
packet is inspected for retransmission, it is skipped if its length would
make the total length of the datagram it is attached to exceed the
anti-amplification limit. But in this case, the first packet must be kept
for the current retransmission. This is tracked by this trace statement:
TRACE_PROTO("will probe Initial packet number space", QUIC_EV_CONN_SPPKTS, qc);
This was not the case because of the wrong "goto end" statement. The latter
must be run only if the Initial packet number space must not be probed,
with the first packet found being coalesced to another one which must be
skipped.
This bug was revealed by the AWS-LC interop runner with the handshakeloss
and handshakecorruption tests, which always failed because this stack leads
the server to send more Initial packets.
Thank you to Ilya (@chipitsine) for this issue report in GH #2663.
Must be backported as far as 2.6.
(cherry picked from commit
15a737eb5fc54bbc8aa5cadad054a69badde5b8e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 2 Sep 2024 16:29:02 +0000 (18:29 +0200)]
BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
When several commands are chained on the master CLI, the same client
connection is used. Because it is a TCP connection, the PT mux is used.
This means there is no stream at the mux level. It is not possible to
release the applicative stream between commands as in HTTP. So, to work
around this limitation, between two commands, the master CLI resets the
stream. It does exactly what was performed in HTTP to manage keep-alive
connections on old HAProxy versions.
But this part was copied from code dealing with connections only, while the
back endpoint can be an applet or a mux for the master CLI. The previous
fix on the PT mux ("BUG/MEDIUM: mux-pt: Never fully close the connection on
shutdown") revealed a bug. Between two commands, the back endpoint was only
released if the connection's XPRT was closed. This works if the back
endpoint is an applet because there is no connection. But for commands sent
to a worker, a connection is used. At this stage, this only works if the
connection's XPRT is closed. Otherwise, the old endpoint is never detached,
leading to undefined behavior on the next command execution (most probably
a crash).
Without the commit above, the connection's XPRT was always closed on
shutdown. This is no longer true. At this stage, we must unconditionally
release the back endpoint by resetting the corresponding sedesc to fix the
bug.
This patch must be backported with the commit above in all stable
versions. On 2.4 and lower, it will need to be adapted.
(cherry picked from commit
d4781bd5e7f0e0c0491fb7dfccb4da5c15055e3f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 2 Sep 2024 13:30:11 +0000 (15:30 +0200)]
BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
When a shutdown is reported to the mux (shutdown for reads or writes), the
connection is immediately fully closed if the mux detects the connection is
closed in both directions. Only the passthrough multiplexer is able to
perform this action at this stage because there is no stream and no
internal data. Other muxes perform a full connection close during the mux's
release stage. It was working quite well until recently. But, in theory,
the bug is quite old.
In fact, it seems possible for the lower layer to report an error on the
connection at the same time a shutdown is performed on the mux. Depending
on how events are scheduled, the following may happen:
1. A connection error is detected at the fd layer and a wakeup is
scheduled on the mux to handle the event.
2. A shutdown for writes is performed on the mux. Here the mux decides to
fully close the connection. If the xprt is not used to log info, it is
released.
3. The mux is finally woken up. It tries to retrieve data from the xprt
because it is not aware there was an error. This leads to a crash
because of a NULL-deref.
By reading the code, it is not obvious. But it seems possible with SSL
connections when the handshake is rearmed. It happens when
SSL_ERROR_WANT_WRITE is reported on a SSL_read() attempt or
SSL_ERROR_WANT_READ on a SSL_write() attempt.
This bug is only visible if the XPRT is not used to log info. So it is not
so common.
This patch should fix the 2nd crash reported in the issue #2656. It must
first be backported as far as 2.9 and then slowly to all stable versions.
(cherry picked from commit
76fa71f7a8d27006ea1b06b417963501d3a5fcab)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 30 Aug 2024 10:11:03 +0000 (12:11 +0200)]
BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
Support for 429 was recently added to L7 retries (
0d142e075 "MINOR: proxy:
Add support of 429-Too-Many-Requests in retry-on status"). But the
l7_status_match() function was not properly updated. The switch statement
must match the 429 status to be able to perform an L7 retry.
This patch must be backported if the commit above is backported. It is
related to #2687.
(cherry picked from commit
62c9d51ca4d4f870723522b30d368d984f536e7e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 28 Aug 2024 13:42:22 +0000 (15:42 +0200)]
BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
If an early error occurred on the client connection, we must prevent any
multiplexer upgrade. Indeed, it is unexpected for a mux to be initialized
with no xprt. In a normal workflow it is impossible, so it is not an issue.
But if a mux upgrade is performed at the stream level, an early error on
the connection may have already been handled by the previous mux and the
connection may already be fully closed. If the mux upgrade is still
performed, a crash can be experienced.
It is possible to get a crash with an implicit TCP>HTTP upgrade if there is
no data in the input buffer. But it is also possible to get a crash with an
explicit "switch-mode http" rule.
It must be backported to all stable versions. In 2.2, the patch must be
applied directly in stream_set_backend() function.
(cherry picked from commit
e4812404c541018ba521abf6573be92553ba7c53)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 27 Aug 2024 17:16:07 +0000 (19:16 +0200)]
BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
When DATA frames are sent via 0-copy data forwarding, we must take care to
set the ES flag on the last DATA frame. This should be performed in
h2_done_ff() when the IOBUF_FL_EOI flag was set by the producer. This flag
is there to indicate that the producer has reached the end of input. When
this happens, the h2s state is also updated: it is switched to the
"half-closed local" or "closed" state depending on its previous state.
It is mainly an issue on uploads because the server may be blocked waiting
for the end of the request. A workaround is to disable 0-copy forwarding
support for the H2 mux by setting the "tune.h2.zero-copy-fwd-send"
directive to off in your global section.
This patch should fix the issue #2665. It must be backported as far as 2.9.
(cherry picked from commit
4ef5251c44b83ed2f9495d200827f8696f16cd60)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 27 Aug 2024 17:09:08 +0000 (19:09 +0200)]
MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remains sorted.
This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.
(cherry picked from commit
0d142e0756986b56819ecb2d131a0c4b30ae899f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Wed, 14 Aug 2024 16:25:01 +0000 (18:25 +0200)]
DOC: quic: fix default minimal value for max window size
It is possible to override the default QUIC congestion algorithm on a
bind line. With the same setting, it is also possible to specify the
maximum congestion window size.
The parser rejects values outside of the range between 10k and 4g. This is
in contradiction with the documentation, which specifies 1k as the lower
bound. Correct this value in the documentation.
This should be backported up to 2.9.
(cherry picked from commit
103d8607776dbbf6f64eaf82359ec7a5dd7e3ebb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Aurelien DARRAGON [Tue, 13 Aug 2024 15:49:46 +0000 (17:49 +0200)]
MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()
With 7a21c3a ("MAJOR: log: implement proper postparsing for logformat
expressions") which finally made postparsing checks reliable, we started
to get report from users that couldn't start haproxy 3.0 with configs that
used to work in the past. The current situation is described in GH #2642.
While the checks are mostly relevant, it turns out they are not strictly
needed anymore from a technical point of view. Most of them were useful in
early logformat implementation to prevent runtime bugs due to the use of
an alias or fetch at runtime from an incompatible proxy. It's been a few
versions already that the code handling fetches and log aliases is robust
enough to support fetches/aliases used from the wrong context: all it
does is that the fetch/alias will silently fail if it's not available.
This can be proved by the fact that even if the postparsing checks were
partially broken in the past, it didn't cause runtime issues (at least
on recent haproxy versions).
Most of these checks can now be seen as configuration hints: when a check
triggers, it will indicate a configuration inconsistency in most cases,
but they are some corner cases where it is not possible to know at config
time if the conditions will be met for the alias/fetch to work properly..
so instead of failing with a hard error like we did so far, let's just be
more permissive and report our findings using "diag_warning": such
warnings are only emitted when haproxy is started with '-dD' cli option.
We also took this opportunity to improve messages clarity and make them
more precise (report the offending item instead of complaining about the
whole expression because of a single element).
With this patch, configs that used to start before 7a21c3a shouldn't
trigger hard errors anymore.
This may be backported in 3.0.
(cherry picked from commit
41ca89bc6fe96a660fea992643dbc2c0844a609e)
[ada: ctx adjt]
Signed-off-by: Aurelien DARRAGON <adarragon@haproxy.com>