Willy Tarreau [Thu, 24 Feb 2022 15:37:19 +0000 (16:37 +0100)]
BUG/MINOR: proxy: preset the error message pointer to NULL in parse_new_proxy()
As reported by Coverity in issue #1568, a missing initialization of the
error message pointer in parse_new_proxy() may result in displaying garbage
or crashing in case of memory allocation error when trying to create a new
proxy on startup.
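For illustration only, a minimal sketch of the pattern being restored
(do_something() is a hypothetical callee, not the real code path):

    char *errmsg = NULL;   /* must be preset: some error paths never touch it */

    if (!do_something(&errmsg)) {  /* hypothetical callee filling *errmsg on parse errors */
            ha_alert("%s\n", errmsg ? errmsg : "out of memory");
            free(errmsg);          /* free(NULL) is a no-op, so this stays safe */
            return NULL;
    }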
This should be backported to 2.4.
(cherry picked from commit
282b6a7539add7517f2b16cc82b6a3509492fd76)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christian Ruppert [Sun, 20 Feb 2022 21:54:01 +0000 (22:54 +0100)]
DOC: Fix usage/examples of deprecated ACLs
Some examples or references were still using deprecated ACL variants.
Signed-off-by: Christian Ruppert <idl0r@qasl.de>
(cherry picked from commit
59e66e30c2aa82947c1f00ec64eec117efa8846d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 21 Feb 2022 14:12:54 +0000 (15:12 +0100)]
BUG/MAJOR: mux-h2: Be sure to always report HTX parsing error to the app layer
If a parsing error is detected and the corresponding HTX flag is set
(HTX_FL_PARSING_ERROR), we must be sure to always report it to the app
layer. It is especially important when the error occurs during the response
parsing, on the server side. In this case, the RX buffer contains an empty
HTX message to carry the flag. And it remains in this state till the info is
reported to the app layer. This must be done, otherwise, on the conn-stream,
the CS_FL_ERR_PENDING flag cannot be switched to CS_FL_ERROR and the
CS_FL_WANT_ROOM flag is always set when h2_rcv_buf() is called. The result
is a ping-pong loop between the mux and the stream.
Note that this patch fixes a bug. But it also reveals a design issue. The
error must not be reported at the HTX level. The error is already carried by
the conn-stream. There is no reason to duplicate it. In addition, it is
error-prone to have an empty HTX message only to report the error to the app
layer.
This patch should fix the issue #1561. It must be backported as far as 2.0
but the bug only affects HAProxy >= 2.4.
(cherry picked from commit
ec361bbd843781fb15ebbfca6aea57455d3ac3f8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 1 Feb 2022 17:25:06 +0000 (18:25 +0100)]
BUG/MEDIUM: mux-h1: Don't wake h1s if mux is blocked on lack of output buffer
After sending some data, we try to wake the H1 stream to resume data
processing at the stream level, except if the output buffer is still
full. However we must also be sure the mux is not blocked because of an
allocation failure on this buffer. Otherwise, it may lead to a ping-pong
loop between the stream and the mux to send more data with an unallocated
output buffer.
Note there is a mechanism to queue buffers allocations when a failure
happens. However this mechanism is totally broken since the filters were
introduced in HAProxy 1.7. And it is worse now with the multiplexers. So
this patch fixes a possible loop needlessly consuming all the CPU. But
buffer allocation failures must remain pretty rare.
This patch must be backported as far as 2.0.
(cherry picked from commit
c17c31c822dc4a5f3270f6ae4f96afa50fb7390f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 1 Feb 2022 17:11:50 +0000 (18:11 +0100)]
BUG/MEDIUM: htx: Be sure to have a buffer to perform a raw copy of a message
In htx_copy_msg(), if the destination buffer is empty, we perform a raw copy
of the message instead of a block-per-block copy. But we must be sure the
destination buffer was really allocated. In other words, to perform a raw
copy, the HTX message must be empty _AND_ it must have some free space
available.
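A hedged sketch of the strengthened condition (htx_is_empty() and
htx_free_space() are assumed HTX helpers; the copy helpers are hypothetical
names, not the actual code):

    /* A raw copy is only valid when the destination HTX message is empty
     * AND its backing buffer was really allocated, i.e. it has free space.
     * Otherwise fall back to the block-per-block copy.
     */
    if (htx_is_empty(htx) && htx_free_space(htx))
            return copy_raw(htx, msg);             /* hypothetical helper */
    return copy_block_per_block(htx, msg);         /* hypothetical helper */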
This function is only used to copy an HTTP reply (for instance, an error or
a redirect) in the buffer of the response channel. For now, we are sure the
buffer was allocated because it is a pre-requisite to call stream
analyzers. However, it may become a source of bugs in the future.
This patch may be backported as far as 2.3.
(cherry picked from commit
dc523e3b89d8ff3b73eb7d3218cbf2907c94f6ae)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Fri, 18 Feb 2022 17:08:02 +0000 (18:08 +0100)]
[RELEASE] Released version 2.5.3
Released version 2.5.3 with the following main changes :
- MINOR: sock: move the unused socket cleaning code into its own function
- BUG/MEDIUM: mworker: close unused transferred FDs on load failure
- BUG/MINOR: mworker: fix a FD leak of a sockpair upon a failed reload
- BUG/MINOR: sink: Use the right field in appctx context in release callback
- BUG/MEDIUM: resolvers: Really ignore trailing dot in domain names
- BUG/MEDIUM: fd: always align fdtab[] to 64 bytes
- BUG/MAJOR: compiler: relax alignment constraints on certain structures
- MINOR: httpclient: Don't limit data transfer to 1024 bytes
- BUG/MINOR: httpclient: reinit flags in httpclient_start()
- BUG/MINOR: mailers: negotiate SMTP, not ESMTP
- BUG/MINOR: ssl: Add missing return value check in ssl_ocsp_response_print
- BUG/MINOR: ssl: Fix leak in "show ssl ocsp-response" CLI command
- BUG/MINOR: ssl: Missing return value check in ssl_ocsp_response_print
- CLEANUP: httpclient/cli: fix indentation alignment of the help message
- BUG/MINOR: tools: url2sa reads ipv4 too far
- BUG/MEDIUM: httpclient: limit transfers to the maximum available room
- DEBUG: buffer: check in __b_put_blk() whether the buffer room is respected
Willy Tarreau [Fri, 18 Feb 2022 16:33:27 +0000 (17:33 +0100)]
DEBUG: buffer: check in __b_put_blk() whether the buffer room is respected
This adds a BUG_ON() to make sure we don't face other situations like the
one fixed by previous commit.
(cherry picked from commit
d439a496550ba44ceae09a59c72c278163ff8b99)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 18 Feb 2022 16:28:25 +0000 (17:28 +0100)]
BUG/MEDIUM: httpclient: limit transfers to the maximum available room
A bug was uncovered by commit fc5912914 ("MINOR: httpclient: Don't limit
data transfer to 1024 bytes"): it happens that callers of b_xfer() and
b_force_xfer() are expected to check for available room in the target
buffer. Previously it was unlikely to be full but now with full buffer-
sized transfers, it happens more often and in practice it is possible
to crash the process with the debug command "httpclient" on the CLI by
going beyond the max buffer size. Other call places ought to be
rechecked by now and it might be time to rethink this API if it tends
to generalize.
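As an illustration of the caller-side check (a hedged fragment; b_data(),
b_room() and b_xfer() are the buffer API named above, the surrounding
variables are hypothetical):

    /* b_xfer() itself does not check the destination's room, so clamp the
     * amount to transfer to what the target buffer can actually hold.
     */
    size_t block = b_data(src);

    if (block > b_room(dst))
            block = b_room(dst);
    if (block)
            b_xfer(dst, src, block);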
This must be backported to 2.5.
(cherry picked from commit
11adb1d8fcab29ef8b12c93e3b036bb3dcf1607b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 18 Feb 2022 15:13:12 +0000 (16:13 +0100)]
BUG/MINOR: tools: url2sa reads ipv4 too far
The url2sa implementation is inconsistent when parsing an IPv4 address:
url2sa() takes a <ulen> parameter whereas the call to url2ipv4() takes a
null-terminated string, which means url2ipv4() could try to read more
than it is supposed to.
This function is only used from a buffer so it never reaches an unallocated
space. It can only cause an issue when used from the httpclient which
uses it with an ist.
This patch fixes the issue by copying everything into the trash buffer and
null-terminating it.
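A hedged, generic sketch of that idea (the scratch buffer stands in for the
trash; names are illustrative):

    /* url2ipv4() expects a NUL-terminated string while the httpclient
     * passes a pointer/length pair, so bound the copy and terminate it
     * before parsing.
     */
    char scratch[256];
    size_t n = (ulen < sizeof(scratch) - 1) ? ulen : sizeof(scratch) - 1;

    memcpy(scratch, url, n);
    scratch[n] = '\0';
    /* ... parse scratch instead of the unbounded original ... */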
Must be backported in all supported versions.
(cherry picked from commit
8a91374487e13cf139ab36e483163a21ebdc6f4e)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Fri, 18 Feb 2022 15:26:36 +0000 (16:26 +0100)]
CLEANUP: httpclient/cli: fix indentation alignment of the help message
The output was not aligned with other commands, let's fix it.
(cherry picked from commit
2c8f984441e4b8875e2af306d00e1d00755f8ef7)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Remi Tricot-Le Breton [Wed, 16 Feb 2022 14:17:09 +0000 (15:17 +0100)]
BUG/MINOR: ssl: Missing return value check in ssl_ocsp_response_print
When calling ssl_ocsp_response_print, which is used to display an OCSP
response's details for the "show ssl ocsp-response" CLI command, we use
the BIO_read function to copy an OpenSSL BIO into a trash buffer.
The return value was not checked though, which could lead to some
crashes since BIO_read can return a negative value in case of error.
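For reference, a minimal hedged pattern for such a check (not the actual
patch):

    /* BIO_read() returns 0 or a negative value on error, so never use its
     * result as a length without checking it first.
     */
    char tmp[4096];
    int ret = BIO_read(bio, tmp, sizeof(tmp) - 1);

    if (ret <= 0)
            return -1;     /* error or nothing read: do not trust 'ret' as a length */
    tmp[ret] = '\0';       /* safe to use as a string from here on */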
This patch should be backported to 2.5.
(cherry picked from commit
1b01b7f2eff33ca9bd1da9fa628fd07a48c5a7cc)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Remi Tricot-Le Breton [Wed, 16 Feb 2022 14:03:51 +0000 (15:03 +0100)]
BUG/MINOR: ssl: Fix leak in "show ssl ocsp-response" CLI command
When calling the "show ssl ocsp-response" CLI command some OpenSSL
objects need to be created in order to get some information related to
the OCSP response and some of them were not freed.
It should be backported to 2.5.
(cherry picked from commit
8081b6769902899346f4c717007841190118d349)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Remi Tricot-Le Breton [Wed, 16 Feb 2022 13:42:22 +0000 (14:42 +0100)]
BUG/MINOR: ssl: Add missing return value check in ssl_ocsp_response_print
The return value of the b_istput function called to append the last data
block to the end of an OCSP response's detailed output was not checked in
ssl_ocsp_response_print. The ssl_ocsp_response_print return value checks
were added as well since some of them were missing.
This error was raised by Coverity (CID 1469513).
This patch fixes GitHub issue #1541.
It can be backported to 2.5.
(cherry picked from commit
a9a591ab3dcf316e30506ec79eb9c255d2b2106c)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Lukas Tribus [Thu, 17 Feb 2022 14:40:51 +0000 (15:40 +0100)]
BUG/MINOR: mailers: negotiate SMTP, not ESMTP
As per issue #1552 the mailer code currently breaks on ESMTP multiline
responses. Let's negotiate SMTP instead.
Should be backported to 2.0.
(cherry picked from commit
1a16e4ebcb6d5848dd867a4ef3edda3760c60124)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
William Lallemand [Thu, 17 Feb 2022 11:52:09 +0000 (12:52 +0100)]
BUG/MINOR: httpclient: reinit flags in httpclient_start()
When starting a request for the 2nd time from the same httpclient *hc
context, the flags are not reinitialized and the httpclient will stop
after the first call to the IO handler, because the END flag is always
present.
This patch also adds a test before httpclient_start() to ensure we don't
start a client already started.
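A hedged sketch of both points (the flag names below are placeholders, not
the real ones):

    /* Reset the state of a reused httpclient context and refuse to start
     * one that is already running.
     */
    if (hc->flags & HC_F_STARTED)   /* placeholder for the "already started" flag */
            return NULL;
    hc->flags = 0;                  /* in particular, drop the END flag left by a previous request */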
Must be backported in 2.5.
(cherry picked from commit
5085bc3103e5a52f0d8909d13a6e36d24d2a6562)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Christopher Faulet [Wed, 12 Jan 2022 13:46:03 +0000 (14:46 +0100)]
MINOR: httpclient: Don't limit data transfer to 1024 bytes
For debugging purposes, no more than 1024 bytes were copied at a time. But
there is no reason to keep this limitation, so it is removed.
This patch may be backported to 2.5.
(cherry picked from commit
fc5912914b320debad432c668b810170f6fd559a)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Thu, 27 Jan 2022 14:46:19 +0000 (15:46 +0100)]
BUG/MAJOR: compiler: relax alignment constraints on certain structures
In github bug #1517, Mike Lothian reported instant crashes on startup
on RHEL8 + gcc-11 that appeared with 2.4 when allocating a proxy.
The analysis brought us down to the THREAD_ALIGN() entries that were
placed inside the "server" struct to avoid false sharing of cache lines.
It turns out that some modern gcc make use of aligned vector operations
to manipulate some fields (e.g. memset() etc) and that these structures
allocated using malloc() are not necessarily aligned, hence the crash.
The compiler is allowed to do that because the structure claims to be
aligned. The problem is in fact that the alignment propagates to other
structures that embed it. While most of these structures are used as
statically allocated variables, some are dynamic and cannot use that.
A deeper analysis showed that struct server does this, propagates to
struct proxy, which propagates to struct spoe_config, all of which
are allocated using malloc/calloc.
A better approach would consist in using posix_memalign(), but this one
is not available everywhere and will either need to be reimplemented
less efficiently (by always wasting 64 bytes before the area), or a
few functions will have to be specifically written to deal with the
few structures that are dynamically allocated.
But the deeper problem that remains is that it is difficult to track
structure alignment, as there's no available warning to check this.
For the long term we'll probably have to create a macro such as
"struct_malloc()" etc which takes a type and enforces an alignment
based on the one of this type. This also means propagating that to
pools as well, and it's not a tiny task.
For now, let's get rid of the forced alignment in struct server, and
replace it with extra padding. By punching 63-byte holes, we can keep
areas on separate cache lines. Doing so moderately increases the size
of the "server" structure (~+6%) but that's the best short-term option
and it's easily backportable.
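A standalone illustration of the trade-off (the structures below are purely
illustrative, not HAProxy's):

    #include <stdlib.h>

    /* Claiming 64-byte alignment lets the compiler emit aligned vector
     * stores, but plain malloc() guarantees a much smaller alignment, so a
     * dynamically allocated instance may crash on such stores. The
     * alignment also propagates to every structure embedding this one.
     */
    struct aligned_fields {
            unsigned long a;
            __attribute__((aligned(64))) unsigned long b;
    };

    /* Padding keeps 'a' and 'b' more than a cache line apart without
     * raising the structure's required alignment, so malloc()'d instances
     * remain valid.
     */
    struct padded_fields {
            unsigned long a;
            char pad[63];
            unsigned long b;
    };

    int main(void)
    {
            struct padded_fields *p = malloc(sizeof(*p)); /* no special alignment needed */
            free(p);
            return 0;
    }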
This will have to be backported as far as 2.4.
Thanks to Mike for the detailed report.
(cherry picked from commit
ecc473b5299ccd0007a73ec4b4af099ac65e9c43)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Thu, 27 Jan 2022 15:10:48 +0000 (16:10 +0100)]
BUG/MEDIUM: fd: always align fdtab[] to 64 bytes
There's a risk that fdtab is not 64-byte aligned. The first effect is that
it may cause false sharing between cache lines resulting in contention
when adjacent FDs are used by different threads. The second is related
to what is explained in commit "BUG/MAJOR: compiler: relax alignment
constraints on certain structures", i.e. that modern compilers might
make use of aligned vector operations to zero some entries, and would
crash. We do not use any memset() or so on fdtab, so the risk is almost
inexistent, but that's not a reason for violating some valid assumptions.
This patch addresses this by allocating 64 extra bytes and aligning the
structure manually (this is an extremely cheap solution for this specific
case). The original address is stored in a new variable "fdtab_addr" and
is the one that gets freed. This remains extremely simple and should be
easily backportable. A dedicated aligned allocator later would help, of
course.
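The manual-alignment trick itself is generic; a hedged standalone sketch
(names are illustrative):

    #include <stdint.h>
    #include <stdlib.h>

    static void *unaligned_base;  /* the calloc() result, the only pointer that may be freed */

    /* Allocate 64 extra bytes and round the usable pointer up to the next
     * 64-byte boundary.
     */
    static void *alloc_aligned64(size_t size)
    {
            unaligned_base = calloc(1, size + 64);
            if (!unaligned_base)
                    return NULL;
            return (void *)(((uintptr_t)unaligned_base + 63) & ~(uintptr_t)63);
    }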
This needs to be backported as far as 2.2. No issue related to this was
reported yet, but it could very well happen as compilers evolve. In
addition this should preserve high performance across restarts (i.e.
no more dependency on allocator's alignment).
(cherry picked from commit
97ea9c49f1d95c7e91e544e8ad9ba09bffbcc023)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Christopher Faulet [Fri, 28 Jan 2022 16:47:57 +0000 (17:47 +0100)]
BUG/MEDIUM: resolvers: Really ignore trailing dot in domain names
When a string is converted to a domain name label, the trailing dot must be
ignored. In resolv_str_to_dn_label(), there is a test to do so. However, the
trailing dot is not really ignored. The character itself is not copied but
the string index is still moved to the next char. Thus, this trailing dot is
counted in the length of the last encoded part of the domain name. Worse,
because the copy is skipped, a garbage character is included in the domain
name.
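A hedged, generic sketch of the corrected handling (not the resolver's
actual code):

    /* When encoding "www.example.com." into DNS labels, the trailing dot
     * must be neither copied nor counted in the last label's length;
     * trimming it up front guarantees both.
     */
    if (len && str[len - 1] == '.')
            len--;        /* then encode str[0..len-1] label by label */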
This patch should fix the issue #1528. It must be backported as far as 2.0.
(cherry picked from commit
0a82cf4c1662b8ab00036f65b5e3543a9c1a6ff5)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Christopher Faulet [Fri, 14 Jan 2022 14:03:22 +0000 (15:03 +0100)]
BUG/MINOR: sink: Use the right field in appctx context in release callback
In the release callback, ctx.peers was used instead of ctx.sft. Concretely,
it is not an issue because the appctx context is a union and both fields
are structures containing a single pointer. But it will become a problem if
that changes.
This patch must be backported as far as 2.2.
(cherry picked from commit
dd0b144c3ae83c2ea14f969cb58fa4b927c2c455)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
William Lallemand [Fri, 28 Jan 2022 20:17:30 +0000 (21:17 +0100)]
BUG/MINOR: mworker: fix a FD leak of a sockpair upon a failed reload
When starting HAProxy in master-worker mode, the master pre-allocates a
struct mworker_proc and does a socketpair() before the configuration
parsing. If the configuration loading fails, the FDs are never closed
because they aren't part of a listener; they are not even in the fdtab.
This patch fixes the issue by cleaning up the mworker_proc structures that
were not assigned a process, and closing their FDs.
Must be backported as far as 2.0, the srv_drop() only frees the memory
and could be dropped since it's done before an exec().
(cherry picked from commit
55a921c9147c2d45621964428c86c5ceb9fd31d8)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Fri, 28 Jan 2022 17:40:06 +0000 (18:40 +0100)]
BUG/MEDIUM: mworker: close unused transferred FDs on load failure
When the master process is reloaded on a new config, it will try to
connect to the previous process' socket to retrieve all known
listening FDs to be reused by the new listeners. If listeners were
removed, their unused FDs are simply closed.
However there's a catch. In case a socket fails to bind, the master
will cancel its startup and switch to wait mode for a new operation
to happen. In this case it didn't close the possibly remaining FDs
that were left unused.
It is very hard to hit this case, but it can happen during a
troubleshooting session with fat fingers. For example, let's say
a config runs like this:
    frontend ftp
        bind 1.2.3.4:20000-29999
The admin wants to extend the port range down to 10000-29999 and
by mistake ends up with:
    frontend ftp
        bind 1.2.3.41:20000-29999
Upon restart the bind will fail if the address is not present, and the
master will then switch to wait mode without releasing the previous FDs
for 1.2.3.4:20000-29999 since they're now apparently unused. Then once
the admin fixes the config and does:
    frontend ftp
        bind 1.2.3.4:10000-29999
The service will start, but will bind new sockets, half of them
overlapping with the previous ones that were not properly closed. This
may result in a startup error (if SO_REUSEPORT is not enabled or not
available), in a FD number exhaustion (if the error is repeated many
times), or in connections being randomly accepted by the process if
they sometimes land on the old FD that nobody listens on.
This patch will need to be backported as far as 1.8, and depends on
previous patch:
MINOR: sock: move the unused socket cleaning code into its own function
Note that before 2.3 most of the code was located inside haproxy.c, so
the patch above should probably relocate the function there instead of
sock.c.
(cherry picked from commit
e08acaed19c6d6a86ebaf2b5f3089ebef78bc69d)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Fri, 28 Jan 2022 17:28:18 +0000 (18:28 +0100)]
MINOR: sock: move the unused socket cleaning code into its own function
The startup code used to scan the list of unused sockets retrieved from
an older process, and to close them one by one. This also required that
the internal storage of these temporary sockets was
known from outside sock.c and that the code was copy-pasted at every
call place.
This patch moves this into sock.c under the name
sock_drop_unused_old_sockets(), and removes the xfer_sock_list
definition from sock.h since the rest of the code doesn't need to know
this.
This cleanup is minimal and preliminary to a future fix that will need
to be backported to all versions featuring FD transfers over the CLI.
(cherry picked from commit
b510116fd2b9a676297a1d2bec7d0107bf3c7383)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Wed, 16 Feb 2022 15:17:39 +0000 (16:17 +0100)]
[RELEASE] Released version 2.5.2
Released version 2.5.2 with the following main changes :
- BUG/MEDIUM: connection: properly leave stopping list on error
- BUG/MEDIUM: htx: Adjust length to add DATA block in an empty HTX buffer
- BUG/MINOR: httpclient: don't send an empty body
- BUG/MINOR: httpclient: set default Accept and User-Agent headers
- BUG/MINOR: httpclient/lua: don't pop the lua stack when getting headers
- BUILD/MINOR: fix solaris build with clang.
- BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl
- DOC: management: mark "set server ssl" as deprecated
- MEDIUM: cli: yield between each pipelined command
- MINOR: channel: add new function co_getdelim() to support multiple delimiters
- BUG/MINOR: cli: avoid O(bufsize) parsing cost on pipelined commands
- MEDIUM: h2/hpack: emit a Dynamic Table Size Update after settings change
- BUG/MEDIUM: cli: Never wait for more data on client shutdown
- BUG/MEDIUM: mcli: do not try to parse empty buffers
- BUG/MEDIUM: mcli: always realign wrapping buffers before parsing them
- BUG/MINOR: stream: make the call_rate only count the no-progress calls
- DEBUG: cli: add a new "debug dev fd" expert command
- BUILD: debug/cli: condition test of O_ASYNC to its existence
- DEBUG: pools: add new build option DEBUG_POOL_INTEGRITY
- REGTESTS: ssl: Fix ssl_errors regtest with OpenSSL 1.0.2
- BUG/MEDIUM: mworker: don't lose the stats socket on failed reload
- BUG/MINOR: mworker: does not add the -sf in wait mode
- BUG/MINOR: pools: always flush pools about to be destroyed
- DEBUG: pools: add extra sanity checks when picking objects from a local cache
- DEBUG: pools: let's add reverse mapping from cache heads to thread and pool
- DEBUG: pools: replace the link pointer with the caller's address on pool_free()
- BUG/MAJOR: sched: prevent rare concurrent wakeup of multi-threaded tasks
- BUG/MINOR: mworker: does not erase the pidfile upon reload
- DEBUG: fd: make sure we never try to insert/delete an impossible FD number
- MINOR: listener: replace the listener's spinlock with an rwlock
- BUG/MEDIUM: listener: read-lock the listener during accept()
- BUG/MINOR: httpclient: Revisit HC request and response buffers allocation
- BUG/MEDIUM: httpclient: Xfer the request when the stream is created
- BUG/MINOR: ssl: Remove empty lines from "show ssl ocsp-response <id>" output
- BUG/MINOR: jwt: Double free in deinit function
- BUG/MINOR: jwt: Missing pkey free during cleanup
- BUG/MINOR: jwt: Memory leak if same key is used in multiple jwt_verify calls
- BUG/MINOR: httpclient/cli: display junk characters in vsn
- BUG/MAJOR: http/htx: prevent unbounded loop in http_manage_server_side_cookies
- BUG/MAJOR: spoe: properly detach all agents when releasing the applet
- REGTESTS: server: close an occasional race on dynamic_server_ssl.vtc
- REGTESTS: peers: leave a bit more time to peers to synchronize
- BUG/MEDIUM: h2/hpack: fix emission of HPACK DTSU after settings change
- BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
Willy Tarreau [Fri, 4 Feb 2022 08:05:37 +0000 (09:05 +0100)]
BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
The idle connection delay calculation before a request is a bit tricky,
especially for multiplexed protocols. It changed between 2.3 and 2.4 by
the integration of the idle delay inside the session itself with these
commits:
dd78921c6 ("MINOR: logs: Use session idle duration when no stream is provided")
7a6c51324 ("MINOR: stream: Always get idle duration from the session")
and by then it was only set by the H1 mux. But over multiple changes, what
used to be a zero idle delay + a request delay for H2 became a bit odd, with
the idle time slipping into the request time measurement. The effect is that,
as reported in GH issue #1395, some H2 request times look huge.
This patch introduces the calculation of the session's idle time on the
H2 mux before creating the stream. This is made possible because the
stream_new() code immediately copies this value into the stream for use
at log time. Thus we don't care about changing something that will be
touched by every single request. The idle time is calculated as documented,
i.e. the delay from the previous request to the current one. This also
means that when a single stream is present on a connection, a part of
the server's response time may appear in the %Ti measurement, but this
reflects the reality since nothing would prevent the client from using
the connection to fetch more objects. In addition this shows how long
it takes a client to find references to objects in an HTML page and
start to fetch them.
A different approach could have consisted in counting from the last time
the connection was left without any request (i.e. really idle), but this
would at least require a documentation change and it's not certain this
would provide a more useful information.
Thanks to Bart Butler and Luke Seelenbinder for reporting enough elements
to diagnose this issue.
This should be backported to 2.4.
(cherry picked from commit
d0de6776826ee18da74e6949752e2f44cba8fdf2)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 16 Feb 2022 13:28:14 +0000 (14:28 +0100)]
BUG/MEDIUM: h2/hpack: fix emission of HPACK DTSU after settings change
Sadly, despite particular care, commit 39a0a1e12 ("MEDIUM: h2/hpack: emit
a Dynamic Table Size Update after settings change") broke H2 when sending
DTSU. A missing negation on the flag caused the DTSU_EMITTED flag to be
lost and the DTSU to be sent again on the next stream, and possibly to
break flow control or a few other internal states.
This will have to be backported wherever the patch above was backported.
Thanks to Yves Lafon for notifying us with elements to reproduce the
issue!
(cherry picked from commit
c7d85485a00bd9862ecb726ad1242c2ba724a8ca)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 16 Feb 2022 10:28:09 +0000 (11:28 +0100)]
REGTESTS: peers: leave a bit more time to peers to synchronize
tls_basic_sync_wo_stkt_backend fails once every 200 runs for me. This
seems to be because the startup delay doesn't always allow peers to
perform a simultaneous connect, close and new attempt. With 3s I can't
see it fail anymore. In addition the long "delay 0.2" are still way too
much since we do not really care about the startup order in practice.
(cherry picked from commit
c38200563645e314bd0af59958c75171cca59d4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 16 Feb 2022 09:45:23 +0000 (10:45 +0100)]
REGTESTS: server: close an occasional race on dynamic_server_ssl.vtc
Sometimes when sending commands to shut down a server, haproxy complains
that some connections remain; this is because the server-side connection
might not always be completely released at the moment the client leaves
and the operation is emitted. While shutting down server sessions works,
it seems cleaner to just use "option httpclose" which releases the server
earlier and avoids the race.
This can be backported to 2.5.
(cherry picked from commit
42f2a511d32363121c667d51548d15204625301f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Feb 2022 15:49:37 +0000 (16:49 +0100)]
BUG/MAJOR: spoe: properly detach all agents when releasing the applet
There's a bug in spoe_release_appctx() which checks the presence of items
in the wrong list rt[tid].agents to run over rt[tid].waiting_queue and
zero their spoe_appctx. The effect is that these contexts are not zeroed
and if spoe_stop_processing() is called, "sa->cur_fpa--" will be applied
to one of these recently freed contexts and will corrupt random memory
locations, as found at least in bugs #1494 and #1525.
This must be backported to all stable versions.
Many thanks to Christian Ruppert from Babiel for exchanging so many
useful traces over the last two months, testing debugging code and
helping set up a similar environment to reproduce it!
(cherry picked from commit
b042e4f6f7dca655a337fc9ffe1a5e4f25440868)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Andrew McDermott [Fri, 11 Feb 2022 18:26:49 +0000 (18:26 +0000)]
BUG/MAJOR: http/htx: prevent unbounded loop in http_manage_server_side_cookies
Ensure calls to http_find_header() terminate. If a "Set-Cookie2"
header is found then the while(1) loop in
http_manage_server_side_cookies() will never terminate, resulting in
the watchdog firing and the process terminating via SIGABRT.
The while(1) loop becomes unbounded because an unmatched call to
http_find_header("Set-Cookie") will leave ctx->blk=NULL. Subsequent
calls to check for "Set-Cookie2" will now enumerate from the beginning
of all the blocks and will once again match on subsequent
passes (assuming a match first time around), hence the loop becoming
unbounded.
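A hedged sketch of a bounded scan (not the actual patch): giving each header
name its own loop ensures that an unmatched lookup (which leaves ctx.blk =
NULL) cannot make a subsequent lookup restart from the first block on every
pass.

    struct http_hdr_ctx ctx = { .blk = NULL };

    while (http_find_header(htx, ist("Set-Cookie"), &ctx, 1)) {
            /* ... process or rewrite this cookie header ... */
    }
    ctx.blk = NULL;
    while (http_find_header(htx, ist("Set-Cookie2"), &ctx, 1)) {
            /* ... same processing for the legacy header name ... */
    }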
This issue was introduced with HTX and this fix should be backported
to all versions supporting HTX.
Many thanks to Grant Spence (gspence@redhat.com) for working through
this issue with me.
(cherry picked from commit
bfb15ab34ead85f64cd6da0e9fb418c9cd14cee8)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 16 Feb 2022 10:37:02 +0000 (11:37 +0100)]
BUG/MINOR: httpclient/cli: display junk characters in vsn
An ist is not ended by a '\0', leading to junk characters being displayed
when using %s for printing the HTTP start line.
Fix the issue by replacing %s by %.*s + istlen.
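For illustration (a hedged fragment; 'line' is a hypothetical ist holding
the start line):

    /* An ist carries a pointer and a length but no trailing '\0', so it
     * must be printed with an explicit precision.
     */
    printf("%.*s\n", (int)istlen(line), istptr(line));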
Must be backported in 2.5.
(cherry picked from commit
de6ecc3aceaf93612eb37ff5a913fcd8e741ea67)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Remi Tricot-Le Breton [Fri, 4 Feb 2022 13:24:15 +0000 (14:24 +0100)]
BUG/MINOR: jwt: Memory leak if same key is used in multiple jwt_verify calls
If the same filename was specified in multiple calls of the jwt_verify
converter, we would have parsed the contents of the file every time it
was used instead of checking if the entry already existed in the tree.
This led to memory leaks because we would not insert the duplicated
entry and we would not free it (as well as the EVP_PKEY it referenced).
We now check the return value of ebst_insert and free the current entry
if it is a duplicate of an existing entry.
The order in which the tree insert and the pkey parsing happen was also
switched in order to avoid parsing key files in case of duplicates.
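A hedged sketch of the duplicate check (the entry layout is illustrative):
on a unique-key tree, ebst_insert() returns the node that already holds the
key when a duplicate is inserted, so the freshly built entry must be
released instead of being leaked.

    if (ebst_insert(&jwt_cert_tree, &entry->node) != &entry->node) {
            /* duplicate: an entry for this file is already in the tree */
            EVP_PKEY_free(entry->pkey);
            free(entry);
    }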
Should be backported to 2.5.
(cherry picked from commit
d544d33e10868963407212660466aa552d5888e6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Remi Tricot-Le Breton [Fri, 4 Feb 2022 13:21:02 +0000 (14:21 +0100)]
BUG/MINOR: jwt: Missing pkey free during cleanup
When emptying the jwt_cert_tree during deinit, the entries are freed but
not the EVP_PKEY reference they kept, leading to a memory leak.
Should be backported in 2.5.
(cherry picked from commit
2b5a6559460b41dd6db2740cc961b461cef12edc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Remi Tricot-Le Breton [Fri, 4 Feb 2022 13:06:34 +0000 (14:06 +0100)]
BUG/MINOR: jwt: Double free in deinit function
The node pointer was not moving properly along the jwt_cert_tree during
the deinit which ended in a double free during cleanup (or when checking
a configuration that used the jwt_verify converter with an explicit
certificate specified).
This patch fixes GitHub issue #1533.
It should be backported to 2.5.
(cherry picked from commit
4930c6c869fb875554d60ce9a2d6362cf16cf295)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Remi Tricot-Le Breton [Tue, 11 Jan 2022 09:11:10 +0000 (10:11 +0100)]
BUG/MINOR: ssl: Remove empty lines from "show ssl ocsp-response <id>" output
There were empty lines in the output of the CLI's "show ssl
ocsp-response <id>" command. The plain "show ssl ocsp-response" command
(without parameter) was already managed in commit
cc750efbc5c2180ed63b222a51029609ea96d0f7. This patch adds an extra space
to those lines so that the only existing empty lines actually mark the
end of the output. This requires to post-process the buffer filled by
OpenSSL's OCSP_RESPONSE_print function (which produces the output of the
"openssl ocsp -respin <ocsp.pem>" command). This way the output of our
command still looks the same as openssl's one.
Must be backported in 2.5.
(cherry picked from commit
2e7d1eb2a7dad61ee5661086005d3f85ee6ad6ba)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 12 Jan 2022 14:27:41 +0000 (15:27 +0100)]
BUG/MEDIUM: httpclient: Xfer the request when the stream is created
Since the HTTP legacy mode was removed, it is unexpected to create an HTTP
stream without a valid request. Thanks to this change, the wait_for_request
analyzer was significantly simplified. And it is possible because HTTP
multiplexers already take care to have a valid request to create a stream.
But it means that any HTTP applet on the client side must do the same. The
httpclient is one of them. And it is not a problem because the
request is generated before starting the applet. We must just take care to
set the right state.
For now it works "by chance", because the applet seems to be scheduled
before the stream itself. But if this changes, it will lead to a crash
because the stream expects to have a request when the wait_for_request
analyzer runs.
This patch should be backported to 2.5.
(cherry picked from commit
6ced61dd0a2056c4ed852163d6855808c07ffe8f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 12 Jan 2022 10:14:08 +0000 (11:14 +0100)]
BUG/MINOR: httpclient: Revisit HC request and response buffers allocation
For now, these buffers are allocated when the httpclient is created and
freed when it is released. Usually, we try to avoid keeping a buffer allocated
if it is not required. Empty buffers should be released ASAP. Apart from
that, there is no issue with the response side because a copy is always
performed. However, for the request side, a swap with the channel's buffer
is always performed. And there is no guarantee the channel's buffer is
allocated. Thus, after the swap, the httpclient can retrieve a null
buffer. In practice, this never happens. But this may change. And it will be
required for a future fix.
So, now, we systematically take care to have an allocated buffer when we
want to write in it. And it is released as soon as it becomes empty.
This patch should be backported to 2.5.
(cherry picked from commit
600985df41c9777b6322f00e26dbf539f786d66f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 1 Feb 2022 15:37:00 +0000 (16:37 +0100)]
BUG/MEDIUM: listener: read-lock the listener during accept()
Listeners might be disabled by other threads while running in
listener_accept() due to a stopping condition or possibly a rebinding
error after a failed stop/start. When this happens, the listener's FD
is -1 and accesses made by the lower layers to fdtab[-1] do not end up
well. This can occasionally be noticed if running at high connection
rates in master-worker mode when compiled with ASAN and hammered with
10 reloads per second. From time to time an out-of-bounds error will
be reported.
One approach could consist in keeping a copy of critical information
such as the FD before proceeding but that's not correct since in case of
close() the FD might be reassigned to another connection for example.
In fact what is needed is to read-lock the listener during this operation
so that it cannot change while we're touching it.
Tests have shown that using only a spinlock generally works well, but
it doesn't scale much with threads and we can see listener_accept() eat
10-15% CPU on a 24 thread machine at 300k conn/s. For this reason the
lock was turned to an rwlock by previous commit and this patch only takes
the read lock to make sure other operations do not change the listener's
state while threads are accepting connections. With this approach, no
performance loss was noticed at all and listener_accept() doesn't appear
in perf top.
This ought to be backported to about all branches that make use of the
unlocked listeners, but in practice it seems to mostly concern 2.3 and
above, since 2.2 and older will take the FD in the argument (and the
race exists there, this FD could end up being reassigned in parallel
but there's not much that can be done there to prevent that race; at
least a permanent error will be reported).
For backports, the current approach is preferred, with a preliminary
backport of previous commit "MINOR: listener: replace the listener's
spinlock with an rwlock". However if for any reason this commit cannot
be backported, the current patch can be modified to simply take a
spinlock (tested and works), it will just impact high performance
workloads (like DDoS protection).
(cherry picked from commit
fed93d367c13c8c79e452837ec64389bc8061b5b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 1 Feb 2022 15:23:00 +0000 (16:23 +0100)]
MINOR: listener: replace the listener's spinlock with an rwlock
We'll need to lock the listener a little bit more during accept() and
tests show that a spinlock is a massive performance killer, so let's
first switch to an rwlock for this lock.
This patch might have to be backported for the next patch to work, and
if so, the change is almost mechanical (look for LISTENER_LOCK), but do
not forget about the few HA_SPIN_INIT() in the file. There's no reference
to this lock outside of listener.c nor listener-t.h.
(cherry picked from commit
08b6f9645248405bb1cdec59c6aa4f35a8ba3add)
[wt: minor ctx adjustment]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 31 Jan 2022 19:05:02 +0000 (20:05 +0100)]
DEBUG: fd: make sure we never try to insert/delete an impossible FD number
It's among the cases that would provoke memory corruption, let's add
some tests against negative FDs and those larger than the table. This
must never ever happen and would currently result in silent corruption
or a crash. Better have a noticeable one exhibiting the call chain if
that were to happen.
(cherry picked from commit
9aa324de2d9f69d74f5b30c33a78d3a38501342f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Mon, 14 Feb 2022 08:02:14 +0000 (09:02 +0100)]
BUG/MINOR: mworker: does not erase the pidfile upon reload
When started in master-worker mode combined with daemon mode, HAProxy
will open() the pidfile with O_TRUNC when switching to wait mode.
In 2.5, it happens every time after trying to load the configuration,
since we switch to wait mode.
In previous versions this happened upon a failure of the configuration
loading.
Fixes bug #1545.
Must be backported in every supported branch.
(cherry picked from commit
7b820a6191736cdf4167d624b7bcab9957dd0697)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Mon, 14 Feb 2022 09:18:51 +0000 (10:18 +0100)]
BUG/MAJOR: sched: prevent rare concurrent wakeup of multi-threaded tasks
Since the relaxation of the run-queue locks in 2.0 there has been a
very small but existing race between expired tasks and running tasks:
a task might be expiring and being woken up at the same time, on
different threads. This is protected against via the TASK_QUEUED and
TASK_RUNNING flags, but just after the task finishes executing, it
releases its TASK_RUNNING bit and only then may it go to task_queue().
This one will do nothing if the task's ->expire field is zero, but
if the field turns to zero between this test and the call to
__task_queue() then several things may happen:
- the task may remain in the WQ until the 24 next days if it's in
the future;
- the task may prevent any other task after it from expiring during
the 24 next days once it's queued
- if DEBUG_STRICT is set on 2.4 and above, an abort may happen
- since 2.2, if the task got killed in between, then we may
even requeue a freed task, causing random behaviour next time
it's found there, or possibly corrupting the tree if it gets
reinserted later.
The peers code is one call path that easily reproduces the case with
the ->expire field being reset, because it starts by setting it to
TICK_ETERNITY as the first thing when entering the task handler. But
other code parts also use multi-threaded tasks and rightfully expect
to be able to touch their expire field without causing trouble. No
trivial code path was found that would destroy such a shared task at
runtime, which already limits the risks.
This must be backported to 2.0.
(cherry picked from commit
6c8babf6c4d8cb33ff0f5220177207610c22ffcb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Feb 2022 15:49:16 +0000 (16:49 +0100)]
DEBUG: pools: replace the link pointer with the caller's address on pool_free()
Along recent evolutions of the pools, we've lost the ability to reliably
detect double-frees because while in the past the same pointer was being
used to chain the objects in the cache and to store the pool's address,
since 2.0 they're different so the pool's address is never overwritten on
free() and a double-free will rarely be detected.
This patch sets the caller's return address there. It can never be equal
to a pool's address and will help guess what was the previous call path.
It will not work on exotic architectures nor with very old compilers but
these are not the environments where we're trying to get detailed bug
reports, and this is not done by default anyway so we don't care about
this limitation. Note that depending on the inlining status of the
function, the result may differ but that's no big deal either.
A test by placing a double free of an appctx inside the release handler
itself successfully reported the trouble during appctx_free() and showed
that the return address was in stream_int_shutw_applet() (this one calls
the release handler).
(cherry picked from commit
27c8da1fd5e4f906fd3a9f96a4284c886b559305)
[wt: simplified for 2.5 and older, just use POOL_LINK()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Feb 2022 15:33:22 +0000 (16:33 +0100)]
DEBUG: pools: let's add reverse mapping from cache heads to thread and pool
During global eviction we're visiting nodes from the LRU tail and we
determine their pool cache head and their pool. In order to make sure
we never mess up, let's add some backwards pointer to the thread number
and pool from the pool_cache_head. It's 64-byte aligned anyway so we're
not wasting space, and it helps for debugging and will prevent memory
corruption as early as possible.
(cherry picked from commit
49bb5d4268adaea64490ee20f4d73d94afe6a903)
[wt: context adjustment]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Feb 2022 15:23:55 +0000 (16:23 +0100)]
DEBUG: pools: add extra sanity checks when picking objects from a local cache
These few checks are added to make sure we never try to pick an object from
an empty list, which would have a devastating effect.
(cherry picked from commit
e2830addda9baf42c682d7c1856a53e2875deea4)
[wt: adjusted context]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Feb 2022 15:19:24 +0000 (16:19 +0100)]
BUG/MINOR: pools: always flush pools about to be destroyed
When destroying a pool (e.g. at exit or when resizing buffers), it's
important to try to free all their local objects otherwise we can leave
some in the cache. This is particularly visible when changing "bufsize",
because "show pools" will then show two "trash" pools, one of which
contains a single object in cache (which is fortunately not reachable).
In all cases this happens while single-threaded so that's easy to do,
we just have to do it on the current thread.
The easiest way to do this is to pass an extra argument to function
pool_evict_from_local_cache() to force a full flush instead of a
partial one.
This can probably be backported to about all branches where this
applies, but at least 2.4 needs it.
(cherry picked from commit
c895c441c7579db652e4ed976c14c2f5b2de0c0e)
[wt: context adjustments as 2.5 doesn't use pool_evict_last_items()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 24 Nov 2021 23:49:19 +0000 (00:49 +0100)]
BUG/MINOR: mworker: does not add the -sf in wait mode
Since the wait mode is automatically executed after loading the
configuration, -sf was shown in argv[] with the previous PID, which is
normal, but also with the current one. This is only a visual problem when
listing the processes, because -sf does not do anything in wait mode.
Fix the issue by removing the whole "-sf" part in wait mode, but the
executed command can be seen in the argv[] of the latest worker forked.
Must be backported in 2.5.
(cherry picked from commit
befab9ee4aa2dfe0405d7f1416991175be2fbe67)
[wla: in 2.5 we still need the -x in wait mode, because we no longer know
the stats socket from the configuration once we reexec in wait mode]
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Willy Tarreau [Tue, 1 Feb 2022 07:47:41 +0000 (08:47 +0100)]
BUG/MEDIUM: mworker: don't lose the stats socket on failed reload
In master mode, when the master reexecs itself, it passes the stats socket
address via "-x" on the command line. However, the address is first
retrieved from the configuration, and only later in the boot sequence
the one passed on the command line is considered. As such, some early
errors on reload like a missing config file will cause the process to
rexec itself with no stats socket address on the command line, preventing
the new master from retrieving previous listeners.
The right thing to do is to preset the value to the one found on the
command line so that it can be preserved across failed reloads.
There is no mainline equivalent for this commit because this code was
removed from 2.6, which now only uses the socket pair. The patch must
be backported to older versions where it applies.
Remi Tricot-Le Breton [Tue, 11 Jan 2022 16:29:24 +0000 (17:29 +0100)]
REGTESTS: ssl: Fix ssl_errors regtest with OpenSSL 1.0.2
This test was broken with OpenSSL 1.0.2 after commit a996763619d
("BUG/MINOR: ssl: Store client SNI in SSL context in case of ClientHello
error") because it expected the default TLS version to be 1.3 in some
cases (when it can't be the case with OpenSSL 1.0.2).
(cherry picked from commit
aab8d255bc0fcbcc50884a4be4f69598ee08fe73)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 21 Jan 2022 18:00:25 +0000 (19:00 +0100)]
DEBUG: pools: add new build option DEBUG_POOL_INTEGRITY
When enabled, objects picked from the cache are checked for corruption
by comparing their contents against a pattern that was placed when they
were inserted into the cache. Objects are also allocated in the reverse
order, from the oldest one to the most recent, so as to maximize the
ability to detect such a corruption. The goal is to detect writes after
free (or possibly hardware memory corruptions). Contrary to DEBUG_UAF
this cannot detect reads after free, but may possibly detect later
corruptions and will not consume extra memory. The CPU usage will
increase a bit due to the cost of filling/checking the area and for the
preference for cold cache instead of hot cache, though not as much as
with DEBUG_UAF. This option is meant to be usable in production.
(cherry picked from commit
0575d8fd760c6cd1de3d6ed66599d685a03c1873)
[wt: adjusted slightly since there is no batch refilling in 2.5; dropped
the API doc parts; tested with/without option and works fine]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 25 Jan 2022 13:51:53 +0000 (14:51 +0100)]
BUILD: debug/cli: condition test of O_ASYNC to its existence
David Carlier reported a build breakage on Haiku since commit 5be7c198e
("DEBUG: cli: add a new "debug dev fd" expert command")
due to O_ASYNC not being defined. Ilya also reported it broke the
build on Cygwin. It's not that portable and sometimes defined as
O_NONBLOCK for portability. But here we don't even need that, as
we already condition other flags, let's just ignore it if it does
not exist.
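A hedged sketch of the guard (the surrounding dump call is simplified):

    #if defined(O_ASYNC)
            if (flags & O_ASYNC)
                    chunk_appendf(&trash, ",O_ASYNC");
    #endif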
(cherry picked from commit
410942b92a2687f690c886ae1fdc7ad69a3640e0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Mon, 24 Jan 2022 19:26:09 +0000 (20:26 +0100)]
DEBUG: cli: add a new "debug dev fd" expert command
This command will scan the whole file descriptors space to look for
existing FDs that are unknown to haproxy's fdtab, and will try to dump
a maximum number of information about them (including type, mode, device,
size, uid/gid, cloexec, O_* flags, socket types and addresses when
relevant). The goal is to help detecting inherited FDs from parent
processes as well as potential leaks.
Some of those listed are actually known but handled so deep into some
systems that they're not in the fdtab (such as epoll FDs or inter-
thread pipes). This might be refined in the future so that these ones
become known and do not appear.
Example of output:
$ socat - /tmp/sock1 <<< "expert-mode on;debug dev fd"
0 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND
1 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND
2 type=tty. mod=0620 dev=0x8803 siz=0 uid=1000 gid=5 fs=0x16 ino=0x6 getfd=+0 getfl=O_RDONLY,O_APPEND
3 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x18112348 getfd=+0
4 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY
33 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8251 getfd=+0 getfl=O_RDONLY
34 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY
36 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8d1b getfd=+0 getfl=O_RDONLY
37 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY
39 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24afa04f getfd=+0 getfl=O_RDONLY
41 type=pipe mod=0600 dev=0 siz=0 uid=1000 gid=100 fs=0xc ino=0x24af8252 getfd=+0 getfl=O_RDONLY
42 type=epol mod=0600 dev=0 siz=0 uid=0 gid=0 fs=0xd ino=0x3674 getfd=+0 getfl=O_RDONLY
(cherry picked from commit
5be7c198e5991d5b601af3e648bccfd6f345c673)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 20 Jan 2022 17:42:16 +0000 (18:42 +0100)]
BUG/MINOR: stream: make the call_rate only count the no-progress calls
We have an anti-looping protection in process_stream() that detects bugs
that used to affect a few filters like compression in the past which
sometimes forgot to handle a read0 or a particular error, leaving a
thread looping at 100% CPU forever. When such a condition is detected,
an alert is emitted and the process is killed so that it can be replaced
by a sane one:
[ALERT] (19061) : A bogus STREAM [0x274abe0] is spinning at 2057156
calls per second and refuses to die, aborting now! Please
report this error to developers [strm=0x274abe0,3 src=unix
fe=MASTER be=MASTER dst=<MCLI> txn=(nil),0 txn.req=-,0
txn.rsp=-,0 rqf=c02000 rqa=10000 rpf=
88000021 rpa=8000000
sif=EST,40008 sib=DIS,84018 af=(nil),0 csf=0x274ab90,8600
ab=0x272fd40,1 csb=(nil),0
cof=0x25d5d80,1300:PASS(0x274aaf0)/RAW((nil))/unix_stream(9)
cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0) filters={}]
call trace(11):
| 0x4dbaab [c7 04 25 01 00 00 00 00]: stream_dump_and_crash+0x17b/0x1b4
| 0x4df31f [e9 bd c8 ff ff 49 83 7c]: process_stream+0x382f/0x53a3
(...)
One problem with this detection is that it used to only count the call
rate because we weren't sure how to make it more accurate, but the
threshold was high enough to prevent accidental false positives.
There is actually one case that manages to trigger it, which is when
sending huge amounts of requests pipelined on the master CLI. Some
short requests such as "show version" are sufficient to be handled
extremely fast and to cause a wake up of an analyser to parse the
next request, then an applet to handle it, back and forth. But this
condition is not an error, since some data are being forwarded by
the stream, and it's easy to detect it.
This patch modifies the detection so that update_freq_ctr() only
applies to calls made without CF_READ_PARTIAL nor CF_WRITE_PARTIAL
set on any of the channels, which really indicates that nothing is
happening at all.
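A hedged fragment of that condition (the bookkeeping around it is
simplified):

    /* Only count this wakeup as suspicious when neither channel reports
     * any recent progress.
     */
    if (!((req->flags | res->flags) & (CF_READ_PARTIAL | CF_WRITE_PARTIAL)))
            update_freq_ctr(&s->call_rate, 1);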
This is greatly sufficient and extremely effective, as the call above
is still caught (shutr being ignored by an analyser) while a loop on
the master CLI now has no effect. The "call_rate" field in the detailed
"show sess" output will now be much lower, except for bogus streams,
which may help spot them. This field is only there for developers
anyway so it's pretty fine to slightly adjust its meaning.
This patch could be backported to stable versions in case of reports
of such an issue, but as that's unlikely, it's not really needed.
(cherry picked from commit
6c539c4b8c24718177e7ff38af0d186ec84608ea)
[wt: bringing this to 2.5 only for now so that power users have at
least a stable branch to upgrade to if they face this issue]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 20 Jan 2022 07:47:35 +0000 (08:47 +0100)]
BUG/MEDIUM: mcli: always realign wrapping buffers before parsing them
Pipelined commands easily result in request buffers to wrap, and the
master-cli parser only deals with linear buffers since it needs contiguous
keywords to look for in a list. As soon as a buffer wraps, some commands
are ignored and the parser is called in loops because the wrapped data
do not leave the buffer.
Let's take the easiest path that's already used at the HTTP layer, we
simply realign the buffer if its input wraps. This rarely happens anyway
(typically once per buffer), remains reasonably cheap and guarantees this
cannot happen anymore.
This needs to be backported as far as 2.0.
(cherry picked from commit
a4e4d66f701b58da87cad0a2a90dea7e7553bc71)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 20 Jan 2022 07:31:50 +0000 (08:31 +0100)]
BUG/MEDIUM: mcli: do not try to parse empty buffers
When pcli_parse_request() is called with an empty buffer, it still tries
to parse it and can go on believing it finds an empty request if the last
char before the beginning of the buffer is a '\n'. In this case it overwrites
it with a zero and processes it as an empty command, doing nothing but not
making the buffer progress. This results in an infinite loop that is stopped
by the watchdog. For a reason related to another issue (yet to be fixed),
this can easily be reproduced by pipelining lots of commands such as
"show version".
Let's add a length check after the search for a '\n'.
This needs to be backported as far as 2.0.
(cherry picked from commit
6cd93f52e974aca4dac7aa2756c4e17659ae5354)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 18 Jan 2022 07:44:23 +0000 (08:44 +0100)]
BUG/MEDIUM: cli: Never wait for more data on client shutdown
When a shutdown is detected on the cli, we try to execute all pending
commands first before closing the connection. It is required because
commands execution is serialized. However, when the last part is a partial
command, the cli connection is not closed, waiting for more data. Because
there is no timeout for now on the cli socket, the connection remains
infinitely in this state. And because the maxconn is set to 10, if it
happens several times, the cli socket quickly becomes unresponsive because
all its slots are waiting for more data on closed connections.
This patch should fix the issue #1512. It must be backported as far as 2.0.
(cherry picked from commit
0f727dabf51d4ffead40fd43feb7c07193ebde99)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 13 Jan 2022 15:00:12 +0000 (16:00 +0100)]
MEDIUM: h2/hpack: emit a Dynamic Table Size Update after settings change
As reported by @jinsubsim in github issue #1498, there is an
interoperability issue between nghttp2 as a client and a few servers
among which haproxy (in fact likely all those which do not make use
of the dynamic headers table in responses or which do not intend to
use a larger table), when reducing the header table size below 4096.
These are easily testable this way:
nghttp -v -H":method: HEAD" --header-table-size=0 https://$SITE
It will result in a compression error for those which do not start
with an HPACK dynamic table size update opcode.
There is a possible interpretation of the H2 and HPACK specs that
says that an HPACK encoder must send an HPACK headers table update
confirming the new size it will be using after having acknowledged
it, because since it's possible for a decoder to advertise a late
SETTINGS and change it after transfers have begun, the initially
advertised value might very well be seen as a first change from the
initial setting, and the HPACK spec doesn't specify the side which
causes the change that triggers a DTSU update, which was essentially
summed up in this question from nghttp2's author when this issue
was already raised 6 years ago, but which didn't really find a solid
response by then:
https://lists.w3.org/Archives/Public/ietf-http-wg/2015OctDec/0107.html
The ongoing consensus based on what some servers are doing and that aims
at limiting interoperability issues seems to be that a DTSU is expected
for each reduction from the current size, which should be reflected in
the next revision of the H2 spec:
https://github.com/httpwg/http2-spec/pull/1005
Given that we do not make use of this table we can emit a DTSU of zero
before encoding any HPACK frame. However, some clients do not support
receiving DTSU with such values (e.g. VTest) so we cannot do it
unconditionally!
The current patch aims at sticking as close to the spec as possible by
proceeding this way:
- when a SETTINGS_HEADER_TABLE_SIZE is received, a flag is set
indicating that the value changed
- before sending any HPACK frame, this flag is checked to see if
an update is wanted and if none was sent
- in this case a DTSU of size zero is emitted and a flag is set
to mention it was emitted so that it never has to be sent again
This addresses the problem with nghttp2 without affecting VTest.
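A minimal sketch of that logic, with hypothetical flag names standing in for the real connection flags:
    #include <stdint.h>
    #include <stddef.h>

    #define DTSU_WANTED  0x01  /* a SETTINGS_HEADER_TABLE_SIZE change was seen */
    #define DTSU_SENT    0x02  /* the zero-sized DTSU was already emitted */

    /* Emit at most one HPACK Dynamic Table Size Update of zero before the
     * first HPACK frame following a table size change. Returns the number
     * of bytes written to <out>. */
    static size_t maybe_emit_dtsu(uint32_t *flags, uint8_t *out)
    {
        if ((*flags & DTSU_WANTED) && !(*flags & DTSU_SENT)) {
            *out = 0x20;             /* DTSU opcode (001xxxxx) with value 0 */
            *flags |= DTSU_SENT;     /* never send it again */
            return 1;
        }
        return 0;
    }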
More context is available here:
https://github.com/nghttp2/nghttp2/issues/1660
https://lists.w3.org/Archives/Public/ietf-http-wg/2021OctDec/0235.html
Many thanks to @jinsubsim for this report and for participating in the issue
that led to an improvement of the H2 spec.
This should be backported to stable releases in a timely manner, ideally
as far as 2.4 once the h2spec update is merged, then to other versions
after a few months of observation or in case an issue around this is
reported.
(cherry picked from commit
39a0a1e12097a80d6707363792aee5caaa254f28)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 19 Jan 2022 16:23:52 +0000 (17:23 +0100)]
BUG/MINOR: cli: avoid O(bufsize) parsing cost on pipelined commands
Sending pipelined commands on the CLI using a semi-colon as a delimiter
has a cost that grows linearly with the buffer size, because co_getline()
is called for each word and looks up a '\n' in the whole buffer while
copying its contents into a temporary buffer.
This causes huge parsing delays, for example 3s for 100k "show version"
versus 110ms if parsed only once for a default 16k buffer.
This patch makes use of the new co_getdelim() function to support both
an LF and a semi-colon as delimiters, so that it is no longer needed to parse
the whole buffer and commands are instantly retrieved. We still
need to rely on co_getline() in payload mode as escapes and semi-colons
are not used there.
It should likely be backported where CLI processing speed matters, but
will require to also backport previous patch "MINOR: channel: add new
function co_getdelim() to support multiple delimiters". It's worth noting
that backporting it without "MEDIUM: cli: yield between each pipelined
command" would significantly increase the ratio of disconnections caused
by empty request buffers, for the sole reason that the currently slow
parsing grants more time for request data to come in. As such it would
be better to backport the patch above before taking this one.
(cherry picked from commit
0011c251448929180216761c760a552c81162226)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 19 Jan 2022 16:19:52 +0000 (17:19 +0100)]
MINOR: channel: add new function co_getdelim() to support multiple delimiters
For now we have co_getline() which reads a buffer and stops on LF, and
co_getword() which reads a buffer and stops on one arbitrary delimiter.
But sometimes we'd need to stop on a set of delimiters (CR and LF, etc).
This patch adds a new function co_getdelim() which takes a set of delimiters
as a string, and constructs a small map (32 bytes) that's looked up during
parsing to stop after the first delimiter found within the set. It also
supports an optional escape character that skips a delimiter (typically a
backslash). For the rest it works exactly like the two other variants.
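A rough sketch of that delimiter-map idea (illustrative only; the real co_getdelim() works on channel buffers and copies data to a destination buffer):
    #include <stddef.h>
    #include <string.h>

    /* Return the length of the first token, delimiter included, or 0 if no
     * delimiter from <delims> was found in the first <len> bytes of <in>.
     * An optional <escape> character makes the next character literal. */
    static size_t getdelim_sketch(const char *in, size_t len,
                                  const char *delims, char escape)
    {
        unsigned char map[32]; /* 256-bit lookup map, one bit per character */
        size_t i;

        memset(map, 0, sizeof(map));
        while (*delims) {
            unsigned char c = (unsigned char)*delims++;
            map[c >> 3] |= 1 << (c & 7);
        }

        for (i = 0; i < len; i++) {
            unsigned char c = (unsigned char)in[i];

            if (escape && c == (unsigned char)escape && i + 1 < len) {
                i++;                 /* skip the escaped character */
                continue;
            }
            if (map[c >> 3] & (1 << (c & 7)))
                return i + 1;        /* token length including the delimiter */
        }
        return 0;                    /* no delimiter found yet */
    }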
(cherry picked from commit
c5143653177368a40f3153dcf79d145284a5495e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 19 Jan 2022 16:11:36 +0000 (17:11 +0100)]
MEDIUM: cli: yield between each pipelined command
Pipelining commands on the CLI is sometimes needed for batched operations
such as map deletion etc, but it causes two problems:
- some possibly long-running commands will be run in series without
yielding, possibly causing extremely long latencies that will affect
quality of service and even trigger the watchdog, as seen in github
issue #1515.
- short commands that end on a buffer size boundary, when not run in
interactive mode, will often cause the socket to be closed when
the last command is parsed, because the buffer is empty.
This patch proposes a small change to this: by yielding in the CLI applet
after processing a command when there are data left, we significantly
reduce the latency, since only one command is executed per call, and
we leave an opportunity for the I/O layers to refill the request buffer
with more commands, hence to execute all of them much more often.
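Schematically, the applet now behaves like the sketch below (hypothetical helper names and a simplified context, not the actual cli_io_handler()):
    #include <stdbool.h>

    struct cli_ctx { bool more_data; };            /* stand-in for the appctx */
    bool cli_parse_one_command(struct cli_ctx *c); /* provided elsewhere */
    void cli_request_wakeup(struct cli_ctx *c);    /* provided elsewhere */

    /* Execute at most one pipelined command per call, then ask to be woken
     * up again instead of looping over the whole request buffer. */
    static void cli_handler_sketch(struct cli_ctx *c)
    {
        if (!cli_parse_one_command(c))
            return;                  /* incomplete command: wait for data */
        if (c->more_data)
            cli_request_wakeup(c);   /* yield: remaining commands run later */
    }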
With this change there's no more watchdog triggered on long series of
"del map" on large map files, and the operations are much less disturbed.
It would be desirable to backport this patch to stable versions after some
period of observation in recent versions.
(cherry picked from commit
fa7b4f6691646dc6bdebdb0d11c9cdf0382aeccb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 19 Jan 2022 14:17:08 +0000 (15:17 +0100)]
DOC: management: mark "set server ssl" as deprecated
This command was integrated in 2.4, when it was not possible to handle
SSL with dynamic servers. This is now possible, so that way should be
preferred.
Must be backported in 2.5.
(cherry picked from commit
9998a33d3a027ff6863eab71bcc2f2d7158319b4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Dauchy [Thu, 6 Jan 2022 15:57:15 +0000 (16:57 +0100)]
BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl
While giving a fresh try to `set server ssl` (which I wrote), I realised
the behavior is a bit inconsistent. Indeed when using this command over
a server with ssl enabled for the data path but also for the health
check path we have:
- data and health check done using tls
- emit `set server be_foo/srv0 ssl off`
- data path and health check path becomes plain text
- emit `set server be_foo/srv0 ssl on`
- data path becomes tls and health check path remains plain text
while I thought the end result would be:
- data path and health check path come back to TLS
In the current code we indeed erase all connections while deactivating,
but restore only the data path while activating. I made this mistake in
the past because I was testing with a case where the health check was
plain text by default.
There are several ways to solve this issue. The cleanest one would
probably be to avoid changing the health check connection when we use
`set server ssl` command, and create a new command `set server
ssl-check` to change this. For now I assumed this would be ok to simply
avoid changing the health check path and be more consistent.
This patch tries to address that and also update the documentation. It
should not break the existing usage with health checks on plain text, as
in this case they should have `no-check-ssl` in defaults. Without this
patch, the command is unusable in an environment where a list of servers
is added along the way from an initial `server-template`, all of them
using TLS for both the data and health check paths.
For 2.6 we should probably reconsider and add `set server ssl-check`
command for better granularity of cases.
If this solution is accepted, this patch should be backported up to >=
2.4.
The alternative solution was to restore the previous state, but I
believe this will create even more confusion in the future.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
(cherry picked from commit
a087f878753d73042ca9a69649d3f2668489e81d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
David Carlier [Thu, 13 Jan 2022 19:16:48 +0000 (19:16 +0000)]
BUILD/MINOR: fix solaris build with clang.
clang 9 sets the level to POSIX 2004.
(cherry picked from commit
4831db2f32077469472ebda03e176e6c77dc7472)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 14 Jan 2022 16:59:01 +0000 (17:59 +0100)]
BUG/MINOR: httpclient/lua: don't pop the lua stack when getting headers
hlua_httpclient_table_to_hdrs() does a lua_pop(L, 1) at the end of the
function, but this is supposed to be done by the caller, and it is already
done in hlua_httpclient_send().
This extra call has the consequence of popping the next parameter of the
httpclient, which is then ignored.
This patch fixes the issue by removing the lua_pop(L, 1).
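For reference, the usual Lua C API discipline for such a helper looks like this sketch (not the actual hlua code): the function iterates over the table it finds on top of the stack and leaves the stack exactly as it found it; popping the table is the caller's responsibility.
    #include <lua.h>
    #include <lauxlib.h>

    /* Read all key/value pairs of the table on top of the stack without
     * altering the stack balance; the caller pops the table afterwards. */
    static void read_table_sketch(lua_State *L)
    {
        luaL_checktype(L, -1, LUA_TTABLE);
        lua_pushnil(L);
        while (lua_next(L, -2) != 0) {
            /* key at -2, value at -1: consume them here */
            lua_pop(L, 1);           /* pop the value, keep the key */
        }
        /* no lua_pop(L, 1) here: the table still belongs to the caller */
    }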
Must be backported in 2.5.
(cherry picked from commit
01e2be84d746c5203f7c5723bb6ca9e469026c3d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 14 Jan 2022 13:10:33 +0000 (14:10 +0100)]
BUG/MINOR: httpclient: set default Accept and User-Agent headers
Some servers require at least an Accept and a User-Agent header in the
request. This patch sets default values for both.
Must be backported in 2.5.
(cherry picked from commit
bad9c8cac411f11144236abe8f9342f1058c9344)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Fri, 14 Jan 2022 13:08:34 +0000 (14:08 +0100)]
BUG/MINOR: httpclient: don't send an empty body
Forbid the httpclient from sending an empty chunked body when there is no
data to send. This also happens when doing a simple GET.
Must be backported in 2.5.
(cherry picked from commit
e1e045f4d71f85462414d048449c9307886f7c36)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 12 Jan 2022 13:03:42 +0000 (14:03 +0100)]
BUG/MEDIUM: htx: Adjust length to add DATA block in an empty HTX buffer
htx_add_data() is able to partially consume data. However there is a bug
when the HTX buffer is empty. The data length is not properly
adjusted. Thus, if it exceeds the HTX buffer size, no block is added. To fix
the issue, the length is now adjusted first.
This patch must be backported as far as 2.0.
(cherry picked from commit
28e7ba86885d60a29036bc6fa3a89777148a3762)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 12 Jan 2022 16:24:26 +0000 (17:24 +0100)]
BUG/MEDIUM: connection: properly leave stopping list on error
The stopping-list management introduced by commit
d3a88c1c3 ("MEDIUM:
connection: close front idling connection on soft-stop") missed two
error paths in the H1 and H2 muxes. The effect is that if a stream
or HPACK table couldn't be allocated for these incoming connections,
the connection would be freed while still attached to the stopping_list,
which it would never leave, resulting in a use-after-free and hence
either a crash or data corruption.
This is marked as medium as it only happens under extreme memory pressure
or when playing with tune.fail-alloc. Other stability issues remain in
such a case so that abnormal behaviors cannot be explained by this bug
alone.
This must be backported to 2.4.
(cherry picked from commit
3b990fe0bee3072cb70c965c202707cdad0f29cd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Tue, 11 Jan 2022 16:56:47 +0000 (17:56 +0100)]
[RELEASE] Released version 2.5.1
Released version 2.5.1 with the following main changes :
- BUG/MINOR: cache: Fix loop on cache entries in "show cache"
- BUG/MINOR: httpclient: allow to replace the host header
- BUG/MINOR: lua: don't expose internal proxies
- BUG/MINOR: lua: remove loop initial declarations
- BUG/MEDIUM: cli: Properly set stream analyzers to process one command at a time
- BUILD: evports: remove a leftover from the dead_fd cleanup
- BUG/MINOR: vars: Fix the set-var and unset-var converters
- BUG/MINOR: server: Don't rely on last default-server to init server SSL context
- BUG/MEDIUM: resolvers: Detach query item on response error
- BUG/MAJOR: segfault using multiple log forward sections.
- BUG/MEDIUM: h1: Properly reset h1m flags when headers parsing is restarted
- BUG/MEDIUM: mworker: FD leak of the eventpoll in wait mode
- BUG/MINOR: mworker: deinit of thread poller was called when not initialized
- MINOR: mux-h1: Improve H1 traces by adding info about http parsers
- BUILD: bug: Fix error when compiling with -DDEBUG_STRICT_NOCRASH
- BUG/MEDIUM: sample: Fix memory leak in sample_conv_jwt_member_query
- MINOR: cli: "show version" displays the current process version
- BUILD: tree-wide: avoid warnings caused by redundant checks of obj_types
- IMPORT: slz: use the correct CRC32 instruction when running in 32-bit mode
- MINOR: http-rules: Add capture action to http-after-response ruleset
- BUG/MINOR: cli/server: Don't crash when a server is added with a custom id
- DOC: spoe: Clarify use of the event directive in spoe-message section
- DOC: config: Specify %Ta is only available in HTTP mode
- DOC: config: retry-on list is space-delimited
- DOC: config: fix error-log-format example
- BUG/MEDIUM: mworker/cli: crash when trying to access an old PID in prompt mode
- MINOR: ssl: Remove empty lines from "show ssl ocsp-response" output
- MINOR: pools: work around possibly slow malloc_trim() during gc
- BUG/MEDIUM: backend: fix possible sockaddr leak on redispatch
- BUG/MEDIUM: peers: properly skip conn_cur from incoming messages
- BUG/MEDIUM: mux-h1: Fix splicing by properly detecting end of message
- BUG/MINOR: mux-h1: Fix splicing for messages with unknown length
- BUILD: ssl: unbreak the build with newer libressl
- DOC: fix misspelled keyword "resolve_retries" in resolvers
- DEBUG: ssl: make sure we never change a servername on established connections
- BUILD: opentracing: display warning in case of using OT_USE_VARS at compile time
- BUG/MEDIUM: ssl: initialize correctly ssl w/ default-server
- REGTESTS: ssl: fix ssl_default_server.vtc
- MINOR: compat: detect support for dl_iterate_phdr()
- MINOR: debug: add ability to dump loaded shared libraries
- MINOR: debug: add support for -dL to dump library names at boot
- MINOR: proxy: add option idle-close-on-response
- MINOR: cpuset: switch to sched_setaffinity for FreeBSD 14 and above.
- BUILD: makefile: add -Wno-atomic-alignment to work around clang abusive warning
- CI: Github Actions: do not show VTest failures if build failed
- BUG/MINOR: ssl: free the fields in srv->ssl_ctx
- BUG/MEDIUM: ssl: free the ckch instance linked to a server
- REGTESTS: ssl: update of a crt with server deletion
- BUILD/MINOR: cpuset FreeBSD 14 build fix.
- CI: github actions: update OpenSSL to 3.0.1
- BUILD/MINOR: tools: solaris build fix on dladdr.
- BUG/MINOR: cli: fix _getsocks with musl libc
- BUG/MEDIUM: http-ana: Preserve response's FLT_END analyser on L7 retry
- BUG/MEDIUM: mworker: don't use _getsocks in wait mode
- BUG/MINOR: ssl: Store client SNI in SSL context in case of ClientHello error
- BUG/MAJOR: mux-h1: Don't decrement .curr_len for unsent data
- BUILD: cpuset: fix build issue on macos introduced by previous change
- CI: github actions: clean default step conditions
Ilya Shipitsin [Fri, 7 Jan 2022 15:09:35 +0000 (20:09 +0500)]
CI: github actions: clean default step conditions
step condition "if: ${{ !failure() }}" was added in
2ef4c7c84363f5a9b80a2093df1370514319db28
during my experiments. As Tim Düsterhus mentioned, that condition is the default and may be omitted.
(cherry picked from commit
65eab587a2ce58f2c8b92d03568a33ee7dc45528)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
David CARLIER [Sat, 8 Jan 2022 09:59:38 +0000 (09:59 +0000)]
BUILD: cpuset: fix build issue on macos introduced by previous change
The build on macos was broken by recent commit
df91cbd58 ("MINOR: cpuset:
switch to sched_setaffinity for FreeBSD 14 and above."), let's move the
variable declaration inside the ifdef.
(cherry picked from commit
bb10dad5a8321ecc437ef637385ee3a0e7c09747)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 10 Jan 2022 16:27:51 +0000 (17:27 +0100)]
BUG/MAJOR: mux-h1: Don't decrement .curr_len for unsent data
A regression was introduced by commit
140f1a58 ("BUG/MEDIUM: mux-h1: Fix
splicing by properly detecting end of message"). To detect the end of the
outgoing message when the content-length is announced, we count the amount
of data already sent. But only data really sent must be counted.
If the output buffer is full, we can fail to send data (fully or
partially). In this case, we must take care to only count sent
data. Otherwise we may think too much data were sent and an internal error
may be erroneously reported.
This patch should fix issues #1510 and #1511. It must be backported as far
as 2.4.
(cherry picked from commit
b4eca0e9087762ba7b8c36690001792ac454b7ea)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Fri, 7 Jan 2022 16:12:01 +0000 (17:12 +0100)]
BUG/MINOR: ssl: Store client SNI in SSL context in case of ClientHello error
If an error is raised during the ClientHello callback on the server side
(ssl_sock_switchctx_cbk), the servername callback won't be called and
the client's SNI will not be saved in the SSL context. But since we use
the SSL_get_servername function to return this SNI in the ssl_fc_sni
sample fetch, that means that in case of error, such as an SNI mismatch
with a frontend having the strict-sni option enabled, the sample fetch
would not work (making strict-sni related errors hard to debug).
This patch fixes that by storing the SNI as an ex_data in the SSL
context in case the ClientHello callback returns an error. This way the
sample fetch can fall back to getting the SNI this way. It will still
call the SSL_get_servername function first since it is the proper way
of getting a client's SNI when the handshake succeeded.
In order to avoid memory allocations at runtime in this highly used
function, a new memory pool was created to store those client
SNIs. Its entry size is set to 256 bytes since SNIs can't be longer than
255 characters.
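A simplified sketch of the resulting lookup (index management and the SNI pool are omitted; only standard OpenSSL calls are used, the function name is illustrative):
    #include <openssl/ssl.h>

    static int sni_ex_index; /* obtained once via SSL_get_ex_new_index() */

    /* Return the client's SNI: the normal servername first, then the copy
     * stored as ex_data when the ClientHello callback returned an error. */
    static const char *get_client_sni_sketch(SSL *ssl)
    {
        const char *sni = SSL_get_servername(ssl, TLSEXT_NAMETYPE_host_name);

        if (sni)
            return sni;
        return SSL_get_ex_data(ssl, sni_ex_index);
    }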
This fixes GitHub #1484.
It can be backported in 2.5.
(cherry picked from commit
a996763619de2d3e5883b9e097948a2290279d26)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Fri, 7 Jan 2022 17:19:42 +0000 (18:19 +0100)]
BUG/MEDIUM: mworker: don't use _getsocks in wait mode
Since version 2.5 the master is automatically re-executed in wait-mode
when the config is successfully loaded, putting corner cases of the wait
mode in plain sight.
When using the -x argument and with the right timing, the master will
try to get the FDs again in wait mode even though it's not needed
anymore, which will harm the worker by removing its listeners.
However, if it fails (and it's supposed to, sometimes), the
master will exit with EXIT_FAILURE because it does not have the
MODE_MWORKER flag, but only the MODE_MWORKER_WAIT flag, with the
consequence of killing the workers.
This patch fixes the issue by restricting the use of _getsocks to some
modes.
This patch must be backported to every supported version, even though
the impact should be more limited in versions prior to 2.5.
(cherry picked from commit
f82afbb9cd1515e9fa7f9e6df84904b832226304)
Signed-off-by: William Lallemand <wlallemand@haproxy.org>
Christopher Faulet [Tue, 4 Jan 2022 09:56:03 +0000 (10:56 +0100)]
BUG/MEDIUM: http-ana: Preserve response's FLT_END analyser on L7 retry
When a filter is attached on a stream, the FLT_END analyser must not be
removed from the response channel on L7 retry. It is especially important
because CF_FLT_ANALYZE flag is still set. This means the synchronization
between the two sides when the filter ends can be blocked. Depending on the
timing, this can freeze the stream infinitely or lead to a spinning loop.
Note that the synchronization between the two sides at the end of the
analysis was introduced because the stream was reused in HTTP between two
transactions. But, since the HTX was introduced, a new stream is created for
each transaction. So it is probably possible to remove this step for 2.2 and
higher.
This patch must be backported as far as 2.0.
(cherry picked from commit
7bf46bb9a972c1e9de50b31ce20811f2f59a6849)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Mon, 3 Jan 2022 18:43:41 +0000 (19:43 +0100)]
BUG/MINOR: cli: fix _getsocks with musl libc
In ticket #1413, the transfer of FDs couldn't correctly work on alpine
linux. After a few tests with musl on another distribution it seems to
be a limitation of this libc.
The number of FDs that can be sent per sendmsg was set to 253, which
does not seem to work with musl; decreasing it to 252 seems to work
better, so let's set this value everywhere since it does not have that
much impact.
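For reference, a self-contained sketch of sending a batch of FDs with SCM_RIGHTS under this limit (error handling and the real accompanying payload are omitted; the function name is illustrative):
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define MAX_SEND_FD 252   /* kept below 253 so musl's sendmsg() accepts it */

    /* Send up to MAX_SEND_FD file descriptors over <sock> in one message. */
    static int send_fds_sketch(int sock, const int *fds, int nb_fd)
    {
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char dummy = 0;
        union {
            char buf[CMSG_SPACE(MAX_SEND_FD * sizeof(int))];
            struct cmsghdr align;   /* guarantees proper alignment */
        } u;

        if (nb_fd > MAX_SEND_FD)
            nb_fd = MAX_SEND_FD;    /* the caller loops for the rest */

        memset(&msg, 0, sizeof(msg));
        iov.iov_base = &dummy;
        iov.iov_len  = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = CMSG_SPACE(nb_fd * sizeof(int));

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;
        cmsg->cmsg_len   = CMSG_LEN(nb_fd * sizeof(int));
        memcpy(CMSG_DATA(cmsg), fds, nb_fd * sizeof(int));

        return sendmsg(sock, &msg, 0);
    }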
This must be backported in every maintained version.
(cherry picked from commit
148d7a030156ca927e744013dd6fc32b473c1ef6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
David Carlier [Fri, 31 Dec 2021 08:15:29 +0000 (08:15 +0000)]
BUILD/MINOR: tools: solaris build fix on dladdr.
dladdr takes a mutable address on this platform.
(cherry picked from commit
ae5c42f4d0ce74af7554b9a94c8a3c43286efc35)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Ilya Shipitsin [Sat, 25 Dec 2021 09:01:52 +0000 (14:01 +0500)]
CI: github actions: update OpenSSL to 3.0.1
OpenSSL-3.0.1 was released on 14 Dec 2021, let's switch to it
(cherry picked from commit
874c907a2e5ffae50599b1f368e0488d8eb2fe4f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
David CARLIER [Fri, 31 Dec 2021 05:00:12 +0000 (05:00 +0000)]
BUILD/MINOR: cpuset FreeBSD 14 build fix.
The 14th release started to introduce an API compatibility layer with Linux
for the cpuset part, and doing so irrevocably changes the CPU* macros as well.
(cherry picked from commit
f64504716843f69a9b10582b2d9cce6a7919795a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Thu, 30 Dec 2021 13:57:32 +0000 (14:57 +0100)]
REGTESTS: ssl: update of a crt with server deletion
This test verifies that a certificate is in an "Unused" state once every
server which uses it has been dynamically removed.
(cherry picked from commit
acd546b07cd68537d173dde38342d86fadd706d1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Thu, 30 Dec 2021 13:45:19 +0000 (14:45 +0100)]
BUG/MEDIUM: ssl: free the ckch instance linked to a server
This patch unlinks and frees the ckch instance linked to a server when
this server is freed.
Without it, certificates could be locked in a "Used" state when removing
servers dynamically from the CLI, and a segfault could occur when trying
to dynamically update the certificate after that.
This must be backported as far as 2.4.
(cherry picked from commit
e69563fd8ed1a492ae4547451935908c0802e2c4)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Thu, 30 Dec 2021 10:25:43 +0000 (11:25 +0100)]
BUG/MINOR: ssl: free the fields in srv->ssl_ctx
A lot of frees are missing in ssl_sock_free_srv_ctx(), which could result
in memory leaks when dynamically removing a server via the CLI.
This must be backported to every branch, removing the fields that
do not exist in the previous branches.
(cherry picked from commit
231610ad9ccc2470930f7a728ba710a548677a65)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Ilya Shipitsin [Sat, 25 Dec 2021 08:53:04 +0000 (13:53 +0500)]
CI: Github Actions: do not show VTest failures if build failed
This is mostly a cleanup; the issue is minor. If the build failed, the
VTest execution and the VTest result display were still attempted. This
change skips those steps when the build failed.
(cherry picked from commit
2ef4c7c84363f5a9b80a2093df1370514319db28)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 7 Jan 2022 13:51:56 +0000 (14:51 +0100)]
BUILD: makefile: add -Wno-atomic-alignment to work around clang abusive warning
As reported in github issue #1502, clang, when building for i386, will
try to use CMPXCHG8B-based loops for 64-bit atomic operations, and emits
warnings for all 64-bit operands that are not 64-bit aligned, an alignment
that is *not* required by the ABI, that the compiler itself does not
enforce, and that the intel SDM clearly says is not required on this
32-bit platform for this operation. But this is likely an excessive
outcome of the same code being used in 64-bit for CMPXCHG16B which does
require proper alignment. Firefox already gave up on this one 3 years
ago; let's not waste our time arguing and just silence the warning
instead. It might hide some real bugs in the future, but so far
experience has shown that this is unlikely.
This should be backported to all maintained branches that use 64-bit
atomic ops (e.g. for counters).
Thanks to Brad Smith for reporting it and confirming that shutting the
warning addresses it.
(cherry picked from commit
790169fe69c139e4fb6fc3046985696aa78c0569)
Signed-off-by: Willy Tarreau <w@1wt.eu>
David CARLIER [Thu, 6 Jan 2022 18:53:50 +0000 (18:53 +0000)]
MINOR: cpuset: switch to sched_setaffinity for FreeBSD 14 and above.
Following up previous update on cpuset-t.h. Ultimately, at some point
the cpuset_setaffinity code path could be removed.
(cherry picked from commit
df91cbd5845eb7ccdd2586712cb62585fd55d7c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Dauchy [Wed, 5 Jan 2022 21:53:24 +0000 (22:53 +0100)]
MINOR: proxy: add option idle-close-on-response
Avoid closing idle connections if a soft stop is in progress.
By default, idle connections will be closed during a soft stop. In some
environments, a client talking to the proxy may have prepared some idle
connections in order to send requests later. If there is no proper retry
on write errors, this can result in errors while haproxy is reloading.
Even though a proper implementation should retry on connection/write
errors, this option was introduced to support back compat with haproxy <
v2.4. Indeed before v2.4, we were waiting for a last request to be able
to add a "connection: close" header and advise the client to close the
connection.
In a real life example, this behavior was seen in AWS using the ALB in
front of a haproxy. The end result was ALB sending 502 during haproxy
reloads.
This patch was tested on haproxy v2.4, with a regular reload on the
process, and a constant trend of requests coming in. Before the patch,
we see regular 502 returned to the client; when activating the option,
the 502 disappear.
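Assuming the keyword follows the usual proxy "option" syntax introduced by this patch, enabling it would look like this:
    defaults
        mode http
        # keep idle client connections open during soft-stop so that clients
        # lacking retry-on-write-error logic do not get errors on reload
        option idle-close-on-response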
This patch should help fixing github issue #1506.
In order to unblock some v2.3 to v2.4 migration, this patch should be
backported up to v2.4 branch.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
[wt: minor edits to the doc to mention other options to care about]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a9dd901143385fef3a5113d0bf1681cd0536357a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 28 Dec 2021 14:43:11 +0000 (15:43 +0100)]
MINOR: debug: add support for -dL to dump library names at boot
This is a second help to dump loaded library names late at boot, once
external code has already been initialized. The purpose is to provide
a format that makes it easy to pass to "tar" to produce an archive
containing the executable and the list of dependencies. For example
if haproxy is started as "haproxy -f foo.cfg", a config check only
will suffice to quit before starting, "-q" will be used to disable
undesired output messages, and -dL will be used to dump libraries.
This will result in such a command to trivially produce a tarball
of loaded libraries:
./haproxy -q -c -dL -f foo.cfg | tar -T - -hzcf archive.tgz
(cherry picked from commit
654726db5ab1160ad5dc8d356e2965e69c163dcf)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 28 Dec 2021 08:57:10 +0000 (09:57 +0100)]
MINOR: debug: add ability to dump loaded shared libraries
Many times core dumps reported by users who experience trouble are
difficult to exploit due to missing system libraries. Sometimes,
having just a list of loaded libraries and their respective addresses
can already provide some hints about some problems.
This patch makes a step in that direction by adding a new "show libs"
command that will try to enumerate the list of object files that are
loaded in memory, relying on the dynamic linker for this. It may also
be used to detect that some foreign code embeds other undesired libs
(e.g. some external Lua modules).
At the moment it's only supported on glibc when USE_DL is set, but it's
implemented in a way that ought to make it reasonably easy to be extended
to other platforms.
(cherry picked from commit
6ab7b21a1108c0f187d09196dfc20d40f654e6c9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 28 Dec 2021 14:13:12 +0000 (15:13 +0100)]
MINOR: compat: detect support for dl_iterate_phdr()
We'll use this glibc function to dump loaded libs. It's been
available since glibc-2.2.4, and as it requires dlpi headers defined
in link.h, it implicitly relies on dlfcn, thus we condition it to
USE_DL. Other operating systems or libc might have different
dependencies so let's stick to the bare minimum for now.
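A minimal sketch of what such an enumeration looks like with glibc (the real "show libs" implementation writes to the CLI output buffer rather than stdout):
    #define _GNU_SOURCE
    #include <link.h>
    #include <stdio.h>

    /* Called once per loaded object by the dynamic linker. */
    static int dump_lib_cb(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size; (void)data;
        printf("0x%012lx %s\n", (unsigned long)info->dlpi_addr,
               *info->dlpi_name ? info->dlpi_name : "(main program)");
        return 0; /* keep iterating */
    }

    int main(void)
    {
        return dl_iterate_phdr(dump_lib_cb, NULL);
    }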
(cherry picked from commit
3f3a56c9b07175f342f30c6271df24f783c892c6)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Wed, 29 Dec 2021 17:16:27 +0000 (18:16 +0100)]
REGTESTS: ssl: fix ssl_default_server.vtc
Patch 2c776f1 ("BUG/MEDIUM: ssl: initialize correctly ssl w/
default-server") added tests that are not relevant anymore and broke the
reg-test. Revert them.
(cherry picked from commit
0387632ac0db520402550cf19a96d41f8c1661de)
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 28 Dec 2021 17:47:17 +0000 (18:47 +0100)]
BUG/MEDIUM: ssl: initialize correctly ssl w/ default-server
This bug was introduced by
d817dc73 ("MEDIUM: ssl: Load client
certificates in a ckch for backend servers") in which the creation of
the SSL_CTX for a server was moved to the configuration parser when
using a "crt" keyword instead of being done in ssl_sock_prepare_srv_ctx().
The patch
0498fa40 ("BUG/MINOR: ssl: Default-server configuration ignored by
server") made it worse by setting the same SSL_CTX for every servers
using a default-server. Resulting in any SSL option on a server applied
to every server in its backend.
This patch fixes the issue by reintroducing a string which stores the
path of the certificate inside the server structure, and loading the
certificate in ssl_sock_prepare_srv_ctx() again.
This is a quick fix to backport; a cleaner way can be achieved by always
creating the SSL_CTX in ssl_sock_prepare_srv_ctx() and splitting
properly the ssl_sock_load_srv_cert() function.
This patch fixes issue #1488.
Must be backported as far as 2.4.
(cherry picked from commit
2c776f1c30c85be11c9ba8ca8d9a7d62690d1a32)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Miroslav Zagorac [Mon, 27 Dec 2021 11:44:07 +0000 (12:44 +0100)]
BUILD: opentracing: display warning in case of using OT_USE_VARS at compile time
Please do not set the OT_USE_VARS configuration variable, as the source
will probably not be able to compile! For now, this variable can only
be used for experimental purposes, and is not intended for wider use.
For further clarification, please see commit
4cb2c83f4.
Must be backported to 2.5.
(cherry picked from commit
6c9f7faa62a00a4d028704d7b11e290c83f8a49d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 23 Dec 2021 10:12:13 +0000 (11:12 +0100)]
DEBUG: ssl: make sure we never change a servername on established connections
Since this case was already met previously with commit
655dec81b
("BUG/MINOR: backend: do not set sni on connection reuse"), let's make
sure that we don't change reused connection settings. This could be
generalized to most settings that are only in effect before the handshake
in fact (like set_alpn and a few other ones).
(cherry picked from commit
77bfa66124e8532a4ecbe5025657574bb43b7160)
[wt: it's not enabled in default builds thus is safe to backport; it may
help sort out certain future bug reports in field]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Thierry Fournier [Wed, 15 Dec 2021 18:03:52 +0000 (19:03 +0100)]
DOC: fix misspelled keyword "resolve_retries" in resolvers
"resolve_retries" was spelled "resolve_retires".
(cherry picked from commit
55c40ea17778b1afa3cf8f0f0a2cc42717c9364a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Daniel Jakots [Wed, 8 Dec 2021 01:34:39 +0000 (20:34 -0500)]
BUILD: ssl: unbreak the build with newer libressl
In LibreSSL 3.5.0, BIO is going to become opaque, so haproxy's
compat macros will no longer work. The functions they substitute
have been available since LibreSSL 2.7.0.
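The idea of the fix, sketched here under the version numbers given above (the exact compat header in haproxy may differ): only define the fallback accessors for libraries that predate the native ones.
    #include <openssl/bio.h>

    /* Old LibreSSL exposes the BIO internals; LibreSSL >= 2.7.0 (and
     * OpenSSL >= 1.1.0) provide the accessors natively, which is what must
     * be used once BIO becomes opaque in LibreSSL 3.5.0. */
    #if defined(LIBRESSL_VERSION_NUMBER) && (LIBRESSL_VERSION_NUMBER < 0x2070000fL)
    #define BIO_get_data(b)     ((b)->ptr)
    #define BIO_set_data(b, v)  ((b)->ptr = (v))
    #define BIO_set_init(b, v)  ((b)->init = (v))
    #endif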
(cherry picked from commit
d1a2e2b0d1da0dff726738343fbaed044fb93470)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 26 Nov 2021 16:26:19 +0000 (17:26 +0100)]
BUG/MINOR: mux-h1: Fix splicing for messages with unknown length
Splicing was disabled for messages with an unknown length (no C-L or T-E
header) with no valid reason. Now it is possible to use kernel
splicing for such messages.
This patch should be backported as far as 2.4.
(cherry picked from commit
f5ce320156c1a1233755440dad766e7b2b401c0a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 26 Nov 2021 15:37:55 +0000 (16:37 +0100)]
BUG/MEDIUM: mux-h1: Fix splicing by properly detecting end of message
Since 2.4.4, the splicing support in the H1 multiplexer is buggy because
the end of the message is not properly detected.
On 2.4, when the request is spliced, there is no issue. But when the
response is spliced, the client connection is always closed at the end of
the message. Note the response is still fully sent.
On 2.5 and higher, when the last request on a connection is spliced, a
client abort is reported. For other requests there is no issue. In all
cases, the requests are fully sent. When the response is spliced, the
server connection hangs till the server timeout and a server abort is
reported. The response is fully sent with no delay.
The root cause is the removal of the EOM block: there is no longer an
extra block guaranteeing that the rcv_buf()/snd_buf() callback functions
are called one last time. To fix the issue, we must now detect the end of
the message in the rcv_pipe() and snd_pipe() callback functions. To do
so, we rely on the announced message length to know when the payload is
finished. This works because chunk-encoded messages are not spliced.
This patch must be backported as far as 2.4 after an observation period.
(cherry picked from commit
140f1a585248a7da1fefbb5f20f260dfafb32a9a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 24 Dec 2021 12:38:49 +0000 (13:38 +0100)]
BUG/MEDIUM: peers: properly skip conn_cur from incoming messages
The approach used for skipping conn_cur in commit
db2ab8218 ("MEDIUM:
stick-table: never learn the "conn_cur" value from peers") was wrong,
it only works with simple tables but as soon as frequency counters or
arrays are exchanged after conn_cur, the stream is desynchronized and
incorrect values are read. This is because the fields have a variable
length depending on their types and cannot simply be skipped by a
"continue" statement.
Let's change the approach to make sure we continue to completely parse
these local-only fields, and only drop the value at the moment we're
about to store them, since this is exactly the intent.
A simpler approach could consist in having two sets of stktable_data_ptr()
functions, one for retrieval and one for storage, and to make the store
function return a NULL pointer for local types. For now this doesn't
seem worth the trouble.
This fixes github issue #1497. Thanks to @brenc for the reproducer.
This must be backported to 2.5.
(cherry picked from commit
b4ff6f4ae9267620827f7da9b519f4e1b28b10e9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Fri, 24 Dec 2021 10:27:53 +0000 (11:27 +0100)]
BUG/MEDIUM: backend: fix possible sockaddr leak on redispatch
A subtle change of target address allocation was introduced with commit
68cf3959b ("MINOR: backend: rewrite alloc of stream target address") in
2.4. Prior to this patch, a target address was allocated by function
assign_server_address() only if none was previously allocated. After
the change, the allocation became unconditional. Most of the time it
makes no difference, except when we pass multiple times through
connect_server() with SF_ADDR_SET cleared.
The most obvious fix would be to avoid allocating that address there
when already set, but the root cause is that since introduction of
dynamically allocated addresses, the SF_ADDR_SET flag lies. It can
be cleared during redispatch or during a queue redistribution without
the address being released.
This patch instead gives back all its correct meaning to SF_ADDR_SET
and guarantees that when not set no address is allocated, by freeing
that address at the few places the flag is cleared. The flag could
even be removed so that only the address is checked but that would
require to touch many areas for no benefit.
The easiest way to test it is to send requests to a proxy with l7
retries enabled, which forwards to a server returning 500:
    defaults
        mode http
        timeout client 1s
        timeout server 1s
        timeout connect 1s
        retry-on all-retryable-errors
        retries 1
        option redispatch
    listen proxy
        bind *:5000
        server app 0.0.0.0:5001
    frontend dummy-app
        bind :5001
        http-request return status 500
Issuing "show pools" on the CLI will show that pool "sockaddr" grows
as requests are redispatched, and remains stable with the fix. Even
"ps" will show that the process' RSS grows by ~160B per request.
This fix will need to be backported to 2.4. Note that before 2.5,
there's no strm->si[1].dst, strm->target_addr must be used instead.
This addresses github issue #1499. Special thanks to Daniil Leontiev
for providing a well-documented reproducer.
(cherry picked from commit
266d5405490050adeaf414158f7f4b9bad5298bc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 23 Dec 2021 08:26:30 +0000 (09:26 +0100)]
MINOR: pools: work around possibly slow malloc_trim() during gc
During 2.4-dev, support for malloc_trim() was implemented to ease
release of memory in a stopping process. This was found to be quite
effective and later backported to 2.3.7.
Then it was found that sometimes malloc_trim() could take a huge time
to complete if it was competing with other threads still allocating and
releasing memory, which is why it was decided in 2.5-dev to move
malloc_trim() under the thread isolation that was already in place in
the shared pool version of pool_gc() (this was commit
26ed1835).
However, other instances of pool_gc() that used to call malloc_trim()
were not updated since they were not using thread isolation. Currently
we have two other such instances, one for when there is absolutely no
pool and one for when there are only thread-local pools.
Christian Ruppert reported in GH issue #1490 that he's sometimes seeing
an old process die upon reload when upgrading from 2.3 to 2.4, and
that this happens inside malloc_trim(). The problem is that since
2.4-dev11 with commit
0bae07592 we detect modern libc that provide a
faster thread-aware allocator and do not maintain shared pools anymore.
As such we're using again the simpler pool_gc() implementations that do
not use thread isolation around the malloc_trim() call.
All this code was cleaned up recently and the call moved to a new
function trim_all_pools(). This patch implements explicit thread isolation
inside that function so that callers do not have to care about this
anymore. The thread isolation is conditional so that this doesn't affect
the one already in place in the larger version of pool_gc(). This way it
will solve the problem for all callers.
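A sketch of the resulting trim_all_pools() logic; thread_isolate(), thread_isolated() and thread_release() are haproxy internals whose prototypes are simplified here for illustration, and the real function does more than this:
    #include <malloc.h>

    /* haproxy internals (haproxy/thread.h), declared here only to keep the
     * sketch self-contained; signatures simplified. */
    extern void thread_isolate(void);
    extern void thread_release(void);
    extern int thread_isolated(void);

    static void trim_all_pools_sketch(void)
    {
        int already_isolated = thread_isolated();

        if (!already_isolated)
            thread_isolate();   /* stop all other threads first */

        malloc_trim(0);         /* may take a while, hence the isolation */

        if (!already_isolated)
            thread_release();
    }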
This patch must be backported as far as 2.3. It may possibly require
some adaptations. If trim_all_pools() is not present, copy-pasting the
tests in each version of pool_gc() will have the same effect.
Thanks to Christian for his detailed report and his testing.
(cherry picked from commit
0d93a8186331972d9c60624ed822e7c2c008d1d4)
[wt: dropped the 2.6-specific mallctl() context]
Signed-off-by: Willy Tarreau <w@1wt.eu>