Repositories - haproxy-2.3.git/log

BUG/MAJOR: pools: fix incomplete backport of lockless pool fix

Commit bc76411e0 ("BUG/MAJOR: pools: fix possible race with free() in
the lockless variant") wasn't complete. The __pool_free() function also
needed a part of the lockless variant of the code that was in
pool_put_to_shared_cache() in 2.4. Without it, a __pool_free() during
a contented pool_alloc() may catch free_list == POOL_BUSY and copy it
into the updated free list, so that the next call to __pool_get_first()
loops forever and gets killed by the watchdog.

Sadly it requires load under contention on a non-glibc system to notice
it :-/

This fix is only for 2.3 and 2.2 since 2.2 now also contains the faulty
backport of the patch above.

[RELEASE] Released version 2.3.11

Released version 2.3.11 with the following main changes :
    - BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
    - BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
    - BUG/MINOR: applet: Notify the other side if data were consumed by an applet
    - BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
    - BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
    - BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
    - BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
    - BUG/MEDIUM: peers: reset commitupdate value in new conns
    - BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
    - BUG/MEDIUM: peers: reset tables stage flags stages on new conns
    - MINOR: peers: add informative flags about resync process for debugging
    - REGTESTS: add minimal CLI "add map" tests
    - BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
    - BUG/MEDIUM: cli: prevent memory leak on write errors
    - BUG/MINOR: ssl/cli: fix a lock leak when no memory available
    - MINOR: compat: automatically include malloc.h on glibc
    - MEDIUM: pools: call malloc_trim() from pool_gc()
    - MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS
    - MINOR: debug: add a new "debug dev sym" command in expert mode
    - BUG/MEDIUM: dns: reset file descriptor if send returns an error
    - BUG/MINOR: stream: Decrement server current session counter on L7 retry
    - BUG/MINOR: stream: properly clear the previous error mask on L7 retries
    - BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
    - BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
    - BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
    - BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port
    - MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
    - BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
    - MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
    - MEDIUM: mux-h1: Don't block reads when waiting for the other side
    - BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
    - REGTESTS: Add script to test abortonclose option
    - BUG/MEDIUM: ebtree: Invalid read when looking for dup entry
    - BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
    - BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter
    - BUG/MINOR: http-comp: Preserve HTTP_MSGF_COMPRESSIONG flag on the response
    - BUG/MINOR: http-ana: Handle L7 retries on refused early data before K/A aborts
    - BUG/MINOR: server: Missing calloc return value check in srv_parse_source
    - BUG/MINOR: peers: Missing calloc return value check in peers_register_table
    - BUG/MINOR: ssl: Missing calloc return value check in ssl_init_single_engine
    - BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture
    - BUG/MINOR: proxy: Missing calloc return value check in proxy_parse_declare
    - BUG/MINOR: proxy: Missing calloc return value check in proxy_defproxy_cpy
    - BUG/MINOR: http: Missing calloc return value check while parsing tcp-request/tcp-response
    - BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule
    - BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo
    - BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list
    - BUG/MINOR: http: Missing calloc return value check while parsing redirect rule
    - BUG/MINOR: http: Missing calloc return value check in make_arg_list
    - BUG/MINOR: proxy: Missing calloc return value check in chash_init_server_tree
    - BUG/MINOR: lua/vars: prevent get_var() from allocating a new name
    - DOC/MINOR: move uuid in the configuration to the right alphabetical order
    - BUG/MAJOR: stream-int: Release SI endpoint on server side ASAP on retry
    - DOC: use the req.ssl_sni in examples
    - BUILD: make tune.ssl.keylog available again
    - BUG/MINOR: ssl: OCSP stapling does not work if expire too far in the future
    - BUG/MEDIUM: compression: Add a flag to know the filter is still processing data
    - BUG/MINOR: pools: fix a possible memory leak in the lockless pool_flush()
    - MINOR: pools: do not maintain the lock during pool_flush()
    - MINOR: pools: call malloc_trim() under thread isolation
    - MEDIUM: pools: use a single pool_gc() function for locked and lockless
    - BUG/MAJOR: pools: fix possible race with free() in the lockless variant
    - CLEANUP: pools: remove now unused seq and pool_free_list
    - BUG/MAJOR: htx: Fix htx_defrag() when an HTX block is expanded
    - BUG/MINOR: mux-fcgi: Expose SERVER_SOFTWARE parameter by default
    - DOC: lua: Add a warning about buffers modification in HTTP
    - BUG/MINOR: stick-table: insert srv in used_name tree even with fixed id
    - BUG/MEDIUM: shctx: use at least thread-based locking on USE_PRIVATE_CACHE
    - BUG/MINOR: ssl: use atomic ops to update global shctx stats
    - BUG/MINOR: mworker: fix typo in chroot error message
    - BUG/MAJOR: queue: set SF_ASSIGNED when setting strm->target on dequeue
    - MINOR: backend: only skip LB when there are actual connections
    - BUG/MINOR: stats: make "show stat typed desc" work again
    - MINOR: mux-h2: obey http-ignore-probes during the preface
    - BUG/MEDIUM: dns: send messages on closed/reused fd if fd was detected broken
    - BUILD: cfgparse-ssl: Remove const from defpx param in keylog parsing function
    - BUG/MINOR: resolvers: answser item list was randomly purged or errors
    - MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item
    - MEDIUM: resolvers: add a ref between servers and srv request or used SRV record
    - BUG/MAJOR: resolvers: segfault using server template without SRV RECORDs
    - BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI
    - BUG/MINOR: server/cli: Fix locking in function processing "set server" command
    - BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
    - REGTESTS: fix maxconn update with agent-check
    - MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules
    - DOC: config: Add missing actions in "tcp-request session" documentation
    - BUG/MINOR: checks: return correct error code for srv_parse_agent_check
    - BUG/MINOR: tcpcheck: Fix numbering of implicit HTTP send/expect rules
    - BUG/MEDIUM: sock: make sure to never miss early connection failures
    - BUG/MINOR: cli: fix server name output in "show fd"
    - BUG/MINOR: stick-table: fix several printf sign errors dumping tables
    - DOC: stick-table: add missing documentation about gpt0 stored type
    - DOC: peers: fix the protocol tag name in the doc
    - DOC: config: use CREATE USER for mysql-check
    - BUG/MINOR: server-state: load SRV resolution only if params match the config
    - BUG/MINOR: server: Forbid to set fqdn on the CLI if SRV resolution is enabled
    - MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
    - MINOR: resolvers: Remove server from named_servers tree when removing a SRV item
    - BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status
    - BUG/MINOR: resolvers: Use resolver's lock in resolv_srvrq_expire_task()
    - BUG/MINOR: resolvers: Always attach server on matching record on resolution
    - BUG/MINOR: resolvers: Reset server IP when no ip is found in the response
    - MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()
    - BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution
    - BUG/MINOR: peers: fix data_type bit computation more than 32 data_types
    - Revert "MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules"

Revert "MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules"

This reverts commit 19bbbe05629ea947dd60d5b96d96f0066b047b97.

For now, set-src/set-src-port actions are directly performed on the client
connection. Using these actions at the stream level is really a problem with
HTTP connection (See #90) because all requests are affected by this change
and not only the current request. And it is worse with the H2, because
several requests can set their source address into the same connection at
the same time.

It is already an issue when these actions are called from "http-request"
rules. It is safer to wait a bit before adding the support to "tcp-request
content" rules. The solution is to be able to set src/dst address on the
stream and not on the connection when the action if performed from the L7
level..

Reverting the above commit means the issue #1303 is no longer fixed.

This patch must be backported in all branches containing the above commit
(as far as 2.0 for now).

(cherry picked from commit 23048875a4eacf5d7d4450d677cb077e67778b95)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 390f49477159de53d0506cd52bd6ed323febde0a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: peers: fix data_type bit computation more than 32 data_types

This patch fixes the computation of the bit of the current data_type
in some part of code of peer protocol where the computation is limited
to 32bits whereas the bitfield of data_types can support 64bits.

Without this patch it could result in bugs when we will define more
than 32 data_types.

Backport is useless because there is currently less than 32 data_types

(cherry picked from commit 08b0f6780c45099b8d03bfd9e398d3f51519e667)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 45f12b06b491b433fdb951ba851284e46901a917)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution

The commit 3406766d5 ("MEDIUM: resolvers: add a ref between servers and srv
request or used SRV record") introduced a regression. The first server of a
template based on SRV record is no longer resolved. The same bug exists for
a normal server based on a SRV record.

In fact, the server used during parsing (used as reference when a
server-template line is parsed) is never attached to the corresponding srvrq
object. Thus with following lines, no resolution is performed because
"srvrq->attached_servers" is empty:

server-template test 1 _http.domain.tld resolvers dns ...
server test1 _http.domain.tld resolvers dns ...

This patch should fix the issue #1295 (but not confirmed yet it is the same
bug). It must be backported everywhere the above commit is.

(cherry picked from commit 81ba74ae5061bc55b3460f01a99371281de8f7f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 08736f990ed0387fa9c2340b1520c679fd429921)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()

If resolv_get_ip_from_response() returns an error (or an unexpected return
value), the server is set to RMAINT status. However, its address must also
be reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing. Note that it is a theorical patch because
this code path does not exist. Thus it is not tagged as a BUG.

This patch may be backported as far as 2.0.

(cherry picked from commit 07ecff589d5e67eacfacfe62bccd70ea825b8bc0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit b142fb40a4ce7566ee2ca42fb6b164f9402bbdcd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: resolvers: Reset server IP when no ip is found in the response

For A/AAAA resolution, if no ip is found for a server in the response, the
server is set to RMAINT status. However, its address must also be
reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing.

This patch may be backported as far as 2.0.

(cherry picked from commit a8ce497aacd6871c6056baf5a30e04f021956a5c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 7c129bba960aebec175b3fcf2d0ffa52d584552f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: resolvers: Always attach server on matching record on resolution

On A/AAAA resolution, for a given server, if a record is matching, we must
always attach the server to this record. Before it was only done if the
server IP was not the same than the record one. However, it is a problem if
the server IP was not set for a previous resolution. From the libc during
startup for instance. In this case, the server IP is not updated and the
server is not attached to any record. It remains in this state while a
matching record is found in the DNS response. It is especially a problem
when the resolution is used for server-templates.

This bug was introduced by the commit bd78c912f ("MEDIUM: resolvers: add a
ref on server to the used A/AAAA answer item").

This patch should solve the issue #1305. It must be backported to all
versions containing the above commit.

(cherry picked from commit d7bb23490c1a71542c7e98b7c44d92df081c7e52)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 2558d5aac1147b6e5c028152fc5382492dc6807c)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: resolvers: Use resolver's lock in resolv_srvrq_expire_task()

The commit dcac41806 ("BUG/MEDIUM: resolvers: Add a task on servers to check
SRV resolution status") introduced a type. In resolv_srvrq_expire_task()
function, the resolver's lock must be used instead of the resolver itself.

This patch must be backported with the patch above (at least as far as 2.2).

(cherry picked from commit e886dd5c32aa1bc99acf48c5bb1826f80d96923d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 152b50c38ab7ef1829d9a0115c393f518860314f)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status

When a server relies on a SRV resolution, a task is created to clean it up
(fqdn/port and address) when the SRV resolution is considered as outdated
(based on the resolvers 'timeout' value). It is only possible if the server
inherits outdated info from a state file and is no longer selected to be
attached to a SRV item. Note that most of time, a server is attached to a
SRV item. Thus when the item becomes obsolete, the server is cleaned
up.

It is important to have such task to be sure the server will be free again
to have a chance to be resolved again with fresh information. Of course,
this patch is a workaround to solve a design issue. But there is no other
obvious way to fix it without rewritting all the resolvers part. And it must
be backportable.

This patch relies on following commits:
* MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
* MINOR: resolvers: Remove server from named_servers tree when removing a SRV item

All the series must be backported as far as 2.2 after some observation
period. Backports to 2.0 and 1.8 must be evaluated.

(cherry picked from commit dcac41806239d4f3b2de5e4d1bacf9c03ca99efb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 2c0f527b61837bfcd3c1a86c175778a1d5ba2e2d)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: resolvers: Remove server from named_servers tree when removing a SRV item

When a server is cleaned up because the corresponding SRV item is removed,
we always remove the server from the srvrq's name_servers tree. For now, it
is useless because, if a server was attached to a SRV item, it means it was
already removed from the tree. But it will be mandatory to fix a bug.

(cherry picked from commit 73001ab6e39fb25ff406bbcd3809d0a1fa0131b5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit a62d9743fd4efaab5e08c94229af390e4afcd17a)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: resolvers: Clean server in a dedicated function when removing a SRV item

A dedicated function is now used to clean up servers when a SRV item becomes
obsolete or when a requester is removed from a resolution. This patch is
mandatory to fix a bug.

(cherry picked from commit 11c6c396560107ac892a553bfa0dada222823b1c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 5f8ccff67f98efdf8b3e6cb2c02593e4b4aa5b7e)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server: Forbid to set fqdn on the CLI if SRV resolution is enabled

If a server is configured to rely on a SRV resolution, we must forbid to
change its fqdn on the CLI. Indeed, in this case, the server retrieves its
fqdn from the SRV resolution. If the fqdn is changed via the CLI, this
conflicts with the SRV resolution and leaves the server in an undefined
state. Most of time, the SRV resolution remains enabled with no effect on
the server (no update). Some time the A/AAAA resolution for the new fqdn is
not enabled at all. It depends on the server state and resolver state when
the CLI command is executed.

This patch must be backported as far as 2.0 (maybe to 1.8 too ?) after some
observation period.

(cherry picked from commit a386e7882378a4bd3afdd2915fefe4bcc9fa286d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 5b04f4634308f57d9d03f798aaea0c7667a6d1d7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server-state: load SRV resolution only if params match the config

When the state of a server is loaded, if there is no hostname defined for
this server and if a fqdn and a server record are retrieved from the state
file, it means the server should rely on a SRV resolution. But we must be
sure the server is configured this way. A SRV resolution must be configured
with the same SRV record. This part must be skipped if there is no SRV
resolution configured for this server or if the SRV record used is not the
same.

This patch should be backported as far as 1.8 after some observation period.

(cherry picked from commit 85af93b8c74b5664fc473cf84d59ce6279249216)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit ccb720d3faf0847eefbd1905130fef383efb1e4c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: use CREATE USER for mysql-check

CREATE USER has been the standard way of creating users since
MySQL-5.0 (2005).

The current syntax of INSERT INTO mysql.user won't actually work
on MariaDB-10.4+.

Because haproxy doesn't use any resources the MySQL executable comment
syntax provides resource contraints to make it more palatable
to risk adverse users.

/*!50701 is a syntax recognised by MySQL and MariaDB 5.7.1+ when
resource contraints where added.

/*M!100201 is a MariaDB executable comment syntax recognised for MariaDB
for the 10.2.1 where the MAX_STATEMENT_TIME was added.

This patch may be backported as far as 2.0.

(cherry picked from commit d3e7dc498baeab3535fcaf48f8983138d35442f5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 9dc310dc520fb5fbc9dbd9594f7cfaa1472372fe)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: peers: fix the protocol tag name in the doc

The peers protocol has been using "HAProxyS" as a binary tag sent on the
wire since day one in 1.5-dev3 with commit 2b920a1af ("[MAJOR] Add new
files src/peer.c, include/proto/peers.h and include/types/peers.h for
sync stick table management"), regardless, the doc says the protocol
identifier is "HaproxyS". It is likely this got fixed in the code before
merging and not in the doc.

This should be backported to any release as the doc is wrong.

(cherry picked from commit 84c4e7451eb99261758f3e009d3e938506dd8982)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: stick-table: add missing documentation about gpt0 stored type

The store type 'gpt0' was present in code but was not documented.

The patch fix this and should be backported since 'gpt0' is supported.
[wt: ~1.6-dev4 hence all stable]

(cherry picked from commit 1a6b7254de849765bef34f7b65b92ff63bcea1ee)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a5d1531dc047f7aa3e672411b63e0aca708b109d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stick-table: fix several printf sign errors dumping tables

This patch fixes several errors printing integers
of stick table entry values and args during dump on cli.

This patch should be backported since the dump of entries
is supported. [wt: roughly 1.5-dev1 hence all stable branches]

(cherry picked from commit 01928ae56b0296e5312eca195a4e155d6e8d79d0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit c24b414034d4b7f848ba37ac0a6fe28271375b4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: cli: fix server name output in "show fd"

A server name was displayed as <srv>/<proxy> instead of the reverse.
It only confuses diagnostics. This was introduced by commit 7a4a0ac71
("MINOR: cli: add a new "show fd" command") so this fix can be backport
down to 1.8.

(cherry picked from commit dfb34a8f87cacc1ced48f555ab74b6630ce4bb08)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit ab40792af6b81e45e632b5c14d396aba4efb4cde)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: sock: make sure to never miss early connection failures

As shown in issue #1251, it is possible for a connect() to report an
error directly via the poller without ever reporting send readiness,
but currentlt sock_conn_check() manages to ignore that situation,
leading to high CPU usage as poll() wakes up on these FDs.

The bug was apparently introduced in 1.5-dev22 with commit fd803bb4d
("MEDIUM: connection: add check for readiness in I/O handlers"), but
was likely only woken up by recent changes to conn_fd_handler() that
made use of wakeups instead of direct calls between 1.8 and 1.9,
voiding any chance to catch such errors in the early recv() callback.

The exact sequence that leads to this situation remains obscure though
because the poller does not report send readiness nor does it report an
error. Only HUP and IN are reported on the FD. It is also possible that
some recent kernel updates made this condition appear while it never
used to previously.

This needs to be backported to all stable branches, at least as far
as 2.0. Before 2.2 the code was in tcp_connect_probe() in proto_tcp.c.

(cherry picked from commit 5a9c637bf3f9daf595d5a5cd0e98961d6fdc4b1b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 9547338fb4a4dda6aee428ee40aacf09ea2607b2)
[wt: code is in conn_fd_check() in 2.3]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: tcpcheck: Fix numbering of implicit HTTP send/expect rules

The index of the failing rule is reported in the health-check log message. The
rules index is also used in the check traces. But for implicit HTTP send/expect
rules, the index is wrong. It must be incremented by one compared to the
preceding rule.

This patch may be backported as far as 2.2.

(cherry picked from commit fa5880bd539b9069fd1b139d32cffae2bb3c2c3c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit b3cb3221e5dabc73deec942a3475ef21289853bd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: checks: return correct error code for srv_parse_agent_check

In srv_parse_agent_check the error code is not returned in case
something goes wrong. The value 0 is always return.

Additionally, there's a small cleanup of unreachable returns that in
most checks are not present either and removed in two places they were
present. This makes the code consistent across the different checks.

(cherry picked from commit dfee217b68a241f2723c2a4b67d2af187f0b6690)
[cf: Must be backported as far as 2.2]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 1065e7edb7bd1b536bd3de6531463d16000d4956)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: config: Add missing actions in "tcp-request session" documentation

set-src/set-src-port and set-dst/set-dst-port actions were not listed in the
documentation of "tcp-request session".

This patch may be backported to all stable versions.

(cherry picked from commit 14aec6e8aeee7e31acb05cc546b8631e30cd129b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 57759f3c00c5f782255c74e58df2b45a7d4c02b0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules

If it possible to set source IP/Port from "tcp-request connection",
"tcp-request session" and "http-request" rules but not from "tcp-request
content" rules. There is no reason for this limitation and it may be a
problem for anyone wanting to call a lua fetch to dynamically set source
IP/Port from a TCP proxy. Indeed, to call a lua fetch, we must have a
stream. And there is no stream when "tcp-request connection/session" rules
are evaluated.

Thanks to this patch, "set-src" and "set-src-port" action are now supported
by "tcp_request content" rules.

This patch is related to the issue #1303. It may be backported to all stable
versions.

(cherry picked from commit 19bbbe05629ea947dd60d5b96d96f0066b047b97)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 40f95c49bbbc79f99fe4d1554c092d0764606a46)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: fix maxconn update with agent-check

Correct the typo in the parameter used to update the 'maxconn' via
agent-check. The test is also completed to detect the update of maxconn
using CLI 'show stats'.

(cherry picked from commit e500e593a79e3797eada5a839aeee017b002d7bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c16c6dee710dc1e729c47de7e23d48a627749111)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check

The server_parse_maxconn_change_request locks the server lock. However,
this function can be called via agent-checks or lua code which already
lock it. This bug has been introduced by the following commit :

  commit 79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6
  BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'

This commit tried to fix another deadlock with can occur because
previoulsy server_parse_maxconn_change_request requires the server lock
to be held. However, it may call internally process_srv_queue which also
locks the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.

This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
  server_parse_maxconn_change_request must again be called with the
  server lock.

- to counter the deadlock fixed by the above commit, process_srv_queue
  now takes an argument to render the server locking optional if the
  caller already held it. This is only used by
  server_parse_maxconn_change_request.

The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.

(cherry picked from commit 0274286dd3768c0d5e58588a1cb7e7e710fbc9d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit caaafd055edcbceb3d57ed9672b90900ec52a203)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server/cli: Fix locking in function processing "set server" command

The commit c7b391aed ("BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn
is set from the CLI") introduced 2 bugs. The first one is a typo on the
server's lock label (s/SERVER_UNLOCK/SERVER_LOCK/). The second one is about
the server's lock itself. It must be acquired to execute the "agent-send"
subcommand.

The patch above is marked to be backported as far as 1.8. Thus, this one
must also backported as far 1.8.

BUG/MINOR: server/cli: Don't forget to lock server on agent-send subcommand

(cherry picked from commit 0ba54bb40102787ce875c5d15cb0fb4193f178ec)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 88fca377546267d24ab23cb7f7d2e0eef9d71ee6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI

To perform servers resolution, the resolver's lock is first acquired then
the server's lock when necessary. However, when the fqdn is set via the CLI,
the opposite is performed. So, it is possible to experience an ABBA
deadlock.

To fix this bug, the server's lock is acquired and released for each
subcommand of "set server" with an exception when the fqdn is set. The
resolver's lock is first acquired. Of course, this means we must be sure to
have a resolver to lock.

This patch must be backported as far as 1.8.

(cherry picked from commit c7b391aed21f384ac38b9f2f98239c5f1cdd1895)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 77df12ca694b168326c29efa2580a95427ad644e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: resolvers: segfault using server template without SRV RECORDs

This patch fix the issue adding a test in srvrq before registering
the server on it during server template init.

This was a regression due to commit :
3406766d57fc11478d54a6fa2d048cbfe4524a4e

This should be backported with this previous commit (until 2.0)

(cherry picked from commit caef19e0c7594b00aa341d8984a33ddd0f15113f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 3ecaf53374306519ff4394a7e7eebfe05235b9d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: resolvers: add a ref between servers and srv request or used SRV record

This patch add a ref into servers to register them onto the
record answer item used to set their hostnames.

It also adds a head list into 'srvrq' to register servers free
to be affected to a SRV record.

A head of a tree is also added to srvrq to put servers which
present a hotname in server state file. To re-link them fastly
to the matching record as soon an item present the same name.

This results in better performances on SRV record response
parsing.

This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)

(cherry picked from commit 3406766d57fc11478d54a6fa2d048cbfe4524a4e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 0c4a8a3cfc0e7cf8df28012222f930e55bcb7df4)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item

This patch adds a head list into answer items on servers which use
this record to set their IPs. It makes lookup on duplicated ip faster and
allow to check immediatly if an item is still valid renewing the IP.

This results in better performances on A/AAAA resolutions.

This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)

(cherry picked from commit bd78c912fd73b8506a39c53b6f0b00330f998aca)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit f9ca5d8bc12441c3c21a66b92ea401803a102f2b)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: resolvers: answser item list was randomly purged or errors

In case of SRV records, The answer item list was purged by the
error callback of the first requester which considers the error
could not be safely ignored. It makes this item list unavailable
for subsequent requesters even if they consider the error
could be ignored.

On A resolution or do_resolve action error, the answer items were
never trashed.

This patch re-work the error callbacks and the code to check the return code
If a callback return 1, we consider the error was ignored and
the answer item list must be kept. At the opposite, If all error callbacks
of all requesters of the same resolution returns 0 the list will be purged

This patch should be backported as far as 2.0.

(cherry picked from commit 12ca658dbe5bc3a3f47de06b2c19d5f0ee08941e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 43839d0ec40d8aa57c97955fc2dc6bfc1404d3c0)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUILD: cfgparse-ssl: Remove const from defpx param in keylog parsing function

defpx parameter is const in cfg parsing functions since the 2.4. It is not
const in 2.3 and below. This build issue was introduced by the commit
b32469435c ("BUILD: make tune.ssl.keylog available again").

This patch is 2.3-specific. Thus there is upstream commid ID. And no
backport is needed, except if the commit above is backported.

BUG/MEDIUM: dns: send messages on closed/reused fd if fd was detected broken

This fix complete this incomplete bug fix '(751872e4)':
'BUG/MEDIUM: dns: reset file descriptor if send returns an error

This previous patch detects error on fd and close this one but the code
continues the loop on pending queries and try to send them using the
previously closed fd which could be reused by an other thread.

This patch stop to send queries on this fd and count them on
snd_error counter.

This patch applies on branch 2.3 and all this stuff is already
fixed in newer version by '(d26a6237ad)':
'MEDIUM: resolvers: split resolving and dns message exchange layers'

This patch should be backported on all branches including the previous
fix '(751872e4)':
'BUG/MEDIUM: dns: reset file descriptor if send returns an error'

MINOR: mux-h2: obey http-ignore-probes during the preface

We're seeing some browsers setting up multiple connections and closing
some to just keep one. It looks like they do this in case they'd
negotiate H1. This results in aborted prefaces and log pollution about
bad requests and "PR--" in the status flags.

We already have an option to ignore connections with no data, it's called
http-ignore-probes. But it was not used by the H2 mux. However it totally
makes sense to use it during the preface.

This patch changes this so that connections aborted before sending the
preface can avoid being logged.

This should be backported to 2.4 and 2.3 at least, and probably even
as far as 2.0.

(cherry picked from commit ee4684f65b6ea627e34395d254daf7971d3ed90f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit c23d6d11bc7014b58feed96e91aa55aff3d03a2e)
[wt: s/HA_ATOMIC_INC/HA_ATOMIC_ADD]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stats: make "show stat typed desc" work again

As part of the changes to support per-module stats data in 2.3-dev6
with commit ee63d4bd6 ("MEDIUM: stats: integrate static proxies stats
in new stats"), a small change resulted in the description field to
be replaced by the name field, making it pointless. Let's fix this
back.

This should fix issue #1291. Thanks to Nick Ramirez for reporting this
issue.

This patch can be backported to 2.3.

(cherry picked from commit fc8e438637aa926e681c8b7f3fa061021dfb2201)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b653d2f152a3ab837fea5c3b559e89133031e791)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: backend: only skip LB when there are actual connections

In 2.3, a significant improvement was brought against situations where
the queue was heavily used, because some LB algos were still checked
for no reason before deciding to put the request into the queue. This
was commit 82cd5c13a ("OPTIM: backend: skip LB when we know the backend
is full").

As seen in previous commit ("BUG/MAJOR: queue: set SF_ASSIGNED when
setting strm->target on dequeue") the dequeuing code is extremely
tricky, and the optimization above tends to emphasize transient issues
by making them permanent until the next reload, which is not acceptable
as the code must always be robust against any bad situation.

This commit brings a protection against such a situation by slightly
relaxing the test. Instead of checking that there are pending connections
in the backend queue, it also verifies that the backend's connections are
not solely composed of queued connections, which would then indicate we
are in this situation. This is not rocket science, but at least if the
situation happens, we know that it will unlock by itself once the streams
have left, as new requests will be allowed to reach the servers and to
flush the queue again.

This needs to be backported to 2.4 and 2.3.

(cherry picked from commit f9a7c442f64263476210756e375ff923d7f6cf33)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a7ed7d4cea76cec55023219821d93b1c2ceebc75)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MAJOR: queue: set SF_ASSIGNED when setting strm->target on dequeue

Commit 82cd5c13a ("OPTIM: backend: skip LB when we know the backend is
full") has uncovered a long-burried bug in the dequeing code: when a
server releases a connection, it picks a new one from the proxy's or
its queue. Technically speaking it only picks a pendconn which is a
link between a position in the queue and a stream. It then sets this
pendconn's target to itself, and wakes up the stream's task so that
it can try to connect again.

The stream then goes through the regular connection setup phases,
calls back_try_conn_req() which calls pendconn_dequeue(), which
sets the stream's target to the pendconn's and releases the pendconn.
It then reaches assign_server() which sees no SF_ASSIGNED and calls
assign_server_and_queue() to perform load balancing or queuing. This
one first destroys the stream's target and gets ready to perform load
balancing. At this point we're load-balancing for no reason since we
already knew what server was available. And this is where the commit
above comes into play: the check for the backend's queue above may
detect other connections that arrived in between, and will immediately
return FULL, forcing this request back into the queue. If the server
had a very low maxconn (e.g. 1 due to a long slowstart), it's possible
that this evicted connection was the last one on the server and that
no other one will ever be present to process the queue. Usually a
regularly processed request will still have its own srv_conn that will
be used during stream_free() to dequeue other connections. But if the
server had a down-up cycle, then a call to pendconn_grab_from_px()
may start to dequeue entries which had no srv_conn and which will have
no server slot to offer when they expire, thus maintaining the situation
above forever. Worse, as new requests arrive, there are always some
requests in the queue and the situation feeds on itself.

The correct fix here is to properly set SF_ASSIGNED in pendconn_dequeue()
when the stream's target is assigned (as it's what this flag means), so
as to avoid a load-balancing pass when dequeuing.

Many thanks to Pierre Cheynier for the numerous detailed traces he
provided that helped narrow this problem down.

This could be backported to all stable versions, but in practice only
2.3 and above are really affected since the presence of the commit
above. Given how tricky this code is it's better to limit it to those
versions that really need it.

(cherry picked from commit 7867cebf31403b6d7bab97cb3047e6d77238fcdc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a6cce193896a91c2cb7d0585191f51e1e0fc3fdd)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mworker: fix typo in chroot error message

Since its introduction in 1.8 with commit 095ba4c24 ("MEDIUM: mworker:
replace systemd mode by master worker mode"), it says "cannot chroot1(...)"
which seems to be a leftover of a debug message. It could be backported but
probably nobody will notice.

(cherry picked from commit e34cf2801154eae542191120d5165a963d2cb98c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 46317376f2f3be9c7e7df0c2b996716a5818fa77)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl: use atomic ops to update global shctx stats

The global shctx lookups and misses was updated without using atomic
ops, so the stats available in "show info" are very likely off by a few
units over time. This should be backported as far as 1.8. Versions
without _HA_ATOMIC_INC() can use HA_ATOMIC_ADD(,1).

(cherry picked from commit 4c19e996218f6c205c1716a0b4718f9bced7f893)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 37d69399648e489fea9f93b7e9fae02dfa59acbb)
[wt: s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: shctx: use at least thread-based locking on USE_PRIVATE_CACHE

Since threads were introduced in 1.8, the USE_PRIVATE_CACHE mode of the
shctx was not updated to use locks. Originally it was meant to disable
sharing between processes, so it removes the lock/unlock instructions.
But with threads enabled, it's not possible to work like this anymore.

It's easy to see that once built with private cache and threads enabled,
sending violent SSL traffic to the the process instantly makes it die.
The HTTP cache is very likely affected as well.

This patch addresses this by falling back to our native spinlocks when
USE_PRIVATE_CACHE is used. In practice we could use them also for other
modes and remove all older implementations, but this patch aims at keeping
the changes very low and easy to backport. A new SHCTX_LOCK label was
added to help with debugging, but OTHER_LOCK might be usable as well
for backports.

An even lighter approach for backports may consist in always declaring
the lock (or reusing "waiters"), and calling pl_take_s() for the lock()
and pl_drop_s() for the unlock() operation. This could even be used in
all modes (process and threads), even when thread support is disabled.

Subsequent patches will further clean up this area.

This patch must be backported to all supported versions since 1.8.

(cherry picked from commit 9e467af804b3dfa05eaa4677644f5ff0c0e0619f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit ca36771425fbced2c81de5259da304dc4fa0254c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: stick-table: insert srv in used_name tree even with fixed id

If the server id is fixed in the configuration, it is immediately
inserted in the 'used_server_id' backend tree via srv_parse_id. On
check_config_validity, the dynamic id generation is thus skipped for
fixed-id servers. However, it must nevertheless be inserted in the
'used_server_name' backend tree.

This bug seems to be not noticeable for the user. Indeed, before the
fix, the search in sticking_rule_find_target always returned NULL for
the name, then the fallback search with server id succeeded, so the
persistence is properly applied. However with the fix the fallback
search is not executed anymore, which saves from the locking of
STK_SESS.

This should be backported up to 2.0.

(cherry picked from commit 077c6b8d29ba63a625648ebb73bf151a793f0096)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 50779aefca2b685bf6e2c922d02bfab59195534a)
Signed-off-by: Willy Tarreau <w@1wt.eu>

DOC: lua: Add a warning about buffers modification in HTTP

Since the 1.9, it is forbidden to alter the channel buffer from an HTTP
stream because there is no way to keep the HTTP parser synchronized if the
buffer content is altered. In addition, since the HTX is the only
reprensentation for HTTP messages, the data in HTTP buffers are structured
and cannot be read or updated in a raw fashion.

A warning is triggered when a user tries to alter an HTTP buffer. However,
it was not documented. This patch adds a warning in the lua documentation.

This patch is related to the issue #1287. It may be backported as far as
2.0.

(cherry picked from commit 095303956708a3f87594ae60cd9d7df12d12d905)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 1cda6c8bd2461819cfc7b3c9625b2cb49d808406)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: mux-fcgi: Expose SERVER_SOFTWARE parameter by default

As specified in the RFC3875 (section 4.1.17), this parameter must be set to
the name and version of the information server software making the CGI
request. Thus, it is now added to the default parameters defined by
HAProxy. It is set to the string "HAProxy $version".

This patch should fix the issue #1285 and must be backported as far as 2.2.

(cherry picked from commit 5cd0e528cf2bed6eb1f79582ef89cf1667337e46)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit b2a50298b1f6f46c943b77685142d59d3e69e193)
[wt: minor ctx adjustments]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MAJOR: htx: Fix htx_defrag() when an HTX block is expanded

When an HTX block is expanded, a defragmentation may be performed first to
have enough space to copy the new data. When it happens, the meta data of
the HTX message must take account of the new data length but copied data are
still unchanged at this stage (because we need more space to update the
message content). And here there is a bug because the meta data are updated
by the caller. It means that when the blocks content is copied, the new
length is already set. Thus a block larger than the reality is copied and
data outside the buffer may be accessed, leading to a crash.

To fix this bug, htx_defrag() is updated to use an extra argument with the
new meta data to use for the referenced block. Thus the caller does not need
to update the HTX message by itself. However, it still have to update the
data.

Most of time, the bug will be encountered in the HTTP compression
filter. But, even if it is highly unlikely, in theory it is also possible to
hit it when a HTTP header (or only its value) is replaced or when the
start-line is changed.

This patch must be backported as far as 2.0.

(cherry picked from commit 1cf414b522b465c97e5e87e4e38be93e6f634b32)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 159873a5b380933e69444ae9c42fce6cc116599d)
Signed-off-by: Willy Tarreau <w@1wt.eu>

CLEANUP: pools: remove now unused seq and pool_free_list

These ones were only used by the lockless implementation and are not
needed anymore.

(cherry picked from commit 1526ffe815ef6136eaabb9d1c188fe7e636f2062)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 47245188ace9165555dda4c8712527367de96c2c)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MAJOR: pools: fix possible race with free() in the lockless variant

In GH issue #1275, Fabiano Nunes Parente provided a nicely detailed
report showing reproducible crashes under musl. Musl is one of the libs
coming with a simple allocator for which we prefer to keep the shared
cache. On x86 we have a DWCAS so the lockless implementation is enabled
for such libraries.

And this implementation has had a small race since day one: the allocator
will need to read the first object's <next> pointer to place it into the
free list's head. If another thread picks the same element and immediately
releases it, while both the local and the shared pools are too crowded, it
will be freed to the OS. If the libc's allocator immediately releases it,
the memory area is unmapped and we can have a crash while trying to read
that pointer. However there is no problem as long as the item remains
mapped in memory because whatever value found there will not be placed
into the head since the counter will have changed.

The probability for this to happen is extremely low, but as analyzed by
Fabiano, it increases with the buffer size. On 16 threads it's relatively
easy to reproduce with 2MB buffers above 200k req/s, where it should
happen within the first 20 seconds of traffic usually.

This is a structural issue for which there are two non-trivial solutions:
  - place a read lock in the alloc call and a barrier made of lock/unlock
    in the free() call to force to serialize operations; this will have
    a big performance impact since free() is already one of the contention
    points;

  - change the allocator to use a self-locked head, similar to what is
    done in the MT_LISTS. This requires two memory writes to the head
    instead of a single one, thus the overhead is exactly one memory
    write during alloc and one during free;

This patch implements the second option. A new POOL_DUMMY pointer was
defined for the locked pointer value, allowing to both read and lock it
with a single xchg call. The code was carefully optimized so that the
locked period remains the shortest possible and that bus writes are
avoided as much as possible whenever the lock is held.

Tests show that while a bit slower than the original lockless
implementation on large buffers (2MB), it's 2.6 times faster than both
the no-cache and the locked implementation on such large buffers, and
remains as fast or faster than the all implementations when buffers are
48k or higher. Tests were also run on arm64 with similar results.

Note that this code is not used on modern libcs featuring a fast allocator.

A nice benefit of this change is that since it removes a dependency on
the DWCAS, it will be possible to remove the locked implementation and
replace it with this one, that is then usable on all systems, thus
significantly increasing their performance with large buffers.

Given that lockless pools were introduced in 1.9 (not supported anymore),
this patch will have to be backported as far as 2.0. The code changed
several times in this area and is subject to many ifdefs which will
complicate the backport. What is important is to remove all the DWCAS
code from the shared cache alloc/free lockless code and replace it with
this one. The pool_flush() code is basically the same code as the
allocator, retrieving the whole list at once. If in doubt regarding what
barriers to use in older versions, it's safe to use the generic ones.

This patch depends on the following previous commits:

- MINOR: pools: do not maintain the lock during pool_flush()
- MINOR: pools: call malloc_trim() under thread isolation
- MEDIUM: pools: use a single pool_gc() function for locked and lockless

The last one also removes one occurrence of an unneeded DWCAS in the
code that was incompatible with this fix. The removal of the now unused
seq field will happen in a future patch.

Many thanks to Fabiano for his detailed report, and to Olivier for
his help on this issue.

(cherry picked from commit 2a4523f6f4cba1cb9c899d0527a1bb4c5d5c0f2e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit d0cc3761495b55764e58aff97f7c82a687c5239f)
[wt: backported into the lockless functions only; kept pool_free_area()
and local accounting instead of pool_free_to_os(); tested with random
pool_flush() and heavy pool_gc() calls]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MEDIUM: pools: use a single pool_gc() function for locked and lockless

Locked and lockless shared pools don't need to use a different pool_gc()
function because this function isolates itself during the operation, so
we do not need to rely on DWCAS nor any atomic operation in fact. Let's
just get rid of the lockless one in favor of the simple one. This should
even result in a faster execution.

The ifdefs were slightly moved so that we can have pool_gc() defined
as soon as there are global pools, this avoids duplicating the function.

(cherry picked from commit 9b3ed51371693f70bab07e09cdd34e26e3a46d93)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit a206cf93eb17e2d0f8184b91a0775ada763e7f71)
[wt: for 2.3, this was simplified, only the lockless version was
reworked to work without the DWCAS, very similarly to the one in
2.4 except that accounting still has to be done]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: pools: call malloc_trim() under thread isolation

pool_gc() was adjusted to run under thread isolation by commit c0e2ff202
("MEDIUM: memory: make pool_gc() run under thread isolation") so that the
underlying malloc() and free() don't compete between threads during these
potentially aggressive moments (especially when mmap/munmap are involved).

Commit 88366c292 ("MEDIUM: pools: call malloc_trim() from pool_gc()")
later added a call to malloc_trim() but made it outside of the thread
isolation, which is contrary to the principle explained above. Also it
missed it in the locked version, meaning that those without a lockless
implementation cannot benefit from trimming.

This patch fixes that by calling it before thread_release() in both
places.

(cherry picked from commit 26ed183556f3df9e9bde6f6d47143aa4a18ae463)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 46b515ca55b113827ed3e52937cf7ad154d2f6c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: pools: do not maintain the lock during pool_flush()

The locked version of pool_flush() is absurd, it locks the pool for each
and every element to be released till the end. Not only this is extremely
inefficient, but it may even never finish if other threads spend their
time refilling the pool. The only case where this can happen is during
soft-stop so the risk remains limited, but it should be addressed.

(cherry picked from commit c88914379da35c46d093f6d410b9507355aacd0a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit af8120a53cf6a7a8e244135893195c6b65ccc91a)
[wt: in 2.3 the pool_flush code is different because it still requires
to manipulate ->allocated under the lock, thus the lock covers the
counting and detaching, and the freeing is performed out of the lock]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: pools: fix a possible memory leak in the lockless pool_flush()

The lockless version of pool_flush() had a leftover of the original
version causing the pool's first entry to be set to NULL at the end.
The problem is that it does this outside of any lock and in a non-
atomic way, so that any concurrent alloc+free would result in a lost
object.

The risk is low and the consequence even lower, given that pool_flush()
is only used in pool_destroy() (hence single-threaded) or by stream_free()
during a soft-stop (not the place where most allocations happen), so in
the worst case it could result in valgrind complaining on soft-stop.

The bug was introduced with the first version of the code, in 1.9, so
the fix can be backported to all stable versions.

(cherry picked from commit c239cde26fbf0b07abdf590b77c31154bd65c566)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit caf6555f418767a1298aa7e273609fc1510184db)
[wt: adjusted context: pool->allocated still updated last]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: compression: Add a flag to know the filter is still processing data

Since the commit acfd71b97 ("BUG/MINOR: http-comp: Preserve
HTTP_MSGF_COMPRESSIONG flag on the response"), there is no more flag to know
when the compression ends. This means it is possible to finish the
compression several time if there are trailers.

So, we reintroduce almost the same mechanism but with a dedicated flag. So
now, there is a bits field in the compression filter context.

The commit above is marked to be backported as far as 2.0. Thus this patch
must also be backported as far as 2.0.

(cherry picked from commit 12554d00f6ed16324a0dfbafb5230f8f0136e4ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 734863ecfa1f8a620904fd9332e01673728c784b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: ssl: OCSP stapling does not work if expire too far in the future

The wey the "Next Update" field of the OCSP response is converted into a
timestamp relies on the use of signed integers for the year and month so
if the calculated timestamp happens to overflow INT_MAX, it ends up
being seen as negative and the OCSP response being dwignored in
ssl_sock_ocsp_stapling_cbk (because of the "ocsp->expire < now.tv_sec"
test).

It could be backported to all stable branches.

(cherry picked from commit a3a0cce8ee8c142cd148090854ca8551a36d9bd7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 1adb439f84b3bb3a004736ef1fa88c899cd64f3e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUILD: make tune.ssl.keylog available again

Since commit 04a5a44 ("BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of
OpenSSL versions") the "tune.ssl.keylog" feature is broken because
HAVE_OPENSSL_KEYLOG does not exist.

Replace this by a HAVE_SSL_KEYLOG which is defined in openssl-compat.h.
Also add an error when not built with the right openssl version.

Must be backported as far as 2.3.

(cherry picked from commit 722180aca8757d8807b21cf125a2d68249be5bf8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit ca5cf0a196ef2e7d1a16ecaeda5f983551604a30)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC: use the req.ssl_sni in examples

This patch should be backported to at least 2.0

(cherry picked from commit 5c866200d4ec61e02676596d473a2af88916d45e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 18c8877b34d3e376b9f15299ff92e8d641db30f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: stream-int: Release SI endpoint on server side ASAP on retry

When a connection attempt failed, if a retry is possible, the SI endpoint on
the server side is immediately released, instead of waiting to establish a
new connection to a server. Thus, when the backend SI is switched from
SI_ST_CER state to SI_ST_REQ, SI_ST_ASS or SI_ST_TAR, its endpoint is
released. It is expected because the SI is moved to a state prior to the
connection stage ( < SI_ST_CONN). So it seems logical to not have any server
connection.

It is especially important if the retry is delayed (SI_ST_TAR or
SI_ST_QUE). Because, if the server connection is preserved, any error at the
connection level is unexpectedly relayed to the stream, via the
stream-interface, leading to an infinite loop in process_stream(). if
SI_FL_ERR flag is set on the backend SI in another state than SI_ST_CLO, an
internal goto is performed to resync the stream-interfaces. In addtition,
some ressources are not released ASAP.

This bug is quite old and was reported 1 or 2 times per years since the 2.2
(at least) with not enough information to catch it. It must be backported as
far as 2.2 with a special care because this part has moved several times and
after some observation period and feedback from users to be sure. For info,
in 2.0 and prior, the connection is released when an error is encountered in
SI_ST_CON or SI_ST_RDY states.

(cherry picked from commit f822decfda1f662147c02fd8b4f752da7500f374)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit d909078dec09ea7636bc1b0fce536303f146ed5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

DOC/MINOR: move uuid in the configuration to the right alphabetical order

This patch can be backported up to 2.1 where the uuid fetch was
introduced

(cherry picked from commit 528adc3b18fd610df807f14034acc212eaf685b3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit a429ad3fb0261bbe7ae0bc1cc247e4236739828f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: lua/vars: prevent get_var() from allocating a new name

Variable names are stored into a unified list that helps compare them
just based on a pointer instead of duplicating their name with every
variable. This is convenient for those declared in the configuration
but this started to cause issues with Lua when random names would be
created upon each access, eating lots of memory and CPU for lookups,
hence the work in 2.2 with commit 4e172c93f ("MEDIUM: lua: Add
`ifexist` parameter to `set_var`") to address this.

But there remains a corner case with get_var(), which also allocates
a new variables. After a bit of thinking and discussion, it never
makes sense to allocate a new variable name on get_var():
  - if the name exists, it will be returned ;
  - if it does not exist, then the only way for it to appear will
    be that some code calls set_var() on it
  - a call to get_var() after a careful set_var(ifexist) ruins the
    effort on set_var().

For this reason, this patch addresses this issue by making sure that
get_var() will never cause a variable to be allocated. This is done
by modifying vars_get_by_name() to always call register_name() with
alloc=0, since vars_get_by_name() is exclusively used by Lua and the
new CLI's "get/set var" which also benefit from this protection.

It probably makes sense to backport this as far as 2.2 after some
observation period and feedback from users.

For more context and discussions about the issues this was causing,
see https://www.mail-archive.com/haproxy@formilux.org/msg40451.html
and in issue #664.

(cherry picked from commit 89f6dedf486823237ffbe02f8d78df454332c615)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: proxy: Missing calloc return value check in chash_init_server_tree

A memory allocation failure happening in chash_init_server_tree while
trying to allocate a server's lb_nodes item used in consistent hashing
would have resulted in a crash. This function is only called during
configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 476462010ecc2329e9ecba5afd20f84d9605d809)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit c5c5bc4e36ce4a6f3bc113c8e16824fdb276c220)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http: Missing calloc return value check in make_arg_list

A memory allocation failure happening in make_arg_list when trying to
allocate the argument list would have resulted in a crash. This function
is only called during configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 17acbab0ac5933205788eb0a134f9e918247efef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 6d303d66057957dead2b26e8cd865cd30054902d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http: Missing calloc return value check while parsing redirect rule

A memory allocation failure happening in http_parse_redirect_rule when
trying to allocate a redirect_rule structure would have resulted in a
crash. This function is only called during configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit b6864a5b6fa1ea6eb224710d00c897105b9e2a46)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit e10a8f6b9c9437efe8b78239c92ed18775353d1c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list

A memory allocation failure happening in mworker_env_to_proc_list when
trying to allocate a mworker_proc would have resulted in a crash. This
function is only called during init.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 1f4fa906c73e61dba74f9e8b762da12df3052f57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit d90aa347d2a3a81bae8461c009c9b4d69eb069a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo

A memory allocation failure happening in comp_append_type or
comp_append_algo called while parsing compression options would have
resulted in a crash. These functions are only called during
configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 6443bcc2e1f2e1e11af76ef460d8241f06223de8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit a3a8d50d992ab145e5c353cc36e8a787d3aead57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule

A memory allocation failure happening in tcp_parse_request_rule while
processing the "capture" keyword and trying to allocate a cap_hdr
structure would have resulted in a crash. This function is only called
during configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 8cb033643ff3235ac0d3887167ce06fefeaf850b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 31d170c581e8e4f726d7a54288b969cd38dde4b5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http: Missing calloc return value check while parsing tcp-request/tcp-response

A memory allocation failure happening in tcp_parse_tcp_req or
tcp_parse_tcp_rep when trying to allocate an act_rule structure would
have resulted in a crash. These functions are only called during
configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 2ca42b4656f60655a5295a66f321239faee2d9fe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 8f86a29d20a094fede1b60695754aed3bebabe44)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: proxy: Missing calloc return value check in proxy_defproxy_cpy

A memory allocation failure happening in proxy_defproxy_cpy while
copying the default compression options would have resulted in a crash.
This function is called for every new proxy found while parsing the
configuration.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 18a82ba690a6ff4adbf9702cefa6dc89eb36372d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 4d8e20f83cc07c08c1d562e6afc5b4d3a54f4058)
[Cf: Patch applied on cfgparse-listen.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: proxy: Missing calloc return value check in proxy_parse_declare

A memory allocation failure happening during proxy_parse_declare while
processing the "capture" keyword and allocating a cap_hdr structure
would have resulted in a crash. This function is only called during
configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 55ba0d6865662036d1d137dc4aac3841d29ad6d3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 581d892c823ec18ff7066af696c2017c664959da)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture

A memory allocation failure happening in parse_http_req_capture while
processing a "len" keyword and allocating a cap_hdr structure would
have resulted in a crash. This function is only called during
configuration parsing.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit a4bf8a059dd9c783c00680aedcf055faa4b5d784)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 737d8b5c7bbcc38152b5621715f6f34b94d6fc37)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: ssl: Missing calloc return value check in ssl_init_single_engine

A memory allocation failure happening during ssl_init_single_engine
would have resulted in a crash. This function is only called during
init.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 612b2c37be70636e73698c6c064af9522301be65)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 46697c80fec88e656a4f7280a2cfa9dedbfb33a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: peers: Missing calloc return value check in peers_register_table

A memory allocation failure happening during peers_register_table would
have resulted in a crash. This function is only called during init.

It was raised in GitHub issue #1233.
It could be backported to all stable branches.

(cherry picked from commit 208ff01b23eda19fe41780310f4d08ca86e28766)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit be5b1bb8cb683290f80856a3728ccaf3ad47b44b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: server: Missing calloc return value check in srv_parse_source

Two calloc calls were not checked in the srv_parse_source function.
Considering that this function could be called at runtime through a
dynamic server creation via the CLI, this could lead to an unfortunate
crash.

It was raised in GitHub issue #1233.
It could be backported to all stable branches even though the runtime
crash could only happen on branches where dynamic server creation is
possible.

(cherry picked from commit f1800e64ef2428747e696b0ef2f78ab05a116dcc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit dc187917dfbcb851daf7f8dd78a1ade5331d5693)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-ana: Handle L7 retries on refused early data before K/A aborts

When a network error occurred on the server side, if it is not the first
request (in case of keep-alive), nothing is returned to the client and its
connexion is closed to be sure it may retry. However L7 retries on refused
early data (0rtt-rejected) must be performed first.

In addition, such L7 retries must also be performed before incrementing the
failed responses counter.

This patch must be backported as far as 2.0.

(cherry picked from commit d976923ab27e8aeb6cc6518232b5d2835401b1c8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit b187534982d285f569ec9731f4bce0287232fd25)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http-comp: Preserve HTTP_MSGF_COMPRESSIONG flag on the response

This flag is set on the response when its payload is compressed by HAProxy.
It must be preserved because it may be used when the log message is emitted.

When the compression filter was refactored to support the HTX, an
optimization was added to not perform extra proessing on the trailers.
HTTP_MSGF_COMPRESSIONG flag is removed when the last data block is
compressed. It is not required, it is just an optimization and unfortunately
a bug. This optimization must be removed to preserve the flag.

This patch must be backported as far as 2.0. On the HTX is affected.

(cherry picked from commit acfd71b97aecea49bc3e3fad18d662960b750970)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 73f40e02b055f4fcc92fec9fe12e78034fde0260)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter

For each filter, pre and post callback functions must only be called one
time. To do so, when one of them is finished, the corresponding analyser bit
must be removed from pre_analyzers or post_analyzers bit field. It is only
an issue with pre-analyser callback functions if the corresponding analyser
yields. It may happens with lua action for instance. In this case, the
filters pre analyser callback function is unexpectedly called several times.

This patch should fix the issue #1263. It must be backported is all stable
versions.

(cherry picked from commit a6d3704e38b6c1cb09286e9778b6363825d65c4d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 476013c1213bd053c4c29984fc07c980e71893a2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'

A deadlock is possible with 'set maxconn server' command, if there is
pending connection ready to be dequeued. This is caused by the locking
of server spinlock in both cli_parse_set_maxconn_server and
process_srv_queue.

Fix this by reducing the scope of the server lock into
server_parse_maxconn_change_request. If connection are dequeued, the
lock is taken a second time. This can be seen as suboptimal but as it
happens only during 'set maxconn server' it can be considered as
tolerable.

This issue was reported on the mailing list, for the 1.8.x branch.
It must be backported up to the 1.8.

(cherry picked from commit 79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 3cf814498e8196a9bc5dade38a5de77bd4ee6431)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: ebtree: Invalid read when looking for dup entry

The first item inserted into an ebtree will be inserted directly below
the root, which is a simple struct eb_root which only holds two branch
pointers (left and right).
If we try to find a duplicated entry to this first leaf through a
ebmb_next_dup, our leaf_p pointer will point to the eb_root instead of a
complete eb_node so we cannot look for the bit part of our leaf_p since
it would try to cast our eb_root into an eb_node and perform an out of
bounds access when reading "eb_root_to_node(eb_untag(t,EB_LEFT)))->bit".
This bug was found by address sanitizer running on a CRL hot update VTC
test.

Note that the bug has been there since the import of the eb_next_dup()
and eb_prev_dup() function in 1.5-dev19 by commit 2b5702030 ("MINOR:
ebtree: add new eb_next_dup/eb_prev_dup() functions to visit duplicates").

It can be backported to all stable branches.

(cherry picked from commit 2608e348bec8a533225424fad06760ce6e19f167)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 0179e1d7aa4a15b5325e8e3744ef302a488893d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

REGTESTS: Add script to test abortonclose option

This script test abortonclose option for HTTP/1 client only. It may be
backported as far as 2.0. But on the 2.2 and prior, the syslog part must be
adapted to catch log messages emitted by proxy during HAProxy
startup. Following lines must be added :

    recv
    expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe1 started."
    recv
    expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe2 started."

(cherry picked from commit 16b37510bc1114ab7603240f8bbfeea20e5ee23b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set

On client side, if CO_RFL_KEEP_RECV flags is set when h1_rcv_buf() is
called, we force subscription for reads to be able to catch read0. This way,
the event will be reported to upper layer to let the stream abort the
request.

This patch fixes the abortonclose option for H1 connections. It depends on
following patches :

  * MEDIUM: mux-h1: Don't block reads when waiting for the other side
  * MINOR: conn-stream: Force mux to wait for read events if abortonclose is set

But to be sure the event is handled by the stream, the following patches are
also required :

  * BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
  * MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()

All the series must be backported with caution as far as 2.0, and only after
a period of observation to be sure nothing broke.

(cherry picked from commit 1baef1523d1be1b6edf76c78688af932d4d36d8a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MEDIUM: mux-h1: Don't block reads when waiting for the other side

When we are waiting for the other side to read more data, or to read the
next request, we must only stop the processing of input data and not the
data receipt. This patch don't change anything on the subscribes for
reads. So it should not change anything. The only difference is that the H1
connection will try to read data if it is woken up for an I/O event and if
it was subscribed for reads.

This patch is required to fix abortonclose option for H1 client connections.

(cherry picked from commit ec4207cb68b1d7d50e06d35aaa73586e2c7d46b0)
[Cf: H1C_F_IN_BUSY flag is used instead of H1C_F_WAIT_OUTPUT]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: conn-stream: Force mux to wait for read events if abortonclose is set

When the abortonclose option is enabled, to be sure to be immediately
notified when a shutdown is received from the client, the frontend
conn-stream must be sure the mux will wait for read events. To do so, the
CO_RFL_KEEP_RECV flag is set when mux->rcv_buf() is called. This new flag
instructs the mux to wait for read events, regardless its internal state.

This patch is required to fix abortonclose option for H1 client connections.

(cherry picked from commit d8219b31e7cfe0dec64c33e1186e5f33cf87191a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive

In si_update_rx() function, the reads may be blocked because we explicitly
don't want to read or because of a lack of room in the input buffer. The
first condition is valid. However the second one only test if the channel is
empty or not. It means the reads are blocked if there are still some output
data in the input channel, in its buffer or its pipe. This condition is not
accurate. The reads must not be blocked if the channel can still receive
data. Thus instead of relying on channel_is_empty() function, we now call
channel_may_recv().

This patch is especially useful to be able to catch read0 on client side
when we are waiting for a connection to the server, when abortonclose option
is enabled. Otherwise, the client abort is not detected.

This patch depends on "MINOR: channel: Rely on HTX version if appropriate in
channel_may_recv()". Both must be backported as far as 2.0 after a period of
observation to be sure nothing broke.

(cherry picked from commit e0dec4b7b258101f6d5faa15234103a45c16f0f8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()

When channel_may_recv() is called for an HTX stream, the HTX version,
channel_htx_may_recv() is called. This patch is mandatory to fix a bug
related to the abortonclose option.

(cherry picked from commit 1c235e57d0087e86074178f682c24f8aa44e0fcd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port

Check the return value of url2sa in smp_fetch_url_ip/port. If negative,
the address result is uninitialized and the sample fetch is aborted.
Also, the sockaddr is prelimiary zero'ed before calling url2sa to ensure
that it is not used by upper functions even if the sample returns 0.

Without the check, the value returned by the url_ip/url_port fetches is
unspecified. This can be triggered with the following curl :
$ curl -iv --request-target "xxx://127.0.0.1:20080/" http://127.0.0.1:20080/

This should be backported to all stable branches. However, note that
between the 1.8 and 2.0, the targetted functions have been extracted
from proto_http.c to http_fetch.c.

This should fix in part coverity report from the github issue #1244.

(cherry picked from commit c89d5337ee5064091ceab7b2d967f99071dc39eb)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set

On observe mode, if a server is marked as DOWN, the server's health-check is
rescheduled using the fastinter timeout if the new expiration date is newer
that the current one. But this must only be performed if the fastinter
timeout is defined.

Internally, tick_is_lt() function only checks the date and does not perform any
verification on the provided args. Thus, we must take care of it. However, it is
possible to disable the server health-check by setting its task expiration date
to TICK_ETERNITY.

This patch must be backported as far as 2.2. It is related to

(cherry picked from commit ea860837183c4800c97d6e8e7fb417d8f1ab2ef0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started

A connection may be synchronously established. In the tcpcheck context, it
may be a problem if several connections come one after another. In this
case, there is no event to close the very first connection before starting
the next one. The checks is thus blocked and timed out, a L7 timeout error
is reported.

To fix the bug, when a tcpcheck is started, we immediately evaluate its
state. Most of time, nothing is performed and we must wait. But it is thus
possible to handle the result of a successfull connection.

This patch should fix the issue #1234. It must be backported as far as 2.2.

(cherry picked from commit 92017a3215545c26cd9baad11a1cdfb2859acc6c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: Reset stream final state and si error type on L7 retry

Thanks to a previous fix, the stream error mask is now cleared on L7
retry. But the stream final state (SF_FINST_*) and the stream-interface
error type must also be reset to properly restart a new connection and be
sure to not inherit errors from the previous connection attempt.

In addition, SF_ADDR_SET flag is not systematically removed.
stream_choose_redispatch() already takes care to unset it if necessary. When
the connection is not redispatch, the server address can be preserved.

This patch must be backported as far as 2.0.

(cherry picked from commit 30aa0da532a9853b3ed5b89152fd57537d07ee88)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: properly clear the previous error mask on L7 retries

The cleanup of the previous error was incorrect on L7 retries, it would
OR two values while they're part of an enum, leaving some bits set.
Depending on the errors it was possible to occasionally see an internal
error ("I" flag) being logged.

This should be backported as far as 2.0, though the do_l7_retry() function
in in proto_htx.c in older versions.

(cherry picked from commit 75a4284babf50feecac17f9b32d07509b6c519ed)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MINOR: stream: Decrement server current session counter on L7 retry

When a L7 retry is performed, we must not forget to decrement the current
session counter of the assigned server. Of course, it must only be done if
the current session is already counted on the server, thus if SF_CURR_SESS
flag is set on the stream.

This patch is related to the issue #1003. It must be backported as far as
2.0.

(cherry picked from commit e763c8c99f7134a009117a59e6a6002b3c9e8c84)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: dns: reset file descriptor if send returns an error

The fd is purged if send returns -1 without EAGAIN and the
sock.fd is set to -1 to recreate a new fd on next
sending attempt on this nameserver.

Before this fix we continue to send on the fd returning
an error indefinitely.

This patch applies on branch 2.3 and is fixed in newer versions
by this commit (d26a6237ad):
'MEDIUM: resolvers: split resolving and dns message exchange layers'

This patch should be backported an all supported branches 1.6 <= v <= 2.3

MINOR: debug: add a new "debug dev sym" command in expert mode

This command attempts to resolve a pointer to a symbol name. This is
convenient during development as it's easier to get such pointers live
than by issuing a debugger or calling addr2line.

(cherry picked from commit 48129be18afd17688bced0bba8c0d98e2b85ef39)
[wt: backported as sometimes useful when debugging in field;
s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)/]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS

The purpose of this debugging option was to prevent certain pools from
masking other ones when they were shared. For example, task, http_txn,
h2s, h1s, h1c, session, fcgi_strm, and connection are all 192 bytes and
would normally be mergedi, but not with this option. The problem is that
certain pools are declared multiple times with various parameters, which
are often very close, and due to the way the option works, they're not
shared either. Good examples of this are captures and stick tables. Some
configurations have large numbers of stick-tables of pretty similar types
and it's very common to end up with the following when the option is
enabled:

  $ socat - /tmp/sock1  <<< "show pools" | grep stick
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753800=56
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753880=57
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753900=58
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753980=59
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a00=60
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a80=61
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753b00=62
    - Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753780=55

In addition to not being convenient, it can have important effects on the
memory usage because these pools will not share their entries, so one stick
table cannot allocate from another one's pool.

This patch solves this by going back to the initial goal which was not to
have different pools in the same list. Instead of masking the MAP_F_SHARED
flag, it simply adds a test on the pool's name, and disables pool sharing
if the names differ. This way pools are not shared unless they're of the
same name and size, which doesn't hinder debugging. The same test above
now returns this:

  $ socat - /tmp/sock1  <<< "show pools" | grep stick
    - Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 7 users, @0x3fadb30 [SHARED]
    - Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x3facaa0 [SHARED]

This is much better. This should probably be backported, in order to limit
the side effects of DEBUG_DONT_SHARE_POOLS being enabled in production.

(cherry picked from commit 1ab6c0bfd2b8e178d78955e5a8279fe04ac45da1)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MEDIUM: pools: call malloc_trim() from pool_gc()

If available it definitely makes sense to call it since it's also
called when stopping to reclaim the maximum possible memory.

(cherry picked from commit 88366c2926deac5ee257b6c541633b6a8b5111b3)
[wt: backported as it helps not only on reload but also on SIGQUIT to
try to return all unused memory]
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: compat: automatically include malloc.h on glibc

This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.

(cherry picked from commit 1d3c7003d91a2c3b48f0f1fee3eb361f4121057b)
[wt: CONFIG_HAP_NO_GLOBAL_POOLS was backported via commitc0457dc81
but the optional HA_HAVE_MALLOC_TRIM was never set, and it was
shown to help in some cases]
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: ssl/cli: fix a lock leak when no memory available

This bug was introduced in e5ff4ad ("BUG/MINOR: ssl: fix a trash buffer
leak in some error cases").

When cli_parse_set_cert() returns because alloc_trash_chunk() failed, it
does not unlock the spinlock which can lead to a deadlock later.

Must be backported as far as 2.1 where e5ff4ad was backported.

(cherry picked from commit 5ba80d677d563517bb9754c272e6df94adae281b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MEDIUM: cli: prevent memory leak on write errors

Since the introduction of payload support on the CLI in 1.9-dev1 by
commit abbf60710 ("MEDIUM: cli: Add payload support"), a chunk is
temporarily allocated for the CLI to support defragmenting a payload
passed with a command. However it's only released when passing via
the CLI_ST_END state (i.e. on clean shutdown), but not on errors.
Something as trivial as:

$ while :; do ncat --send-only -U /path/to/cli <<< "show stat"; done

with a few hundreds of servers is enough see the number of allocated
trash chunks go through the roof in "show pools".

This needs to be backported as far as 2.0.

(cherry picked from commit 18b2a9dd874b66acca304580047fa6b3c16da1e3)
Signed-off-by: Willy Tarreau <w@1wt.eu>

BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers

When the lua buffers are used, a variable number of stack slots may be
used. Thus we cannot assume that we know where the top of the stack is. It
was not an issue for lua < 5.4.3 (at least for small buffers). But
'socket:receive()' now fails with lua 5.4.3 because a light userdata is
systematically pushed on the top of the stack when a buffer is initialized.

To fix the bug, in hlua_socket_receive(), we save the index of the top of
the stack before creating the buffer. This way, we can check the number of
arguments, regardless anything was pushed on the stack or not.

Note that the other buffer usages seem to be safe.

This patch should solve the issue #1240. It should be backport to all stable
branches.

(cherry picked from commit c31b2008728babbc8738b88b593f2be40324369b)
Signed-off-by: Willy Tarreau <w@1wt.eu>

REGTESTS: add minimal CLI "add map" tests

The map_redirect test already tests for "show map", "del map" and
"clear map" but doesn't have any "add map" command. Let's add some
trivial ones involving one regular entry and two other ones added as
payload, checking they are properly returned.

(cherry picked from commit deee369cfabdea713e232c94c5d15f9613109fe9)
Signed-off-by: Willy Tarreau <w@1wt.eu>

MINOR: peers: add informative flags about resync process for debugging

This patch adds miscellenous informative flags raised during the initial
full resync process performed during the reload for debugging purpose.

0x00000010: Timeout waiting for a full resync from a local node
0x00000020: Timeout waiting for a full resync from a remote node
0x00000040: Session aborted learning from a local node
0x00000080: Session aborted learning from a remote node
0x00000100: A local node teach us and was fully up to date
0x00000200: A remote node teach us and was fully up to date
0x00000400: A local node teach us but was partially up to date
0x00000800: A remote node teach us but was partially up to date
0x00001000: A local node was assigned for a full resync
0x00002000: A remote node was assigned for a full resync
0x00004000: A resync was explicitly requested

This patch could be backported on any supported branch

(cherry picked from commit ccdfbae62cc5e5023bc857cbc04f680145435a1e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: peers: reset tables stage flags stages on new conns

Flags used as context to know current status of each table pushing a
full resync to a peer were correctly reset receiving a new resync
request or confirmation message but in case of local peer sync during
reload the resync request is implicit and those flags were not
correctly reset in this case.

This could result to a partial initial resync of some tables after reload
if the connection with the old process was broken and retried.

This patch reset those flags at the end of the handshake for all new
connections to be sure to push a entire full resync if needed.

This patch should be backported on all supported branches ( v >= 1.6 )

(cherry picked from commit 1a6b43e13eeb4872dff9944064ddadc1ceb5178f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly

Only entries between the opposite of the last 'local update' rotating
counter were considered to be pushed. This processing worked in most
cases because updates are continually pushed trying to reach this point
but it remains some cases where updates id are more far away in the past
and appearing in futur and the push of updates is stuck until the head
reach again the tail which could take a very long time.

This patch re-work the lookup to consider that all positions on the
rotating counter is considered in the past until we reach exactly
the 'local update' value. Doing this, the updates push won't be stuck
anymore.

This patch should be backported on all supported branches ( >= 1.6 )

(cherry picked from commit 8e7a13ed66aad97c550f3a6600145ed7c757a894)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>