Willy Tarreau [Thu, 8 Jul 2021 13:23:46 +0000 (15:23 +0200)]
BUG/MAJOR: pools: fix incomplete backport of lockless pool fix
Commit
bc76411e0 ("BUG/MAJOR: pools: fix possible race with free() in
the lockless variant") wasn't complete. The __pool_free() function also
needed a part of the lockless variant of the code that was in
pool_put_to_shared_cache() in 2.4. Without it, a __pool_free() during
a contented pool_alloc() may catch free_list == POOL_BUSY and copy it
into the updated free list, so that the next call to __pool_get_first()
loops forever and gets killed by the watchdog.
Sadly it requires load under contention on a non-glibc system to notice
it :-/
This fix is only for 2.3 and 2.2 since 2.2 now also contains the faulty
backport of the patch above.
Christopher Faulet [Wed, 7 Jul 2021 16:26:59 +0000 (18:26 +0200)]
[RELEASE] Released version 2.3.11
Released version 2.3.11 with the following main changes :
- BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
- BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
- BUG/MINOR: applet: Notify the other side if data were consumed by an applet
- BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
- BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
- BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
- BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
- BUG/MEDIUM: peers: reset commitupdate value in new conns
- BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
- BUG/MEDIUM: peers: reset tables stage flags stages on new conns
- MINOR: peers: add informative flags about resync process for debugging
- REGTESTS: add minimal CLI "add map" tests
- BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
- BUG/MEDIUM: cli: prevent memory leak on write errors
- BUG/MINOR: ssl/cli: fix a lock leak when no memory available
- MINOR: compat: automatically include malloc.h on glibc
- MEDIUM: pools: call malloc_trim() from pool_gc()
- MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS
- MINOR: debug: add a new "debug dev sym" command in expert mode
- BUG/MEDIUM: dns: reset file descriptor if send returns an error
- BUG/MINOR: stream: Decrement server current session counter on L7 retry
- BUG/MINOR: stream: properly clear the previous error mask on L7 retries
- BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
- BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
- BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
- BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port
- MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
- BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
- MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
- MEDIUM: mux-h1: Don't block reads when waiting for the other side
- BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
- REGTESTS: Add script to test abortonclose option
- BUG/MEDIUM: ebtree: Invalid read when looking for dup entry
- BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
- BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter
- BUG/MINOR: http-comp: Preserve HTTP_MSGF_COMPRESSIONG flag on the response
- BUG/MINOR: http-ana: Handle L7 retries on refused early data before K/A aborts
- BUG/MINOR: server: Missing calloc return value check in srv_parse_source
- BUG/MINOR: peers: Missing calloc return value check in peers_register_table
- BUG/MINOR: ssl: Missing calloc return value check in ssl_init_single_engine
- BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture
- BUG/MINOR: proxy: Missing calloc return value check in proxy_parse_declare
- BUG/MINOR: proxy: Missing calloc return value check in proxy_defproxy_cpy
- BUG/MINOR: http: Missing calloc return value check while parsing tcp-request/tcp-response
- BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule
- BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo
- BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list
- BUG/MINOR: http: Missing calloc return value check while parsing redirect rule
- BUG/MINOR: http: Missing calloc return value check in make_arg_list
- BUG/MINOR: proxy: Missing calloc return value check in chash_init_server_tree
- BUG/MINOR: lua/vars: prevent get_var() from allocating a new name
- DOC/MINOR: move uuid in the configuration to the right alphabetical order
- BUG/MAJOR: stream-int: Release SI endpoint on server side ASAP on retry
- DOC: use the req.ssl_sni in examples
- BUILD: make tune.ssl.keylog available again
- BUG/MINOR: ssl: OCSP stapling does not work if expire too far in the future
- BUG/MEDIUM: compression: Add a flag to know the filter is still processing data
- BUG/MINOR: pools: fix a possible memory leak in the lockless pool_flush()
- MINOR: pools: do not maintain the lock during pool_flush()
- MINOR: pools: call malloc_trim() under thread isolation
- MEDIUM: pools: use a single pool_gc() function for locked and lockless
- BUG/MAJOR: pools: fix possible race with free() in the lockless variant
- CLEANUP: pools: remove now unused seq and pool_free_list
- BUG/MAJOR: htx: Fix htx_defrag() when an HTX block is expanded
- BUG/MINOR: mux-fcgi: Expose SERVER_SOFTWARE parameter by default
- DOC: lua: Add a warning about buffers modification in HTTP
- BUG/MINOR: stick-table: insert srv in used_name tree even with fixed id
- BUG/MEDIUM: shctx: use at least thread-based locking on USE_PRIVATE_CACHE
- BUG/MINOR: ssl: use atomic ops to update global shctx stats
- BUG/MINOR: mworker: fix typo in chroot error message
- BUG/MAJOR: queue: set SF_ASSIGNED when setting strm->target on dequeue
- MINOR: backend: only skip LB when there are actual connections
- BUG/MINOR: stats: make "show stat typed desc" work again
- MINOR: mux-h2: obey http-ignore-probes during the preface
- BUG/MEDIUM: dns: send messages on closed/reused fd if fd was detected broken
- BUILD: cfgparse-ssl: Remove const from defpx param in keylog parsing function
- BUG/MINOR: resolvers: answser item list was randomly purged or errors
- MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item
- MEDIUM: resolvers: add a ref between servers and srv request or used SRV record
- BUG/MAJOR: resolvers: segfault using server template without SRV RECORDs
- BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI
- BUG/MINOR: server/cli: Fix locking in function processing "set server" command
- BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
- REGTESTS: fix maxconn update with agent-check
- MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules
- DOC: config: Add missing actions in "tcp-request session" documentation
- BUG/MINOR: checks: return correct error code for srv_parse_agent_check
- BUG/MINOR: tcpcheck: Fix numbering of implicit HTTP send/expect rules
- BUG/MEDIUM: sock: make sure to never miss early connection failures
- BUG/MINOR: cli: fix server name output in "show fd"
- BUG/MINOR: stick-table: fix several printf sign errors dumping tables
- DOC: stick-table: add missing documentation about gpt0 stored type
- DOC: peers: fix the protocol tag name in the doc
- DOC: config: use CREATE USER for mysql-check
- BUG/MINOR: server-state: load SRV resolution only if params match the config
- BUG/MINOR: server: Forbid to set fqdn on the CLI if SRV resolution is enabled
- MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
- MINOR: resolvers: Remove server from named_servers tree when removing a SRV item
- BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status
- BUG/MINOR: resolvers: Use resolver's lock in resolv_srvrq_expire_task()
- BUG/MINOR: resolvers: Always attach server on matching record on resolution
- BUG/MINOR: resolvers: Reset server IP when no ip is found in the response
- MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()
- BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution
- BUG/MINOR: peers: fix data_type bit computation more than 32 data_types
- Revert "MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules"
Christopher Faulet [Tue, 6 Jul 2021 09:25:36 +0000 (11:25 +0200)]
Revert "MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules"
This reverts commit
19bbbe05629ea947dd60d5b96d96f0066b047b97.
For now, set-src/set-src-port actions are directly performed on the client
connection. Using these actions at the stream level is really a problem with
HTTP connection (See #90) because all requests are affected by this change
and not only the current request. And it is worse with the H2, because
several requests can set their source address into the same connection at
the same time.
It is already an issue when these actions are called from "http-request"
rules. It is safer to wait a bit before adding the support to "tcp-request
content" rules. The solution is to be able to set src/dst address on the
stream and not on the connection when the action if performed from the L7
level..
Reverting the above commit means the issue #1303 is no longer fixed.
This patch must be backported in all branches containing the above commit
(as far as 2.0 for now).
(cherry picked from commit
23048875a4eacf5d7d4450d677cb077e67778b95)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
390f49477159de53d0506cd52bd6ed323febde0a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Thu, 1 Jul 2021 16:54:05 +0000 (18:54 +0200)]
BUG/MINOR: peers: fix data_type bit computation more than 32 data_types
This patch fixes the computation of the bit of the current data_type
in some part of code of peer protocol where the computation is limited
to 32bits whereas the bitfield of data_types can support 64bits.
Without this patch it could result in bugs when we will define more
than 32 data_types.
Backport is useless because there is currently less than 32 data_types
(cherry picked from commit
08b0f6780c45099b8d03bfd9e398d3f51519e667)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
45f12b06b491b433fdb951ba851284e46901a917)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 29 Jun 2021 18:52:35 +0000 (20:52 +0200)]
BUG/MEDIUM: resolvers: Make 1st server of a template take part to SRV resolution
The commit
3406766d5 ("MEDIUM: resolvers: add a ref between servers and srv
request or used SRV record") introduced a regression. The first server of a
template based on SRV record is no longer resolved. The same bug exists for
a normal server based on a SRV record.
In fact, the server used during parsing (used as reference when a
server-template line is parsed) is never attached to the corresponding srvrq
object. Thus with following lines, no resolution is performed because
"srvrq->attached_servers" is empty:
server-template test 1 _http.domain.tld resolvers dns ...
server test1 _http.domain.tld resolvers dns ...
This patch should fix the issue #1295 (but not confirmed yet it is the same
bug). It must be backported everywhere the above commit is.
(cherry picked from commit
81ba74ae5061bc55b3460f01a99371281de8f7f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
08736f990ed0387fa9c2340b1520c679fd429921)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 24 Jun 2021 13:33:52 +0000 (15:33 +0200)]
MINOR: resolvers: Reset server IP on error in resolv_get_ip_from_response()
If resolv_get_ip_from_response() returns an error (or an unexpected return
value), the server is set to RMAINT status. However, its address must also
be reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing. Note that it is a theorical patch because
this code path does not exist. Thus it is not tagged as a BUG.
This patch may be backported as far as 2.0.
(cherry picked from commit
07ecff589d5e67eacfacfe62bccd70ea825b8bc0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
b142fb40a4ce7566ee2ca42fb6b164f9402bbdcd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 24 Jun 2021 13:26:03 +0000 (15:26 +0200)]
BUG/MINOR: resolvers: Reset server IP when no ip is found in the response
For A/AAAA resolution, if no ip is found for a server in the response, the
server is set to RMAINT status. However, its address must also be
reset. Otherwise, it is still reported by the cli on "show servers state"
commands. This may be confusing.
This patch may be backported as far as 2.0.
(cherry picked from commit
a8ce497aacd6871c6056baf5a30e04f021956a5c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
7c129bba960aebec175b3fcf2d0ffa52d584552f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 24 Jun 2021 13:16:48 +0000 (15:16 +0200)]
BUG/MINOR: resolvers: Always attach server on matching record on resolution
On A/AAAA resolution, for a given server, if a record is matching, we must
always attach the server to this record. Before it was only done if the
server IP was not the same than the record one. However, it is a problem if
the server IP was not set for a previous resolution. From the libc during
startup for instance. In this case, the server IP is not updated and the
server is not attached to any record. It remains in this state while a
matching record is found in the DNS response. It is especially a problem
when the resolution is used for server-templates.
This bug was introduced by the commit
bd78c912f ("MEDIUM: resolvers: add a
ref on server to the used A/AAAA answer item").
This patch should solve the issue #1305. It must be backported to all
versions containing the above commit.
(cherry picked from commit
d7bb23490c1a71542c7e98b7c44d92df081c7e52)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
2558d5aac1147b6e5c028152fc5382492dc6807c)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 18 Jun 2021 07:05:49 +0000 (09:05 +0200)]
BUG/MINOR: resolvers: Use resolver's lock in resolv_srvrq_expire_task()
The commit
dcac41806 ("BUG/MEDIUM: resolvers: Add a task on servers to check
SRV resolution status") introduced a type. In resolv_srvrq_expire_task()
function, the resolver's lock must be used instead of the resolver itself.
This patch must be backported with the patch above (at least as far as 2.2).
(cherry picked from commit
e886dd5c32aa1bc99acf48c5bb1826f80d96923d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
152b50c38ab7ef1829d9a0115c393f518860314f)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 15 Jun 2021 14:17:17 +0000 (16:17 +0200)]
BUG/MEDIUM: resolvers: Add a task on servers to check SRV resolution status
When a server relies on a SRV resolution, a task is created to clean it up
(fqdn/port and address) when the SRV resolution is considered as outdated
(based on the resolvers 'timeout' value). It is only possible if the server
inherits outdated info from a state file and is no longer selected to be
attached to a SRV item. Note that most of time, a server is attached to a
SRV item. Thus when the item becomes obsolete, the server is cleaned
up.
It is important to have such task to be sure the server will be free again
to have a chance to be resolved again with fresh information. Of course,
this patch is a workaround to solve a design issue. But there is no other
obvious way to fix it without rewritting all the resolvers part. And it must
be backportable.
This patch relies on following commits:
* MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
* MINOR: resolvers: Remove server from named_servers tree when removing a SRV item
All the series must be backported as far as 2.2 after some observation
period. Backports to 2.0 and 1.8 must be evaluated.
(cherry picked from commit
dcac41806239d4f3b2de5e4d1bacf9c03ca99efb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
2c0f527b61837bfcd3c1a86c175778a1d5ba2e2d)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 15 Jun 2021 14:14:37 +0000 (16:14 +0200)]
MINOR: resolvers: Remove server from named_servers tree when removing a SRV item
When a server is cleaned up because the corresponding SRV item is removed,
we always remove the server from the srvrq's name_servers tree. For now, it
is useless because, if a server was attached to a SRV item, it means it was
already removed from the tree. But it will be mandatory to fix a bug.
(cherry picked from commit
73001ab6e39fb25ff406bbcd3809d0a1fa0131b5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
a62d9743fd4efaab5e08c94229af390e4afcd17a)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 15 Jun 2021 14:08:48 +0000 (16:08 +0200)]
MINOR: resolvers: Clean server in a dedicated function when removing a SRV item
A dedicated function is now used to clean up servers when a SRV item becomes
obsolete or when a requester is removed from a resolution. This patch is
mandatory to fix a bug.
(cherry picked from commit
11c6c396560107ac892a553bfa0dada222823b1c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
5f8ccff67f98efdf8b3e6cb2c02593e4b4aa5b7e)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 15 Jun 2021 09:37:40 +0000 (11:37 +0200)]
BUG/MINOR: server: Forbid to set fqdn on the CLI if SRV resolution is enabled
If a server is configured to rely on a SRV resolution, we must forbid to
change its fqdn on the CLI. Indeed, in this case, the server retrieves its
fqdn from the SRV resolution. If the fqdn is changed via the CLI, this
conflicts with the SRV resolution and leaves the server in an undefined
state. Most of time, the SRV resolution remains enabled with no effect on
the server (no update). Some time the A/AAAA resolution for the new fqdn is
not enabled at all. It depends on the server state and resolver state when
the CLI command is executed.
This patch must be backported as far as 2.0 (maybe to 1.8 too ?) after some
observation period.
(cherry picked from commit
a386e7882378a4bd3afdd2915fefe4bcc9fa286d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
5b04f4634308f57d9d03f798aaea0c7667a6d1d7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 10 Jun 2021 14:59:53 +0000 (16:59 +0200)]
BUG/MINOR: server-state: load SRV resolution only if params match the config
When the state of a server is loaded, if there is no hostname defined for
this server and if a fqdn and a server record are retrieved from the state
file, it means the server should rely on a SRV resolution. But we must be
sure the server is configured this way. A SRV resolution must be configured
with the same SRV record. This part must be skipped if there is no SRV
resolution configured for this server or if the SRV record used is not the
same.
This patch should be backported as far as 1.8 after some observation period.
(cherry picked from commit
85af93b8c74b5664fc473cf84d59ce6279249216)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
ccb720d3faf0847eefbd1905130fef383efb1e4c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Daniel Black [Thu, 1 Jul 2021 02:09:32 +0000 (12:09 +1000)]
DOC: config: use CREATE USER for mysql-check
CREATE USER has been the standard way of creating users since
MySQL-5.0 (2005).
The current syntax of INSERT INTO mysql.user won't actually work
on MariaDB-10.4+.
Because haproxy doesn't use any resources the MySQL executable comment
syntax provides resource contraints to make it more palatable
to risk adverse users.
/*!50701 is a syntax recognised by MySQL and MariaDB 5.7.1+ when
resource contraints where added.
/*M!100201 is a MariaDB executable comment syntax recognised for MariaDB
for the 10.2.1 where the MAX_STATEMENT_TIME was added.
This patch may be backported as far as 2.0.
(cherry picked from commit
d3e7dc498baeab3535fcaf48f8983138d35442f5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
9dc310dc520fb5fbc9dbd9594f7cfaa1472372fe)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Sun, 9 May 2021 04:38:07 +0000 (06:38 +0200)]
DOC: peers: fix the protocol tag name in the doc
The peers protocol has been using "HAProxyS" as a binary tag sent on the
wire since day one in 1.5-dev3 with commit
2b920a1af ("[MAJOR] Add new
files src/peer.c, include/proto/peers.h and include/types/peers.h for
sync stick table management"), regardless, the doc says the protocol
identifier is "HaproxyS". It is likely this got fixed in the code before
merging and not in the doc.
This should be backported to any release as the doc is wrong.
(cherry picked from commit
84c4e7451eb99261758f3e009d3e938506dd8982)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Emeric Brun [Thu, 1 Jul 2021 16:34:48 +0000 (18:34 +0200)]
DOC: stick-table: add missing documentation about gpt0 stored type
The store type 'gpt0' was present in code but was not documented.
The patch fix this and should be backported since 'gpt0' is supported.
[wt: ~1.6-dev4 hence all stable]
(cherry picked from commit
1a6b7254de849765bef34f7b65b92ff63bcea1ee)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a5d1531dc047f7aa3e672411b63e0aca708b109d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Emeric Brun [Wed, 30 Jun 2021 14:24:04 +0000 (16:24 +0200)]
BUG/MINOR: stick-table: fix several printf sign errors dumping tables
This patch fixes several errors printing integers
of stick table entry values and args during dump on cli.
This patch should be backported since the dump of entries
is supported. [wt: roughly 1.5-dev1 hence all stable branches]
(cherry picked from commit
01928ae56b0296e5312eca195a4e155d6e8d79d0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
c24b414034d4b7f848ba37ac0a6fe28271375b4b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 6 Jul 2021 09:41:10 +0000 (11:41 +0200)]
BUG/MINOR: cli: fix server name output in "show fd"
A server name was displayed as <srv>/<proxy> instead of the reverse.
It only confuses diagnostics. This was introduced by commit
7a4a0ac71
("MINOR: cli: add a new "show fd" command") so this fix can be backport
down to 1.8.
(cherry picked from commit
dfb34a8f87cacc1ced48f555ab74b6630ce4bb08)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
ab40792af6b81e45e632b5c14d396aba4efb4cde)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 6 Jul 2021 06:29:20 +0000 (08:29 +0200)]
BUG/MEDIUM: sock: make sure to never miss early connection failures
As shown in issue #1251, it is possible for a connect() to report an
error directly via the poller without ever reporting send readiness,
but currentlt sock_conn_check() manages to ignore that situation,
leading to high CPU usage as poll() wakes up on these FDs.
The bug was apparently introduced in 1.5-dev22 with commit
fd803bb4d
("MEDIUM: connection: add check for readiness in I/O handlers"), but
was likely only woken up by recent changes to conn_fd_handler() that
made use of wakeups instead of direct calls between 1.8 and 1.9,
voiding any chance to catch such errors in the early recv() callback.
The exact sequence that leads to this situation remains obscure though
because the poller does not report send readiness nor does it report an
error. Only HUP and IN are reported on the FD. It is also possible that
some recent kernel updates made this condition appear while it never
used to previously.
This needs to be backported to all stable branches, at least as far
as 2.0. Before 2.2 the code was in tcp_connect_probe() in proto_tcp.c.
(cherry picked from commit
5a9c637bf3f9daf595d5a5cd0e98961d6fdc4b1b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
9547338fb4a4dda6aee428ee40aacf09ea2607b2)
[wt: code is in conn_fd_check() in 2.3]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 25 Jun 2021 09:37:45 +0000 (11:37 +0200)]
BUG/MINOR: tcpcheck: Fix numbering of implicit HTTP send/expect rules
The index of the failing rule is reported in the health-check log message. The
rules index is also used in the check traces. But for implicit HTTP send/expect
rules, the index is wrong. It must be incremented by one compared to the
preceding rule.
This patch may be backported as far as 2.2.
(cherry picked from commit
fa5880bd539b9069fd1b139d32cffae2bb3c2c3c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
b3cb3221e5dabc73deec942a3475ef21289853bd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Dirkjan Bussink [Fri, 18 Jun 2021 19:57:49 +0000 (19:57 +0000)]
BUG/MINOR: checks: return correct error code for srv_parse_agent_check
In srv_parse_agent_check the error code is not returned in case
something goes wrong. The value 0 is always return.
Additionally, there's a small cleanup of unreachable returns that in
most checks are not present either and removed in two places they were
present. This makes the code consistent across the different checks.
(cherry picked from commit
dfee217b68a241f2723c2a4b67d2af187f0b6690)
[cf: Must be backported as far as 2.2]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
1065e7edb7bd1b536bd3de6531463d16000d4956)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 23 Jun 2021 10:19:25 +0000 (12:19 +0200)]
DOC: config: Add missing actions in "tcp-request session" documentation
set-src/set-src-port and set-dst/set-dst-port actions were not listed in the
documentation of "tcp-request session".
This patch may be backported to all stable versions.
(cherry picked from commit
14aec6e8aeee7e31acb05cc546b8631e30cd129b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
57759f3c00c5f782255c74e58df2b45a7d4c02b0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 23 Jun 2021 10:07:21 +0000 (12:07 +0200)]
MINOR: tcp-act: Add set-src/set-src-port for "tcp-request content" rules
If it possible to set source IP/Port from "tcp-request connection",
"tcp-request session" and "http-request" rules but not from "tcp-request
content" rules. There is no reason for this limitation and it may be a
problem for anyone wanting to call a lua fetch to dynamically set source
IP/Port from a TCP proxy. Indeed, to call a lua fetch, we must have a
stream. And there is no stream when "tcp-request connection/session" rules
are evaluated.
Thanks to this patch, "set-src" and "set-src-port" action are now supported
by "tcp_request content" rules.
This patch is related to the issue #1303. It may be backported to all stable
versions.
(cherry picked from commit
19bbbe05629ea947dd60d5b96d96f0066b047b97)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
40f95c49bbbc79f99fe4d1554c092d0764606a46)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Tue, 22 Jun 2021 14:23:11 +0000 (16:23 +0200)]
REGTESTS: fix maxconn update with agent-check
Correct the typo in the parameter used to update the 'maxconn' via
agent-check. The test is also completed to detect the update of maxconn
using CLI 'show stats'.
(cherry picked from commit
e500e593a79e3797eada5a839aeee017b002d7bf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
c16c6dee710dc1e729c47de7e23d48a627749111)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Fri, 18 Jun 2021 09:11:36 +0000 (11:11 +0200)]
BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
The server_parse_maxconn_change_request locks the server lock. However,
this function can be called via agent-checks or lua code which already
lock it. This bug has been introduced by the following commit :
commit
79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6
BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
This commit tried to fix another deadlock with can occur because
previoulsy server_parse_maxconn_change_request requires the server lock
to be held. However, it may call internally process_srv_queue which also
locks the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.
This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
server_parse_maxconn_change_request must again be called with the
server lock.
- to counter the deadlock fixed by the above commit, process_srv_queue
now takes an argument to render the server locking optional if the
caller already held it. This is only used by
server_parse_maxconn_change_request.
The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.
(cherry picked from commit
0274286dd3768c0d5e58588a1cb7e7e710fbc9d4)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
caaafd055edcbceb3d57ed9672b90900ec52a203)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 18 Jun 2021 06:47:14 +0000 (08:47 +0200)]
BUG/MINOR: server/cli: Fix locking in function processing "set server" command
The commit
c7b391aed ("BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn
is set from the CLI") introduced 2 bugs. The first one is a typo on the
server's lock label (s/SERVER_UNLOCK/SERVER_LOCK/). The second one is about
the server's lock itself. It must be acquired to execute the "agent-send"
subcommand.
The patch above is marked to be backported as far as 1.8. Thus, this one
must also backported as far 1.8.
BUG/MINOR: server/cli: Don't forget to lock server on agent-send subcommand
(cherry picked from commit
0ba54bb40102787ce875c5d15cb0fb4193f178ec)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
88fca377546267d24ab23cb7f7d2e0eef9d71ee6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 15 Jun 2021 10:01:29 +0000 (12:01 +0200)]
BUG/MEDIUM: server/cli: Fix ABBA deadlock when fqdn is set from the CLI
To perform servers resolution, the resolver's lock is first acquired then
the server's lock when necessary. However, when the fqdn is set via the CLI,
the opposite is performed. So, it is possible to experience an ABBA
deadlock.
To fix this bug, the server's lock is acquired and released for each
subcommand of "set server" with an exception when the fqdn is set. The
resolver's lock is first acquired. Of course, this means we must be sure to
have a resolver to lock.
This patch must be backported as far as 1.8.
(cherry picked from commit
c7b391aed21f384ac38b9f2f98239c5f1cdd1895)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
77df12ca694b168326c29efa2580a95427ad644e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Mon, 14 Jun 2021 08:02:18 +0000 (10:02 +0200)]
BUG/MAJOR: resolvers: segfault using server template without SRV RECORDs
This patch fix the issue adding a test in srvrq before registering
the server on it during server template init.
This was a regression due to commit :
3406766d57fc11478d54a6fa2d048cbfe4524a4e
This should be backported with this previous commit (until 2.0)
(cherry picked from commit
caef19e0c7594b00aa341d8984a33ddd0f15113f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
3ecaf53374306519ff4394a7e7eebfe05235b9d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Fri, 11 Jun 2021 08:48:45 +0000 (10:48 +0200)]
MEDIUM: resolvers: add a ref between servers and srv request or used SRV record
This patch add a ref into servers to register them onto the
record answer item used to set their hostnames.
It also adds a head list into 'srvrq' to register servers free
to be affected to a SRV record.
A head of a tree is also added to srvrq to put servers which
present a hotname in server state file. To re-link them fastly
to the matching record as soon an item present the same name.
This results in better performances on SRV record response
parsing.
This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
(cherry picked from commit
3406766d57fc11478d54a6fa2d048cbfe4524a4e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
0c4a8a3cfc0e7cf8df28012222f930e55bcb7df4)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Fri, 11 Jun 2021 08:08:05 +0000 (10:08 +0200)]
MEDIUM: resolvers: add a ref on server to the used A/AAAA answer item
This patch adds a head list into answer items on servers which use
this record to set their IPs. It makes lookup on duplicated ip faster and
allow to check immediatly if an item is still valid renewing the IP.
This results in better performances on A/AAAA resolutions.
This is an optimization but it could avoid to trigger the haproxy's
internal wathdog in some circumstances. And for this reason
it should be backported as far we can (2.0 ?)
(cherry picked from commit
bd78c912fd73b8506a39c53b6f0b00330f998aca)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
f9ca5d8bc12441c3c21a66b92ea401803a102f2b)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Thu, 10 Jun 2021 13:25:25 +0000 (15:25 +0200)]
BUG/MINOR: resolvers: answser item list was randomly purged or errors
In case of SRV records, The answer item list was purged by the
error callback of the first requester which considers the error
could not be safely ignored. It makes this item list unavailable
for subsequent requesters even if they consider the error
could be ignored.
On A resolution or do_resolve action error, the answer items were
never trashed.
This patch re-work the error callbacks and the code to check the return code
If a callback return 1, we consider the error was ignored and
the answer item list must be kept. At the opposite, If all error callbacks
of all requesters of the same resolution returns 0 the list will be purged
This patch should be backported as far as 2.0.
(cherry picked from commit
12ca658dbe5bc3a3f47de06b2c19d5f0ee08941e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
43839d0ec40d8aa57c97955fc2dc6bfc1404d3c0)
[cf: Changes applied in src/dns.c instead of src/resolvers.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 21 Jun 2021 09:17:30 +0000 (11:17 +0200)]
BUILD: cfgparse-ssl: Remove const from defpx param in keylog parsing function
defpx parameter is const in cfg parsing functions since the 2.4. It is not
const in 2.3 and below. This build issue was introduced by the commit
b32469435c ("BUILD: make tune.ssl.keylog available again").
This patch is 2.3-specific. Thus there is upstream commid ID. And no
backport is needed, except if the commit above is backported.
Emeric Brun [Thu, 17 Jun 2021 16:23:04 +0000 (18:23 +0200)]
BUG/MEDIUM: dns: send messages on closed/reused fd if fd was detected broken
This fix complete this incomplete bug fix '(
751872e4)':
'BUG/MEDIUM: dns: reset file descriptor if send returns an error
This previous patch detects error on fd and close this one but the code
continues the loop on pending queries and try to send them using the
previously closed fd which could be reused by an other thread.
This patch stop to send queries on this fd and count them on
snd_error counter.
This patch applies on branch 2.3 and all this stuff is already
fixed in newer version by '(
d26a6237ad)':
'MEDIUM: resolvers: split resolving and dns message exchange layers'
This patch should be backported on all branches including the previous
fix '(
751872e4)':
'BUG/MEDIUM: dns: reset file descriptor if send returns an error'
Willy Tarreau [Thu, 17 Jun 2021 06:08:48 +0000 (08:08 +0200)]
MINOR: mux-h2: obey http-ignore-probes during the preface
We're seeing some browsers setting up multiple connections and closing
some to just keep one. It looks like they do this in case they'd
negotiate H1. This results in aborted prefaces and log pollution about
bad requests and "PR--" in the status flags.
We already have an option to ignore connections with no data, it's called
http-ignore-probes. But it was not used by the H2 mux. However it totally
makes sense to use it during the preface.
This patch changes this so that connections aborted before sending the
preface can avoid being logged.
This should be backported to 2.4 and 2.3 at least, and probably even
as far as 2.0.
(cherry picked from commit
ee4684f65b6ea627e34395d254daf7971d3ed90f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
c23d6d11bc7014b58feed96e91aa55aff3d03a2e)
[wt: s/HA_ATOMIC_INC/HA_ATOMIC_ADD]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 17 Jun 2021 05:22:27 +0000 (07:22 +0200)]
BUG/MINOR: stats: make "show stat typed desc" work again
As part of the changes to support per-module stats data in 2.3-dev6
with commit
ee63d4bd6 ("MEDIUM: stats: integrate static proxies stats
in new stats"), a small change resulted in the description field to
be replaced by the name field, making it pointless. Let's fix this
back.
This should fix issue #1291. Thanks to Nick Ramirez for reporting this
issue.
This patch can be backported to 2.3.
(cherry picked from commit
fc8e438637aa926e681c8b7f3fa061021dfb2201)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
b653d2f152a3ab837fea5c3b559e89133031e791)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Jun 2021 13:56:16 +0000 (15:56 +0200)]
MINOR: backend: only skip LB when there are actual connections
In 2.3, a significant improvement was brought against situations where
the queue was heavily used, because some LB algos were still checked
for no reason before deciding to put the request into the queue. This
was commit
82cd5c13a ("OPTIM: backend: skip LB when we know the backend
is full").
As seen in previous commit ("BUG/MAJOR: queue: set SF_ASSIGNED when
setting strm->target on dequeue") the dequeuing code is extremely
tricky, and the optimization above tends to emphasize transient issues
by making them permanent until the next reload, which is not acceptable
as the code must always be robust against any bad situation.
This commit brings a protection against such a situation by slightly
relaxing the test. Instead of checking that there are pending connections
in the backend queue, it also verifies that the backend's connections are
not solely composed of queued connections, which would then indicate we
are in this situation. This is not rocket science, but at least if the
situation happens, we know that it will unlock by itself once the streams
have left, as new requests will be allowed to reach the servers and to
flush the queue again.
This needs to be backported to 2.4 and 2.3.
(cherry picked from commit
f9a7c442f64263476210756e375ff923d7f6cf33)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a7ed7d4cea76cec55023219821d93b1c2ceebc75)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 16 Jun 2021 06:42:23 +0000 (08:42 +0200)]
BUG/MAJOR: queue: set SF_ASSIGNED when setting strm->target on dequeue
Commit
82cd5c13a ("OPTIM: backend: skip LB when we know the backend is
full") has uncovered a long-burried bug in the dequeing code: when a
server releases a connection, it picks a new one from the proxy's or
its queue. Technically speaking it only picks a pendconn which is a
link between a position in the queue and a stream. It then sets this
pendconn's target to itself, and wakes up the stream's task so that
it can try to connect again.
The stream then goes through the regular connection setup phases,
calls back_try_conn_req() which calls pendconn_dequeue(), which
sets the stream's target to the pendconn's and releases the pendconn.
It then reaches assign_server() which sees no SF_ASSIGNED and calls
assign_server_and_queue() to perform load balancing or queuing. This
one first destroys the stream's target and gets ready to perform load
balancing. At this point we're load-balancing for no reason since we
already knew what server was available. And this is where the commit
above comes into play: the check for the backend's queue above may
detect other connections that arrived in between, and will immediately
return FULL, forcing this request back into the queue. If the server
had a very low maxconn (e.g. 1 due to a long slowstart), it's possible
that this evicted connection was the last one on the server and that
no other one will ever be present to process the queue. Usually a
regularly processed request will still have its own srv_conn that will
be used during stream_free() to dequeue other connections. But if the
server had a down-up cycle, then a call to pendconn_grab_from_px()
may start to dequeue entries which had no srv_conn and which will have
no server slot to offer when they expire, thus maintaining the situation
above forever. Worse, as new requests arrive, there are always some
requests in the queue and the situation feeds on itself.
The correct fix here is to properly set SF_ASSIGNED in pendconn_dequeue()
when the stream's target is assigned (as it's what this flag means), so
as to avoid a load-balancing pass when dequeuing.
Many thanks to Pierre Cheynier for the numerous detailed traces he
provided that helped narrow this problem down.
This could be backported to all stable versions, but in practice only
2.3 and above are really affected since the presence of the commit
above. Given how tricky this code is it's better to limit it to those
versions that really need it.
(cherry picked from commit
7867cebf31403b6d7bab97cb3047e6d77238fcdc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a6cce193896a91c2cb7d0585191f51e1e0fc3fdd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 06:59:19 +0000 (08:59 +0200)]
BUG/MINOR: mworker: fix typo in chroot error message
Since its introduction in 1.8 with commit
095ba4c24 ("MEDIUM: mworker:
replace systemd mode by master worker mode"), it says "cannot chroot1(...)"
which seems to be a leftover of a debug message. It could be backported but
probably nobody will notice.
(cherry picked from commit
e34cf2801154eae542191120d5165a963d2cb98c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
46317376f2f3be9c7e7df0c2b996716a5818fa77)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 14:39:22 +0000 (16:39 +0200)]
BUG/MINOR: ssl: use atomic ops to update global shctx stats
The global shctx lookups and misses was updated without using atomic
ops, so the stats available in "show info" are very likely off by a few
units over time. This should be backported as far as 1.8. Versions
without _HA_ATOMIC_INC() can use HA_ATOMIC_ADD(,1).
(cherry picked from commit
4c19e996218f6c205c1716a0b4718f9bced7f893)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
37d69399648e489fea9f93b7e9fae02dfa59acbb)
[wt: s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 13:03:19 +0000 (15:03 +0200)]
BUG/MEDIUM: shctx: use at least thread-based locking on USE_PRIVATE_CACHE
Since threads were introduced in 1.8, the USE_PRIVATE_CACHE mode of the
shctx was not updated to use locks. Originally it was meant to disable
sharing between processes, so it removes the lock/unlock instructions.
But with threads enabled, it's not possible to work like this anymore.
It's easy to see that once built with private cache and threads enabled,
sending violent SSL traffic to the the process instantly makes it die.
The HTTP cache is very likely affected as well.
This patch addresses this by falling back to our native spinlocks when
USE_PRIVATE_CACHE is used. In practice we could use them also for other
modes and remove all older implementations, but this patch aims at keeping
the changes very low and easy to backport. A new SHCTX_LOCK label was
added to help with debugging, but OTHER_LOCK might be usable as well
for backports.
An even lighter approach for backports may consist in always declaring
the lock (or reusing "waiters"), and calling pl_take_s() for the lock()
and pl_drop_s() for the unlock() operation. This could even be used in
all modes (process and threads), even when thread support is disabled.
Subsequent patches will further clean up this area.
This patch must be backported to all supported versions since 1.8.
(cherry picked from commit
9e467af804b3dfa05eaa4677644f5ff0c0e0619f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
ca36771425fbced2c81de5259da304dc4fa0254c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Mon, 14 Jun 2021 15:04:25 +0000 (17:04 +0200)]
BUG/MINOR: stick-table: insert srv in used_name tree even with fixed id
If the server id is fixed in the configuration, it is immediately
inserted in the 'used_server_id' backend tree via srv_parse_id. On
check_config_validity, the dynamic id generation is thus skipped for
fixed-id servers. However, it must nevertheless be inserted in the
'used_server_name' backend tree.
This bug seems to be not noticeable for the user. Indeed, before the
fix, the search in sticking_rule_find_target always returned NULL for
the name, then the fallback search with server id succeeded, so the
persistence is properly applied. However with the fix the fallback
search is not executed anymore, which saves from the locking of
STK_SESS.
This should be backported up to 2.0.
(cherry picked from commit
077c6b8d29ba63a625648ebb73bf151a793f0096)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
50779aefca2b685bf6e2c922d02bfab59195534a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 14 Jun 2021 09:43:18 +0000 (11:43 +0200)]
DOC: lua: Add a warning about buffers modification in HTTP
Since the 1.9, it is forbidden to alter the channel buffer from an HTTP
stream because there is no way to keep the HTTP parser synchronized if the
buffer content is altered. In addition, since the HTX is the only
reprensentation for HTTP messages, the data in HTTP buffers are structured
and cannot be read or updated in a raw fashion.
A warning is triggered when a user tries to alter an HTTP buffer. However,
it was not documented. This patch adds a warning in the lua documentation.
This patch is related to the issue #1287. It may be backported as far as
2.0.
(cherry picked from commit
095303956708a3f87594ae60cd9d7df12d12d905)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
1cda6c8bd2461819cfc7b3c9625b2cb49d808406)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 11 Jun 2021 11:34:42 +0000 (13:34 +0200)]
BUG/MINOR: mux-fcgi: Expose SERVER_SOFTWARE parameter by default
As specified in the RFC3875 (section 4.1.17), this parameter must be set to
the name and version of the information server software making the CGI
request. Thus, it is now added to the default parameters defined by
HAProxy. It is set to the string "HAProxy $version".
This patch should fix the issue #1285 and must be backported as far as 2.2.
(cherry picked from commit
5cd0e528cf2bed6eb1f79582ef89cf1667337e46)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
b2a50298b1f6f46c943b77685142d59d3e69e193)
[wt: minor ctx adjustments]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 9 Jun 2021 15:30:40 +0000 (17:30 +0200)]
BUG/MAJOR: htx: Fix htx_defrag() when an HTX block is expanded
When an HTX block is expanded, a defragmentation may be performed first to
have enough space to copy the new data. When it happens, the meta data of
the HTX message must take account of the new data length but copied data are
still unchanged at this stage (because we need more space to update the
message content). And here there is a bug because the meta data are updated
by the caller. It means that when the blocks content is copied, the new
length is already set. Thus a block larger than the reality is copied and
data outside the buffer may be accessed, leading to a crash.
To fix this bug, htx_defrag() is updated to use an extra argument with the
new meta data to use for the referenced block. Thus the caller does not need
to update the HTX message by itself. However, it still have to update the
data.
Most of time, the bug will be encountered in the HTTP compression
filter. But, even if it is highly unlikely, in theory it is also possible to
hit it when a HTTP header (or only its value) is replaced or when the
start-line is changed.
This patch must be backported as far as 2.0.
(cherry picked from commit
1cf414b522b465c97e5e87e4e38be93e6f634b32)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
159873a5b380933e69444ae9c42fce6cc116599d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 06:46:28 +0000 (08:46 +0200)]
CLEANUP: pools: remove now unused seq and pool_free_list
These ones were only used by the lockless implementation and are not
needed anymore.
(cherry picked from commit
1526ffe815ef6136eaabb9d1c188fe7e636f2062)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
47245188ace9165555dda4c8712527367de96c2c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Jun 2021 16:59:58 +0000 (18:59 +0200)]
BUG/MAJOR: pools: fix possible race with free() in the lockless variant
In GH issue #1275, Fabiano Nunes Parente provided a nicely detailed
report showing reproducible crashes under musl. Musl is one of the libs
coming with a simple allocator for which we prefer to keep the shared
cache. On x86 we have a DWCAS so the lockless implementation is enabled
for such libraries.
And this implementation has had a small race since day one: the allocator
will need to read the first object's <next> pointer to place it into the
free list's head. If another thread picks the same element and immediately
releases it, while both the local and the shared pools are too crowded, it
will be freed to the OS. If the libc's allocator immediately releases it,
the memory area is unmapped and we can have a crash while trying to read
that pointer. However there is no problem as long as the item remains
mapped in memory because whatever value found there will not be placed
into the head since the counter will have changed.
The probability for this to happen is extremely low, but as analyzed by
Fabiano, it increases with the buffer size. On 16 threads it's relatively
easy to reproduce with 2MB buffers above 200k req/s, where it should
happen within the first 20 seconds of traffic usually.
This is a structural issue for which there are two non-trivial solutions:
- place a read lock in the alloc call and a barrier made of lock/unlock
in the free() call to force to serialize operations; this will have
a big performance impact since free() is already one of the contention
points;
- change the allocator to use a self-locked head, similar to what is
done in the MT_LISTS. This requires two memory writes to the head
instead of a single one, thus the overhead is exactly one memory
write during alloc and one during free;
This patch implements the second option. A new POOL_DUMMY pointer was
defined for the locked pointer value, allowing to both read and lock it
with a single xchg call. The code was carefully optimized so that the
locked period remains the shortest possible and that bus writes are
avoided as much as possible whenever the lock is held.
Tests show that while a bit slower than the original lockless
implementation on large buffers (2MB), it's 2.6 times faster than both
the no-cache and the locked implementation on such large buffers, and
remains as fast or faster than the all implementations when buffers are
48k or higher. Tests were also run on arm64 with similar results.
Note that this code is not used on modern libcs featuring a fast allocator.
A nice benefit of this change is that since it removes a dependency on
the DWCAS, it will be possible to remove the locked implementation and
replace it with this one, that is then usable on all systems, thus
significantly increasing their performance with large buffers.
Given that lockless pools were introduced in 1.9 (not supported anymore),
this patch will have to be backported as far as 2.0. The code changed
several times in this area and is subject to many ifdefs which will
complicate the backport. What is important is to remove all the DWCAS
code from the shared cache alloc/free lockless code and replace it with
this one. The pool_flush() code is basically the same code as the
allocator, retrieving the whole list at once. If in doubt regarding what
barriers to use in older versions, it's safe to use the generic ones.
This patch depends on the following previous commits:
- MINOR: pools: do not maintain the lock during pool_flush()
- MINOR: pools: call malloc_trim() under thread isolation
- MEDIUM: pools: use a single pool_gc() function for locked and lockless
The last one also removes one occurrence of an unneeded DWCAS in the
code that was incompatible with this fix. The removal of the now unused
seq field will happen in a future patch.
Many thanks to Fabiano for his detailed report, and to Olivier for
his help on this issue.
(cherry picked from commit
2a4523f6f4cba1cb9c899d0527a1bb4c5d5c0f2e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
d0cc3761495b55764e58aff97f7c82a687c5239f)
[wt: backported into the lockless functions only; kept pool_free_area()
and local accounting instead of pool_free_to_os(); tested with random
pool_flush() and heavy pool_gc() calls]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 08:21:35 +0000 (10:21 +0200)]
MEDIUM: pools: use a single pool_gc() function for locked and lockless
Locked and lockless shared pools don't need to use a different pool_gc()
function because this function isolates itself during the operation, so
we do not need to rely on DWCAS nor any atomic operation in fact. Let's
just get rid of the lockless one in favor of the simple one. This should
even result in a faster execution.
The ifdefs were slightly moved so that we can have pool_gc() defined
as soon as there are global pools, this avoids duplicating the function.
(cherry picked from commit
9b3ed51371693f70bab07e09cdd34e26e3a46d93)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a206cf93eb17e2d0f8184b91a0775ada763e7f71)
[wt: for 2.3, this was simplified, only the lockless version was
reworked to work without the DWCAS, very similarly to the one in
2.4 except that accounting still has to be done]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 06:40:16 +0000 (08:40 +0200)]
MINOR: pools: call malloc_trim() under thread isolation
pool_gc() was adjusted to run under thread isolation by commit
c0e2ff202
("MEDIUM: memory: make pool_gc() run under thread isolation") so that the
underlying malloc() and free() don't compete between threads during these
potentially aggressive moments (especially when mmap/munmap are involved).
Commit
88366c292 ("MEDIUM: pools: call malloc_trim() from pool_gc()")
later added a call to malloc_trim() but made it outside of the thread
isolation, which is contrary to the principle explained above. Also it
missed it in the locked version, meaning that those without a lockless
implementation cannot benefit from trimming.
This patch fixes that by calling it before thread_release() in both
places.
(cherry picked from commit
26ed183556f3df9e9bde6f6d47143aa4a18ae463)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
46b515ca55b113827ed3e52937cf7ad154d2f6c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 05:13:04 +0000 (07:13 +0200)]
MINOR: pools: do not maintain the lock during pool_flush()
The locked version of pool_flush() is absurd, it locks the pool for each
and every element to be released till the end. Not only this is extremely
inefficient, but it may even never finish if other threads spend their
time refilling the pool. The only case where this can happen is during
soft-stop so the risk remains limited, but it should be addressed.
(cherry picked from commit
c88914379da35c46d093f6d410b9507355aacd0a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
af8120a53cf6a7a8e244135893195c6b65ccc91a)
[wt: in 2.3 the pool_flush code is different because it still requires
to manipulate ->allocated under the lock, thus the lock covers the
counting and detaching, and the freeing is performed out of the lock]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 04:54:22 +0000 (06:54 +0200)]
BUG/MINOR: pools: fix a possible memory leak in the lockless pool_flush()
The lockless version of pool_flush() had a leftover of the original
version causing the pool's first entry to be set to NULL at the end.
The problem is that it does this outside of any lock and in a non-
atomic way, so that any concurrent alloc+free would result in a lost
object.
The risk is low and the consequence even lower, given that pool_flush()
is only used in pool_destroy() (hence single-threaded) or by stream_free()
during a soft-stop (not the place where most allocations happen), so in
the worst case it could result in valgrind complaining on soft-stop.
The bug was introduced with the first version of the code, in 1.9, so
the fix can be backported to all stable versions.
(cherry picked from commit
c239cde26fbf0b07abdf590b77c31154bd65c566)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
caf6555f418767a1298aa7e273609fc1510184db)
[wt: adjusted context: pool->allocated still updated last]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 9 Jun 2021 15:12:44 +0000 (17:12 +0200)]
BUG/MEDIUM: compression: Add a flag to know the filter is still processing data
Since the commit
acfd71b97 ("BUG/MINOR: http-comp: Preserve
HTTP_MSGF_COMPRESSIONG flag on the response"), there is no more flag to know
when the compression ends. This means it is possible to finish the
compression several time if there are trailers.
So, we reintroduce almost the same mechanism but with a dedicated flag. So
now, there is a bits field in the compression filter context.
The commit above is marked to be backported as far as 2.0. Thus this patch
must also be backported as far as 2.0.
(cherry picked from commit
12554d00f6ed16324a0dfbafb5230f8f0136e4ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
734863ecfa1f8a620904fd9332e01673728c784b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 9 Jun 2021 15:16:18 +0000 (17:16 +0200)]
BUG/MINOR: ssl: OCSP stapling does not work if expire too far in the future
The wey the "Next Update" field of the OCSP response is converted into a
timestamp relies on the use of signed integers for the year and month so
if the calculated timestamp happens to overflow INT_MAX, it ends up
being seen as negative and the OCSP response being dwignored in
ssl_sock_ocsp_stapling_cbk (because of the "ocsp->expire < now.tv_sec"
test).
It could be backported to all stable branches.
(cherry picked from commit
a3a0cce8ee8c142cd148090854ca8551a36d9bd7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
1adb439f84b3bb3a004736ef1fa88c899cd64f3e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Wed, 9 Jun 2021 14:46:12 +0000 (16:46 +0200)]
BUILD: make tune.ssl.keylog available again
Since commit 04a5a44 ("BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of
OpenSSL versions") the "tune.ssl.keylog" feature is broken because
HAVE_OPENSSL_KEYLOG does not exist.
Replace this by a HAVE_SSL_KEYLOG which is defined in openssl-compat.h.
Also add an error when not built with the right openssl version.
Must be backported as far as 2.3.
(cherry picked from commit
722180aca8757d8807b21cf125a2d68249be5bf8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
ca5cf0a196ef2e7d1a16ecaeda5f983551604a30)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Alex [Sat, 5 Jun 2021 11:23:08 +0000 (13:23 +0200)]
DOC: use the req.ssl_sni in examples
This patch should be backported to at least 2.0
(cherry picked from commit
5c866200d4ec61e02676596d473a2af88916d45e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
18c8877b34d3e376b9f15299ff92e8d641db30f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 1 Jun 2021 12:06:05 +0000 (14:06 +0200)]
BUG/MAJOR: stream-int: Release SI endpoint on server side ASAP on retry
When a connection attempt failed, if a retry is possible, the SI endpoint on
the server side is immediately released, instead of waiting to establish a
new connection to a server. Thus, when the backend SI is switched from
SI_ST_CER state to SI_ST_REQ, SI_ST_ASS or SI_ST_TAR, its endpoint is
released. It is expected because the SI is moved to a state prior to the
connection stage ( < SI_ST_CONN). So it seems logical to not have any server
connection.
It is especially important if the retry is delayed (SI_ST_TAR or
SI_ST_QUE). Because, if the server connection is preserved, any error at the
connection level is unexpectedly relayed to the stream, via the
stream-interface, leading to an infinite loop in process_stream(). if
SI_FL_ERR flag is set on the backend SI in another state than SI_ST_CLO, an
internal goto is performed to resync the stream-interfaces. In addtition,
some ressources are not released ASAP.
This bug is quite old and was reported 1 or 2 times per years since the 2.2
(at least) with not enough information to catch it. It must be backported as
far as 2.2 with a special care because this part has moved several times and
after some observation period and feedback from users to be sure. For info,
in 2.0 and prior, the connection is released when an error is encountered in
SI_ST_CON or SI_ST_RDY states.
(cherry picked from commit
f822decfda1f662147c02fd8b4f752da7500f374)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
d909078dec09ea7636bc1b0fce536303f146ed5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Alexandar Lazic [Mon, 31 May 2021 22:27:01 +0000 (00:27 +0200)]
DOC/MINOR: move uuid in the configuration to the right alphabetical order
This patch can be backported up to 2.1 where the uuid fetch was
introduced
(cherry picked from commit
528adc3b18fd610df807f14034acc212eaf685b3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
a429ad3fb0261bbe7ae0bc1cc247e4236739828f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Thu, 13 May 2021 11:30:12 +0000 (13:30 +0200)]
BUG/MINOR: lua/vars: prevent get_var() from allocating a new name
Variable names are stored into a unified list that helps compare them
just based on a pointer instead of duplicating their name with every
variable. This is convenient for those declared in the configuration
but this started to cause issues with Lua when random names would be
created upon each access, eating lots of memory and CPU for lookups,
hence the work in 2.2 with commit
4e172c93f ("MEDIUM: lua: Add
`ifexist` parameter to `set_var`") to address this.
But there remains a corner case with get_var(), which also allocates
a new variables. After a bit of thinking and discussion, it never
makes sense to allocate a new variable name on get_var():
- if the name exists, it will be returned ;
- if it does not exist, then the only way for it to appear will
be that some code calls set_var() on it
- a call to get_var() after a careful set_var(ifexist) ruins the
effort on set_var().
For this reason, this patch addresses this issue by making sure that
get_var() will never cause a variable to be allocated. This is done
by modifying vars_get_by_name() to always call register_name() with
alloc=0, since vars_get_by_name() is exclusively used by Lua and the
new CLI's "get/set var" which also benefit from this protection.
It probably makes sense to backport this as far as 2.2 after some
observation period and feedback from users.
For more context and discussions about the issues this was causing,
see https://www.mail-archive.com/haproxy@formilux.org/msg40451.html
and in issue #664.
(cherry picked from commit
89f6dedf486823237ffbe02f8d78df454332c615)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 14:40:28 +0000 (16:40 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in chash_init_server_tree
A memory allocation failure happening in chash_init_server_tree while
trying to allocate a server's lb_nodes item used in consistent hashing
would have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
476462010ecc2329e9ecba5afd20f84d9605d809)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
c5c5bc4e36ce4a6f3bc113c8e16824fdb276c220)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 10:00:54 +0000 (12:00 +0200)]
BUG/MINOR: http: Missing calloc return value check in make_arg_list
A memory allocation failure happening in make_arg_list when trying to
allocate the argument list would have resulted in a crash. This function
is only called during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
17acbab0ac5933205788eb0a134f9e918247efef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
6d303d66057957dead2b26e8cd865cd30054902d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 09:32:04 +0000 (11:32 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing redirect rule
A memory allocation failure happening in http_parse_redirect_rule when
trying to allocate a redirect_rule structure would have resulted in a
crash. This function is only called during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
b6864a5b6fa1ea6eb224710d00c897105b9e2a46)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
e10a8f6b9c9437efe8b78239c92ed18775353d1c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 08:45:12 +0000 (10:45 +0200)]
BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list
A memory allocation failure happening in mworker_env_to_proc_list when
trying to allocate a mworker_proc would have resulted in a crash. This
function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
1f4fa906c73e61dba74f9e8b762da12df3052f57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
d90aa347d2a3a81bae8461c009c9b4d69eb069a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Mon, 17 May 2021 08:35:08 +0000 (10:35 +0200)]
BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo
A memory allocation failure happening in comp_append_type or
comp_append_algo called while parsing compression options would have
resulted in a crash. These functions are only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
6443bcc2e1f2e1e11af76ef460d8241f06223de8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
a3a8d50d992ab145e5c353cc36e8a787d3aead57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Mon, 17 May 2021 08:08:16 +0000 (10:08 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule
A memory allocation failure happening in tcp_parse_request_rule while
processing the "capture" keyword and trying to allocate a cap_hdr
structure would have resulted in a crash. This function is only called
during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
8cb033643ff3235ac0d3887167ce06fefeaf850b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
31d170c581e8e4f726d7a54288b969cd38dde4b5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:24:18 +0000 (18:24 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing tcp-request/tcp-response
A memory allocation failure happening in tcp_parse_tcp_req or
tcp_parse_tcp_rep when trying to allocate an act_rule structure would
have resulted in a crash. These functions are only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
2ca42b4656f60655a5295a66f321239faee2d9fe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
8f86a29d20a094fede1b60695754aed3bebabe44)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:07:27 +0000 (18:07 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in proxy_defproxy_cpy
A memory allocation failure happening in proxy_defproxy_cpy while
copying the default compression options would have resulted in a crash.
This function is called for every new proxy found while parsing the
configuration.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
18a82ba690a6ff4adbf9702cefa6dc89eb36372d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
4d8e20f83cc07c08c1d562e6afc5b4d3a54f4058)
[Cf: Patch applied on cfgparse-listen.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:04:46 +0000 (18:04 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in proxy_parse_declare
A memory allocation failure happening during proxy_parse_declare while
processing the "capture" keyword and allocating a cap_hdr structure
would have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
55ba0d6865662036d1d137dc4aac3841d29ad6d3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
581d892c823ec18ff7066af696c2017c664959da)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:54:17 +0000 (17:54 +0200)]
BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture
A memory allocation failure happening in parse_http_req_capture while
processing a "len" keyword and allocating a cap_hdr structure would
have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
a4bf8a059dd9c783c00680aedcf055faa4b5d784)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
737d8b5c7bbcc38152b5621715f6f34b94d6fc37)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:45:21 +0000 (17:45 +0200)]
BUG/MINOR: ssl: Missing calloc return value check in ssl_init_single_engine
A memory allocation failure happening during ssl_init_single_engine
would have resulted in a crash. This function is only called during
init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
612b2c37be70636e73698c6c064af9522301be65)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
46697c80fec88e656a4f7280a2cfa9dedbfb33a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:39:04 +0000 (17:39 +0200)]
BUG/MINOR: peers: Missing calloc return value check in peers_register_table
A memory allocation failure happening during peers_register_table would
have resulted in a crash. This function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
208ff01b23eda19fe41780310f4d08ca86e28766)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
be5b1bb8cb683290f80856a3728ccaf3ad47b44b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 07:44:06 +0000 (09:44 +0200)]
BUG/MINOR: server: Missing calloc return value check in srv_parse_source
Two calloc calls were not checked in the srv_parse_source function.
Considering that this function could be called at runtime through a
dynamic server creation via the CLI, this could lead to an unfortunate
crash.
It was raised in GitHub issue #1233.
It could be backported to all stable branches even though the runtime
crash could only happen on branches where dynamic server creation is
possible.
(cherry picked from commit
f1800e64ef2428747e696b0ef2f78ab05a116dcc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
dc187917dfbcb851daf7f8dd78a1ade5331d5693)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 26 May 2021 10:15:37 +0000 (12:15 +0200)]
BUG/MINOR: http-ana: Handle L7 retries on refused early data before K/A aborts
When a network error occurred on the server side, if it is not the first
request (in case of keep-alive), nothing is returned to the client and its
connexion is closed to be sure it may retry. However L7 retries on refused
early data (0rtt-rejected) must be performed first.
In addition, such L7 retries must also be performed before incrementing the
failed responses counter.
This patch must be backported as far as 2.0.
(cherry picked from commit
d976923ab27e8aeb6cc6518232b5d2835401b1c8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
b187534982d285f569ec9731f4bce0287232fd25)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 21 May 2021 07:49:20 +0000 (09:49 +0200)]
BUG/MINOR: http-comp: Preserve HTTP_MSGF_COMPRESSIONG flag on the response
This flag is set on the response when its payload is compressed by HAProxy.
It must be preserved because it may be used when the log message is emitted.
When the compression filter was refactored to support the HTX, an
optimization was added to not perform extra proessing on the trailers.
HTTP_MSGF_COMPRESSIONG flag is removed when the last data block is
compressed. It is not required, it is just an optimization and unfortunately
a bug. This optimization must be removed to preserve the flag.
This patch must be backported as far as 2.0. On the HTX is affected.
(cherry picked from commit
acfd71b97aecea49bc3e3fad18d662960b750970)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
73f40e02b055f4fcc92fec9fe12e78034fde0260)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 20 May 2021 16:00:55 +0000 (18:00 +0200)]
BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter
For each filter, pre and post callback functions must only be called one
time. To do so, when one of them is finished, the corresponding analyser bit
must be removed from pre_analyzers or post_analyzers bit field. It is only
an issue with pre-analyser callback functions if the corresponding analyser
yields. It may happens with lua action for instance. In this case, the
filters pre analyser callback function is unexpectedly called several times.
This patch should fix the issue #1263. It must be backported is all stable
versions.
(cherry picked from commit
a6d3704e38b6c1cb09286e9778b6363825d65c4d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
476013c1213bd053c4c29984fc07c980e71893a2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 17 May 2021 08:47:18 +0000 (10:47 +0200)]
BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
A deadlock is possible with 'set maxconn server' command, if there is
pending connection ready to be dequeued. This is caused by the locking
of server spinlock in both cli_parse_set_maxconn_server and
process_srv_queue.
Fix this by reducing the scope of the server lock into
server_parse_maxconn_change_request. If connection are dequeued, the
lock is taken a second time. This can be seen as suboptimal but as it
happens only during 'set maxconn server' it can be considered as
tolerable.
This issue was reported on the mailing list, for the 1.8.x branch.
It must be backported up to the 1.8.
(cherry picked from commit
79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
3cf814498e8196a9bc5dade38a5de77bd4ee6431)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Tue, 18 May 2021 16:56:42 +0000 (18:56 +0200)]
BUG/MEDIUM: ebtree: Invalid read when looking for dup entry
The first item inserted into an ebtree will be inserted directly below
the root, which is a simple struct eb_root which only holds two branch
pointers (left and right).
If we try to find a duplicated entry to this first leaf through a
ebmb_next_dup, our leaf_p pointer will point to the eb_root instead of a
complete eb_node so we cannot look for the bit part of our leaf_p since
it would try to cast our eb_root into an eb_node and perform an out of
bounds access when reading "eb_root_to_node(eb_untag(t,EB_LEFT)))->bit".
This bug was found by address sanitizer running on a CRL hot update VTC
test.
Note that the bug has been there since the import of the eb_next_dup()
and eb_prev_dup() function in 1.5-dev19 by commit
2b5702030 ("MINOR:
ebtree: add new eb_next_dup/eb_prev_dup() functions to visit duplicates").
It can be backported to all stable branches.
(cherry picked from commit
2608e348bec8a533225424fad06760ce6e19f167)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
0179e1d7aa4a15b5325e8e3744ef302a488893d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 12:37:07 +0000 (14:37 +0200)]
REGTESTS: Add script to test abortonclose option
This script test abortonclose option for HTTP/1 client only. It may be
backported as far as 2.0. But on the 2.2 and prior, the syslog part must be
adapted to catch log messages emitted by proxy during HAProxy
startup. Following lines must be added :
recv
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe1 started."
recv
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe2 started."
(cherry picked from commit
16b37510bc1114ab7603240f8bbfeea20e5ee23b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:42:59 +0000 (18:42 +0200)]
BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
On client side, if CO_RFL_KEEP_RECV flags is set when h1_rcv_buf() is
called, we force subscription for reads to be able to catch read0. This way,
the event will be reported to upper layer to let the stream abort the
request.
This patch fixes the abortonclose option for H1 connections. It depends on
following patches :
* MEDIUM: mux-h1: Don't block reads when waiting for the other side
* MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
But to be sure the event is handled by the stream, the following patches are
also required :
* BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
* MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
All the series must be backported with caution as far as 2.0, and only after
a period of observation to be sure nothing broke.
(cherry picked from commit
1baef1523d1be1b6edf76c78688af932d4d36d8a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:34:30 +0000 (18:34 +0200)]
MEDIUM: mux-h1: Don't block reads when waiting for the other side
When we are waiting for the other side to read more data, or to read the
next request, we must only stop the processing of input data and not the
data receipt. This patch don't change anything on the subscribes for
reads. So it should not change anything. The only difference is that the H1
connection will try to read data if it is woken up for an I/O event and if
it was subscribed for reads.
This patch is required to fix abortonclose option for H1 client connections.
(cherry picked from commit
ec4207cb68b1d7d50e06d35aaa73586e2c7d46b0)
[Cf: H1C_F_IN_BUSY flag is used instead of H1C_F_WAIT_OUTPUT]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:13:25 +0000 (18:13 +0200)]
MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
When the abortonclose option is enabled, to be sure to be immediately
notified when a shutdown is received from the client, the frontend
conn-stream must be sure the mux will wait for read events. To do so, the
CO_RFL_KEEP_RECV flag is set when mux->rcv_buf() is called. This new flag
instructs the mux to wait for read events, regardless its internal state.
This patch is required to fix abortonclose option for H1 client connections.
(cherry picked from commit
d8219b31e7cfe0dec64c33e1186e5f33cf87191a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 06:45:05 +0000 (08:45 +0200)]
BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
In si_update_rx() function, the reads may be blocked because we explicitly
don't want to read or because of a lack of room in the input buffer. The
first condition is valid. However the second one only test if the channel is
empty or not. It means the reads are blocked if there are still some output
data in the input channel, in its buffer or its pipe. This condition is not
accurate. The reads must not be blocked if the channel can still receive
data. Thus instead of relying on channel_is_empty() function, we now call
channel_may_recv().
This patch is especially useful to be able to catch read0 on client side
when we are waiting for a connection to the server, when abortonclose option
is enabled. Otherwise, the client abort is not detected.
This patch depends on "MINOR: channel: Rely on HTX version if appropriate in
channel_may_recv()". Both must be backported as far as 2.0 after a period of
observation to be sure nothing broke.
(cherry picked from commit
e0dec4b7b258101f6d5faa15234103a45c16f0f8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 06:10:41 +0000 (08:10 +0200)]
MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
When channel_may_recv() is called for an HTX stream, the HTX version,
channel_htx_may_recv() is called. This patch is mandatory to fix a bug
related to the abortonclose option.
(cherry picked from commit
1c235e57d0087e86074178f682c24f8aa44e0fcd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 10 May 2021 09:23:34 +0000 (11:23 +0200)]
BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port
Check the return value of url2sa in smp_fetch_url_ip/port. If negative,
the address result is uninitialized and the sample fetch is aborted.
Also, the sockaddr is prelimiary zero'ed before calling url2sa to ensure
that it is not used by upper functions even if the sample returns 0.
Without the check, the value returned by the url_ip/url_port fetches is
unspecified. This can be triggered with the following curl :
$ curl -iv --request-target "xxx://127.0.0.1:20080/" http://127.0.0.1:20080/
This should be backported to all stable branches. However, note that
between the 1.8 and 2.0, the targetted functions have been extracted
from proto_http.c to http_fetch.c.
This should fix in part coverity report from the github issue #1244.
(cherry picked from commit
c89d5337ee5064091ceab7b2d967f99071dc39eb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 7 May 2021 09:45:26 +0000 (11:45 +0200)]
BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
On observe mode, if a server is marked as DOWN, the server's health-check is
rescheduled using the fastinter timeout if the new expiration date is newer
that the current one. But this must only be performed if the fastinter
timeout is defined.
Internally, tick_is_lt() function only checks the date and does not perform any
verification on the provided args. Thus, we must take care of it. However, it is
possible to disable the server health-check by setting its task expiration date
to TICK_ETERNITY.
This patch must be backported as far as 2.2. It is related to
(cherry picked from commit
ea860837183c4800c97d6e8e7fb417d8f1ab2ef0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 6 May 2021 14:01:18 +0000 (16:01 +0200)]
BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
A connection may be synchronously established. In the tcpcheck context, it
may be a problem if several connections come one after another. In this
case, there is no event to close the very first connection before starting
the next one. The checks is thus blocked and timed out, a L7 timeout error
is reported.
To fix the bug, when a tcpcheck is started, we immediately evaluate its
state. Most of time, nothing is performed and we must wait. But it is thus
possible to handle the result of a successfull connection.
This patch should fix the issue #1234. It must be backported as far as 2.2.
(cherry picked from commit
92017a3215545c26cd9baad11a1cdfb2859acc6c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 5 May 2021 19:05:09 +0000 (21:05 +0200)]
BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
Thanks to a previous fix, the stream error mask is now cleared on L7
retry. But the stream final state (SF_FINST_*) and the stream-interface
error type must also be reset to properly restart a new connection and be
sure to not inherit errors from the previous connection attempt.
In addition, SF_ADDR_SET flag is not systematically removed.
stream_choose_redispatch() already takes care to unset it if necessary. When
the connection is not redispatch, the server address can be preserved.
This patch must be backported as far as 2.0.
(cherry picked from commit
30aa0da532a9853b3ed5b89152fd57537d07ee88)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 7 May 2021 06:19:30 +0000 (08:19 +0200)]
BUG/MINOR: stream: properly clear the previous error mask on L7 retries
The cleanup of the previous error was incorrect on L7 retries, it would
OR two values while they're part of an enum, leaving some bits set.
Depending on the errors it was possible to occasionally see an internal
error ("I" flag) being logged.
This should be backported as far as 2.0, though the do_l7_retry() function
in in proto_htx.c in older versions.
(cherry picked from commit
75a4284babf50feecac17f9b32d07509b6c519ed)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 5 May 2021 16:23:59 +0000 (18:23 +0200)]
BUG/MINOR: stream: Decrement server current session counter on L7 retry
When a L7 retry is performed, we must not forget to decrement the current
session counter of the assigned server. Of course, it must only be done if
the current session is already counted on the server, thus if SF_CURR_SESS
flag is set on the stream.
This patch is related to the issue #1003. It must be backported as far as
2.0.
(cherry picked from commit
e763c8c99f7134a009117a59e6a6002b3c9e8c84)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 4 May 2021 16:16:53 +0000 (18:16 +0200)]
BUG/MEDIUM: dns: reset file descriptor if send returns an error
The fd is purged if send returns -1 without EAGAIN and the
sock.fd is set to -1 to recreate a new fd on next
sending attempt on this nameserver.
Before this fix we continue to send on the fd returning
an error indefinitely.
This patch applies on branch 2.3 and is fixed in newer versions
by this commit (
d26a6237ad):
'MEDIUM: resolvers: split resolving and dns message exchange layers'
This patch should be backported an all supported branches 1.6 <= v <= 2.3
Willy Tarreau [Tue, 4 May 2021 16:40:50 +0000 (18:40 +0200)]
MINOR: debug: add a new "debug dev sym" command in expert mode
This command attempts to resolve a pointer to a symbol name. This is
convenient during development as it's easier to get such pointers live
than by issuing a debugger or calling addr2line.
(cherry picked from commit
48129be18afd17688bced0bba8c0d98e2b85ef39)
[wt: backported as sometimes useful when debugging in field;
s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 5 May 2021 05:29:01 +0000 (07:29 +0200)]
MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS
The purpose of this debugging option was to prevent certain pools from
masking other ones when they were shared. For example, task, http_txn,
h2s, h1s, h1c, session, fcgi_strm, and connection are all 192 bytes and
would normally be mergedi, but not with this option. The problem is that
certain pools are declared multiple times with various parameters, which
are often very close, and due to the way the option works, they're not
shared either. Good examples of this are captures and stick tables. Some
configurations have large numbers of stick-tables of pretty similar types
and it's very common to end up with the following when the option is
enabled:
$ socat - /tmp/sock1 <<< "show pools" | grep stick
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753800=56
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753880=57
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753900=58
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753980=59
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a00=60
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a80=61
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753b00=62
- Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753780=55
In addition to not being convenient, it can have important effects on the
memory usage because these pools will not share their entries, so one stick
table cannot allocate from another one's pool.
This patch solves this by going back to the initial goal which was not to
have different pools in the same list. Instead of masking the MAP_F_SHARED
flag, it simply adds a test on the pool's name, and disables pool sharing
if the names differ. This way pools are not shared unless they're of the
same name and size, which doesn't hinder debugging. The same test above
now returns this:
$ socat - /tmp/sock1 <<< "show pools" | grep stick
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 7 users, @0x3fadb30 [SHARED]
- Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x3facaa0 [SHARED]
This is much better. This should probably be backported, in order to limit
the side effects of DEBUG_DONT_SHARE_POOLS being enabled in production.
(cherry picked from commit
1ab6c0bfd2b8e178d78955e5a8279fe04ac45da1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Nov 2020 14:53:34 +0000 (15:53 +0100)]
MEDIUM: pools: call malloc_trim() from pool_gc()
If available it definitely makes sense to call it since it's also
called when stopping to reclaim the maximum possible memory.
(cherry picked from commit
88366c2926deac5ee257b6c541633b6a8b5111b3)
[wt: backported as it helps not only on reload but also on SIGQUIT to
try to return all unused memory]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Nov 2020 14:50:40 +0000 (15:50 +0100)]
MINOR: compat: automatically include malloc.h on glibc
This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.
(cherry picked from commit
1d3c7003d91a2c3b48f0f1fee3eb361f4121057b)
[wt: CONFIG_HAP_NO_GLOBAL_POOLS was backported via commitc0457dc81
but the optional HA_HAVE_MALLOC_TRIM was never set, and it was
shown to help in some cases]
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 4 May 2021 14:17:27 +0000 (16:17 +0200)]
BUG/MINOR: ssl/cli: fix a lock leak when no memory available
This bug was introduced in e5ff4ad ("BUG/MINOR: ssl: fix a trash buffer
leak in some error cases").
When cli_parse_set_cert() returns because alloc_trash_chunk() failed, it
does not unlock the spinlock which can lead to a deadlock later.
Must be backported as far as 2.1 where e5ff4ad was backported.
(cherry picked from commit
5ba80d677d563517bb9754c272e6df94adae281b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 4 May 2021 14:27:45 +0000 (16:27 +0200)]
BUG/MEDIUM: cli: prevent memory leak on write errors
Since the introduction of payload support on the CLI in 1.9-dev1 by
commit
abbf60710 ("MEDIUM: cli: Add payload support"), a chunk is
temporarily allocated for the CLI to support defragmenting a payload
passed with a command. However it's only released when passing via
the CLI_ST_END state (i.e. on clean shutdown), but not on errors.
Something as trivial as:
$ while :; do ncat --send-only -U /path/to/cli <<< "show stat"; done
with a few hundreds of servers is enough see the number of allocated
trash chunks go through the roof in "show pools".
This needs to be backported as far as 2.0.
(cherry picked from commit
18b2a9dd874b66acca304580047fa6b3c16da1e3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 3 May 2021 08:11:13 +0000 (10:11 +0200)]
BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
When the lua buffers are used, a variable number of stack slots may be
used. Thus we cannot assume that we know where the top of the stack is. It
was not an issue for lua < 5.4.3 (at least for small buffers). But
'socket:receive()' now fails with lua 5.4.3 because a light userdata is
systematically pushed on the top of the stack when a buffer is initialized.
To fix the bug, in hlua_socket_receive(), we save the index of the top of
the stack before creating the buffer. This way, we can check the number of
arguments, regardless anything was pushed on the stack or not.
Note that the other buffer usages seem to be safe.
This patch should solve the issue #1240. It should be backport to all stable
branches.
(cherry picked from commit
c31b2008728babbc8738b88b593f2be40324369b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 29 Apr 2021 14:17:23 +0000 (16:17 +0200)]
REGTESTS: add minimal CLI "add map" tests
The map_redirect test already tests for "show map", "del map" and
"clear map" but doesn't have any "add map" command. Let's add some
trivial ones involving one regular entry and two other ones added as
payload, checking they are properly returned.
(cherry picked from commit
deee369cfabdea713e232c94c5d15f9613109fe9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Emeric Brun [Wed, 28 Apr 2021 10:59:35 +0000 (12:59 +0200)]
MINOR: peers: add informative flags about resync process for debugging
This patch adds miscellenous informative flags raised during the initial
full resync process performed during the reload for debugging purpose.
0x00000010: Timeout waiting for a full resync from a local node
0x00000020: Timeout waiting for a full resync from a remote node
0x00000040: Session aborted learning from a local node
0x00000080: Session aborted learning from a remote node
0x00000100: A local node teach us and was fully up to date
0x00000200: A remote node teach us and was fully up to date
0x00000400: A local node teach us but was partially up to date
0x00000800: A remote node teach us but was partially up to date
0x00001000: A local node was assigned for a full resync
0x00002000: A remote node was assigned for a full resync
0x00004000: A resync was explicitly requested
This patch could be backported on any supported branch
(cherry picked from commit
ccdfbae62cc5e5023bc857cbc04f680145435a1e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 20 Apr 2021 12:43:46 +0000 (14:43 +0200)]
BUG/MEDIUM: peers: reset tables stage flags stages on new conns
Flags used as context to know current status of each table pushing a
full resync to a peer were correctly reset receiving a new resync
request or confirmation message but in case of local peer sync during
reload the resync request is implicit and those flags were not
correctly reset in this case.
This could result to a partial initial resync of some tables after reload
if the connection with the old process was broken and retried.
This patch reset those flags at the end of the handshake for all new
connections to be sure to push a entire full resync if needed.
This patch should be backported on all supported branches ( v >= 1.6 )
(cherry picked from commit
1a6b43e13eeb4872dff9944064ddadc1ceb5178f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Wed, 28 Apr 2021 09:48:15 +0000 (11:48 +0200)]
BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
Only entries between the opposite of the last 'local update' rotating
counter were considered to be pushed. This processing worked in most
cases because updates are continually pushed trying to reach this point
but it remains some cases where updates id are more far away in the past
and appearing in futur and the push of updates is stuck until the head
reach again the tail which could take a very long time.
This patch re-work the lookup to consider that all positions on the
rotating counter is considered in the past until we reach exactly
the 'local update' value. Doing this, the updates push won't be stuck
anymore.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
8e7a13ed66aad97c550f3a6600145ed7c757a894)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>