Willy Tarreau [Wed, 16 Jun 2021 06:42:23 +0000 (08:42 +0200)]
BUG/MAJOR: queue: set SF_ASSIGNED when setting strm->target on dequeue
Commit
82cd5c13a ("OPTIM: backend: skip LB when we know the backend is
full") has uncovered a long-burried bug in the dequeing code: when a
server releases a connection, it picks a new one from the proxy's or
its queue. Technically speaking it only picks a pendconn which is a
link between a position in the queue and a stream. It then sets this
pendconn's target to itself, and wakes up the stream's task so that
it can try to connect again.
The stream then goes through the regular connection setup phases,
calls back_try_conn_req() which calls pendconn_dequeue(), which
sets the stream's target to the pendconn's and releases the pendconn.
It then reaches assign_server() which sees no SF_ASSIGNED and calls
assign_server_and_queue() to perform load balancing or queuing. This
one first destroys the stream's target and gets ready to perform load
balancing. At this point we're load-balancing for no reason since we
already knew what server was available. And this is where the commit
above comes into play: the check for the backend's queue above may
detect other connections that arrived in between, and will immediately
return FULL, forcing this request back into the queue. If the server
had a very low maxconn (e.g. 1 due to a long slowstart), it's possible
that this evicted connection was the last one on the server and that
no other one will ever be present to process the queue. Usually a
regularly processed request will still have its own srv_conn that will
be used during stream_free() to dequeue other connections. But if the
server had a down-up cycle, then a call to pendconn_grab_from_px()
may start to dequeue entries which had no srv_conn and which will have
no server slot to offer when they expire, thus maintaining the situation
above forever. Worse, as new requests arrive, there are always some
requests in the queue and the situation feeds on itself.
The correct fix here is to properly set SF_ASSIGNED in pendconn_dequeue()
when the stream's target is assigned (as it's what this flag means), so
as to avoid a load-balancing pass when dequeuing.
Many thanks to Pierre Cheynier for the numerous detailed traces he
provided that helped narrow this problem down.
This could be backported to all stable versions, but in practice only
2.3 and above are really affected since the presence of the commit
above. Given how tricky this code is it's better to limit it to those
versions that really need it.
(cherry picked from commit
7867cebf31403b6d7bab97cb3047e6d77238fcdc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a6cce193896a91c2cb7d0585191f51e1e0fc3fdd)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 06:59:19 +0000 (08:59 +0200)]
BUG/MINOR: mworker: fix typo in chroot error message
Since its introduction in 1.8 with commit
095ba4c24 ("MEDIUM: mworker:
replace systemd mode by master worker mode"), it says "cannot chroot1(...)"
which seems to be a leftover of a debug message. It could be backported but
probably nobody will notice.
(cherry picked from commit
e34cf2801154eae542191120d5165a963d2cb98c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
46317376f2f3be9c7e7df0c2b996716a5818fa77)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 14:39:22 +0000 (16:39 +0200)]
BUG/MINOR: ssl: use atomic ops to update global shctx stats
The global shctx lookups and misses was updated without using atomic
ops, so the stats available in "show info" are very likely off by a few
units over time. This should be backported as far as 1.8. Versions
without _HA_ATOMIC_INC() can use HA_ATOMIC_ADD(,1).
(cherry picked from commit
4c19e996218f6c205c1716a0b4718f9bced7f893)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
37d69399648e489fea9f93b7e9fae02dfa59acbb)
[wt: s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 15 Jun 2021 13:03:19 +0000 (15:03 +0200)]
BUG/MEDIUM: shctx: use at least thread-based locking on USE_PRIVATE_CACHE
Since threads were introduced in 1.8, the USE_PRIVATE_CACHE mode of the
shctx was not updated to use locks. Originally it was meant to disable
sharing between processes, so it removes the lock/unlock instructions.
But with threads enabled, it's not possible to work like this anymore.
It's easy to see that once built with private cache and threads enabled,
sending violent SSL traffic to the the process instantly makes it die.
The HTTP cache is very likely affected as well.
This patch addresses this by falling back to our native spinlocks when
USE_PRIVATE_CACHE is used. In practice we could use them also for other
modes and remove all older implementations, but this patch aims at keeping
the changes very low and easy to backport. A new SHCTX_LOCK label was
added to help with debugging, but OTHER_LOCK might be usable as well
for backports.
An even lighter approach for backports may consist in always declaring
the lock (or reusing "waiters"), and calling pl_take_s() for the lock()
and pl_drop_s() for the unlock() operation. This could even be used in
all modes (process and threads), even when thread support is disabled.
Subsequent patches will further clean up this area.
This patch must be backported to all supported versions since 1.8.
(cherry picked from commit
9e467af804b3dfa05eaa4677644f5ff0c0e0619f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
ca36771425fbced2c81de5259da304dc4fa0254c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Amaury Denoyelle [Mon, 14 Jun 2021 15:04:25 +0000 (17:04 +0200)]
BUG/MINOR: stick-table: insert srv in used_name tree even with fixed id
If the server id is fixed in the configuration, it is immediately
inserted in the 'used_server_id' backend tree via srv_parse_id. On
check_config_validity, the dynamic id generation is thus skipped for
fixed-id servers. However, it must nevertheless be inserted in the
'used_server_name' backend tree.
This bug seems to be not noticeable for the user. Indeed, before the
fix, the search in sticking_rule_find_target always returned NULL for
the name, then the fallback search with server id succeeded, so the
persistence is properly applied. However with the fix the fallback
search is not executed anymore, which saves from the locking of
STK_SESS.
This should be backported up to 2.0.
(cherry picked from commit
077c6b8d29ba63a625648ebb73bf151a793f0096)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
50779aefca2b685bf6e2c922d02bfab59195534a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 14 Jun 2021 09:43:18 +0000 (11:43 +0200)]
DOC: lua: Add a warning about buffers modification in HTTP
Since the 1.9, it is forbidden to alter the channel buffer from an HTTP
stream because there is no way to keep the HTTP parser synchronized if the
buffer content is altered. In addition, since the HTX is the only
reprensentation for HTTP messages, the data in HTTP buffers are structured
and cannot be read or updated in a raw fashion.
A warning is triggered when a user tries to alter an HTTP buffer. However,
it was not documented. This patch adds a warning in the lua documentation.
This patch is related to the issue #1287. It may be backported as far as
2.0.
(cherry picked from commit
095303956708a3f87594ae60cd9d7df12d12d905)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
1cda6c8bd2461819cfc7b3c9625b2cb49d808406)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 11 Jun 2021 11:34:42 +0000 (13:34 +0200)]
BUG/MINOR: mux-fcgi: Expose SERVER_SOFTWARE parameter by default
As specified in the RFC3875 (section 4.1.17), this parameter must be set to
the name and version of the information server software making the CGI
request. Thus, it is now added to the default parameters defined by
HAProxy. It is set to the string "HAProxy $version".
This patch should fix the issue #1285 and must be backported as far as 2.2.
(cherry picked from commit
5cd0e528cf2bed6eb1f79582ef89cf1667337e46)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
b2a50298b1f6f46c943b77685142d59d3e69e193)
[wt: minor ctx adjustments]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 9 Jun 2021 15:30:40 +0000 (17:30 +0200)]
BUG/MAJOR: htx: Fix htx_defrag() when an HTX block is expanded
When an HTX block is expanded, a defragmentation may be performed first to
have enough space to copy the new data. When it happens, the meta data of
the HTX message must take account of the new data length but copied data are
still unchanged at this stage (because we need more space to update the
message content). And here there is a bug because the meta data are updated
by the caller. It means that when the blocks content is copied, the new
length is already set. Thus a block larger than the reality is copied and
data outside the buffer may be accessed, leading to a crash.
To fix this bug, htx_defrag() is updated to use an extra argument with the
new meta data to use for the referenced block. Thus the caller does not need
to update the HTX message by itself. However, it still have to update the
data.
Most of time, the bug will be encountered in the HTTP compression
filter. But, even if it is highly unlikely, in theory it is also possible to
hit it when a HTTP header (or only its value) is replaced or when the
start-line is changed.
This patch must be backported as far as 2.0.
(cherry picked from commit
1cf414b522b465c97e5e87e4e38be93e6f634b32)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
159873a5b380933e69444ae9c42fce6cc116599d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 06:46:28 +0000 (08:46 +0200)]
CLEANUP: pools: remove now unused seq and pool_free_list
These ones were only used by the lockless implementation and are not
needed anymore.
(cherry picked from commit
1526ffe815ef6136eaabb9d1c188fe7e636f2062)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
47245188ace9165555dda4c8712527367de96c2c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 9 Jun 2021 16:59:58 +0000 (18:59 +0200)]
BUG/MAJOR: pools: fix possible race with free() in the lockless variant
In GH issue #1275, Fabiano Nunes Parente provided a nicely detailed
report showing reproducible crashes under musl. Musl is one of the libs
coming with a simple allocator for which we prefer to keep the shared
cache. On x86 we have a DWCAS so the lockless implementation is enabled
for such libraries.
And this implementation has had a small race since day one: the allocator
will need to read the first object's <next> pointer to place it into the
free list's head. If another thread picks the same element and immediately
releases it, while both the local and the shared pools are too crowded, it
will be freed to the OS. If the libc's allocator immediately releases it,
the memory area is unmapped and we can have a crash while trying to read
that pointer. However there is no problem as long as the item remains
mapped in memory because whatever value found there will not be placed
into the head since the counter will have changed.
The probability for this to happen is extremely low, but as analyzed by
Fabiano, it increases with the buffer size. On 16 threads it's relatively
easy to reproduce with 2MB buffers above 200k req/s, where it should
happen within the first 20 seconds of traffic usually.
This is a structural issue for which there are two non-trivial solutions:
- place a read lock in the alloc call and a barrier made of lock/unlock
in the free() call to force to serialize operations; this will have
a big performance impact since free() is already one of the contention
points;
- change the allocator to use a self-locked head, similar to what is
done in the MT_LISTS. This requires two memory writes to the head
instead of a single one, thus the overhead is exactly one memory
write during alloc and one during free;
This patch implements the second option. A new POOL_DUMMY pointer was
defined for the locked pointer value, allowing to both read and lock it
with a single xchg call. The code was carefully optimized so that the
locked period remains the shortest possible and that bus writes are
avoided as much as possible whenever the lock is held.
Tests show that while a bit slower than the original lockless
implementation on large buffers (2MB), it's 2.6 times faster than both
the no-cache and the locked implementation on such large buffers, and
remains as fast or faster than the all implementations when buffers are
48k or higher. Tests were also run on arm64 with similar results.
Note that this code is not used on modern libcs featuring a fast allocator.
A nice benefit of this change is that since it removes a dependency on
the DWCAS, it will be possible to remove the locked implementation and
replace it with this one, that is then usable on all systems, thus
significantly increasing their performance with large buffers.
Given that lockless pools were introduced in 1.9 (not supported anymore),
this patch will have to be backported as far as 2.0. The code changed
several times in this area and is subject to many ifdefs which will
complicate the backport. What is important is to remove all the DWCAS
code from the shared cache alloc/free lockless code and replace it with
this one. The pool_flush() code is basically the same code as the
allocator, retrieving the whole list at once. If in doubt regarding what
barriers to use in older versions, it's safe to use the generic ones.
This patch depends on the following previous commits:
- MINOR: pools: do not maintain the lock during pool_flush()
- MINOR: pools: call malloc_trim() under thread isolation
- MEDIUM: pools: use a single pool_gc() function for locked and lockless
The last one also removes one occurrence of an unneeded DWCAS in the
code that was incompatible with this fix. The removal of the now unused
seq field will happen in a future patch.
Many thanks to Fabiano for his detailed report, and to Olivier for
his help on this issue.
(cherry picked from commit
2a4523f6f4cba1cb9c899d0527a1bb4c5d5c0f2e)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
d0cc3761495b55764e58aff97f7c82a687c5239f)
[wt: backported into the lockless functions only; kept pool_free_area()
and local accounting instead of pool_free_to_os(); tested with random
pool_flush() and heavy pool_gc() calls]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 08:21:35 +0000 (10:21 +0200)]
MEDIUM: pools: use a single pool_gc() function for locked and lockless
Locked and lockless shared pools don't need to use a different pool_gc()
function because this function isolates itself during the operation, so
we do not need to rely on DWCAS nor any atomic operation in fact. Let's
just get rid of the lockless one in favor of the simple one. This should
even result in a faster execution.
The ifdefs were slightly moved so that we can have pool_gc() defined
as soon as there are global pools, this avoids duplicating the function.
(cherry picked from commit
9b3ed51371693f70bab07e09cdd34e26e3a46d93)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
a206cf93eb17e2d0f8184b91a0775ada763e7f71)
[wt: for 2.3, this was simplified, only the lockless version was
reworked to work without the DWCAS, very similarly to the one in
2.4 except that accounting still has to be done]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 06:40:16 +0000 (08:40 +0200)]
MINOR: pools: call malloc_trim() under thread isolation
pool_gc() was adjusted to run under thread isolation by commit
c0e2ff202
("MEDIUM: memory: make pool_gc() run under thread isolation") so that the
underlying malloc() and free() don't compete between threads during these
potentially aggressive moments (especially when mmap/munmap are involved).
Commit
88366c292 ("MEDIUM: pools: call malloc_trim() from pool_gc()")
later added a call to malloc_trim() but made it outside of the thread
isolation, which is contrary to the principle explained above. Also it
missed it in the locked version, meaning that those without a lockless
implementation cannot benefit from trimming.
This patch fixes that by calling it before thread_release() in both
places.
(cherry picked from commit
26ed183556f3df9e9bde6f6d47143aa4a18ae463)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
46b515ca55b113827ed3e52937cf7ad154d2f6c0)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 05:13:04 +0000 (07:13 +0200)]
MINOR: pools: do not maintain the lock during pool_flush()
The locked version of pool_flush() is absurd, it locks the pool for each
and every element to be released till the end. Not only this is extremely
inefficient, but it may even never finish if other threads spend their
time refilling the pool. The only case where this can happen is during
soft-stop so the risk remains limited, but it should be addressed.
(cherry picked from commit
c88914379da35c46d093f6d410b9507355aacd0a)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
af8120a53cf6a7a8e244135893195c6b65ccc91a)
[wt: in 2.3 the pool_flush code is different because it still requires
to manipulate ->allocated under the lock, thus the lock covers the
counting and detaching, and the freeing is performed out of the lock]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 10 Jun 2021 04:54:22 +0000 (06:54 +0200)]
BUG/MINOR: pools: fix a possible memory leak in the lockless pool_flush()
The lockless version of pool_flush() had a leftover of the original
version causing the pool's first entry to be set to NULL at the end.
The problem is that it does this outside of any lock and in a non-
atomic way, so that any concurrent alloc+free would result in a lost
object.
The risk is low and the consequence even lower, given that pool_flush()
is only used in pool_destroy() (hence single-threaded) or by stream_free()
during a soft-stop (not the place where most allocations happen), so in
the worst case it could result in valgrind complaining on soft-stop.
The bug was introduced with the first version of the code, in 1.9, so
the fix can be backported to all stable versions.
(cherry picked from commit
c239cde26fbf0b07abdf590b77c31154bd65c566)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit
caf6555f418767a1298aa7e273609fc1510184db)
[wt: adjusted context: pool->allocated still updated last]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Wed, 9 Jun 2021 15:12:44 +0000 (17:12 +0200)]
BUG/MEDIUM: compression: Add a flag to know the filter is still processing data
Since the commit
acfd71b97 ("BUG/MINOR: http-comp: Preserve
HTTP_MSGF_COMPRESSIONG flag on the response"), there is no more flag to know
when the compression ends. This means it is possible to finish the
compression several time if there are trailers.
So, we reintroduce almost the same mechanism but with a dedicated flag. So
now, there is a bits field in the compression filter context.
The commit above is marked to be backported as far as 2.0. Thus this patch
must also be backported as far as 2.0.
(cherry picked from commit
12554d00f6ed16324a0dfbafb5230f8f0136e4ba)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
734863ecfa1f8a620904fd9332e01673728c784b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 9 Jun 2021 15:16:18 +0000 (17:16 +0200)]
BUG/MINOR: ssl: OCSP stapling does not work if expire too far in the future
The wey the "Next Update" field of the OCSP response is converted into a
timestamp relies on the use of signed integers for the year and month so
if the calculated timestamp happens to overflow INT_MAX, it ends up
being seen as negative and the OCSP response being dwignored in
ssl_sock_ocsp_stapling_cbk (because of the "ocsp->expire < now.tv_sec"
test).
It could be backported to all stable branches.
(cherry picked from commit
a3a0cce8ee8c142cd148090854ca8551a36d9bd7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
1adb439f84b3bb3a004736ef1fa88c899cd64f3e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Wed, 9 Jun 2021 14:46:12 +0000 (16:46 +0200)]
BUILD: make tune.ssl.keylog available again
Since commit 04a5a44 ("BUILD: ssl: use HAVE_OPENSSL_KEYLOG instead of
OpenSSL versions") the "tune.ssl.keylog" feature is broken because
HAVE_OPENSSL_KEYLOG does not exist.
Replace this by a HAVE_SSL_KEYLOG which is defined in openssl-compat.h.
Also add an error when not built with the right openssl version.
Must be backported as far as 2.3.
(cherry picked from commit
722180aca8757d8807b21cf125a2d68249be5bf8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
ca5cf0a196ef2e7d1a16ecaeda5f983551604a30)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Alex [Sat, 5 Jun 2021 11:23:08 +0000 (13:23 +0200)]
DOC: use the req.ssl_sni in examples
This patch should be backported to at least 2.0
(cherry picked from commit
5c866200d4ec61e02676596d473a2af88916d45e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
18c8877b34d3e376b9f15299ff92e8d641db30f3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 1 Jun 2021 12:06:05 +0000 (14:06 +0200)]
BUG/MAJOR: stream-int: Release SI endpoint on server side ASAP on retry
When a connection attempt failed, if a retry is possible, the SI endpoint on
the server side is immediately released, instead of waiting to establish a
new connection to a server. Thus, when the backend SI is switched from
SI_ST_CER state to SI_ST_REQ, SI_ST_ASS or SI_ST_TAR, its endpoint is
released. It is expected because the SI is moved to a state prior to the
connection stage ( < SI_ST_CONN). So it seems logical to not have any server
connection.
It is especially important if the retry is delayed (SI_ST_TAR or
SI_ST_QUE). Because, if the server connection is preserved, any error at the
connection level is unexpectedly relayed to the stream, via the
stream-interface, leading to an infinite loop in process_stream(). if
SI_FL_ERR flag is set on the backend SI in another state than SI_ST_CLO, an
internal goto is performed to resync the stream-interfaces. In addtition,
some ressources are not released ASAP.
This bug is quite old and was reported 1 or 2 times per years since the 2.2
(at least) with not enough information to catch it. It must be backported as
far as 2.2 with a special care because this part has moved several times and
after some observation period and feedback from users to be sure. For info,
in 2.0 and prior, the connection is released when an error is encountered in
SI_ST_CON or SI_ST_RDY states.
(cherry picked from commit
f822decfda1f662147c02fd8b4f752da7500f374)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
d909078dec09ea7636bc1b0fce536303f146ed5f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Alexandar Lazic [Mon, 31 May 2021 22:27:01 +0000 (00:27 +0200)]
DOC/MINOR: move uuid in the configuration to the right alphabetical order
This patch can be backported up to 2.1 where the uuid fetch was
introduced
(cherry picked from commit
528adc3b18fd610df807f14034acc212eaf685b3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
a429ad3fb0261bbe7ae0bc1cc247e4236739828f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Thu, 13 May 2021 11:30:12 +0000 (13:30 +0200)]
BUG/MINOR: lua/vars: prevent get_var() from allocating a new name
Variable names are stored into a unified list that helps compare them
just based on a pointer instead of duplicating their name with every
variable. This is convenient for those declared in the configuration
but this started to cause issues with Lua when random names would be
created upon each access, eating lots of memory and CPU for lookups,
hence the work in 2.2 with commit
4e172c93f ("MEDIUM: lua: Add
`ifexist` parameter to `set_var`") to address this.
But there remains a corner case with get_var(), which also allocates
a new variables. After a bit of thinking and discussion, it never
makes sense to allocate a new variable name on get_var():
- if the name exists, it will be returned ;
- if it does not exist, then the only way for it to appear will
be that some code calls set_var() on it
- a call to get_var() after a careful set_var(ifexist) ruins the
effort on set_var().
For this reason, this patch addresses this issue by making sure that
get_var() will never cause a variable to be allocated. This is done
by modifying vars_get_by_name() to always call register_name() with
alloc=0, since vars_get_by_name() is exclusively used by Lua and the
new CLI's "get/set var" which also benefit from this protection.
It probably makes sense to backport this as far as 2.2 after some
observation period and feedback from users.
For more context and discussions about the issues this was causing,
see https://www.mail-archive.com/haproxy@formilux.org/msg40451.html
and in issue #664.
(cherry picked from commit
89f6dedf486823237ffbe02f8d78df454332c615)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 14:40:28 +0000 (16:40 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in chash_init_server_tree
A memory allocation failure happening in chash_init_server_tree while
trying to allocate a server's lb_nodes item used in consistent hashing
would have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
476462010ecc2329e9ecba5afd20f84d9605d809)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
c5c5bc4e36ce4a6f3bc113c8e16824fdb276c220)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 10:00:54 +0000 (12:00 +0200)]
BUG/MINOR: http: Missing calloc return value check in make_arg_list
A memory allocation failure happening in make_arg_list when trying to
allocate the argument list would have resulted in a crash. This function
is only called during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
17acbab0ac5933205788eb0a134f9e918247efef)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
6d303d66057957dead2b26e8cd865cd30054902d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 09:32:04 +0000 (11:32 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing redirect rule
A memory allocation failure happening in http_parse_redirect_rule when
trying to allocate a redirect_rule structure would have resulted in a
crash. This function is only called during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
b6864a5b6fa1ea6eb224710d00c897105b9e2a46)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
e10a8f6b9c9437efe8b78239c92ed18775353d1c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 19 May 2021 08:45:12 +0000 (10:45 +0200)]
BUG/MINOR: worker: Missing calloc return value check in mworker_env_to_proc_list
A memory allocation failure happening in mworker_env_to_proc_list when
trying to allocate a mworker_proc would have resulted in a crash. This
function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
1f4fa906c73e61dba74f9e8b762da12df3052f57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
d90aa347d2a3a81bae8461c009c9b4d69eb069a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Mon, 17 May 2021 08:35:08 +0000 (10:35 +0200)]
BUG/MINOR: compression: Missing calloc return value check in comp_append_type/algo
A memory allocation failure happening in comp_append_type or
comp_append_algo called while parsing compression options would have
resulted in a crash. These functions are only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
6443bcc2e1f2e1e11af76ef460d8241f06223de8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
a3a8d50d992ab145e5c353cc36e8a787d3aead57)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Mon, 17 May 2021 08:08:16 +0000 (10:08 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing tcp-request rule
A memory allocation failure happening in tcp_parse_request_rule while
processing the "capture" keyword and trying to allocate a cap_hdr
structure would have resulted in a crash. This function is only called
during configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
8cb033643ff3235ac0d3887167ce06fefeaf850b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
31d170c581e8e4f726d7a54288b969cd38dde4b5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:24:18 +0000 (18:24 +0200)]
BUG/MINOR: http: Missing calloc return value check while parsing tcp-request/tcp-response
A memory allocation failure happening in tcp_parse_tcp_req or
tcp_parse_tcp_rep when trying to allocate an act_rule structure would
have resulted in a crash. These functions are only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
2ca42b4656f60655a5295a66f321239faee2d9fe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
8f86a29d20a094fede1b60695754aed3bebabe44)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:07:27 +0000 (18:07 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in proxy_defproxy_cpy
A memory allocation failure happening in proxy_defproxy_cpy while
copying the default compression options would have resulted in a crash.
This function is called for every new proxy found while parsing the
configuration.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
18a82ba690a6ff4adbf9702cefa6dc89eb36372d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
4d8e20f83cc07c08c1d562e6afc5b4d3a54f4058)
[Cf: Patch applied on cfgparse-listen.c]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 16:04:46 +0000 (18:04 +0200)]
BUG/MINOR: proxy: Missing calloc return value check in proxy_parse_declare
A memory allocation failure happening during proxy_parse_declare while
processing the "capture" keyword and allocating a cap_hdr structure
would have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
55ba0d6865662036d1d137dc4aac3841d29ad6d3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
581d892c823ec18ff7066af696c2017c664959da)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:54:17 +0000 (17:54 +0200)]
BUG/MINOR: http: Missing calloc return value check in parse_http_req_capture
A memory allocation failure happening in parse_http_req_capture while
processing a "len" keyword and allocating a cap_hdr structure would
have resulted in a crash. This function is only called during
configuration parsing.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
a4bf8a059dd9c783c00680aedcf055faa4b5d784)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
737d8b5c7bbcc38152b5621715f6f34b94d6fc37)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:45:21 +0000 (17:45 +0200)]
BUG/MINOR: ssl: Missing calloc return value check in ssl_init_single_engine
A memory allocation failure happening during ssl_init_single_engine
would have resulted in a crash. This function is only called during
init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
612b2c37be70636e73698c6c064af9522301be65)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
46697c80fec88e656a4f7280a2cfa9dedbfb33a5)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 15:39:04 +0000 (17:39 +0200)]
BUG/MINOR: peers: Missing calloc return value check in peers_register_table
A memory allocation failure happening during peers_register_table would
have resulted in a crash. This function is only called during init.
It was raised in GitHub issue #1233.
It could be backported to all stable branches.
(cherry picked from commit
208ff01b23eda19fe41780310f4d08ca86e28766)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
be5b1bb8cb683290f80856a3728ccaf3ad47b44b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 12 May 2021 07:44:06 +0000 (09:44 +0200)]
BUG/MINOR: server: Missing calloc return value check in srv_parse_source
Two calloc calls were not checked in the srv_parse_source function.
Considering that this function could be called at runtime through a
dynamic server creation via the CLI, this could lead to an unfortunate
crash.
It was raised in GitHub issue #1233.
It could be backported to all stable branches even though the runtime
crash could only happen on branches where dynamic server creation is
possible.
(cherry picked from commit
f1800e64ef2428747e696b0ef2f78ab05a116dcc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
dc187917dfbcb851daf7f8dd78a1ade5331d5693)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 26 May 2021 10:15:37 +0000 (12:15 +0200)]
BUG/MINOR: http-ana: Handle L7 retries on refused early data before K/A aborts
When a network error occurred on the server side, if it is not the first
request (in case of keep-alive), nothing is returned to the client and its
connexion is closed to be sure it may retry. However L7 retries on refused
early data (0rtt-rejected) must be performed first.
In addition, such L7 retries must also be performed before incrementing the
failed responses counter.
This patch must be backported as far as 2.0.
(cherry picked from commit
d976923ab27e8aeb6cc6518232b5d2835401b1c8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
b187534982d285f569ec9731f4bce0287232fd25)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 21 May 2021 07:49:20 +0000 (09:49 +0200)]
BUG/MINOR: http-comp: Preserve HTTP_MSGF_COMPRESSIONG flag on the response
This flag is set on the response when its payload is compressed by HAProxy.
It must be preserved because it may be used when the log message is emitted.
When the compression filter was refactored to support the HTX, an
optimization was added to not perform extra proessing on the trailers.
HTTP_MSGF_COMPRESSIONG flag is removed when the last data block is
compressed. It is not required, it is just an optimization and unfortunately
a bug. This optimization must be removed to preserve the flag.
This patch must be backported as far as 2.0. On the HTX is affected.
(cherry picked from commit
acfd71b97aecea49bc3e3fad18d662960b750970)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
73f40e02b055f4fcc92fec9fe12e78034fde0260)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 20 May 2021 16:00:55 +0000 (18:00 +0200)]
BUG/MEDIUM: filters: Exec pre/post analysers only one time per filter
For each filter, pre and post callback functions must only be called one
time. To do so, when one of them is finished, the corresponding analyser bit
must be removed from pre_analyzers or post_analyzers bit field. It is only
an issue with pre-analyser callback functions if the corresponding analyser
yields. It may happens with lua action for instance. In this case, the
filters pre analyser callback function is unexpectedly called several times.
This patch should fix the issue #1263. It must be backported is all stable
versions.
(cherry picked from commit
a6d3704e38b6c1cb09286e9778b6363825d65c4d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
476013c1213bd053c4c29984fc07c980e71893a2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 17 May 2021 08:47:18 +0000 (10:47 +0200)]
BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'
A deadlock is possible with 'set maxconn server' command, if there is
pending connection ready to be dequeued. This is caused by the locking
of server spinlock in both cli_parse_set_maxconn_server and
process_srv_queue.
Fix this by reducing the scope of the server lock into
server_parse_maxconn_change_request. If connection are dequeued, the
lock is taken a second time. This can be seen as suboptimal but as it
happens only during 'set maxconn server' it can be considered as
tolerable.
This issue was reported on the mailing list, for the 1.8.x branch.
It must be backported up to the 1.8.
(cherry picked from commit
79a88ba3d09f7e2b73ae27cb5d24cc087a548fa6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
3cf814498e8196a9bc5dade38a5de77bd4ee6431)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Tue, 18 May 2021 16:56:42 +0000 (18:56 +0200)]
BUG/MEDIUM: ebtree: Invalid read when looking for dup entry
The first item inserted into an ebtree will be inserted directly below
the root, which is a simple struct eb_root which only holds two branch
pointers (left and right).
If we try to find a duplicated entry to this first leaf through a
ebmb_next_dup, our leaf_p pointer will point to the eb_root instead of a
complete eb_node so we cannot look for the bit part of our leaf_p since
it would try to cast our eb_root into an eb_node and perform an out of
bounds access when reading "eb_root_to_node(eb_untag(t,EB_LEFT)))->bit".
This bug was found by address sanitizer running on a CRL hot update VTC
test.
Note that the bug has been there since the import of the eb_next_dup()
and eb_prev_dup() function in 1.5-dev19 by commit
2b5702030 ("MINOR:
ebtree: add new eb_next_dup/eb_prev_dup() functions to visit duplicates").
It can be backported to all stable branches.
(cherry picked from commit
2608e348bec8a533225424fad06760ce6e19f167)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit
0179e1d7aa4a15b5325e8e3744ef302a488893d6)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 12:37:07 +0000 (14:37 +0200)]
REGTESTS: Add script to test abortonclose option
This script test abortonclose option for HTTP/1 client only. It may be
backported as far as 2.0. But on the 2.2 and prior, the syslog part must be
adapted to catch log messages emitted by proxy during HAProxy
startup. Following lines must be added :
recv
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe1 started."
recv
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Proxy fe2 started."
(cherry picked from commit
16b37510bc1114ab7603240f8bbfeea20e5ee23b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:42:59 +0000 (18:42 +0200)]
BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
On client side, if CO_RFL_KEEP_RECV flags is set when h1_rcv_buf() is
called, we force subscription for reads to be able to catch read0. This way,
the event will be reported to upper layer to let the stream abort the
request.
This patch fixes the abortonclose option for H1 connections. It depends on
following patches :
* MEDIUM: mux-h1: Don't block reads when waiting for the other side
* MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
But to be sure the event is handled by the stream, the following patches are
also required :
* BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
* MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
All the series must be backported with caution as far as 2.0, and only after
a period of observation to be sure nothing broke.
(cherry picked from commit
1baef1523d1be1b6edf76c78688af932d4d36d8a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:34:30 +0000 (18:34 +0200)]
MEDIUM: mux-h1: Don't block reads when waiting for the other side
When we are waiting for the other side to read more data, or to read the
next request, we must only stop the processing of input data and not the
data receipt. This patch don't change anything on the subscribes for
reads. So it should not change anything. The only difference is that the H1
connection will try to read data if it is woken up for an I/O event and if
it was subscribed for reads.
This patch is required to fix abortonclose option for H1 client connections.
(cherry picked from commit
ec4207cb68b1d7d50e06d35aaa73586e2c7d46b0)
[Cf: H1C_F_IN_BUSY flag is used instead of H1C_F_WAIT_OUTPUT]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 8 Apr 2021 16:13:25 +0000 (18:13 +0200)]
MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
When the abortonclose option is enabled, to be sure to be immediately
notified when a shutdown is received from the client, the frontend
conn-stream must be sure the mux will wait for read events. To do so, the
CO_RFL_KEEP_RECV flag is set when mux->rcv_buf() is called. This new flag
instructs the mux to wait for read events, regardless its internal state.
This patch is required to fix abortonclose option for H1 client connections.
(cherry picked from commit
d8219b31e7cfe0dec64c33e1186e5f33cf87191a)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 06:45:05 +0000 (08:45 +0200)]
BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
In si_update_rx() function, the reads may be blocked because we explicitly
don't want to read or because of a lack of room in the input buffer. The
first condition is valid. However the second one only test if the channel is
empty or not. It means the reads are blocked if there are still some output
data in the input channel, in its buffer or its pipe. This condition is not
accurate. The reads must not be blocked if the channel can still receive
data. Thus instead of relying on channel_is_empty() function, we now call
channel_may_recv().
This patch is especially useful to be able to catch read0 on client side
when we are waiting for a connection to the server, when abortonclose option
is enabled. Otherwise, the client abort is not detected.
This patch depends on "MINOR: channel: Rely on HTX version if appropriate in
channel_may_recv()". Both must be backported as far as 2.0 after a period of
observation to be sure nothing broke.
(cherry picked from commit
e0dec4b7b258101f6d5faa15234103a45c16f0f8)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 7 Apr 2021 06:10:41 +0000 (08:10 +0200)]
MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
When channel_may_recv() is called for an HTX stream, the HTX version,
channel_htx_may_recv() is called. This patch is mandatory to fix a bug
related to the abortonclose option.
(cherry picked from commit
1c235e57d0087e86074178f682c24f8aa44e0fcd)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Mon, 10 May 2021 09:23:34 +0000 (11:23 +0200)]
BUG/MINOR: http_fetch: fix possible uninit sockaddr in fetch_url_ip/port
Check the return value of url2sa in smp_fetch_url_ip/port. If negative,
the address result is uninitialized and the sample fetch is aborted.
Also, the sockaddr is prelimiary zero'ed before calling url2sa to ensure
that it is not used by upper functions even if the sample returns 0.
Without the check, the value returned by the url_ip/url_port fetches is
unspecified. This can be triggered with the following curl :
$ curl -iv --request-target "xxx://127.0.0.1:20080/" http://127.0.0.1:20080/
This should be backported to all stable branches. However, note that
between the 1.8 and 2.0, the targetted functions have been extracted
from proto_http.c to http_fetch.c.
This should fix in part coverity report from the github issue #1244.
(cherry picked from commit
c89d5337ee5064091ceab7b2d967f99071dc39eb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Fri, 7 May 2021 09:45:26 +0000 (11:45 +0200)]
BUG/MINOR: checks: Reschedule check on observe mode only if fastinter is set
On observe mode, if a server is marked as DOWN, the server's health-check is
rescheduled using the fastinter timeout if the new expiration date is newer
that the current one. But this must only be performed if the fastinter
timeout is defined.
Internally, tick_is_lt() function only checks the date and does not perform any
verification on the provided args. Thus, we must take care of it. However, it is
possible to disable the server health-check by setting its task expiration date
to TICK_ETERNITY.
This patch must be backported as far as 2.2. It is related to
(cherry picked from commit
ea860837183c4800c97d6e8e7fb417d8f1ab2ef0)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 6 May 2021 14:01:18 +0000 (16:01 +0200)]
BUG/MINOR: checks: Handle synchronous connect when a tcpcheck is started
A connection may be synchronously established. In the tcpcheck context, it
may be a problem if several connections come one after another. In this
case, there is no event to close the very first connection before starting
the next one. The checks is thus blocked and timed out, a L7 timeout error
is reported.
To fix the bug, when a tcpcheck is started, we immediately evaluate its
state. Most of time, nothing is performed and we must wait. But it is thus
possible to handle the result of a successfull connection.
This patch should fix the issue #1234. It must be backported as far as 2.2.
(cherry picked from commit
92017a3215545c26cd9baad11a1cdfb2859acc6c)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 5 May 2021 19:05:09 +0000 (21:05 +0200)]
BUG/MINOR: stream: Reset stream final state and si error type on L7 retry
Thanks to a previous fix, the stream error mask is now cleared on L7
retry. But the stream final state (SF_FINST_*) and the stream-interface
error type must also be reset to properly restart a new connection and be
sure to not inherit errors from the previous connection attempt.
In addition, SF_ADDR_SET flag is not systematically removed.
stream_choose_redispatch() already takes care to unset it if necessary. When
the connection is not redispatch, the server address can be preserved.
This patch must be backported as far as 2.0.
(cherry picked from commit
30aa0da532a9853b3ed5b89152fd57537d07ee88)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 7 May 2021 06:19:30 +0000 (08:19 +0200)]
BUG/MINOR: stream: properly clear the previous error mask on L7 retries
The cleanup of the previous error was incorrect on L7 retries, it would
OR two values while they're part of an enum, leaving some bits set.
Depending on the errors it was possible to occasionally see an internal
error ("I" flag) being logged.
This should be backported as far as 2.0, though the do_l7_retry() function
in in proto_htx.c in older versions.
(cherry picked from commit
75a4284babf50feecac17f9b32d07509b6c519ed)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 5 May 2021 16:23:59 +0000 (18:23 +0200)]
BUG/MINOR: stream: Decrement server current session counter on L7 retry
When a L7 retry is performed, we must not forget to decrement the current
session counter of the assigned server. Of course, it must only be done if
the current session is already counted on the server, thus if SF_CURR_SESS
flag is set on the stream.
This patch is related to the issue #1003. It must be backported as far as
2.0.
(cherry picked from commit
e763c8c99f7134a009117a59e6a6002b3c9e8c84)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 4 May 2021 16:16:53 +0000 (18:16 +0200)]
BUG/MEDIUM: dns: reset file descriptor if send returns an error
The fd is purged if send returns -1 without EAGAIN and the
sock.fd is set to -1 to recreate a new fd on next
sending attempt on this nameserver.
Before this fix we continue to send on the fd returning
an error indefinitely.
This patch applies on branch 2.3 and is fixed in newer versions
by this commit (
d26a6237ad):
'MEDIUM: resolvers: split resolving and dns message exchange layers'
This patch should be backported an all supported branches 1.6 <= v <= 2.3
Willy Tarreau [Tue, 4 May 2021 16:40:50 +0000 (18:40 +0200)]
MINOR: debug: add a new "debug dev sym" command in expert mode
This command attempts to resolve a pointer to a symbol name. This is
convenient during development as it's easier to get such pointers live
than by issuing a debugger or calling addr2line.
(cherry picked from commit
48129be18afd17688bced0bba8c0d98e2b85ef39)
[wt: backported as sometimes useful when debugging in field;
s/_HA_ATOMIC_INC/_HA_ATOMIC_ADD(,1)/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Wed, 5 May 2021 05:29:01 +0000 (07:29 +0200)]
MINOR: pools/debug: slightly relax DEBUG_DONT_SHARE_POOLS
The purpose of this debugging option was to prevent certain pools from
masking other ones when they were shared. For example, task, http_txn,
h2s, h1s, h1c, session, fcgi_strm, and connection are all 192 bytes and
would normally be mergedi, but not with this option. The problem is that
certain pools are declared multiple times with various parameters, which
are often very close, and due to the way the option works, they're not
shared either. Good examples of this are captures and stick tables. Some
configurations have large numbers of stick-tables of pretty similar types
and it's very common to end up with the following when the option is
enabled:
$ socat - /tmp/sock1 <<< "show pools" | grep stick
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753800=56
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753880=57
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753900=58
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753980=59
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a00=60
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753a80=61
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753b00=62
- Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x753780=55
In addition to not being convenient, it can have important effects on the
memory usage because these pools will not share their entries, so one stick
table cannot allocate from another one's pool.
This patch solves this by going back to the initial goal which was not to
have different pools in the same list. Instead of masking the MAP_F_SHARED
flag, it simply adds a test on the pool's name, and disables pool sharing
if the names differ. This way pools are not shared unless they're of the
same name and size, which doesn't hinder debugging. The same test above
now returns this:
$ socat - /tmp/sock1 <<< "show pools" | grep stick
- Pool sticktables (160 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 7 users, @0x3fadb30 [SHARED]
- Pool sticktables (224 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x3facaa0 [SHARED]
This is much better. This should probably be backported, in order to limit
the side effects of DEBUG_DONT_SHARE_POOLS being enabled in production.
(cherry picked from commit
1ab6c0bfd2b8e178d78955e5a8279fe04ac45da1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Nov 2020 14:53:34 +0000 (15:53 +0100)]
MEDIUM: pools: call malloc_trim() from pool_gc()
If available it definitely makes sense to call it since it's also
called when stopping to reclaim the maximum possible memory.
(cherry picked from commit
88366c2926deac5ee257b6c541633b6a8b5111b3)
[wt: backported as it helps not only on reload but also on SIGQUIT to
try to return all unused memory]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 3 Nov 2020 14:50:40 +0000 (15:50 +0100)]
MINOR: compat: automatically include malloc.h on glibc
This is in order to access malloc_trim() which is convenient after
clearing huge maps to reclaim memory. When this is detected, we also
define HA_HAVE_MALLOC_TRIM.
(cherry picked from commit
1d3c7003d91a2c3b48f0f1fee3eb361f4121057b)
[wt: CONFIG_HAP_NO_GLOBAL_POOLS was backported via commitc0457dc81
but the optional HA_HAVE_MALLOC_TRIM was never set, and it was
shown to help in some cases]
Signed-off-by: Willy Tarreau <w@1wt.eu>
William Lallemand [Tue, 4 May 2021 14:17:27 +0000 (16:17 +0200)]
BUG/MINOR: ssl/cli: fix a lock leak when no memory available
This bug was introduced in e5ff4ad ("BUG/MINOR: ssl: fix a trash buffer
leak in some error cases").
When cli_parse_set_cert() returns because alloc_trash_chunk() failed, it
does not unlock the spinlock which can lead to a deadlock later.
Must be backported as far as 2.1 where e5ff4ad was backported.
(cherry picked from commit
5ba80d677d563517bb9754c272e6df94adae281b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Tue, 4 May 2021 14:27:45 +0000 (16:27 +0200)]
BUG/MEDIUM: cli: prevent memory leak on write errors
Since the introduction of payload support on the CLI in 1.9-dev1 by
commit
abbf60710 ("MEDIUM: cli: Add payload support"), a chunk is
temporarily allocated for the CLI to support defragmenting a payload
passed with a command. However it's only released when passing via
the CLI_ST_END state (i.e. on clean shutdown), but not on errors.
Something as trivial as:
$ while :; do ncat --send-only -U /path/to/cli <<< "show stat"; done
with a few hundreds of servers is enough see the number of allocated
trash chunks go through the roof in "show pools".
This needs to be backported as far as 2.0.
(cherry picked from commit
18b2a9dd874b66acca304580047fa6b3c16da1e3)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Christopher Faulet [Mon, 3 May 2021 08:11:13 +0000 (10:11 +0200)]
BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
When the lua buffers are used, a variable number of stack slots may be
used. Thus we cannot assume that we know where the top of the stack is. It
was not an issue for lua < 5.4.3 (at least for small buffers). But
'socket:receive()' now fails with lua 5.4.3 because a light userdata is
systematically pushed on the top of the stack when a buffer is initialized.
To fix the bug, in hlua_socket_receive(), we save the index of the top of
the stack before creating the buffer. This way, we can check the number of
arguments, regardless anything was pushed on the stack or not.
Note that the other buffer usages seem to be safe.
This patch should solve the issue #1240. It should be backport to all stable
branches.
(cherry picked from commit
c31b2008728babbc8738b88b593f2be40324369b)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Willy Tarreau [Thu, 29 Apr 2021 14:17:23 +0000 (16:17 +0200)]
REGTESTS: add minimal CLI "add map" tests
The map_redirect test already tests for "show map", "del map" and
"clear map" but doesn't have any "add map" command. Let's add some
trivial ones involving one regular entry and two other ones added as
payload, checking they are properly returned.
(cherry picked from commit
deee369cfabdea713e232c94c5d15f9613109fe9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
Emeric Brun [Wed, 28 Apr 2021 10:59:35 +0000 (12:59 +0200)]
MINOR: peers: add informative flags about resync process for debugging
This patch adds miscellenous informative flags raised during the initial
full resync process performed during the reload for debugging purpose.
0x00000010: Timeout waiting for a full resync from a local node
0x00000020: Timeout waiting for a full resync from a remote node
0x00000040: Session aborted learning from a local node
0x00000080: Session aborted learning from a remote node
0x00000100: A local node teach us and was fully up to date
0x00000200: A remote node teach us and was fully up to date
0x00000400: A local node teach us but was partially up to date
0x00000800: A remote node teach us but was partially up to date
0x00001000: A local node was assigned for a full resync
0x00002000: A remote node was assigned for a full resync
0x00004000: A resync was explicitly requested
This patch could be backported on any supported branch
(cherry picked from commit
ccdfbae62cc5e5023bc857cbc04f680145435a1e)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 20 Apr 2021 12:43:46 +0000 (14:43 +0200)]
BUG/MEDIUM: peers: reset tables stage flags stages on new conns
Flags used as context to know current status of each table pushing a
full resync to a peer were correctly reset receiving a new resync
request or confirmation message but in case of local peer sync during
reload the resync request is implicit and those flags were not
correctly reset in this case.
This could result to a partial initial resync of some tables after reload
if the connection with the old process was broken and retried.
This patch reset those flags at the end of the handshake for all new
connections to be sure to push a entire full resync if needed.
This patch should be backported on all supported branches ( v >= 1.6 )
(cherry picked from commit
1a6b43e13eeb4872dff9944064ddadc1ceb5178f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Wed, 28 Apr 2021 09:48:15 +0000 (11:48 +0200)]
BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
Only entries between the opposite of the last 'local update' rotating
counter were considered to be pushed. This processing worked in most
cases because updates are continually pushed trying to reach this point
but it remains some cases where updates id are more far away in the past
and appearing in futur and the push of updates is stuck until the head
reach again the tail which could take a very long time.
This patch re-work the lookup to consider that all positions on the
rotating counter is considered in the past until we reach exactly
the 'local update' value. Doing this, the updates push won't be stuck
anymore.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
8e7a13ed66aad97c550f3a6600145ed7c757a894)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 23 Feb 2021 16:08:08 +0000 (17:08 +0100)]
BUG/MEDIUM: peers: reset commitupdate value in new conns
The commitupdate value of the table is used to check if the update
is still pending for a push for all peers. To be sure to not miss a
push we reset it just after a handshake success.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
cc9cce9351c2d64476dbe2f35c43cde8a3f97ebe)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Tue, 23 Feb 2021 15:50:53 +0000 (16:50 +0100)]
BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
If two peers are disconnected and during this period they continue to
process a large amount of local updates, after a reconnection they
may take a long time before restarting to push their updates. because
the last pushed update would appear internally in futur.
This patch fix this resetting the cursor on acked updates at the maximum
point considered in the past if it appears in futur but it means we
may lost some updates. A clean fix would be to update the protocol to
be able to signal a remote peer that is was not updated for a too long
period and needs a full resync but this is not yet supported by the
protocol.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
d9729da98262f2136ad4eac44c3ec2f710cb4a49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Thu, 4 Mar 2021 09:27:10 +0000 (10:27 +0100)]
BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
The re-con cursor was updated receiving any ack message
even if we are pushing a complete resync to a peer. This cursor
is reset at the end of the resync but if the connection is broken
during resync, we could re-start at an unwanted point.
With this patch, the peer stops to consider ack messages pushing
a resync since the resync process has is own acknowlegement and
is always restarted from the beginning in case of broken connection.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
b0d60bed36cfbfead5a35e6a9520e8d5e9345a7f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Wed, 28 Apr 2021 07:49:33 +0000 (09:49 +0200)]
BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
Receiving a resync request, the origins to start the full sync and
to reset after the full resync are mistakenly computed based on
the last update on the table instead of computed based on the
the last update acked by the node requesting the resync.
It could result in disordered or missing updates pushing to the
requester
This patch sets correctly those origins.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
437e48ad9244eaea63881b6c8a86a380c4eecc23)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Wed, 21 Apr 2021 14:06:35 +0000 (16:06 +0200)]
BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
If a reload is performed and there is no incoming connections
from the old process to push a full resync, the new process
can be stuck waiting indefinitely for this conn and it never tries a
fallback requesting a full resync from a remote peer because the resync
timer was init to TICK_ETERNITY.
This patch forces a reset of the resync timer to default value (5 secs)
if we detect value is TICK_ETERNITY.
This patch should be backported on all supported branches ( >= 1.6 )
(cherry picked from commit
2c4ab4181601b1956095eb9b2c9d99984828ccf9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 27 Apr 2021 15:08:10 +0000 (17:08 +0200)]
BUG/MINOR: applet: Notify the other side if data were consumed by an applet
If an applet consumed output data (the amount of output data has changed
between before and after the call to the applet), the producer is
notified. It means CF_WRITE_PARTIAL and CF_WROTE_DATA are set on the output
channel and the opposite stream interface is notified some room was made in
its input buffer. This way, it is no longer the applet responsibility to
take care of it. However, it doesn't matter if the applet does the same.
Said like that, it looks like an improvement not a bug. But it really fixes
a bug in the lua, for HTTP applets. Indeed, applet:receive() and
applet:getline() are buggy for HTTP applets. Data are consumed but the
producer is not notified. It means if the payload is not fully received in
one time, the applet may be blocked because the producer remains blocked (it
is time dependent).
This patch must be backported as far as 2.0 (only for the HTX part).
(cherry picked from commit
1eedf9b4cb71da642399ae1efbfad2c1cdad6c33)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 22 Apr 2021 07:43:47 +0000 (09:43 +0200)]
BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
When all data of an HTX message are drained, we rely on htx_reset() to
reinit the message state. However, the flags must be preserved. It is, among
other things, important to preserve processing or parsing errors.
This patch must be backported as far as 2.0.
(cherry picked from commit
5e9b24f4b4e31958d97ce445bb993feb7ebc1421)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Mon, 26 Apr 2021 07:38:55 +0000 (09:38 +0200)]
BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
A normalized URI is the internal term used to specify an URI is stored using
the absolute format (scheme + authority + path). For now, it is only used
for H2 clients. It is the default and recommended format for H2 request.
However, it is unusual for H1 servers to receive such URI. So in this case,
we only send the path of the absolute URI. It is performed for H1 servers,
but not for FCGI applications. This patch fixes the difference.
Note that it is not a real bug, because FCGI applications should support
abosolute URI.
Note also a normalized URI is only detected for H2 clients when a request is
received. There is no such test on the H1 side. It means an absolute URI
received from an H1 client will be sent without modification to an H1 server
or a FCGI application.
To make it possible, a dedicated function has been added to get the H1
URI. This function is called by the H1 and the FCGI multiplexer when a
request is sent to a server.
This patch should fix the issue #1232. It must be backported as far as 2.2.
(cherry picked from commit
fb38c910f8f2bdf1b32ec74824d8f8a8f9a29a8f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 23 Apr 2021 16:51:30 +0000 (18:51 +0200)]
[RELEASE] Released version 2.3.10
Released version 2.3.10 with the following main changes :
- BUILD: backend: fix build breakage in idle conn locking fix
- BUG/MINOR: tcp: fix silent-drop workaround for IPv6
- BUILD: tcp: use IPPROTO_IPV6 instead of SOL_IPV6 on FreeBSD/MacOS
- BUG/MINOR: ssl: Fix update of default certificate
- BUG/MINOR: ssl: Prevent removal of crt-list line if the instance is a default one
- BUG/MINOR: http_fetch: make hdr_ip() resistant to empty fields
- BUG/MINOR: ssl: Add missing free on SSL_CTX in ckch_inst_free
- REGTESTS: ssl: "set ssl cert" and multi-certificates bundle
- DOC: Explicitly state only IPv4 are supported by forwardfor/originalto options
- REGTESTS: ssl: mark set_ssl_cert_bundle.vtc as broken
- CONTRIB: halog: fix issue with array of type char
- BUG/MINOR: tools: fix parsing "us" unit for timers
- DOC: clarify that compression works for HTTP/2
- MINOR: No longer rely on deprecated sample fetches for predefined ACLs
- BUG/MEDIUM: sample: Fix adjusting size in field converter
- DOC: ssl: Certificate hot update only works on fronted certificates
- BUG/MEDIUM: threads: Ignore current thread to end its harmless period
- BUG/MINOR: checks: Set missing id to the dummy checks frontend
- MINOR: logs: Add support of checks as session origin to format lf strings
- BUG/MINOR: connection: Fix fc_http_major and bc_http_major for TCP connections
- MINOR: connection: Make bc_http_major compatible with tcp-checks
- BUG/MINOR: ssl-samples: Fix ssl_bc_* samples when called from a health-check
- BUG/MINOR: http-fetch: Make method smp safe if headers were already forwarded
- BUG/MINOR: http_htx: Remove BUG_ON() from http_get_stline() function
- BUG/MINOR: logs: Report the true number of retries if there was no connection
- BUG/MINOR: mux-h1: Release idle server H1 connection if data are received
- BUG/MINOR: server: free srv.lb_nodes in free_server
- BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
- BUG/MEDIUM: mux-h2: Fix dfl calculation when merging CONTINUATION frames
- BUG/MEDIUM: config: fix cpu-map notation with both process and threads
- BUG/MINOR: mworker/init: don't reset nb_oldpids in non-mworker cases
- BUG/MINOR: mworker: don't use oldpids[] anymore for reload
- BUG/MEDIUM: mux-h2: Properly handle shutdowns when received with data
- BUG/MINOR: peers: remove useless table check if initial resync is finished
- BUG/MEDIUM: peers: re-work connection to new process during reload.
- BUG/MEDIUM: peers: re-work refcnt on table to protect against flush
Emeric Brun [Fri, 23 Apr 2021 15:02:22 +0000 (17:02 +0200)]
BUG/MEDIUM: peers: re-work refcnt on table to protect against flush
In proxy.c, when process is stopping we try to flush tables content
using 'stktable_trash_oldest'. A check on a counter "table->syncing" was
made to verify if there is no pending resync in progress.
But using multiple threads this counter can be increased by an other thread
only after some delay, so the content of some tables can be trashed earlier and
won't be pushed to the new process (after reload, some tables appear reset and
others don't).
This patch re-names the counter "table->syncing" to "table->refcnt" and
the counter is increased during configuration parsing (registering a table to
a peer section) to protect tables during runtime and until resync of a new
process has succeeded or failed.
The inc/dec operations are now made using atomic operations
because multiple peer sections could refer to the same table in futur.
This fix addresses github #1216.
This patch should be backported on all branches multi-thread support (v >= 1.8)
(cherry picked from commit
2cc201f97e409f926b9221fdafe7431877ba1dc6)
Signed-off-by: Emeric Brun <ebrun@haproxy.com>
Emeric Brun [Thu, 22 Apr 2021 16:20:37 +0000 (18:20 +0200)]
BUG/MEDIUM: peers: re-work connection to new process during reload.
The peers task handling the "stopping" could wake up multiple
times in stopping state with WOKEN_SIGNAL: the connection to the
local peer initiated on the first processing was immediatly
shutdown by the next processing of the task and the old process
exits considering it is unable to connect. It results on
empty stick-tables after a reload.
This patch checks the flag 'PEERS_F_DONOTSTOP' to know if the
signal is considered and if remote peers connections shutdown
is already done or if a connection to the local peer must be
established.
This patch should be backported on all supported branches (v >= 1.6)
(cherry picked from commit
cbfe5ebc1cc362b0ca33206ba6f49923587a372b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Emeric Brun [Thu, 22 Apr 2021 16:13:13 +0000 (18:13 +0200)]
BUG/MINOR: peers: remove useless table check if initial resync is finished
The old process checked each table resync status even if
the resync process is finished. This behavior had no known impact
except useless processing and was discovered during debugging on
an other issue.
This patch could be backported in all supported branches (v >= 1.6)
but once again, it has no impact except avoid useless processing.
(cherry picked from commit
1675ada4f4fda2dfc51c8bd1a4f24ec8513ae473)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 23 Apr 2021 10:25:18 +0000 (12:25 +0200)]
BUG/MEDIUM: mux-h2: Properly handle shutdowns when received with data
The H2_CF_RCVD_SHUT flag is used to report a read0 was encountered. It is
used by the H2 mux to properly handle shutdowns. However, this flag is only
set when no data are received. If it is detected at the socket level when
some data are received, it is not handled. And because the event was
reported on the connection, any other read attempts are blocked. In this
case, we are unable to close the connection and release the mux
immediately. We must wait the mux timeout expires.
This patch should fix the issue #1231. It must be backported as far as 2.0.
(cherry picked from commit
de9d605aa56d257e3720f080a8c7f30b54fa6543)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Wed, 21 Apr 2021 14:55:34 +0000 (16:55 +0200)]
BUG/MINOR: mworker: don't use oldpids[] anymore for reload
Since commit 3f12887 ("MINOR: mworker: don't use children variable
anymore"), the oldpids array is not used anymore to generate the new -sf
parameters. So we don't need to set nb_oldpids to 0 during the first
start of the master process.
This patch fixes a bug when 2 masters process tries to synchronize their
peers, there is a small chances that it won't work because nb_oldpids
equals 0.
Should be backported as far as 2.0.
(cherry picked from commit
aba7f8b3132de3b29efe634cad8fb574f259549b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Wed, 21 Apr 2021 14:42:18 +0000 (16:42 +0200)]
BUG/MINOR: mworker/init: don't reset nb_oldpids in non-mworker cases
This bug affects the peers synchronisation code which rely on the
nb_oldpids variable to synchronize the peer from the old PID.
In the case the process is not started in master-worker mode and tries
to synchronize using the peers, there is a small chance that won't work
because nb_oldpids equals 0.
Fix the bug by setting the variable to 0 only in the case of the
master-worker when not reloaded.
It could also be a problem when trying to synchronize the peers between
2 masters process which should be fixed in another patch.
Bug exists since commit 8a361b5 ("BUG/MEDIUM: mworker: don't reuse PIDs
passed to the master").
Sould be backported as far as 1.8.
(cherry picked from commit
ea6bf83d621ceeaf3f7e7de59b45fa00940b784d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Thu, 15 Apr 2021 14:29:58 +0000 (16:29 +0200)]
BUG/MEDIUM: config: fix cpu-map notation with both process and threads
The application of a cpu-map statement with both process and threads
is broken (P-Q/1 or 1/P-Q notation).
For example, before the fix, when using P-Q/1, proc_t1 would be updated.
Then it would be AND'ed with thread which is still 0 and thus does
nothing.
Another problem is when using 1/1[-Q], thread[0] is defined. But if
there is multiple processes, every processes will use this define
affinity even if it should be applied only to 1st process.
The solution to the fix is a little bit too complex for my taste and
there is maybe a simpler solution but I did not wish to break the
storage of global.cpu_map, as it is quite painful to test all the
use-cases. Besides, this code will probably be clean up when
multiprocess support removed on the future version.
Let's try to explain my logic.
* either haproxy runs in multiprocess or multithread mode. If on
multiprocess, we should consider proc_t1 (P-Q/1 notation). If on
multithread, we should consider thread (1/P-Q notation). However
during parsing, the final number of processes or threads is unknown,
thus we have to consider the two possibilities.
* there is a special case for the first thread / first process which is
present in both execution modes. And as a matter of fact cpu-map 1 or
1/1 notation represents the same thing. Thus, thread[0] and proc_t1[0]
represents the same thing. To solve this problem, only thread[0] is
used for this special case.
This fix must be backported up to 2.0.
(cherry picked from commit
af02c574064873cac3fb10993293d34a4eb94d2f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 21 Apr 2021 09:11:21 +0000 (11:11 +0200)]
BUG/MEDIUM: mux-h2: Fix dfl calculation when merging CONTINUATION frames
When header are splitted over several frames, payload of HEADERS and
CONTINUATION frames are merged to form a unique HEADERS frame before
decoding the payload. To do so, info about the current frame are updated
(dff, dfl..) with info of the next one. Here there is a bug when the frame
length (dfl) is update. We must add the next frame length (hdr.dfl) and not
only the amount of data found in the buffer (clen). Because HEADERS frames
are decoded in one pass, dfl value is the whole frame length or 0. nothing
intermediary.
This patch must be backported as far as 2.0.
(cherry picked from commit
cb1847c77285ba6dbd413774fcf2282cafa19bd2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 21 Apr 2021 08:39:53 +0000 (10:39 +0200)]
BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
In the function decoding payload of HEADERS frames, an internal error is
returned if the frame length is too large. it cannot exceed the buffer
size. The same is true when headers are splitted on several frames. The
payload of HEADERS and CONTINUATION frames are merged and the overall size
must not exceed the buffer size.
However, there is a bug when the current frame is big enough to only have
the space for a part of the header of the next frame. Because, in this case,
we wait for more data, to have the whole frame header. We don't properly
detect that the headers are too large to be stored in one buffer. In fact
the test to trigger this error is not accurate. When the buffer is full, the
error is reported if the frame length exceeds the amount of data in the
buffer. But in reality, an error must be reported when we are unable to
decode the current frame while the buffer is full. Because, in this case, we
know there is no way to change this state.
When the bug happens, the H2 connection is woken up in loop, consumming all
the CPU. But the traffic is not blocked for all that.
This patch must be backported as far as 2.0.
(cherry picked from commit
07f88d7582c80522b1e83b9bbc473d338e48fb85)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Amaury Denoyelle [Tue, 20 Apr 2021 14:48:22 +0000 (16:48 +0200)]
BUG/MINOR: server: free srv.lb_nodes in free_server
lb_nodes is allocated for servers using lb_chash (balance random or
hash-type consistent).
It can be backported up to 1.8.
(cherry picked from commit
fb247946a1b0a6dc6b1e6b27631adf1ef40c0ee3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
[Cf: Patch applied on haproxy.c because servers are freed in deinit()]
Christopher Faulet [Mon, 19 Apr 2021 17:57:43 +0000 (19:57 +0200)]
BUG/MINOR: mux-h1: Release idle server H1 connection if data are received
When data are received on an idle H1 connection on server side, we must take
care to release the connection. Most of time, it will be a
408-Request-Time-out response. But, in all cases, these data should never be
processed as a response to a client.
There is no upstream ID because the 2.4 is not affected by this bug. It must
be backported as far as 2.0.
Christopher Faulet [Fri, 16 Apr 2021 09:24:20 +0000 (11:24 +0200)]
BUG/MINOR: logs: Report the true number of retries if there was no connection
When the session is aborted before any connection attempt to any server, the
number of connection retries reported in the logs is wrong. It happens
because when the retries counter is not strictly positive, we consider the
max number of retries was reached and the backend retries value is used. It
is obviously wrong when no connectioh was performed.
In fact, at this stage, the retries counter is initialized to 0. But the
backend stream-interface is in the INI state. Once it is set to SI_ST_REQ,
the counter is set to the backend value. And it is the only possible state
transition from INI state. Thus it is safe to rely on it to fix the bug.
This patch must be backported to all stable versions.
(cherry picked from commit
1d26f22e0568d6761b1e0a522079b23387bace80)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 15 Apr 2021 08:25:35 +0000 (10:25 +0200)]
BUG/MINOR: http_htx: Remove BUG_ON() from http_get_stline() function
The http_get_stline() was designed to be called from HTTP analyzers. Thus
before any data forwarding. To prevent any invalid usage, two BUG_ON()
statements were added. However, it is not a good idea because it is pretty
hard to be sure no HTTP sample fetch will never be called outside the
analyzers context. Especially because there is at least one possible area
where it may happens. An HTTP sample fetch may be used inside the unique-id
format string. On the normal case, it is generated in AN_REQ_HTTP_INNER
analyzer. But if an error is reported too early, the id is generated when
the log is emitted.
So, it is safer to remove the BUG_ON() statements and consider the normal
behavior is to return NULL if the first block is not a start-line. Of
course, this means all calling functions must test the return value or be
sure the start-line is really there.
This patch must be backported as far as 2.0.
(cherry picked from commit
a7d6cf24fb11b723f926e6e562f6c8e97b2935d3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 15 Apr 2021 07:28:02 +0000 (09:28 +0200)]
BUG/MINOR: http-fetch: Make method smp safe if headers were already forwarded
When method sample fetch is called, if an exotic method is found
(HTTP_METH_OTHER), when smp_prefetch_htx() is called, we must be sure the
start-line is still there. Otherwise, HAproxy may crash because of a NULL
pointer dereference, for instance if the method sample fetch is used inside
a unique-id format string. Indeed, the unique id may be generated when the
log message is emitted. At this stage, the request channel is empty.
This patch must be backported as far as 2.0. But the bug exists in all
stable versions for the legacy HTTP mode too. Thus it must be adapted to the
legacy HTTP mode and backported to all other stable versions.
(cherry picked from commit
6f97a611c83d7135d77407f7327980994413b842)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 14 Apr 2021 13:49:41 +0000 (15:49 +0200)]
BUG/MINOR: ssl-samples: Fix ssl_bc_* samples when called from a health-check
For all ssl_bc_* sample fetches, the test on the keyword when called from a
health-check is inverted. We must be sure the 5th charater is a 'b' to
retrieve a connection.
This patch must be backported as far as 2.2.
(cherry picked from commit
4bef8d1d46ba596f1cc1595a63a01b6d9ae0f6cc)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 14 Apr 2021 13:46:49 +0000 (15:46 +0200)]
MINOR: connection: Make bc_http_major compatible with tcp-checks
bc_http_major sample fetch now works when it is called from a
tcp-check. When it happens, the session origin is a check. The backend
connection is retrieved from the conn-stream attached to the check.
If required, this path may easily be backported as far as 2.2.
(cherry picked from commit
242f8ce060e4d77da16ae43a58598a28b556ee5b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 14 Apr 2021 13:40:30 +0000 (15:40 +0200)]
BUG/MINOR: connection: Fix fc_http_major and bc_http_major for TCP connections
fc_http_major and bc_http_major sample fetches return the major digit of the
HTTP version used, respectively, by the frontend and the backend
connections, based on the mux. However, in reality, "2" is returned if the
H2 mux is detected, otherwise "1" is inconditionally returned, regardless
the mux used. Thus, if called for a raw TCP connection, "1" is returned.
To fix this bug, we now get the multiplexer flags, if there is one, to be
sure MX_FL_HTX is set.
I guess it was made this way on purpose when the H2 multiplexer was
introduced in the 1.8 and with the legacy HTTP mode there is no other
solution at the connection level. Thus this patch should be backported as
far as 2.2. For the 2.0, it must be evaluated first because of the legacy
HTTP mode.
(cherry picked from commit
f4dd9ae5c7ca65cd5c2652343e4af5ca7a97ab32)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 14 Apr 2021 12:01:41 +0000 (14:01 +0200)]
MINOR: logs: Add support of checks as session origin to format lf strings
When a log-format string is built from an health-check, the session origin
is the health-check itself and not a connection. In addition, there is no
stream. It means for now some formats are not supported: %s, %sc, %b, %bi,
%bp, %si and %sp.
Thanks to this patch, the session origin is converted to a check. So it is
possible to retrieve the backend and the backend connection. Note this
session have no listener, thus %ft format must be guarded.
This patch is light and standalone, thus it may be backported as far as 2.2
if required. However, because the error is human, it is probably better to
wait a bit to be sure everything is properly protected.
(cherry picked from commit
fd81848c2210fd88fdaa088723240930f3db4d63)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Fri, 16 Apr 2021 08:49:07 +0000 (10:49 +0200)]
BUG/MINOR: checks: Set missing id to the dummy checks frontend
The dummy frontend used to create the session of the tcp-checks is
initialized without identifier. However, it is required because this id may
be used without any guard, for instance in log-format string via "%f" or
when fe_name sample fetch is called. Thus, an unset id may lead to crashes.
This patch must be backported as far as 2.2.
(cherry picked from commit
0f1fc23d4e3060c4277c11c9f1f41c27d8e54fbf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Wed, 14 Apr 2021 12:02:25 +0000 (14:02 +0200)]
BUG/MEDIUM: threads: Ignore current thread to end its harmless period
A previous patch was pushed to fix a deadlock when an isolated thread ends
its harmless period (
a9a9e9aac ["BUG/MEDIUM: thread: Fix a deadlock if an
isolated thread is marked as harmless"]). But, unfortunately, the fix is
incomplete. The same must be done in the outer loop, in
thread_harmless_end() function. The current thread must be ignored when
threads_want_rdv_mask mask is tested.
This patch must also be backported as far as 2.0.
(cherry picked from commit
f63a18550073e821976606f9602c261976939ae9)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Remi Tricot-Le Breton [Wed, 14 Apr 2021 14:19:28 +0000 (16:19 +0200)]
DOC: ssl: Certificate hot update only works on fronted certificates
The CLI's "set ssl cert" command only works on frontend certificates but
the documentation did not specify this limitations yet.
This patch can be backported to all stable branches.
(cherry picked from commit
3445909a63697c46a86961a93fc3e4a79a835b87)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Thayne McCombs [Mon, 12 Apr 2021 05:26:59 +0000 (23:26 -0600)]
BUG/MEDIUM: sample: Fix adjusting size in field converter
Adjust the size of the sample buffer before we change the "area"
pointer. The change in size is calculated as the difference between the
original pointer and the new start pointer. But since the
`smp->data.u.str.area` assignment results in `smp->data.u.str.area` and
`start` being the same pointer, we always ended up substracting zero.
This changes it to change the size by the actual amount it changed.
I'm not entirely sure what the impact of this is, but the previous code
seemed wrong.
[wt: from what I can see the only harmful case is when the output is
converted to a stick-table key, it could result in zeroing past the
end of the buffer; other cases do not touch beyond ->data]
(cherry picked from commit
b28430591d18f7fda5bac2e0ea590b3a34f04601)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Thu, 1 Apr 2021 15:24:04 +0000 (17:24 +0200)]
MINOR: No longer rely on deprecated sample fetches for predefined ACLs
Some predefined ACLs were still based on deprecated sample fetches, like
req_proto_http or req_ver. Now, they use non-deprecated sample fetches. In
addition, the usage lines in the configuration manual have been updated to
be more explicit.
(cherry picked from commit
779184e35e4468d86419fb1935ca3491f1081d0b)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Julien Pivotto [Mon, 29 Mar 2021 10:41:40 +0000 (12:41 +0200)]
DOC: clarify that compression works for HTTP/2
This patch clarifies that compression also works with HTTP/2. I have
picked the wording "HTTP/1.1 or above" because it is already used
elsewhere in the documentation.
I have tested that compression indeed works in HTTP/2.
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
(cherry picked from commit
ff80c82877ee9765b1bde3b03a5d61be8d3bf9d2)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Thayne McCombs [Fri, 2 Apr 2021 20:12:43 +0000 (14:12 -0600)]
BUG/MINOR: tools: fix parsing "us" unit for timers
Commit
c20ad0d8dbd1bb5707bbfe23632415c3062e046c (BUG/MINOR: tools: make
parse_time_err() more strict on the timer validity) broke parsing the "us"
unit in timers. It caused `parse_time_err()` to return the string "s",
which indicates an error.
Now if the "u" is followed by an "s" we properly continue processing the
time instead of immediately failing.
This fixes #1209. It must be backported to all stable versions.
(cherry picked from commit
a68380524b5b47cd77f5f4a47c1441b3c5b2cf93)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Willy Tarreau [Fri, 2 Apr 2021 15:47:21 +0000 (17:47 +0200)]
CONTRIB: halog: fix issue with array of type char
I just noticed this in the windows build after moving the file to dev/:
In file included from include/import/ist.h:32,
from include/haproxy/connection-t.h:32,
from dev/flags/flags.c:5:
dev/flags/flags.c: In function `main':
dev/flags/flags.c:442:20: error: array subscript has type `char' [-Werror=char-subscripts]
442 | (isalnum(*err) && toupper(*err) != 'U' && toupper(*err) != 'L'))
| ^~~~
LD haproxy
cc1: all warnings being treated as errors
make: *** [Makefile:932: dev/flags/flags.o] Error 1
make: *** Waiting for unfinished jobs....
Error: Process completed with exit code 2.
Let's just cast it to uchar as is done everywhere else.
(cherry picked from commit
b00c00e82ccffe61cb6c9de78c0a6dd6a6a40207)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
William Lallemand [Fri, 2 Apr 2021 15:01:25 +0000 (17:01 +0200)]
REGTESTS: ssl: mark set_ssl_cert_bundle.vtc as broken
set_ssl_cert_bundle.vtc requires at least OpenSSL 1.1.0 and we don't
have any way to check this when launching the reg-tests suite.
Mark the reg-test as broken since it will fails on old versions of
openSSL and libreSSL.
(cherry picked from commit
a1e832b8671b12c703e5cb6315afadba4ce73c12)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
Christopher Faulet [Tue, 6 Apr 2021 07:01:09 +0000 (09:01 +0200)]
DOC: Explicitly state only IPv4 are supported by forwardfor/originalto options
IPv6 support was added in the 2.4. However, on previous versions, it is not
explicit that only IPv4 addresses are supported. In addition, a workaround
solution is proposed to add IPv6 addresses.
This patch is related to issue #1145. There is no upstream commit ID. It may
be backported to any stable versions.