From: Willy Tarreau
Date: Fri, 10 Oct 2025 09:28:35 +0000 (+0200)
Subject: BUILD: makefile: disable tail calls optimizations with memory profiling
X-Git-Url: http://git.haproxy.org/?a=commitdiff_plain;h=696514866998bdc1d97158aa79d9ee65d2f7707d;p=haproxy-3.0.git

BUILD: makefile: disable tail calls optimizations with memory profiling

The purpose of memory profiling precisely is to figure what function
allocates and what function frees for specific objects. It turns out
that a non-negligible number of release callbacks basically do nothing
but a free() or pool_free() call and return, which the compiler happily
turns into a jump, making the caller of that callback appear as the real
one. That's how we can see libcrypto release to pools such as
ssl-capture for example, which also makes the per-DSO calls appear
wrong:

    10000       0   10720000          0| 0x448c8d ssl_async_fd_free+0x3b9d p_alloc(1072) [pool=ssl-capture]
    50000       0    6800000          0| 0x4456b9 ssl_async_fd_free+0x5c9 p_alloc(136) [pool=ssl-keylogf]
    10072       0     644608          0| 0x447f14 ssl_async_fd_free+0x2e24 p_alloc(64) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445987 ssl_async_fd_free+0x897 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x4459b8 ssl_async_fd_free+0x8c8 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x4459e9 ssl_async_fd_free+0x8f9 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445a1a ssl_async_fd_free+0x92a p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445a4b ssl_async_fd_free+0x95b p_free(-136) [pool=ssl-keylogf]
        0   20072          0   11364608| 0x7f5f1397db62 libcrypto:CRYPTO_free_ex_data+0xf2/0x261 p_free(-566) [pool=ssl-keylogf] [locked=72 (0.3 %)]

Worse, as can be seen on the last line above, there can be a single pool
per call place (since we don't release to arbitrary pools), and the
stats are misleading by reporting the first used pool only when a same
function can call multiple release callbacks. This is why the free call
totals 10k ssl-capture and 10072 ssl-keylogfile.
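
For illustration only, the misattribution described above can be
reproduced outside of HAProxy with a minimal sketch. All names below
(profiled_free, release_cb, library_release, run_demo) are hypothetical,
and __builtin_return_address() is a GCC/Clang builtin used here as a
stand-in for whatever the profiler uses to identify its caller:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a profiled free(): it records the address it
 * will return to, which is what a profiler would attribute the call to. */
static void *last_caller;
static unsigned long free_calls;

__attribute__((noinline))
void profiled_free(void *ptr)
{
    last_caller = __builtin_return_address(0);
    free_calls++;
    free(ptr);
}

/* A release callback that does nothing but free and return. With sibling
 * call optimization, the compiler may emit a jump to profiled_free()
 * instead of a call, so no return address inside release_cb() exists. */
__attribute__((noinline))
void release_cb(void *ptr)
{
    profiled_free(ptr);
}

/* Hypothetical stand-in for library code invoking a release callback. */
__attribute__((noinline))
void *library_release(void (*cb)(void *), void *ptr)
{
    cb(ptr);
    /* last_caller is read after the call, so cb() is not in tail position */
    return last_caller;
}

/* Frees one object through the callback and prints where the recorded
 * return address lies relative to both candidate callers. Returns the
 * number of frees seen so far. */
long run_demo(void)
{
    void *p = malloc(16);
    uintptr_t seen;

    assert(p != NULL);
    seen = (uintptr_t)library_release(release_cb, p);
    printf("offset from release_cb:      %ld\n",
           (long)(seen - (uintptr_t)release_cb));
    printf("offset from library_release: %ld\n",
           (long)(seen - (uintptr_t)library_release));
    return (long)free_calls;
}
```

Calling run_demo() from a main() built once with -O2 and once with
-O2 -fno-optimize-sibling-calls, then comparing which offset is small and
positive, should show the recorded address moving from library_release()
into release_cb(). The exact result depends on the compiler and version.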

Let's just disable tail call optimization when using memory profiling.
The gains are only very marginal and complicate so much the debugging
that it's not worth it. Now the output is correct, and no longer claims
that libcrypto is the caller:

    10000       0   10720000          0| 0x448c9f ssl_async_fd_free+0x3b9f p_alloc(1072) [pool=ssl-capture]
        0   10000          0   10720000| 0x445af0 ssl_async_fd_free+0x9f0 p_free(-1072) [pool=ssl-capture]
    50000       0    6800000          0| 0x4456c9 ssl_async_fd_free+0x5c9 p_alloc(136) [pool=ssl-keylogf]
    10177       0    1221240          0| 0x45543d ssl_async_fd_handler+0xb51d p_alloc(120) [pool=ssl_sock_ct] [locked=165 (1.6 %)]
    10061       0     643904          0| 0x447f1c ssl_async_fd_free+0x2e1c p_alloc(64) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445987 ssl_async_fd_free+0x887 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x4459b8 ssl_async_fd_free+0x8b8 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x4459e9 ssl_async_fd_free+0x8e9 p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445a1a ssl_async_fd_free+0x91a p_free(-136) [pool=ssl-keylogf]
        0   10000          0    1360000| 0x445a4b ssl_async_fd_free+0x94b p_free(-136) [pool=ssl-keylogf]
        0   10188          0    1222560| 0x44f518 ssl_async_fd_handler+0x55f8 p_free(-120) [pool=ssl_sock_ct] [locked=176 (1.7 %)]
        0   10072          0     644608| 0x445aa6 ssl_async_fd_free+0x9a6 p_free(-64) [pool=ssl-keylogf] [locked=72 (0.7 %)]

An attempt was made to only instrument pool_free() to place a compiler
barrier, but that resulted in much larger code and wouldn't cover
functions ending with a simple "free()" call. "ha_free()" however is
already immune against tail call optimization since it has to write the
NULL when returning from free().

This should be backported to recent stable releases that are still
regularly being debugged.
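
The remark about ha_free() can be pictured with the following sketch. It
is not HAProxy's actual definition, only the general free-then-NULL
shape; free_and_null, profiled_free and demo_free_and_null are
hypothetical names, and __builtin_return_address() is a GCC/Clang
builtin standing in for the profiler's caller lookup:

```c
#include <assert.h>
#include <stdlib.h>

/* Address the hypothetical profiled free will return to. */
static void *last_caller;

__attribute__((noinline))
void profiled_free(void *ptr)
{
    last_caller = __builtin_return_address(0);
    free(ptr);
}

/* Free the object and reset the caller's pointer. The store to *pptr
 * happens after profiled_free() returns, so in the source the call is
 * not the last operation of the function and is not in tail position. */
__attribute__((noinline))
void free_and_null(void **pptr)
{
    profiled_free(*pptr);
    *pptr = NULL;
}

/* Returns 1 if the pointer was reset to NULL after being freed. */
int demo_free_and_null(void)
{
    void *p = malloc(32);

    assert(p != NULL);
    free_and_null(&p);
    return p == NULL;
}
```

Because work remains after the call, the return address recorded in
profiled_free() is expected to lie inside free_and_null() whether or not
sibling call optimization is enabled. A callback ending with a bare
free() has no such trailing work, which is the case the build flag
covers.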

(cherry picked from commit dfe7fa9349b73858fa40e3ddcc2cce913c5f8af8)
Signed-off-by: Willy Tarreau
(cherry picked from commit 1e8bd2b0e46c873c7e329bf75a4e603f628b6afc)
Signed-off-by: Willy Tarreau
(cherry picked from commit f5d0aaf639263c3fdc65055aca99319c270d1c32)
Signed-off-by: Willy Tarreau
---

diff --git a/Makefile b/Makefile
index d2f56b9..199d94a 100644
--- a/Makefile
+++ b/Makefile
@@ -593,6 +593,10 @@ ifneq ($(USE_BACKTRACE:0=),)
   BACKTRACE_CFLAGS = -fno-omit-frame-pointer
 endif
 
+ifneq ($(USE_MEMORY_PROFILING:0=),)
+  MEMORY_PROFILING_CFLAGS = -fno-optimize-sibling-calls
+endif
+
 ifneq ($(USE_CPU_AFFINITY:0=),)
   OPTIONS_OBJS += src/cpuset.o
 endif