Repositories - haproxy-3.0.git/commit

author	Willy Tarreau <w@1wt.eu>
	Mon, 11 Mar 2024 13:57:37 +0000 (14:57 +0100)
committer	Willy Tarreau <w@1wt.eu>
	Mon, 25 Mar 2024 17:34:19 +0000 (17:34 +0000)
commit	1e2311edbcccf31cbbd9d9ed637eb21187b8910d
tree	a5a0bd3d40b8baffc51ba1935f8b5a987fbd0b70
parent	6c1b29d06fc7824d69d14599fe41daf79178bec7

MAJOR: ring: implement a waiting queue in front of the ring

The queue-based approach consists in forcing threads to wait away from
the work area so as not to disturb the current writer, and in preparing
the work by grouping them in a queue. The last thread to arrive takes
the head of the queue by placing its preinitialized ring cell there,
becomes the queue's leader, and learns the number of bytes accumulated
by the threads before it, so that when its turn comes, it immediately
knows how much room needs to be released.

It can then take the whole queue with it, leaving an empty one for new
threads to come while it's releasing the room needed to copy everything.

By doing so we're cascading contention areas so that multiple parts can
work in parallel.
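
As an illustration only, the queue described above can be sketched with
C11 atomics. All names here (struct cell, struct wait_queue, queue_join,
queue_take_all) are hypothetical and are not taken from the HAProxy
sources; the sketch also assumes that every message length is non-zero,
so that a total of 0 can mean "not published yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types: one cell per waiting thread, usually on its stack. */
struct cell {
	struct cell *next;        /* cell that was the head before this one */
	size_t len_self;          /* bytes needed by this thread's message */
	_Atomic size_t len_total; /* bytes for this cell and all older ones, 0 = not ready */
};

struct wait_queue {
	_Atomic(struct cell *) head; /* most recently arrived cell, i.e. the leader */
};

/* Join the queue: the caller becomes the new head (the leader) and
 * returns the number of bytes needed for itself plus everyone queued
 * before it.
 */
static size_t queue_join(struct wait_queue *q, struct cell *c, size_t len)
{
	struct cell *prev;
	size_t total = len;

	assert(len > 0); /* 0 is reserved as the "not ready" marker */
	c->len_self = len;
	atomic_init(&c->len_total, 0);

	/* take the head; the previous head becomes our successor */
	prev = atomic_exchange(&q->head, c);
	c->next = prev;

	if (prev) {
		size_t t;

		/* the previous leader may not have published its total yet */
		while ((t = atomic_load(&prev->len_total)) == 0)
			;
		total += t;
	}

	/* publishing the total may release a newer thread spinning on us */
	atomic_store(&c->len_total, total);
	return total;
}

/* The leader detaches the whole queue, leaving an empty one for
 * threads arriving later. Returns the detached list, newest first.
 */
static struct cell *queue_take_all(struct wait_queue *q)
{
	return atomic_exchange(&q->head, (struct cell *)NULL);
}
```

In this sketch, three threads joining with 10, 20 and 5 bytes leave the
last one as leader with a total of 35, and the list it detaches is
ordered from newest to oldest. Waiting threads only touch the queue
head, never the tail.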

Note that we must never leave a write counter set to 0xFF at the tail.
This can happen when a message cannot fit and we give up, because in
this case tail_ofs is written back first and the counter is only
restored later.

The solution here is to make a special case when we're going to drop
the messages, and to write the readers count before restoring tail.
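
The ordering constraint for the give-up path can be sketched as
follows. This is a simplified model, not the actual ring code: the
struct layout, the TAIL_LOCK bit and the function names are assumptions
made for the example, and only the 0xFF marker and the "counter first,
tail second" order come from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TAIL_LOCK ((size_t)1 << (sizeof(size_t) * 8 - 1)) /* hypothetical lock bit */
#define WRITING   0xFF /* counter value meaning "being written" */

struct mini_ring {
	_Atomic size_t tail;      /* tail offset, possibly with TAIL_LOCK set */
	_Atomic uint8_t area[64]; /* storage; area[tail] holds the readers count */
};

/* Lock the tail and mark its counter as being written.
 * Returns the readers count that was there before.
 */
static uint8_t ring_begin(struct mini_ring *r, size_t *tail_ofs)
{
	size_t t;

	/* spin until the tail is seen unlocked and we manage to lock it */
	do {
		t = atomic_load(&r->tail) & ~TAIL_LOCK;
	} while (!atomic_compare_exchange_weak(&r->tail, &t, t | TAIL_LOCK));

	*tail_ofs = t;
	return atomic_exchange(&r->area[t], WRITING);
}

/* Give up on a message that cannot fit: restore the readers count
 * first, and only then write back the unlocked tail, so that nobody
 * can observe an unlocked tail whose counter is still 0xFF.
 */
static void ring_give_up(struct mini_ring *r, size_t tail_ofs, uint8_t readers)
{
	assert(readers != WRITING);
	atomic_store(&r->area[tail_ofs], readers); /* 1: counter is valid again */
	atomic_store(&r->tail, tail_ofs);          /* 2: now release the tail */
}
```

Swapping the two stores in ring_give_up() would reopen the window in
which the tail is unlocked while its counter still reads 0xFF.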

This already shows a tremendous performance gain on ARM (385k -> 4.8M),
thanks to the fact that now all waiting threads wait on the queue's
head instead of polluting the tail lock. On x86_64, the EPYC sees a big
boost at 24C48T (1.88M -> 3.82M) and a slowdown at 3C6T (6.0M -> 4.45M),
though this one is much less of a concern since so few threads need
less bandwidth than larger counts.