From: Willy Tarreau
Date: Mon, 15 Mar 2021 08:44:53 +0000 (+0100)
Subject: MINOR: tools: do not sum squares of differences for word fingerprints
X-Git-Tag: v2.4-dev13~37
X-Git-Url: http://git.haproxy.org/?a=commitdiff_plain;h=714c4c14d1cde8a9027c9806f74821b08f8a03ad;p=haproxy-2.5.git

MINOR: tools: do not sum squares of differences for word fingerprints

While sums of squares usually give excellent results in fixed-size
patterns, they don't work well to compare different sized ones such as
when some sub-words are missing, because a word such as "server" contains
"er" twice, which will result in an extra distance of at least 4 for just
this e->r transition compared to another one missing it. This is one of
the main reasons why "show conn" only proposes "show info" on the CLI.

Maybe an improved approach consisting in using squares only for exact
same lengths would work, but it would still make it difficult to spot
reversed characters.
---

diff --git a/src/tools.c b/src/tools.c
index ffd167a..f39ec1e 100644
--- a/src/tools.c
+++ b/src/tools.c
@@ -5411,7 +5411,7 @@ void make_word_fingerprint(uint8_t *fp, const char *word)
 
 /* Return the distance between two word fingerprints created by function
  * make_word_fingerprint(). It's a positive integer calculated as the sum of
- * the squares of the differences between each location.
+ * the differences between each location.
  */
 int word_fingerprint_distance(const uint8_t *fp1, const uint8_t *fp2)
 {
@@ -5419,7 +5419,7 @@ int word_fingerprint_distance(const uint8_t *fp1, const uint8_t *fp2)
 
 	for (i = 0; i < 1024; i++) {
 		k = (int)fp1[i] - (int)fp2[i];
-		dist += k * k;
+		dist += abs(k);
 	}
 	return dist;
 }
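
Note: the effect described in the commit message can be reproduced with a small standalone sketch. The body of make_word_fingerprint() is not part of this patch, so sketch_fingerprint() below is a hypothetical stand-in that assumes a fingerprint is a 32x32 table counting transitions between consecutive character classes, with an extra class for the word boundaries. All sketch_* names are illustrative and are not HAProxy functions. Only the two distance loops mirror the code touched by the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FP_SIZE 1024 /* same bound as the loop in word_fingerprint_distance() */

/* Assumption: letters map to 1..26, digits to 27, anything else (and the
 * word boundaries) to 28. This is a guess at a plausible layout, not the
 * actual HAProxy mapping.
 */
static int sketch_char_class(int c)
{
	c = tolower((unsigned char)c);
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 1;
	if (c >= '0' && c <= '9')
		return 27;
	return 28;
}

/* Hypothetical stand-in for make_word_fingerprint(): count each transition
 * between consecutive character classes, including start and end of word.
 */
static void sketch_fingerprint(uint8_t *fp, const char *word)
{
	int from = 28; /* start-of-word boundary */
	int to;

	assert(fp && word);
	memset(fp, 0, FP_SIZE);
	for (; *word; word++) {
		to = sketch_char_class(*word);
		fp[32 * from + to]++;
		from = to;
	}
	fp[32 * from + 28]++; /* end-of-word boundary */
}

/* Distance before the patch: sum of the squares of the differences. */
static int sketch_distance_squares(const uint8_t *fp1, const uint8_t *fp2)
{
	int i, k, dist = 0;

	for (i = 0; i < FP_SIZE; i++) {
		k = (int)fp1[i] - (int)fp2[i];
		dist += k * k;
	}
	return dist;
}

/* Distance after the patch: sum of the absolute differences. */
static int sketch_distance_abs(const uint8_t *fp1, const uint8_t *fp2)
{
	int i, k, dist = 0;

	for (i = 0; i < FP_SIZE; i++) {
		k = (int)fp1[i] - (int)fp2[i];
		dist += abs(k);
	}
	return dist;
}
```

Under these assumptions, comparing "server" with "save" gives 10 with squares and 8 with absolute differences: the e->r transition appears twice in "server" and never in "save", so it alone contributes 4 in the first case and 2 in the second. For two same-length words whose transition counts never differ by more than 1, such as "abc" and "abd", both functions return the same value.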