author Jozsef Kadlecsik <> 2020-02-07 20:41:32 +0100
committer Jozsef Kadlecsik <> 2020-02-18 11:19:20 +0100
commit 33f08da283244d87a5fdfd281d04548bd68f6a93
tree ba995db74e7fc44450c451f26a086489343e5f85 /tests
parent 61a8afcc9cc0311a607688923c76337f266b05b1
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
In the case of huge hash:* types of sets, due to the single spinlock of
a set, processing the whole set under spinlock protection could take
too long.

There were four places where the whole hash table of the set was
processed from bucket to bucket while holding the spinlock:

- During resizing a set, the original set was locked to exclude kernel
  side add/del element operations (userspace add/del is excluded by the
  nfnetlink mutex). The original set is actually just read during the
  resize, so the spinlocking is replaced with rcu locking of regions.
  However, this means there can be parallel kernel side add/del of
  entries. In order not to lose those operations, a backlog is added
  and replayed after the successful resize.
- Garbage collection of timed out entries was also protected by the
  spinlock. In order not to hold the lock too long, region locking is
  introduced and a single region is processed in one gc run. Also, the
  simple timer based gc running is replaced with a workqueue based
  solution. The internal book-keeping (number of elements, size of
  extensions) is moved to region level due to the region locking.
- Adding elements: when the max number of elements is reached, the gc
  was called to evict the timed out entries. The new approach is that
  the gc is called just for the matching region, assuming that if the
  region (proportionally) seems to be full, then the whole set is too.
  We could scan the other regions to check every entry under rcu
  locking, but for huge sets it would mean a slowdown when adding
  elements.
- Listing the set header data: when the set was defined with timeout
  support, the garbage collector was called to clean up timed out
  entries to get the correct element numbers and set size values. Now
  the set is scanned to check non-timed out entries, without actually
  calling the gc for the whole set.

Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues while working on the patch.
Reported-by:
Reported-by:
Reported-by:
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <>
Diffstat (limited to 'tests')
1 files changed, 10 insertions, 0 deletions
diff --git a/tests/ b/tests/
index bca3253..f101ab4 100755
--- a/tests/
+++ b/tests/
@@ -125,11 +125,21 @@ counter)
./ -p ipv4 -id -is -p udp -ud 80 -us 1025 >/dev/null 2>&1
./ -p ipv4 -id -is -p udp -ud 80 -us 1025 >/dev/null 2>&1
+ $ipset n test hash:ip hashsize 4096 maxelem 655360 2>/dev/null
+ $cmd -t raw -A OUTPUT -j SET --add-set test src
+ $cmd -t raw -A OUTPUT -s -j DROP
+ $cmd -t raw -A OUTPUT -s -j DROP
+ ./ &
+ $ipset restore < resize_target.set
+ ;;
$cmd -F
$cmd -X
$cmd -F -t mangle
$cmd -X -t mangle
+ $cmd -F -t raw
+ $cmd -X -t raw
$ipset -F 2>/dev/null
$ipset -X 2>/dev/null
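
The added test case creates a hash:ip set with hashsize 4096, lets a raw-table OUTPUT rule add source addresses from the kernel side, and at the same time restores resize_target.set from userspace. The contents of resize_target.set are not shown here, so the following is only a sketch of how a restore file of that general shape could be generated. The helper name gen_restore_lines, the 10.x.y.z address range, and the entry count are assumptions made for illustration and are not taken from the ipset test suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ipset tests): print `count`
# lines in ipset restore syntax, "add <set> 10.<a>.<b>.<c>", with one
# distinct address per line.
gen_restore_lines() {
    set_name=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Split the counter into three octets below the fixed 10. prefix.
        a=$(( (i >> 16) & 255 ))
        b=$(( (i >> 8) & 255 ))
        c=$(( i & 255 ))
        echo "add $set_name 10.$a.$b.$c"
        i=$(( i + 1 ))
    done
}
```

For example, `gen_restore_lines test 65536 > /tmp/resize_sketch.set` would write many more entries than the initial hashsize of 4096, which is the kind of input intended to make the set grow while the kernel-side rule keeps adding entries. The output path is likewise a placeholder.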