path: root/kernel/net/netfilter/ipset/ip_set_core.c
Commit message (Author, Age, Files, Lines)
...
* netfilter: ipset: fix ip_set_byindex function (Florent Fourcot, 2018-11-28, 1 file, -1/+1)
  The new function added by "Introduction of new commands and protocol version 7" is not working, since we return skb2 to user.
  Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
  Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Correct workaround in patch "Fix calling ip_set() macro at dumping" (Jozsef Kadlecsik, 2018-10-30, 1 file, -15/+4)
  As Pablo pointed out, in order to fix the bogus warnings, there's no need for the non-useful rcu_read_lock/unlock dancing. Call rcu_dereference_raw() instead; the ref_netlink protects the set.
* Introduction of new commands and protocol version 7 (Jozsef Kadlecsik, 2018-10-27, 1 file, -17/+149)
  Two new commands (IPSET_CMD_GET_BYNAME, IPSET_CMD_GET_BYINDEX) are introduced. The new commands make it possible to eliminate the getsockopt operation (in iptables set/SET match/target) and thus use only netlink communication between userspace and kernel for ipset. With the new protocol version, userspace can know exactly which functionality is supported by the running kernel.
  Both the kernel and userspace are fully backward compatible.
* net: Convert ip_set_net_ops (Kirill Tkhai, 2018-10-22, 1 file, -1/+4)
  These pernet_operations initialize and destroy net_generic(net, ip_set_net_id)-related data. Since ip_set is under CONFIG_IP_SET, it's easy to find the drivers that depend on this config. All of them are in the net/netfilter/ipset directory, except for net/netfilter/xt_set.c. No other drivers use ip_set, and none of the above register another pernet_operations. Also, there are no indirect users, as the header file include/linux/netfilter/ipset/ip_set.h does not define indirect users by something like this:

      #ifdef CONFIG_IP_SET
      extern func(void);
      #else
      static inline func(void);
      #endif

  So, there are no more pernet operations dereferencing net_generic(net, ip_set_net_id). ip_set_net_ops are OK to be executed in parallel for several net, so we mark them as async.
  Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix calling ip_set() macro at dumping (Jozsef Kadlecsik, 2018-10-19, 1 file, -4/+19)
  At dumping, the ip_set() macro is called when either only ip_set_ref_lock is held or no lock/nfnl mutex is held. Take this into account properly.
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: fix ip_set_list allocation failure (Andrey Ryabinin, 2018-09-24, 1 file, -5/+5)
  ip_set_create() and ip_set_net_init() attempt to allocate physically contiguous memory for ip_set_list. If memory is fragmented, the allocations could easily fail:

      vzctl: page allocation failure: order:7, mode:0xc0d0

      Call Trace:
       dump_stack+0x19/0x1b
       warn_alloc_failed+0x110/0x180
       __alloc_pages_nodemask+0x7bf/0xc60
       alloc_pages_current+0x98/0x110
       kmalloc_order+0x18/0x40
       kmalloc_order_trace+0x26/0xa0
       __kmalloc+0x279/0x290
       ip_set_net_init+0x4b/0x90 [ip_set]
       ops_init+0x3b/0xb0
       setup_net+0xbb/0x170
       copy_net_ns+0xf1/0x1c0
       create_new_namespaces+0xf9/0x180
       copy_namespaces+0x8e/0xd0
       copy_process+0xb61/0x1a00
       do_fork+0x91/0x320

  Use kvcalloc() to fall back to 0-order allocations if a high-order page isn't available.
  Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
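The "order:7" in the failure above is the buddy-allocator order: the request needed 2^7 physically contiguous pages. A minimal sketch of that arithmetic follows; the 4 KiB page size and the helper names are assumptions for illustration, not taken from the commit.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical on x86_64)


def allocation_order(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order n such that 2**n contiguous pages can hold nbytes."""
    pages = max(1, -(-nbytes // page_size))  # ceiling division
    return (pages - 1).bit_length()


def contiguous_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of the physically contiguous block an order-n request needs."""
    return (1 << order) * page_size


if __name__ == "__main__":
    # An order-7 request is 128 contiguous pages under the 4 KiB assumption.
    print(contiguous_bytes(7))          # 524288
    print(allocation_order(300 * 1024))  # 7
```

Under that assumption, any request larger than 256 KiB already needs an order-7 block, which is hard to find on a fragmented system; a fallback to page-sized, virtually contiguous memory avoids the requirement.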
* ipset: list:set: Decrease refcount synchronously on deletion and replace (Stefano Brivio, 2018-07-16, 1 file, -12/+11)
  Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") postponed decreasing set reference counters to the RCU callback.

  An 'ipset del' command can terminate before the RCU grace period is elapsed, and if sets are listed before then, the reference counter shown in userspace will be wrong:

      # ipset create h hash:ip; ipset create l list:set; ipset add l
      # ipset del l h; ipset list h
      Name: h
      Type: hash:ip
      Revision: 4
      Header: family inet hashsize 1024 maxelem 65536
      Size in memory: 88
      References: 1
      Number of entries: 0
      Members:
      # sleep 1; ipset list h
      Name: h
      Type: hash:ip
      Revision: 4
      Header: family inet hashsize 1024 maxelem 65536
      Size in memory: 88
      References: 0
      Number of entries: 0
      Members:

  Fix this by making the reference count update synchronous again.

  As a result, when sets are listed, ip_set_name_byindex() might now fetch a set whose reference count is already zero. Instead of relying on the reference count to protect against concurrent set renaming, grab ip_set_ref_lock as reader and copy the name, while holding the same lock in ip_set_rename() as writer instead.
  Reported-by: Li Shuang <shuali@redhat.com>
  Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
  Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
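The stale "References: 1" in the listing above can be reproduced with a toy model in which a deferred decrement waits in a queue that stands in for pending RCU callbacks. This is an illustrative sketch only; the class and method names are hypothetical and nothing here is kernel code.

```python
from collections import deque


class RefModel:
    """Toy model of one set's reference counter as shown to userspace."""

    def __init__(self, refs: int) -> None:
        self.refs = refs
        self.pending = deque()  # stands in for queued RCU callbacks

    def delete_deferred(self) -> None:
        # Old behaviour: the decrement is postponed to the callback.
        self.pending.append(1)

    def delete_sync(self) -> None:
        # Behaviour after the fix: the decrement happens immediately.
        self.refs -= 1

    def grace_period_elapsed(self) -> None:
        # Run the queued callbacks, as after "sleep 1" in the example.
        while self.pending:
            self.refs -= self.pending.popleft()


if __name__ == "__main__":
    old = RefModel(1)
    old.delete_deferred()
    print(old.refs)  # still 1: listing right after 'ipset del' is stale
    old.grace_period_elapsed()
    print(old.refs)  # 0
```

With `delete_sync()` the counter is already zero when the command returns, which is why the name lookup can no longer rely on a non-zero reference count and takes a lock instead.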
* Remove duplicate module description (Jozsef Kadlecsik, 2018-01-29, 1 file, -8/+1)
* netfilter: remove messages print and boot/module load time (Pablo Neira Ayuso, 2018-01-29, 1 file, -1/+2)
  Several reasons for this:

  * Several modules maintain internal version numbers, that they print at boot/module load time, that are not exposed to userspace, as a primitive mechanism to make revision number control from the earlier days of Netfilter.
  * IPset shows the protocol version at boot/module load time, instead display this via module description, as Jozsef suggested.
  * Remove copyright notice at boot/module load time in two spots, the Netfilter codebase is a collective development effort, if we would have to display copyrights for each contributor at boot/module load time for each extensions we have, we would probably fill up logs with lots of useless information - from a technical standpoint.

  So let's be consistent and remove them all.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: mark expected switch fall-throughs (Gustavo A. R. Silva, 2018-01-06, 1 file, -1/+1)
  In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through.
  Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
  Signed-off-by: Simon Horman <horms@verge.net.au>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Missing nfnl_lock()/nfnl_unlock() is added to ip_set_net_exit() (Jozsef Kadlecsik, 2018-01-04, 1 file, -0/+2)
  Patch "netfilter: ipset: use nfnl_mutex_is_locked" added the real mutex locking check, which revealed the missing locking in ip_set_net_exit().
* netfilter: ipset: use nfnl_mutex_is_locked (Florian Westphal, 2018-01-04, 1 file, -1/+1)
  Check that we really hold nfnl mutex here instead of relying on correct usage alone.
  Signed-off-by: Florian Westphal <fw@strlen.de>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: add resched points during set listing (Florian Westphal, 2018-01-04, 1 file, -2/+0)
  When sets are extremely large we can get softlockup during ipset -L. We could fix this by adding cond_resched_rcu() at the right location during iteration, but this only works if RCU nesting depth is 1.

  At this time the entire variant->list() is called under rcu_read_lock_bh. This used to be a read_lock_bh() but as rcu doesn't really lock anything, it does not appear to be needed, so remove it (ipset increments set reference count before this, so a set deletion should not be possible).
  Reported-by: Li Shuang <shuali@redhat.com>
  Signed-off-by: Florian Westphal <fw@strlen.de>
* Fix "don't update counters" mode when counters used at the matching (Jozsef Kadlecsik, 2018-01-04, 1 file, -0/+25)
  The matching of the counters was not taken into account, fixed.
* netfilter: ipset: Fix race between dump and swap (Ross Lagerwall, 2017-09-28, 1 file, -2/+5)
  Fix a race between ip_set_dump_start() and ip_set_swap(). The race is as follows:

  * Without holding the ref lock, ip_set_swap() checks ref_netlink of the set and it is 0.
  * ip_set_dump_start() takes a reference on the set.
  * ip_set_swap() does the swap (even though it now has a non-zero reference count).
  * ip_set_dump_start() gets the set from ip_set_list again which is now a different set since it has been swapped.
  * ip_set_dump_start() calls __ip_set_put_netlink() and hits a BUG_ON due to the reference count being 0.

  Fix this race by extending the critical region in which the ref lock is held to include checking the ref counts.

  The race can be reproduced with the following script:

      while :; do
          ipset destroy hash_ip1
          ipset destroy hash_ip2
          ipset create hash_ip1 hash:ip family inet hashsize 1024 \
              maxelem 500000
          ipset create hash_ip2 hash:ip family inet hashsize 300000 \
              maxelem 500000
          ipset create hash_ip3 hash:ip family inet hashsize 1024 \
              maxelem 500000
          ipset save &
          ipset swap hash_ip3 hash_ip2
          ipset destroy hash_ip3
          wait
      done

  Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
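The interleaving listed above can be replayed deterministically in a small model. This is a sketch under stated assumptions, not the kernel code: `Registry`, `SetModel`, `swap_racy`, `swap_fixed` and the `window` hook are hypothetical names, and the hook simply lets a dump slip in between the check and the swap.

```python
import threading


class SetModel:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ref_netlink = 0


class Registry:
    """Toy stand-in for the set list guarded by a ref lock."""

    def __init__(self, *sets: SetModel) -> None:
        self.slots = list(sets)
        self.ref_lock = threading.Lock()

    def dump_start(self, index: int) -> SetModel:
        # A dump takes a netlink reference on the set at this index.
        with self.ref_lock:
            s = self.slots[index]
            s.ref_netlink += 1
        return s

    def _busy(self, i: int, j: int) -> bool:
        return self.slots[i].ref_netlink != 0 or self.slots[j].ref_netlink != 0

    def swap_racy(self, i: int, j: int, window=lambda: None) -> bool:
        # Check happens outside the lock: a dump can start in the window.
        if self._busy(i, j):
            return False
        window()
        with self.ref_lock:
            self.slots[i], self.slots[j] = self.slots[j], self.slots[i]
        return True

    def swap_fixed(self, i: int, j: int) -> bool:
        # Check and swap share one critical section.
        with self.ref_lock:
            if self._busy(i, j):
                return False
            self.slots[i], self.slots[j] = self.slots[j], self.slots[i]
            return True


if __name__ == "__main__":
    reg = Registry(SetModel("hash_ip3"), SetModel("hash_ip2"))
    held = []
    reg.swap_racy(0, 1, window=lambda: held.append(reg.dump_start(0)))
    # The dump re-reads index 0 and finds a different set with no reference.
    print(reg.slots[0] is held[0], reg.slots[0].ref_netlink)
```

In the racy variant the set found at the index after the swap has a zero `ref_netlink`, which corresponds to the put hitting the BUG_ON described in the commit; in the fixed variant a dump that started first makes the swap fail instead.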
* netfilter: ipset: pernet ops must be unregistered last (Florian Westphal, 2017-09-26, 1 file, -17/+26)
  Removing the ipset module leaves a small window where one cpu performs module removal while another runs a command like 'ipset flush'.

  ipset uses net_generic(), unregistering the pernet ops frees this storage area.

  Fix it by first removing the user-visible api handlers and the pernet ops last.
  Fixes: 1785e8f473082 ("netfiler: ipset: Add net namespace for ipset")
  Reported-by: Li Shuang <shuali@redhat.com>
  Signed-off-by: Florian Westphal <fw@strlen.de>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Backport patch: netfilter: nfnetlink: extended ACK reporting (Jozsef Kadlecsik, 2017-09-11, 1 file, -13/+26)
* ipset: remove unused function __ip_set_get_netlink (Aaron Conole, 2017-09-11, 1 file, -8/+0)
  There are no in-tree callers.
  Signed-off-by: Aaron Conole <aconole@bytheb.org>
  Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Backport patch: netlink: pass extended ACK struct to parsing functions (Jozsef Kadlecsik, 2017-09-11, 1 file, -19/+20)
* Backport patch netlink: extended ACK reporting (Jozsef Kadlecsik, 2017-09-11, 1 file, -1/+1)
* netfilter: Remove exceptional & on function name (Arushi Singhal, 2017-09-11, 1 file, -1/+1)
  Remove & from function pointers to conform to the style found elsewhere in the file. Done using the following semantic patch:

      // <smpl>
      @r@
      identifier f;
      @@

      f(...) { ... }

      @@
      identifier r.f;
      @@

      - &f
      + f
      // </smpl>

  Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Backport nfnl_msg_type() (Jozsef Kadlecsik, 2017-09-11, 1 file, -1/+1)
* Fix sparse warnings (Jozsef Kadlecsik, 2017-03-23, 1 file, -1/+1)
* netfilter: ipset: Remove unnecessary cast on void pointer (simran singhal, 2017-03-23, 1 file, -1/+1)
  The following Coccinelle script was used to detect this:

      @r@
      expression x;
      void* e;
      type T;
      identifier f;
      @@
      (
        *((T *)e)
      |
        ((T *)x)[...]
      |
        ((T*)x)->f
      |
      - (T*)
        e
      )

  Signed-off-by: simran singhal <singhalsimran0@gmail.com>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: x_tables: Use par->net instead of computing from the passed net devices (Eric W. Biederman, 2016-10-13, 1 file, -6/+3)
  Backported from kernel tree.
  Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: fix race condition in ipset save, swap and delete (Vishwanath Pai, 2016-03-16, 1 file, -5/+28)
  This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script:

      ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
          counters
      ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
          counters
      ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
          counters

      ipset save &

      ipset swap hash_ip3 hash_ip2
      ipset destroy hash_ip3 /* will crash the machine */

  Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).

  Both delete and swap will error out if ref_netlink != 0 on the set.

  Note: The changes to the *_head functions are because previously we would increment ref whenever we called these functions; we don't do that anymore.
  Reviewed-by: Joshua Hunt <johunt@akamai.com>
  Signed-off-by: Vishwanath Pai <vpai@akamai.com>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
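Why a second counter helps can be seen in a toy model of the two counters. This is a sketch based only on the description above (swap exchanges `ref` but not `ref_netlink`, and both swap and destroy refuse while `ref_netlink` is non-zero); the `IpSet` dataclass and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IpSet:
    name: str
    ref: int = 0          # exchanged by swap
    ref_netlink: int = 0  # held by netlink operations such as dump


def swap_old(a: IpSet, b: IpSet) -> bool:
    # Single-counter scheme: names and ref are exchanged unconditionally.
    a.name, b.name = b.name, a.name
    a.ref, b.ref = b.ref, a.ref
    return True


def swap_new(a: IpSet, b: IpSet) -> bool:
    # Two-counter scheme: refuse while a netlink user holds either set.
    if a.ref_netlink or b.ref_netlink:
        return False
    return swap_old(a, b)


def destroy_allowed(s: IpSet) -> bool:
    return s.ref == 0 and s.ref_netlink == 0


if __name__ == "__main__":
    dumped = IpSet("hash_ip2", ref=1)  # old scheme: dump counted in ref
    other = IpSet("hash_ip3")
    swap_old(other, dumped)
    # The set still being dumped now looks unreferenced and destroyable.
    print(dumped.name, destroy_allowed(dumped))
```

In the old scheme the set that `ipset save` is still walking ends up named `hash_ip3` with `ref == 0`, so the following destroy is allowed; with `ref_netlink` held by the dump, both the swap and the destroy are rejected.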
* Fix set:list type crash when flush/dump set in parallel (Jozsef Kadlecsik, 2016-02-24, 1 file, -0/+3)
  Flushing/listing entries was not RCU safe, so parallel flush/dump could lead to kernel crash. Bug reported by Deniz Eren.
  Fixes netfilter bugzilla id #1050.
* netfilter: nfnetlink: pass down netns pointer to call() and call_rcu() (Jozsef Kadlecsik, 2016-02-16, 1 file, -48/+49)
  Backport patch from Pablo Neira Ayuso <pablo@netfilter.org>
* Fix extension alignment (Jozsef Kadlecsik, 2015-11-07, 1 file, -6/+8)
  The data extensions in ipset lacked the proper memory alignment and thus could lead to kernel crash on several architectures. Therefore the structures have been reorganized and alignment attributes added where needed. The patch was tested on armv7h by Gerhard Wiesinger and on x86_64, sparc64 by Jozsef Kadlecsik.
  Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
  Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
  Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Count non-static extension memory into the set memory size for userspace (Jozsef Kadlecsik, 2015-06-26, 1 file, -1/+1)
  The non-static (i.e. comment) extension was not counted into the memory size. A new internal counter is introduced for this. In the case of the hash types, the sizes of the arrays are counted there as well, so that we can avoid scanning the whole set when just the header data is requested.
* netfilter: ipset: deinline ip_set_put_extensions() (Denys Vlasenko, 2015-06-13, 1 file, -0/+25)
  On x86 allyesconfig build:
  The function compiles to 489 bytes of machine code. It has 25 callsites.

          text     data      bss       dec     hex filename
      82441375 22255384 20627456 125324215 7784bb7 vmlinux.before
      82434909 22255384 20627456 125317749 7783275 vmlinux

  Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
  CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
  CC: Eric W. Biederman <ebiederm@xmission.com>
  CC: David S. Miller <davem@davemloft.net>
  CC: Jan Engelhardt <jengelh@medozas.de>
  CC: Jiri Pirko <jpirko@redhat.com>
  CC: linux-kernel@vger.kernel.org
  CC: netdev@vger.kernel.org
  CC: netfilter-devel@vger.kernel.org
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* There is no need to call synchronize_rcu() after list_add_rcu() (Jozsef Kadlecsik, 2015-06-13, 1 file, -1/+1)
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Improve skbinfo get/init helpers (Jozsef Kadlecsik, 2015-05-05, 1 file, -6/+6)
  Use struct ip_set_skbinfo in struct ip_set_ext instead of open coded fields and assign structure members in get/init helpers instead of copying members one by one. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
  Suggested-by: Sergey Popovich <popovich_sergei@mail.ua>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* When a single set is destroyed, make sure it can't be grabbed by dump (Jozsef Kadlecsik, 2015-04-26, 1 file, -9/+11)
* Fix coding styles reported by the most recent checkpatch.pl. (Jozsef Kadlecsik, 2015-04-17, 1 file, -63/+63)
* Make sure the proper is_destroyed value is checked at dumping (Jozsef Kadlecsik, 2015-03-29, 1 file, -2/+4)
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix broken commit "Check extensions attributes before getting extensions." (Jozsef Kadlecsik, 2015-03-29, 1 file, -8/+8)
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Check extensions attributes before getting extensions. (Sergey Popovich, 2015-03-20, 1 file, -0/+9)
  Perform all extension attribute checks within ip_set_get_extensions() and reduce the amount of duplicated code.
  Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Use SET_WITH_*() helpers to test set extensions (Sergey Popovich, 2015-03-20, 1 file, -6/+6)
  Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Properly calculate extensions offsets and total length (Sergey Popovich, 2015-03-16, 1 file, -3/+3)
  Offsets and total length returned by ip_set_elem_len() are calculated incorrectly, as the initial set element length (i.e. the len parameter) is used multiple times in offset calculations, also affecting the set element total length.

  Use the initial set element length as the start offset, do not add the aligned extension offset to the offset. Return the offset as the total length of the set element.

  This reduces memory requirements on a per-element basis for the hash:* type of sets.

  For example, output from 'ipset -terse list test-1' on a 64-bit PC, where test-1 is generated via the following script:

      #!/bin/bash

      set_name='test-1'

      ipset create "$set_name" hash:net family inet \
          timeout 10800 counters comment \
          hashsize 65536 maxelem 65536

      declare -i o3 o4
      fmt="add $set_name 192.168.%u.%u\n"

      for ((o3 = 0; o3 < 256; o3++)); do
          for ((o4 = 0; o4 < 256; o4++)); do
              printf "$fmt" $o3 $o4
          done
      done |ipset -exist restore

  BEFORE this patch is applied

      # ipset -terse list test-1
      Name: test-1
      Type: hash:net
      Revision: 6
      Header: family inet hashsize 65536 maxelem 65536 timeout 10800 counters comment
      Size in memory: 26348440

  and AFTER applying patch

      # ipset -terse list test-1
      Name: test-1
      Type: hash:net
      Revision: 6
      Header: family inet hashsize 65536 maxelem 65536 timeout 10800 counters comment
      Size in memory: 7706392
      References: 0

  Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua>
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
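The effect of counting the initial length more than once can be illustrated with a small model. This is a reconstruction from the description above, not the kernel function: the "buggy" variant is one plausible reading of "len is used multiple times" and "add aligned extension offset to the offset", and the extension sizes and alignments in `EXTENSIONS` are hypothetical.

```python
def align(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Hypothetical extension layout: (name, size in bytes, alignment).
EXTENSIONS = [("timeout", 8, 8), ("counter", 16, 8), ("comment", 8, 8)]


def elem_len_buggy(base_len: int, extensions=EXTENSIONS):
    # base_len is re-added for every extension, and again at the end.
    offset = 0
    offsets = {}
    for name, size, alignment in extensions:
        offset += align(base_len + offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets, base_len + offset


def elem_len_fixed(base_len: int, extensions=EXTENSIONS):
    # Start at base_len, align in place, return the running offset.
    offset = base_len
    offsets = {}
    for name, size, alignment in extensions:
        offset = align(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets, offset


if __name__ == "__main__":
    print(elem_len_buggy(8))  # total grows far beyond the data stored
    print(elem_len_fixed(8))  # extensions packed right after the element
```

With an 8-byte element and these made-up extensions the modelled totals are 136 versus 40 bytes per element, the same kind of per-element saving that the before/after "Size in memory" figures show for 65536 entries.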
* Make sure listing doesn't grab a set which is just being destroyed. (Jozsef Kadlecsik, 2015-01-08, 1 file, -6/+21)
  There was a small window when all sets are destroyed and a concurrent listing of all sets could grab a set which is just being destroyed.
* More compatibility checking and simplifications (Jozsef Kadlecsik, 2015-01-06, 1 file, -18/+13)
  Try hard to keep the support of the 2.6.32 kernel tree and simplify the code with self-referential macros.
* Fix coding styles reported by checkpatch.pl (Jozsef Kadlecsik, 2015-01-06, 1 file, -44/+32)
* Use nlmsg_total_size instead of NLMSG_SPACE in ip_set_core.c. (Jozsef Kadlecsik, 2015-01-06, 1 file, -2/+2)
* Call synchronize_rcu() in set type (un)register functions only when needed (Jozsef Kadlecsik, 2014-12-10, 1 file, -5/+4)
* Give a better name to a macro in ip_set_core.c (Jozsef Kadlecsik, 2014-12-10, 1 file, -9/+9)
* netfilter: ipset: small potential read beyond the end of buffer (Dan Carpenter, 2014-11-18, 1 file, -0/+5)
  We could be reading 8 bytes into a 4 byte buffer here. It seems harmless but adding a check is the right thing to do and it silences a static checker warning.
  Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
  Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Fix parallel resizing and listing of the same set (Jozsef Kadlecsik, 2014-11-18, 1 file, -13/+17)
  When elements are added to a hash:* type of set and resizing is triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by taking references and using the original hash table for listing. Therefore destroying the original hash table may happen from either the resizing or the listing function.
* styles warned by checkpatch.pl fixed (Jozsef Kadlecsik, 2014-11-18, 1 file, -3/+8)
* Introduce RCU in all set types instead of rwlock per set (Jozsef Kadlecsik, 2014-11-18, 1 file, -17/+18)
  Performance is tested by Jesper Dangaard Brouer:

  Simple drop in FORWARD
  ~~~~~~~~~~~~~~~~~~~~~~

  Dropping via simple iptables net-mask match::

      iptables -t raw -N simple || iptables -t raw -F simple
      iptables -t raw -I simple -s 198.18.0.0/15 -j DROP
      iptables -t raw -D PREROUTING -j simple
      iptables -t raw -I PREROUTING -j simple

  Drop performance in "raw": 11.3Mpps

  Generator: sending 12.2Mpps (tx:12264083 pps)

  Drop via original ipset in RAW table
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Create a set with lots of elements::

      sudo ./ipset destroy test
      echo "create test hash:ip hashsize 65536" > test.set
      for x in `seq 0 255`; do
          for y in `seq 0 255`; do
              echo "add test 198.18.$x.$y" >> test.set
          done
      done
      sudo ./ipset restore < test.set

  Dropping via ipset::

      iptables -t raw -F
      iptables -t raw -N net198 || iptables -t raw -F net198
      iptables -t raw -I net198 -m set --match-set test src -j DROP
      iptables -t raw -I PREROUTING -j net198

  Drop performance in "raw" with ipset: 8Mpps

  Perf report numbers ipset drop in "raw"::

      +  24.65%  ksoftirqd/1  [ip_set]           [k] ip_set_test
      -  21.42%  ksoftirqd/1  [kernel.kallsyms]  [k] _raw_read_lock_bh
         - _raw_read_lock_bh
            + 99.88% ip_set_test
      -  19.42%  ksoftirqd/1  [kernel.kallsyms]  [k] _raw_read_unlock_bh
         - _raw_read_unlock_bh
            + 99.72% ip_set_test
      +   4.31%  ksoftirqd/1  [ip_set_hash_ip]   [k] hash_ip4_kadt
      +   2.27%  ksoftirqd/1  [ixgbe]            [k] ixgbe_fetch_rx_buffer
      +   2.18%  ksoftirqd/1  [ip_tables]        [k] ipt_do_table
      +   1.81%  ksoftirqd/1  [ip_set_hash_ip]   [k] hash_ip4_test
      +   1.61%  ksoftirqd/1  [kernel.kallsyms]  [k] __netif_receive_skb_core
      +   1.44%  ksoftirqd/1  [kernel.kallsyms]  [k] build_skb
      +   1.42%  ksoftirqd/1  [kernel.kallsyms]  [k] ip_rcv
      +   1.36%  ksoftirqd/1  [kernel.kallsyms]  [k] __local_bh_enable_ip
      +   1.16%  ksoftirqd/1  [kernel.kallsyms]  [k] dev_gro_receive
      +   1.09%  ksoftirqd/1  [kernel.kallsyms]  [k] __rcu_read_unlock
      +   0.96%  ksoftirqd/1  [ixgbe]            [k] ixgbe_clean_rx_irq
      +   0.95%  ksoftirqd/1  [kernel.kallsyms]  [k] __netdev_alloc_frag
      +   0.88%  ksoftirqd/1  [kernel.kallsyms]  [k] kmem_cache_alloc
      +   0.87%  ksoftirqd/1  [xt_set]           [k] set_match_v3
      +   0.85%  ksoftirqd/1  [kernel.kallsyms]  [k] inet_gro_receive
      +   0.83%  ksoftirqd/1  [kernel.kallsyms]  [k] nf_iterate
      +   0.76%  ksoftirqd/1  [kernel.kallsyms]  [k] put_compound_page
      +   0.75%  ksoftirqd/1  [kernel.kallsyms]  [k] __rcu_read_lock

  Drop via ipset in RAW table with RCU-locking
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  With RCU locking, the RW-lock is gone.

  Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps
  Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
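The packet rates quoted above translate into a per-packet time budget, which makes the cost of the read lock easier to see. The following is a back-of-the-envelope sketch using only the figures from the message; the helper names are hypothetical, and treating perf sample shares as time shares is an approximation.

```python
def ns_per_packet(mpps: float) -> float:
    """Average time available per packet at a rate given in Mpps."""
    return 1000.0 / mpps


def lock_share(lock_pct: float, unlock_pct: float) -> float:
    """Combined share of samples spent in read lock and unlock."""
    return round(lock_pct + unlock_pct, 2)


if __name__ == "__main__":
    rwlock = ns_per_packet(8.0)   # ipset with the per-set rwlock
    rcu = ns_per_packet(11.3)     # ipset with RCU
    print(rwlock, round(rcu, 1), round(rwlock - rcu, 1))
    print(lock_share(21.42, 19.42))
```

At 8 Mpps each packet has about 125 ns, at 11.3 Mpps about 88.5 ns, so the rwlock variant spends roughly 36.5 ns more per packet; the two lock symbols together account for 40.84% of the samples in the profile.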