summaryrefslogtreecommitdiffstats
path: root/kernel/net/netfilter/ipset/ip_set_hash_gen.h
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: ipset: Suppress false sparse warningsJozsef Kadlecsik2024-02-121-2/+2
| | | | | | | | Due to the code reorganization the functions in question now run by call_rcu(), not under rcu locking and pointer access. This produces false sparse warning which are suppressed by the patch. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Missing gc cancellations fixedJozsef Kadlecsik2024-02-041-2/+2
| | | | | | | | | | | | | | | | | | The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") missed to add the calls to gc cancellations at the error path of create operations and at module unload. Also, because the half of the destroy operations now executed by a function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex or rcu read lock is held and therefore the checking of them results false warnings. Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com Reported-by: Brad Spengler <spender@grsecurity.net> Reported-by: Стас Ничипорович <stasn77@gmail.com> Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") Tested-by: Brad Spengler <spender@grsecurity.net> Tested-by: Стас Ничипорович <stasn77@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: fix race condition between swap/destroy and kernel side ↵Jozsef Kadlecsik2024-01-291-3/+12
| | | | | | | | | | | | | | | | | | | | | | | add/del/test v4 The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback. Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/ Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test") Reported-by: Ale Crismani <ale.crismani@automattic.com> Reported-by: David Wang <00107082@163.com> Tested-by: David Wang <00107082@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: restore allowing 64 clashing elements in hash:net,ifaceJozsef Kadlecsik2022-11-211-1/+1
| | | | | | | The patch "netfilter: ipset: enforce documented limit to prevent allocating huge memory" was too strict and prevented to add up to 64 clashing elements to a hash:net,iface type of set. This patch fixes the issue and now the type behaves as documented.
* netfilter: ipset: Add support for new bitmask parameterVishwanath Pai2022-11-201-9/+62
| | | | | | | | | | | | | | | Add a new parameter to complement the existing 'netmask' option. The main difference between netmask and bitmask is that bitmask takes any arbitrary ip address as input, it does not have to be a valid netmask. The name of the new parameter is 'bitmask'. This lets us mask out arbitrary bits in the ip address, for example: ipset create set1 hash:ip bitmask 255.128.255.0 ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80 Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: enforce documented limit to prevent allocating huge memoryJozsef Kadlecsik2022-11-071-24/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Xu reported that the hash:net,iface type of the ipset subsystem does not limit adding the same network with different interfaces to a set, which can lead to huge memory usage or allocation failure. The quick reproducer is $ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0 $ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done The backtrace when vmalloc fails: [Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages <...> [Tue Oct 25 00:13:08 2022] Call Trace: [Tue Oct 25 00:13:08 2022] <TASK> [Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60 [Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180 [Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20 [Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90 [Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0 [Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710 <...> The fix is to enforce the limit documented in the ipset(8) manpage: > The internal restriction of the hash:net,iface set type is that the same > network prefix cannot be stored with more than 64 different interfaces > in a single set. Reported-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix oversized kvmalloc() callsJozsef Kadlecsik2022-10-171-2/+2
| | | | | | | | | | | | | | | | commit 7661809d493b426e979f39ab512e3adf41fbcc69 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 14 09:45:49 2021 -0700 mm: don't allow oversized kvmalloc() calls limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the same limit in ipset. Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: fix shift-out-of-bounds in htable_bits()Vasily Averin2020-12-201-19/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6 shift exponent 32 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 8498 Comm: syz-executor519 Not tainted 5.10.0-rc7-next-20201208-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline] hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524 ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115 nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2345 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch replaces htable_bits() by simple fls(hashsize - 1) call: it alone returns valid nbits both for round and non-round hashsizes. It is normal to set any nbits here because it is validated inside following htable_size() call which returns 0 for nbits>31. Fixes: 1feab10d7e6d("netfilter: ipset: Unified hash type generation") Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: fixes possible oops in mtype_resizeVasily Averin2020-12-201-9/+17
| | | | | | | | | | | | | | | | | | currently mtype_resize() can cause oops t = ip_set_alloc(htable_size(htable_bits)); if (!t) { ret = -ENOMEM; goto out; } t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits)); Increased htable_bits can force htable_size() to return 0. In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR, so follwoing access to t->hregion should trigger an OOPS. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Expose the initval hash parameter to userspaceJozsef Kadlecsik2020-09-211-4/+9
| | | | | | It makes possible to reproduce exactly the same set after a save/restore. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Add bucketsize parameter to all hash typesJozsef Kadlecsik2020-09-211-15/+23
| | | | | | | | | The parameter defines the upper limit in any hash bucket at adding new entries from userspace - if the limit would be exceeded, ipset doubles the hash size and rehashes. It means the set may consume more memory but gives faster evaluation at matching in the set. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: Replace zero-length array with flexible-array memberGustavo A. R. Silva2020-09-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix checkpatch.pl warning WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) in net/bridge/netfilter/ebtables.c This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: call ip_set_free() instead of kfree()Eric Dumazet2020-09-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever ip_set_alloc() is used, allocated memory can either use kmalloc() or vmalloc(). We should call kvfree() or ip_set_free() invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 21935 Comm: syz-executor.3 Not tainted 5.8.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__phys_addr+0xa7/0x110 arch/x86/mm/physaddr.c:28 Code: 1d 7a 09 4c 89 e3 31 ff 48 d3 eb 48 89 de e8 d0 58 3f 00 48 85 db 75 0d e8 26 5c 3f 00 4c 89 e0 5b 5d 41 5c c3 e8 19 5c 3f 00 <0f> 0b e8 12 5c 3f 00 48 c7 c0 10 10 a8 89 48 ba 00 00 00 00 00 fc RSP: 0000:ffffc900018572c0 EFLAGS: 00010046 RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc9000fac3000 RDX: 0000000000040000 RSI: ffffffff8133f437 RDI: 0000000000000007 RBP: ffffc90098aff000 R08: 0000000000000000 R09: ffff8880ae636cdb R10: 0000000000000000 R11: 0000000000000000 R12: 0000408018aff000 R13: 0000000000080000 R14: 000000000000001d R15: ffffc900018573d8 FS: 00007fc540c66700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc9dcd67200 CR3: 0000000059411000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: virt_to_head_page include/linux/mm.h:841 [inline] virt_to_cache mm/slab.h:474 [inline] kfree+0x77/0x2c0 mm/slab.c:3749 hash_net_create+0xbb2/0xd70 net/netfilter/ipset/ip_set_hash_gen.h:1536 ip_set_create+0x6a2/0x13c0 net/netfilter/ipset/ip_set_core.c:1128 nfnetlink_rcv_msg+0xbe8/0xea0 net/netfilter/nfnetlink.c:230 netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:564 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2352 ___sys_sendmsg+0xf3/0x170 net/socket.c:2406 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45cb19 Code: Bad RIP value. RSP: 002b:00007fc540c65c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004fed80 RCX: 000000000045cb19 RDX: 0000000000000000 RSI: 0000000020001080 RDI: 0000000000000003 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000095e R14: 00000000004cc295 R15: 00007fc540c666d4 Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports") Fixes: 03c8b234e61a ("netfilter: ipset: Generalize extensions support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix forceadd evaluation pathJozsef Kadlecsik2020-02-221-0/+2
| | | | | | | | | | | When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries. Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Correct the reported memory sizeJozsef Kadlecsik2020-02-211-1/+1
| | | | | | | | | | | The patch netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports did not include the size of the comment extensions from the memory size for set listing. Add it, so the proper size is printed. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik2020-02-181-199/+438
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Fix compatibility support for netlink extended ACK and add ↵Jozsef Kadlecsik2019-11-011-1/+1
| | | | synchronize_rcu_bh() checking
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner2019-10-311-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: remove unnecessary spacesyangxingwu2019-10-311-1/+1
| | | | | | | | This patch removes extra spaces. Signed-off-by: yangxingwu <xingwu.yang@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: remove inline from static functions in .c files.Jeremy Sowden2019-10-071-2/+2
| | | | | | | | | | | | | The inline function-specifier should not be used for static functions defined in .c files since it bloats the kernel. Instead leave the compiler to decide which functions to inline. While a couple of the files affected (ip_set_*_gen.h) are technically headers, they contain templates for generating the common parts of particular set-types and so we treat them like .c files. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: inlined four headers files into another one.Jeremy Sowden2019-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | linux/netfilter/ipset/ip_set.h included four other header files: include/linux/netfilter/ipset/ip_set_comment.h include/linux/netfilter/ipset/ip_set_counter.h include/linux/netfilter/ipset/ip_set_skbinfo.h include/linux/netfilter/ipset/ip_set_timeout.h Of these the first three were not included anywhere else. The last, ip_set_timeout.h, was included in a couple of other places, but defined inline functions which call other inline functions defined in ip_set.h, so ip_set.h had to be included before it. Inlined all four into ip_set.h, and updated the other files that included ip_set_timeout.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: update my email addressJozsef Kadlecsik2019-06-051-1/+1
| | | | | | | | | It's better to use my kadlec@netfilter.org email address in the source code. I might not be able to use kadlec@blackhole.kfki.hu in the future. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: Fix memory accounting for hash types on resizeStefano Brivio2019-06-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a fresh array block is allocated during resize, the current in-memory set size should be increased by the size of the block, not replaced by it. Before the fix, adding entries to a hash set type, leading to a table resize, caused an inconsistent memory size to be reported. This becomes more obvious when swapping sets with similar sizes: # cat hash_ip_size.sh #!/bin/sh FAIL_RETRIES=10 tries=0 while [ ${tries} -lt ${FAIL_RETRIES} ]; do ipset create t1 hash:ip for i in `seq 1 4345`; do ipset add t1 1.2.$((i / 255)).$((i % 255)) done t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset create t2 hash:ip for i in `seq 1 4360`; do ipset add t2 1.2.$((i / 255)).$((i % 255)) done t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset swap t1 t2 t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset destroy t1 ipset destroy t2 tries=$((tries + 1)) if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then echo "FAIL after ${tries} tries:" echo "T1 size ${t1_init}, after swap ${t1_swap}" echo "T2 size ${t2_init}, after swap ${t2_swap}" exit 1 fi done echo "PASS" # echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control # ./hash_ip_size.sh [ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa [ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163) [ 2035.080353] Table destroy by resize 00000000fe6551fa FAIL after 4 tries: T1 size 9064, after swap 71128 T2 size 71128, after swap 9064 Reported-by: NOYB <JunkYardMail1@Frontier.com> Fixes: 9e41f26a505c ("netfilter: ipset: Count non-static extension memory for userspace") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: Replace spin_is_locked() with lockdepLance Roy2018-10-221-1/+1
| | | | | | | | | | | | | | | | | lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: <netfilter-devel@vger.kernel.org> Cc: <coreteam@netfilter.org> Cc: <netdev@vger.kernel.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: ipset: forbid family for hash:mac setsFlorent Fourcot2018-06-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | Userspace `ipset` command forbids family option for hash:mac type: ipset create test hash:mac family inet4 ipset v6.30: Unknown argument: `family' However, this check is not done in kernel itself. When someone use external netlink applications (pyroute2 python library for example), one can create hash:mac with invalid family and inconsistant results from userspace (`ipset` command cannot read set content anymore). This patch enforce the logic in kernel, and forbids insertion of hash:mac with a family set. Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no impact on other hash:* sets Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: add resched points during set listingFlorian Westphal2018-01-041-0/+1
| | | | | | | | | | | | | | When sets are extremely large we can get softlockup during ipset -L. We could fix this by adding cond_resched_rcu() at the right location during iteration, but this only works if RCU nesting depth is 1. At this time entire variant->list() is called under under rcu_read_lock_bh. This used to be a read_lock_bh() but as rcu doesn't really lock anything, it does not appear to be needed, so remove it (ipset increments set reference count before this, so a set deletion should not be possible). Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* Fix "don't update counters" mode when counters used at the matchingJozsef Kadlecsik2018-01-041-23/+14
| | | | The matching of the counters was not taken into account, fixed.
* Backport patch: netfilter: ipset: Convert timers to use timer_setup()Jozsef Kadlecsik2018-01-031-5/+10
|
* netfilter: ipset: ipset list may return wrong member count for set with timeoutVishwanath Pai2017-09-061-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Simple testcase: $ ipset create test hash:ip timeout 5 $ ipset add test 1.2.3.4 $ ipset add test 1.2.2.2 $ sleep 5 $ ipset l Name: test Type: hash:ip Revision: 5 Header: family inet hashsize 1024 maxelem 65536 timeout 5 Size in memory: 296 References: 0 Number of entries: 2 Members: We return "Number of entries: 2" but no members are listed. That is because mtype_list runs "ip_set_timeout_expired" and does not list the expired entries, but set->elements is never upated (until mtype_gc cleans it up later). Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com>
* Fix bug: sometimes valid entries in hash:* types of sets were evictedJozsef Kadlecsik2017-02-161-1/+1
| | | | | | | | Wrong index was used and therefore when shrinking a hash bucket at deleting an entry, valid entries could be evicted as well. Thanks to Eric Ewanco for the thorough bugreport. Fixes netfilter bugzilla #1119
* Revert patch "Correct rcu_dereference_bh_nfnl() usage"Jozsef Kadlecsik2016-11-101-6/+4
| | | | | | | The susbsystem param cannot be used to rely on subsystem mutex locking because the call is used in netlink dump context as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: use setup_timer() and mod_timer().Muhammad Falak R Wani2016-05-191-5/+2
| | | | | | | | | | | | | | | Use setup_timer() and instead of init_timer(), being the preferred way of setting up a timer. Also, quoting the mod_timer() function comment: -> mod_timer() is a more efficient way to update the expire field of an active timer (if the timer is inactive it will be activated). Use setup_timer() and mod_timer() to setup and arm a timer, making the code compact and easier to read. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: fix race condition in ipset save, swap and deleteVishwanath Pai2016-03-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script: ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \ counters ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset save & ipset swap hash_ip3 hash_ip2 ipset destroy hash_ip3 /* will crash the machine */ Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink). Both delete and swap will error out if ref_netlink != 0 on the set. Note: The changes to *_head functions is because previously we would increment ref whenever we called these functions, we don't do that anymore. Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix reported memory size for hash:* typesJozsef Kadlecsik2015-11-071-7/+9
| | | | | | | | The calculation of the full allocated memory did not take into account the size of the base hash bucket structure at some places. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix hash type expire: release empty hash bucket blockJozsef Kadlecsik2015-11-071-4/+10
| | | | | | When all entries are expired/all slots are empty, release the bucket. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix hash:* type expirationJozsef Kadlecsik2015-11-071-1/+1
| | | | | | | | Incorrect index was used when the data blob was shrinked at expiration, which could lead to falsely expired entries and memory leak when the comment extension was used too. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Collapse same condition body to a single oneJozsef Kadlecsik2015-11-071-7/+1
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix extension alignmentJozsef Kadlecsik2015-11-071-4/+7
| | | | | | | | | | | | | The data extensions in ipset lacked the proper memory alignment and thus could lead to kernel crash on several architectures. Therefore the structures have been reorganized and alignment attributes added where needed. The patch was tested on armv7h by Gerhard Wiesinger and on x86_64, sparc64 by Jozsef Kadlecsik. Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Out of bound access in hash:net* types fixedJozsef Kadlecsik2015-08-251-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Jones reported that KASan detected out of bounds access in hash:net* types: [ 23.139532] ================================================================== [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58 [ 23.152937] Write of size 4 by task ipset/457 [ 23.159742] ============================================================================= [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 23.173641] ----------------------------------------------------------------------------- [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456 [ 23.201836] __slab_alloc.constprop.66+0x554/0x620 [ 23.208994] __kmalloc+0x2f2/0x360 [ 23.216105] hash_net_create+0x16a/0x470 [ 23.223238] ip_set_create+0x3e6/0x740 [ 23.230343] nfnetlink_rcv_msg+0x599/0x640 [ 23.237454] netlink_rcv_skb+0x14f/0x190 [ 23.244533] nfnetlink_rcv+0x3f6/0x790 [ 23.251579] netlink_unicast+0x272/0x390 [ 23.258573] netlink_sendmsg+0x5a1/0xa50 [ 23.265485] SYSC_sendto+0x1da/0x2c0 [ 23.272364] SyS_sendto+0xe/0x10 [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f The bug is fixed in the patch and the testsuite is extended in ipset to check cidr handling more thoroughly.
* Make struct htype per ipset familyJozsef Kadlecsik2015-06-261-31/+20
| | | | | | | | | | | Before this patch struct htype created at the first source of ip_set_hash_gen.h and it is common for both IPv4 and IPv6 set variants. Make struct htype per ipset family and use NLEN to make nets array fixed size to simplify struct htype allocation. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
* Optimize hash creation routineJozsef Kadlecsik2015-06-261-34/+29
| | | | | | | Exit as easly as possible on error and use RCU_INIT_POINTER() as set is not seen at creation time. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
* Make sure element data size is a multiple of u32Jozsef Kadlecsik2015-06-261-2/+8
| | | | | | | Data for hashing required to be array of u32. Make sure that element data always multiple of u32. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
* Make NLEN compile time constant for hash typesJozsef Kadlecsik2015-06-261-26/+21
| | | | | | | | Hash types define HOST_MASK before inclusion of ip_set_hash_gen.h and the only place where NLEN needed to be calculated at runtime is *_create() method. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
* Simplify mtype_expire() for hash typesJozsef Kadlecsik2015-06-261-17/+17
| | | | | | | | | | Remove redundant parameters nets_length and dsize: they could be get from other parameters. Remove one leve of intendation by using continue while iterating over elements in bucket. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>.
* Count non-static extension memory into the set memory size for userspaceJozsef Kadlecsik2015-06-261-12/+14
| | | | | | | | Non-static (i.e. comment) extension was not counted into the memory size. A new internal counter is introduced for this. In the case of the hash types the sizes of the arrays are counted there as well so that we can avoid to scan the whole set when just the header data is requested.
* Add element count to all set types headerJozsef Kadlecsik2015-06-251-11/+10
| | | | | | It is better to list the set elements for all set types, thus the header information is uniform. Element counts are therefore added to the bitmap and list types.
* Add element count to hash headersEric B Munson2015-06-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | It would be useful for userspace to query the size of an ipset hash, however, this data is not exposed to userspace outside of counting the number of member entries. This patch uses the attribute IPSET_ATTR_ELEMENTS to indicate the size in the the header that is exported to userspace. This field is then printed by the userspace tool for hashes. Because it is only meaningful for hashes to report their size, the output is conditional on the set type. To do this checking the MATCH_TYPENAME macro was moved to utils.h. The bulk of this patch changes the expected test suite to account for the change in output. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Josh Hunt <johunt@akamai.com> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix error path in mtype_resize() when new hash bucket cannot be allocatedJozsef Kadlecsik2015-06-131-10/+15
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Correct rcu_dereference_bh_nfnl() usageJozsef Kadlecsik2015-05-051-4/+6
| | | | | | | | | | | | | When rcu_dereference_bh_nfnl() macro would be defined on the target system if will accept pointer and subsystem id. Check if rcu_dereference_bh_nfnl() is defined and make it accepting two arguments. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix coding styles reported by the most recent checkpatch.pl.Jozsef Kadlecsik2015-04-171-16/+16
|