ipset - ipset tree

	Commit message (Collapse)	Author	Age	Files	Lines
*	ipset 7.24 released (tag: v7.24)	Jozsef Kadlecsik	2025-05-17	1	-0/+6
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: fix region locking in hash types	Jozsef Kadlecsik	2025-05-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Region locking introduced in v5.6-rc4 contained three macros to handle the region locks: ahash_bucket_start(), ahash_bucket_end() which gave back the start and end hash bucket values belonging to a given region lock and ahash_region() which should give back the region lock belonging to a given hash bucket. The latter was incorrect which can lead to a race condition between the garbage collector and adding new elements when a hash type of set is defined with timeouts. Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports") Reported-by: Kota Toda <kota.toda@gmo-cybersecurity.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
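The invariant behind this fix can be modelled in user space. The sketch below is not the kernel source: `REGION_BITS`, `num_regions()`, `bucket_start()`, `bucket_end()`, `region_of()` and `region_of_modulo()` are hypothetical stand-ins for the macros named in the message, and the modulo lookup is only one example of a mapping that disagrees with the bucket ranges. It assumes each region lock covers a contiguous run of buckets, in which case the bucket-to-region lookup has to divide by the region size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each region lock covers 2^REGION_BITS contiguous buckets. */
#define REGION_BITS 10u

/* Number of region locks for a table with 2^htable_bits buckets. */
static uint32_t num_regions(uint32_t htable_bits)
{
	return htable_bits < REGION_BITS ? 1u : 1u << (htable_bits - REGION_BITS);
}

/* First bucket covered by region r. */
static uint32_t bucket_start(uint32_t r, uint32_t htable_bits)
{
	return htable_bits < REGION_BITS ? 0u : r << REGION_BITS;
}

/* One past the last bucket covered by region r. */
static uint32_t bucket_end(uint32_t r, uint32_t htable_bits)
{
	return htable_bits < REGION_BITS ? 1u << htable_bits
					 : (r + 1u) << REGION_BITS;
}

/* Consistent lookup: the region whose [start, end) range contains bucket n. */
static uint32_t region_of(uint32_t n, uint32_t htable_bits)
{
	assert(htable_bits < 32u && n < (1u << htable_bits));
	return htable_bits < REGION_BITS ? 0u : n >> REGION_BITS;
}

/* Inconsistent lookup for comparison: spreads buckets round-robin over locks. */
static uint32_t region_of_modulo(uint32_t n, uint32_t htable_bits)
{
	return n % num_regions(htable_bits);
}

/* Returns 1 if the lookup agrees with the bucket ranges for every bucket. */
static int lookup_is_consistent(uint32_t (*lookup)(uint32_t, uint32_t),
				uint32_t htable_bits)
{
	uint32_t n, r;

	for (n = 0; n < (1u << htable_bits); n++) {
		r = lookup(n, htable_bits);
		if (n < bucket_start(r, htable_bits) ||
		    n >= bucket_end(r, htable_bits))
			return 0;
	}
	return 1;
}
```

With a 4096-bucket table, `lookup_is_consistent(region_of, 12)` holds, while the modulo variant already fails for bucket 1. A garbage collector walking `[bucket_start, bucket_end)` under one lock would then race with an add that took a different lock for the same bucket.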
*	Handle "netfilter: ipset: Fix for recursive locking warning" patch for ↵	Jozsef Kadlecsik	2024-12-19	2	-0/+5
\| \| \| \| \| \|	backward compatibility Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Fix for recursive locking warning	Phil Sutter	2024-12-19	1	-0/+3
	With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding it to a set of type list:set and populating it from iptables SET target triggers a kernel warning:

	| WARNING: possible recursive locking detected
	| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
	| --------------------------------------------
	| ping/4018 is trying to acquire lock:
	| ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
	|
	| but task is already holding lock:
	| ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]

	This is a false alarm: ipset does not allow nested list:set type, so the loop in list_set_kadd() can never encounter the outer set itself. No other set type supports embedded sets, so this is the only case to consider.

	To avoid the false report, create a distinct lock class for list:set type ipset locks.

	Signed-off-by: Phil Sutter <phil@nwl.cc>
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	ipset 7.23 released (tag: v7.23)	Jozsef Kadlecsik	2024-12-16	1	-0/+8
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Hold module reference while requesting a module	Phil Sutter	2024-12-15	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \|	User space may unload ip_set.ko while it is itself requesting a set type backend module, leading to a kernel crash. The race condition may be provoked by inserting an mdelay() right after the nfnl_unlock() call. Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: add missing range check in bitmap_ip_uadt	Jeongjun Park	2024-12-15	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists, the values of ip and ip_to are slightly swapped. Therefore, the range check for ip should be done later, but this part is missing and it seems that the vulnerability occurs. So we should add missing range checks and remove unnecessary range checks. Cc: <stable@vger.kernel.org> Reported-by: syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com Fixes: 72205fc68bd1 ("netfilter: ipset: bitmap:ip set type support") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Fix suspicious rcu_dereference_protected()	Jozsef Kadlecsik	2024-12-15	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When destroying all sets, we are either in pernet exit phase or are executing a "destroy all sets command" from userspace. The latter was taken into account in ip_set_dereference() (nfnetlink mutex is held), but the former was not. The patch adds the required check to rcu_dereference_protected() in ip_set_dereference(). Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type") Reported-by: syzbot+b62c37cdd58103293a5a@syzkaller.appspotmail.com Reported-by: syzbot+cfbe1da5fdfc39efc293@syzkaller.appspotmail.com Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	Replace BUG_ON() with WARN_ON_ONCE() according to usage policy.	Jozsef Kadlecsik	2024-06-06	1	-1/+1
*	ipset 7.22 released (tag: v7.22)	Jozsef Kadlecsik	2024-06-05	1	-0/+9
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type	Jozsef Kadlecsik	2024-06-04	2	-73/+62
	Lion Ackermann reported that there is a race condition between namespace cleanup in ipset and the garbage collection of the list:set type. The namespace cleanup can destroy the list:set type of sets while the gc of the set type is waiting to run in rcu cleanup. The latter uses data from the destroyed set, which thus leads to a use after free.

	The patch contains the following parts:
	- When destroying all sets, first remove the garbage collectors, then wait if needed and then destroy the sets.
	- Fix the badly ordered "wait then remove gc" for the destroy a single set case.
	- Fix the missing rcu locking in the list:set type in the userspace test case.
	- Use proper RCU list handlings in the list:set type.

	The patch depends on 975403cda657 (netfilter: ipset: Add list flush to cancel_gc).

	Fixes: fdb8e12cc2cc (netfilter: ipset: fix performance regression in swap operation)
	Reported-by: Lion Ackermann <nnamrec@gmail.com>
	Tested-by: Lion Ackermann <nnamrec@gmail.com>
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Add list flush to cancel_gc	Alexander Maltsev	2024-05-28	1	-0/+3
\| \| \| \| \| \| \| \| \|	Flushing list in cancel_gc drops references to other lists right away, without waiting for RCU to destroy list. Fixes race when referenced ipsets can't be destroyed while referring list is scheduled for destroy. Signed-off-by: Alexander Maltsev <keltar.gw@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	Kill sched.h dependency on rcupdate.h	Kent Overstreet	2024-05-22	3	-0/+10
\| \| \| \| \| \| \| \|	by moving cond_resched_rcu() to rcupdate_wait.h, we can kill another big sched.h dependency. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	Handle "netfilter: propagate net to nf_bridge_get_physindev" patch	Jozsef Kadlecsik	2024-05-22	1	-0/+1
\| \| \| \| \| \|	Handle backward compatibility with regard of the patch. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: propagate net to nf_bridge_get_physindev	Pavel Tikhomirov	2024-05-22	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \|	This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available. Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	Revert "netfilter: ipset: remove set destroy at ip_set module removal"	Jozsef Kadlecsik	2024-05-21	1	-3/+24
\| \| \| \| \| \| \| \| \|	In case of namespace exit the modules are not unloaded but the sets belonging to the namespace must be destroyed. This reverts commit 099916e8f2c0a9c84f79469a8db49f775d4af16e. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	ipset 7.21 released (tag: v7.21)	Jozsef Kadlecsik	2024-02-12	1	-0/+8
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Suppress false sparse warnings	Jozsef Kadlecsik	2024-02-12	1	-2/+2
\| \| \| \| \| \| \| \|	Due to the code reorganization the functions in question now run by call_rcu(), not under rcu locking and pointer access. This produces false sparse warning which are suppressed by the patch. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: remove set destroy at ip_set module removal	Jozsef Kadlecsik	2024-02-05	1	-24/+3
\| \| \| \| \| \| \| \| \|	The ip_set module can only be removed when all set module type modules are already removed. A set type module can only be removed when all sets belonging to the given type are already removed. So it is not possible that there's any set defined at ip_set module removal. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Cleanup the code of destroy operation and explain the two ↵	Jozsef Kadlecsik	2024-02-05	1	-11/+33
\| \| \| \| \| \|	stages in comments Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Missing gc cancellations fixed	Jozsef Kadlecsik	2024-02-04	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") missed to add the calls to gc cancellations at the error path of create operations and at module unload. Also, because the half of the destroy operations now executed by a function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex or rcu read lock is held and therefore the checking of them results false warnings. Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com Reported-by: Brad Spengler <spender@grsecurity.net> Reported-by: Стас Ничипорович <stasn77@gmail.com> Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") Tested-by: Brad Spengler <spender@grsecurity.net> Tested-by: Стас Ничипорович <stasn77@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	ipset 7.20 released (tag: v7.20)	Jozsef Kadlecsik	2024-01-31	1	-0/+12
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	treewide: Convert del_timer() to timer_shutdown()	Steven Rostedt (Google)	2024-01-29	1	-1/+1
	Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed.

	The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case.

	This was created by using a coccinelle script and the following commands:

	    $ cat timer.cocci
	    @@
	    expression ptr, slab;
	    identifier timer, rfield;
	    @@
	    (
	    -       del_timer(&ptr->timer);
	    +       timer_shutdown(&ptr->timer);
	    |
	    -       del_timer_sync(&ptr->timer);
	    +       timer_shutdown_sync(&ptr->timer);
	    )
	      ... when strict
	          when != ptr->timer
	    (
	            kfree_rcu(ptr, rfield);
	    |
	            kmem_cache_free(slab, ptr);
	    |
	            kfree(ptr);
	    )

	    $ spatch timer.cocci . > /tmp/t.patch
	    $ patch -p1 < /tmp/t.patch

	Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
	Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
	Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
	Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
	Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	Use timer_shutdown_sync() when available, instead of del_timer_sync()	Jozsef Kadlecsik	2024-01-29	1	-0/+5
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: fix race condition between swap/destroy and kernel side ↵	Jozsef Kadlecsik	2024-01-29	5	-19/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	add/del/test v4 The patch "netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test", commit 28628fa9 fixes a race condition. But the synchronize_rcu() added to the swap function unnecessarily slows it down: it can safely be moved to destroy and use call_rcu() instead. Eric Dumazet pointed out that simply calling the destroy functions as rcu callback does not work: sets with timeout use garbage collectors which need cancelling at destroy which can wait. Therefore the destroy functions are split into two: cancelling garbage collectors safely at executing the command received by netlink and moving the remaining part only into the rcu callback. Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/ Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test") Reported-by: Ale Crismani <ale.crismani@automattic.com> Reported-by: David Wang <00107082@163.com> Tested-by: David Wang <00107082@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: fix race condition between swap/destroy and kernel side ↵	Jozsef Kadlecsik	2023-12-11	1	-24/+6
\| \| \| \| \| \| \| \| \| \|	add/del/test v3 Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair. So there's no need to extend the rcu read locked area in ipset itself. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: fix race condition between swap/destroy and kernel side ↵	Jozsef Kadlecsik	2023-11-04	1	-3/+3
\| \| \| \| \| \| \| \| \|	add/del/test v2 synchronize_rcu() is moved into ip_set_swap() in order not to burden ip_set_destroy() unnecessarily when all sets are destroyed Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: fix race condition between swap/destroy and kernel side ↵	Jozsef Kadlecsik	2023-10-19	1	-5/+23
	add/del/test

	Linkui Xiao reported that there's a race condition when ipset swap and destroy is called, which can lead to crash in add/del/test element operations. Swap then destroy are usual operations to replace a set with another one in a production system. The issue can in some cases be reproduced with the script:

	    ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576
	    ipset add hash_ip1 172.20.0.0/16
	    ipset add hash_ip1 192.168.0.0/16
	    iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
	    while [ 1 ]
	    do
	        # ... Ongoing traffic...
	        ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
	        ipset add hash_ip2 172.20.0.0/16
	        ipset swap hash_ip1 hash_ip2
	        ipset destroy hash_ip2
	        sleep 0.05
	    done

	In the race case the possible order of the operations are

	    CPU0                    CPU1
	    ip_set_test
	                            ipset swap hash_ip1 hash_ip2
	                            ipset destroy hash_ip2
	    hash_net_kadt

	Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy removed it, hash_net_kadt crashes.

	The fix is to protect both the list of the sets and the set pointers in an extended RCU region and before calling destroy, wait to finish all started rcu_read_lock().

	The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>.

	Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/
	Reported by: Linkui Xiao <xiaolinkui@kylinos.cn>
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	ipset 7.18 released (tag: v7.18)	Jozsef Kadlecsik	2023-09-19	1	-0/+15
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP	Jozsef Kadlecsik	2023-09-18	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it. The race is caused by using the wrong reference counter, i.e. the ref counter instead of ref_netlink. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Tested-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ↵	Kyle Zeng	2023-09-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip_set_hash_netportnet.c The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can lead to the use of wrong `CIDR_POS(c)` for calculating array offsets, which can lead to integer underflow. As a result, it leads to slab out-of-bound access. This patch adds back the IP_SET_HASH_WITH_NET0 macro to ip_set_hash_netportnet to address the issue. Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net") Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	compatibility: handle strscpy_pad()	Jozsef Kadlecsik	2023-09-18	1	-0/+16
*	netfilter: ipset: refactor deprecated strncpy	Justin Stitt	2023-09-18	1	-6/+6
\| \| \| \| \| \| \| \| \|	Use `strscpy_pad` instead of `strncpy`. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	netfilter: ipset: remove rcu_read_lock_bh pair from ip_set_test	Florian Westphal	2023-09-18	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Callers already hold rcu_read_lock. Prior to RCU conversion this used to be a read_lock_bh(), but now the bh-disable isn't needed anymore. Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Replace strlcpy with strscpy	Azeem Shaikh	2023-09-18	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since return value from all callers of STRLCPY macro were ignored. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230613003437.3538694-1-azeemshaikh38@gmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
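The difference the message describes can be illustrated in user space. The function below is a hypothetical model written for this example, not the kernel's strscpy(): `bounded_copy()` is an assumed name, and it mirrors only the semantics stated above (bounded reads of the source, guaranteed NUL termination, an error return on truncation).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of a strscpy()-style copy:
 * - reads at most `size` bytes from src, so an unterminated source is
 *   never scanned past the destination size;
 * - always NUL-terminates dst when size > 0;
 * - returns the number of bytes copied, or -E2BIG if src was truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;
	}
	dst[i] = '\0';
	/* Truncated unless the source ended exactly at the limit. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

A strlcpy()-style function, by contrast, has to walk the whole source to compute its return value (the source length), which is the unbounded read the message refers to. Callers that ignored that return value, as stated for the STRLCPY macro, lose nothing in the conversion.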
*	netfilter: ipset: Add schedule point in call_ad().	Kuniyuki Iwashima	2023-09-18	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	syzkaller found a repro that causes Hung Task [0] with ipset. The repro first creates an ipset and then tries to delete a large number of IPs from the ipset concurrently: IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187 IPSET_ATTR_CIDR : 2 The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET) held, and other threads wait for it to be released. Previously, the same issue existed in set->variant->uadt() that could run so long under ip_set_lock(set). Commit 5e29dc36bd5e ("netfilter: ipset: Rework long task execution when adding/deleting entries") tried to fix it, but the issue still exists in the caller with another mutex. While adding/deleting many IPs, we should release the CPU periodically to prevent someone from abusing ipset to hang the system. Note we need to increment the ipset's refcnt to prevent the ipset from being destroyed while rescheduling. [0]: INFO: task syz-executor174:268 blocked for more than 143 seconds. Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d Call trace: __switch_to+0x308/0x714 arch/arm64/kernel/process.c:556 context_switch kernel/sched/core.c:5343 [inline] __schedule+0xd84/0x1648 kernel/sched/core.c:6669 schedule+0xf0/0x214 kernel/sched/core.c:6745 schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804 __mutex_lock_common kernel/locking/mutex.c:679 [inline] __mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747 __mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035 mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286 nfnl_lock net/netfilter/nfnetlink.c:98 [inline] nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295 netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546 nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365 netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x4b8/0x810 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193 el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	net: Kconfig: fix spellos	Randy Dunlap	2023-09-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix spelling in net/ Kconfig files. (reported by codespell) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: coreteam@netfilter.org Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
*	netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.	Gavrilov Ilia	2023-01-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of an arithmetic expression 2 << (netmask - mask_bits - 1) is subject to overflow due to a failure casting operands to a larger data type before performing the arithmetic. Note that it's harmless since the value will be checked at the next step. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9fed748185a ("netfilter: ipset: Check and reject crazy /0 input parameters") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
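The arithmetic can be reproduced in a small user-space sketch. The function names are hypothetical, and `mask_bits` is taken to be 0 for the full 0 to 0xFFFFFFFF range, which is an assumption about how that range maps to a prefix length. The signed 32-bit step is modelled with explicit casts so the example itself does not rely on undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow evaluation: the shift is done in 32 bits and only then widened.
 * The uint32_t -> int32_t conversion is implementation-defined and wraps
 * on common two's-complement compilers, which models a signed int result.
 */
static uint64_t elements_narrow(uint32_t netmask, uint32_t mask_bits)
{
	assert(netmask > mask_bits && netmask - mask_bits - 1u < 32u);
	return (uint64_t)(int32_t)((uint32_t)2 << (netmask - mask_bits - 1u));
}

/* Wide evaluation: the operand is widened to 64 bits before the shift. */
static uint64_t elements_wide(uint32_t netmask, uint32_t mask_bits)
{
	assert(netmask > mask_bits && netmask - mask_bits - 1u < 63u);
	return (uint64_t)2 << (netmask - mask_bits - 1u);
}
```

For `netmask` 31 and `mask_bits` 0 the wide version yields 2^31, while the narrow version sign-extends a negative 32-bit value into a very large 64-bit one. For small shifts, such as `netmask` 24 with `mask_bits` 16, both agree.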
*	ipset 7.17 released (tag: v7.17)	Jozsef Kadlecsik	2022-12-30	1	-0/+4
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: Rework long task execution when adding/deleting entries	Jozsef Kadlecsik	2022-12-30	11	-81/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When adding/deleting large number of elements in one step in ipset, it can take a reasonable amount of time and can result in soft lockup errors. The patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") tried to fix it by limiting the max elements to process at all. However it was not enough, it is still possible that we get hung tasks. Lowering the limit is not reasonable, so the approach in this patch is as follows: rely on the method used at resizing sets and save the state when we reach a smaller internal batch limit, unlock/lock and proceed from the saved state. Thus we can avoid long continuous tasks and at the same time removed the limit to add/delete large number of elements in one step. The nfnl mutex is held during the whole operation which prevents one to issue other ipset commands in parallel. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
*	netfilter: ipset: fix hash:net,port,net hang with /0 subnet	Jozsef Kadlecsik	2022-12-30	1	-19/+21
\| \| \| \| \| \| \| \| \| \| \| \| \|	The hash:net,port,net set type supports /0 subnets. However, the patch commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range of consecutive elements to add/delete" did not take into account it and resulted in an endless loop. The bug is actually older but the patch 5f7b51bf09baca8e brings it out earlier. Handle /0 subnets properly in hash:net,port,net set types. Reported-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	ipset 7.16 released (tag: v7.16)	Jozsef Kadlecsik	2022-11-21	1	-0/+21
\| \| \| \|	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface	Jozsef Kadlecsik	2022-11-21	1	-1/+1
\| \| \| \| \| \| \|	The patch "netfilter: ipset: enforce documented limit to prevent allocating huge memory" was too strict and prevented to add up to 64 clashing elements to a hash:net,iface type of set. This patch fixes the issue and now the type behaves as documented.
*	netfilter: ipset: Add support for new bitmask parameter	Vishwanath Pai	2022-11-20	6	-26/+126
	Add a new parameter to complement the existing 'netmask' option. The main difference between netmask and bitmask is that bitmask takes any arbitrary ip address as input, it does not have to be a valid netmask.

	The name of the new parameter is 'bitmask'. This lets us mask out arbitrary bits in the ip address, for example:

	    ipset create set1 hash:ip bitmask 255.128.255.0
	    ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80

	Signed-off-by: Vishwanath Pai <vpai@akamai.com>
	Signed-off-by: Joshua Hunt <johunt@akamai.com>
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
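What masking with a non-contiguous bitmask does to an address can be shown with a short user-space sketch. `apply_bitmask()` is a hypothetical helper written for this example and is not part of ipset; it relies only on the POSIX `inet_pton()` and `inet_ntop()` calls and handles IPv4 only.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Apply an arbitrary (not necessarily contiguous) bitmask to a dotted-quad
 * IPv4 address and write the masked address to out.
 * Returns 0 on success, -1 if a string fails to parse or out is too small.
 */
static int apply_bitmask(const char *addr, const char *bitmask,
			 char *out, socklen_t outlen)
{
	struct in_addr a, m;

	assert(addr != NULL && bitmask != NULL && out != NULL);
	if (inet_pton(AF_INET, addr, &a) != 1 ||
	    inet_pton(AF_INET, bitmask, &m) != 1)
		return -1;

	/* Both values are in network byte order, so a plain AND is enough. */
	a.s_addr &= m.s_addr;

	return inet_ntop(AF_INET, &a, out, outlen) ? 0 : -1;
}
```

With the bitmask 255.128.255.0 from the example above, the address 10.200.30.40 is reduced to 10.128.30.0, a result that no contiguous netmask could produce.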
*	netfilter: ipset: regression in ip_set_hash_ip.c	Vishwanath Pai	2022-11-07	1	-5/+3
	This patch introduced a regression: commit 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")

	The variable e.ip is passed to adtfn() function which finally adds the ip address to the set. The patch above refactored the for loop and moved e.ip = htonl(ip) to the end of the for loop.

	What this means is that if the value of "ip" changes between the first assignment of e.ip and the for loop, then e.ip is pointing to a different ip address than "ip".

	Test case:

	    $ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000
	    $ ipset add jdtest_tmp 10.0.1.1/31
	    ipset v6.21.1: Element cannot be added to the set: it's already added

	The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])" block but e.ip is still pointing to the old value.

	Reviewed-by: Joshua Hunt <johunt@akamai.com>
	Signed-off-by: Vishwanath Pai <vpai@akamai.com>
	Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
*	netfilter: move from strlcpy with unused retval to strscpy	Wolfram Sang	2022-11-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Florian Westphal <fw@strlen.de>
*	compatibility: handle unsafe_memcpy()	Jozsef Kadlecsik	2022-11-07	1	-0/+6
*	netlink: Bounds-check struct nlmsgerr creation	Kees Cook	2022-11-07	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(), switch from __nlmsg_put to nlmsg_put(), and explain the bounds check for dealing with the memcpy() across a composite flexible array struct. Avoids this future run-time warning: memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16) Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	compatibility: move to skb_protocol in the code from tc_skb_protocol	Jozsef Kadlecsik	2022-11-07	2	-6/+4
\| \| \| \|	And fix a typo committed by me in em_sched.c too.
*	sched: consistently handle layer3 header accesses in the presence of VLANs	Toke Høiland-Jørgensen	2022-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>