| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
The patch "netfilter: ipset: enforce documented limit to prevent allocating
huge memory" was too strict and prevented to add up to 64 clashing elements
to a hash:net,iface type of set. This patch fixes the issue and now the type
behaves as documented.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new parameter to complement the existing 'netmask' option. The
main difference between netmask and bitmask is that bitmask takes any
arbitrary ip address as input, it does not have to be a valid netmask.
The name of the new parameter is 'bitmask'. This lets us mask out
arbitrary bits in the ip address, for example:
ipset create set1 hash:ip bitmask 255.128.255.0
ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduced a regression: commit 48596a8ddc46 ("netfilter:
ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
The variable e.ip is passed to adtfn() function which finally adds the
ip address to the set. The patch above refactored the for loop and moved
e.ip = htonl(ip) to the end of the for loop.
What this means is that if the value of "ip" changes between the first
assignement of e.ip and the forloop, then e.ip is pointing to a
different ip address than "ip".
Test case:
$ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000
$ ipset add jdtest_tmp 10.0.1.1/31
ipset v6.21.1: Element cannot be added to the set: it's already added
The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])"
block but e.ip is still pointing to the old value.
Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:
memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: syzbot <syzkaller@googlegroups.com>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
| |
And fix a typo committed by me in em_sched.c too.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
| |
em_sched.c was left out, fix it now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does
not limit adding the same network with different interfaces to a set, which
can lead to huge memory usage or allocation failure.
The quick reproducer is
$ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0
$ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
The backtrace when vmalloc fails:
[Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages
<...>
[Tue Oct 25 00:13:08 2022] Call Trace:
[Tue Oct 25 00:13:08 2022] <TASK>
[Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60
[Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180
[Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20
[Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90
[Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710
<...>
The fix is to enforce the limit documented in the ipset(8) manpage:
> The internal restriction of the hash:net,iface set type is that the same
> network prefix cannot be stored with more than 64 different interfaces
> in a single set.
Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 7661809d493b426e979f39ab512e3adf41fbcc69
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Jul 14 09:45:49 2021 -0700
mm: don't allow oversized kvmalloc() calls
limits the max allocatable memory via kvmalloc() to MAX_INT. Apply the
same limit in ipset.
Reported-by: syzbot+3493b1873fb3ea827986@syzkaller.appspotmail.com
Reported-by: syzbot+2b8443c35458a617c904@syzkaller.appspotmail.com
Reported-by: syzbot+ee5cb15f4a0e85e0d54e@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clang warns:
net/netfilter/ipset/ip_set_hash_ipportnet.c:249:29: warning: variable
'port_to' is uninitialized when used here [-Wuninitialized]
if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE)
^~~~~~~
net/netfilter/ipset/ip_set_hash_ipportnet.c:167:45: note: initialize the
variable 'port_to' to silence this warning
u32 ip = 0, ip_to = 0, p = 0, port, port_to;
^
= 0
net/netfilter/ipset/ip_set_hash_ipportnet.c:249:39: warning: variable
'port' is uninitialized when used here [-Wuninitialized]
if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE)
^~~~
net/netfilter/ipset/ip_set_hash_ipportnet.c:167:36: note: initialize the
variable 'port' to silence this warning
u32 ip = 0, ip_to = 0, p = 0, port, port_to;
^
= 0
2 warnings generated.
The range check was added before port and port_to are initialized.
Shuffle the check after the initialization so that the check works
properly.
Fixes: 7fb6c63025ff ("netfilter: ipset: Limit the maximal range of consecutive elements to
add/delete")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
The number of hosts in a netblock must be a power of two,
so use shift instead of division.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
| |
Avoid possible number overflows when calculating the number of
consecutive elements. Also, compute properly the consecutive
elements in the case of hash:net* types.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
The range size of consecutive elements were not limited. Thus one
could define a huge range which may result soft lockup errors due
to the long execution time. Now the range size is limited to 2^20
entries. Reported by Brad Spengler.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
Backport patch "netfilter: use nfnetlink_unicast()" from
Pablo Neira Ayuso <pablo@netfilter.org>.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
Backport patch "netfilter: nfnetlink: consolidate callback type"
from Pablo Neira Ayuso <pablo@netfilter.org>.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
Backport patch "netfilter: nfnetlink: add struct nfnl_info and pass it
to callbacks" from Pablo Neira Ayuso <pablo@netfilter.org>.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
use it"
Backport patch "netfilter: add helper function to set up the nfnetlink
header and use it" from Pablo Neira Ayuso <pablo@netfilter.org>.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
| |
Return code of strscpy() was not handled properly.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.
Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds
UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 8498 Comm: syz-executor519
Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch replaces htable_bits() by simple fls(hashsize - 1) call:
it alone returns valid nbits both for round and non-round hashsizes.
It is normal to set any nbits here because it is validated inside
following htable_size() call which returns 0 for nbits>31.
Fixes: 1feab10d7e6d("netfilter: ipset: Unified hash type generation")
Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently mtype_resize() can cause oops
t = ip_set_alloc(htable_size(htable_bits));
if (!t) {
ret = -ENOMEM;
goto out;
}
t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));
Increased htable_bits can force htable_size() to return 0.
In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR,
so follwoing access to t->hregion should trigger an OOPS.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
| |
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
| |
Kernel compatibility support was broken in 7.9, reported by Ed W.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
An unnecessary condition prevented to compile pfxlen.c with the patch
202cfef66b3a1e0988d applied, it's fixed.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently netadmin inside non-trusted container can quickly allocate
whole node's memory via request of huge ipset hashtable.
Other ipset-related memory allocations should be restricted too.
v2: fixed typo ALLOC -> ACCOUNT
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syzbot found that we are not validating user input properly
before copying 16 bytes [1].
Using NLA_BINARY in ipaddr_policy[] for IPv6 address is not correct,
since it ensures at most 16 bytes were provided.
We should instead make sure user provided exactly 16 bytes.
In old kernels (before v4.20), fix would be to remove the NLA_BINARY,
since NLA_POLICY_EXACT_LEN() was not yet available.
[1]
BUG: KMSAN: uninit-value in hash_ip6_add+0x1cba/0x3a50
net/netfilter/ipset/ip_set_hash_gen.h:892
CPU: 1 PID: 11611 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x21c/0x280 lib/dump_stack.c:118
kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
__msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
hash_ip6_add+0x1cba/0x3a50 net/netfilter/ipset/ip_set_hash_gen.h:892
hash_ip6_uadt+0x976/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:267
call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
___sys_sendmsg net/socket.c:2407 [inline]
__sys_sendmsg+0x6d5/0x830 net/socket.c:2440
__do_sys_sendmsg net/socket.c:2449 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45deb9
Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89
d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3
fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe2e503fc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000029ec0 RCX: 000000000045deb9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
R13: 000000000169fb7f R14: 00007fe2e50409c0 R15: 000000000118bf2c
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
__msan_chain_origin+0x57/0xa0 mm/kmsan/kmsan_instr.c:147
ip6_netmask include/linux/netfilter/ipset/pfxlen.h:49 [inline]
hash_ip6_netmask net/netfilter/ipset/ip_set_hash_ip.c:185 [inline]
hash_ip6_uadt+0xb1c/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:263
call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
___sys_sendmsg net/socket.c:2407 [inline]
__sys_sendmsg+0x6d5/0x830 net/socket.c:2440
__do_sys_sendmsg net/socket.c:2449 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Uninit was stored to memory at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:289
kmsan_memcpy_memmove_metadata+0x25e/0x2d0 mm/kmsan/kmsan.c:226
kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:246
__msan_memcpy+0x46/0x60 mm/kmsan/kmsan_instr.c:110
ip_set_get_ipaddr6+0x2cb/0x370 net/netfilter/ipset/ip_set_core.c:310
hash_ip6_uadt+0x439/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255
call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
___sys_sendmsg net/socket.c:2407 [inline]
__sys_sendmsg+0x6d5/0x830 net/socket.c:2440
__do_sys_sendmsg net/socket.c:2449 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
slab_alloc_node mm/slub.c:2906 [inline]
__kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512
__kmalloc_reserve net/core/skbuff.c:142 [inline]
__alloc_skb+0x309/0xae0 net/core/skbuff.c:210
alloc_skb include/linux/skbuff.h:1094 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
____sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
___sys_sendmsg net/socket.c:2407 [inline]
__sys_sendmsg+0x6d5/0x830 net/socket.c:2440
__do_sys_sendmsg net/socket.c:2449 [inline]
__se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
__x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
| |
A forth argument of list_for_each_entry_rcu() was introduced, handle the
compatibility issue.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
| |
It makes possible to reproduce exactly the same set after a save/restore.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
The parameter defines the upper limit in any hash bucket at adding new entries
from userspace - if the limit would be exceeded, ipset doubles the hash size
and rehashes. It means the set may consume more memory but gives faster
evaluation at matching in the set.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
| |
The -exist flag was supported with the create, add and delete commands.
In order to gracefully handle the destroy command with nonexistent sets,
the -exist flag is added to destroy too.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whenever ip_set_alloc() is used, allocated memory can either
use kmalloc() or vmalloc(). We should call kvfree() or
ip_set_free()
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 21935 Comm: syz-executor.3 Not tainted 5.8.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__phys_addr+0xa7/0x110 arch/x86/mm/physaddr.c:28
Code: 1d 7a 09 4c 89 e3 31 ff 48 d3 eb 48 89 de e8 d0 58 3f 00 48 85 db 75 0d e8 26 5c 3f 00 4c 89 e0 5b 5d 41 5c c3 e8 19 5c 3f 00 <0f> 0b e8 12 5c 3f 00 48 c7 c0 10 10 a8 89 48 ba 00 00 00 00 00 fc
RSP: 0000:ffffc900018572c0 EFLAGS: 00010046
RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffc9000fac3000
RDX: 0000000000040000 RSI: ffffffff8133f437 RDI: 0000000000000007
RBP: ffffc90098aff000 R08: 0000000000000000 R09: ffff8880ae636cdb
R10: 0000000000000000 R11: 0000000000000000 R12: 0000408018aff000
R13: 0000000000080000 R14: 000000000000001d R15: ffffc900018573d8
FS: 00007fc540c66700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc9dcd67200 CR3: 0000000059411000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
virt_to_head_page include/linux/mm.h:841 [inline]
virt_to_cache mm/slab.h:474 [inline]
kfree+0x77/0x2c0 mm/slab.c:3749
hash_net_create+0xbb2/0xd70 net/netfilter/ipset/ip_set_hash_gen.h:1536
ip_set_create+0x6a2/0x13c0 net/netfilter/ipset/ip_set_core.c:1128
nfnetlink_rcv_msg+0xbe8/0xea0 net/netfilter/nfnetlink.c:230
netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:564
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2352
___sys_sendmsg+0xf3/0x170 net/socket.c:2406
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb19
Code: Bad RIP value.
RSP: 002b:00007fc540c65c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004fed80 RCX: 000000000045cb19
RDX: 0000000000000000 RSI: 0000000020001080 RDI: 0000000000000003
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 000000000000095e R14: 00000000004cc295 R15: 00007fc540c666d4
Fixes: f66ee0410b1c ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
Fixes: 03c8b234e61a ("netfilter: ipset: Generalize extensions support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using ip_set with counters and comment, traffic causes the kernel
to panic on 32-bit ARM:
Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
Unhandled fault: alignment exception (0x221) at 0xea08133c
PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
The problem occurs when we try to update the 64-bit counters - the
faulting address above is not 64-bit aligned. The problem occurs
due to the way elements are allocated, for example:
set->dsize = ip_set_elem_len(set, tb, 0, 0);
map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
If the element has a requirement for a member to be 64-bit aligned,
and set->dsize is not a multiple of 8, but is a multiple of four,
then every odd numbered elements will be misaligned - and hitting
an atomic64_add() on that element will cause the kernel to panic.
ip_set_elem_len() must return a size that is rounded to the maximum
alignment of any extension field stored in the element. This change
ensures that is the case.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not
update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE
must be set, not unset.
Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ip_set_match_extensions(), for sets with counters, we take care of
updating counters themselves by calling ip_set_update_counter(), and of
checking if the given comparison and values match, by calling
ip_set_match_counter() if needed.
However, if a given comparison on counters doesn't match the configured
values, that doesn't mean the set entry itself isn't matching.
This fix restores the behaviour we had before commit 4750005a85f7
("netfilter: ipset: Fix "don't update counters" mode when counters used
at the matching"), without reintroducing the issue fixed there: back
then, mtype_data_match() first updated counters in any case, and then
took care of matching on counters.
Now, if the IPSET_FLAG_SKIP_COUNTER_UPDATE flag is set,
ip_set_update_counter() will anyway skip counter updates if desired.
The issue observed is illustrated by this reproducer:
ipset create c hash:ip counters
ipset add c 192.0.2.1
iptables -I INPUT -m set --match-set c src --bytes-gt 800 -j DROP
if we now send packets from 192.0.2.1, bytes and packets counters
for the entry as shown by 'ipset list' are always zero, and, no
matter how many bytes we send, the rule will never match, because
counters themselves are not updated.
Reported-by: Mithil Mhatre <mmhatre@redhat.com>
Fixes: 4750005a85f7 ("netfilter: ipset: Fix "don't update counters" mode when counters used at the matching")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ip_set_type_list is traversed using list_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of ip_set_type_mutex.
Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
| |
This reverts commit c2dfb6432aea5bb0a3522901b0c44f42d8adbd49.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These kernels does not have in their @struct netlink_dump_control method
that is used to prepare for netlink dump ->start(). This affects all
kernels that does not contain commit fc9e50f5a5a4 ("netlink: add a start
callback for starting a netlink dump").
Introduce fake value of HAVE_NETLINK_DUMP_START_ARGS equal to 7 that
never spot in the wild and set HAVE_NETLINK_DUMP_START_ARGS to 4 only
after explicit test if ->start() is available.
Fixes: 7725bf5ba041 ("netfilter: ipset: fix suspicious RCU usage in find_set_and_id")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stefano Brivio reported that the patch 'netfilter: ipset:
Fix "don't update counters" mode when counters used at the matching'
changed the semantic of when the counters are updated. Before the patch
the counters were updated regardless of the results of the counter
matches, after the patch the counters were updated only if the counter
match conditions (if specified) matched the packet. In order to handle
both ways, the --update-counters-first flag is introduced: when the flag
is specified, the counters are updated before checking the counter match
conditions. Without the flag the current evaluation path (i.e. update
only if counter conditions match) works.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reuseable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reuseable entries.
Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The patch
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
did not include the size of the comment extensions from the memory size
for set listing. Add it, so the proper size is printed.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
However, in the error path there can be a follow-up recvmsg() without
the mutex held. Use the start() function of struct netlink_dump_control
instead of dump() to verify and report if the specified set does not
exist.
Thanks to Pablo Neira Ayuso for helping me to understand the subleties
of the netlink protocol.
Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bitmap allocation did not use full unsigned long sizes
when calculating the required size and that was triggered by KASAN
as slab-out-of-bounds read in several places. The patch fixes all
of them.
Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com
Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com
Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com
Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com
Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com
Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com
Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
map->members is freed by ip_set_free() right before using it in
mtype_ext_cleanup() again. So we just have to move it down.
Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com
Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The set uadt functions assume lineno is never NULL, but it is in
case of ip_set_utest().
syzkaller managed to generate a netlink message that calls this with
LINENO attr present:
general protection fault: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104
Call Trace:
ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867
nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563
pass a dummy lineno storage, its easier than patching all set
implementations.
This seems to be a day-0 bug.
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com
Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is better to restrict ipsets to default network namespace on old
kernels that does not contain @net parameter in @struct xt_mtchk_param
(i.e. ones prior to commit a83d8e8d099f ("netfilter: xtables:
add struct xt_mtchk_param::net"), tag v2.6.34) instead of panicing
on them.
Found and tested on RHEL 6 with 2.6.32 kernels.
Fixes: 90e279db0cf5 ("Add more compatibility checkings to support older kernel releases")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While mangle table is primary place for packet modification setting
mark, traffic class priority or hardware NIC queue can be done in any
table with exception similar to using mark in policy-based routing
setups (configured with ip-rule(8)) should be done before routing
happens (i.e. in PREROUTING chain that usable in mangle or raw tables
only).
There is no such restriction in MARK target used to set packet mark and
CLASSIFY target used to set traffic class priority. Both are free to use
in any table. There is no known target that can modify hardware queue
for packet.
This helps in keeping filtering and packet modification rules together
in filter table.
Tested with rule in filter table with SET target using --map-prio and
HTB for scheduling packets at egress.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|