summaryrefslogtreecommitdiffstats
path: root/kernel/net
Commit message (Collapse)AuthorAgeFilesLines
* ip_set: Fix compatibility with kernels between v3.3 and v4.5HEADmasterSerhey Popovych2020-03-091-0/+2
| | | | | | | | | | | | | | | These kernels does not have in their @struct netlink_dump_control method that is used to prepare for netlink dump ->start(). This affects all kernels that does not contain commit fc9e50f5a5a4 ("netlink: add a start callback for starting a netlink dump"). Introduce fake value of HAVE_NETLINK_DUMP_START_ARGS equal to 7 that never spot in the wild and set HAVE_NETLINK_DUMP_START_ARGS to 4 only after explicit test if ->start() is available. Fixes: 7725bf5ba041 ("netfilter: ipset: fix suspicious RCU usage in find_set_and_id") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Introduce --update-counters-first flag for the set targetJozsef Kadlecsik2020-03-092-4/+30
| | | | | | | | | | | | | | | Stefano Brivio reported that the patch 'netfilter: ipset: Fix "don't update counters" mode when counters used at the matching' changed the semantic of when the counters are updated. Before the patch the counters were updated regardless of the results of the counter matches, after the patch the counters were updated only if the counter match conditions (if specified) matched the packet. In order to handle both ways, the --update-counters-first flag is introduced: when the flag is specified, the counters are updated before checking the counter match conditions. Without the flag the current evaluation path (i.e. update only if counter conditions match) works. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix forceadd evaluation pathJozsef Kadlecsik2020-02-221-0/+2
| | | | | | | | | | | When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries. Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Correct the reported memory sizeJozsef Kadlecsik2020-02-211-1/+1
| | | | | | | | | | | The patch netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports did not include the size of the comment extensions from the memory size for set listing. Add it, so the proper size is printed. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik2020-02-182-209/+462
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: fix suspicious RCU usage in find_set_and_idJozsef Kadlecsik2020-01-261-17/+32
| | | | | | | | | | | | | | find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held. However, in the error path there can be a follow-up recvmsg() without the mutex held. Use the start() function of struct netlink_dump_control instead of dump() to verify and report if the specified set does not exist. Thanks to Pablo Neira Ayuso for helping me to understand the subleties of the netlink protocol. Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: use bitmap infrastructure completelyJozsef Kadlecsik2020-01-194-10/+10
| | | | | | | | | | | | | | | | The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: fix a use-after-free in mtype_destroy()Cong Wang2020-01-151-1/+1
| | | | | | | | | | | | map->members is freed by ip_set_free() right before using it in mtype_ext_cleanup() again. So we just have to move it down. Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is presentFlorian Westphal2020-01-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The set uadt functions assume lineno is never NULL, but it is in case of ip_set_utest(). syzkaller managed to generate a netlink message that calls this with LINENO attr present: general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104 Call Trace: ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867 nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563 pass a dummy lineno storage, its easier than patching all set implementations. This seems to be a day-0 bug. Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com Fixes: a7b4f989a6294 ("netfilter: ipset: IP set core support") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ip_set: Pass init_net when @net is missing in match check params data structureSerhey Popovych2019-12-091-1/+1
| | | | | | | | | | | | | | It is better to restrict ipsets to default network namespace on old kernels that does not contain @net parameter in @struct xt_mtchk_param (i.e. ones prior to commit a83d8e8d099f ("netfilter: xtables: add struct xt_mtchk_param::net"), tag v2.6.34) instead of panicing on them. Found and tested on RHEL 6 with 2.6.32 kernels. Fixes: 90e279db0cf5 ("Add more compatibility checkings to support older kernel releases") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: xt_set: Do not restrict --map-set to the mangle tableSerhey Popovych2019-12-091-5/+0
| | | | | | | | | | | | | | | | | | | | | | | While mangle table is primary place for packet modification setting mark, traffic class priority or hardware NIC queue can be done in any table with exception similar to using mark in policy-based routing setups (configured with ip-rule(8)) should be done before routing happens (i.e. in PREROUTING chain that usable in mangle or raw tables only). There is no such restriction in MARK target used to set packet mark and CLASSIFY target used to set traffic class priority. Both are free to use in any table. There is no known target that can modify hardware queue for packet. This helps in keeping filtering and packet modification rules together in filter table. Tested with rule in filter table with SET target using --map-prio and HTB for scheduling packets at egress. Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* em_ipset: Build on old kernelsSerhey Popovych2019-12-092-2/+2
| | | | | | | | | | | | | | | | | | | Make sure TCF_EM_IPSET defined and corresponds to current upstream value if not defined in target kernel. You need iproute2 version that supports em_ipset to communicate correctly. Include ip_set_compat.h after pkt_cls.h to prevent TCF_EM_IPSET redefine error. Detect skb->iif => skb->skb_iif rename after commit 8964be4a9a5c ("net: rename skb->iif to skb->skb_iif"). Add dev_get_by_index_rcu() define pointing to __dev_get_by_index() to build on RHEL6 kernels with explicit note that this may not work on all architectures. Always build em_ipset regardless of CONFIG_NET_EMATCH_IPSET option. Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Fix compatibility support for netlink extended ACK and add ↵Jozsef Kadlecsik2019-11-011-1/+1
| | | | synchronize_rcu_bh() checking
* Fix nla_policies to fully support NL_VALIDATE_STRICTJozsef Kadlecsik2019-11-013-2/+12
| | | | | | | Since v5.2 (commit "netlink: re-add parse/validate functions in strict mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies which did not support strict mode and thus the corresponding ipset commands failed.
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner2019-10-3121-84/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: remove unnecessary spacesyangxingwu2019-10-312-2/+2
| | | | | | | | This patch removes extra spaces. Signed-off-by: yangxingwu <xingwu.yang@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: Add wildcard support to net,ifaceKristian Evensen2019-10-311-5/+18
| | | | | | | | | | | | | | | | | | | | | The net,iface equal functions currently compares the full interface names. In several cases, wildcard (or prefix) matching is useful. For example, when converting a large iptables rule-set to make use of ipset, I was able to significantly reduce the number of set elements by making use of wildcard matching. Wildcard matching is enabled by adding "wildcard" when adding an element to a set. Internally, this causes the IPSET_FLAG_IFACE_WILDCARD-flag to be set. When this flag is set, only the initial part of the interface name is used for comparison. Wildcard matching is done per element and not per set, as there are many cases where mixing wildcard and non-wildcard elements are useful. This means that is up to the user to handle (avoid) overlapping interface names. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: Copy the right MAC address in hash:ip,mac IPv6 setsStefano Brivio2019-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets"). When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that hash:ip,mac sets also support IPv6 as family, and this is covered by a separate function, hash_ipmac6_kadt(). In hash:ip,mac sets, the first dimension is the IP address, and the second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag in flags while deciding which MAC address to copy, destination or source. This way, mixing source and destination matches for the two dimensions of ip,mac hash type works as expected, also for IPv6. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 2001:db8::1/64 dev veth1 ip -net A addr add 2001:db8::2/64 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_hash hash:ip,mac family inet6 ip netns exec A ipset add test_hash 2001:db8::1,${dst} ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset now correctly matches a test packet: # ping -c1 2001:db8::2 >/dev/null # echo $? 0 Reported-by: Chen, Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: move ip_set_get_ip_port() to ip_set_bitmap_port.c.Jeremy Sowden2019-10-072-28/+27
| | | | | | | | ip_set_get_ip_port() is only used in ip_set_bitmap_port.c. Move it there and make it static. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: move function to ip_set_bitmap_ip.c.Jeremy Sowden2019-10-071-0/+12
| | | | | | | | One inline function in ip_set_bitmap.h is only called in ip_set_bitmap_ip.c: move it and remove inline function specifier. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: make ip_set_put_flags extern.Jeremy Sowden2019-10-071-0/+24
| | | | | | | | ip_set_put_flags is rather large for a static inline function in a header-file. Move it to ip_set_core.c and export it. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: move functions to ip_set_core.c.Jeremy Sowden2019-10-071-0/+102
| | | | | | | | Several inline functions in ip_set.h are only called in ip_set_core.c: move them and remove inline function specifier. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: move ip_set_comment functions from ip_set.h to ip_set_core.c.Jeremy Sowden2019-10-071-1/+65
| | | | | | | | | | | | | | | Most of the functions are only called from within ip_set_core.c. The exception is ip_set_init_comment. However, this is too complex to be a good candidate for a static inline function. Move it to ip_set_core.c, change its linkage to extern and export it, leaving a declaration in ip_set.h. ip_set_comment_free is only used as an extension destructor, so change its prototype to match and drop cast. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: remove inline from static functions in .c files.Jeremy Sowden2019-10-0719-138/+138
| | | | | | | | | | | | | The inline function-specifier should not be used for static functions defined in .c files since it bloats the kernel. Instead leave the compiler to decide which functions to inline. While a couple of the files affected (ip_set_*_gen.h) are technically headers, they contain templates for generating the common parts of particular set-types and so we treat them like .c files. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: inlined four headers files into another one.Jeremy Sowden2019-10-072-2/+1
| | | | | | | | | | | | | | | | | | | | linux/netfilter/ipset/ip_set.h included four other header files: include/linux/netfilter/ipset/ip_set_comment.h include/linux/netfilter/ipset/ip_set_counter.h include/linux/netfilter/ipset/ip_set_skbinfo.h include/linux/netfilter/ipset/ip_set_timeout.h Of these the first three were not included anywhere else. The last, ip_set_timeout.h, was included in a couple of other places, but defined inline functions which call other inline functions defined in ip_set.h, so ip_set.h had to be included before it. Inlined all four into ip_set.h, and updated the other files that included ip_set_timeout.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* netfilter: ipset: Fix an error code in ip_set_sockfn_get()Dan Carpenter2019-08-271-3/+5
| | | | | | | | | | | The copy_to_user() function returns the number of bytes remaining to be copied. In this code, that positive return is checked at the end of the function and we return zero/success. What we should do instead is return -EFAULT. Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* Fix rename concurrency with listingJozsef Kadlecsik2019-07-231-1/+1
| | | | | | | | | | | | | Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac setsStefano Brivio2019-06-282-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16 ip netns exec A ipset add test_bitmap 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: Actually allow destination MAC address for hash:ip,mac sets tooStefano Brivio2019-06-281-4/+0
| | | | | | | | | | | | | | | In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the KADT check that prevents matching on destination MAC addresses for hash:mac sets, but forgot to remove the same check for hash:ip,mac set. Drop this check: functionality is now commented in man pages and there's no reason to restrict to source MAC address matching anymore. Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
* ipset: update my email addressJozsef Kadlecsik2019-06-0520-34/+34
| | | | | | | | | It's better to use my kadlec@netfilter.org email address in the source code. I might not be able to use kadlec@blackhole.kfki.hu in the future. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: Fix memory accounting for hash types on resizeStefano Brivio2019-06-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a fresh array block is allocated during resize, the current in-memory set size should be increased by the size of the block, not replaced by it. Before the fix, adding entries to a hash set type, leading to a table resize, caused an inconsistent memory size to be reported. This becomes more obvious when swapping sets with similar sizes: # cat hash_ip_size.sh #!/bin/sh FAIL_RETRIES=10 tries=0 while [ ${tries} -lt ${FAIL_RETRIES} ]; do ipset create t1 hash:ip for i in `seq 1 4345`; do ipset add t1 1.2.$((i / 255)).$((i % 255)) done t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset create t2 hash:ip for i in `seq 1 4360`; do ipset add t2 1.2.$((i / 255)).$((i % 255)) done t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset swap t1 t2 t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset destroy t1 ipset destroy t2 tries=$((tries + 1)) if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then echo "FAIL after ${tries} tries:" echo "T1 size ${t1_init}, after swap ${t1_swap}" echo "T2 size ${t2_init}, after swap ${t2_swap}" exit 1 fi done echo "PASS" # echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control # ./hash_ip_size.sh [ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa [ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163) [ 2035.080353] Table destroy by resize 00000000fe6551fa FAIL after 4 tries: T1 size 9064, after swap 71128 T2 size 71128, after swap 9064 Reported-by: NOYB <JunkYardMail1@Frontier.com> Fixes: 9e41f26a505c ("netfilter: ipset: Count non-static extension memory for userspace") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix error path in set_target_v3_checkentry()Jozsef Kadlecsik2019-01-181-20/+21
| | | | Fix error path and release the references properly.
* Fix the last missing check of nla_parse()Jozsef Kadlecsik2019-01-101-3/+6
| | | | | In dump_init() the outdated comment was incorrect and we had a missing validation check of nla_parse().
* netfilter: ipset: fix a missing check of nla_parseAditya Pakki2019-01-081-2/+7
| | | | | | | | | When nla_parse fails, we should not use the results (the first argument). The fix checks if it fails, and if so, returns its error code upstream. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: merge uadd and udel functionsFlorent Fourcot2019-01-081-53/+20
| | | | | | | | Both functions are using exactly the same code, except the command value passed to call_ad function. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: remove useless memset() callsFlorent Fourcot2019-01-081-2/+0
| | | | | | | | | | One of the memset call is buggy: it does not erase full array, but only pointer size. Moreover, after a check, first step of nla_parse_nested/nla_parse is to erase tb array as well. We can remove both calls safely. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter/ipset: replace a strncpy() with strscpy()Qian Cai2018-12-041-2/+4
| | | | | | | | | | | | | | To make overflows as obvious as possible and to prevent code from blithely proceeding with a truncated string. This also has a side-effect to fix a compilation warning when using GCC 8.2.1. net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get': net/netfilter/ipset/ip_set_core.c:2027:3: warning: 'strncpy' writing 32 bytes into a region of size 2 overflows the destination [-Wstringop-overflow=] Signed-off-by: Qian Cai <cai@gmx.us> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: fix ip_set_byindex functionFlorent Fourcot2018-11-281-1/+1
| | | | | | | | | New function added by "Introduction of new commands and protocol version 7" is not working, since we return skb2 to user Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: do not call ipset_nest_end after nla_nest_cancelPan Bian2018-11-281-1/+1
| | | | | | | | | | | | | | In the error handling block, nla_nest_cancel(skb, atd) is called to cancel the nest operation. But then, ipset_nest_end(skb, atd) is unexpected called to end the nest operation. This patch calls the ipset_nest_end only on the branch that nla_nest_cancel is not called. Fixes: 45040978c89("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Correct workaround in patch "Fix calling ip_set() macro at dumping"Jozsef Kadlecsik2018-10-301-15/+4
| | | | | | As Pablo pointed out, in order to fix the bogus warnings, there's no need for the non-useful rcu_read_lock/unlock dancing. Call rcu_dereference_raw() instead, the ref_netlink protects the set.
* Introduction of new commands and protocol version 7Jozsef Kadlecsik2018-10-271-17/+149
| | | | | | | | | | | Two new commands (IPSET_CMD_GET_BYNAME, IPSET_CMD_GET_BYINDEX) are introduced. The new commands makes possible to eliminate the getsockopt operation (in iptables set/SET match/target) and thus use only netlink communication between userspace and kernel for ipset. With the new protocol version, userspace can exactly know which functionality is supported by the running kernel. Both the kernel and userspace is fully backward compatible.
* net: Convert ip_set_net_opsKirill Tkhai2018-10-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These pernet_operations initialize and destroy net_generic(net, ip_set_net_id)-related data. Since ip_set is under CONFIG_IP_SET, it's easy to watch drivers, which depend on this config. All of them are in net/netfilter/ipset directory, except of net/netfilter/xt_set.c. There are no more drivers, which use ip_set, and all of the above don't register another pernet_operations. Also, there are is no indirect users, as header file include/linux/netfilter/ipset/ip_set.h does not define indirect users by something like this: #ifdef CONFIG_IP_SET extern func(void); #else static inline func(void); #endif So, there are no more pernet operations, dereferencing net_generic(net, ip_set_net_id). ip_set_net_ops are OK to be executed in parallel for several net, so we mark them as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: Replace spin_is_locked() with lockdepLance Roy2018-10-221-1/+1
| | | | | | | | | | | | | | | | | lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: <netfilter-devel@vger.kernel.org> Cc: <coreteam@netfilter.org> Cc: <netdev@vger.kernel.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Fix calling ip_set() macro at dumpingJozsef Kadlecsik2018-10-191-4/+19
| | | | | | | | The ip_set() macro is called when either ip_set_ref_lock held only or no lock/nfnl mutex is held at dumping. Take this into account properly. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: fix ip_set_list allocation failureAndrey Ryabinin2018-09-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_set_create() and ip_set_net_init() attempt to allocate physically contiguous memory for ip_set_list. If memory is fragmented, the allocations could easily fail: vzctl: page allocation failure: order:7, mode:0xc0d0 Call Trace: dump_stack+0x19/0x1b warn_alloc_failed+0x110/0x180 __alloc_pages_nodemask+0x7bf/0xc60 alloc_pages_current+0x98/0x110 kmalloc_order+0x18/0x40 kmalloc_order_trace+0x26/0xa0 __kmalloc+0x279/0x290 ip_set_net_init+0x4b/0x90 [ip_set] ops_init+0x3b/0xb0 setup_net+0xbb/0x170 copy_net_ns+0xf1/0x1c0 create_new_namespaces+0xf9/0x180 copy_namespaces+0x8e/0xd0 copy_process+0xb61/0x1a00 do_fork+0x91/0x320 Use kvcalloc() to fallback to 0-order allocations if high order page isn't available. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: Make invalid MAC address checks consistentStefano Brivio2018-08-302-7/+7
| | | | | | | | | | | | | | | | Set types bitmap:ipmac and hash:ipmac check that MAC addresses are not all zeroes. Introduce one missing check, and make the remaining ones consistent, using is_zero_ether_addr() instead of comparing against an array containing zeroes. This was already done for hash:mac sets in commit 26c97c5d8dac ("netfilter: ipset: Use is_zero_ether_addr instead of static and memcmp"). Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: Allow matching on destination MAC address for mac and ipmac setsStefano Brivio2018-08-303-16/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There doesn't seem to be any reason to restrict MAC address matching to source MAC addresses in set types bitmap:ipmac, hash:ipmac and hash:mac. With this patch, and this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up ip netns exec A ipset create test hash:mac dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset add test ${dst} ip netns exec A iptables -P INPUT DROP ip netns exec A iptables -I INPUT -m set --match-set test dst -j ACCEPT ipset will match packets based on destination MAC address: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,netEric Westbrook2018-08-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow /0 as advertised for hash:net,port,net sets. For "hash:net,port,net", ipset(8) says that "either subnet is permitted to be a /0 should you wish to match port between all destinations." Make that statement true. Before: # ipset create cidrzero hash:net,port,net # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0 ipset v6.34: The value of the CIDR parameter of the IP address is invalid # ipset create cidrzero6 hash:net,port,net family inet6 # ipset add cidrzero6 ::/0,12345,::/0 ipset v6.34: The value of the CIDR parameter of the IP address is invalid After: # ipset create cidrzero hash:net,port,net # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0 # ipset test cidrzero 192.168.205.129,12345,172.16.205.129 192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero. # ipset create cidrzero6 hash:net,port,net family inet6 # ipset add cidrzero6 ::/0,12345,::/0 # ipset test cidrzero6 fe80::1,12345,ff00::1 fe80::1,tcp:12345,ff00::1 is in set cidrzero6. See also: https://bugzilla.kernel.org/show_bug.cgi?id=200897 https://github.com/ewestbrook/linux/commit/df7ff6efb0934ab6acc11f003ff1a7580d6c1d9c Signed-off-by: Eric Westbrook <linux@westbrook.io> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: list:set: Decrease refcount synchronously on deletion and replaceStefano Brivio2018-07-162-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") postponed decreasing set reference counters to the RCU callback. An 'ipset del' command can terminate before the RCU grace period is elapsed, and if sets are listed before then, the reference counter shown in userspace will be wrong: # ipset create h hash:ip; ipset create l list:set; ipset add l # ipset del l h; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 1 Number of entries: 0 Members: # sleep 1; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 0 Number of entries: 0 Members: Fix this by making the reference count update synchronous again. As a result, when sets are listed, ip_set_name_byindex() might now fetch a set whose reference count is already zero. Instead of relying on the reference count to protect against concurrent set renaming, grab ip_set_ref_lock as reader and copy the name, while holding the same lock in ip_set_rename() as writer instead. Reported-by: Li Shuang <shuali@redhat.com> Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: forbid family for hash:mac setsFlorent Fourcot2018-06-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | Userspace `ipset` command forbids family option for hash:mac type: ipset create test hash:mac family inet4 ipset v6.30: Unknown argument: `family' However, this check is not done in kernel itself. When someone use external netlink applications (pyroute2 python library for example), one can create hash:mac with invalid family and inconsistant results from userspace (`ipset` command cannot read set content anymore). This patch enforce the logic in kernel, and forbids insertion of hash:mac with a family set. Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no impact on other hash:* sets Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>