| Commit message | Author | Age | Files | Lines |
| |
Thus the test tasks can be simplified and all exceptions can be handled in
the helper scripts.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
| |
The parameter defines the upper limit on the number of entries in any hash
bucket when adding new entries from userspace: if the limit would be
exceeded, ipset doubles the hash size and rehashes. This means the set may
consume more memory but gives faster evaluation when matching in the set.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
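
The doubling rule described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the helper name fit_hashsize, the integer keys, the "key modulo size" hash and the growth cap are all invented for illustration and do not reflect the real ipset hashing code.

```shell
#!/bin/sh
# Toy model only: keys are small integers and the "hash" is key modulo size.
# fit_hashsize LIMIT HASHSIZE KEY...
# Prints the first hash size, reached by repeated doubling, at which no
# bucket holds more than LIMIT keys.
fit_hashsize() {
    limit=$1
    size=$2
    shift 2
    # Cap the growth (arbitrary value) so identical keys cannot loop forever.
    while [ "$size" -lt 65536 ]; do
        max=$(printf '%s\n' "$@" | awk -v n="$size" '
            { count[$1 % n]++ }
            END {
                m = 0
                for (b in count) if (count[b] > m) m = count[b]
                print m
            }')
        [ "$max" -le "$limit" ] && break
        size=$((size * 2))
    done
    echo "$size"
}

# Four keys that all land in bucket 0 of a 4-bucket table, limit 2 per bucket.
fit_hashsize 2 4 0 4 8 12    # prints 8
```

A lower limit forces more doubling (more memory, shorter buckets), which is the trade-off the commit message describes.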
| |
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket while holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
  add/del element operations (userspace add/del is excluded by the
  nfnetlink mutex). The original set is actually just read during the
  resize, so the spinlocking is replaced with rcu locking of regions.
  However, this means there can be parallel kernel side add/del of entries.
  In order not to lose those operations a backlog is added and replayed
  after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
  In order not to lock too long, region locking is introduced and a single
  region is processed in one gc go. Also, the simple timer based gc running
  is replaced with a workqueue based solution. The internal book-keeping
  (number of elements, size of extensions) is moved to region level due to
  the region locking.
- Adding elements: when the max number of the elements is reached, the gc
  was called to evict the timed out entries. The new approach is that the gc
  is called just for the matching region, assuming that if the region
  (proportionally) seems to be full, then the whole set is. We could scan
  the other regions to check every entry under rcu locking, but for huge
  sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
  support, the garbage collector was called to clean up timed out entries
  to get the correct element numbers and set size values. Now the set is
  scanned to check non-timed out entries, without actually calling the gc
  for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues while working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
| |
Sort 95.0.0.0 before 107.0.0.0 (numeric order) instead of using textual
sorting. Also, in the case of subnets, sort reversed, i.e. most specific first.
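
The ordering described above can be approximated in a few lines of shell. This is a sketch, not the sorting code used by ipset: the helper name sort_entries is hypothetical, only IPv4 is handled, and it is an assumption that entries are ordered by address first and by prefix length (longest first) second.

```shell
#!/bin/sh
# sort_entries: read IPv4 addresses or networks on stdin, one per line, and
# print them ordered numerically by octet; for equal addresses the longer
# prefix (more specific subnet) comes first.
sort_entries() {
    # Prepend sort keys: the four octets and the prefix length (32 if absent).
    awk -F'[./]' '{ cidr = (NF >= 5) ? $5 : 32; print $1, $2, $3, $4, cidr, $0 }' |
        sort -k1,1n -k2,2n -k3,3n -k4,4n -k5,5nr |
        cut -d' ' -f6-
}

printf '%s\n' 107.0.0.0 95.0.0.0 10.0.0.0/8 10.0.0.0/24 | sort_entries
```

With a plain textual sort, 107.0.0.0 would come before 95.0.0.0 because "1" sorts before "9"; the numeric keys avoid that.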
| |
Support listing/saving with sorted entries for the hash types.
(bitmap and list types are automatically sorted.)
| |
Some hosts might not use /var/log/kern.log for kernel messages,
so if we can't find a match there, try dmesg next.
If no matches are found, don't let the shell terminate the
script, so that we have a chance to try dmesg and actually echo
"no match!" if no matches are found: set +e before the setname
loop.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
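
The fallback logic can be sketched as follows. The helper name find_match and its arguments are hypothetical, and the real test helper may be structured differently; the point is only the order of the lookups and that a failed grep must not abort the script.

```shell
#!/bin/sh
# find_match PATTERN LOGFILE
# Look for PATTERN in LOGFILE first and fall back to the dmesg output.
find_match() {
    pattern=$1
    logfile=$2
    if [ -r "$logfile" ] && grep -q -e "$pattern" "$logfile"; then
        echo "match in $logfile"
    elif dmesg 2>/dev/null | grep -q -e "$pattern"; then
        echo "match in dmesg"
    else
        echo "no match!"
        return 1
    fi
}

# Usage sketch: disable errexit before the loop so that a failed lookup
# does not terminate the script before "no match!" is reported.
#   set +e
#   for setname in "$@"; do
#       find_match "$setname" /var/log/kern.log
#   done
```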
| |
Parsing is attempted both for numbers and service names, and the
temporarily stored error message triggered a reset of the state
parameters of the set. Reported by Yuri D'Elia.
| |
Fixes bugzilla id #1209.
| |
The matching of the counters was not taken into account, fixed.
| |
Fix the ipset command replacement.
For ipset="/sbin/ipset"
Actual:
/sbin//sbin/ipset 2>.foo.err | ... | xargs -n1 ipset
Expected:
/sbin/ipset 2>.foo.err | ... | xargs -n1 /sbin/ipset
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
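
One way such a double replacement can arise is sketched below. The actual substitution commands in the test scripts are not shown in this log, so both functions are assumptions made for illustration: the first applies two first-match substitutions in a row, the second goes through a placeholder so that an already replaced path is never matched again.

```shell
#!/bin/sh
# buggy_replace CMD IPSET_PATH
# The second substitution hits the "ipset" inside the path that the first
# substitution has just inserted.
buggy_replace() {
    printf '%s\n' "$1" | sed "s|ipset|$2 2>.foo.err|" | sed "s|ipset|$2|"
}

# fixed_replace CMD IPSET_PATH
# Replace via a placeholder (@IPSET@, an arbitrary marker), then expand it.
fixed_replace() {
    printf '%s\n' "$1" |
        sed -e "s|ipset|@IPSET@ 2>.foo.err|" \
            -e "s|ipset|@IPSET@|g" \
            -e "s|@IPSET@|$2|g"
}

cmd='ipset list | xargs -n1 ipset'
buggy_replace "$cmd" /sbin/ipset   # /sbin//sbin/ipset 2>.foo.err list | xargs -n1 ipset
fixed_replace "$cmd" /sbin/ipset   # /sbin/ipset 2>.foo.err list | xargs -n1 /sbin/ipset
```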
| |
When listing, timed out entries are not listed, but the counter holding the
number of entries is updated only at garbage collection.
| |
Give enough time for the entries to time out before listing, so that
we get the correct number of entries.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
Dave Jones reported that KASan detected out of bounds access in hash:net*
types:
[ 23.139532] ==================================================================
[ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58
[ 23.152937] Write of size 4 by task ipset/457
[ 23.159742] =============================================================================
[ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected
[ 23.173641] -----------------------------------------------------------------------------
[ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456
[ 23.201836] __slab_alloc.constprop.66+0x554/0x620
[ 23.208994] __kmalloc+0x2f2/0x360
[ 23.216105] hash_net_create+0x16a/0x470
[ 23.223238] ip_set_create+0x3e6/0x740
[ 23.230343] nfnetlink_rcv_msg+0x599/0x640
[ 23.237454] netlink_rcv_skb+0x14f/0x190
[ 23.244533] nfnetlink_rcv+0x3f6/0x790
[ 23.251579] netlink_unicast+0x272/0x390
[ 23.258573] netlink_sendmsg+0x5a1/0xa50
[ 23.265485] SYSC_sendto+0x1da/0x2c0
[ 23.272364] SyS_sendto+0xe/0x10
[ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f
The bug is fixed in the patch and the testsuite is extended in ipset
to check cidr handling more thoroughly.
| |
It is better to list the number of set elements for all set types, so that
the header information is uniform. Element counts are therefore added
to the bitmap and list types.
| |
It would be useful for userspace to query the size of an ipset hash,
however, this data is not exposed to userspace outside of counting the
number of member entries. This patch uses the attribute
IPSET_ATTR_ELEMENTS to indicate the size in the header that is
exported to userspace. This field is then printed by the userspace
tool for hashes.
Because it is only meaningful for hashes to report their size, the
output is conditional on the set type. To do this checking, the
MATCH_TYPENAME macro was moved to utils.h.
The bulk of this patch changes the expected test suite to account for
the change in output.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Josh Hunt <johunt@akamai.com>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
Commit 092d67cda9ad4 broke the cidr handling for the hash:*net* types
when the sets were used by the SET target: entries with invalid cidr
values were added to the sets. Reported by Jonathan Johnson.
Testsuite entry is added to verify the fix.
| |
When elements were added to a hash:* type of set and resizing was triggered,
a parallel listing could start to list the original set (before resizing)
and "continue" with listing the new set. Fix it by using references and
the original hash table for listing. Therefore destroying the original
hash table may happen from either the resizing or the listing function.
| |
Performance is tested by Jesper Dangaard Brouer:

Simple drop in FORWARD
~~~~~~~~~~~~~~~~~~~~~~

Dropping via simple iptables net-mask match::

  iptables -t raw -N simple || iptables -t raw -F simple
  iptables -t raw -I simple -s 198.18.0.0/15 -j DROP
  iptables -t raw -D PREROUTING -j simple
  iptables -t raw -I PREROUTING -j simple

Drop performance in "raw": 11.3Mpps

Generator: sending 12.2Mpps (tx:12264083 pps)

Drop via original ipset in RAW table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a set with lots of elements::

  sudo ./ipset destroy test
  echo "create test hash:ip hashsize 65536" > test.set
  for x in `seq 0 255`; do
      for y in `seq 0 255`; do
          echo "add test 198.18.$x.$y" >> test.set
      done
  done
  sudo ./ipset restore < test.set

Dropping via ipset::

  iptables -t raw -F
  iptables -t raw -N net198 || iptables -t raw -F net198
  iptables -t raw -I net198 -m set --match-set test src -j DROP
  iptables -t raw -I PREROUTING -j net198

Drop performance in "raw" with ipset: 8Mpps

Perf report numbers ipset drop in "raw"::

  + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test
  - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh
     - _raw_read_lock_bh
        + 99.88% ip_set_test
  - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh
     - _raw_read_unlock_bh
        + 99.72% ip_set_test
  + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt
  + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer
  + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table
  + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test
  + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core
  + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb
  + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv
  + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip
  + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive
  + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock
  + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq
  + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag
  + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc
  + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3
  + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive
  + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate
  + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page
  + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock

Drop via ipset in RAW table with RCU-locking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With RCU locking, the RW-lock is gone.

Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps

Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
| |
When the set was full (hash type and maxelem reached), it was not
possible to update the extension part of already existing elements.
The patch removes this limitation. (Fixes netfilter bugzilla id 880.)
| |
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
ipset would not parse ether addresses which are not exactly
17 characters long, for example 1:2:3:4:5:6. This is fixed in
the patch.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
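
A length-independent parse can be sketched like this. It is not the ipset parser (which is written in C); the helper name parse_ether and the choice to print a zero-padded, upper-case form are assumptions made for the example.

```shell
#!/bin/sh
# parse_ether ADDR
# Accept six ':'-separated fields of one or two hex digits each and print
# the zero-padded form; return non-zero for anything else.
parse_ether() {
    printf '%s\n' "$1" | awk -F: '
        NF != 6 { exit 1 }
        {
            out = ""
            for (i = 1; i <= 6; i++) {
                if ($i !~ /^[0-9A-Fa-f][0-9A-Fa-f]?$/) exit 1
                octet = toupper($i)
                if (length(octet) == 1) octet = "0" octet
                out = out (i > 1 ? ":" : "") octet
            }
            print out
        }'
}

parse_ether 1:2:3:4:5:6          # prints 01:02:03:04:05:06
```

Splitting on the separator instead of checking for a fixed 17-character string is what lets the short form through.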
| |
In the "as-installed" package testing situation, the test scripts should
invoke the system installed "ipset" binary.
Therefore, IPSET_BIN can be passed to change the binary location:
  IPSET_BIN=/sbin/ipset ./runtest.sh
The test scripts run fine in the build source tree without IPSET_BIN.
Signed-off-by: Neutron Soutmun <neo.neutron@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
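
A script can honour such an override with a default-value parameter expansion. The sketch below is an assumption about how this might look: the helper name resolve_ipset_bin and the in-tree default path ../src/ipset are illustrative and not quoted from the test scripts.

```shell
#!/bin/sh
# resolve_ipset_bin: print the binary to test, i.e. IPSET_BIN if it is set
# and non-empty, otherwise an assumed in-tree default.
resolve_ipset_bin() {
    printf '%s\n' "${IPSET_BIN:-../src/ipset}"
}

ipset=$(resolve_ipset_bin)
echo "testing with: $ipset"
```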
| |
Modified ipset_print_mark to print in hex rather than decimal and
altered the test cases accordingly.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
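
The difference in output can be illustrated from the shell. The eight-digit, 0x-prefixed format used here is an assumption for the example; the exact format string of ipset_print_mark is not given in this log, and print_mark is a hypothetical helper.

```shell
#!/bin/sh
# print_mark VALUE: print a firewall mark as zero-padded hexadecimal.
print_mark() {
    printf '0x%08x\n' "$1"
}

print_mark 255    # prints 0x000000ff instead of the decimal 255
```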
| |
Introduce packet mark mask for hash:ip,mark data type. This allows setting
a mark bit filter for the ip set.
Change-Id: Id8dd9ca7e64477c4f7b022a1d9c1a5b187f1c96e
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
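
The effect of a mark mask is a bitwise AND, which can be sketched as below. That the mask is applied as mark AND mask before the lookup is an assumption based on the description above; apply_markmask is a hypothetical helper and the values are arbitrary.

```shell
#!/bin/sh
# apply_markmask MARK MASK: print the bits of MARK selected by MASK.
apply_markmask() {
    printf '0x%08x\n' "$(( $1 & $2 ))"
}

apply_markmask 0x12345678 0xffff0000    # prints 0x12340000
```

With such a mask, only the upper 16 bits of the packet mark take part in the match, so marks that differ only in the lower bits are treated as equal.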
| |
Introduce packet mark support with new ip,mark hash set. This includes
userspace and kernelspace code, hash:ip,mark set tests and man page
updates.
The intended use of the ip,mark set is similar to the ip:port type, but for
protocols which don't use a predictable port number. Instead of a port
number, it matches a firewall mark determined by a layer 7 filtering
program like opendpi.
As well as allowing or blocking traffic it will also be used for
accounting packets and bytes sent for each protocol.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
This adds the userspace library, tests to validate correct operation of
the module and also provides appropriate usage information in the man
page.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
This adds the userspace library, tests to validate correct operation of
the module and also provides appropriate usage information in the man
page. The library version has been bumped accordingly.
Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| |
The max number of sets was hardcoded at kernel configuration time.
The patch adds support for increasing the max number of sets automatically.
| |
Test all possible range variations with the hash types in order
to catch bugs like the range bug in hash:ip,port,net.
| |
Exceptions can now be matched and we can branch according to the
possible cases:
  a. match in the set if the element is not flagged as "nomatch"
  b. match in the set if the element is flagged with "nomatch"
  c. no match
i.e.
  iptables ... -m set --match-set ... -j ...
  iptables ... -m set --match-set ... --nomatch-entries -j ...
  ...
| |
Incompatibility: if your script relies on the number of lines in the header
of set listings, then the new line
  Revision: number
can break your script.
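
A script can avoid this kind of breakage by keying on the header labels instead of counting lines. The sketch below assumes the plain-text listing layout, with "Label: value" header lines followed by a "Members:" line; the helper names and the sample values are made up for illustration.

```shell
#!/bin/sh
# list_members: read a set listing on stdin, print the lines after "Members:".
list_members() {
    awk 'found { print } /^Members:$/ { found = 1 }'
}

# header_field NAME: print the value of the header line "NAME: value".
header_field() {
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3) }'
}

# Sample listing with invented values.
listing='Name: test
Type: hash:ip
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 16504
References: 0
Members:
192.168.0.1
192.168.0.2'

printf '%s\n' "$listing" | header_field Type    # prints hash:ip
printf '%s\n' "$listing" | list_members
```

Because neither helper depends on the position of a line, an extra header line such as "Revision: number" does not change the result.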