summaryrefslogtreecommitdiffstats
path: root/kernel/include/linux/netfilter/ipset/ip_set.h
Commit message (Collapse)AuthorAgeFilesLines
* ipset: update my email addressJozsef Kadlecsik2019-06-051-1/+1
| | | | | | | | | It's better to use my kadlec@netfilter.org email address in the source code. I might not be able to use kadlec@blackhole.kfki.hu in the future. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Introduction of new commands and protocol version 7Jozsef Kadlecsik2018-10-271-1/+1
| | | | | | | | | | | Two new commands (IPSET_CMD_GET_BYNAME, IPSET_CMD_GET_BYINDEX) are introduced. The new commands makes possible to eliminate the getsockopt operation (in iptables set/SET match/target) and thus use only netlink communication between userspace and kernel for ipset. With the new protocol version, userspace can exactly know which functionality is supported by the running kernel. Both the kernel and userspace is fully backward compatible.
* ipset: list:set: Decrease refcount synchronously on deletion and replaceStefano Brivio2018-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") postponed decreasing set reference counters to the RCU callback. An 'ipset del' command can terminate before the RCU grace period is elapsed, and if sets are listed before then, the reference counter shown in userspace will be wrong: # ipset create h hash:ip; ipset create l list:set; ipset add l # ipset del l h; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 1 Number of entries: 0 Members: # sleep 1; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 0 Number of entries: 0 Members: Fix this by making the reference count update synchronous again. As a result, when sets are listed, ip_set_name_byindex() might now fetch a set whose reference count is already zero. Instead of relying on the reference count to protect against concurrent set renaming, grab ip_set_ref_lock as reader and copy the name, while holding the same lock in ip_set_rename() as writer instead. Reported-by: Li Shuang <shuali@redhat.com> Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix "don't update counters" mode when counters used at the matchingJozsef Kadlecsik2018-01-041-0/+6
| | | | The matching of the counters was not taken into account, fixed.
* netfilter: ipset: fix race condition in ipset save, swap and deleteVishwanath Pai2016-03-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script: ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \ counters ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset save & ipset swap hash_ip3 hash_ip2 ipset destroy hash_ip3 /* will crash the machine */ Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink). Both delete and swap will error out if ref_netlink != 0 on the set. Note: The changes to *_head functions is because previously we would increment ref whenever we called these functions, we don't do that anymore. Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix extension alignmentJozsef Kadlecsik2015-11-071-1/+1
| | | | | | | | | | | | | The data extensions in ipset lacked the proper memory alignment and thus could lead to kernel crash on several architectures. Therefore the structures have been reorganized and alignment attributes added where needed. The patch was tested on armv7h by Gerhard Wiesinger and on x86_64, sparc64 by Jozsef Kadlecsik. Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Count non-static extension memory into the set memory size for userspaceJozsef Kadlecsik2015-06-261-2/+6
| | | | | | | | Non-static (i.e. comment) extension was not counted into the memory size. A new internal counter is introduced for this. In the case of the hash types the sizes of the arrays are counted there as well so that we can avoid to scan the whole set when just the header data is requested.
* Add element count to all set types headerJozsef Kadlecsik2015-06-251-0/+2
| | | | | | It is better to list the set elements for all set types, thus the header information is uniform. Element counts are therefore added to the bitmap and list types.
* netlink: implement nla_put_in_addr and nla_put_in6_addrJiri Benc2015-06-131-3/+2
| | | | | | | | | | | | | | | | IP addresses are often stored in netlink attributes. Add generic functions to do that. For nla_put_in_addr, it would be nicer to pass struct in_addr but this is not used universally throughout the kernel, in way too many places __be32 is used to store IPv4 address. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Compatibility part added. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: deinline ip_set_put_extensions()Denys Vlasenko2015-06-131-23/+2
| | | | | | | | | | | | | | | | | | | | | | n x86 allyesconfig build: The function compiles to 489 bytes of machine code. It has 25 callsites. text data bss dec hex filename 82441375 22255384 20627456 125324215 7784bb7 vmlinux.before 82434909 22255384 20627456 125317749 7783275 vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> CC: Eric W. Biederman <ebiederm@xmission.com> CC: David S. Miller <davem@davemloft.net> CC: Jan Engelhardt <jengelh@medozas.de> CC: Jiri Pirko <jpirko@redhat.com> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: netfilter-devel@vger.kernel.org Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Split extensions into separate filesJozsef Kadlecsik2015-05-061-90/+2
| | | | | | | Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Improve skbinfo get/init helpersJozsef Kadlecsik2015-05-051-19/+11
| | | | | | | | | | | Use struct ip_set_skbinfo in struct ip_set_ext instead of open coded fields and assign structure members in get/init helpers instead of copying members one by one. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Headers file cleanupJozsef Kadlecsik2015-05-051-28/+28
| | | | | | | | | | Remove extra whitespace, group counter helper together. Mark some of the helpers arguments as const. Ported from a patch proposed by Sergey Popovich <popovich_sergei@mail.ua>. Suggested-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* RCU safe comment extension handlingJozsef Kadlecsik2015-03-291-1/+6
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Fix ext_*() macrosSergey Popovich2015-03-231-4/+4
| | | | | | | | | So pointers returned by these macros could be referenced with -> directly. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix cidr handling for hash:*net* typesJozsef Kadlecsik2015-03-131-2/+0
| | | | | | | | Commit 092d67cda9ad4 broke the cidr handling for the hash:*net* types when the sets were used by the SET target: entries with invalid cidr values were added to the sets. Reported by Jonathan Johnson. Testsuite entry is added to verify the fix.
* More compatibility checking and simplificationsJozsef Kadlecsik2015-01-061-1/+1
| | | | | Try hard to keep the support of the 2.6.32 kernel tree and simplify the code with self-referential macros.
* Remove an unused macroJozsef Kadlecsik2014-12-101-4/+0
|
* Remove unnecessary integer RCU handling and fix sparse warningsJozsef Kadlecsik2014-11-271-58/+19
|
* Fix parallel resizing and listing of the same setJozsef Kadlecsik2014-11-181-5/+8
| | | | | | | | When elements added to a hash:* type of set and resizing triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by references and using the original hash table for listing. Therefore the destroying the original hash table may happen from the resizing or listing functions.
* styles warned by checkpatch.pl fixedJozsef Kadlecsik2014-11-181-4/+5
|
* Introduce RCU in all set types instead of rwlock per setJozsef Kadlecsik2014-11-181-20/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance is tested by Jesper Dangaard Brouer: Simple drop in FORWARD ~~~~~~~~~~~~~~~~~~~~~~ Dropping via simple iptables net-mask match:: iptables -t raw -N simple || iptables -t raw -F simple iptables -t raw -I simple -s 198.18.0.0/15 -j DROP iptables -t raw -D PREROUTING -j simple iptables -t raw -I PREROUTING -j simple Drop performance in "raw": 11.3Mpps Generator: sending 12.2Mpps (tx:12264083 pps) Drop via original ipset in RAW table ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a set with lots of elements:: sudo ./ipset destroy test echo "create test hash:ip hashsize 65536" > test.set for x in `seq 0 255`; do for y in `seq 0 255`; do echo "add test 198.18.$x.$y" >> test.set done done sudo ./ipset restore < test.set Dropping via ipset:: iptables -t raw -F iptables -t raw -N net198 || iptables -t raw -F net198 iptables -t raw -I net198 -m set --match-set test src -j DROP iptables -t raw -I PREROUTING -j net198 Drop performance in "raw" with ipset: 8Mpps Perf report numbers ipset drop in "raw":: + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh - _raw_read_lock_bh + 99.88% ip_set_test - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh - _raw_read_unlock_bh + 99.72% ip_set_test + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3 + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock Drop via ipset in RAW table with RCU-locking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With RCU locking, the RW-lock is gone. Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
* skbinfo extension: send nonzero extension elements only to userspaceJozsef Kadlecsik2014-09-151-7/+11
|
* netfilter: ipset: Add skbinfo extension kernel support in the ipset core.Anton Danilov2014-09-081-1/+55
| | | | | | | | | | | Skbinfo extension provides mapping of metainformation with lookup in the ipset tables. This patch defines the flags, the constants, the functions and the structures for the data type independent support of the extension. Note the firewall mark stores in the kernel structures as two 32bit values, but transfered through netlink as one 64bit value. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: add forceadd kernel support for hash set typesJosh Hunt2014-03-041-0/+3
| | | | | | | | | | | | | | | | | | Adds a new property for hash set types, where if a set is created with the 'forceadd' option and the set becomes full the next addition to the set may succeed and evict a random entry from the set. To keep overhead low eviction is done very simply. It checks to see which bucket the new entry would be added. If the bucket's pos value is non-zero (meaning there's at least one entry in the bucket) it replaces the first entry in the bucket. If pos is zero, then it continues down the normal add process. This property is useful if you have a set for 'ban' lists where it may not matter if you release some entries from the set early. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Prepare the kernel for create option flags when no extension is neededJozsef Kadlecsik2014-02-131-0/+2
|
* add hash:ip,mark data type to ipsetVytas Dauksa2014-01-081-4/+6
| | | | | | | | | | | | | | | | Introduce packet mark support with new ip,mark hash set. This includes userspace and kernelspace code, hash:ip,mark set tests and man page updates. The intended use of ip,mark set is similar to the ip:port type, but for protocols which don't use a predictable port number. Instead of port number it matches a firewall mark determined by a layer 7 filtering program like opendpi. As well as allowing or blocking traffic it will also be used for accounting packets and bytes sent for each protocol. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: remove unused codeStephen Hemminger2014-01-071-1/+0
| | | | | | | Function never used in current upstream code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Use netlink callback dump args onlyJozsef Kadlecsik2013-10-021-0/+10
| | | | | Instead of cb->data, use callback dump args only and introduce symbolic names instead of plain numbers at accessing the argument members.
* ipset: Add net namespace for ipsetVitaly Lavrov2013-09-281-7/+9
| | | | | | | | | | | | | | | | This patch adds netns support for ipset. Major changes were made in ip_set_core.c and ip_set.h. Global variables are moved to per net namespace. Added initialization code and the destruction of the network namespace ipset subsystem. In the prototypes of public functions ip_set_* added parameter "struct net*". The remaining corrections related to the change prototypes of public functions ip_set_*. The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347 Signed-off-by: Vitaly Lavrov <lve@guap.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Use a common function at listing the extensions of the elementsJozsef Kadlecsik2013-09-251-0/+21
|
* netfilter: ipset: Support comments for ipset entries in the core.Oliver Smith2013-09-231-8/+43
| | | | | | | | | | | | | This adds the core support for having comments on ipset entries. The comments are stored as standard null-terminated strings in dynamically allocated memory after being passed to the kernel. As a result of this, code has been added to the generic destroy function to iterate all extensions and call that extension's destroy task if the set has that extension activated, and if such a task is defined. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Support extensions which need a per data destroy functionJozsef Kadlecsik2013-09-091-5/+17
|
* Generalize extensions supportJozsef Kadlecsik2013-09-071-0/+13
| | | | | Get rid of the structure based extensions and introduce a blob for the extensions. Thus we can support more extension types easily.
* Move extension data to set structureJozsef Kadlecsik2013-09-071-9/+20
| | | | | | Default timeout and extension offsets are moved to struct set, because all set types supports all extensions and it makes possible to generalize extension support.
* Rename extension offset ids to extension idsJozsef Kadlecsik2013-09-061-8/+8
|
* Prepare ipset to support multiple networks for hash typesJozsef Kadlecsik2013-09-041-0/+2
| | | | | | In order to support hash:net,net, hash:net,port,net etc. types, arrays are introduced for the book-keeping of existing cidr sizes and network numbers in a set.
* Consistent userspace testing with nomatch flagJozsef Kadlecsik2013-07-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test.
* Use fix sized type for timeout in the extension partJozsef Kadlecsik2013-05-021-1/+1
|
* Rename simple macro names to avoid namespace issues.Jozsef Kadlecsik2013-05-011-0/+3
| | | | Reported-by: David Laight <David.Laight@ACULAB.COM>
* set match: add support to match the countersJozsef Kadlecsik2013-04-091-2/+7
| | | | | | | | | | The new revision of the set match supports to match the counters and to suppress updating the counters at matching too. At the set:list types, the updating of the subcounters can be suppressed as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Introduce the counter extension in the coreJozsef Kadlecsik2013-04-091-4/+71
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Introduce extensions to elements in the coreJozsef Kadlecsik2013-04-091-7/+39
| | | | | | | Introduce extensions to elements in the core and prepare timeout as the first one. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Make possible to test elements marked with nomatch, from userspaceJozsef Kadlecsik2013-04-091-0/+8
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Add a compatibility header file for easier maintenanceJozsef Kadlecsik2013-04-091-27/+1
| | | | | | | Unfortunately not everything could be moved there, there are still compatibility ifdefs in some other files. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* The uapi include split in the package itselfJozsef Kadlecsik2013-04-091-222/+3
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Restore the support of kernel versions between 2.6.32 and 2.6.35Jozsef Kadlecsik2012-11-051-0/+5
|
* Support to match elements marked with "nomatch" in hash:*net* setsJozsef Kadlecsik2012-09-211-0/+4
| | | | | | | | | | | | | | | Exceptions can now be matched and we can branch according to the possible cases: a. match in the set if the element is not flagged as "nomatch" b. match in the set if the element is flagged with "nomatch" c. no match i.e. iptables ... -m set --match-set ... -j ... iptables ... -m set --match-set ... --nomatch-entries -j ... ...
* Coding style fixesJozsef Kadlecsik2012-09-111-2/+3
|
* Include supported revisions in module descriptionJozsef Kadlecsik2012-09-111-0/+6
|