summaryrefslogtreecommitdiffstats
path: root/kernel/net/netfilter/ipset/ip_set_core.c
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: ipset: Properly calculate extensions offsets and total lengthSergey Popovich2015-03-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Offsets and total length returned by the ip_set_elem_len() calculated incorrectly as initial set element length (i.e. len parameter) is used multiple times in offset calculations, also affecting set element total length. Use initial set element length as start offset, do not add aligned extension offset to the offset. Return offset as total length of the set element. This reduces memory requirements on per element basic for the hash:* type of sets. For example output from 'ipset -terse list test-1' on 64-bit PC, where test-1 is generated via following script: #!/bin/bash set_name='test-1' ipset create "$set_name" hash:net family inet \ timeout 10800 counters comment \ hashsize 65536 maxelem 65536 declare -i o3 o4 fmt="add $set_name 192.168.%u.%u\n" for ((o3 = 0; o3 < 256; o3++)); do for ((o4 = 0; o4 < 256; o4++)); do printf "$fmt" $o3 $o4 done done |ipset -exist restore BEFORE this patch is applied # ipset -terse list test-1 Name: test-1 Type: hash:net Revision: 6 Header: family inet hashsize 65536 maxelem 65536 timeout 10800 counters comment Size in memory: 26348440 and AFTER applying patch # ipset -terse list test-1 Name: test-1 Type: hash:net Revision: 6 Header: family inet hashsize 65536 maxelem 65536 timeout 10800 counters comment Size in memory: 7706392 References: 0 Signed-off-by: Sergey Popovich <popovich_sergei@mail.ua> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Make sure listing doesn't grab a set which is just being destroyed.Jozsef Kadlecsik2015-01-081-6/+21
| | | | | There was a small window when all sets are destroyed and a concurrent listing of all sets could grab a set which is just being destroyed.
* More compatibility checking and simplificationsJozsef Kadlecsik2015-01-061-18/+13
| | | | | Try hard to keep the support of the 2.6.32 kernel tree and simplify the code with self-referential macros.
* Fix coding styles reported by checkpatch.plJozsef Kadlecsik2015-01-061-44/+32
|
* Use nlmsg_total_size instead of NLMSG_SPACE in ip_set_core.c.Jozsef Kadlecsik2015-01-061-2/+2
|
* Call synchronize_rcu() in set type (un)register functions only when neededJozsef Kadlecsik2014-12-101-5/+4
|
* Give a better name to a macro in ip_set_core.cJozsef Kadlecsik2014-12-101-9/+9
|
* netfilter: ipset: small potential read beyond the end of bufferDan Carpenter2014-11-181-0/+5
| | | | | | | | | | We could be reading 8 bytes into a 4 byte buffer here. It seems harmless but adding a check is the right thing to do and it silences a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Fix parallel resizing and listing of the same setJozsef Kadlecsik2014-11-181-13/+17
| | | | | | | | When elements added to a hash:* type of set and resizing triggered, parallel listing could start to list the original set (before resizing) and "continue" with listing the new set. Fix it by references and using the original hash table for listing. Therefore the destroying the original hash table may happen from the resizing or listing functions.
* styles warned by checkpatch.pl fixedJozsef Kadlecsik2014-11-181-3/+8
|
* Introduce RCU in all set types instead of rwlock per setJozsef Kadlecsik2014-11-181-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance is tested by Jesper Dangaard Brouer: Simple drop in FORWARD ~~~~~~~~~~~~~~~~~~~~~~ Dropping via simple iptables net-mask match:: iptables -t raw -N simple || iptables -t raw -F simple iptables -t raw -I simple -s 198.18.0.0/15 -j DROP iptables -t raw -D PREROUTING -j simple iptables -t raw -I PREROUTING -j simple Drop performance in "raw": 11.3Mpps Generator: sending 12.2Mpps (tx:12264083 pps) Drop via original ipset in RAW table ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a set with lots of elements:: sudo ./ipset destroy test echo "create test hash:ip hashsize 65536" > test.set for x in `seq 0 255`; do for y in `seq 0 255`; do echo "add test 198.18.$x.$y" >> test.set done done sudo ./ipset restore < test.set Dropping via ipset:: iptables -t raw -F iptables -t raw -N net198 || iptables -t raw -F net198 iptables -t raw -I net198 -m set --match-set test src -j DROP iptables -t raw -I PREROUTING -j net198 Drop performance in "raw" with ipset: 8Mpps Perf report numbers ipset drop in "raw":: + 24.65% ksoftirqd/1 [ip_set] [k] ip_set_test - 21.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_lock_bh - _raw_read_lock_bh + 99.88% ip_set_test - 19.42% ksoftirqd/1 [kernel.kallsyms] [k] _raw_read_unlock_bh - _raw_read_unlock_bh + 99.72% ip_set_test + 4.31% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_kadt + 2.27% ksoftirqd/1 [ixgbe] [k] ixgbe_fetch_rx_buffer + 2.18% ksoftirqd/1 [ip_tables] [k] ipt_do_table + 1.81% ksoftirqd/1 [ip_set_hash_ip] [k] hash_ip4_test + 1.61% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core + 1.44% ksoftirqd/1 [kernel.kallsyms] [k] build_skb + 1.42% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv + 1.36% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip + 1.16% ksoftirqd/1 [kernel.kallsyms] [k] dev_gro_receive + 1.09% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock + 0.96% ksoftirqd/1 [ixgbe] [k] ixgbe_clean_rx_irq + 0.95% ksoftirqd/1 [kernel.kallsyms] [k] __netdev_alloc_frag + 0.88% ksoftirqd/1 [kernel.kallsyms] [k] kmem_cache_alloc + 0.87% ksoftirqd/1 [xt_set] [k] set_match_v3 + 0.85% ksoftirqd/1 [kernel.kallsyms] [k] inet_gro_receive + 0.83% ksoftirqd/1 [kernel.kallsyms] [k] nf_iterate + 0.76% ksoftirqd/1 [kernel.kallsyms] [k] put_compound_page + 0.75% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_lock Drop via ipset in RAW table with RCU-locking ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With RCU locking, the RW-lock is gone. Drop performance in "raw" with ipset with RCU-locking: 11.3Mpps Performance-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
* net: use the new API kvfree()WANG Cong2014-11-031-4/+1
| | | | | | | | | It is available since v3.15-rc5. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* treewide: fix errors in printkMasanari Iida2014-11-031-1/+1
| | | | | | | | | This patch fix spelling typo in printk. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* netfilter: ipset: off by one in ip_set_nfnl_get_byindex()Dan Carpenter2014-10-211-1/+1
| | | | | | | | The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: Convert pr_warning to pr_warnJoe Perches2014-09-141-12/+11
| | | | | | | | | | | | Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Add skbinfo extension kernel support in the ipset core.Anton Danilov2014-09-081-1/+26
| | | | | | | | | | | Skbinfo extension provides mapping of metainformation with lookup in the ipset tables. This patch defines the flags, the constants, the functions and the structures for the data type independent support of the extension. Note the firewall mark stores in the kernel structures as two 32bit values, but transfered through netlink as one 64bit value. Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix static checker warning in ip_set_core.cJozsef Kadlecsik2014-09-011-1/+2
| | | | | | | | | Dan Carpenter reported the following static checker warning: net/netfilter/ipset/ip_set_core.c:1414 call_ad() error: 'nlh->nlmsg_len' from user is not capped properly The payload size is limited now by the max size of size_t.
* netfilter: ip_set: rename nfnl_dereference()/nfnl_set()Patrick McHardy2014-03-071-23/+23
| | | | | | | | | | The next patch will introduce a nfnl_dereference() macro that actually checks that the appropriate mutex is held and therefore needs a subsystem argument. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: add forceadd kernel support for hash set typesJosh Hunt2014-03-041-0/+2
| | | | | | | | | | | | | | | | | | Adds a new property for hash set types, where if a set is created with the 'forceadd' option and the set becomes full the next addition to the set may succeed and evict a random entry from the set. To keep overhead low eviction is done very simply. It checks to see which bucket the new entry would be added. If the bucket's pos value is non-zero (meaning there's at least one entry in the bucket) it replaces the first entry in the bucket. If pos is zero, then it continues down the normal add process. This property is useful if you have a set for 'ban' lists where it may not matter if you release some entries from the set early. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: move registration message to init from net_initIlia Mirkin2014-02-161-1/+1
| | | | | | | | | Commit 1785e8f473 ("netfiler: ipset: Add net namespace for ipset") moved the initialization print into net_init, which can get called a lot due to namespaces. Move it back into init, reduce to pr_info. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: remove unused codeStephen Hemminger2014-01-071-28/+0
| | | | | | | Function never used in current upstream code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* ipset: Follow manual page behavior for SET target on list:setSergey Popovich2013-11-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipset(8) for list:set says: The match will try to find a matching entry in the sets and the target will try to add an entry to the first set to which it can be added. However real behavior is bit differ from described. Consider example: # ipset create test-1-v4 hash:ip family inet # ipset create test-1-v6 hash:ip family inet6 # ipset create test-1 list:set # ipset add test-1 test-1-v4 # ipset add test-1 test-1-v6 # iptables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src # ip6tables -A INPUT -p tcp --destination-port 25 -j SET --add-set test-1 src And then when iptables/ip6tables rule matches packet IPSET target tries to add src from packet to the list:set test-1 where first entry is test-1-v4 and the second one is test-1-v6. For IPv4, as it first entry in test-1 src added to test-1-v4 correctly, but for IPv6 src not added! Placing test-1-v6 to the first element of list:set makes behavior correct for IPv6, but brokes for IPv4. This is due to result, returned from ip_set_add() and ip_set_del() from net/netfilter/ipset/ip_set_core.c when set in list:set equires more parameters than given or address families do not match (which is this case). It seems wrong returning 0 from ip_set_add() and ip_set_del() in this case, as 0 should be returned only when an element successfuly added/deleted to/from the set, contrary to ip_set_test() which returns 0 when no entry exists and >0 when entry found in set. Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* net->user_ns is available starting from 3.8, add compatibility checkingJozsef Kadlecsik2013-10-271-0/+4
| | | | Reported by Jan Engelhardt
* Compatibility code is modified not to rely on kernel version numbersJozsef Kadlecsik2013-10-021-3/+38
| | | | | | | Instead the kernel source code is checked to verify the different compatibility issues for the supported kernel releases. This way hopefully backported features will be handled properly.
* Use netlink callback dump args onlyJozsef Kadlecsik2013-10-021-35/+35
| | | | | Instead of cb->data, use callback dump args only and introduce symbolic names instead of plain numbers at accessing the argument members.
* ipset: Add net namespace for ipsetVitaly Lavrov2013-09-281-101/+187
| | | | | | | | | | | | | | | | This patch adds netns support for ipset. Major changes were made in ip_set_core.c and ip_set.h. Global variables are moved to per net namespace. Added initialization code and the destruction of the network namespace ipset subsystem. In the prototypes of public functions ip_set_* added parameter "struct net*". The remaining corrections related to the change prototypes of public functions ip_set_*. The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347 Signed-off-by: Vitaly Lavrov <lve@guap.ru> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netfilter: ipset: Support comments for ipset entries in the core.Oliver Smith2013-09-231-0/+14
| | | | | | | | | | | | | This adds the core support for having comments on ipset entries. The comments are stored as standard null-terminated strings in dynamically allocated memory after being passed to the kernel. As a result of this, code has been added to the generic destroy function to iterate all extensions and call that extension's destroy task if the set has that extension activated, and if such a task is defined. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Generalize extensions supportJozsef Kadlecsik2013-09-071-0/+46
| | | | | Get rid of the structure based extensions and introduce a blob for the extensions. Thus we can support more extension types easily.
* Introduce new operation to get both setname and familyJozsef Kadlecsik2013-09-041-0/+17
| | | | | | | | ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-mathcing family. Currently such rules are silently accepted and then ignored instead of generating a clear error message to the user, which is not helpful.
* Validate the set family and not the set type family at swapping.Jozsef Kadlecsik2013-08-141-1/+1
| | | | Bug reported by Quentin Armitage, netfilter bugzilla id #843.
* Consistent userspace testing with nomatch flagJozsef Kadlecsik2013-07-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test.
* set match: add support to match the countersJozsef Kadlecsik2013-04-091-1/+1
| | | | | | | | | | The new revision of the set match supports to match the counters and to suppress updating the counters at matching too. At the set:list types, the updating of the subcounters can be suppressed as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Introduce the counter extension in the coreJozsef Kadlecsik2013-04-091-0/+10
| | | | Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Introduce extensions to elements in the coreJozsef Kadlecsik2013-04-091-7/+17
| | | | | | | Introduce extensions to elements in the core and prepare timeout as the first one. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Add a compatibility header file for easier maintenanceJozsef Kadlecsik2013-04-091-38/+16
| | | | | | | Unfortunately not everything could be moved there, there are still compatibility ifdefs in some other files. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* "Directory not empty" error message (reported by John Brendler)Jozsef Kadlecsik2013-02-211-1/+2
| | | | | | | | | When an entry flagged with "nomatch" was tested by ipset, it returned the error message "Kernel error received: Directory not empty" instead of "<element> is NOT in set <setname>". The internal error code was not properly transformed before returning to userspace, fixed.
* Make sure ip_set_max isn't set to IPSET_INVALID_IDJozsef Kadlecsik2012-11-271-1/+1
|
* Add ipset package version to external module descriptionJozsef Kadlecsik2012-11-271-1/+6
|
* Backport RCU handling up to 2.6.32.xJozsef Kadlecsik2012-11-271-0/+8
| | | | __rcu and rcu_dereference_protected is missing from older kernel releases.
* Netlink pid is renamed to portid in kernel 3.7.0Jozsef Kadlecsik2012-11-261-10/+16
| | | | | | Handle the renaming of the netlink_skb_parms structure member. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* Fix RCU handling when the number of maximal sets are increasedJozsef Kadlecsik2012-11-261-83/+117
| | | | | | Eric Dumazet spotted that RCU handling was far incomplete in the patch which added the support of increasing the number of maximal sets automatically. This patch completes the RCU handling of the ip_set_list array of the sets.
* Increase the number of maximal sets automatically as neededJozsef Kadlecsik2012-11-191-8/+51
| | | | | The max number of sets was hardcoded at kernel cofiguration time. The patch adds the support to increase the max number of sets automatically.
* Support to match elements marked with "nomatch" in hash:*net* setsJozsef Kadlecsik2012-09-211-0/+6
| | | | | | | | | | | | | | | Exceptions can now be matched and we can branch according to the possible cases: a. match in the set if the element is not flagged as "nomatch" b. match in the set if the element is flagged with "nomatch" c. no match i.e. iptables ... -m set --match-set ... -j ... iptables ... -m set --match-set ... --nomatch-entries -j ... ...
* Coding style fixesJozsef Kadlecsik2012-09-111-3/+6
|
* net: cleanup unsigned to unsigned intEric Dumazet2012-09-081-3/+3
| | | | | | | Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipset: Handle properly an IPSET_CMD_NONETomasz Bursztyka2012-06-291-0/+12
| | | | | Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* netlink: add netlink_dump_control structure for netlink_dump_start()Pablo Neira Ayuso2012-05-101-2/+12
| | | | Backport of Pablo's patch to the ipset package.
* ipset: Stop using NLA_PUT*().David S. Miller2012-05-101-19/+24
| | | | | | | These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
* Invert the logic to include version.h in ip_set_core.cJozsef Kadlecsik2011-09-151-1/+1
|
* Fix compiling ipset as external kernel modulesJozsef Kadlecsik2011-09-061-1/+1
|