summaryrefslogtreecommitdiffstats
path: root/src/netlink.c
Commit message (Collapse)AuthorAgeFilesLines
...
* cache: add hashtable cache for flowtablePablo Neira Ayuso2021-05-021-1/+1
| | | | | | | | | | Add flowtable hashtable cache. Actually I am not expecting that many flowtables to benefit from the hashtable to be created by streamline this code with tables, chains, sets and policy objects. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: add hashtable cache for objectPablo Neira Ayuso2021-05-021-19/+0
| | | | | | | | | | | | | | | | | | | | This patch adds a hashtable for object lookups. This patch also splits table->objs in two: - Sets that reside in the cache are stored in the new tables->cache_obj and tables->cache_obj_ht. - Set that defined via command line / ruleset file reside in tables->obj. Sets in the cache (already in the kernel) are not placed in the table->objs list. By keeping separated lists, objs defined via command line / ruleset file can be added to cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: pass chain name to chain_cache_find()Pablo Neira Ayuso2021-05-021-1/+1
| | | | | | | | You can identify chains through the unique handle in deletions, update this interface to take a string instead of the handle to prepare for the introduction of 64-bit handle chain lookups. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: add hashtable cache for setsPablo Neira Ayuso2021-04-031-31/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a hashtable for set lookups. This patch also splits table->sets in two: - Sets that reside in the cache are stored in the new tables->cache_set and tables->cache_set_ht. - Set that defined via command line / ruleset file reside in tables->set. Sets in the cache (already in the kernel) are not placed in the table->sets list. By keeping separated lists, sets defined via command line / ruleset file can be added to cache. Adding 10000 sets, before: # time nft -f x real 0m6,415s user 0m3,126s sys 0m3,284s After: # time nft -f x real 0m3,949s user 0m0,743s sys 0m3,205s Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: use chain hashtable for lookupsPablo Neira Ayuso2021-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | | Instead of the linear list lookup. Before this patch: real 0m21,735s user 0m20,329s sys 0m1,384s After: real 0m10,910s user 0m9,448s sys 0m1,434s chain_lookup() is removed since linear list lookups are only used by the fuzzy chain name matching for error reporting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: don't crash when set elements are not evaluated as expectedFlorian Westphal2021-04-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | define foo = 2001:db8:123::/48 table inet filter { set foo { typeof ip6 saddr elements = $foo } } gives crash. This now exits with: stdin:1:14-30: Error: Unexpected initial set type prefix define foo = 2001:db8:123::/48 ^^^^^^^^^^^^^^^^^ For literals, bison parser protects us, as it enforces 'elements = { 2001:... '. For 'elements = $foo' we can't detect it at parsing stage as the '$foo' symbol might as well evaluate to "{ 2001, ...}" (i.e. we can't do a set element allocation). So at least detect this from set instantiaton. Signed-off-by: Florian Westphal <fw@strlen.de>
* nftables: add flags offload to flowtableFrank Wunderlich2021-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allow flags (currently only offload) in flowtables like it is stated here: https://lwn.net/Articles/804384/ tested on mt7622/Bananapi-R64 table ip filter { flowtable f { hook ingress priority filter + 1 devices = { lan3, lan0, wan } flags offload; } chain forward { type filter hook forward priority filter; policy accept; ip protocol { tcp, udp } flow add @f } } table ip nat { chain post { type nat hook postrouting priority filter; policy accept; oifname "wan" masquerade } } Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* table: support for the table owner flagPablo Neira Ayuso2021-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Add new flag to allow userspace process to own tables: Tables that have an owner can only be updated/destroyed by the owner. The table is destroyed either if the owner process calls nft_ctx_free() or owner process is terminated (implicit table release). The ruleset listing includes the program name that owns the table: nft> list ruleset table ip x { # progname nft flags owner chain y { type filter hook input priority filter; policy accept; counter packets 1 bytes 309 } } Original code to pretty print the netlink portID to program name has been extracted from the conntrack userspace utility. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* trace: do not remove icmp type from packet dumpFlorian Westphal2021-02-081-1/+3
| | | | | | | | | | | | | | | | | As of 0.9.8 the icmp type is marked as a protocol field, so its elided in 'nft monitor trace' output: icmp code 0 icmp id 44380 .. Restore it. Unlike tcp, where 'tcp sport' et. al in the dump will make the 'ip protocol tcp' redundant this case isn't obvious in the icmp case: icmp type 8 code 0 id ... Reported-by: Martin Gignac <martin.gignac@gmail.com> Fixes: 98b871512c4677 ("src: add auto-dependencies for ipv4 icmp") Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add set element multi-statement supportPablo Neira Ayuso2020-12-181-5/+61
| | | | | | | | Extend the set element infrastructure to support for several statements. This patch places the statements right after the key when printing it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add support for multi-statement in dynamic sets and mapsPablo Neira Ayuso2020-12-171-0/+1
| | | | | | | | This patch allows for two statements for dynamic set updates, e.g. nft rule x y add @y { ip daddr limit rate 1/second counter } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* nft: trace: print packet unconditionallyFlorian Westphal2020-12-121-4/+4
| | | | | | | | | | | The kernel includes the packet dump once for each base hook. This means that in case a table contained no matching rule(s), the packet dump will be included in the base policy dump. Simply move the packet dump request out of the switch statement so the debug output shows current packet even with no matched rule. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: report EPERM for non-root usersPablo Neira Ayuso2020-12-041-1/+1
| | | | | | | | | $ /usr/sbin/nft list ruleset Operation not permitted (you must be root) Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1372 Acked-by: Arturo Borrero Gonzalez <arturo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add comment support for chainsJose M. Guisado Gomez2020-09-301-0/+32
| | | | | | | | | | | | | | | | | | | | This patch enables the user to specify a comment when adding a chain. Relies on kernel space supporting userdata for chains. > nft add table ip filter > nft add chain ip filter input { comment "test"\; type filter hook input priority 0\; policy accept\; } > list ruleset table ip filter { chain input { comment "test" type filter hook input priority filter; policy accept; } } Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add comment support for objectsJose M. Guisado Gomez2020-09-081-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enables specifying an optional comment when declaring named objects. The comment is to be specified inside the object's block ({} block) Relies on libnftnl exporting nftnl_obj_get_data and kernel space support to store the comments. For consistency, this patch makes the comment be printed first when listing objects. Adds a testcase importing all commented named objects except for secmark, although it's supported. Example: Adding a quota with a comment > add table inet filter > nft add quota inet filter q { over 1200 bytes \; comment "test_comment"\; } > list ruleset table inet filter { quota q { comment "test_comment" over 1200 bytes } } Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add comment support when adding tablesJose M. Guisado Gomez2020-08-281-0/+32
| | | | | | | | | | | | | | | | | | | Adds userdata building logic if a comment is specified when creating a new table. Adds netlink userdata parsing callback function. Relies on kernel supporting userdata for nft_table. Example: > nft add table ip x { comment "test"\; } > nft list ruleset table ip x { comment "test" } Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add chain hashtable cachePablo Neira Ayuso2020-08-261-46/+0
| | | | | | | | | | | | | | | | | | | | | | | | This significantly improves ruleset listing time with large rulesets (~50k rules) with _lots_ of non-base chains. # time nft list ruleset &> /dev/null Before this patch: real 0m11,172s user 0m6,810s sys 0m4,220s After this patch: real 0m4,747s user 0m0,802s sys 0m3,912s This patch also removes list_bindings from netlink_ctx since there is no need to keep a temporary list of chains anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* nftables: dump raw element info from libnftnl when netlink debugging is onFlorian Westphal2020-08-201-2/+38
| | | | | | | | | | | | | | | | Example: nft --debug=netlink list ruleset inet firewall @knock_candidates_ipv4 element 0100007f 00007b00 : 0 [end] element 0200007f 0000f1ff : 0 [end] element 0100007f 00007a00 : 0 [end] inet firewall @__set0 element 00000100 : 0 [end] element 00000200 : 0 [end] inet firewall knock-input 3 [ meta load l4proto => reg 1 ] ... Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add comment support for set declarationsJose M. Guisado Gomez2020-08-121-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow users to add a comment when declaring a named set. Adds set output handling the comment in both nftables and json format. $ nft add table ip x $ nft add set ip x s {type ipv4_addr\; comment "some_addrs"\; elements = {1.1.1.1, 1.2.3.4}} $ nft list ruleset table ip x { set s { type ipv4_addr; comment "some_addrs" elements = { 1.1.1.1, 1.2.3.4 } } } $ nft --json list ruleset { "nftables": [ { "metainfo": { "json_schema_version": 1, "release_name": "Capital Idea #2", "version": "0.9.6" } }, { "table": { "family": "ip", "handle": 4857, "name": "x" } }, { "set": { "comment": "some_addrs", "elem": [ "1.1.1.1", "1.2.3.4" ], "family": "ip", "handle": 1, "name": "s", "table": "x", "type": "ipv4_addr" } } ] } Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove cache lookups after the evaluation phasePablo Neira Ayuso2020-07-291-2/+2
| | | | | | | | | | | | This patch adds a new field to the cmd structure for elements to store a reference to the set. This saves an extra lookup in the netlink bytecode generation step. This patch also allows to incrementally update during the evaluation phase according to the command actions, which is required by the follow up ("evaluate: remove table from cache on delete table") bugfix patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: fix concat range expansion in map caseFlorian Westphal2020-07-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Maps with range + concatenation do not work: Input to nft -f: map map_test_concat_interval { type ipv4_addr . ipv4_addr : mark flags interval elements = { 192.168.0.0/24 . 192.168.0.0/24 : 1, 192.168.0.0/24 . 10.0.0.1 : 2, 192.168.1.0/24 . 10.0.0.1 : 3, 192.168.0.0/24 . 192.168.1.10 : 4, } } nft list: map map_test_concat_interval { type ipv4_addr . ipv4_addr : mark flags interval elements = { 192.168.0.0 . 192.168.0.0-10.0.0.1 : 0x00000002, 192.168.1.0-192.168.0.0 . 10.0.0.1-192.168.1.10 : 0x00000004 } } This is not a display bug, nft sends broken information to kernel. Use the correct key expression to fix this. Fixes: 8ac2f3b2fca3 ("src: Add support for concatenated set ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
* src: support for implicit chain bindingsPablo Neira Ayuso2020-07-151-17/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to group rules in a subchain, e.g. table inet x { chain y { type filter hook input priority 0; tcp dport 22 jump { ip saddr { 127.0.0.0/8, 172.23.0.0/16, 192.168.13.0/24 } accept ip6 saddr ::1/128 accept; } } } This also supports for the `goto' chain verdict. This patch adds a new chain binding list to avoid a chain list lookup from the delinearize path for the usual chains. This can be simplified later on with a single hashtable per table for all chains. From the shell, you have to use the explicit separator ';', in bash you have to escape this: # nft add rule inet x y tcp dport 80 jump { ip saddr 127.0.0.1 accept\; ip6 saddr ::1 accept \; } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: fix netlink_get_setelem() memleaksPablo Neira Ayuso2020-05-061-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ==26693==ERROR: LeakSanitizer: detected memory leaks Direct leak of 256 byte(s) in 2 object(s) allocated from: #0 0x7f6ce2189330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330) #1 0x7f6ce1b1767a in xmalloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:36 #2 0x7f6ce1b177d3 in xzalloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:65 #3 0x7f6ce1a41760 in expr_alloc /home/pablo/devel/scm/git-netfilter/nftables/src/expression.c:45 #4 0x7f6ce1a4dea7 in set_elem_expr_alloc /home/pablo/devel/scm/git-netfilter/nftables/src/expression.c:1278 #5 0x7f6ce1ac2215 in netlink_delinearize_setelem /home/pablo/devel/scm/git-netfilter/nftables/src/netlink.c:1094 #6 0x7f6ce1ac3c16 in list_setelem_cb /home/pablo/devel/scm/git-netfilter/nftables/src/netlink.c:1172 #7 0x7f6ce0198808 in nftnl_set_elem_foreach /home/pablo/devel/scm/git-netfilter/libnftnl/src/set_elem.c:725 Indirect leak of 256 byte(s) in 2 object(s) allocated from: #0 0x7f6ce2189330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330) #1 0x7f6ce1b1767a in xmalloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:36 #2 0x7f6ce1b177d3 in xzalloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:65 #3 0x7f6ce1a41760 in expr_alloc /home/pablo/devel/scm/git-netfilter/nftables/src/expression.c:45 #4 0x7f6ce1a4515d in constant_expr_alloc /home/pablo/devel/scm/git-netfilter/nftables/src/expression.c:388 #5 0x7f6ce1abaf12 in netlink_alloc_value /home/pablo/devel/scm/git-netfilter/nftables/src/netlink.c:354 #6 0x7f6ce1ac17f5 in netlink_delinearize_setelem /home/pablo/devel/scm/git-netfilter/nftables/src/netlink.c:1080 #7 0x7f6ce1ac3c16 in list_setelem_cb /home/pablo/devel/scm/git-netfilter/nftables/src/netlink.c:1172 #8 0x7f6ce0198808 in nftnl_set_elem_foreach /home/pablo/devel/scm/git-netfilter/libnftnl/src/set_elem.c:725 Indirect leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7f6ce2189720 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9720) #1 0x7f6ce1b1778d in xrealloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:55 #2 0x7f6ce1b1756d in gmp_xrealloc /home/pablo/devel/scm/git-netfilter/nftables/src/gmputil.c:202 #3 0x7f6ce1417059 in __gmpz_realloc (/usr/lib/x86_64-linux-gnu/libgmp.so.10+0x23059) Indirect leak of 8 byte(s) in 1 object(s) allocated from: #0 0x7f6ce2189330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330) #1 0x7f6ce1b1767a in xmalloc /home/pablo/devel/scm/git-netfilter/nftables/src/utils.c:36 #2 0x7f6ce14105c5 in __gmpz_init2 (/usr/lib/x86_64-linux-gnu/libgmp.so.10+0x1c5c5) SUMMARY: AddressSanitizer: 536 byte(s) leaked in 6 allocation(s). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* nat: transform range to prefix expression when possiblePablo Neira Ayuso2020-04-301-2/+2
| | | | | | | This patch transform a range of IP addresses to prefix when listing the ruleset. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: NAT support for intervals in mapsPablo Neira Ayuso2020-04-281-1/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to specify an interval of IP address in maps. table ip x { chain y { type nat hook postrouting priority srcnat; policy accept; snat ip interval to ip saddr map { 10.141.11.4 : 192.168.2.2-192.168.2.4 } } } The example above performs SNAT to packets that comes from 10.141.11.4 to an interval of IP addresses from 192.168.2.2 to 192.168.2.4 (both included). You can also combine this with dynamic maps: table ip x { map y { type ipv4_addr : interval ipv4_addr flags interval elements = { 10.141.10.0/24 : 192.168.2.2-192.168.2.4 } } chain y { type nat hook postrouting priority srcnat; policy accept; snat ip interval to ip saddr map @y } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: Show the handles of unknown rules in "nft monitor trace"Luis Ressel2020-04-011-15/+27
| | | | | | | | | | | | When "nft monitor trace" doesn't know a rule (because it was only added to the ruleset after nft was invoked), that rule is silently omitted in the trace output, which can come as a surprise when debugging issues. Instead, we can at least show the information we got via netlink, i.e. the family, table and chain name, rule handle and verdict. Signed-off-by: Luis Ressel <aranea@aixah.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* rule: add hook_specPablo Neira Ayuso2020-03-311-4/+4
| | | | | | Store location of chain hook definition. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add support for flowtable counterPablo Neira Ayuso2020-03-261-0/+2
| | | | | | | | | | | | | | | | | | Allow users to enable flow counters via control plane toggle, e.g. table ip x { flowtable y { hook ingress priority 0; counter; } chain z { type filter hook ingress priority filter; flow add @z } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for counter in set definitionPablo Neira Ayuso2020-03-201-0/+7
| | | | | | | | | | | | | | | | | | | | | This patch allows you to turn on counter for each element in the set. table ip x { set y { typeof ip saddr counter elements = { 192.168.10.35, 192.168.10.101, 192.168.10.135 } } chain z { type filter hook output priority filter; policy accept; ip daddr @y } } This example shows how to turn on counters globally in the set 'y'. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for restoring element countersPablo Neira Ayuso2020-03-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to restore counters in dynamic sets: table ip test { set test { type ipv4_addr size 65535 flags dynamic,timeout timeout 30d gc-interval 1d elements = { 192.168.10.13 expires 19d23h52m27s576ms counter packets 51 bytes 17265 } } chain output { type filter hook output priority 0; update @test { ip saddr } } } You can also add counters to elements from the control place, ie. table ip test { set test { type ipv4_addr size 65535 elements = { 192.168.2.1 counter packets 75 bytes 19043 } } chain output { type filter hook output priority filter; policy accept; ip daddr @test } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for offload chain flagPablo Neira Ayuso2020-03-031-0/+2
| | | | | | | | | | | | | | | | | | | | This patch extends the basechain definition to allow users to specify the offload flag. This flag enables hardware offload if your drivers supports it. # cat file.nft table netdev x { chain y { type filter hook ingress device eth0 priority 10; flags offload; } } # nft -f file.nft Note: You have to enable offload via ethtool: # ethtool -K eth0 hw-tc-offload on Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: handle concatenations on set elements mappingsFlorian Westphal2020-02-241-0/+7
| | | | | | | | | | We can already handle concatenated keys, this extends concat coverage to the data type as well, i.e. this can be dissected: type ipv4_addr : ipv4_addr . inet_service Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Add support for concatenated set rangesStefano Brivio2020-02-071-28/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After exporting field lengths via NFTNL_SET_DESC_CONCAT attributes, we now need to adjust parsing of user input and generation of netlink key data to complete support for concatenation of set ranges. Instead of using separate elements for start and end of a range, denoting the end element by the NFT_SET_ELEM_INTERVAL_END flag, as it's currently done for ranges without concatenation, we'll use the new attribute NFTNL_SET_ELEM_KEY_END as suggested by Pablo. It behaves in the same way as NFTNL_SET_ELEM_KEY, but it indicates that the included key represents the upper bound of a range. For example, "packets with an IPv4 address between 192.0.2.0 and 192.0.2.42, with destination port between 22 and 25", needs to be expressed as a single element with two keys: NFTA_SET_ELEM_KEY: 192.0.2.0 . 22 NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25 To achieve this, we need to: - adjust the lexer rules to allow multiton expressions as elements of a concatenation. As wildcards are not allowed (semantics would be ambiguous), exclude wildcards expressions from the set of possible multiton expressions, and allow them directly where needed. Concatenations now admit prefixes and ranges - generate, for each element in a range concatenation, a second key attribute, that includes the upper bound for the range - also expand prefixes and non-ranged values in the concatenation to ranges: given a set with interval and concatenation support, the kernel has no way to tell which elements are ranged, so they all need to be. For example, 192.0.2.0 . 192.0.2.9 : 1024 is sent as: NFTA_SET_ELEM_KEY: 192.0.2.0 . 1024 NFTA_SET_ELEM_KEY_END: 192.0.2.9 . 1024 - aggregate ranges when elements received by the kernel represent concatenated ranges, see concat_range_aggregate() - perform a few minor adjustments where interval expressions are already handled: we have intervals in these sets, but the set specification isn't just an interval, so we can't just aggregate and deaggregate interval ranges linearly v4: No changes v3: - rework to use a separate key for closing element of range instead of a separate element with EXPR_F_INTERVAL_END set (Pablo Neira Ayuso) v2: - reworked netlink_gen_concat_data(), moved loop body to a new function, netlink_gen_concat_data_expr() (Phil Sutter) - dropped repeated pattern in bison file, replaced by a new helper, compound_expr_alloc_or_add() (Phil Sutter) - added set_is_nonconcat_range() helper (Phil Sutter) - in expr_evaluate_set(), we need to set NFT_SET_SUBKEY also on empty sets where the set in the context already has the flag - dropped additional 'end' parameter from netlink_gen_data(), temporarily set EXPR_F_INTERVAL_END on expressions and use that from netlink_gen_concat_data() to figure out we need to add the 'end' element (Phil Sutter) - replace range_mask_len() by a simplified version, as we don't need to actually store the composing masks of a range (Phil Sutter) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Add support for NFTNL_SET_DESC_CONCATStefano Brivio2020-02-071-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | To support arbitrary range concatenations, the kernel needs to know how long each field in the concatenation is. The new libnftnl NFTNL_SET_DESC_CONCAT set attribute describes this as an array of lengths, in bytes, of concatenated fields. While evaluating concatenated expressions, export the datatype size into the new field_len array, and hand the data over via libnftnl. Similarly, when data is passed back from libnftnl, parse it into the set description. When set data is cloned, we now need to copy the additional fields in set_clone(), too. This change depends on the libnftnl patch with title: set: Add support for NFTA_SET_DESC_CONCAT attributes v4: No changes v3: Rework to use set description data instead of a stand-alone attribute v2: No changes Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add "typeof" build/parse/print supportFlorian Westphal2019-12-171-13/+116
| | | | | | | | | | | | | | | | | | | | This patch adds two new expression operations to build and to parse the userdata area that describe the set key and data typeof definitions. For maps, the grammar enforces either "type data_type : data_type" or or "typeof expression : expression". Check both key and data for valid user typeof info first. If they check out, flag set->key_typeof_valid as true and use it for printing the key info. This patch comes with initial support for using payload expressions with the 'typeof' keyword, followup patches will add support for other expressions as well. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: store expr, not dtype to track data in setsFlorian Westphal2019-12-161-17/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be needed once we add support for the 'typeof' keyword to handle maps that could e.g. store 'ct helper' "type" values. Instead of: set foo { type ipv4_addr . mark; this would allow set foo { typeof(ip saddr) . typeof(ct mark); (exact syntax TBD). This would be needed to allow sets that store variable-sized data types (string, integer and the like) that can't be used at at the moment. Adding special data types for everything is problematic due to the large amount of different types needed. For anonymous sets, e.g. "string" can be used because the needed size can be inferred from the statement, e.g. 'osf name { "Windows", "Linux }', but in case of named sets that won't work because 'type string' lacks the context needed to derive the size information. With 'typeof(osf name)' the context is there, but at the moment it won't help because the expression is discarded instantly and only the data type is retained. Signed-off-by: Florian Westphal <fw@strlen.de>
* segtree: don't remove nul-root element from interval setPablo Neira Ayuso2019-12-091-2/+5
| | | | | | | | | | | | | Check from the delinearize set element path if the nul-root element already exists in the interval set. Hence, the element insertion path skips the implicit nul-root interval insertion. Under some circunstances, nft bogusly fails to delete the last element of the interval set and to create an element in an existing empty internal set. This patch includes a test that reproduces the issue. Fixes: 4935a0d561b5 ("segtree: special handling for the first non-matching segment") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: off-by-one write in netdev chain device arrayPablo Neira Ayuso2019-12-021-2/+2
| | | | | | | | | | | | | | | | | | ==728473== Invalid write of size 8 ==728473== at 0x48960F2: netlink_delinearize_chain (netlink.c:422) ==728473== by 0x4896252: list_chain_cb (netlink.c:459) ==728473== by 0x4896252: list_chain_cb (netlink.c:441) ==728473== by 0x4F2C654: nftnl_chain_list_foreach (chain.c:1011) ==728473== by 0x489629F: netlink_list_chains (netlink.c:478) ==728473== by 0x4882303: cache_init_objects (rule.c:177) ==728473== by 0x4882303: cache_init (rule.c:222) ==728473== by 0x4882303: cache_update (rule.c:272) ==728473== by 0x48A7DCE: nft_evaluate (libnftables.c:408) ==728473== by 0x48A86D9: nft_run_cmd_from_buffer (libnftables.c:449) ==728473== by 0x10A5D6: main (main.c:338) Fixes: 3fdc7541fba0 ("src: add multidevice support for netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: flowtable: add support for delete command by handleEric Jallot2019-11-061-0/+2
| | | | | | | Also, display handle when listing with '-a'. Signed-off-by: Eric Jallot <ejallot@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: flowtable: add support for named flowtable listingEric Jallot2019-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | This patch allows you to dump a named flowtable. # nft list flowtable inet t f table inet t { flowtable f { hook ingress priority filter + 10 devices = { eth0, eth1 } } } Also: libnftables-json.adoc: fix missing quotes. Fixes: db0697ce7f60 ("src: support for flowtable listing") Fixes: 872f373dc50f ("doc: Add JSON schema documentation") Signed-off-by: Eric Jallot <ejallot@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add multidevice support for netdev chainPablo Neira Ayuso2019-10-301-3/+17
| | | | | | | | | | | | | | This patch allows you to specify multiple netdevices to be bound to the netdev basechain, eg. # nft add chain netdev x y { \ type filter hook ingress devices = { eth0, eth1 } priority 0\; } json codebase has been updated to support for one single device with the existing representation, no support for multidevice is included in this patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add synproxy stateful object supportFernando Fernandez Mancera2019-09-131-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Add support for "synproxy" stateful object. For example (for TCP port 80 and using maps with saddr): table ip foo { synproxy https-synproxy { mss 1460 wscale 7 timestamp sack-perm } synproxy other-synproxy { mss 1460 wscale 5 } chain bar { tcp dport 80 synproxy name "https-synproxy" synproxy name ip saddr map { 192.168.1.0/24 : "https-synproxy", 192.168.2.0/24 : "other-synproxy" } } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: fix jumps on bigendian archesFlorian Westphal2019-08-141-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | table bla { chain foo { } chain bar { jump foo } } } Fails to restore on big-endian platforms: jump.nft:5:2-9: Error: Could not process rule: No such file or directory jump foo nft passes a 0-length name to the kernel. This is because when we export the value (the string), we provide the size of the destination buffer. In earlier versions, the parser allocated the name with the same fixed size and all was fine. After the fix, the export places the name in the wrong location in the destination buffer. This makes tests/shell/testcases/chains/0001jumps_0 work on s390x. v2: convert one error check to a BUG(), it should not happen unless kernel abi is broken. Fixes: 142350f154c78 ("src: invalid read when importing chain name") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow variable in chain policyFernando Fernandez Mancera2019-08-081-1/+7
| | | | | | | | | | | | This patch allows you to use variables in chain policy definition, e.g. define default_policy = "accept" add table ip foo add chain ip foo bar {type filter hook input priority filter; policy $default_policy} Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow variables in the chain priority specificationFernando Fernandez Mancera2019-08-081-5/+17
| | | | | | | | | | | | | | | | | This patch allows you to use variables in chain priority definitions, e.g. define prio = filter define prionum = 10 define prioffset = "filter - 150" add table ip foo add chain ip foo bar { type filter hook input priority $prio; } add chain ip foo ber { type filter hook input priority $prionum; } add chain ip foo bor { type filter hook input priority $prioffset; } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add ct expectations supportStéphane Veyret2019-07-161-0/+12
| | | | | | | This modification allow to directly add/list/delete expectations. Signed-off-by: Stéphane Veyret <sveyret@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add set_is_datamap(), set_is_objmap() and set_is_map() helpersPablo Neira Ayuso2019-07-161-6/+6
| | | | | | | | | | | | | Two map types are currently possible: * data maps, ie. set_is_datamap(). * object maps, ie. set_is_objmap(). This patch adds helper functions to check for the map type. set_is_map() allows you to check for either map type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: enable set expiration date for set elementsLaura Garcia Liebana2019-06-281-0/+3
| | | | | | | | | | | | | | | | Currently, the expiration of every element in a set or map is a read-only parameter generated at kernel side. This change will permit to set a certain expiration date per element that will be required, for example, during stateful replication among several nodes. This patch will enable the _expires_ input parameter in the parser and propagate NFTNL_SET_ELEM_EXPIRATION in order to send the configured value. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: remove netlink_list_table()Pablo Neira Ayuso2019-06-171-6/+1
| | | | | | Remove this wrapper, call netlink_list_rules() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add reference counter for dynamic datatypesPablo Neira Ayuso2019-06-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two datatypes are using runtime datatype allocation: * Concatenations. * Integer, that require byteorder adjustment. From the evaluation / postprocess step, transformations are common, hence expressions may end up fetching (infering) datatypes from an existing one. This patch adds a reference counter to release the dynamic datatype object when it is shared. The API includes the following helper functions: * datatype_set(expr, datatype), to assign a datatype to an expression. This helper already deals with reference counting for dynamic datatypes. This also drops the reference counter of any previous datatype (to deal with the datatype replacement case). * datatype_get(datatype) bumps the reference counter. This function also deals with nul-pointers, that occurs when the datatype is unset. * datatype_free() drops the reference counter, and it also releases the datatype if there are not more clients of it. Rule of thumb is: The reference counter of any newly allocated datatype is set to zero. This patch also updates every spot to use datatype_set() for non-dynamic datatypes, for consistency. In this case, the helper just makes an simple assignment. Note that expr_alloc() has been updated to call datatype_get() on the datatype that is assigned to this new expression. Moreover, expr_free() calls datatype_free(). This fixes valgrind reports like this one: ==28352== 1,350 (440 direct, 910 indirect) bytes in 5 blocks are definitely lost in loss recor 3 of 3 ==28352== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299) ==28352== by 0x4E79558: xmalloc (utils.c:36) ==28352== by 0x4E7963D: xzalloc (utils.c:65) ==28352== by 0x4E6029B: dtype_alloc (datatype.c:1073) ==28352== by 0x4E6029B: concat_type_alloc (datatype.c:1127) ==28352== by 0x4E6D3B3: netlink_delinearize_set (netlink.c:578) ==28352== by 0x4E6D68E: list_set_cb (netlink.c:648) ==28352== by 0x5D74023: nftnl_set_list_foreach (set.c:780) ==28352== by 0x4E6D6F3: netlink_list_sets (netlink.c:669) ==28352== by 0x4E5A7A3: cache_init_objects (rule.c:159) ==28352== by 0x4E5A7A3: cache_init (rule.c:216) ==28352== by 0x4E5A7A3: cache_update (rule.c:266) ==28352== by 0x4E7E0EE: nft_evaluate (libnftables.c:388) ==28352== by 0x4E7EADD: nft_run_cmd_from_filename (libnftables.c:479) ==28352== by 0x109A53: main (main.c:310) This patch also removes the DTYPE_F_CLONE flag which is broken and not needed anymore since proper reference counting is in place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>