nftables - nft command line tool

	Commit message	Author	Date	Files	Lines
*	evaluate: validate maximum log statement prefix length	Pablo Neira Ayuso	2023-10-17	1	-1/+6
	Otherwise, a string that is too long overruns the log prefix buffer.
	Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1714
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
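A minimal sketch of this kind of length check, assuming a 128-byte prefix buffer that includes the terminating NUL. That size matches the crash reported for prefixes longer than 127 bytes in the stmt_evaluate_log_prefix() truncation fix in this log. `LOG_PREFIX_BUFSIZE` and `log_prefix_copy()` are hypothetical names, not the nftables code:

```c
#include <string.h>

/* Hypothetical buffer size: 127 prefix bytes plus the terminating NUL. */
#define LOG_PREFIX_BUFSIZE 128

/* Copy an (already expanded) log prefix into a fixed-size buffer.
 * Returns 0 on success, or -1 if the string would not fit, so the caller
 * can report an error instead of overrunning dst. */
static int log_prefix_copy(char dst[LOG_PREFIX_BUFSIZE], const char *prefix)
{
	size_t len = strlen(prefix);

	if (len >= LOG_PREFIX_BUFSIZE)
		return -1;
	memcpy(dst, prefix, len + 1); /* copies the NUL as well */
	return 0;
}
```

In this sketch the check runs on the final string length, so a prefix that only becomes too long after variable expansion is rejected as well.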
*	evaluate: suggest != in negation error message	Florian Westphal	2023-10-13	1	-1/+1
	When I run:
	    sudo nft insert rule filter FORWARD iifname "ens2f1" ip saddr not @ip_macs counter drop comment \" BLOCK ALL NON REGISTERED IP/MACS \"
	I get:
	    Error: negation can only be used with singleton bitmask values
	And even I did not spot the problem immediately. I don't think "not" should have been added: it's easily confused with "not equal"/"neq"/!= and hides that this is allegedly a binop. At least mention that the command line is asking for a binary operation here and suggest "!=".
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	include: include <string.h> in <nft.h>	Thomas Haller	2023-09-28	1	-1/+0
	<string.h> provides strcmp(); as such, it's very basic and used everywhere.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use "enum byteorder" instead of int in set_datatype_alloc()	Thomas Haller	2023-09-20	1	-1/+1
	Use the enum types as we have them.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: perform mark datatype compatibility check from maps	Pablo Neira Ayuso	2023-09-19	1	-7/+10
	Wrap the datatype compatibility check into a helper function and use it for map evaluation; otherwise, the following bogus error message is displayed:
	    Error: datatype mismatch, map expects packet mark, mapping expression has type integer
	Add unit tests to improve coverage for this use case.
	Fixes: 5d8e33ddb112 ("evaluate: relax type-checking for integer arguments in mark statements")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: expand sets and maps before evaluation	Pablo Neira Ayuso	2023-09-19	1	-17/+25
	3975430b12d9 ("src: expand table command before evaluation") moved ruleset expansion before evaluation, except for sets and maps. For sets and maps there is still a post_expand() phase.
	This patch moves set and map expansion to allocate an independent CMD_OBJ_SETELEMS command, which is evaluated, to add elements to named sets and maps. This consolidates the ruleset expansion so that it always happens before the evaluation step for all objects, except for anonymous sets and maps.
	This approach avoids an interference with the set interval code, which detects overlaps and merges adjacent ranges. This set interval routine uses set->init to maintain a cache of existing elements. The post_expand() phase then incorrectly expands the set->init cache, and this triggers bogus ENOENT errors due to incorrect bytecode (placing element addition before set creation) in combination with user-declared sets using the flat syntax notation.
	Since the evaluation step (coming after the expansion) creates implicit/anonymous sets and maps, those are not expanded anymore. These anonymous sets still need to be evaluated from the set_evaluate() path, and the netlink bytecode generation path, i.e. do_add_set(), needs to deal with anonymous sets.
	Note that, for named sets, do_add_set() does not use set->init. Such content is part of the existing cache, and the CMD_OBJ_SETELEMS command is responsible for adding elements to named sets.
	Fixes: 3975430b12d9 ("src: expand table command before evaluation")
	Reported-by: Jann Haber <jannh@selfnet.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix memleak in prefix evaluation with wildcard interface name	Pablo Neira Ayuso	2023-09-19	1	-1/+0
	The following ruleset:
	    table ip x {
	        chain y {
	            meta iifname { abcde*, xyz }
	        }
	    }
	triggers the following memleak:
	    ==6871== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
	    ==6871==    at 0x483877F: malloc (vg_replace_malloc.c:307)
	    ==6871==    by 0x48AD898: xmalloc (utils.c:37)
	    ==6871==    by 0x4BC8B22: __gmpz_init2 (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.4.1)
	    ==6871==    by 0x4887E67: constant_expr_alloc (expression.c:424)
	    ==6871==    by 0x488EF1F: expr_evaluate_prefix (evaluate.c:1138)
	    ==6871==    by 0x488EF1F: expr_evaluate (evaluate.c:2725)
	    ==6871==    by 0x488E76D: expr_evaluate_set_elem (evaluate.c:1662)
	    ==6871==    by 0x488E76D: expr_evaluate (evaluate.c:2739)
	    ==6871==    by 0x4891033: list_member_evaluate (evaluate.c:1454)
	    ==6871==    by 0x488E2B6: expr_evaluate_set (evaluate.c:1757)
	    ==6871==    by 0x488E2B6: expr_evaluate (evaluate.c:2737)
	    ==6871==    by 0x48910D0: elems_evaluate (evaluate.c:4605)
	    ==6871==    by 0x4891432: set_evaluate (evaluate.c:4711)
	    ==6871==    by 0x48915BC: implicit_set_declaration (evaluate.c:122)
	    ==6871==    by 0x488F18A: expr_evaluate_relational (evaluate.c:2503)
	    ==6871==    by 0x488F18A: expr_evaluate (evaluate.c:2745)
	expr_evaluate_prefix() calls constant_expr_alloc(), which has already called mpz_init2(). The second call to mpz_init2() initializes the same mpz_t again, overwriting the pointer to the memory area allocated by the first call. Remove the extra mpz_init2() call to fix this memleak.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	1	-26/+40
	Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails:
	    ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
	    ==118==    at 0x484682C: calloc (vg_replace_malloc.c:1554)
	    ==118==    by 0x48A39DD: xmalloc (utils.c:37)
	    ==118==    by 0x48A39DD: xzalloc (utils.c:76)
	    ==118==    by 0x487BDFD: datatype_alloc (datatype.c:1205)
	    ==118==    by 0x487BDFD: concat_type_alloc (datatype.c:1288)
	    ==118==    by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
	    ==118==    by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
	    ==118==    by 0x488229D: stmt_evaluate (evaluate.c:4450)
	    ==118==    by 0x488328E: rule_evaluate (evaluate.c:4956)
	    ==118==    by 0x48ADC71: nft_evaluate (libnftables.c:552)
	    ==118==    by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
	    ==118==    by 0x402983: main (main.c:534)
	I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes').
	We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP.
	Fix and rework.
	- Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That does not seem workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak:
	      if (--dtype->refcnt > 0)
	          return;
	  While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free.
	  Let datatype_clone() return references with refcnt set to 1, and in general always be clear about where we transfer ownership (take a reference) and where we need to release it.
	- set_datatype_alloc() needs to consistently return an owned reference to the caller. Previously, some code paths would and others wouldn't.
	- Replace datatype_set(key, set_datatype_alloc(dtype, key->byteorder)) with __datatype_set(), which takes ownership.
	Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
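The ownership rule adopted here can be sketched as follows: every alloc/clone/get is paired with exactly one free, and a fresh object starts with refcnt 1. `struct dt` and the `dt_*()` functions are hypothetical simplifications, not the nftables datatype API. If a new object started at refcnt 0 instead, the first `dt_free()` would wrap the unsigned counter to UINT_MAX and the object would never be released, which is the leak described above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a dynamically allocated datatype. */
struct dt {
	unsigned int refcnt;
	char name[32];
};

static int dt_frees; /* counts actual releases, for illustration only */

/* Allocation hands the caller its reference: refcnt starts at 1. */
static struct dt *dt_alloc(const char *name)
{
	struct dt *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcnt = 1;
	strncpy(d->name, name, sizeof(d->name) - 1); /* calloc keeps the NUL */
	return d;
}

/* Take an additional reference on an existing object. */
static struct dt *dt_get(struct dt *d)
{
	d->refcnt++;
	return d;
}

/* A clone is a new object, so it also comes with refcnt == 1. */
static struct dt *dt_clone(const struct dt *orig)
{
	return dt_alloc(orig->name);
}

/* Drop one reference; releasing the last one frees the object. */
static void dt_free(struct dt *d)
{
	if (!d)
		return;
	if (--d->refcnt > 0)
		return;
	free(d);
	dt_frees++;
}
```

With this convention, code that stores a pointer calls `dt_get()` and later `dt_free()` exactly once, and whoever receives a value from `dt_alloc()` or `dt_clone()` owns it without any extra bookkeeping.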
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	1	-1/+0
	It provides malloc()/free(), which is so basic that we need it everywhere. Include it via <nft.h>.
	The ultimate purpose is to define more things in <nft.h>. While it has no corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, it is a very basic dependency that will be needed for that.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: rename "dtype_clone()" to datatype_clone()	Thomas Haller	2023-09-08	1	-2/+2
	The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()".
	This is a follow-up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: fix get element for concatenated set	Florian Westphal	2023-09-06	1	-4/+7
	Given:
	    table ip filter {
	        set test {
	            type ipv4_addr . ether_addr . mark
	            flags interval
	            elements = { 198.51.100.0/25 . 00:0b:0c:ca:cc:10-c1:a0:c1:cc:10:00 . 0x0000006f, }
	        }
	    }
	we get a lookup failure:
	    nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f }
	    Error: Could not process rule: No such file or directory
	It's possible to work around this via a dummy range somewhere in the key, e.g.
	    nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f-0x6f }
	but that shouldn't be needed, so make sure the INTERVAL flag is enabled for the queried element if the set is of interval type.
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: revisit anonymous set with single element optimization	Pablo Neira Ayuso	2023-09-02	1	-20/+40
	This patch reworks the single-element anonymous set optimization so that it is performed from the evaluation step of the relational expression. Hence, when optimizing for protocol flags, use OP_EQ instead of OP_IMPLICIT, that is:
	    tcp flags { syn }
	becomes (to represent an exact match):
	    tcp flags == syn
	given that OP_IMPLICIT and OP_EQ are not equivalent for flags.
	01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element") disabled this optimization, which is enabled again after this patch.
	Fixes: 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: place byteorder conversion after numgen for IP address datatypes	Jorge Ortiz	2023-09-01	1	-0/+4
	The numgen extension generates numbers in little-endian. This can be very tricky when trying to combine it with IP addresses, which use big-endian (network byte order). This change adds a new byteorder operation to convert the data type endianness.
	Before this patch:
	    $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
	    ip nat snat_chain
	      [ numgen reg 1 = inc mod 7 offset 167772161 ]
	      [ nat snat ip addr_min reg 1 ]
	After this patch:
	    $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001
	    ip nat snat_chain
	      [ numgen reg 1 = inc mod 7 offset 167772161 ]
	      [ byteorder reg 1 = hton(reg 1, 4, 4) ]
	      [ nat snat ip addr_min reg 1 ]
	Regression tests have been modified to include these new cases.
	Signed-off-by: Jorge Ortiz Escribano <jorge.ortiz.escribano@gmail.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
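The following small sketch shows why the conversion is needed. It models the numgen result as a host-order integer and prints the register bytes as an IPv4 address, which is how the NAT expression reads them. `numgen_value()` and `reg_to_ipv4()` are hypothetical helpers, and `htonl()` stands in for the added `byteorder ... hton` step. On a little-endian host, skipping `htonl()` would turn 0x0a000001 into 1.0.0.10 instead of 10.0.0.1:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "numgen inc mod 7 offset 0x0a000001":
 * a host-order integer (counter % mod) + offset. */
static uint32_t numgen_value(uint32_t counter, uint32_t mod, uint32_t offset)
{
	return counter % mod + offset;
}

/* Interpret the register bytes as an IPv4 address (network byte order),
 * as the NAT statement does, and format them as text. */
static void reg_to_ipv4(uint32_t reg, char out[INET_ADDRSTRLEN])
{
	struct in_addr addr;

	memcpy(&addr.s_addr, &reg, sizeof(reg));
	inet_ntop(AF_INET, &addr, out, INET_ADDRSTRLEN);
}
```

For example, `reg_to_ipv4(htonl(numgen_value(8, 7, 0x0a000001)), buf)` yields "10.0.0.2", since 8 % 7 is 1.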
*	src: simplify chain_alloc()	Pablo Neira Ayuso	2023-08-31	1	-1/+1
	Remove the parameter to set the chain name, which is only used from the netlink path.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: do not remove anonymous set with protocol flags and single element	Pablo Neira Ayuso	2023-08-30	1	-1/+6
	Set lookups with flags search for an exact match; however:
	    tcp flags { syn }
	gets transformed into:
	    tcp flags syn
	which only checks that the syn flag is set (non-exact match).
	This optimization is safe for ct state though, because only one bit is ever set in the ct state bitmask.
	Since protocol flags allow for combining flags, skip this optimization to retain exact match semantics.
	Another possible solution is to turn OP_IMPLICIT into OP_EQ for exact flag match to re-introduce this optimization and deal with this corner case.
	Fixes: fee6bda06403 ("evaluate: remove anon sets with exactly one element")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
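The semantic difference can be sketched in C. The implicit form only tests whether the flag bit is set, while a set lookup (and `==`) compares the whole flags byte. The helpers and the enum are hypothetical, but the flag values are the standard TCP header bits. For a SYN+ACK packet, `tcp flags syn` matches but `tcp flags { syn }` does not:

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits (low byte of the flags field). */
enum {
	TCP_FIN = 0x01,
	TCP_SYN = 0x02,
	TCP_RST = 0x04,
	TCP_PSH = 0x08,
	TCP_ACK = 0x10,
};

/* Implicit operator on a bitmask, as in "tcp flags syn":
 * true if the AND with the given bits is non-zero. */
static bool match_implicit(uint8_t pkt_flags, uint8_t bits)
{
	return (pkt_flags & bits) != 0;
}

/* Exact match, as in "tcp flags == syn" or a set lookup:
 * the whole flags value must be equal. */
static bool match_exact(uint8_t pkt_flags, uint8_t value)
{
	return pkt_flags == value;
}
```

When exactly one bit can ever be set, as with ct state, both helpers return the same result. That is why the optimization stays safe there.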
*	evaluate: don't needlessly clear full string buffer in stmt_evaluate_log_prefix()	Thomas Haller	2023-08-29	1	-1/+4
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: rework SNPRINTF_BUFFER_SIZE() and handle truncation	Thomas Haller	2023-08-29	1	-3/+5
	Before, the macro asserted against truncation, even though the callers still checked for truncation and tried to handle it, probably for good reason. With stmt_evaluate_log_prefix() it's not clear that the code ensures that truncation cannot happen, so we must not assert against it, but handle it.
	Also,
	- wrap the macro in "do { ... } while(0)" to make it more function-like.
	- evaluate macro arguments exactly once, to make it more function-like.
	- take pointers to the arguments that are being modified.
	- use assert() instead of abort().
	- use size_t type for arguments related to the buffer size.
	- drop "size". It was mostly redundant with "offset". We can know everything we want based on "len" and "offset" alone.
	- "offset" previously was incremented before checking for truncation, so it would point somewhere past the buffer. This behavior does not seem useful. Instead, on truncation "len" will be zero (as before) and "offset" will point one past the buffer (one past the terminating NUL).
	Thereby, also fix a warning from clang:
	    evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
	            size_t size = 0;
	                   ^
	    meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable]
	            size_t size;
	                   ^
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
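Here is a minimal sketch of this bookkeeping, written as a function instead of a macro for readability. `snprintf_account()` is a hypothetical name, not the reworked SNPRINTF_BUFFER_SIZE() itself. It follows the rules listed above: `len` and `offset` are `size_t` and passed by pointer, truncation is reported to the caller instead of aborting, and on truncation `len` becomes 0 while `offset` points one past the buffer. Tripping `assert()` on a negative snprintf() result is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Account for one snprintf() call. "len" is the space left in the buffer
 * and "offset" the current write position; both are updated in place.
 * Returns false on truncation: then *len is 0 and *offset points one past
 * the buffer (one past the terminating NUL that snprintf() wrote). */
static bool snprintf_account(int ret, size_t *len, size_t *offset)
{
	assert(ret >= 0); /* a negative return signals an output error */

	if ((size_t)ret >= *len) {
		/* Truncated: snprintf() used all of the remaining space. */
		*offset += *len;
		*len = 0;
		return false;
	}
	*offset += (size_t)ret;
	*len -= (size_t)ret;
	return true;
}
```

A caller appends piece by piece with `snprintf(buf + offset, len, ...)`. As soon as the helper returns false, it stops (or reports the truncation) instead of writing further.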
*	evaluate: fix check for truncation in stmt_evaluate_log_prefix()	Thomas Haller	2023-08-29	1	-1/+1
	Otherwise, nft crashes with a prefix longer than 127 bytes:
	    # nft add rule x y log prefix \"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\"
	    ==159385==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffed5bf4a10 at pc 0x7f3134839269 bp 0x7ffed5bf48b0 sp 0x7ffed5bf4060
	    WRITE of size 129 at 0x7ffed5bf4a10 thread T0
	        #0 0x7f3134839268 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:778
	        #1 0x7f3133e3074e in __mpz_export_data /tmp/nftables/src/gmputil.c:110
	        #2 0x7f3133d21d3c in expr_to_string /tmp/nftables/src/expression.c:192
	        #3 0x7f3133ded103 in netlink_gen_log_stmt /tmp/nftables/src/netlink_linearize.c:1148
	        #4 0x7f3133df33a1 in netlink_gen_stmt /tmp/nftables/src/netlink_linearize.c:1682
	        [...]
	Fixes: e76bb3794018 ('src: allow for variables in the log prefix string')
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Drop dead code from expr_evaluate_mapping()	Phil Sutter	2023-08-29	1	-11/+8
	Since commit 343a51702656a ("src: store expr, not dtype to track data in sets"), set->data is allocated for object maps in set_evaluate(). All other map types have set->data initialized by the parser already, and set_evaluate() also checks that.
	Drop the confusing check: later in the function, set->data is dereferenced unconditionally.
	Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	include: include <std{bool,int}.h> via <nft.h>	Thomas Haller	2023-08-25	1	-1/+0
	There is a minimum base that all our sources will end up needing. This is what <nft.h> provides.
	Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything without having "bool" and "uint32_t" types available.
	Yes, this means the internal headers are not self-contained with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely on <nft.h> being included for them. They should not include <nft.h> themselves, because <nft.h> always needs to be included first. So if an internal header included <nft.h>, that would be unnecessary, because <nft.h> is always included already.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	1	-0/+2
	<config.h> is generated by the configure script. As it contains our feature detection, we want to use it everywhere.
	Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define needs to come before anything else, and it would be confusing if different source files adhered to different C variants.
	It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included first.
	Instead of going through all source files and including <config.h> first, add a new header "include/nft.h", which is supposed to be included in all our sources (and first). This will also allow us later to prepare some common base, like including <stdbool.h> everywhere.
	We aim for headers to be self-contained, so that they can be included in any order. That, by the way, already didn't work, because some headers define _GNU_SOURCE, which only works if the header gets included first. <nft.h> is however an exception to the rule: everything we compile shall rely on having the <nft.h> header included first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file).
	Note that <config.h> has no include guards, which makes including it multiple times at least ugly. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures that <config.h> is included exactly once.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking	Thomas Haller	2023-08-24	1	-2/+8
	getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense.
	Add a new input flag NFT_CTX_INPUT_NO_DNS to opt out of getaddrinfo() and only accept plain IP addresses.
	We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are more explicit about what we expect and can generate a better error message in case of failure.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Reviewed-by: Phil Sutter <phil@nwl.cc>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
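The inet_pton()-based parsing described above can be sketched as follows, assuming that only IPv4 and IPv6 literals are accepted when the flag is set. `struct addr_literal` and `parse_addr_literal()` are hypothetical. The real flag is set on the libnftables context, which this sketch does not model. Because inet_pton() never touches the network, a host name is rejected immediately instead of blocking:

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Result of a literal-only address parse (hypothetical type). */
struct addr_literal {
	int family;              /* AF_INET or AF_INET6 */
	unsigned char bytes[16]; /* address in network byte order */
};

/* Accept only numeric IPv4/IPv6 literals and never resolve names.
 * Returns false for anything else, so the caller can report
 * "not an IP address" rather than waiting on DNS. */
static bool parse_addr_literal(const char *s, struct addr_literal *out)
{
	memset(out, 0, sizeof(*out));
	if (inet_pton(AF_INET, s, out->bytes) == 1) {
		out->family = AF_INET;
		return true;
	}
	if (inet_pton(AF_INET6, s, out->bytes) == 1) {
		out->family = AF_INET6;
		return true;
	}
	return false;
}
```

Trying the address families one after the other also tells the caller which family was parsed. A specific error message can be built from that, which is the advantage the commit cites over AI_NUMERICHOST.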
*	evaluate: error out on meter overlap with an existing set/map declaration	Pablo Neira Ayuso	2023-08-23	1	-0/+18
	One of the problems with meters is that they use the set/map infrastructure behind the scenes, which might be confusing to users.
	This patch errors out in case the user declares a meter whose name overlaps with an existing set/map:
	    meter.nft:15:18-91: Error: File exists; meter ‘syn4-meter’ overlaps an existing set ‘syn4-meter’ in family inet
	    tcp dport 22 meter syn4-meter { ip saddr . tcp dport timeout 5m limit rate 20/minute } counter accept
	                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	An old 5.10 kernel simply bails out with EEXIST; with this patch, a better hint is provided.
	Dynamic sets are preferred over meters these days.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	ct expectation: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-07-31	1	-0/+1
	Just like "ct timeout", "ct expectation" is in need of the same fix: we get a segfault on "nft list ct expectation table t" if table t exists.
	This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion").
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	Implement 'reset {set,map,element}' commands	Phil Sutter	2023-07-13	1	-0/+5
	All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values.
	While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	evaluate: Cache looked up set for list commands	Phil Sutter	2023-07-13	1	-0/+1
	The evaluation phase checks that the given table and set exist in the cache. Relieve the execution phase from having to perform the lookup again by storing the set reference in cmd->set. The ref counter just has to be increased so that cmd_free() does the right thing (it lacked handling of MAP and METER objects for some reason).
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	evaluate: Merge some cases in cmd_evaluate_list()	Phil Sutter	2023-07-13	1	-32/+4
	The code for set, map and meter was almost identical apart from the specific last check. Fold them together and make the distinction in that spot only.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	evaluate: place byteorder conversion before rshift in payload statement	Pablo Neira Ayuso	2023-07-08	1	-1/+11
	For a bitfield that spans more than one byte, such as ip6 dscp, byteorder conversion needs to be done before rshift. Add a unary expression for this conversion only in the case of meta and ct statements.
	Before this patch:
	    # nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
	    ip6 x y
	      [ payload load 2b @ network header + 0 => reg 1 ]
	      [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
	      [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
	      [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <--------- incorrect
	      [ meta set mark with reg 1 ]
	After this patch:
	    # nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp'
	    ip6 x y
	      [ payload load 2b @ network header + 0 => reg 1 ]
	      [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
	      [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <-------- correct
	      [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
	      [ meta set mark with reg 1 ]
	For the matching case, the binary transfer already deals with the rshift to adjust the left and right hand sides of the expression, so the unary conversion is not needed in that case.
	Fixes: 8221d86e616b ("tests: py: add test-cases for ct and packet mark payload expressions")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
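The correct order can be sketched in plain C: load the first two bytes of the IPv6 header, convert them from network to host byte order, then mask and shift. The mask 0x0fc0 used here is the same mask that appears as 0x0000c00f in the debug output above, where it is seen in a little-endian register. `ip6_dscp()` is a hypothetical helper. On a little-endian host, shifting the raw register before the byte swap mixes bits from both bytes, which is the bug being fixed:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Extract DSCP (upper 6 bits of the Traffic Class) from the first two
 * bytes of an IPv6 header. In network order these 16 bits are
 * version(4) | traffic class(8) | flow label high nibble(4), so DSCP
 * sits at bits 11..6 of the big-endian value. */
static uint8_t ip6_dscp(const unsigned char *hdr)
{
	uint16_t raw;

	memcpy(&raw, hdr, sizeof(raw));        /* "payload load 2b": network order */
	uint16_t v = ntohs(raw);               /* byteorder conversion first ... */
	return (uint8_t)((v & 0x0fc0) >> 6);   /* ... then mask and shift */
}
```

For example, a header starting with bytes 0x6b 0x80 (version 6, traffic class 0xb8) yields DSCP 46.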
*	ct timeout: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-06-20	1	-0/+1
	    <empty ruleset>
	    $ nft list ct timeout table t
	    Error: No such file or directory
	    list ct timeout table t
	                          ^
	This is expected to list all 'ct timeout' objects. The failure is correct: the table 't' does not exist.
	But now let's add one:
	    $ nft add table t
	    $ nft list ct timeout table t
	    Segmentation fault (core dumped)
	... and that's not expected: nothing should be shown and nft should exit normally.
	Because of the missing TIMEOUTS command enum, the backend thinks it should do an object lookup. But as the frontend asked for a 'list of objects' rather than 'show this object', handle.obj.name is NULL, which then results in this crash.
	Update the command enums so that the backend knows what the frontend asked for.
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: do not abort when prefix map has non-map element	Florian Westphal	2023-06-20	1	-4/+13
	Before:
	    nft: evaluate.c:1849: __mapping_expr_expand: Assertion `i->etype == EXPR_MAPPING' failed.
	After:
	    Error: expected mapping, not set element
	    snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24, 10.141.12.1 }
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	exthdr: add boolean DCCP option matching	Jeremy Sowden	2023-06-01	1	-0/+1
	Iptables supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend exthdr expressions to add this functionality to nftables.
	Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930
	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: permit use of constant values in set lookup keys	Florian Westphal	2023-05-24	1	-0/+6
	Given:
	    set s { type ipv4_addr . ipv4_addr . inet_service .. }
	something like
	    add rule ip saddr . 1.2.3.4 . 80 @s goto c1
	fails with:
	    Error: Can't parse symbolic invalid expressions
	This fails because the relational expression first evaluates the left hand side, so when concat evaluation sees '1.2.3.4', no key context is available.
	Check if the RHS is a set reference and, if so, evaluate the right hand side first. This sets a pointer to the set key in the evaluation context structure, which then makes the concat evaluation step parse 1.2.3.4 and 80 as an IPv4 address and a 16-bit port number.
	On delinearization, extend relop postprocessing to copy the datatype from the RHS (the set reference, which has the proper datatype according to set->key) to the LHS (the concat expression).
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: set NFT_SET_EVAL flag if dynamic set already exists	Pablo Neira Ayuso	2023-05-18	1	-0/+8
	nft reports EEXIST when reading an existing set whose NFT_SET_EVAL flag has previously been inferred from the ruleset.
	    # cat test.nft
	    table ip test {
	        set dlist {
	            type ipv4_addr
	            size 65535
	        }
	        chain output {
	            type filter hook output priority filter; policy accept;
	            udp dport 1234 update @dlist { ip daddr } counter packets 0 bytes 0
	        }
	    }
	    # nft -f test.nft
	    # nft -f test.nft
	    test.nft:2:6-10: Error: Could not process rule: File exists
	    set dlist {
	        ^^^^^
	Phil Sutter says:
	  In the first call, the set lacking the 'dynamic' flag does not exist and is therefore added to the cache. Consequently, both the 'add set' command and the set statement point at the same set object. In the second call, a set with the same name exists already, so the object created for the 'add set' command is not added to the cache and consequently not updated with the missing flag. The kernel thus rejects the NEWSET request, as the existing set differs from the new one.
	Set the NFT_SET_EVAL flag if the existing set has it set.
	Fixes: 8d443adfcc8c1 ("evaluate: attempt to set_eval flag if dynamic updates requested")
	Tested-by: Eric Garver <eric@garver.life>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: skip optimization if anonymous set uses stateful statement	Pablo Neira Ayuso	2023-05-10	1	-1/+1
	fee6bda06403 ("evaluate: remove anon sets with exactly one element") introduced an optimization to remove the use of sets with a single element. Skip this optimization if the set element contains stateful statements.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: allow stateful statements with anonymous verdict maps	Pablo Neira Ayuso	2023-05-10	1	-1/+2
	Evaluation fails to accept stateful statements in verdict maps. Relax the following check for anonymous sets:
	    test.nft:4:29-35: Error: missing statement in map declaration
	    ip saddr vmap { 127.0.0.1 counter : drop, * counter : accept }
	                              ^^^^^^^
	The existing code already generates the counter in the anonymous verdict map correctly.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: bail out if new flowtable does not specify hook and priority	Pablo Neira Ayuso	2023-04-24	1	-1/+5
	If the user forgets to specify the hook and priority and the flowtable does not exist, then bail out:
	    # cat flowtable-incomplete.nft
	    table t {
	        flowtable f {
	            devices = { lo }
	        }
	    }
	    # nft -f /tmp/k
	    flowtable-incomplete.nft:2:12-12: Error: missing hook and priority in flowtable declaration
	    flowtable f {
	              ^
	Update one existing tests/shell test to specify a hook and priority.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow for updating devices on existing netdev chain	Pablo Neira Ayuso	2023-04-24	1	-5/+7
	This patch allows you to add devices to, and remove devices from, an existing chain:
	    # cat ruleset.nft
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth0 } priority 0; policy accept;
	        }
	    }
	    # nft -f ruleset.nft
	    # nft add chain netdev x y '{ devices = { eth1 }; }'
	    # nft list ruleset
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth0, eth1 } priority 0; policy accept;
	        }
	    }
	    # nft delete chain netdev x y '{ devices = { eth0 }; }'
	    # nft list ruleset
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth1 } priority 0; policy accept;
	        }
	    }
	This feature allows for creating an empty netdev chain with no devices. In such a case, no packets are seen until a device is registered.
	This patch includes extended netlink error reporting:
	    # nft add chain netdev x y '{ devices = { x } ; }'
	    Error: Could not process rule: No such file or directory
	    add chain netdev x y { devices = { x } ; }
	                                       ^
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: bogus missing transport protocol	Pablo Neira Ayuso	2023-04-05	1	-3/+8
	Users have to specify a transport protocol match such as
	    meta l4proto tcp
	before the redirect statement, even if the redirect statement already implicitly refers to the transport protocol, for instance:
	    test.nft:3:16-53: Error: transport protocol mapping is only valid after transport protocol match
	    redirect to :tcp dport map { 83 : 8083, 84 : 8084 }
	    ~~~~~~~~     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	Evaluate the redirect expression before the mandatory check for the transport protocol match, so that the protocol context already provides a transport protocol.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: honor statement length in bitwise evaluation	Pablo Neira Ayuso	2023-03-28	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get the length from the statement instead of inferring it from the expression that is used to set the value. In the particular case of {ct\|meta} mark, this is 32 bits. Otherwise, the generated bytecode is incorrect: # nft -c --debug=netlink 'add rule ip6 x y ct mark set ip6 dscp << 2 \| 0x10' [ payload load 2b @ network header + 0 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ] [ byteorder reg 1 = ntoh(reg 1, 2, 1) ] [ bitwise reg 1 = ( reg 1 << 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x00000fef ) ^ 0x00000010 ] <--- incorrect! [ ct set mark with reg 1 ] The previous bitwise shift had already upgraded the value to 32 bits (this is not visible in the netlink debug output above). After this patch, the last \| 0x10 uses 32 bits: [ bitwise reg 1 = ( reg 1 & 0xffffffef ) ^ 0x00000010 ] Note that the mask 0xffffffef is used instead of 0x00000fef. Patch ("evaluate: support shifts larger than the width of the left operand") provides the statement length through the eval context. 
Use it to evaluate the bitwise expression accordingly; otherwise, the bytecode is incorrect: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 \| 0xff000000' ip x y [ payload load 1b @ network header + 1 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x1e000000 ) ^ 0x000000ff ] <-- incorrect byteorder for OR [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <-- not needed for a single ip dscp byte [ ct set mark with reg 1 ] Correct bytecode: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 \| 0xff000000' ip x y [ payload load 1b @ network header + 1 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x0000001e ) ^ 0xff000000 ] [ ct set mark with reg 1 ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
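The masks in these listings follow from how an OR is encoded: the bitwise expression computes `(reg & mask) ^ xor`, and `x | v` equals `(x & ~v) ^ v`. The sketch below (hypothetical helper names, not nftables' code) shows why the operand length matters: a 12-bit width reproduces the incorrect 0x00000fef mask, while the 32-bit statement length gives 0xffffffef. Folding `& 0x1e | 0xff000000` the same way gives the mask 0x0000001e and xor 0xff000000 seen in the correct bytecode.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low 'len_bits' bits of a 32-bit register. */
static uint32_t width_mask(unsigned int len_bits)
{
	assert(len_bits >= 1 && len_bits <= 32);
	return len_bits == 32 ? UINT32_MAX : (UINT32_C(1) << len_bits) - 1;
}

/*
 * Fold "(x & and_val) | or_val" into a (mask, xor) pair so that the
 * result equals (x & mask) ^ xor. The OR bits are cleared by the
 * mask and then set again by the xor, all within len_bits bits.
 */
static void and_or_to_mask_xor(uint32_t and_val, uint32_t or_val,
			       unsigned int len_bits,
			       uint32_t *mask, uint32_t *xor_val)
{
	uint32_t w = width_mask(len_bits);

	*mask = and_val & ~or_val & w;
	*xor_val = or_val & w;
}
```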
*	evaluate: honor statement length in integer evaluation	Pablo Neira Ayuso	2023-03-28	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, bogus error is reported: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 \| 0xff000000' Error: Value 4278190080 exceeds valid range 0-63 add rule ip x y ct mark set ip dscp & 0x0f << 1 \| 0xff000000 ^^^^^^^^^^ Use the statement length as the maximum value in the mark statement expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
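The range check can be pictured as follows: 0xff000000 (4278190080) is rejected against the 6-bit `ip dscp` range 0-63, but fits once the 32-bit mark statement length is used as the limit. `max_value()` and `value_in_range()` are hypothetical, and the "wider length wins" rule is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest unsigned value representable in len_bits bits (1..64). */
static uint64_t max_value(unsigned int len_bits)
{
	assert(len_bits >= 1 && len_bits <= 64);
	return len_bits == 64 ? UINT64_MAX : (UINT64_C(1) << len_bits) - 1;
}

/*
 * Range check for a constant operand. stmt_len is the length of the
 * statement being evaluated (32 for ct/meta mark), or 0 when there is
 * no statement context; the wider of the two lengths is used.
 */
static bool value_in_range(uint64_t value, unsigned int expr_len,
			   unsigned int stmt_len)
{
	unsigned int len = stmt_len > expr_len ? stmt_len : expr_len;

	return value <= max_value(len);
}
```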
*	evaluate: set up integer type to shift expression	Pablo Neira Ayuso	2023-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Otherwise expr_evaluate_value() fails with invalid datatype: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1' BUG: invalid basetype invalid nft: evaluate.c:440: expr_evaluate_value: Assertion `0' failed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: relax type-checking for integer arguments in mark statements	Pablo Neira Ayuso	2023-03-28	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base type of the argument is integer and that the argument is small enough to fit. Moreover, swap the expression byteorder beforehand to make it compatible with the statement byteorder, to ensure rulesets are portable. # nft --debug=netlink add rule ip t c 'meta mark set ip saddr' ip t c [ payload load 4b @ network header + 12 => reg 1 ] [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <----------- byteorder swap [ meta set mark with reg 1 ] Based on original work from Jeremy Sowden. The following patches are required for this to work: "evaluate: get length from statement instead of lhs expression", "evaluate: don't eval unary arguments", "evaluate: support shifts larger than the width of the left operand", "netlink_delinearize: correct type and byte-order of shifts", "evaluate: insert byte-order conversions for expressions between 9 and 15 bits". Add one testcase for tests/py. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
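The portability argument is easiest to see in plain C: without the swap, copying the four address bytes of 192.168.1.1 into a 32-bit register yields 0x0101a8c0 on a little-endian host but 0xc0a80101 on a big-endian one. A minimal sketch, assuming a raw IPv4 header buffer; `mark_from_saddr()` is a hypothetical name, and `ntohl()` stands in for the `byteorder ... ntoh` step.

```c
#include <arpa/inet.h> /* ntohl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: derive a 32-bit mark from the IPv4 source
 * address, stored in network byte order at offset 12 of the network
 * header. Converting to host order makes the mark value identical
 * on little- and big-endian hosts.
 */
static uint32_t mark_from_saddr(const uint8_t *nh)
{
	uint32_t raw;

	assert(nh != NULL);
	/* Equivalent of "payload load 4b @ network header + 12". */
	memcpy(&raw, nh + 12, sizeof(raw));
	return ntohl(raw);
}
```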
*	evaluate: don't eval unary arguments	Jeremy Sowden	2023-03-28	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \|	When a unary expression is inserted to implement a byte-order conversion, the expression being converted has already been evaluated and so `expr_evaluate_unary` doesn't need to do so. This is required by {ct\|meta} statements with bitwise operations, which might result in byteorder conversion of the expression. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: support shifts larger than the width of the left operand	Pablo Neira Ayuso	2023-03-28	1	-18/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus: add rule t c meta mark set ip dscp << 2 works, but: add rule t c meta mark set ip dscp << 8 does not, even though the lvalue is large enough to accommodate the result. Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than the length of the expression's left-hand operand. Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand. Based on a patch from Jeremy Sowden. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
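`ip dscp` is a 6-bit field, so `<< 2` is within its width but `<< 8` is not, unless the 32-bit mark length from the statement context is taken into account. A sketch of that check, assuming the constraint is enforced by comparing the shift amount with the width; `check_shift()` and its message are hypothetical, not nftables' actual error text.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical check: a left shift is accepted only if the shift
 * amount is smaller than the width it is evaluated in. stmt_len is
 * the statement length from context (0 if none); the wider of the
 * left operand and the statement length is used.
 */
static int check_shift(unsigned int shift, unsigned int lhs_len,
		       unsigned int stmt_len)
{
	unsigned int max_len = stmt_len > lhs_len ? stmt_len : lhs_len;

	if (shift >= max_len) {
		fprintf(stderr, "shift of %u bits is too large for %u bits\n",
			shift, max_len);
		return -1;
	}
	return 0;
}
```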
*	evaluate: insert byte-order conversions for expressions between 9 and 15 bits	Jeremy Sowden	2023-03-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Round up expression lengths when determining whether to insert a byte-order conversion. For example, if one is masking a network header field which spans a byte boundary, the mask will span two bytes and so it will need to be in NBO. Fixes: bb03cbcd18a1 ("evaluate: no need to swap byte-order for values of fewer than 16 bits.") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
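The length part of the decision can be sketched as follows: a 9- to 15-bit value is "fewer than 16 bits" but still occupies two bytes, so it has a byte order. The function names are hypothetical, and this sketch models only the length test, not the rest of the conversion logic.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_BYTE 8

static unsigned int bits_to_bytes(unsigned int len_bits)
{
	return (len_bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}

/* Before: values narrower than 16 bits were never converted. */
static bool needs_conversion_old(unsigned int len_bits)
{
	return len_bits >= 16;
}

/* After: round up to whole bytes first; anything spanning two or
 * more bytes may need a byte-order conversion. */
static bool needs_conversion(unsigned int len_bits)
{
	return bits_to_bytes(len_bits) >= 2;
}
```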
*	Revert "evaluate: relax type-checking for integer arguments in mark statements"	Pablo Neira Ayuso	2023-03-14	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \|	This patch reverts eab3eb7f146c ("evaluate: relax type-checking for integer arguments in mark statements") since it might cause ruleset portability issues when moving a ruleset from little to big endian host (and vice-versa). Let's revert this until we agree on what to do in this case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: improve error reporting for unsupported chain type	Pablo Neira Ayuso	2023-03-11	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	8c75d3a16960 ("Reject invalid chain priority values in user space") provides error reporting from the evaluation phase. Instead, this patch infers the error after the kernel reports EOPNOTSUPP. test.nft:3:28-40: Error: Chains of type "nat" must have a priority value above -200 type nat hook prerouting priority -300; ^^^^^^^^^^^^^ This patch also covers another common issue for users who compile their own kernels: forgetting to enable CONFIG_NFT_NAT in their .config file. Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Reject invalid chain priority values in user space	Phil Sutter	2023-03-10	1	-0/+9
\| \| \| \| \| \| \| \|	The kernel doesn't accept nat type chains with a priority of -200 or below. Catch this and provide a better error message than the kernel's EOPNOTSUPP. Signed-off-by: Phil Sutter <phil@nwl.cc>
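The user-space check described here amounts to a single comparison. A minimal sketch; `check_chain_priority()` is a hypothetical name, and the error text is the one quoted in the follow-up commit above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-check mirroring the kernel's restriction on nat chains. */
static int check_chain_priority(const char *type, int priority)
{
	assert(type != NULL);

	/* nat chains need a priority strictly above -200. */
	if (strcmp(type, "nat") == 0 && priority <= -200) {
		fprintf(stderr, "Error: Chains of type \"nat\" must have a priority value above -200\n");
		return -1;
	}
	return 0;
}
```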
*	src: add last statement	Pablo Neira Ayuso	2023-02-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new statement allows you to know how long ago there was a matching packet. # nft list ruleset table ip x { chain y { [...] ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64 } } If this statement never sees a packet, the listing says: ip protocol icmp last used never counter packets 0 bytes 0 Add tests/py in this patch too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
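The "last used" value is an elapsed time; 49m54s884ms corresponds to 2,994,884 ms. A sketch of printing milliseconds in that compact form, omitting zero components. This is not nft's own time printer: `format_last_used()` is a hypothetical name, and both the day unit and the output for a zero duration are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append "<value><unit>" to buf, tracking the write offset safely. */
static void append_unit(char *buf, size_t size, size_t *off,
			uint64_t value, const char *unit)
{
	int n;

	if (*off >= size)
		return; /* buffer already full (output truncated) */
	n = snprintf(buf + *off, size - *off, "%llu%s",
		     (unsigned long long)value, unit);
	if (n > 0)
		*off += (size_t)n;
}

/* Hypothetical formatter: "never" if no packet was seen, else e.g. "49m54s884ms". */
static void format_last_used(bool seen, uint64_t ms, char *buf, size_t size)
{
	uint64_t days = ms / 86400000, hours = ms / 3600000 % 24;
	uint64_t mins = ms / 60000 % 60, secs = ms / 1000 % 60, msecs = ms % 1000;
	size_t off = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	if (!seen) {
		snprintf(buf, size, "never");
		return;
	}
	if (ms == 0) {
		snprintf(buf, size, "0ms"); /* assumption: zero duration format */
		return;
	}
	if (days)
		append_unit(buf, size, &off, days, "d");
	if (hours)
		append_unit(buf, size, &off, hours, "h");
	if (mins)
		append_unit(buf, size, &off, mins, "m");
	if (secs)
		append_unit(buf, size, &off, secs, "s");
	if (msecs)
		append_unit(buf, size, &off, msecs, "ms");
}
```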
*	evaluate: expand value to range when nat mapping contains intervals	Pablo Neira Ayuso	2023-02-28	1	-2/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the data in the mapping contains a range, then upgrade value to range. Otherwise, the following error is displayed: /dev/stdin:11:57-75: Error: Could not process rule: Invalid argument dnat ip to iifname . ip saddr map { enp2s0 . 10.1.1.136 : 1.1.2.69, enp2s0 . 10.1.1.1-10.1.1.135 : 1.1.2.66-1.84.236.78 } ^^^^^^^^^^^^^^^^^^^ The kernel rejects this command because userspace sends a single value while the kernel expects the range that represents the min and the max IP address to be used for NAT. The upgrade is also done when concatenation with intervals is used in the rhs of the mapping. For anonymous sets, expansion cannot be done from expr_evaluate_mapping() because the EXPR_F_INTERVAL flag is inferred from the elements. For explicit sets, this can be done from expr_evaluate_mapping() because the user already specifies the interval flag in the rhs of the map definition. Update tests/shell and tests/py to improve testing coverage in this case. Fixes: 9599d9d25a6b ("src: NAT support for intervals in maps") Fixes: 66746e7dedeb ("src: support for nat with interval concatenation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
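For an anonymous map, the interval property is inferred from the elements, so the upgrade can be modelled as a two-pass walk: detect whether any data element is a range, then turn every single value v into the range v-v so the kernel always receives a min and a max. `struct nat_data` and `upgrade_values_to_ranges()` are hypothetical; the addresses are the ones from the error message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NAT map data: either a single address or a range. */
struct nat_data {
	bool is_range;
	uint32_t min; /* the single value when !is_range */
	uint32_t max; /* only meaningful when is_range */
};

/* If any element is a range, expand every single value v to v-v. */
static void upgrade_values_to_ranges(struct nat_data *data, size_t n)
{
	bool has_interval = false;

	for (size_t i = 0; i < n; i++)
		if (data[i].is_range)
			has_interval = true;
	if (!has_interval)
		return; /* plain value map: leave it alone */
	for (size_t i = 0; i < n; i++) {
		if (!data[i].is_range) {
			data[i].max = data[i].min;
			data[i].is_range = true;
		}
	}
}
```

With `1.1.2.69` (0x01010245) next to `1.1.2.66-1.84.236.78`, the single address becomes `1.1.2.69-1.1.2.69`; a map holding only single values is left untouched.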