nftables - nft command line tool

	Commit message	Author	Age	Files	Lines
*	evaluate: bail out if new flowtable does not specify hook and priority	Pablo Neira Ayuso	2023-04-24	1	-1/+5
	If the user forgets to specify the hook and priority and the flowtable does not exist, then bail out:

	    # cat flowtable-incomplete.nft
	    table t {
	        flowtable f {
	            devices = { lo }
	        }
	    }
	    # nft -f /tmp/k
	    flowtable-incomplete.nft:2:12-12: Error: missing hook and priority in flowtable declaration
	    flowtable f {
	              ^

	Update one existing tests/shell to specify a hook and priority.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow for updating devices on existing netdev chain	Pablo Neira Ayuso	2023-04-24	1	-5/+7
	This patch allows you to add devices to, or remove them from, an existing chain:

	    # cat ruleset.nft
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth0 } priority 0; policy accept;
	        }
	    }
	    # nft -f ruleset.nft
	    # nft add chain netdev x y '{ devices = { eth1 }; }'
	    # nft list ruleset
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth0, eth1 } priority 0; policy accept;
	        }
	    }
	    # nft delete chain netdev x y '{ devices = { eth0 }; }'
	    # nft list ruleset
	    table netdev x {
	        chain y {
	            type filter hook ingress devices = { eth1 } priority 0; policy accept;
	        }
	    }

	This feature allows for creating an empty netdev chain, with no devices. In such a case, no packets are seen until a device is registered.

	This patch includes extended netlink error reporting:

	    # nft add chain netdev x y '{ devices = { x } ; }'
	    Error: Could not process rule: No such file or directory
	    add chain netdev x y { devices = { x } ; }
	                                       ^

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: bogus missing transport protocol	Pablo Neira Ayuso	2023-04-05	1	-3/+8
	Users have to specify a transport protocol match such as

	    meta l4proto tcp

	before the redirect statement, even if the redirect statement already implicitly refers to the transport protocol, for instance:

	    test.nft:3:16-53: Error: transport protocol mapping is only valid after transport protocol match
	    redirect to :tcp dport map { 83 : 8083, 84 : 8084 }
	    ~~~~~~~~    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Evaluate the redirect expression before the mandatory check for the transport protocol match, so protocol context already provides a transport protocol.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: honor statement length in bitwise evaluation	Pablo Neira Ayuso	2023-03-28	1	-4/+23
	Get the length from the statement, instead of inferring it from the expression that is used to set the value. In the particular case of {ct|meta} mark, this is 32 bits.

	Otherwise, bytecode generation is not correct:

	    # nft -c --debug=netlink 'add rule ip6 x y ct mark set ip6 dscp << 2 | 0x10'
	    [ payload load 2b @ network header + 0 => reg 1 ]
	    [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
	    [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
	    [ byteorder reg 1 = ntoh(reg 1, 2, 1) ]
	    [ bitwise reg 1 = ( reg 1 << 0x00000002 ) ]
	    [ bitwise reg 1 = ( reg 1 & 0x00000fef ) ^ 0x00000010 ] <--- incorrect!
	    [ ct set mark with reg 1 ]

	The previous bitwise shift already upgraded to 32 bits (not visible from the netlink debug output above).

	After this patch, the last | 0x10 uses 32 bits:

	    [ bitwise reg 1 = ( reg 1 & 0xffffffef ) ^ 0x00000010 ]

	Note that mask 0xffffffef is used instead of 0x00000fef.

	Patch ("evaluate: support shifts larger than the width of the left operand") provides the statement length through eval context. Use it to evaluate the bitwise expression accordingly, otherwise bytecode is incorrect:

	    # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
	    ip x y
	    [ payload load 1b @ network header + 1 => reg 1 ]
	    [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
	    [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
	    [ bitwise reg 1 = ( reg 1 & 0x1e000000 ) ^ 0x000000ff ] <-- incorrect byteorder for OR
	    [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <-- not needed for single ip dscp byte
	    [ ct set mark with reg 1 ]

	Correct bytecode:

	    # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
	    ip x y
	    [ payload load 1b @ network header + 1 => reg 1 ]
	    [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
	    [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
	    [ bitwise reg 1 = ( reg 1 & 0x0000001e ) ^ 0xff000000 ]
	    [ ct set mark with reg 1 ]

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
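The mask difference described in this commit message can be reproduced with a small sketch. It assumes the common lowering of `reg | constant` into a single `( reg & mask ) ^ xor` operation, where the mask is built from the operand width; `lower_or` is a hypothetical helper, not a function from nftables, and the 12-bit width is an assumption picked only because it reproduces the 0x00000fef mask shown above.

```python
def lower_or(constant: int, length_bits: int) -> tuple:
    """Return (mask, xor) so that (reg & mask) ^ xor equals reg | constant."""
    all_ones = (1 << length_bits) - 1   # every bit set for the given width
    mask = all_ones & ~constant         # keep all bits except those being set
    return mask, constant


# 12 bits is an assumed "inferred" width that reproduces the incorrect mask;
# 32 bits is the ct/meta mark statement length named in the commit message.
for width in (12, 32):
    mask, xor = lower_or(0x10, width)
    print(f"[ bitwise reg 1 = ( reg 1 & {mask:#010x} ) ^ {xor:#010x} ]")
```

With the narrower width, any bits of the register above bit 11 would be cleared by the mask, which is why the statement length has to drive the computation.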
*	evaluate: honor statement length in integer evaluation	Pablo Neira Ayuso	2023-03-28	1	-2/+8
	Otherwise, a bogus error is reported:

	    # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
	    Error: Value 4278190080 exceeds valid range 0-63
	    add rule ip x y ct mark set ip dscp & 0x0f << 1 | 0xff000000
	                                                      ^^^^^^^^^^

	Use the statement length as the maximum value in the mark statement expression.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
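A minimal sketch of the range check behind this error, assuming the maximum is simply the largest unsigned value that fits in a given bit length. `check_range` is a hypothetical name; the 6-bit width corresponds to the 0-63 range in the error above, and 32 bits is the mark statement length.

```python
def check_range(value: int, length_bits: int) -> None:
    """Raise if value does not fit in an unsigned field of length_bits bits."""
    maximum = (1 << length_bits) - 1
    if not 0 <= value <= maximum:
        raise ValueError(f"Value {value} exceeds valid range 0-{maximum}")


DSCP_BITS = 6    # width that yields the bogus 0-63 range
MARK_BITS = 32   # statement length of ct/meta mark

check_range(0xff000000, MARK_BITS)   # fits once the statement length is used
try:
    check_range(0xff000000, DSCP_BITS)
except ValueError as err:
    print(f"Error: {err}")
```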
*	evaluate: set up integer type to shift expression	Pablo Neira Ayuso	2023-03-28	1	-0/+1
	Otherwise expr_evaluate_value() fails with invalid datatype:

	    # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1'
	    BUG: invalid basetype invalid
	    nft: evaluate.c:440: expr_evaluate_value: Assertion `0' failed.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: relax type-checking for integer arguments in mark statements	Pablo Neira Ayuso	2023-03-28	1	-2/+11
	In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base-type of the argument is integer and that the argument is small enough to fit.

	Moreover, swap the expression byteorder beforehand to make it compatible with the statement byteorder, to ensure rulesets are portable.

	    # nft --debug=netlink add rule ip t c 'meta mark set ip saddr'
	    ip t c
	    [ payload load 4b @ network header + 12 => reg 1 ]
	    [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <----------- byteorder swap
	    [ meta set mark with reg 1 ]

	Based on original work from Jeremy Sowden.

	The following patches are required for this to work:

	    evaluate: get length from statement instead of lhs expression
	    evaluate: don't eval unary arguments
	    evaluate: support shifts larger than the width of the left operand
	    netlink_delinearize: correct type and byte-order of shifts
	    evaluate: insert byte-order conversions for expressions between 9 and 15 bits

	Add one testcase for tests/py.

	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: don't eval unary arguments	Jeremy Sowden	2023-03-28	1	-4/+2
	When a unary expression is inserted to implement a byte-order conversion, the expression being converted has already been evaluated and so `expr_evaluate_unary` doesn't need to do so.

	This is required by {ct|meta} statements with bitwise operations, which might result in byteorder conversion of the expression.

	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: support shifts larger than the width of the left operand	Pablo Neira Ayuso	2023-03-28	1	-18/+44
	If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus:

	    add rule t c meta mark set ip dscp << 2

	works, but:

	    add rule t c meta mark set ip dscp << 8

	does not, even though the lvalue is large enough to accommodate the result.

	Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than the expression lvalue.

	Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand.

	Based on patch from Jeremy Sowden.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
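The rule described here can be sketched as a single comparison. This is an illustration only: `shift_allowed` is a hypothetical helper, the 6-bit dscp width and the strict "less than" bound are assumptions, and the real check lives in the evaluation code.

```python
def shift_allowed(shift: int, operand_bits: int, statement_bits: int = 0) -> bool:
    """A shift is accepted if it is smaller than the widest applicable length."""
    limit = max(operand_bits, statement_bits)  # statement length may raise the limit
    return shift < limit


DSCP_BITS = 6    # assumed width of the left-hand operand (ip dscp)
MARK_BITS = 32   # width of the meta mark lvalue

print(shift_allowed(2, DSCP_BITS))             # accepted before and after the patch
print(shift_allowed(8, DSCP_BITS))             # rejected when only the operand counts
print(shift_allowed(8, DSCP_BITS, MARK_BITS))  # accepted once the statement length is used
```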
*	evaluate: insert byte-order conversions for expressions between 9 and 15 bits	Jeremy Sowden	2023-03-22	1	-1/+1
	Round up expression lengths when determining whether to insert a byte-order conversion. For example, if one is masking a network header field which spans a byte boundary, the mask will span two bytes and so it will need to be in NBO.

	Fixes: bb03cbcd18a1 ("evaluate: no need to swap byte-order for values of fewer than 16 bits.")
	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
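The effect of rounding up can be shown with a short sketch. Both helpers are hypothetical and only model the decision "does this value occupy more than one byte"; the truncating variant is an assumed reading of the pre-fix behaviour, not a copy of the nftables code.

```python
BITS_PER_BYTE = 8


def needs_swap_truncating(length_bits: int) -> bool:
    """Assumed pre-fix behaviour: partial bytes are dropped from the length."""
    return length_bits // BITS_PER_BYTE > 1


def needs_swap_rounded(length_bits: int) -> bool:
    """Behaviour described by the commit: partial bytes count as a full byte."""
    return -(-length_bits // BITS_PER_BYTE) > 1   # ceiling division


for bits in (8, 9, 12, 15, 16):
    print(bits, needs_swap_truncating(bits), needs_swap_rounded(bits))
```

Lengths from 9 to 15 bits are exactly the cases where the two variants disagree.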
*	Revert "evaluate: relax type-checking for integer arguments in mark statements"	Pablo Neira Ayuso	2023-03-14	1	-6/+2
	This patch reverts eab3eb7f146c ("evaluate: relax type-checking for integer arguments in mark statements") since it might cause ruleset portability issues when moving a ruleset from a little-endian to a big-endian host (and vice versa).

	Let's revert this until we agree on what to do in this case.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: improve error reporting for unsupported chain type	Pablo Neira Ayuso	2023-03-11	1	-9/+0
	8c75d3a16960 ("Reject invalid chain priority values in user space") provides error reporting from the evaluation phase. Instead, this patch infers the error after the kernel reports EOPNOTSUPP.

	    test.nft:3:28-40: Error: Chains of type "nat" must have a priority value above -200
	    type nat hook prerouting priority -300;
	                             ^^^^^^^^^^^^^

	This patch also reports another common issue, for users compiling their own kernels who forget to enable CONFIG_NFT_NAT in their .config file.

	Acked-by: Phil Sutter <phil@nwl.cc>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Reject invalid chain priority values in user space	Phil Sutter	2023-03-10	1	-0/+9
	The kernel doesn't accept nat type chains with a priority of -200 or below. Catch this and provide a better error message than the kernel's EOPNOTSUPP.

	Signed-off-by: Phil Sutter <phil@nwl.cc>
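A minimal sketch of such a user-space check, assuming the only rule is the one stated above (nat chains need a priority above -200). `check_chain_priority` is a hypothetical helper, and the message text is borrowed from the error quoted elsewhere in this log.

```python
from typing import Optional

NAT_PRIORITY_FLOOR = -200   # nat chains at or below this value are rejected


def check_chain_priority(chain_type: str, priority: int) -> Optional[str]:
    """Return an error message for an invalid priority, or None if acceptable."""
    if chain_type == "nat" and priority <= NAT_PRIORITY_FLOOR:
        return 'Chains of type "nat" must have a priority value above -200'
    return None


print(check_chain_priority("nat", -300))      # rejected
print(check_chain_priority("nat", -100))      # accepted
print(check_chain_priority("filter", -300))   # rule only applies to nat chains
```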
*	src: add last statement	Pablo Neira Ayuso	2023-02-28	1	-0/+1
	This new statement allows you to know how long ago there was a matching packet.

	    # nft list ruleset
	    table ip x {
	        chain y {
	            [...]
	            ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64
	        }
	    }

	If this statement never sees a packet, then the listing says:

	    ip protocol icmp last used never counter packets 0 bytes 0

	Add tests/py in this patch too.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: expand value to range when nat mapping contains intervals	Pablo Neira Ayuso	2023-02-28	1	-2/+45
	If the data in the mapping contains a range, then upgrade value to range. Otherwise, the following error is displayed:

	    /dev/stdin:11:57-75: Error: Could not process rule: Invalid argument
	    dnat ip to iifname . ip saddr map { enp2s0 . 10.1.1.136 : 1.1.2.69, enp2s0 . 10.1.1.1-10.1.1.135 : 1.1.2.66-1.84.236.78 }
	                                        ^^^^^^^^^^^^^^^^^^^

	The kernel rejects this command because userspace sends a single value while the kernel expects the range that represents the min and the max IP address to be used for NAT.

	The upgrade is also done when concatenation with intervals is used in the rhs of the mapping.

	For anonymous sets, expansion cannot be done from expr_evaluate_mapping() because the EXPR_F_INTERVAL flag is inferred from the elements. For explicit sets, this can be done from expr_evaluate_mapping() because the user already specifies the interval flag in the rhs of the map definition.

	Update tests/shell and tests/py to improve testing coverage in this case.

	Fixes: 9599d9d25a6b ("src: NAT support for intervals in maps")
	Fixes: 66746e7dedeb ("src: support for nat with interval concatenation")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
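The upgrade described above can be pictured with plain strings. This sketch assumes a single value `a` is turned into the degenerate range `a-a` whenever any data item in the mapping is a range; `upgrade_to_ranges` is a hypothetical helper and the string representation is for illustration only, since nft works on expression objects.

```python
def upgrade_to_ranges(data: list) -> list:
    """If any mapping data item is a range, express every item as a range."""
    if not any("-" in item for item in data):
        return list(data)             # no intervals involved, nothing to do
    return [item if "-" in item else f"{item}-{item}" for item in data]


# Data side (rhs) of the map from the error report above.
print(upgrade_to_ranges(["1.1.2.69", "1.1.2.66-1.84.236.78"]))
```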
*	src: expand table command before evaluation	Pablo Neira Ayuso	2023-02-24	1	-39/+0
	The nested syntax notation results in one single table command which includes all other objects. This differs from the flat notation where there is usually one command per object.

	This patch adds a previous step to the evaluation phase to expand the objects that are contained in the table into independent commands, so both notations have similar representations.

	Remove the code to evaluate the nested representation in the evaluation phase since commands are independently evaluated after the expansion.

	The commands are expanded after the set element collapse step, in case that there is a long list of singleton element commands to be added to the set, to shorten the command list iteration.

	This approach also avoids interference with the object cache that is populated in the evaluation, which might refer to objects coming in the existing command list that is being processed.

	There is still a post_expand phase to detach the elements from the set which could be consolidated by updating the evaluation step to handle the CMD_OBJ_SETELEMS command type.

	This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that contains rules") which broke rule addition/insertion by index because the expansion code after the evaluation messes up the cache.

	Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
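As a rough picture of the expansion step, the sketch below flattens one nested table description into independent commands. The dictionary layout, the tuple format, and the ordering (table, then sets, then chains and their rules) are all assumptions made for illustration; none of them mirror nft's internal command structures.

```python
def expand_table(table: dict) -> list:
    """Flatten a nested table description into one command per object."""
    family, name = table["family"], table["name"]
    commands = [("add", "table", family, name)]
    for set_name in table.get("sets", []):
        commands.append(("add", "set", family, name, set_name))
    for chain_name, rules in table.get("chains", {}).items():
        commands.append(("add", "chain", family, name, chain_name))
        for rule in rules:
            commands.append(("add", "rule", family, name, chain_name, rule))
    return commands


nested = {"family": "ip", "name": "x", "sets": ["s"],
          "chains": {"y": ["counter accept"]}}
for command in expand_table(nested):
    print(command)
```

After such a step, nested and flat input would both reach the evaluation as a plain list of per-object commands.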
*	evaluate: infer family from mapping	Pablo Neira Ayuso	2023-02-21	1	-5/+40
	If the key in the nat mapping is either ip or ip6, then set the nat family accordingly; there is no need for an explicit family in the nat statement.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: print error on missing family in nat statement	Pablo Neira Ayuso	2023-02-21	1	-3/+29
	Print an error message in case the family cannot be inferred. Before this patch, $? shows 1 after nft execution but no error message was printed.

	While at it, update error reporting for consistency in similar use cases.

	Fixes: e5c9c8fe0bcc ("evaluate: stmt_evaluate_nat_map() only if stmt->nat.ipportmap == true")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: relax type-checking for integer arguments in mark statements	Jeremy Sowden	2023-02-07	1	-2/+6
	In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base-type of the argument is integer and that the argument is small enough to fit.

	Add one testcase for tests/py.

	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
*	src: add support to command "destroy"	Fernando F. Mancera	2023-02-06	1	-0/+3
	The "destroy" command performs a deletion like the "delete" command but does not fail if the object does not exist. As there is no NLM_F_* flag for ignoring such an error, it needs to be ignored directly in error handling.

	Example of use:

	    # nft list ruleset
	    table ip filter {
	        chain output {
	        }
	    }
	    # nft destroy table ip missingtable
	    # echo $?
	    0
	    # nft list ruleset
	    table ip filter {
	        chain output {
	        }
	    }

	Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
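The error-handling idea can be sketched as follows, assuming a missing object is reported as ENOENT and that "destroy" simply treats that one error as success. `command_failed` is a hypothetical helper, not part of nft.

```python
import errno


def command_failed(command: str, err: int) -> bool:
    """Decide whether a reported error code should fail the given command."""
    if command == "destroy" and err == errno.ENOENT:
        return False      # object was already absent: not a failure
    return err != 0


print(command_failed("delete", errno.ENOENT))    # delete still fails
print(command_failed("destroy", errno.ENOENT))   # destroy ignores the missing object
print(command_failed("destroy", errno.EPERM))    # other errors are still reported
```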
*	evaluate: set eval ctx for add/update statements with integer constants	Florian Westphal	2023-01-26	1	-2/+30
	Eric reports that nft asserts when using integer basetype constants with 'typeof' sets. Example:

	    table netdev t {
	        set s {
	            typeof ether saddr . vlan id
	            flags dynamic,timeout
	        }

	        chain c { }
	    }

	loads fine. But adding a rule with an add/update statement fails:

	    nft 'add rule netdev t c set update ether saddr . 0 @s'
	    nft: netlink_linearize.c:867: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.

	When the 'ether saddr . 0' concat expression is processed, there is no set definition available anymore to deduce the required size of the integer constant.

	The nft eval step then derives the required length using the data types. '0' has integer basetype, so the deduced length is 0.

	The assertion triggers because the serialization step finds that it needs one more register.

	2 are needed to store the ethernet address, another register is needed for the vlan id.

	Update the eval step to make the expression context store the set key information when processing the preceding set reference, then let stmt_evaluate_set() preserve the existing context instead of zeroing it again via stmt_evaluate_arg().

	This makes concat expression evaluation compute the total size needed based on the set's key definition.

	Reported-by: Eric Garver <eric@garver.life>
	Signed-off-by: Florian Westphal <fw@strlen.de>
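The register arithmetic in this message can be checked with a short sketch. It assumes each concatenation component is rounded up to whole bytes and then to whole 32-bit registers; `registers_needed` is a hypothetical helper, and the 48-bit and 12-bit widths are the usual sizes of an Ethernet address and a VLAN id.

```python
REGISTER_BYTES = 4   # assumed size of one 32-bit register


def registers_needed(component_bits: list) -> int:
    """Count registers for a concatenation, one component after another."""
    total = 0
    for bits in component_bits:
        nbytes = -(-bits // 8)                   # round bits up to bytes
        total += -(-nbytes // REGISTER_BYTES)    # round bytes up to registers
    return total


ETHER_ADDR_BITS = 48
VLAN_ID_BITS = 12

print(registers_needed([ETHER_ADDR_BITS, VLAN_ID_BITS]))  # from the set key definition
print(registers_needed([ETHER_ADDR_BITS, 0]))             # constant with deduced length 0
```

The second call comes up one register short, which is the mismatch the assertion catches.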
*	Implement 'reset rule' and 'reset rules' commands	Phil Sutter	2023-01-18	1	-0/+2
	Reset rule counters and quotas in kernel, i.e. without having to reload them. Requires the respective kernel patch to support the NFT_MSG_GETRULE_RESET message type.

	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: add gre support	Pablo Neira Ayuso	2023-01-02	1	-12/+31
	GRE has a number of fields that are conditional based on flags, which requires custom dependency code similar to icmp and icmpv6. Matching on optional fields is not supported at this stage.

	Since this is a layer 3 tunnel protocol, an implicit dependency on NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this patch adds new infrastructure to remove an outer dependency based on the inner protocol from the delinearize path.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: display (inner) tag in --debug=proto-ctx	Pablo Neira Ayuso	2023-01-02	1	-2/+2
	For easier debugging, add decoration on protocol context:

	    # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter
	    update link layer protocol context (inner):
	     link layer : netdev <-
	     network layer : none
	     transport layer : none
	     payload data : none

	    update network layer protocol context (inner):
	     link layer : netdev
	     network layer : ip <-
	     transport layer : none
	     payload data : none

	    update network layer protocol context (inner):
	     link layer : netdev
	     network layer : ip <-
	     transport layer : none
	     payload data : none

	    update transport layer protocol context (inner):
	     link layer : netdev
	     network layer : ip
	     transport layer : icmp <-
	     payload data : none

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add vxlan matching support	Pablo Neira Ayuso	2023-01-02	1	-9/+74
	This patch adds the initial infrastructure to support inner header tunnel matching and its first user: vxlan.

	A new struct proto_desc field for payload and meta expression to specify that the expression refers to inner header matching is used.

	The existing codebase to generate bytecode is fully reused, allowing for reusing existing supported layer 2, 3 and 4 protocols.

	The syntax requires vxlan to be specified before the inner protocol field:

	    ... vxlan ip protocol udp
	    ... vxlan ip saddr 1.2.3.0/24

	This also works with concatenations and anonymous sets, e.g.

	    ... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 }

	You have to restrict vxlan matching to udp traffic, otherwise it complains about a missing transport protocol dependency, e.g.

	    ... udp dport 4789 vxlan ip daddr 1.2.3.4

	The bytecode that is generated uses the new inner expression:

	    # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
	    netdev x y
	    [ meta load l4proto => reg 1 ]
	    [ cmp eq reg 1 0x00000011 ]
	    [ payload load 2b @ transport header + 2 => reg 1 ]
	    [ cmp eq reg 1 0x0000b512 ]
	    [ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
	    [ cmp eq reg 1 0x00000008 ]
	    [ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
	    [ cmp eq reg 1 0x04030201 ]

	JSON support is not included in this patch.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
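The comparison constants in the debug output above can be reproduced with a few lines. The sketch assumes the output was captured on a little-endian host, so network-order bytes appear byte-swapped when printed as a register value; `register_view` is a hypothetical helper.

```python
import ipaddress


def register_view(network_order: bytes) -> str:
    """Format network-order bytes the way a little-endian host would print them."""
    value = int.from_bytes(network_order, "little")
    return f"0x{value:08x}"


print(register_view((4789).to_bytes(2, "big")))                 # udp dport 4789
print(register_view((0x0800).to_bytes(2, "big")))               # IPv4 ethertype
print(register_view(ipaddress.IPv4Address("1.2.3.4").packed))   # ip saddr 1.2.3.4
```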
*	src: add eval_proto_ctx()	Pablo Neira Ayuso	2023-01-02	1	-73/+115
	Add eval_proto_ctx() to access protocol context (struct proto_ctx). Rename struct proto_ctx field to _pctx to highlight that this field is internal and the helper function should be used.

	This patch comes in preparation for supporting outer and inner protocol context.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix shift exponent underflow in concatenation evaluation	Pablo Neira Ayuso	2022-12-22	1	-1/+1
	There is an underflow of the index that iterates over the concatenation:

	    ../include/datatype.h:292:15: runtime error: shift exponent 4294967290 is too large for 32-bit type 'unsigned int'

	Set the datatype to invalid, which is fine to evaluate a concatenation in a set/map statement.

	Update b8e1940aa190 ("tests: add a test case for map update from packet path with concat") so it does not need a workaround to work.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_linearize: fix timeout with map updates	Florian Westphal	2022-12-12	1	-0/+3
	Map updates can use timeouts, just like with sets, but the linearization step did not pass this info to the kernel.

	    meta l4proto tcp update @pinned { ip saddr . ct original proto-src timeout 90s : ip daddr . tcp dport }

	Listing this won't show the "timeout 90s" because the kernel never saw it to begin with.

	Also update the evaluation step to reject a timeout that was set on the data part: timeouts are only allowed for the key-value pair as a whole.

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: fix compilation warning	Pablo Neira Ayuso	2022-12-12	1	-2/+2
	Set the pointer to the list of expressions to NULL and check that it is set before using it.

	    In function ‘expr_evaluate_concat’,
	        inlined from ‘expr_evaluate’ at evaluate.c:2488:10:
	    evaluate.c:1338:20: warning: ‘expressions’ may be used uninitialized [-Wmaybe-uninitialized]
	     1338 | if (runaway) {
	          | ^
	    evaluate.c: In function ‘expr_evaluate’:
	    evaluate.c:1321:33: note: ‘expressions’ was declared here
	     1321 | const struct list_head *expressions;
	          | ^~~~~~~~~~~

	Reported-by: Florian Westphal <fw@strlen.de>
	Fixes: 508f3a270531 ("netlink: swap byteorder of value component in concatenation of intervals")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: do not crash on runaway number of concatenation components	Pablo Neira Ayuso	2022-12-08	1	-1/+13
	Display an error message in case the user specifies more data components than those defined by the concatenation of selectors.

	    # cat example.nft
	    table ip x {
	        chain y {
	            type filter hook prerouting priority 0; policy drop;
	            ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept
	        }
	    }

	    # nft -f example.nft
	    example.nft:4:3-22: Error: too many concatenation components
	    ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept
	    ~~~~~~~~~~~~~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Without this patch, nft crashes:

	    ==464771==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60d000000418 at pc 0x7fbc17513aa5 bp 0x7ffc73d33c90 sp 0x7ffc73d33c88
	    READ of size 8 at 0x60d000000418 thread T0
	    #0 0x7fbc17513aa4 in expr_evaluate_concat /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1348
	    #1 0x7fbc1752a9da in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2476
	    #2 0x7fbc175175e2 in expr_evaluate_set_elem /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1504
	    #3 0x7fbc1752aa22 in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2482
	    #4 0x7fbc17512cb5 in list_member_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1310
	    #5 0x7fbc17518ca0 in expr_evaluate_set /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1590
	    [...]

	Fixes: 64bb3f43bb96 ("src: allow to use typeof of raw expressions in set declaration")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
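The check that replaces the crash can be sketched as a simple length comparison between the selectors on the left and the components of each element. `check_concat` is a hypothetical helper; only the error wording is taken from the output above.

```python
def check_concat(selectors: list, element: list) -> None:
    """Reject an element with more components than the concatenated selectors."""
    if len(element) > len(selectors):
        raise ValueError("too many concatenation components")


selectors = ["ip saddr", "meta mark"]
check_concat(selectors, ["1.2.3.4", "0x00000100"])   # matches the key layout
try:
    check_concat(selectors, ["1.2.3.4", "0x00000100", "1.2.3.6-1.2.3.8"])
except ValueError as err:
    print(f"Error: {err}")
```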
*	src: support for selectors with different byteorder with interval concatenations	Pablo Neira Ayuso	2022-11-30	1	-1/+22
	Assuming the following interval set with concatenation:

	    set test {
	        typeof ip saddr . meta mark
	        flags interval
	    }

	then, the following rule:

	    ip saddr . meta mark @test

	requires bytecode that swaps the byteorder for the meta mark selector in case the set contains intervals and concatenations.

	    inet x y
	    [ meta load nfproto => reg 1 ]
	    [ cmp eq reg 1 0x00000002 ]
	    [ payload load 4b @ network header + 12 => reg 1 ]
	    [ meta load mark => reg 9 ]
	    [ byteorder reg 9 = hton(reg 9, 4, 4) ] <----- this is required !
	    [ lookup reg 1 set test dreg 0 ]

	This patch updates byteorder_conversion() to add the unary expression that introduces the byteorder expression.

	Moreover, store the meta mark range component of the element tuple in the set in big endian as it is required for the range comparisons. Undo the byteorder swap in the netlink delinearize path to list the meta mark values accordingly.

	Update tests/py to validate that the byteorder expression is emitted in the bytecode.

	Update tests/shell to validate insertion and listing of a named map declaration.

	A similar commit 806ab081dc9a ("netlink: swap byteorder for host-endian concat data") already exists in the tree to handle this for strings with prefix (e.g. eth*).

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
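Why the range component has to be stored in big endian can be illustrated with a byte-wise comparison. The sketch assumes that range bounds are compared byte by byte, in the manner of `memcmp`; `in_range_bytes` is a hypothetical helper and the mark values are arbitrary examples.

```python
def in_range_bytes(value: int, low: int, high: int, byteorder: str) -> bool:
    """Compare 32-bit values as raw byte strings in the given byte order."""
    v, lo, hi = (x.to_bytes(4, byteorder) for x in (value, low, high))
    return lo <= v <= hi     # bytes compare lexicographically


mark, low, high = 0x100, 0xff, 0x1ff
print(low <= mark <= high)                        # numeric truth
print(in_range_bytes(mark, low, high, "big"))     # agrees with the numeric result
print(in_range_bytes(mark, low, high, "little"))  # disagrees: byte order breaks ordering
```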
*	evaluate: datatype memleak after binop transfer	Pablo Neira Ayuso	2022-10-06	1	-1/+0
	The following ruleset:

	    ip version vmap { 4 : jump t3, 6 : jump t4 }

	results in a memleak.

	expr_evaluate_shift() overrides the datatype which results in a datatype memleak after the binop transfer that triggers a left-shift of the constant (in the map).

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: bogus datatype assertion in binary operation evaluation	Pablo Neira Ayuso	2022-10-06	1	-1/+1
	Use datatype_equal(), otherwise a dynamically allocated datatype fails to fulfill the datatype pointer check, triggering the assertion:

	    nft: evaluate.c:1249: expr_evaluate_binop: Assertion `expr_basetype(left) == expr_basetype(right)' failed.

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1636
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: add ethernet header size offset for implicit vlan dependency	Florian Westphal	2022-09-29	1	-1/+19
	'vlan id 1' must also add an ethernet header dep, else nft fetches the payload from header offset 0 instead of 14.

	Reported-by: Yi Chen <yiche@redhat.com>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: allow implicit ether -> vlan dep	Florian Westphal	2022-09-28	1	-0/+1
	    nft add rule inet filter input vlan id 2
	    Error: conflicting protocols specified: ether vs. vlan

	Refresh the current dependency after superseding the dummy dependency to make this work.

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	doc, src: make some spelling and grammatical improvements	Jeremy Sowden	2022-09-22	1	-2/+2
	Fix a couple of spelling mistakes:

	    'expresion' -> 'expression'

	and correct some non-native usages:

	    'allows to' -> 'allows one to'

	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: un-break rule insert with intervals	Florian Westphal	2022-09-20	1	-0/+1
	'rule inet dscpclassify dscp_match meta l4proto { udp } th dport { 3478 } th sport { 3478-3497, 16384-16387 } goto ct_set_ef' works with 'nft add', but not 'nft insert'; the latter yields: "BUG: unhandled op 4".

	Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
	Signed-off-by: Florian Westphal <fw@strlen.de>
	Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Don't parse string as verdict in map	Xiao Liang	2022-08-19	1	-1/+2
	In a verdict map, string values are accidentally treated as verdicts.

	For example:

	    table t {
	        map foo {
	            type ipv4_addr : verdict
	            elements = {
	                192.168.0.1 : bar
	            }
	        }

	        chain output {
	            type filter hook output priority mangle;
	            ip daddr vmap @foo
	        }
	    }

	Though "bar" is not a valid verdict (should be "jump bar" or something), the string is taken as the element value. Then NFTA_DATA_VALUE is sent to the kernel instead of NFTA_DATA_VERDICT. This would be rejected by recent kernels. On older ones (e.g. v5.4.x) that don't validate the type, a warning can be seen when the rule is hit, because of the corrupted verdict value:

	    [5120263.467627] WARNING: CPU: 12 PID: 303303 at net/netfilter/nf_tables_core.c:229 nft_do_chain+0x394/0x500 [nf_tables]

	Indeed, we don't parse verdicts during evaluation, but only chain names, which are of type string rather than verdict. For example, "jump $var" is a verdict while "$var" is a string.

	Fixes: c64457cff967 ("src: Allow goto and jump to a variable")
	Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: search stacked header list for matching payload dep	Florian Westphal	2022-08-05	1	-6/+15
	"ether saddr 0:1:2:3:4:6 vlan id 2" works, but the reverse fails:

	"vlan id 2 ether saddr 0:1:2:3:4:6" will give

	    Error: conflicting protocols specified: vlan vs. ether

	After "proto: track full stack of seen l2 protocols, not just cumulative offset", we have a list of all l2 headers, so search those to see if we had this proto base in the past before rejecting this.

	Reported-by: Eric Garver <eric@garver.life>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	proto: track full stack of seen l2 protocols, not just cumulative offset	Florian Westphal	2022-08-05	1	-2/+13
	For input, a cumulative size counter of all pushed l2 headers is enough, because we have the full expression tree available to us.

	For delinearization we need to track all seen l2 headers, else we lose information that we might need at a later time.

	Consider:

	    rule netdev nt nc set update ether saddr . vlan id

	During delinearization, the vlan proto_desc replaces the ethernet one, and by the time we try to split the concatenation apart we will search the ether saddr offset vs. the templates for proto_vlan.

	This replaces the offset with an array that stores the protocol descriptions seen.

	Then, if the payload offset is larger than our description, search the l2 stack and adjust the offset until we're within the expected offset boundary.

	Reported-by: Eric Garver <eric@garver.life>
	Signed-off-by: Florian Westphal <fw@strlen.de>
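The idea of walking a stack of link-layer headers to find which one a payload offset belongs to can be sketched as below. This is a simplified illustration, not the nftables algorithm: `locate` is a hypothetical helper, and the sizes (14 bytes for Ethernet, 4 bytes for a VLAN tag) are the assumed header lengths.

```python
L2_STACK = [("ether", 14), ("vlan", 4)]   # headers in the order they were seen


def locate(offset: int, stack: list) -> tuple:
    """Map an absolute link-layer offset to (header name, offset within header)."""
    for name, size in stack:
        if offset < size:
            return name, offset
        offset -= size        # skip this header and retry against the next one
    raise ValueError("offset lies beyond the tracked l2 headers")


print(locate(6, L2_STACK))    # ether saddr starts 6 bytes into the ethernet header
print(locate(14, L2_STACK))   # first bytes after ethernet belong to the vlan header
```

Keeping only a cumulative offset (18 here) would lose the boundary between the two headers, which is the information the array of protocol descriptions preserves.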
*	evaluate: report missing interval flag when using prefix/range in concatenation	Pablo Neira Ayuso	2022-07-07	1	-5/+20
	If the set declaration is missing the interval flag and the user specifies an element with either a prefix or a range, then bail out.

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1592
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix segfault when adding elements to invalid set	Peter Tirsek	2022-06-27	1	-0/+3
	Adding elements to a set or map with an invalid definition causes nft to segfault. The following nftables.conf triggers the crash:

	flush ruleset
	create table inet filter
	set inet filter foo {}
	add element inet filter foo { foobar }

	Simply parsing and checking the config will trigger it:

	$ nft -c -f nftables.conf.crash
	Segmentation fault

	The error in the set/map definition is correctly caught and queued, but because the set is invalid and does not contain a key type, adding to it causes a NULL pointer dereference of set->key within setelem_evaluate().

	I don't think it's necessary to queue another error since the underlying problem is correctly detected and reported when parsing the definition of the set. Simply checking the validity of set->key before using it seems to fix it, causing the error in the definition of the set to be reported properly. The element type error isn't caught, but that seems reasonable since the key type is invalid or unknown anyway:

	$ ./nft -c -f ~/nftables.conf.crash
	/home/pti/nftables.conf.crash:3:21-21: Error: set definition does not specify key
	set inet filter foo {}
	                    ^

	[ Add tests to cover this case --pablo ]

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1597
	Signed-off-by: Peter Tirsek <peter@tirsek.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: reset ctx->set after set interval evaluation	Pablo Neira Ayuso	2022-06-01	1	-4/+6
	Otherwise bogus error reports on set datatype mismatch might occur, such as:

	Error: datatype mismatch, expected Internet protocol, expression has type IPv4 address
	meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
	~~~~~~~~~~~~ ^^^^^^^^^^^^

	with an unrelated set declaration.

	table ip test {
	    set set_with_interval {
	        type ipv4_addr
	        flags interval
	    }

	    chain prerouting {
	        type nat hook prerouting priority dstnat; policy accept;
	        meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
	    }
	}

	This bug has been introduced in the evaluation step.

	Reported-by: Roman Petrov <nwhisper@gmail.com>
	Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: fix always-true assertions	Florian Westphal	2022-04-26	1	-1/+1
	assert(1) is a no-op, this should be assert(0). Use BUG() instead.

	Add missing CATCHALL to avoid BUG().

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: allow use of base integer types as set keys in concatenations	Florian Westphal	2022-04-18	1	-7/+17
	"typeof ip saddr . ipsec in reqid" won't work because reqid uses integer type, i.e. dtype->size is 0.

	With "typeof", the size can be derived from the expression length, via set->key.

	This computes the concat length based either on dtype->size or expression length. It also updates concat evaluation to permit a zero datatype size if the subkey expression has nonzero length (i.e., typeof was used).

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	intervals: support to partial deletion with automerge	Pablo Neira Ayuso	2022-04-13	1	-1/+2
	Splice the existing set element cache with the elements to be deleted and merge sort it. The elements to be deleted are identified by the EXPR_F_REMOVE flag.

	The set elements to be deleted are automerged first if the automerge flag is set.

	There are four possible deletion scenarios:

	- Exact match, e.g. delete [a-b] and there is a [a-b] range in the kernel set.
	- Adjust left side of range, e.g. delete [a-b] from range [a-x] where x > b.
	- Adjust right side of range, e.g. delete [a-b] from range [x-b] where x < a.
	- Split range, e.g. delete [a-b] from range [x-y] where x < a and b < y.

	Update nft_evaluate() to use the safe list variant since new commands are dynamically registered to the list to update ranges.

	This patch also restores the set element existence check for Linux kernels <= 5.7.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
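The four deletion scenarios can be illustrated with a minimal sketch. It assumes closed integer ranges and a deleted range that lies entirely within an existing one; `delete_range` is a hypothetical helper for illustration and does not mirror the code in intervals.c.

```python
def delete_range(existing, removed):
    """Remove the closed range `removed` from the closed range `existing`.

    Both arguments are (start, end) tuples with start <= end, and `removed`
    is assumed to lie entirely within `existing`. Returns the ranges that
    remain after the deletion.
    """
    x, y = existing
    a, b = removed
    if not (x <= a <= b <= y):
        raise ValueError("removed range must lie within the existing range")
    remaining = []
    if x < a:
        remaining.append((x, a - 1))  # part left of the deleted range survives
    if b < y:
        remaining.append((b + 1, y))  # part right of the deleted range survives
    return remaining


print(delete_range((10, 20), (10, 20)))  # exact match
print(delete_range((10, 20), (10, 15)))  # adjust left side
print(delete_range((10, 20), (15, 20)))  # adjust right side
print(delete_range((10, 20), (13, 16)))  # split
```

An exact match leaves nothing behind, the two adjust cases leave one shortened range, and the split case leaves two.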
*	evaluate: allow for zero length ranges	Pablo Neira Ayuso	2022-04-13	1	-1/+1
	Allow for ranges such as 30-30. This is required by the new intervals.c code, which normalizes constant and prefix set elements to ranges.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	intervals: add support to automerge with kernel elements	Pablo Neira Ayuso	2022-04-13	1	-3/+5
	Extend the interval codebase to support merging elements in the kernel with userspace element updates.

	Add a list of elements to be purged to cmd and set objects. These elements, representing outdated intervals, are deleted before adding the updated ranges.

	This routine splices the list of userspace and kernel elements, then it mergesorts to identify overlapping and contiguous ranges. This splice operation is undone so the set userspace cache remains consistent.

	Incrementally update the elements in the cache; this allows removing dd44081d91ce ("segtree: Fix add and delete of element in same batch").

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: replace interval segment tree overlap and automerge	Pablo Neira Ayuso	2022-04-13	1	-3/+67
	This is a rewrite of the segtree interval codebase.

	This patch now splits the original set_to_interval() function in three routines:

	- add set_automerge() to merge overlapping and contiguous ranges. The elements, expressed as single values, prefixes or ranges, are all first normalized to ranges. These elements, expressed as ranges, are mergesorted. Then, there is a linear list inspection to check for merge candidates. This code only merges elements in the same batch, i.e. it does not merge elements in the kernel and the userspace batch.

	- add set_overlap() to check for overlapping set elements. Linux kernel >= 5.7 already checks for overlaps, older kernels still need this code. This code checks for two conflict types:

	  1) between elements in this batch.
	  2) between elements in this batch and kernelspace.

	  The elements in the kernel are temporarily merged into the list of elements in the batch to check for these overlaps. The EXPR_F_KERNEL flag allows us to restore the set cache after the overlap check has been performed.

	- set_to_interval() now only transforms set elements, expressed as range e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END flag notation to represent e.g. [a,b+1), where b+1 has the EXPR_F_INTERVAL_END flag set on.

	More relevant updates:

	- The overlap and automerge routines are now performed in the evaluation phase.

	- The userspace set object representation now stores a reference to the existing kernel set object (in case there is already a set with this same name in the kernel). This is required by the new overlap and automerge approach.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
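The sort-then-linear-scan merge and the [a,b] to [a,b+1) conversion can be sketched as follows. This is a simplified illustration over plain integers, not the nft code: `automerge` and `to_half_open` are hypothetical names, and the boolean in each output tuple merely stands in for the EXPR_F_INTERVAL_END flag.

```python
def automerge(ranges):
    """Merge overlapping and contiguous closed integer ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or directly follows the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_half_open(ranges):
    """Turn each closed range [a, b] into a start element plus an end marker at b + 1."""
    elements = []
    for start, end in ranges:
        elements.append((start, False))   # regular element
        elements.append((end + 1, True))  # stand-in for the interval-end flag
    return elements


merged = automerge([(10, 20), (21, 30), (25, 40), (50, 50)])
print(merged)
print(to_half_open(merged))
```

Sorting first is what makes a single linear pass sufficient: any merge candidate for a range is always its immediate predecessor in the sorted list.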
*	evaluate: string prefix expression must retain original length	Florian Westphal	2022-04-13	1	-1/+3
	To make something like "eth" work for interval sets (match eth0, eth1, and so on...) we must treat the string as a 128 bit integer.

	Without this, segtree will do the wrong thing when applying the prefix, because we generate the prefix based on 'eth' as input, with a length of 3.

	The correct import needs to be done on "eth\0\0\0\0\0\0\0...", i.e., if the input buffer were an ipv6 address, it should look like "eth\0::", not "::eth".

	Signed-off-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
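The difference between the two layouts can be shown with a short sketch. It assumes a big-endian import into a 16-byte buffer purely for illustration; the helper names are hypothetical and the actual byte handling inside nft may differ.

```python
KEY_BYTES = 16  # 128 bits, the width mentioned in the commit message


def as_left_aligned_int(name: bytes) -> int:
    """Import the string with trailing NUL padding (the "eth\\0::" layout)."""
    return int.from_bytes(name.ljust(KEY_BYTES, b"\0"), "big")


def as_right_aligned_int(name: bytes) -> int:
    """Import the string with leading NUL padding (the "::eth" layout)."""
    return int.from_bytes(name.rjust(KEY_BYTES, b"\0"), "big")


def prefix_matches(value: int, prefix: int, prefix_bits: int) -> bool:
    """Compare only the top `prefix_bits` bits of two 128-bit integers."""
    shift = KEY_BYTES * 8 - prefix_bits
    return (value >> shift) == (prefix >> shift)


prefix_bits = len(b"eth") * 8  # 24-bit prefix
eth0 = as_left_aligned_int(b"eth0")
print(prefix_matches(eth0, as_left_aligned_int(b"eth"), prefix_bits))
print(prefix_matches(eth0, as_right_aligned_int(b"eth"), prefix_bits))
```

With the left-aligned import the leading 24 bits carry the characters "eth", so the prefix covers the interface name; with the right-aligned import those bits are all zero and the prefix matches nothing useful.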