nftables - nft command line tool

	Commit message	Author	Age	Files	Lines
*	src: remove flagcmp expression	Pablo Neira Ayuso	2025-03-27	2	-14/+0
	This expression is not used anymore since ("src: transform flag match expression to binop expression from parser"), remove it.

	This completes the revert of c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations"), except for the parser chunk, which is kept for backwards compatibility.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: transform flag match expression to binop expression from parser	Pablo Neira Ayuso	2025-03-27	1	-0/+1
	Transform flagcmp expression to a relational with binop on the left hand side, i.e.

	         relational
	          /      \
	       binop    value
	       /   \
	 payload   mask

	Add list_expr_to_binop() to make this transformation.

	Goal is two-fold:

	- Allow -o/--optimize to pick up on this representation.
	- Remove the flagcmp expression in a follow-up patch.

	This prepares for the removal of the flagcmp expression added by:

	    c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations")

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
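The tree rewrite described above can be modelled in a few lines. This is an illustrative Python sketch, not nftables source: the tuple-based nodes and the names `flagcmp_to_relational` and `evaluate` are hypothetical stand-ins for struct expr and list_expr_to_binop().

```python
# TCP flag bits (standard values from the TCP header).
FIN, SYN, ACK = 0x01, 0x02, 0x10

def flagcmp_to_relational(payload, mask, value, op="=="):
    """Rewrite a flag match (payload, mask, value) as
    relational(binop(payload & mask), value)."""
    binop = ("binop", "&", payload, mask)
    return ("relational", op, binop, value)

def evaluate(node, packet):
    """Evaluate the relational tree against a dict of field values."""
    _, op, (_, _, field, mask), value = node
    masked = packet[field] & mask
    return masked == value if op == "==" else masked != value

# Match packets where, out of SYN/ACK/FIN, only SYN is set.
tree = flagcmp_to_relational("tcp flags", SYN | ACK | FIN, SYN)
```

Because the result is an ordinary relational with a binop on the left, any pass that already understands that shape can handle it without a dedicated flagcmp case.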
*	expression: add __EXPR_MAX and use it to define EXPR_MAX	Pablo Neira Ayuso	2025-03-27	1	-2/+2
	EXPR_MAX was never updated to the newest expression, add __EXPR_MAX and use it to define EXPR_MAX.

	Add a case to expr_ops(), otherwise gcc complains with a warning that the __EXPR_MAX case is not handled.

	Fixes: 347039f64509 ("src: add symbol range expression to further compact intervals")
	Suggested-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: replace struct stmt_ops by type field in struct stmt	Pablo Neira Ayuso	2025-03-18	5	-2/+14
	Shrink struct stmt by 8 bytes.

	__stmt_ops_by_type() provides an operation for STMT_INVALID since this is required by -o/--optimize.

	There are many checks for stmt->ops->type, which is the most accessed field, that can be trivially replaced.

	BUG() uses statement type enum instead of name.

	Similar to:

	    68e76238749f ("src: expr: add and use expr_name helper")
	    72931553828a ("src: expr: add expression etype")
	    2cc91e6198e7 ("src: expr: add and use internal expr_ops helper")

	Acked-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: print set element with multi-word description in single one line	Pablo Neira Ayuso	2025-03-18	2	-0/+3
	If the set element:

	- represents a mapping
	- has a timeout
	- has a comment
	- has counter/quota/limit
	- is a concatenation (already printed in a single line before this patch)

	i.e. if the set element requires several words, then print it in one single line.

	Before this patch:

	    table ip x {
	            set y {
	                    typeof ip saddr
	                    counter
	                    elements = { 192.168.10.35 counter packets 0 bytes 0, 192.168.10.101 counter packets 0 bytes 0,
	                                 192.168.10.135 counter packets 0 bytes 0 }
	            }
	    }

	After this patch:

	    table ip x {
	            set y {
	                    typeof ip saddr
	                    counter
	                    elements = { 192.168.10.35 counter packets 0 bytes 0,
	                                 192.168.10.101 counter packets 0 bytes 0,
	                                 192.168.10.135 counter packets 0 bytes 0 }
	            }
	    }

	Acked-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_delinearize: support for bitfield payload statement with binary operation	Pablo Neira Ayuso	2025-03-07	1	-0/+1
	Add a new function to deal with payload statement delinearization with binop expression.

	Infer the payload offset from the mask, then walk the template list to determine if the estimated offset falls within a matching header field. If so, then validate that this is not a raw expression but an actual bitfield matching. Finally, trim the payload expression length accordingly and adjust the payload offset.

	Instead of:

	    @nh,8,5 set 0x0

	it displays:

	    ip dscp and 0x1

	Update tests/py to cover for this enhancement.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	tcpopt: add symbol table for mptcp suboptions	Florian Westphal	2025-03-06	1	-1/+4
	nft can be used to match on specific multipath tcp subtypes:

	    tcp option mptcp subtype 0

	However, depending on which subtype to match, users need to look up the type/value to use in rfc8684. Add support for mnemonics and "nft describe tcp option mptcp subtype" to get the subtype list.

	Because the number of unique 'enum datatypes' is limited by ABI constraints, this adds a new mptcp suboption type as integer alias.

	After this patch, nft supports all of the following:

	    add element t s { mp-capable }
	    add rule t c tcp option mptcp subtype mp-capable
	    add rule t c tcp option mptcp subtype { mp-capable, mp-fail }

	For the 3rd case, listing will break because unlike for named sets, nft lacks the type information needed to pretty-print the integer values, i.e. nft will print the 3rd rule as 'subtype { 0, 6 }'. This is resolved in a followup patch.

	Other problematic constructs are:

	    set s1 {
	        typeof tcp option mptcp subtype . ip saddr
	        elements = { mp-fail . 1.2.3.4 }
	    }

	Followed by:

	    tcp option mptcp subtype . ip saddr @s1

	nft will print this as:

	    tcp option mptcp unknown & 240) >> 4 . ip saddr @s1

	All of these issues are not related to this patch, however, they also occur with other bit-sized extheader fields.

	Signed-off-by: Florian Westphal <fw@strlen.de>
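As a rough illustration of what the symbol table buys, the sketch below maps subtype values to mnemonics and extracts the subtype from its byte using the same "& 240) >> 4" arithmetic that appears in the listing above. It is Python, not nftables source; the table holds only the two mnemonics whose values can be read off this commit message ('subtype { 0, 6 }'), and the function names are hypothetical.

```python
# Only the mnemonics whose numeric values appear in the commit message.
MPTCP_SUBTYPES = {"mp-capable": 0, "mp-fail": 6}
NAMES = {value: name for name, value in MPTCP_SUBTYPES.items()}

def subtype_from_byte(byte):
    """The subtype is the upper nibble of the byte: (byte & 240) >> 4."""
    return (byte & 0xF0) >> 4

def format_subtype(value):
    """Print the mnemonic when the type information is available,
    otherwise fall back to the plain integer."""
    return NAMES.get(value, str(value))
```

Without the table, a listing can only fall back to the integer branch, which is the 'subtype { 0, 6 }' output described above.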
*	src: add symbol range expression to further compact intervals	Pablo Neira Ayuso	2025-02-21	1	-2/+11
	Update parser to use a new symbol range expression with smaller memory footprint than range expression + two symbol expressions.

	The evaluation step translates this into EXPR_RANGE_VALUE for interval sets.

	Note that maps or concatenations still use the less compact range expression representation, those require more work to use this new symbol range expression. The parser also uses the classic range expression if variables are used.

	Testing with 100k intervals, worst case scenario: no prefix or singleton elements. This shows a reduction from 49.58 Mbytes to 35.47 Mbytes (-29.56% memory footprint for this case).

	This is follow-up work to previous commits:

	    91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption")
	    c9ee9032b0ee ("src: add EXPR_RANGE_VALUE expression and use it")

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add and use payload_expr_trim_force	Florian Westphal	2025-02-07	1	-0/+2
	Previous commit fixed erroneous handling of raw expressions when RHS sets a zero value.

	Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
	Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \
	       @ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000

	After this patch, this will instead display:

	    @ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0

	payload_expr_trim_force() only works when the payload has no known protocol (template) attached, i.e. will be printed as raw payload syntax. It performs sanity checks on @mask and then adjusts the payload expression length and offset according to the mask.

	Also add this check in __binop_postprocess() so we can also discard masks when matching, e.g. '@ih,7,5 2' becomes '@ih,7,5 0x2', not '@ih,0,16 & 0xffc0 == 0x20'.

	binop_postprocess now returns whether it performed an action or not; if this returns true then arguments might have been freed, so callers must no longer refer to any of the expressions attached to the binop.

	Next patch adds test cases for this.

	Signed-off-by: Florian Westphal <fw@strlen.de>
	Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
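The offset and length adjustment can be checked by hand against the three set statements quoted above. The following Python sketch is not the payload_expr_trim_force() implementation; `trim_by_mask` is a hypothetical helper that assumes the mask marks the bits the statement preserves, so the written field is the contiguous run of zero bits, numbered MSB first as in @base,offset,length.

```python
def trim_by_mask(offset, length, keep_mask):
    """Return (offset, length) of the bit range actually written, given a raw
    payload of `length` bits at bit `offset` and the mask of preserved bits."""
    full = (1 << length) - 1
    written = ~keep_mask & full
    if written == 0:
        raise ValueError("mask leaves nothing to write")
    # Number of untouched bits below the written run (LSB side).
    trailing = (written & -written).bit_length() - 1
    run = written >> trailing
    # Sanity check: the written bits must form one contiguous run.
    if run & (run + 1):
        raise ValueError("written bits are not contiguous")
    width = run.bit_length()
    # Untouched bits above the run (MSB side) shift the offset forward.
    leading = length - trailing - width
    return offset + leading, width
```

For example, `trim_by_mask(48, 16, 0xffc0)` yields `(58, 6)`, which corresponds to `@ih,58,6` in the output after the patch.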
*	src: rework singleton interval transformation to reduce memory consumption	Pablo Neira Ayuso	2025-01-10	3	-1/+12
	set_to_intervals() expands range expressions into a list of singleton elements before building the netlink message that is sent to the kernel. This is because the kernel expects this list of singleton elements where EXPR_F_INTERVAL_END denotes a closing interval. This expansion significantly increases memory consumption in userspace.

	This patch updates the logic to transform the range expression into up to two temporary singleton element expressions through setelem_to_interval(). Then, these two elements are used to allocate the nftnl_set_elem objects through alloc_nftnl_setelem_interval() to build the netlink message. Finally, all these temporary objects are released.

	For anonymous sets, when adjacent ranges are found, the end element is not added to the set to pack the set representation as in the original set_to_intervals() routine.

	After this update, set_to_intervals() only deals with adding the non-matching all-zero element to the interval set when it is not there, as the kernel expects.

	In combination with the new EXPR_RANGE_VALUE expression, this shrinks runtime userspace memory consumption from 70.50 Mbytes to 43.38 Mbytes for a 100k intervals set sample.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	rule: constify set_is_non_concat_range()	Pablo Neira Ayuso	2025-01-10	1	-1/+1
	This is read-only, constify it.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add EXPR_RANGE_VALUE expression and use it	Pablo Neira Ayuso	2025-01-10	1	-0/+13
	A set element with a range takes 4 instances of struct expr:

	    EXPR_SET_ELEM -> EXPR_RANGE -> (2) EXPR_VALUE

	where EXPR_RANGE represents two references to struct expr with constant value.

	This new EXPR_RANGE_VALUE trims it down to two expressions:

	    EXPR_SET_ELEM -> EXPR_RANGE_VALUE

	with two direct low and high values that represent the range:

	    struct {
	        mpz_t low;
	        mpz_t high;
	    };

	These two new direct values in struct expr do not modify its size.

	setelem_expr_to_range() translates EXPR_RANGE to EXPR_RANGE_VALUE, this conversion happens at a later stage.

	constant_range_expr_print() translates this structure to constant values to reuse the existing datatype_print(), which relies on singleton values.

	The automerge routine has been updated to use EXPR_RANGE_VALUE.

	This requires a follow-up patch to rework the conversion from range expression to singleton element to provide a noticeable memory consumption reduction.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: shrink line_offset in struct location to 4 bytes	Pablo Neira Ayuso	2025-01-02	1	-2/+1
	line_offset of 2^32 bytes should be enough.

	This requires the removal of the last_line field (in a previous patch) to shrink struct expr to 112 bytes.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove last_line from struct location	Pablo Neira Ayuso	2025-01-02	1	-1/+0
	This 4-byte field is never used, remove it.

	This does not shrink struct location in x86_64 due to alignment.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove unused token_offset from struct location	Pablo Neira Ayuso	2025-01-02	1	-1/+0
	This saves 8 bytes in x86_64 in struct location, which is embedded in every expression.

	This shrinks struct expr to 120 bytes according to pahole.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	expression: remove elem_flags from EXPR_SET_ELEM to shrink struct expr size	Pablo Neira Ayuso	2025-01-02	1	-1/+1
	Move NFTNL_SET_ELEM_F_INTERVAL_OPEN flag to the existing flags field in struct expr.

	This saves 4 bytes in struct expr, shrinking it to 128 bytes according to pahole.

	This reworks:

	    6089630f54ce ("segtree: Introduce flag for half-open range elements")

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow binop expressions with variable right-hand operands	Jeremy Sowden	2024-12-04	1	-3/+16
	Hitherto, the kernel has required constant values for the `xor` and `mask` attributes of boolean bitwise expressions. This has meant that the right-hand operand of a boolean binop must be constant. Now the kernel has support for AND, OR and XOR operations with right-hand operands passed via registers, we can relax this restriction. Allow non-constant right-hand operands if the left-hand operand is not constant, e.g.:

	    ct mark & 0xffff0000 | meta mark & 0xffff

	The kernel now supports performing AND, OR and XOR operations directly, on one register and an immediate value or on two registers, so we need to be able to generate and parse bitwise boolean expressions of this form.

	If a boolean operation has a constant RHS, we continue to send a mask-and-xor expression to the kernel.

	Add tests for {ct,meta} mark with variable RHS operands.

	JSON support is also included.

	This requires Linux kernel >= 6.13-rc.

	[ Originally posted as patch 1/8 and 6/8 which has been collapsed and simplified to focus on initial {ct,meta} mark support. Tests have been extracted from 8/8 including a tests/py fix to payload output due to incorrect output in original patchset. JSON support has been extracted from patch 7/8 --pablo]

	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow to map key to nfqueue number	Florian Westphal	2024-11-11	1	-1/+3
	Allow to specify a numeric queue id as part of a map.

	The parser side is easy, but the reverse direction (listing) is not. 'queue' is a statement, it doesn't have an expression.

	Add a generic 'queue_type' datatype as a shim to the real basetype with constant expressions. This is used only for udata build/parse: it stores the "key" (the parser token, here "queue") as udata in the kernel and can then restore the original key.

	Add a dumpfile to validate parser & output.

	JSON support is missing because JSON allows typeof only since quite recently.

	Joint work with Pablo.

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1455
	Signed-off-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: remove unused flags field	Pablo Neira Ayuso	2024-11-11	1	-2/+0
	Leftover unused struct datatype field, remove it.

	Fixes: e35aabd511c4 ("datatype: replace DTYPE_F_ALLOC by bitfield")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	monitor: Recognize flowtable add/del events	Phil Sutter	2024-11-06	3	-0/+12
	These were entirely ignored before, add the necessary code analogous to e.g. objects.

	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: fix extended netlink error reporting with large set elements	Pablo Neira Ayuso	2024-10-28	1	-1/+3
	Large sets can expand into several netlink messages, use sequence number and attribute offset to correlate the set element and the location.

	When a set element command expands into several netlink messages, increment the sequence number for each netlink message. Update struct cmd to store the range of netlink messages that result from this command.

	struct nlerr_loc remains the same size in x86_64.

	    # nft -f set-65535.nft
	    set-65535.nft:65029:22-32: Error: Could not process rule: File exists
	    create element x y { 1.1.254.253 }
	                         ^^^^^^^^^^^

	Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	rule: netlink attribute offset is uint32_t for struct nlerr_loc	Pablo Neira Ayuso	2024-10-28	1	-1/+1
	The maximum netlink message length (nlh->nlmsg_len) is uint32_t, so the offset to the netlink attribute that struct nlerr_loc stores must be uint32_t, not uint16_t.

	While at it, remove the check for zero netlink attribute offset in nft_cmd_error(), which should never happen; likely this check was there to prevent the uint16_t offset overflow.

	Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	mnl: update cmd_add_loc() to take struct nlmsghdr	Pablo Neira Ayuso	2024-10-28	1	-1/+1
	To prepare for a fix for very large sets. No functional change is intended.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	mnl: rename to mnl_seqnum_alloc() to mnl_seqnum_inc()	Pablo Neira Ayuso	2024-10-28	1	-1/+1
	Rename mnl_seqnum_alloc() to mnl_seqnum_inc(). No functional change is intended.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: collapse set element commands from parser	Pablo Neira Ayuso	2024-10-28	4	-5/+15
	498a5f0c219d ("rule: collapse set element commands") does not help to reduce memory consumption in the case of large sets defined by one element per line:

	    add element ip x y { 1.1.1.1 }
	    add element ip x y { 1.1.1.2 }
	    ...

	This patch reduces memory consumption by ~75%: set elements are collapsed into an existing cmd object wherever possible to reduce the number of cmd objects.

	This patch also adds a special case for variables for sets similar to:

	    be055af5c58d ("cmd: skip variable set elements when collapsing commands")

	This patch requires this small kernel fix:

	    commit b53c116642502b0c85ecef78bff4f826a7dd4145
	    Author: Pablo Neira Ayuso <pablo@netfilter.org>
	    Date:   Fri May 20 00:02:06 2022 +0200

	        netfilter: nf_tables: set element extended ACK reporting support

	which is already included in recent -stable kernels:

	    # cat ruleset.nft
	    add table ip x
	    add chain ip x y
	    add set ip x y { type ipv4_addr; }
	    create element ip x y { 1.1.1.1 }
	    create element ip x y { 1.1.1.1 }

	    # nft -f ruleset.nft
	    ruleset.nft:5:25-31: Error: Could not process rule: File exists
	    create element ip x y { 1.1.1.1 }
	                            ^^^^^^^

	Since there is no need to relate commands via sequence number anymore, this also allows removing the uncollapse step.

	Fixes: 498a5f0c219d ("rule: collapse set element commands")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
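The effect of collapsing can be pictured with a small model. This Python sketch is not the parser code: commands are plain tuples, `collapse` is a hypothetical name, and it assumes that only adjacent commands with the same operation and target set are merged.

```python
def collapse(cmds):
    """cmds is a list of (op, set_name, elements) tuples. Adjacent commands
    with the same op and set are merged into one command object, so the
    number of command objects no longer grows with one-element-per-line input."""
    out = []
    for op, target, elems in cmds:
        if out and out[-1][0] == op and out[-1][1] == target:
            # Reuse the existing command and append the new elements to it.
            out[-1][2].extend(elems)
        else:
            out.append((op, target, list(elems)))
    return out

cmds = [
    ("add element", "ip x y", ["1.1.1.1"]),
    ("add element", "ip x y", ["1.1.1.2"]),
    ("add element", "ip x z", ["2.2.2.2"]),
]
collapsed = collapse(cmds)
```

In this model, the two commands for `ip x y` end up as a single command carrying both elements, while the command for `ip x z` stays separate.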
*	src: support for timeout never in elements	Pablo Neira Ayuso	2024-09-17	1	-0/+3
	Allow to specify elements that never expire in sets with global timeout.

	    set x {
	        typeof ip saddr
	        timeout 1m
	        elements = { 1.1.1.1 timeout never,
	                     2.2.2.2,
	                     3.3.3.3 timeout 2m }
	    }

	In the example above:

	- 1.1.1.1 is a permanent element
	- 2.2.2.2 expires after 1 minute (uses default set timeout)
	- 3.3.3.3 expires after 2 minutes (uses specified timeout override)

	Use internal NFT_NEVER_TIMEOUT marker as UINT64_MAX to differentiate between "use default set timeout" and "timeout never" if "timeout N" is used in the set declaration. Maximum supported timeout in milliseconds which is conveyed within a netlink attribute is 0x10c6f7a0b5ec.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
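How the marker separates the three cases in the example can be sketched as below. This is an illustrative Python model rather than nftables code: `effective_timeout_ms` is a hypothetical helper, and treating an element timeout of 0 as "not specified" is an assumption made for the sketch. The two constants are the values quoted in the commit message.

```python
NFT_NEVER_TIMEOUT = 2**64 - 1       # UINT64_MAX marker, per the commit message
MAX_TIMEOUT_MS = 0x10c6f7a0b5ec     # maximum timeout quoted in the commit message

def effective_timeout_ms(elem_timeout, set_timeout):
    """Return None for a permanent element, else its timeout in milliseconds.
    Assumption: elem_timeout == 0 means the element did not specify one."""
    if elem_timeout == NFT_NEVER_TIMEOUT:
        return None                 # "timeout never"
    if elem_timeout == 0:
        return set_timeout          # fall back to the set's global timeout
    if elem_timeout > MAX_TIMEOUT_MS:
        raise ValueError("timeout exceeds the supported maximum")
    return elem_timeout             # per-element override

SET_TIMEOUT = 60_000                # "timeout 1m" in the example set
```

With these definitions, the elements 1.1.1.1, 2.2.2.2 and 3.3.3.3 from the example resolve to permanent, 60000 ms and 120000 ms respectively.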
*	cache: consolidate reset command	Pablo Neira Ayuso	2024-08-26	2	-9/+6
	Reset command does not utilize the cache infrastructure.

	This implicitly fixes a crash with anonymous sets because elements are not fetched. I initially tried to fix it by toggling the missing cache flags, but then ASAN reports memleaks.

	To address these issues, this relies on Phil's list filtering infrastructure, which is expanded to accommodate the filtering requirements of the reset commands, such as 'reset table ip' where only the family is sent to the kernel.

	After this update, tests/shell reports a few inconsistencies between reset and list commands:

	- reset rules chain t c2
	  displays sets, but it should only list the given chain.
	- reset rules table t
	  reset rules ip
	  do not list elements in the set. In both cases, these are fully listing a given table and family, elements should be included.

	The consolidation also ensures list and reset will not differ.

	A few more notes:

	- CMD_OBJ_TABLE is used for "rules family table" from the parser, due to the lack of a better enum; the same applies to CMD_OBJ_CHAIN.
	- CMD_OBJ_ELEMENTS still does not use the cache, but the same occurs in the CMD_GET command case, which needs to be consolidated.

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1763
	Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
	Fixes: 1694df2de79f ("Implement 'reset rule' and 'reset rules' commands")
	Tested-by: Eric Garver <eric@garver.life>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	cache: add filtering support for objects	Pablo Neira Ayuso	2024-08-26	1	-0/+2
	Currently, the full ruleset flag is set to fetch objects.

	Follow a similar approach to these patches from Phil:

	    de961b930660 ("cache: Filter set list on server side")
	    cb4b07d0b628 ("cache: Support filtering for a specific flowtable")

	in preparation to update the reset command to use the cache infrastructure.

	Tested-by: Eric Garver <eric@garver.life>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: replace DTYPE_F_ALLOC by bitfield	Pablo Neira Ayuso	2024-08-21	1	-11/+3
	Only user of the datatype flags field is DTYPE_F_ALLOC, replace it by bitfield, squash byteorder to 8 bits which is sufficient.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove DTYPE_F_PREFIX	Pablo Neira Ayuso	2024-08-21	1	-2/+1
	Only the ipv4 and ipv6 datatypes support this, add datatype_prefix_notation() helper function to report that a datatype prefers prefix notation, if possible.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: mnl: always dump all netdev hooks if no interface name was given	Florian Westphal	2024-08-21	1	-0/+2
	Instead of not returning any results for

	    nft list hooks netdev

	iterate all interfaces and then query all of them.

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	cache: populate chains on demand from error path	Pablo Neira Ayuso	2024-08-19	1	-1/+0
	Updates on verdict maps that require many non-base chains are slowed down due to fetching existing non-base chains into the cache.

	Chains are only required for error reporting hints if the kernel reports ENOENT. Populate the cache from this error path only.

	A similar approach already exists from the rule ENOENT error path since:

	    deb7c5927fad ("cmd: add misspelling suggestions for rule commands")

	however, NFT_CACHE_CHAIN was toggled unconditionally for rule commands, rendering this on-demand cache population useless.

	Before this patch, running Neels' nft_slew benchmark (peak values):

	    created idx 4992 in 52587950 ns (128 in 7122 ms)
	    ...
	    deleted idx 128 in 43542500 ns (127 in 6187 ms)

	After this patch:

	    created idx 4992 in 11361299 ns (128 in 1612 ms)
	    ...
	    deleted idx 1664 in 5239633 ns (128 in 733 ms)

	Tested-by: Eric Garver <eric@garver.life>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: drop obsolete hook argument form hook dump functions	Florian Westphal	2024-08-19	1	-1/+1
	Since commit b98fee20bfe2 ("mnl: revisit hook listing"), handle.chain is never set in this path, so 'hook' is always set to -1, so the hook arg can be dropped.

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: remove decnet support	Florian Westphal	2024-07-30	1	-72/+0
	Removed two years ago with v6.1, ditch this from hook list code as well.

	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add string preprocessor and use it for log prefix string	Pablo Neira Ayuso	2024-06-25	3	-3/+5
	Add a string preprocessor to identify and replace variables in a string. Rework existing support for variables in log prefix strings to use it.

	Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Add support for table's persist flag	Phil Sutter	2024-04-19	1	-1/+3
	Bison parser lacked support for passing multiple flags, JSON parser did not support table flags at all.

	Also document the 'owner' flag (and describe their relationship) in nft.8.

	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: disentangle ICMP code types	Pablo Neira Ayuso	2024-04-04	1	-0/+5
	Currently, ICMP{v4,v6,inet} code datatypes only describe those that are supported by the reject statement, but they can also be used for icmp code matching. Moreover, ICMP code types go hand in hand with ICMP types, that is, ICMP code symbols depend on the ICMP type.

	Thus, the output of:

	    nft describe icmp_code

	looks confusing because it only displays the values that are supported by the reject statement.

	Disentangle this by adding internal datatypes for the reject statement to handle the ICMP code symbol conversion to value as well as ruleset listing.

	The existing icmp_code, icmpv6_code and icmpx_code remain in place. For backward compatibility, a parser function is defined in case an existing ruleset relies on these symbols.

	As for the manpage, move existing ICMP code tables from the DATA TYPES section to the REJECT STATEMENT section, where they really belong. But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES section because they describe that this is an 8-bit integer field.

	After this patch:

	    # nft describe icmp_code
	    datatype icmp_code (icmp code) (basetype integer), 8 bits
	    # nft describe icmpv6_code
	    datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
	    # nft describe icmpx_code
	    datatype icmpx_code (icmpx code) (basetype integer), 8 bits

	do not display the symbol table of the reject statement anymore.

	icmpx_code_type is not used anymore, but keep it in place for backward compatibility reasons.

	And update tests/shell accordingly.

	Fixes: 5fdd0b6a0600 ("nft: complete reject support")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_delinearize: reverse cross-day meta hour range	Pablo Neira Ayuso	2024-03-20	2	-0/+3
	f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'") reverses the hour range in case that a cross-day range is used, e.g.

	    meta hour "03:00"-"14:00" counter accept

	which results in (Sydney, Australia AEDT time):

	    meta hour != "14:00"-"03:00" counter accept

	The kernel handles time in UTC, therefore, a cross-day range may not be obvious according to local time.

	The ruleset listing above is not very intuitive to the reader depending on their timezone, therefore, complete the netlink delinearize path to reverse the cross-day meta range.

	Update manpage to recommend using a range expression when matching a meta hour range. Recommend range expression for meta time and meta day too.

	Extend testcases/listing/meta_time to cover for this scenario.

	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
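The AEDT example can be reproduced with a simplified model of the two directions. This Python sketch is not the nftables implementation: `linearize` and `delinearize` are hypothetical names, times are seconds since midnight, the timezone is a fixed UTC offset, and the exact inclusivity of the range bounds is ignored.

```python
DAY = 24 * 3600

def hhmm(sec):
    return f"{sec // 3600:02d}:{sec % 3600 // 60:02d}"

def linearize(lo_local, hi_local, tz_offset):
    """Convert a local-time range to UTC. If it wraps past midnight in UTC,
    store it as the negated, swapped range instead."""
    lo = (lo_local - tz_offset) % DAY
    hi = (hi_local - tz_offset) % DAY
    if lo > hi:
        return ("!=", hi, lo)
    return ("==", lo, hi)

def delinearize(op, lo_utc, hi_utc, tz_offset, reverse=True):
    """Convert back to local time for listing. With reverse=True, a negated
    range that wraps in local time is swapped back to a positive match."""
    lo = (lo_utc + tz_offset) % DAY
    hi = (hi_utc + tz_offset) % DAY
    if reverse and op == "!=" and lo > hi:
        op, lo, hi = "==", hi, lo
    prefix = "!= " if op == "!=" else ""
    return f'meta hour {prefix}"{hhmm(lo)}"-"{hhmm(hi)}"'

AEDT = 11 * 3600  # UTC+11
stored = linearize(3 * 3600, 14 * 3600, AEDT)
```

Here `delinearize(*stored, AEDT, reverse=False)` gives the unintuitive `meta hour != "14:00"-"03:00"` listing, while the default `reverse=True` gives back `meta hour "03:00"-"14:00"`.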
*	src: do not merge a set with a erroneous one	Florian Westphal	2024-03-20	1	-0/+2
	The included sample causes a crash because we attempt to range-merge a prefix expression with a symbolic expression.

	The first set is evaluated, the symbol expression evaluation fails and nft queues an error message ("Could not resolve hostname"). However, nft continues evaluation.

	nft then encounters the same set definition again and merges the new content with the preceding one. But the first set structure is dodgy, it still contains the unresolved symbolic expression.

	That then makes nft crash (assert) in the set internals.

	There are various different incarnations of this issue, but the low level set processing code does not allow for any partially transformed expressions to still remain.

	Before:

	    nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
	    BUG: invalid range expression type binop
	    nft: src/expression.c:1479: range_expr_value_low: Assertion `0' failed.

	After:

	    nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
	    invalid_range_expr_type_binop:4:18-25: Error: Could not resolve hostname: Name or service not known
	    elements = { 1&.141.0.1 - 192.168.0.2}
	                   ^^^^^^^^

	Signed-off-by: Florian Westphal <fw@strlen.de>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: translate meter into dynamic set	Pablo Neira Ayuso	2024-03-12	1	-0/+5
	129f9d153279 ("nft: migrate man page examples with `meter` directive to sets") already replaced meters by dynamic sets.

	This patch removes the NFT_SET_ANONYMOUS flag from the implicit set that is instantiated via meter, so the listing shows a dynamic set instead, which is the recommended approach these days.

	Therefore, a batch like this:

	    add table t
	    add chain t c
	    add rule t c tcp dport 80 meter m size 128 { ip saddr timeout 1s limit rate 10/second }

	gets translated to a dynamic set:

	    table ip t {
	        set m {
	            type ipv4_addr
	            size 128
	            flags dynamic,timeout
	        }

	        chain c {
	            tcp dport 80 update @m { ip saddr timeout 1s limit rate 10/second burst 5 packets }
	        }
	    }

	Check for NFT_SET_ANONYMOUS flag is also relaxed for list and flush meter commands:

	    # nft list meter ip t m
	    table ip t {
	        set m {
	            type ipv4_addr
	            size 128
	            flags dynamic,timeout
	        }
	    }
	    # nft flush meter ip t m

	As a side effect, the legacy 'list meter' and 'flush meter' commands allow to flush a dynamic set to retain backward compatibility.

	This patch updates testcases/sets/0022type_selective_flush_0 and testcases/sets/0038meter_list_0 as well as the json output, which now uses the dynamic set representation.

	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: permit use of host-endian constant values in set lookup keys	Pablo Neira Ayuso	2024-02-13	1	-0/+1
	AFL found the following crash:

	    table ip filter {
	        map ipsec_in {
	            typeof ipsec in reqid . iif : verdict
	            flags interval
	        }

	        chain INPUT {
	            type filter hook input priority filter; policy drop;
	            ipsec in reqid . 100 @ipsec_in
	        }
	    }

	Which yields:

	    nft: evaluate.c:1213: expr_evaluate_unary: Assertion `!expr_is_constant(arg)' failed.

	All existing test cases with constant values use big endian values, but "iif" expects host endian values.

	As raw values were not supported before, concat byteorder conversion doesn't handle constants.

	Fix this:

	1. Add constant handling so that the number is converted in-place, without unary expression.
	2. Add the inverse handling on delinearization for non-interval set types. When dissecting the concat data soup, watch for integer constants where the datatype indicates host endian integer.

	Last, extend an existing test case with the afl input to cover in/output.

	A new test case is added to test linearization, delinearization and matching.

	Based on original patch from Florian Westphal, patch subject and description written by him.

	Fixes: b422b07ab2f9 ("src: permit use of constant values in set lookup keys")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: Describe rt symbol tables	Phil Sutter	2024-01-02	1	-0/+3
	Implement a symbol_table_print() wrapper for the run-time populated rt_symbol_tables which formats output similar to expr_describe() and includes the data source. Since these tables reside in struct output_ctx, there is no implicit connection between a data type and its table, so providing callbacks for the relevant data types, which feed the data into said wrapper, is a simpler solution than extending expr_describe() itself. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: do not allow to chain more than 16 binops	Florian Westphal	2023-12-22	2	-1/+6
	netlink_linearize.c has never supported more than 16 chained binops. Adding more is possible but overwrites the stack in netlink_gen_bitwise().

	Add a recursion counter to catch this at the eval stage.

	It's not enough to just abort once the counter hits NFT_MAX_EXPR_RECURSION, because there are valid test cases that exceed this. For example, evaluation of 1 | 2 will merge the constants, so even if there are a dozen recursive eval calls, this will not end up with a large binop chain post-evaluation.

	v2: allow more than 16 binops iff the evaluation function did constant-merging.

	Signed-off-by: Florian Westphal <fw@strlen.de>
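The counter-with-exception idea described above can be sketched roughly as follows. This is a toy model, not the nft code: the `sketch_*` types, the OR-only expression tree and the limit constant are all hypothetical names invented for illustration. It only shows why the limit is checked after evaluating the operands, so that a chain which folds into a single constant is never rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit, standing in for NFT_MAX_EXPR_RECURSION. */
#define SKETCH_MAX_BINOPS 16u

enum sketch_kind { SKETCH_CONST, SKETCH_VAR, SKETCH_OR };

struct sketch_expr {
	enum sketch_kind kind;
	unsigned int value;               /* meaningful for SKETCH_CONST */
	struct sketch_expr *left, *right; /* meaningful for SKETCH_OR */
};

struct sketch_ctx {
	unsigned int recursion; /* current binop nesting depth */
};

/* Evaluate in place. Returns false if a non-foldable chain is too deep. */
static bool sketch_eval(struct sketch_ctx *ctx, struct sketch_expr *e)
{
	if (e->kind != SKETCH_OR)
		return true;

	ctx->recursion++;
	bool ok = sketch_eval(ctx, e->left) && sketch_eval(ctx, e->right);

	if (ok && e->left->kind == SKETCH_CONST &&
	    e->right->kind == SKETCH_CONST) {
		/* Constant merging: this binop disappears, so no limit check. */
		e->value = e->left->value | e->right->value;
		e->kind = SKETCH_CONST;
	} else if (ok && ctx->recursion > SKETCH_MAX_BINOPS) {
		/* The binop survives evaluation and the chain is too long. */
		ok = false;
	}
	ctx->recursion--;
	return ok;
}

/*
 * Build a left-nested chain ((l0 | l1) | l2) | ... in caller storage.
 * ops needs n_ops elements, leaves needs n_ops + 1; n_ops must be >= 1.
 */
static struct sketch_expr *sketch_chain(struct sketch_expr *ops,
					struct sketch_expr *leaves,
					size_t n_ops,
					enum sketch_kind leaf_kind)
{
	for (size_t i = 0; i <= n_ops; i++) {
		leaves[i].kind = leaf_kind;
		leaves[i].value = 1u << (i % 8);
		leaves[i].left = leaves[i].right = NULL;
	}
	for (size_t i = 0; i < n_ops; i++) {
		ops[i].kind = SKETCH_OR;
		ops[i].value = 0;
		ops[i].left = i ? &ops[i - 1] : &leaves[0];
		ops[i].right = &leaves[i + 1];
	}
	return &ops[n_ops - 1];
}
```

In this model a chain of 17 non-constant operands is rejected, while a chain of 20 constants is accepted because every binop folds away before the limit is consulted.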
*	intervals: set_to_range can be static	Florian Westphal	2023-12-16	1	-1/+0
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: reject large raw payload and concat expressions	Florian Westphal	2023-12-15	1	-0/+3
	The kernel will reject this too, but unfortunately nft may try to cram the data into the underlying libnftnl expr.

	This causes heap corruption or

	    BUG: nld buffer overflow: want to copy 132, max 64

	After:

	    Error: Concatenation of size 544 exceeds maximum size of 512
	    udp length . @th,0,512 . @th,512,512 { 47-63 . 0xe373135363130 . 0x33131303735353203 }
	                 ^^^^^^^^^

	resp. the same warning for an over-sized raw expression.

	Signed-off-by: Florian Westphal <fw@strlen.de>
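One way to read the 544 figure in the error above: if each concatenation field is padded to a whole number of 32-bit registers, `udp length` (16 bits) occupies 32 bits, and adding the 512-bit `@th,0,512` field gives 544, which is the first point at which the 512-bit limit is exceeded. The sketch below models that reading only. The padding rule, the limit constants and all `sketch_*` names are assumptions made for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: 16 registers of 32 bits each, 512 bits in total. */
#define SKETCH_REG32_BITS   32u
#define SKETCH_MAX_LEN_BITS 512u

/* Round a field length up to a whole number of 32-bit registers. */
static unsigned int sketch_padded_bits(unsigned int bits)
{
	return (bits + SKETCH_REG32_BITS - 1) / SKETCH_REG32_BITS *
	       SKETCH_REG32_BITS;
}

/*
 * Add up the padded field lengths and stop at the first field that
 * pushes the running total over the limit. *size receives the running
 * total at that point, or the full total if everything fits.
 */
static bool sketch_concat_fits(const unsigned int *field_bits, size_t n,
			       unsigned int *size)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++) {
		total += sketch_padded_bits(field_bits[i]);
		if (total > SKETCH_MAX_LEN_BITS) {
			*size = total;
			return false;
		}
	}
	*size = total;
	return true;
}
```

Called with the field lengths `{ 16, 512, 512 }` from the example, the function stops at the second field and reports 544, matching the position of the caret marker in the error output.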
*	netlink: add and use nft_data_memcpy helper	Florian Westphal	2023-12-12	1	-1/+1
	There is a stack overflow somewhere in this code: we end up memcpy'ing a way too large expr into a fixed-size on-stack buffer.

	This is hard to diagnose. Most of this code gets inlined, so the crash happens later, on return from alloc_nftnl_setelem.

	Condense the memcpy into a helper and add a BUG so we can catch the overflow before it occurs.

	->value is too small (4, should be 16), but for normal cases (well-formed data must fit into max reg space, i.e. 64 bytes) the chain buffer that comes after value in the structure provides a cushion.

	In order to have the new BUG() not trigger on valid data, bump value to the correct size. This is userspace, so the additional 60 bytes of stack usage are no concern.

	Signed-off-by: Florian Westphal <fw@strlen.de>
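The general shape of such a checked-copy helper might look like the following. This is a minimal sketch under stated assumptions: `struct sketch_data`, its field layout and both function names are hypothetical stand-ins, and `abort()` takes the place of nft's BUG() macro. The point is only that the length is validated against the destination before memcpy() runs, so an overflow is reported at the call site instead of surfacing later as stack corruption.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed register space: 16 x 32 bits = 64 bytes. */
#define SKETCH_REG32_COUNT 16

/* Hypothetical stand-in for the linearized data buffer. */
struct sketch_data {
	uint32_t value[SKETCH_REG32_COUNT];
	uint32_t len;
};

/* Nonzero if len bytes fit into d->value. */
static int sketch_data_fits(const struct sketch_data *d, size_t len)
{
	return len <= sizeof(d->value);
}

/* Copy len bytes into d, aborting loudly before any overflow can happen. */
static void sketch_data_memcpy(struct sketch_data *d, const void *src,
			       size_t len)
{
	if (!sketch_data_fits(d, len)) {
		fprintf(stderr,
			"BUG: buffer overflow: want to copy %zu, max %zu\n",
			len, sizeof(d->value));
		abort();
	}
	memcpy(d->value, src, len);
	d->len = (uint32_t)len;
}
```

With this layout a 132-byte copy, as in the BUG message quoted in the neighbouring commit, fails the check, while anything up to 64 bytes is copied and its length recorded.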
*	evaluate: reset statement length context before evaluating statement	Pablo Neira Ayuso	2023-12-08	1	-0/+1
	This patch consolidates ctx->stmt_len reset in stmt_evaluate() to avoid this problem. Note that stmt_evaluate_meta() and stmt_evaluate_ct() already reset it after the statement evaluation.

	Moreover, a statement dependency can be generated while evaluating a meta and ct statement. The payload statement dependency already manually stashes this before calling stmt_evaluate(). Add a new stmt_dependency_evaluate() function to stash the statement length context when evaluating a new statement dependency, and use it for all of the existing statement dependencies.

	Florian also says:

	'meta mark set vlan id map { 1 : 0x00000001, 4095 : 0x00004095 }' will crash. The reason is that the l2 dependency generated here is erroneously expanded to a 32-bit one, so the evaluation path won't recognize this as an L2 dependency. Therefore, pctx->stacked_ll_count is 0 and __expr_evaluate_payload() crashes with a null deref when dereferencing pctx->stacked_ll[0].

	nft-test.py gains a fugly hack to tolerate '!map typeof vlan id : meta mark'. For more generic support we should find something more acceptable, e.g.

	    !map typeof( everything here is a key or data ) timeout ...

	tests/py update and assert(pctx->stacked_ll_count) by Florian Westphal.

	Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: remove xfree() and use plain free()	Thomas Haller	2023-11-09	1	-1/+0
	xmalloc() (and similar x-functions) are used for allocation. They wrap malloc()/realloc() but will abort the program on ENOMEM.

	The meaning of xmalloc() is that it wraps malloc() but aborts on failure. I don't think x-functions should carry the notion that this is potentially a different memory allocator that must be paired with a particular xfree().

	Even if the original intent was that the allocator is abstracted (and possibly not backed by standard malloc()/free()), that doesn't seem a good idea. Nowadays libc allocators are pretty good, and we would need a very special use case to switch to something else. In other words, it will never happen that xmalloc() is not backed by malloc().

	Also, there were a few places where an xmalloc() was already "wrongly" paired with free() (for example, iface_cache_release(), exit_cookie(), nft_run_cmd_from_buffer()).

	Or note how pid2name() returns an allocated string from fscanf(), which needs to be freed with free() (and not xfree()). This requirement bubbles up to the callers portid2name() and name_by_portid(). This case was actually handled correctly and the buffer was freed with free(). But it shows that mixing different allocators is cumbersome to get right. Of course, we don't actually have different allocators, and whether to use free() or xfree() makes no difference. The point is that xfree() serves no actual purpose except raising irrelevant questions about whether x-functions are correctly paired with xfree().

	Note that xfree() also used to accept const pointers. It is bad to do that unconditionally for all deallocations. Instead, prefer to use plain free(). To free a const pointer, use free_const(), which obviously wraps free(), as indicated by the name.

	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add free_const() and use it instead of xfree()	Thomas Haller	2023-11-09	1	-0/+6
	Almost everywhere, xmalloc() and friends are used instead of malloc(). This is almost everywhere paired with xfree().

	xfree() has two problems. First, it brings the wrong notion that xmalloc() should be paired with xfree(), as if xmalloc() would not use the plain malloc() allocator. In practice, xfree() just wraps free(), and it wouldn't make sense any other way. xfree() should go away. This will be addressed in the next commit.

	The problem addressed by this commit is that xfree() accepts a const pointer. Paired with the practice of almost always using xfree() instead of free(), all our calls to xfree() cast away constness of the pointer, regardless of whether that is necessary. Declaring a pointer as const should help us to catch wrong uses. If the xfree() function always casts away const, the compiler doesn't help.

	There are many places that rightly cast away const during free. But not all of them. Add a free_const() macro, which is like free(), but accepts const pointers. We should always make an intentional choice whether to use free() or free_const(). Having a free_const() macro makes this very common choice clearer, instead of adding a (void*) cast at many places.

	Note that we now pair xmalloc() allocations with a free() call (instead of xfree()). That inconsistency will be resolved in the next commit.

	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
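A macro of the kind described above could be as small as the following sketch. The exact definition used in the tree is not shown in this log, so the macro body here is an assumption, and `struct sketch_obj` with its helpers is a hypothetical example of an owned but read-only string member. The call site states the intent (freeing through a const pointer) without repeating the cast.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed definition: like free(), but accepts a pointer to const. */
#define free_const(ptr) free((void *)(ptr))

/* Hypothetical object that owns a string it never modifies. */
struct sketch_obj {
	const char *name;
};

static struct sketch_obj *sketch_obj_new(const char *name)
{
	struct sketch_obj *o = malloc(sizeof(*o));
	size_t n = strlen(name) + 1;
	char *copy;

	if (!o)
		return NULL;
	copy = malloc(n);
	if (!copy) {
		free(o);
		return NULL;
	}
	memcpy(copy, name, n);
	o->name = copy; /* const from here on */
	return o;
}

static void sketch_obj_free(struct sketch_obj *o)
{
	if (!o)
		return;
	free_const(o->name); /* deliberate: the member is const-qualified */
	free(o);             /* plain free() for the non-const pointer */
}
```

Passing `o->name` to plain free() would draw a qualifier warning from most compilers, which is the help from the compiler that the commit message wants to keep for every other call site.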
*	gmputil: add nft_gmp_free() to free strings from mpz_get_str()	Thomas Haller	2023-11-09	1	-0/+2
	mpz_get_str() (with NULL as first argument) will allocate a buffer using the allocator functions (mp_set_memory_functions()). We should free those buffers with the corresponding free function. Add nft_gmp_free() for that and use it. The name nft_gmp_free() is chosen because "mini-gmp.c" already has an internal define called gmp_free(). There wouldn't be a direct conflict, but using the same name is confusing. And maybe our own defines should have a clear nft prefix. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>