path: root/include
Commit message | Author | Age | Files | Lines
* src: move fuzzer functionality to separate toolFlorian Westphal2025-11-201-25/+0
| | | | | | | | | | | | | This means some loss of functionality since you can no longer combine --fuzzer with options like --debug, --define, --include. On the upside, this adds new --random-outflags mode which will randomly switch --terse, --numeric, --echo ... on/off. Update README to reflect this change. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* support for afl++ (american fuzzy lop++) fuzzerFlorian Westphal2025-11-112-0/+51
afl comes with a compiler frontend that can add instrumentation suitable for running nftables via the "afl-fuzz" fuzzer.

This change adds a "--with-fuzzer" option to the configure script and enables specific handling in nftables and libnftables to speed up the fuzzing process. It also adds the "--fuzzer" command line option.

afl-fuzz initialisation gets delayed until after the netlink context is set up and symbol tables (e.g. route marks) have been parsed. When afl-fuzz restarts the process with a new input round, it will resume *after* this point (see __AFL_INIT macro in main.c).

With --fuzzer <stage>, nft will perform multiple fuzzing rounds per invocation: this increases processing rate by an order of magnitude.

The argument to '--fuzzer' specifies the last stage to run:

1: 'parser': Only run / exercise the flex/bison parser.
2: 'eval': Stop after the evaluation phase. This attempts to build a complete ruleset in memory, does symbol resolution, adds needed shift/masks to payload instructions etc.
3: 'netlink-ro': Build the netlink buffer to send to the kernel, without actually doing so.
4: 'netlink-rw': Pass the generated command/ruleset to the kernel. You can combine it with the '--check' option to send data to the kernel but without actually committing any changes. This could still end up triggering a kernel crash if there are bugs in the validation / transaction / abort phases.

Use 'netlink-ro' if you want to prevent nft from ever submitting any changes to the kernel or if you are only interested in fuzzing nftables and its libraries.

In case a kernel splat is detected, the fuzzing process stops and all further fuzzer attempts are blocked until reboot.

Signed-off-by: Florian Westphal <fw@strlen.de>
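The stage list above can be sketched as a small lookup, assuming stages are selected by name. All identifiers below (`fuzz_stage`, `fuzz_stage_parse`, `fuzz_stage_enabled`) are hypothetical and not taken from the nftables sources; the commit message does not say whether the option takes the name or the number, so only the name-to-number mapping comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stage numbers follow the list in the commit message; the identifiers
 * themselves are hypothetical. */
enum fuzz_stage {
	FUZZ_STAGE_INVALID = 0,
	FUZZ_STAGE_PARSER = 1,
	FUZZ_STAGE_EVAL = 2,
	FUZZ_STAGE_NETLINK_RO = 3,
	FUZZ_STAGE_NETLINK_RW = 4,
};

/* Map a stage name to its number; FUZZ_STAGE_INVALID if unknown. */
static enum fuzz_stage fuzz_stage_parse(const char *name)
{
	static const char *const names[] = {
		[FUZZ_STAGE_PARSER]     = "parser",
		[FUZZ_STAGE_EVAL]       = "eval",
		[FUZZ_STAGE_NETLINK_RO] = "netlink-ro",
		[FUZZ_STAGE_NETLINK_RW] = "netlink-rw",
	};
	int i;

	assert(name != NULL);
	for (i = FUZZ_STAGE_PARSER; i <= FUZZ_STAGE_NETLINK_RW; i++) {
		if (strcmp(name, names[i]) == 0)
			return (enum fuzz_stage)i;
	}
	return FUZZ_STAGE_INVALID;
}

/* A stage runs when it is not past the last stage requested. */
static int fuzz_stage_enabled(enum fuzz_stage last, enum fuzz_stage stage)
{
	return stage != FUZZ_STAGE_INVALID && stage <= last;
}
```

With "eval" as the last stage, the parser and evaluation stages would run and both netlink stages would be skipped.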
* rule: add missing documentation for cmd_obj enumFernando Fernandez Mancera2025-11-051-0/+3
In the cmd_obj enum, documentation for the hooks, tunnel and tunnels elements was missing. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
* utils: Cover for missing newline after BUG() messagesPhil Sutter2025-10-301-1/+1
| | | | | | | | | | | Relieve callers from having to suffix their messages with a newline escape sequence, have the macro append it to the format string instead. This is mostly a fix for (the many) calls to BUG() without a newline suffix. Adjust the previously correct ones since they emit an extra newline now. Signed-off-by: Phil Sutter <phil@nwl.cc>
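A minimal sketch of the idea, not the actual nftables macro: adjacent string literals are concatenated at compile time, so a macro can append the newline to the caller's format string. `BUG_SKETCH_FMT` and `bug_sketch_format` are hypothetical names, and the real BUG() also terminates the program, which this sketch leaves out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the format handling in BUG(): the macro glues
 * "\n" onto the literal format string, so callers no longer add it. */
#define BUG_SKETCH_FMT(fmt) "BUG: " fmt "\n"

/* Format one BUG-style message into buf; returns snprintf()'s result. */
static int bug_sketch_format(char *buf, size_t size, const char *etype_name)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size,
			BUG_SKETCH_FMT("unknown expression type %s"),
			etype_name);
}
```

A caller that still passed its own trailing "\n" would now get a blank line after the message, which is why the previously correct callers had to be adjusted.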
* src: add refcount assertsFlorian Westphal2025-10-292-1/+9
_get() functions must not be used when refcnt is 0, as expr_free() releases expressions on 1 -> 0 transition. Also, check that a refcount would not overflow from UINT_MAX to 0. Use INT_MAX to also catch refcount leaks sooner, we don't expect 2**31 get()s on the same object. This helps catch use-after-free refcounting bugs even when nft is built without ASAN support. v3: use a macro + BUG to get more info without a coredump. Signed-off-by: Florian Westphal <fw@strlen.de>
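The two checks described above can be sketched as follows. This is an illustration under assumptions, not the nftables code: `struct obj`, `obj_get_allowed()` and `obj_get()` are hypothetical, and a plain assert() stands in for the macro + BUG combination mentioned for v3.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical refcounted object (not the nftables struct expr). */
struct obj {
	unsigned int refcnt;
};

/* A new reference is only legal while the object is still alive
 * (refcnt > 0) and the counter is below the INT_MAX leak threshold. */
static int obj_get_allowed(const struct obj *o)
{
	return o->refcnt > 0 && o->refcnt < (unsigned int)INT_MAX;
}

/* Take a reference, asserting both conditions first. */
static struct obj *obj_get(struct obj *o)
{
	assert(obj_get_allowed(o));
	o->refcnt++;
	return o;
}
```

Checking against INT_MAX rather than UINT_MAX trips the assertion after 2**31 references, long before the counter could wrap to 0.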
* src: fix fmt string warningsFlorian Westphal2025-10-233-8/+3
For some reason several functions had a __gmp_fmtstring annotation, but that was an empty macro. After fixing it up, we get several new warnings:

  In file included from src/datatype.c:28:
  src/datatype.c:174:24: note: in expansion of macro 'error'
    174 | return error(&sym->location,
        |        ^~~~~
  src/datatype.c:405:24: note: in expansion of macro 'error'
    405 | return error(&sym->location, "Could not parse %s; did you mean `%s'?",
        |        ^~~~~

The fmt string says '%s', but the argument is an unqualified void *; add a 'const char *' cast, it is safe in both cases.

  In file included from src/evaluate.c:29:
  src/evaluate.c: In function 'byteorder_conversion':
  src/evaluate.c:232:35: warning: format '%s' expects a matching 'char *' argument [-Wformat=]
    232 | "Byteorder mismatch: %s expected %s, %s got %s",
        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Actual bug: the fmt string has one '%s' too many, remove it. All other warnings were due to '%u' instead of '%lu' / '%zu'.

Signed-off-by: Florian Westphal <fw@strlen.de>
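For context, a sketch of how such an annotation is usually wired up with GCC or Clang. `fmtstring_sketch` and `report()` are hypothetical names; the point is only that the macro has to expand to the `format` attribute, because an empty expansion silently disables every check.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang: ask the compiler to check printf-style arguments.
 * fmt_idx is the position of the format string, first_arg_idx the
 * position of the first variadic argument (both 1-based). */
#define fmtstring_sketch(fmt_idx, first_arg_idx) \
	__attribute__((format(printf, fmt_idx, first_arg_idx)))

static int report(char *buf, size_t size, const char *fmt, ...)
	fmtstring_sketch(3, 4);

/* Format a message into buf; returns vsnprintf()'s result. */
static int report(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	ret = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return ret;
}
```

With the attribute in place, a call such as `report(buf, sizeof(buf), "%s got %s", "x")` should draw a -Wformat warning about the missing argument, which is the class of bug found in byteorder_conversion.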
* meta: introduce meta ibrhwaddr supportFernando Fernandez Mancera2025-10-141-0/+2
| | | | | | | | | | | | | | | Can be used in bridge prerouting hook to redirect the packet to the receiving physical device for processing. table bridge nat { chain PREROUTING { type filter hook prerouting priority 0; policy accept; ether daddr de:ad:00:00:be:ef meta pkttype set host ether daddr set meta ibrhwaddr accept } } Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
* mnl: Support simple wildcards in netdev hooksPhil Sutter2025-09-301-0/+2
When building NFTA_{FLOWTABLE_,}HOOK_DEVS attributes, detect trailing asterisks in interface names and transmit the leading part in a NFTA_DEVICE_PREFIX attribute. Deserialization (i.e., appending an asterisk to interface prefixes returned in NFTA_DEVICE_PREFIX attributes) happens in libnftnl. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
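The detection step can be sketched as below. `ifname_is_wildcard()` is a hypothetical helper, not a function from mnl.c, and the sketch stops at computing the prefix length; building the netlink attribute is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when ifname ends in '*' and stores the length of the
 * leading part (the prefix to transmit) in *prefix_len. */
static bool ifname_is_wildcard(const char *ifname, size_t *prefix_len)
{
	size_t len;

	assert(ifname != NULL && prefix_len != NULL);
	len = strlen(ifname);
	if (len == 0 || ifname[len - 1] != '*')
		return false;

	*prefix_len = len - 1;
	return true;
}
```

For "eth*" this reports a three-character prefix, "eth"; a plain name such as "eth0" is not treated as a wildcard.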
* monitor: Inform JSON printer when reporting an object delete eventPhil Sutter2025-09-111-2/+3
| | | | | | | | | | | Since kernel commit a1050dd07168 ("netfilter: nf_tables: Reintroduce shortened deletion notifications"), type-specific data is no longer dumped when notifying for a deleted object. JSON output was not aware of this and tried to print bogus data. Fixes: 9e88aae28e9f4 ("monitor: Use libnftables JSON output") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
* table: Embed creating nft version into userdataPhil Sutter2025-08-281-0/+1
| | | | | | | | | | Upon listing a table which was created by a newer version of nftables, warn about the potentially incomplete content. Suggested-by: Florian Westphal <fw@strlen.de> Cc: Dan Winship <danwinship@redhat.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace compound_expr_alloc() by type safe functionPablo Neira Ayuso2025-08-271-2/+0
| | | | | | | Replace compound_expr_alloc() by {set,list,concat}_expr_alloc() to validate expression type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* expression: replace compound_expr_remove() by type safe functionPablo Neira Ayuso2025-08-271-1/+3
| | | | | | | Replace this function by {list,concat,set}_expr_remove() to validate expression type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* expression: remove compound_expr_add()Pablo Neira Ayuso2025-08-271-1/+0
| | | | | | | No more users of this function after conversion to type safe variant, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace compound_expr_add() by type safe list_expr_add()Pablo Neira Ayuso2025-08-271-0/+1
| | | | | | Replace compound_expr_add() by list_expr_add() to validate type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace compound_expr_add() by type safe concat_expr_add()Pablo Neira Ayuso2025-08-271-0/+1
| | | | | | Replace compound_expr_add by concat_expr_add() to validate type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace compound_expr_add() by type safe set_expr_add()Pablo Neira Ayuso2025-08-271-0/+3
| | | | | | | | Replace compound_expr_add() by set_expr_add() to validate type. Add __set_expr_add() to skip size updates in src/intervals.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add expr_type_catchall() helper and use itPablo Neira Ayuso2025-08-271-0/+3
| | | | | | Add helper function to check if this is a catchall expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tunnel: add tunnel object and statement json supportFernando Fernandez Mancera2025-08-272-0/+6
| | | | | Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tunnel: add geneve supportPablo Neira Ayuso2025-08-271-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | This patch extends the tunnel metadata object to define geneve tunnel specific configurations: table netdev x { tunnel y { id 10 ip saddr 192.168.2.10 ip daddr 192.168.2.11 sport 10 dport 20 ttl 10 geneve { class 0x1010 opt-type 0x1 data "0x12345678" class 0x1020 opt-type 0x2 data "0x87654321" class 0x2020 opt-type 0x3 data "0x87654321abcdeffe" } } } Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tunnel: add vxlan supportFernando Fernandez Mancera2025-08-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | This patch extends the tunnel metadata object to define vxlan tunnel specific configurations: table netdev x { tunnel y { id 10 ip saddr 192.168.2.10 ip daddr 192.168.2.11 sport 10 dport 20 ttl 10 vxlan { gbp 200 } } } Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add tunnel statement and expression supportPablo Neira Ayuso2025-08-272-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to attach tunnel metadata through the tunnel statement. The following example shows how to redirect traffic to the erspan0 tunnel device which will take the tunnel configuration that is specified by the ruleset. table netdev x { tunnel y { id 10 ip saddr 192.168.2.10 ip daddr 192.168.2.11 sport 10 dport 20 ttl 10 erspan { version 1 index 2 } } chain x { type filter hook ingress device veth0 priority 0; ip daddr 10.141.10.123 tunnel name y fwd to erspan0 } } This patch also allows to match on tunnel metadata via tunnel expression. Joint work with Fernando. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tunnel: add erspan supportPablo Neira Ayuso2025-08-271-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | This patch extends the tunnel metadata object to define erspan tunnel specific configurations: table netdev x { tunnel y { id 10 ip saddr 192.168.2.10 ip daddr 192.168.2.11 sport 10 dport 20 ttl 10 erspan { version 1 index 2 } } } Joint work with Fernando. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add tunnel template supportPablo Neira Ayuso2025-08-272-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds tunnel template support, this allows to attach a metadata template that provides the configuration for the tunnel driver. Example of generic tunnel configuration: table netdev x { tunnel y { id 10 ip saddr 192.168.2.10 ip daddr 192.168.2.11 sport 10 dport 20 ttl 10 } } This still requires the tunnel statement to attach this metadata template, this comes in a follow up patch. Joint work with Fernando. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* fib: restore JSON output for relational expressionsPablo Neira Ayuso2025-08-201-1/+1
| | | | | | | | | | | | | | | | | | | JSON output for the fib expression changed: - "result": "check" + "result": "oif" This breaks third party JSON parsers, revert this change for relational expressions only via workaround until there are clear rules on how to proceed with JSON schema updates. As for set and map statements, keep this new "check" result type since it is not possible to peek on rhs in such case to guess if the NFT_FIB_F_PRESENT flag needs to be set on. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1806 Fixes: f4b646032acf ("fib: allow to check if route exists in maps") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* expression: Introduce is_symbol_value_expr() macroPhil Sutter2025-07-311-0/+2
| | | | | | | | | Annotate and combine the 'etype' and 'symtype' checks done in bison parser for readability and because JSON parser will start doing the same in a follow-up patch. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: Support NFNL_HOOK_TYPE_NFT_FLOWTABLEPhil Sutter2025-07-151-0/+2
| | | | | | | | New kernels dump info for flowtable hooks the same way as for base chains. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de>
* src: detach set, list and concatenation expression layoutPablo Neira Ayuso2025-07-101-4/+18
These three expressions use the same layout, but they have a different purpose. Several fields are specific to a given expression:

- set_flags is only required by set expressions.
- field_len and field_count are only used by concatenation expressions.

Add accessors to validate the expression type before accessing the union fields:

  #define expr_set(__expr) (assert((__expr)->etype == EXPR_SET), &(__expr)->expr_set)
  #define expr_concat(__expr) (assert((__expr)->etype == EXPR_CONCAT), &(__expr)->expr_concat)
  #define expr_list(__expr) (assert((__expr)->etype == EXPR_LIST), &(__expr)->expr_list)

This should help catch subtle bugs due to type confusion. assert() could later be enabled only in debugging builds to run tests; keep it for now.

compound_expr_*() still works and it needs the same initial layout for all of these expressions:

  struct list_head expressions;
  unsigned int size;

This is implicitly reducing the size of one of the largest structs in the union area of struct expr; still, EXPR_SET_ELEM remains the largest so no gain is achieved in this iteration.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
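To show how the accessors behave, here is a self-contained sketch. The three macros are copied from the commit message; the `struct expr` around them is a heavily simplified assumption (the real structure has many more members, including the shared `expressions` list head).

```c
#include <assert.h>

enum expr_types { EXPR_SET, EXPR_CONCAT, EXPR_LIST };

/* Simplified stand-in for struct expr: only the fields named in the
 * commit message, with "size" first in each per-type struct. */
struct expr {
	enum expr_types etype;
	union {
		struct { unsigned int size; unsigned int set_flags; } expr_set;
		struct { unsigned int size; unsigned int field_count; } expr_concat;
		struct { unsigned int size; } expr_list;
	};
};

/* Accessors as given in the commit message: check the type, then hand
 * out a pointer to the matching union member. */
#define expr_set(__expr)	(assert((__expr)->etype == EXPR_SET), &(__expr)->expr_set)
#define expr_concat(__expr)	(assert((__expr)->etype == EXPR_CONCAT), &(__expr)->expr_concat)
#define expr_list(__expr)	(assert((__expr)->etype == EXPR_LIST), &(__expr)->expr_list)

/* Example user: set flags may only be touched on set expressions. */
static void mark_set(struct expr *e, unsigned int flags)
{
	expr_set(e)->set_flags |= flags;
}
```

Calling `expr_set()` on a concatenation would abort at the assert() instead of silently reading `field_count` as if it were `set_flags`.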
* src: add conntrack information to trace monitor modeFlorian Westphal2025-07-081-0/+16
An upcoming kernel change provides the packet's conntrack state in the trace message data. This allows to see if the packet is seen as original or reply, the conntrack state (new, established, related) and the status bits which show if e.g. NAT was applied. Also include the conntrack ID so users can use the conntrack tool to query the kernel for more information via ctnetlink.

This improves debugging when e.g. packets do not pick up the expected NAT mapping, which could e.g. also happen because of expectations following the NAT binding of the owning conntrack entry.

Example output ("conntrack: " lines are new):

  trace id 32 t PRE_RAW packet: iif "enp0s3" ether saddr [..]
  trace id 32 t PRE_RAW rule tcp flags syn meta nftrace set 1 (verdict continue)
  trace id 32 t PRE_RAW policy accept
  trace id 32 t PRE_MANGLE conntrack: ct direction original ct state new ct id 2641368242
  trace id 32 t PRE_MANGLE packet: iif "enp0s3" ether saddr [..]
  trace id 32 t ct_new_pre rule jump rpfilter (verdict jump rpfilter)
  trace id 32 t PRE_MANGLE policy accept
  trace id 32 t INPUT conntrack: ct direction original ct state new ct status dnat-done ct id 2641368242
  trace id 32 t INPUT packet: iif "enp0s3" [..]
  trace id 32 t public_in rule tcp dport 443 accept (verdict accept)

v3: remove clash bit again, kernel won't expose it anymore.
v2: add more status bits: helper, clash, offload, hw-offload. Add flag explanation to documentation.

Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: split monitor trace code into new trace.cFlorian Westphal2025-07-072-5/+8
| | | | | | | | Preparation patch to avoid putting more trace functionality into netlink.c. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
* fib: allow to check if route exists in mapsPablo Neira Ayuso2025-06-271-1/+1
f686a17eafa0 ("fib: Support existence check") adds EXPR_F_BOOLEAN as a workaround to infer from the rhs of the relational expression if the fib lookup wants to check for a specific output interface or, instead, simply check for existence. This, however, does not work with maps. The NFT_FIB_F_PRESENT flag can be used both with NFT_FIB_RESULT_OIF and NFT_FIB_RESULT_OIFNAME, my understanding is that they serve the same purpose which is to check if a route exists, so they are redundant. Add a 'check' fib result to check for routes while still keeping the inference workaround for backward compatibility, but prefer the new syntax in the listing. Update man nft(8) and tests/py. Fixes: f686a17eafa0 ("fib: Support existence check") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: pass name to cache_add()Pablo Neira Ayuso2025-06-231-1/+1
| | | | | | | | Consolidate the name hash in the cache_add() function. No functional changes are intended. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: print count variable in normal set listingsFlorian Westphal2025-06-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Also print the number of allocated set elements if the set provided an upper size limit and there is at least one element. Example: table ip t { set s { type ipv4_addr size 65535 # count 1 flags dynamic counter elements = { 1.1.1.1 counter packets 1 bytes 11 } } ... JSON output is unchanged as this only has informational purposes. This change breaks tests, followup patch addresses this. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: restrict allowed subtypes of concatenationsFlorian Westphal2025-06-221-0/+1
We need to restrict this, included bogon asserts with:

  BUG: unknown expression type prefix
  nft: src/netlink_linearize.c:940: netlink_gen_expr: Assertion `0' failed.

Prefix expressions are only allowed if the concatenation is used within a set element, not when specifying the lookup key. For the former, anything that represents a value is allowed. For the latter, only what will generate data (fill a register) is permitted.

At this time we do not have an annotation that tells if the expression is on the left hand side (lookup key) or right hand side (set element). Add a new list recursion counter for this. If it is 0, we are building the lookup key; if it is nonzero, the concatenation is the RHS part of a relational expression and prefixes, ranges and so on are allowed.

IOW, we don't really need a recursion counter, another type of annotation that would tell if the expression is placed on the left or right hand side of another expression would work too.

v2: explicitly list all 'illegal' expression types instead of using a default label for them. This will raise a compiler warning to remind us to adjust the case labels in case a new expression type gets added in the future.

Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: rename recursion counter to recursion.binopFlorian Westphal2025-06-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | The existing recursion counter is used by the binop expression to detect if we've completely followed all the binops. We can only chain up to NFT_MAX_EXPR_RECURSION binops, but the evaluation step can perform constant-folding, so we must recurse until we found the rightmost (last) binop in the chain. Then we can check the post-eval chain to see if it is something that can be serialized later (i.e., if we are within the NFT_MAX_EXPR_RECURSION after constant folding) or not. Thus we can't reuse the existing ctx->recursion counter for other expressions; entering the initial expr_evaluate_binop with ctx->recursion > 0 would break things. Therefore rename this to an embedded structure. This allows us to add a new recursion counter in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink: Pass netlink_ctx to netlink_delinearize_setelem()Phil Sutter2025-05-251-3/+3
| | | | | | | | | Prepare for calling netlink_io_error() which needs the context pointer. Trade this in for the cache pointer since no caller uses a special one. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove flagcmp expressionPablo Neira Ayuso2025-03-272-14/+0
| | | | | | | | | | | | | | This expression is not used anymore, since: ("src: transform flag match expression to binop expression from parser") remove it. This completes the revert of c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations"), except the parser chunk for backwards compatibility. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: transform flag match expression to binop expression from parserPablo Neira Ayuso2025-03-271-0/+1
Transform flagcmp expression to a relational with binop on the left hand side, ie.

         relational
          /      \
       binop     value
       /   \
  payload  mask

Add list_expr_to_binop() to make this transformation. Goal is two-fold:

- Allow -o/--optimize to pick up on this representation.
- Remove the flagcmp expression in a follow up patch.

This prepares for the removal of the flagcmp expression added by: c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations")

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* expression: add __EXPR_MAX and use it to define EXPR_MAXPablo Neira Ayuso2025-03-271-2/+2
EXPR_MAX was never updated to the newest expression, add __EXPR_MAX and use it to define EXPR_MAX. Add a case to expr_ops(), otherwise gcc complains with a warning that the __EXPR_MAX case is not handled. Fixes: 347039f64509 ("src: add symbol range expression to further compact intervals") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
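The sentinel pattern looks roughly like this. The enum below is a shortened, illustrative subset (the real enum has many more entries) and `expr_name_sketch()` is a hypothetical stand-in for the switch in expr_ops().

```c
#include <assert.h>

/* Illustrative subset of an expression type enum. __EXPR_MAX is a
 * sentinel that always follows the newest entry. */
enum expr_types_sketch {
	EXPR_INVALID,
	EXPR_VALUE,
	EXPR_RANGE,
	EXPR_RANGE_SYMBOL,	/* newest entry: nothing else to update */
	__EXPR_MAX
};
#define EXPR_MAX (__EXPR_MAX - 1)

/* Compile-time check that EXPR_MAX tracks the last real entry. */
static_assert(EXPR_MAX == EXPR_RANGE_SYMBOL, "EXPR_MAX must be the newest type");

static const char *expr_name_sketch(enum expr_types_sketch t)
{
	switch (t) {
	case EXPR_INVALID:	return "invalid";
	case EXPR_VALUE:	return "value";
	case EXPR_RANGE:	return "range";
	case EXPR_RANGE_SYMBOL:	return "range symbol";
	case __EXPR_MAX:	break;	/* listed so -Wswitch stays quiet */
	}
	return "unknown";
}
```

Because the switch has no default label, adding a new enumerator without a matching case still produces a compiler warning, while EXPR_MAX follows along automatically.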
* src: replace struct stmt_ops by type field in struct stmtPablo Neira Ayuso2025-03-185-2/+14
Shrink struct stmt by 8 bytes. __stmt_ops_by_type() provides an operation for STMT_INVALID since this is required by -o/--optimize. There are many checks for stmt->ops->type, which is the most accessed field, that can be trivially replaced. BUG() uses statement type enum instead of name. Similar to: 68e76238749f ("src: expr: add and use expr_name helper"). 72931553828a ("src: expr: add expression etype") 2cc91e6198e7 ("src: expr: add and use internal expr_ops helper") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
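A rough sketch of the approach, under assumptions: the statement stores only its type and the operations are looked up on demand. Apart from STMT_INVALID, which the message mentions, the enum entries, the `stmt_ops` contents and `stmt_ops_sketch()` are illustrative and do not mirror the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative statement types; only STMT_INVALID is named in the text. */
enum stmt_types_sketch { STMT_INVALID, STMT_EXPRESSION, STMT_VERDICT };

struct stmt_ops_sketch {
	const char *name;
};

/* The statement carries a small type field instead of an ops pointer. */
struct stmt_sketch {
	enum stmt_types_sketch type;
};

static const struct stmt_ops_sketch invalid_ops = { .name = "invalid" };
static const struct stmt_ops_sketch expr_stmt_ops = { .name = "expression" };
static const struct stmt_ops_sketch verdict_ops = { .name = "verdict" };

/* Look up the operations from the type. STMT_INVALID gets an entry too,
 * mirroring the note that -o/--optimize requires one. */
static const struct stmt_ops_sketch *stmt_ops_sketch(const struct stmt_sketch *stmt)
{
	assert(stmt != NULL);
	switch (stmt->type) {
	case STMT_INVALID:	return &invalid_ops;
	case STMT_EXPRESSION:	return &expr_stmt_ops;
	case STMT_VERDICT:	return &verdict_ops;
	}
	return NULL;
}
```

Checks that used to read `stmt->ops->type` become a direct comparison on `stmt->type`, with no pointer dereference.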
* src: print set element with multi-word description in single one linePablo Neira Ayuso2025-03-182-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the set element: - represents a mapping - has a timeout - has a comment - has counter/quota/limit - concatenation (already printed in a single line before this patch) ie. if the set element requires several words, then print it in one single line. Before this patch: table ip x { set y { typeof ip saddr counter elements = { 192.168.10.35 counter packets 0 bytes 0, 192.168.10.101 counter packets 0 bytes 0, 192.168.10.135 counter packets 0 bytes 0 } } } After this patch: table ip x { set y { typeof ip saddr counter elements = { 192.168.10.35 counter packets 0 bytes 0, 192.168.10.101 counter packets 0 bytes 0, 192.168.10.135 counter packets 0 bytes 0 } } } Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: support for bitfield payload statement with binary operationPablo Neira Ayuso2025-03-071-0/+1
Add a new function to deal with payload statement delinearization with binop expression. Infer the payload offset from the mask, then walk the template list to determine if estimated offset falls within a matching header field. If so, then validate that this is not a raw expression but an actual bitfield matching. Finally, trim the payload expression length accordingly and adjust the payload offset. Instead of: @nh,8,5 set 0x0 it displays: ip dscp and 0x1 Update tests/py to cover for this enhancement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tcpopt: add symbol table for mptcp suboptionsFlorian Westphal2025-03-061-1/+4
nft can be used to match on specific multipath tcp subtypes:

  tcp option mptcp subtype 0

However, depending on which subtype to match, users need to look up the type/value to use in rfc8684. Add support for mnemonics and "nft describe tcp option mptcp subtype" to get the subtype list.

Because the number of unique 'enum datatypes' is limited by ABI constraints this adds a new mptcp suboption type as integer alias.

After this patch, nft supports all of the following:

  add element t s { mp-capable }
  add rule t c tcp option mptcp subtype mp-capable
  add rule t c tcp option mptcp subtype { mp-capable, mp-fail }

For the 3rd case, listing will break because unlike for named sets, nft lacks the type information needed to pretty-print the integer values, i.e. nft will print the 3rd rule as 'subtype { 0, 6 }'. This is resolved in a followup patch.

Other problematic constructs are:

  set s1 { typeof tcp option mptcp subtype . ip saddr elements = { mp-fail . 1.2.3.4 } }

Followed by:

  tcp option mptcp subtype . ip saddr @s1

nft will print this as:

  tcp option mptcp unknown & 240) >> 4 . ip saddr @s1

All of these issues are not related to this patch, however, they also occur with other bit-sized extheader fields.

Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add symbol range expression to further compact intervalsPablo Neira Ayuso2025-02-211-2/+11
Update parser to use a new symbol range expression with smaller memory footprint than range expression + two symbol expressions. The evaluation step translates this into EXPR_RANGE_VALUE for interval sets. Note that maps or concatenations still use the less compact range expressions representation, those require more work to use this new symbol range expression. The parser also uses the classic range expression if variables are used. Testing with 100k intervals, worst case scenario: no prefix or singleton elements. This shows a reduction from 49.58 Mbytes to 35.47 Mbytes (-29.56% memory footprint for this case). This is follow-up work to previous commits: 91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption") c9ee9032b0ee ("src: add EXPR_RANGE_VALUE expression and use it") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add and use payload_expr_trim_forceFlorian Westphal2025-02-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous commit fixed erroneous handling of raw expressions when RHS sets a zero value. Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0 Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \ @ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000 After this patch, this will instead display: @ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0 payload_expr_trim_force() only works when the payload has no known protocol (template) attached, i.e. will be printed as raw payload syntax. It performs sanity checks on @mask and then adjusts the payload expression length and offset according to the mask. Also add this check in __binop_postprocess() so we can also discard masks when matching, e.g. '@ih,7,5 2' becomes '@ih,7,5 0x2', not '@ih,0,16 & 0xffc0 == 0x20'. binop_postprocess now returns if it performed an action or not; if this returns true then arguments might have been freed so callers must no longer refer to any of the expressions attached to the binop. Next patch adds test cases for this. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
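The offset and length adjustment can be illustrated with the three Input/Output pairs quoted above. This is a sketch derived from those numbers only, not the nftables implementation: `struct bitfield` and `trim_from_mask()` are hypothetical, the mask is treated as a plain number whose most significant bit lines up with the first payload bit, and the field is taken to be the single contiguous run of bits that the mask clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw payload reference: bit offset and bit length. */
struct bitfield {
	unsigned int offset;
	unsigned int len;
};

/* Bit i of the mask, counting from the most significant of len bits. */
static bool mask_bit(uint32_t mask, unsigned int len, unsigned int i)
{
	return (mask >> (len - 1 - i)) & 1;
}

/* Shrink f to the one contiguous run of zero bits in mask. Returns
 * false and leaves f untouched if there is no such single run. */
static bool trim_from_mask(struct bitfield *f, uint32_t mask)
{
	unsigned int i = 0, lead, zeros;

	assert(f->len > 0 && f->len <= 32);

	while (i < f->len && mask_bit(mask, f->len, i))
		i++;			/* leading ones: bits kept as-is */
	lead = i;
	while (i < f->len && !mask_bit(mask, f->len, i))
		i++;			/* zero run: the field being written */
	zeros = i - lead;
	for (; i < f->len; i++) {
		if (!mask_bit(mask, f->len, i))
			return false;	/* second zero run: not one field */
	}
	if (zeros == 0)
		return false;

	f->offset += lead;
	f->len = zeros;
	return true;
}
```

Applied to `@ih,48,16` with mask 0xffc0 this yields offset 58 and length 6, matching the `@ih,58,6` form shown in the message.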
* src: rework singleton interval transformation to reduce memory consumptionPablo Neira Ayuso2025-01-103-1/+12
set_to_intervals() expands range expressions into a list of singleton elements before building the netlink message that is sent to the kernel. This is because the kernel expects this list of singleton elements where EXPR_F_INTERVAL_END denotes a closing interval. This expansion significantly increases memory consumption in userspace. This patch updates the logic to transform the range expression up to two temporary singleton element expressions through setelem_to_interval(). Then, these two elements are used to allocate the nftnl_set_elem objects through alloc_nftnl_setelem_interval() to build the netlink message, finally all these temporary objects are released. For anonymous sets, when adjacent ranges are found, the end element is not added to the set to pack the set representation as in the original set_to_intervals() routine. After this update, set_to_intervals() only deals with adding the non-matching all zero element to the interval set when it is not there as the kernel expects. In combination with the new EXPR_RANGE_VALUE expression, this shrinks runtime userspace memory consumption from 70.50 Mbytes to 43.38 Mbytes for a 100k intervals set sample. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* rule: constify set_is_non_concat_range()Pablo Neira Ayuso2025-01-101-1/+1
| | | | | | This is read-only, constify it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add EXPR_RANGE_VALUE expression and use itPablo Neira Ayuso2025-01-101-0/+13
A set element with a range takes 4 instances of struct expr:

  EXPR_SET_ELEM -> EXPR_RANGE -> (2) EXPR_VALUE

where EXPR_RANGE represents two references to struct expr with constant value. This new EXPR_RANGE_VALUE trims it down to two expressions:

  EXPR_SET_ELEM -> EXPR_RANGE_VALUE

with two direct low and high values that represent the range:

  struct { mpz_t low; mpz_t high; };

These two new direct values in struct expr do not modify its size.

setelem_expr_to_range() translates EXPR_RANGE to EXPR_RANGE_VALUE, this conversion happens at a later stage. constant_range_expr_print() translates this structure to constant values to reuse the existing datatype_print() which relies on singleton values. The automerge routine has been updated to use EXPR_RANGE_VALUE.

This requires a follow up patch to rework the conversion from range expression to singleton element to provide a noticeable memory consumption reduction.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: shrink line_offset in struct location to 4 bytesPablo Neira Ayuso2025-01-021-2/+1
| | | | | | | | | line_offset of 2^32 bytes should be enough. This requires the removal of the last_line field (in a previous patch) to shrink struct expr to 112 bytes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove last_line from struct locationPablo Neira Ayuso2025-01-021-1/+0
| | | | | | | | This 4 bytes field is never used, remove it. This does not shrink struct location in x86_64 due to alignment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove unused token_offset from struct locationPablo Neira Ayuso2025-01-021-1/+0
| | | | | | | | | This saves 8 bytes in x86_64 in struct location which is embedded in every expression. This shrinks struct expr to 120 bytes according to pahole. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>