summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* src: add input flags for nft_ctxThomas Haller2023-08-242-0/+8
| | | | | | | | | | | | Similar to the existing output flags, add input flags. No flags are yet implemented, that will follow. One difference to nft_ctx_output_set_flags(), is that the setter for input flags returns the previously set flags. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ct expectation: fix 'list object x' vs. 'list objects in table' confusionFlorian Westphal2023-07-311-0/+1
| | | | | | | | | | Just like "ct timeout", "ct expectation" is in need of the same fix, we get segfault on "nft list ct expectation table t", if table t exists. This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion"). Signed-off-by: Florian Westphal <fw@strlen.de>
* include: missing dccpopt.h breaks make distcheckPablo Neira Ayuso2023-07-141-0/+1
| | | | | | Add it to Makefile.am. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Implement 'reset {set,map,element}' commandsPhil Sutter2023-07-133-4/+9
| | | | | | | | | | | All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values. While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cli: Make cli_init() return to callerPhil Sutter2023-07-041-1/+1
| | | | | | | | | | | | | | | Avoid direct exit() calls as that leaves the caller-allocated nft_ctx object in place. Making sure it is freed helps with valgrind-analyses at least. To signal desired exit from CLI, introduce global cli_quit boolean and make all cli_exit() implementations also set cli_rc variable to the appropriate return code. The logic is to finish CLI only if cli_quit is true which asserts proper cleanup as it is set only by the respective cli_exit() function. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: avoid IPPROTO_MAX for array definitionsFlorian Westphal2023-06-211-1/+1
| | | | | | | | | | | | | | | ip header can only accomodate 8but value, but IPPROTO_MAX has been bumped due to uapi reasons to support MPTCP (262, which is used to toggle on multipath support in tcp). This results in: exthdr.c:349:11: warning: result of comparison of constant 263 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare] if (type < array_size(exthdr_protocols)) ~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ redude array sizes back to what can be used on-wire. Signed-off-by: Florian Westphal <fw@strlen.de>
* ct timeout: fix 'list object x' vs. 'list objects in table' confusionFlorian Westphal2023-06-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | <empty ruleset> $ nft list ct timeout table t Error: No such file or directory list ct timeout table t ^ This is expected to list all 'ct timeout' objects. The failure is correct, the table 't' does not exist. But now lets add one: $ nft add table t $ nft list ct timeout table t Segmentation fault (core dumped) ... and thats not expected, nothing should be shown and nft should exit normally. Because of missing TIMEOUTS command enum, the backend thinks it should do an object lookup, but as frontend asked for 'list of objects' rather than 'show this object', handle.obj.name is NULL, which then results in this crash. Update the command enums so that backend knows what the frontend asked for. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add json support for last statementPablo Neira Ayuso2023-06-201-0/+2
| | | | | | | | | | This patch adds json support for the last statement, it works for me here. However, tests/py still displays a warning: any/last.t: WARNING: line 12: '{"nftables": [{"add": {"rule": {"family": "ip", "table": "test-ip4", "chain": "input", "expr": [{"last": {"used": 300000}}]}}}]}': '[{"last": {"used": 300000}}]' mismatches '[{"last": null}]' Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* exthdr: add boolean DCCP option matchingJeremy Sowden2023-06-013-0/+45
| | | | | | | | | | Iptables supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend exthdr expressions to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: support bpf id decode in nft list hooksFlorian Westphal2023-05-221-3/+21
| | | | | | | | | | | This allows 'nft list hooks' to also display the bpf program id attached. Example: hook input { -0000000128 nf_hook_run_bpf id 6 .. Signed-off-by: Florian Westphal <fw@strlen.de>
* datatype: add hint error handlerPablo Neira Ayuso2023-05-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | If user provides a symbol that cannot be parsed and the datatype provides an error handler, provide a hint through the misspell infrastructure. For instance: # cat test.nft table ip x { map y { typeof ip saddr : verdict elements = { 1.2.3.4 : filter_server1 } } } # nft -f test.nft test.nft:4:26-39: Error: Could not parse netfilter verdict; did you mean `jump filter_server1'? elements = { 1.2.3.4 : filter_server1 } ^^^^^^^^^^^^^^ While at it, normalize error to "Could not parse symbolic %s expression". Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* meta: introduce meta broute supportSriram Yagnaraman2023-04-291-0/+2
| | | | | | | | | | | Can be used in bridge prerouting hook to divert a packet to the ip stack for routing. This is a replacement for "ebtables -t broute" functionality. Link: https://patchwork.ozlabs.org/project/netfilter-devel/patch/20230224095251.11249-1-sriram.yagnaraman@est.tech/ Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: fix enum/integer mismatchesFlorian Westphal2023-04-292-2/+2
| | | | | | | | | | | | | | | | | | | gcc 13 complains about type confusion: cache.c:1178:5: warning: conflicting types for 'nft_cache_update' due to enum/integer mismatch; have 'int(struct nft_ctx *, unsigned int, struct list_head *, const struct nft_cache_filter *)' [-Wenum-int-mismatch] cache.h:74:5: note: previous declaration of 'nft_cache_update' with type 'int(struct nft_ctx *, enum cmd_ops, struct list_head *, const struct nft_cache_filter *)' Same for: rule.c:1915:13: warning: conflicting types for 'obj_type_name' due to enum/integer mismatch; have 'const char *(enum stmt_types)' [-Wenum-int-mismatch] 1915 | const char *obj_type_name(enum stmt_types type) | ^~~~~~~~~~~~~ expression.c:1543:24: warning: conflicting types for 'expr_ops_by_type' due to enum/integer mismatch; have 'const struct expr_ops *(uint32_t)' {aka 'const struct expr_ops *(unsigned int)'} [-Wenum-int-mismatch] 1543 | const struct expr_ops *expr_ops_by_type(uint32_t value) | ^~~~~~~~~~~~~~~~ Convert to the stricter type (enum) where possible. Signed-off-by: Florian Westphal <fw@strlen.de>
* mnl: set SO_SNDBUF before SO_SNDBUFFORCEPablo Neira Ayuso2023-04-242-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set SO_SNDBUF before SO_SNDBUFFORCE: Unpriviledged user namespace does not have CAP_NET_ADMIN on the host (user_init_ns) namespace. SO_SNDBUF always succeeds in Linux, always try SO_SNDBUFFORCE after it. Moreover, suggest the user to bump socket limits if EMSGSIZE after having see EPERM previously, when calling SO_SNDBUFFORCE. Provide a hint to the user too: # nft -f test.nft netlink: Error: Could not process rule: Message too long Please, rise /proc/sys/net/core/wmem_max on the host namespace. Hint: 4194304 bytes Dave Pfike says: Prior to this patch, nft inside a systemd-nspawn container was failing to install my ruleset (which includes a large-ish map), with the error netlink: Error: Could not process rule: Message too long strace reveals: setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted) This is despite the nspawn process supposedly having CAP_NET_ADMIN. A web search reveals at least one other user having the same issue: https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/ Reported-by: Dave Pifke <dave@pifke.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: support shifts larger than the width of the left operandPablo Neira Ayuso2023-03-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus: add rule t c meta mark set ip dscp << 2 works, but: add rule t c meta mark set ip dscp << 8 does not, even though the lvalue is large enough to accommodate the result. Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than expression lvalue. Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand. Based on patch from Jeremy Sowden. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: fix a couple of typo's in commentsJeremy Sowden2023-03-121-1/+1
| | | | | Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
* cmd: move command functions to src/cmd.cPablo Neira Ayuso2023-03-112-6/+6
| | | | | | Move several command functions to src/cmd.c to debloat src/rule.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add last statementPablo Neira Ayuso2023-02-282-0/+11
| | | | | | | | | | | | | | | | | | | | | This new statement allows you to know how long ago there was a matching packet. # nft list ruleset table ip x { chain y { [...] ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64 } } if this statement never sees a packet, then the listing says: ip protocol icmp last used never counter packets 0 bytes 0 Add tests/py in this patch too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: expand table command before evaluationPablo Neira Ayuso2023-02-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nested syntax notation results in one single table command which includes all other objects. This differs from the flat notation where there is usually one command per object. This patch adds a previous step to the evaluation phase to expand the objects that are contained in the table into independent commands, so both notations have similar representations. Remove the code to evaluate the nested representation in the evaluation phase since commands are independently evaluated after the expansion. The commands are expanded after the set element collapse step, in case that there is a long list of singleton element commands to be added to the set, to shorten the command list iteration. This approach also avoids interference with the object cache that is populated in the evaluation, which might refer to objects coming in the existing command list that is being processed. There is still a post_expand phase to detach the elements from the set which could be consolidated by updating the evaluation step to handle the CMD_OBJ_SETELEMS command type. This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that contains rules") which broke rule addition/insertion by index because the expansion code after the evaluation messes up the cache. Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: use start condition with new destroy commandPablo Neira Ayuso2023-02-211-0/+1
| | | | | | | | | | | | | tests/py reports the following problem: any/ct.t: ERROR: line 116: add rule ip test-ip4 output ct event set new | related | destroy | label: This rule should not have failed. any/ct.t: ERROR: line 117: add rule ip test-ip4 output ct event set new,related,destroy,label: This rule should not have failed. any/ct.t: ERROR: line 118: add rule ip test-ip4 output ct event set new,destroy: This rule should not have failed. Use start condition and update parser to handle 'destroy' keyword. Fixes: e1dfd5cc4c46 ("src: add support to command "destroy") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add support to command "destroy"Fernando F. Mancera2023-02-062-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | "destroy" command performs a deletion as "delete" command but does not fail if the object does not exist. As there is no NLM_F_* flag for ignoring such error, it needs to be ignored directly on error handling. Example of use: # nft list ruleset table ip filter { chain output { } } # nft destroy table ip missingtable # echo $? 0 # nft list ruleset table ip filter { chain output { } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Implement 'reset rule' and 'reset rules' commandsPhil Sutter2023-01-185-1/+15
| | | | | | | | Reset rule counters and quotas in kernel, i.e. without having to reload them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET message type. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: add gretap supportPablo Neira Ayuso2023-01-021-0/+2
| | | | Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add geneve matching supportPablo Neira Ayuso2023-01-021-0/+13
| | | | | | Add support for GENEVE vni and (ether) type header field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add gre supportPablo Neira Ayuso2023-01-023-0/+17
| | | | | | | | | | | | | GRE has a number of fields that are conditional based on flags, which requires custom dependency code similar to icmp and icmpv6. Matching on optional fields is not supported at this stage. Since this is a layer 3 tunnel protocol, an implicit dependency on NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this patch adds new infrastructure to remove an outer dependency based on the inner protocol from delinearize path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: display (inner) tag in --debug=proto-ctxPablo Neira Ayuso2023-01-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For easier debugging, add decoration on protocol context: # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter update link layer protocol context (inner): link layer : netdev <- network layer : none transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update transport layer protocol context (inner): link layer : netdev network layer : ip transport layer : icmp <- payload data : none Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add vxlan matching supportPablo Neira Ayuso2023-01-026-3/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the initial infrastructure to support for inner header tunnel matching and its first user: vxlan. A new struct proto_desc field for payload and meta expression to specify that the expression refers to inner header matching is used. The existing codebase to generate bytecode is fully reused, allowing for reusing existing supported layer 2, 3 and 4 protocols. Syntax requires to specify vxlan before the inner protocol field: ... vxlan ip protocol udp ... vxlan ip saddr 1.2.3.0/24 This also works with concatenations and anonymous sets, eg. ... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 } You have to restrict vxlan matching to udp traffic, otherwise it complains on missing transport protocol dependency, e.g. ... udp dport 4789 vxlan ip daddr 1.2.3.4 The bytecode that is generated uses the new inner expression: # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4 netdev x y [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000011 ] [ payload load 2b @ transport header + 2 => reg 1 ] [ cmp eq reg 1 0x0000b512 ] [ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ] [ cmp eq reg 1 0x00000008 ] [ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ] [ cmp eq reg 1 0x04030201 ] JSON support is not included in this patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add dl_proto_ctx()Pablo Neira Ayuso2023-01-021-1/+7
| | | | | | | | | | Add dl_proto_ctx() to access protocol context (struct proto_ctx and struct payload_dep_ctx) from the delinearize path. This patch comes in preparation for supporting outer and inner protocol context. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add eval_proto_ctx()Pablo Neira Ayuso2023-01-022-1/+4
| | | | | | | | | | | Add eval_proto_ctx() to access protocol context (struct proto_ctx). Rename struct proto_ctx field to _pctx to highlight that this field is internal and the helper function should be used. This patch comes in preparation for supporting outer and inner protocol context. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* xt: Rewrite unsupported compat expression dumpingPhil Sutter2022-12-132-0/+3
| | | | | | | | | Choose a format which provides more information and is easily parseable. Then teach parsers about it and make it explicitly reject the ruleset giving a meaningful explanation. Also update the man pages with some more details. Signed-off-by: Phil Sutter <phil@nwl.cc>
* xt: Purify enum nft_xt_typePhil Sutter2022-12-131-1/+1
| | | | | | Remove NFT_XT_MAX from the enum, it is not a valid xt type. Signed-off-by: Phil Sutter <phil@nwl.cc>
* xt: Delay libxtables access until translationPhil Sutter2022-12-131-5/+4
| | | | | | | | | | | | | | | | | | There is no point in spending efforts setting up the xt match/target when it is not printed afterwards. So just store the statement data from libnftnl in struct xt_stmt and perform the extension lookup from xt_stmt_xlate() instead. This means some data structures are only temporarily allocated for the sake of passing to libxtables callbacks, no need to drag them around. Also no need to clone the looked up extension, it is needed only to call the functions it provides. While being at it, select numeric output in xt_xlate_*_params - otherwise there will be reverse DNS lookups which should not happen by default. Signed-off-by: Phil Sutter <phil@nwl.cc>
* Warn for tables with compat expressions in rulesPhil Sutter2022-11-181-0/+1
| | | | | | | | | | | | | | | | | | While being able to "look inside" compat expressions using nft is a nice feature, it is also (yet another) pitfall for unaware users, deceiving them into assuming interchangeability (or at least compatibility) between iptables-nft and nft. In reality, which involves 'nft list ruleset | nft -f -', any correctly translated compat expressions will turn into native nftables ones not understood by (the version of) iptables-nft which created them in the first place. Other compat expressions will vanish, potentially compromising the firewall ruleset. Emit a warning (as comment) to give users a chance to stop and reconsider before shooting their own foot. Signed-off-by: Phil Sutter <phil@nwl.cc>
* parser_bison: display too many levels of nesting errorPablo Neira Ayuso2022-10-071-0/+1
| | | | | | | | | | | | | Instead of hitting this assertion: nft: parser_bison.y:70: open_scope: Assertion `state->scope < array_size(state->scopes) - 1' failed. Aborted this is easier to trigger with implicit chains where one level of nesting from the existing chain scope is supported. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1615 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* include: resync nf_tables.h cache copyPablo Neira Ayuso2022-09-161-1/+27
| | | | | | Get this header in sync with nf-next as of 6.0-rc. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* expr: update EXPR_MAX and add missing commentsFlorian Westphal2022-08-301-1/+6
| | | | | | | | | | | | | | | WHen flagcmp and catchall expressions got added the EXPR_MAX definition wasn't changed. Should have no impact in practice however, this value is only checked to prevent crash when old nft release is used to list a ruleset generated by a newer nft release and a unknown 'typeof' expression. v2: Pablo suggested to add EXPR_MAX into enum so its easier to spot. Adding __EXPR_MAX + define EXPR_MAX (__EXPR_MAX - 1) causes '__EXPR_MAX not handled in switch' warnings, hence the 'EXPR_MAX =' solution. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: also postprocess OP_AND in set element contextFlorian Westphal2022-08-051-1/+3
| | | | | | | | | | | | | Pablo reports: add rule netdev nt y update @macset { vlan id timeout 5s } listing still shows the raw expression: update @macset { @ll,112,16 & 0xfff timeout 5s } so also cover the 'set element' case. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* proto: track full stack of seen l2 protocols, not just cumulative offsetFlorian Westphal2022-08-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | For input, a cumulative size counter of all pushed l2 headers is enough, because we have the full expression tree available to us. For delinearization we need to track all seen l2 headers, else we lose information that we might need at a later time. Consider: rule netdev nt nc set update ether saddr . vlan id during delinearization, the vlan proto_desc replaces the ethernet one, and by the time we try to split the concatenation apart we will search the ether saddr offset vs. the templates for proto_vlan. This replaces the offset with an array that stores the protocol descriptions seen. Then, if the payload offset is larger than our description, search the l2 stack and adjust the offset until we're within the expected offset boundary. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: postprocess binary ands in concatenationsFlorian Westphal2022-08-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | Input: update ether saddr . vlan id timeout 5s @macset ether saddr . vlan id @macset Before this patch, gets rendered as: update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s } @ll,48,48 . @ll,112,16 & 0xfff @macset After this, listing will show: update @macset { @ll,48,48 . vlan id timeout 5s } @ll,48,48 . vlan id @macset The @ll, ... is due to vlan description replacing the ethernet one, so payload decode fails to take the concatenation apart (the ethernet header payload info is matched vs. vlan template). This will be adjusted by a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de>
* cache: prepare nft_cache_evaluate() to return errorPablo Neira Ayuso2022-07-181-2/+3
| | | | | | | Move flags as parameter reference and add list of error messages to prepare for sanity checks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* scanner: don't pop active flex scanner scopeFlorian Westphal2022-06-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can pop a flex scope that is still active, i.e. the scanner_pop_start_cond() for the scope has not been done. Example: counter ipsec out ip daddr 192.168.1.2 counter name "ipsec_out" Here, parser fails because 'daddr' is parsed as STRING, not as DADDR token. Bug is as follows: COUNTER changes scope to COUNTER. (COUNTER). Next, IPSEC scope gets pushed, stack is: COUNTER, IPSEC. Then, the 'COUNTER' scope close happens. Because active scope has changed, we cannot pop (we would pop the 'ipsec' scope in flex). The pop operation gets delayed accordingly. Next, IP gets pushed, stack is: COUNTER, IPSEC, IP, plus the information that one scope closure/pop was delayed. Then, the IP scope is closed. Because a pop operation was delayed, we pop again, which brings us back to COUNTER state. This is bogus: The pop operation CANNOT be done yet, because the ipsec scope is still open, but the existing code lacks the information to detect this. After popping the IP scope, we must remain in IPSEC scope until bison parser calls scanner_pop_start_cond(, IPSEC). This adds a counter per flex scope so that we can detect this case. In above case, after the IP scope gets closed, the "new" (previous) scope (IPSEC) will be treated as active and its close is attempted again on the next call to scanner_pop_start_cond(). After this patch, transition in above rule is: push counter (COUNTER) push IPSEC (COUNTER, IPSEC) pop COUNTER (delayed: COUNTER, IPSEC, pending-pop for COUNTER), push IP (COUNTER, IPSEC, IP, pending-pop for COUNTER) pop IP (COUNTER, IPSEC, pending-pop for COUNTER) parse DADDR (we're in IPSEC scope, its valid token) pop IPSEC (pops all remaining scopes). We could also resurrect the commit: "scanner: flags: move to own scope", the test case passes with the new scope closure logic. Fixes: bff106c5b277 ("scanner: add support for scope nesting") Signed-off-by: Florian Westphal <fw@strlen.de>
* mnl: store netlink error location for set elementsPablo Neira Ayuso2022-06-271-4/+5
| | | | | | | | | | | | | | | | | Store set element location in the per-command netlink error location array. This allows for fine grain error reporting when adding and deleting elements. # nft -f test.nft test.nft:5:4-20: Error: Could not process rule: File exists 00:01:45:09:0b:26 : drop, ^^^^^^^^^^^^^^^^^ test.nft contains a large map with one redundant entry. Thus, users do not have to find the needle in the stack. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove NFT_NLATTR_LOC_MAX limit for netlink location error reportingPablo Neira Ayuso2022-06-271-5/+8
| | | | | | | Set might have more than 16 elements, use a runtime array to store netlink error location. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* intervals: Do not sort cached set elements over and over againPhil Sutter2022-06-191-0/+1
| | | | | | | | | | | | | | | | | | | | When adding element(s) to a non-empty set, code merged the two lists and sorted the result. With many individual 'add element' commands this causes substantial overhead. Make use of the fact that existing_set->init is sorted already, sort only the list of new elements and use list_splice_sorted() to merge the two sorted lists. Add set_sort_splice() and use it for set element overlap detection and automerge. A test case adding ~25k elements in individual commands completes in about 1/4th of the time with this patch applied. Joint work with Pablo. Fixes: 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* rule: collapse set element commandsPablo Neira Ayuso2022-06-192-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Robots might generate a long list of singleton element commands such as: add element t s { 1.0.1.0/24 } ... add element t s { 1.0.2.0/23 } collapse them into one single command before the evaluation step, ie. add element t s { 1.0.1.0/24, ..., 1.0.2.0/23 } this speeds up overlap detection and set element automerge operations in this worst case scenario. Since 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements"), the new interval tracking relies on mergesort. The pattern above triggers the set sorting for each element. This patch adds a list to cmd objects that store collapsed commands. Moreover, expressions also contain a reference to the original command, to uncollapse the commands after the evaluation step. These commands are uncollapsed after the evaluation step to ensure error reporting works as expected (command and netlink message are mapped 1:1). For the record: - nftables versions <= 1.0.2 did not perform any kind of overlap check for the described scenario above (because set cache only contained elements in the kernel in this case). This is a problem for kernels < 5.7 which rely on userspace to detect overlaps. - the overlap detection could be skipped for kernels >= 5.7. - The extended netlink error reporting available for set elements since 5.19-rc might allow to remove the uncollapse step, in this case, error reporting does not rely on the netlink sequence to refer to the command triggering the problem. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Revert "scanner: flags: move to own scope"Florian Westphal2022-06-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | Excess nesting of scanner scopes is very fragile and error prone: rule `iif != lo ip daddr 127.0.0.1/8 counter limit rate 1/second log flags all prefix "nft_lo4 " drop` fails with `Error: No symbol type information` hinting at `prefix` Problem is that we nest via: counter limit log flags By the time 'prefix' is scanned, state is still stuck in 'counter' due to this nesting. Working around "prefix" isn't enough, any other keyword, e.g. "level" in 'flags all level debug' will be parsed as 'string' too. So, revert this. Fixes: a16697097e2b ("scanner: flags: move to own scope") Reported-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* intervals: support to partial deletion with automergePablo Neira Ayuso2022-04-132-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Splice the existing set element cache with the elements to be deleted and merge sort it. The elements to be deleted are identified by the EXPR_F_REMOVE flag. The set elements to be deleted is automerged in first place if the automerge flag is set on. There are four possible deletion scenarios: - Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set. - Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b. - Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a. - Split range, eg. delete [a-b] from range [x-y] where x < a and b < y. Update nft_evaluate() to use the safe list variant since new commands are dynamically registered to the list to update ranges. This patch also restores the set element existence check for Linux kernels <= 5.7. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* intervals: add support to automerge with kernel elementsPablo Neira Ayuso2022-04-131-1/+2
| | | | | | | | | | | | | | | | | | Extend the interval codebase to support for merging elements in the kernel with userspace element updates. Add a list of elements to be purged to cmd and set objects. These elements representing outdated intervals are deleted before adding the updated ranges. This routine splices the list of userspace and kernel elements, then it mergesorts to identify overlapping and contiguous ranges. This splice operation is undone so the set userspace cache remains consistent. Incrementally update the elements in the cache, this allows to remove dd44081d91ce ("segtree: Fix add and delete of element in same batch"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: update mnl_nft_setelem_del() to allow for more reusePablo Neira Ayuso2022-04-131-1/+2
| | | | | | Pass handle and element list as parameters to allow for code reuse. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove rbtree datastructurePablo Neira Ayuso2022-04-132-99/+0
| | | | | | Not used by anyone anymore, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>