summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* netlink_delinearize: also postprocess OP_AND in set element contextFlorian Westphal2022-08-051-1/+3
| | | | | | | | | | | | | Pablo reports: add rule netdev nt y update @macset { vlan id timeout 5s } listing still shows the raw expression: update @macset { @ll,112,16 & 0xfff timeout 5s } so also cover the 'set element' case. Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* proto: track full stack of seen l2 protocols, not just cumulative offsetFlorian Westphal2022-08-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | For input, a cumulative size counter of all pushed l2 headers is enough, because we have the full expression tree available to us. For delinearization we need to track all seen l2 headers, else we lose information that we might need at a later time. Consider: rule netdev nt nc set update ether saddr . vlan id during delinearization, the vlan proto_desc replaces the ethernet one, and by the time we try to split the concatenation apart we will search the ether saddr offset vs. the templates for proto_vlan. This replaces the offset with an array that stores the protocol descriptions seen. Then, if the payload offset is larger than our description, search the l2 stack and adjust the offset until we're within the expected offset boundary. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: postprocess binary ands in concatenationsFlorian Westphal2022-08-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | Input: update ether saddr . vlan id timeout 5s @macset ether saddr . vlan id @macset Before this patch, gets rendered as: update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s } @ll,48,48 . @ll,112,16 & 0xfff @macset After this, listing will show: update @macset { @ll,48,48 . vlan id timeout 5s } @ll,48,48 . vlan id @macset The @ll, ... is due to vlan description replacing the ethernet one, so payload decode fails to take the concatenation apart (the ethernet header payload info is matched vs. vlan template). This will be adjusted by a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de>
* cache: prepare nft_cache_evaluate() to return errorPablo Neira Ayuso2022-07-181-2/+3
| | | | | | | Move flags as parameter reference and add list of error messages to prepare for sanity checks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* scanner: don't pop active flex scanner scopeFlorian Westphal2022-06-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can pop a flex scope that is still active, i.e. the scanner_pop_start_cond() for the scope has not been done. Example: counter ipsec out ip daddr 192.168.1.2 counter name "ipsec_out" Here, parser fails because 'daddr' is parsed as STRING, not as DADDR token. Bug is as follows: COUNTER changes scope to COUNTER. (COUNTER). Next, IPSEC scope gets pushed, stack is: COUNTER, IPSEC. Then, the 'COUNTER' scope close happens. Because active scope has changed, we cannot pop (we would pop the 'ipsec' scope in flex). The pop operation gets delayed accordingly. Next, IP gets pushed, stack is: COUNTER, IPSEC, IP, plus the information that one scope closure/pop was delayed. Then, the IP scope is closed. Because a pop operation was delayed, we pop again, which brings us back to COUNTER state. This is bogus: The pop operation CANNOT be done yet, because the ipsec scope is still open, but the existing code lacks the information to detect this. After popping the IP scope, we must remain in IPSEC scope until bison parser calls scanner_pop_start_cond(, IPSEC). This adds a counter per flex scope so that we can detect this case. In above case, after the IP scope gets closed, the "new" (previous) scope (IPSEC) will be treated as active and its close is attempted again on the next call to scanner_pop_start_cond(). After this patch, transition in above rule is: push counter (COUNTER) push IPSEC (COUNTER, IPSEC) pop COUNTER (delayed: COUNTER, IPSEC, pending-pop for COUNTER), push IP (COUNTER, IPSEC, IP, pending-pop for COUNTER) pop IP (COUNTER, IPSEC, pending-pop for COUNTER) parse DADDR (we're in IPSEC scope, its valid token) pop IPSEC (pops all remaining scopes). We could also resurrect the commit: "scanner: flags: move to own scope", the test case passes with the new scope closure logic. Fixes: bff106c5b277 ("scanner: add support for scope nesting") Signed-off-by: Florian Westphal <fw@strlen.de>
* mnl: store netlink error location for set elementsPablo Neira Ayuso2022-06-271-4/+5
| | | | | | | | | | | | | | | | | Store set element location in the per-command netlink error location array. This allows for fine grain error reporting when adding and deleting elements. # nft -f test.nft test.nft:5:4-20: Error: Could not process rule: File exists 00:01:45:09:0b:26 : drop, ^^^^^^^^^^^^^^^^^ test.nft contains a large map with one redundant entry. Thus, users do not have to find the needle in the stack. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove NFT_NLATTR_LOC_MAX limit for netlink location error reportingPablo Neira Ayuso2022-06-271-5/+8
| | | | | | | Set might have more than 16 elements, use a runtime array to store netlink error location. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* intervals: Do not sort cached set elements over and over againPhil Sutter2022-06-191-0/+1
| | | | | | | | | | | | | | | | | | | | When adding element(s) to a non-empty set, code merged the two lists and sorted the result. With many individual 'add element' commands this causes substantial overhead. Make use of the fact that existing_set->init is sorted already, sort only the list of new elements and use list_splice_sorted() to merge the two sorted lists. Add set_sort_splice() and use it for set element overlap detection and automerge. A test case adding ~25k elements in individual commands completes in about 1/4th of the time with this patch applied. Joint work with Pablo. Fixes: 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* rule: collapse set element commandsPablo Neira Ayuso2022-06-192-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Robots might generate a long list of singleton element commands such as: add element t s { 1.0.1.0/24 } ... add element t s { 1.0.2.0/23 } collapse them into one single command before the evaluation step, ie. add element t s { 1.0.1.0/24, ..., 1.0.2.0/23 } this speeds up overlap detection and set element automerge operations in this worst case scenario. Since 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements"), the new interval tracking relies on mergesort. The pattern above triggers the set sorting for each element. This patch adds a list to cmd objects that store collapsed commands. Moreover, expressions also contain a reference to the original command, to uncollapse the commands after the evaluation step. These commands are uncollapsed after the evaluation step to ensure error reporting works as expected (command and netlink message are mapped 1:1). For the record: - nftables versions <= 1.0.2 did not perform any kind of overlap check for the described scenario above (because set cache only contained elements in the kernel in this case). This is a problem for kernels < 5.7 which rely on userspace to detect overlaps. - the overlap detection could be skipped for kernels >= 5.7. - The extended netlink error reporting available for set elements since 5.19-rc might allow to remove the uncollapse step, in this case, error reporting does not rely on the netlink sequence to refer to the command triggering the problem. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Revert "scanner: flags: move to own scope"Florian Westphal2022-06-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | Excess nesting of scanner scopes is very fragile and error prone: rule `iif != lo ip daddr 127.0.0.1/8 counter limit rate 1/second log flags all prefix "nft_lo4 " drop` fails with `Error: No symbol type information` hinting at `prefix` Problem is that we nest via: counter limit log flags By the time 'prefix' is scanned, state is still stuck in 'counter' due to this nesting. Working around "prefix" isn't enough, any other keyword, e.g. "level" in 'flags all level debug' will be parsed as 'string' too. So, revert this. Fixes: a16697097e2b ("scanner: flags: move to own scope") Reported-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* intervals: support to partial deletion with automergePablo Neira Ayuso2022-04-132-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Splice the existing set element cache with the elements to be deleted and merge sort it. The elements to be deleted are identified by the EXPR_F_REMOVE flag. The set elements to be deleted is automerged in first place if the automerge flag is set on. There are four possible deletion scenarios: - Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set. - Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b. - Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a. - Split range, eg. delete [a-b] from range [x-y] where x < a and b < y. Update nft_evaluate() to use the safe list variant since new commands are dynamically registered to the list to update ranges. This patch also restores the set element existence check for Linux kernels <= 5.7. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* intervals: add support to automerge with kernel elementsPablo Neira Ayuso2022-04-131-1/+2
| | | | | | | | | | | | | | | | | | Extend the interval codebase to support for merging elements in the kernel with userspace element updates. Add a list of elements to be purged to cmd and set objects. These elements representing outdated intervals are deleted before adding the updated ranges. This routine splices the list of userspace and kernel elements, then it mergesorts to identify overlapping and contiguous ranges. This splice operation is undone so the set userspace cache remains consistent. Incrementally update the elements in the cache, this allows to remove dd44081d91ce ("segtree: Fix add and delete of element in same batch"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: update mnl_nft_setelem_del() to allow for more reusePablo Neira Ayuso2022-04-131-1/+2
| | | | | | Pass handle and element list as parameters to allow for code reuse. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove rbtree datastructurePablo Neira Ayuso2022-04-132-99/+0
| | | | | | Not used by anyone anymore, remove it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace interval segment tree overlap and automergePablo Neira Ayuso2022-04-134-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a rewrite of the segtree interval codebase. This patch now splits the original set_to_interval() function in three routines: - add set_automerge() to merge overlapping and contiguous ranges. The elements, expressed either as single value, prefix and ranges are all first normalized to ranges. This elements expressed as ranges are mergesorted. Then, there is a linear list inspection to check for merge candidates. This code only merges elements in the same batch, ie. it does not merge elements in the kernela and the userspace batch. - add set_overlap() to check for overlapping set elements. Linux kernel >= 5.7 already checks for overlaps, older kernels still needs this code. This code checks for two conflict types: 1) between elements in this batch. 2) between elements in this batch and kernelspace. The elements in the kernel are temporarily merged into the list of elements in the batch to check for this overlaps. The EXPR_F_KERNEL flag allows us to restore the set cache after the overlap check has been performed. - set_to_interval() now only transforms set elements, expressed as range e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END flag notation to represent e.g. [a,b+1), where b+1 has the EXPR_F_INTERVAL_END flag set on. More relevant updates: - The overlap and automerge routines are now performed in the evaluation phase. - The userspace set object representation now stores a reference to the existing kernel set object (in case there is already a set with this same name in the kernel). This is required by the new overlap and automerge approach. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add EXPR_F_KERNEL to identify expression in the kernelPablo Neira Ayuso2022-04-131-0/+2
| | | | | | This allows to identify the set elements that reside in the kernel. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* include: add missing `#include`Jeremy Sowden2022-04-051-0/+1
| | | | | | | datatype.h uses bool and so should include <stdbool.h>. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: allow to use typeof of raw expressions in set declarationPablo Neira Ayuso2022-03-292-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set length and byteorder. Add missing information to the set userdata area for raw payload expressions which allows to rebuild the set typeof from the listing path. A few examples: - With anonymous sets: nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e } - With named sets: table x { set y { typeof ip saddr . @ih,32,32 elements = { 1.1.1.1 . 0x14 } } } Incremental updates are also supported, eg. nft add element x y { 3.3.3.3 . 0x28 } expr_evaluate_concat() is used to evaluate both set key definitions and set key values, using two different function might help to simplify this code in the future. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* scanner: dup, fwd, tproxy: Move to own scopesPhil Sutter2022-03-011-0/+3
| | | | | | With these three scopes in place, keyword 'to' may be isolated. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: meta: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | | This allows to isolate 'length' and 'protocol' keywords shared by other scopes as well. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: at: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | | Modification of raw TCP option rule is a bit more complicated to avoid pushing tcp_hdr_option_type into the introduced scope by accident. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: nat: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | | | | | Unify nat, masquerade and redirect statements, they widely share their syntax. Note the workaround of adding "prefix" to SCANSTATE_IP. This is required to fix for 'snat ip prefix ...' style expressions. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: policy: move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | Isolate 'performance' and 'memory' keywords. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: flags: move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | This isolates at least 'constant', 'dynamic' and 'all' keywords. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: reject: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | Two more keywords isolated. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: import, export: Move to own scopesPhil Sutter2022-03-011-0/+2
| | | | | | | In theory, one could use a common scope for both import and export commands, their parameters are identical. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: reset: move to own ScopePhil Sutter2022-03-011-0/+1
| | | | | | Isolate two more keywords shared with list command. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: monitor: Move to own ScopePhil Sutter2022-03-011-0/+1
| | | | | | Some keywords are shared with list command. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: type: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | As a side-effect, this fixes for use of 'classid' as set data type. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: dst, frag, hbh, mh: Move to own scopesPhil Sutter2022-03-011-0/+4
| | | | | | | These are the remaining IPv6 extension header expressions, only rt expression was scoped already. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: ah, esp: Move to own scopesPhil Sutter2022-03-011-0/+2
| | | | | | They share 'sequence' keyword with icmp and tcp expressions. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: osf: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | It shares two keywords with PARSER_SC_IP. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: dccp, th: Move to own scopesPhil Sutter2022-03-011-0/+2
| | | | | | | With them in place, heavily shared keywords 'sport' and 'dport' may be isolated. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: udp{,lite}: Move to own scopePhil Sutter2022-03-011-0/+2
| | | | | | | All used keywords are shared with others, so no separation for now apart from 'csumcov' which was actually missing from scanner.l. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: comp: Move to own scope.Phil Sutter2022-03-011-0/+1
| | | | | | Isolates only 'cpi' keyword for now. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: synproxy: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | Quite a few keywords are shared with PARSER_SC_TCP. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: igmp: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | | At least isolates 'mrt' and 'group' keywords, the latter is shared with log statement. Signed-off-by: Phil Sutter <phil@nwl.cc>
* scanner: icmp{,v6}: Move to own scopePhil Sutter2022-03-011-0/+1
| | | | | | Unify the two, header fields are almost identical. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: add tcp option reset supportFlorian Westphal2022-02-282-0/+11
| | | | | | | This allows to replace a tcp option with nops, similar to the TCPOPTSTRIP feature of iptables. Signed-off-by: Florian Westphal <fw@strlen.de>
* json: add flow statement json export + parserFlorian Westphal2022-02-071-0/+2
| | | | | | | | | | | flow statement has no export, its shown as: ".. }, "flow add @ft" ] } }" With this patch: ".. }, {"flow": {"op": "add", "flowtable": "@ft"}}]}}" Signed-off-by: Florian Westphal <fw@strlen.de>
* src: store more than one payload dependencyJeremy Sowden2022-01-151-7/+6
| | | | | | | | Change the payload-dependency context to store a dependency for every protocol layer. This allows us to eliminate more redundant protocol expressions. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add a helper that returns a payload dependency for a particular baseJeremy Sowden2022-01-151-0/+2
| | | | | | | | | | | Currently, with only one base and dependency stored this is superfluous, but it will become more useful when the next commit adds support for storing a payload for every base. Remove redundant `ctx->pbase` check. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: 'nft list chain' prints anonymous chains correctlyPablo Neira Ayuso2022-01-152-0/+4
| | | | | | | | | If the user is requesting a chain listing, e.g. nft list chain x y and a rule refers to an anonymous chain that cannot be found in the cache, then fetch such anonymous chain and its ruleset. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: do not use the nft_cache_filter object from mnl.cPablo Neira Ayuso2022-01-151-1/+1
| | | | | | Pass the table and chain strings to mnl_nft_rule_dump() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add ruleset optimization infrastructurePablo Neira Ayuso2022-01-152-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new -o/--optimize option to enable ruleset optimization. You can combine this option with the dry run mode (--check) to review the proposed ruleset updates without actually loading the ruleset, e.g. # nft -c -o -f ruleset.test Merging: ruleset.nft:16:3-37: ip daddr 192.168.0.1 counter accept ruleset.nft:17:3-37: ip daddr 192.168.0.2 counter accept ruleset.nft:18:3-37: ip daddr 192.168.0.3 counter accept into: ip daddr { 192.168.0.1, 192.168.0.2, 192.168.0.3 } counter packets 0 bytes 0 accept This infrastructure collects the common statements that are used in rules, then it builds a matrix of rules vs. statements. Then, it looks for common statements in consecutive rules which allows to merge rules. This ruleset optimization always performs an implicit dry run to validate that the original ruleset is correct. Then, on a second pass, it performs the ruleset optimization and add the rules into the kernel (unless --check has been specified by the user). From libnftables perspective, there is a new API to enable this feature: uint32_t nft_ctx_get_optimize(struct nft_ctx *ctx); void nft_ctx_set_optimize(struct nft_ctx *ctx, uint32_t flags); This patch adds support for the first optimization: Collapse a linear list of rules matching on a single selector into a set as exposed in the example above. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: error reporting with -f and read from stdinPablo Neira Ayuso2022-01-151-0/+2
| | | | | | | | | | | | | | | | | | | | | Reading from stdin requires to store the ruleset in a buffer so error reporting works accordingly, eg. # cat ruleset.nft | nft -f - /dev/stdin:3:13-13: Error: unknown identifier 'x' ip saddr $x ^ The error reporting infrastructure performs a fseek() on the file descriptor which does not work in this case since the data from the descriptor has been already consumed. This patch adds a new stdin input descriptor to perform this special handling which consists on re-routing this request through the buffer functions. Fixes: 935f82e7dd49 ("Support 'nft -f -' to read from stdin") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* erec: expose print_location() and line_location()Pablo Neira Ayuso2022-01-152-1/+5
| | | | | | | Add a few helper functions to reuse code in the new rule optimization infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ipopt: drop unused 'ptr' argumentFlorian Westphal2021-12-071-1/+1
| | | | | | | | | Its always 0, so remove it. Looks like this was intended to support variable options that have array-like members, but so far this isn't implemented, better remove dead code and implement it properly when such support is needed. Signed-off-by: Florian Westphal <fw@strlen.de>
* cache: Support filtering for a specific flowtablePhil Sutter2021-12-032-1/+3
| | | | | | | | | | Extend nft_cache_filter to hold a flowtable name so 'list flowtable' command causes fetching the requested flowtable only. Dump flowtables just once instead of for each table, merely assign fetched data to tables inside the loop. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cache: Filter set list on server sidePhil Sutter2021-12-031-1/+1
| | | | | | | | | Fetch either all tables' sets at once, a specific table's sets or even a specific set if needed instead of iterating over the list of previously fetched tables and fetching for each, then ignoring anything returned that doesn't match the filter. Signed-off-by: Phil Sutter <phil@nwl.cc>