summaryrefslogtreecommitdiffstats
path: root/src/evaluate.c
Commit message (Collapse)AuthorAgeFilesLines
...
* src: expand table command before evaluationPablo Neira Ayuso2023-02-241-39/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nested syntax notation results in one single table command which includes all other objects. This differs from the flat notation where there is usually one command per object. This patch adds a previous step to the evaluation phase to expand the objects that are contained in the table into independent commands, so both notations have similar representations. Remove the code to evaluate the nested representation in the evaluation phase since commands are independently evaluated after the expansion. The commands are expanded after the set element collapse step, in case that there is a long list of singleton element commands to be added to the set, to shorten the command list iteration. This approach also avoids interference with the object cache that is populated in the evaluation, which might refer to objects coming in the existing command list that is being processed. There is still a post_expand phase to detach the elements from the set which could be consolidated by updating the evaluation step to handle the CMD_OBJ_SETELEMS command type. This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that contains rules") which broke rule addition/insertion by index because the expansion code after the evaluation messes up the cache. Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: infer family from mappingPablo Neira Ayuso2023-02-211-5/+40
| | | | | | | If the key in the nat mapping is either ip or ip6, then set the nat family accordingly, no need for explicit family in the nat statement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: print error on missing family in nat statementPablo Neira Ayuso2023-02-211-3/+29
| | | | | | | | | | | Print error message in case family cannot be inferred, before this patch, $? shows 1 after nft execution but no error message was printed. While at it, update error reporting for consistency in similar use cases. Fixes: e5c9c8fe0bcc ("evaluate: stmt_evaluate_nat_map() only if stmt->nat.ipportmap == true") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: relax type-checking for integer arguments in mark statementsJeremy Sowden2023-02-071-2/+6
| | | | | | | | | | | | In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base-type of the argument is integer and that the argument is small enough to fit. Add one testcase for tests/py. Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
* src: add support to command "destroy"Fernando F. Mancera2023-02-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | "destroy" command performs a deletion as "delete" command but does not fail if the object does not exist. As there is no NLM_F_* flag for ignoring such error, it needs to be ignored directly on error handling. Example of use: # nft list ruleset table ip filter { chain output { } } # nft destroy table ip missingtable # echo $? 0 # nft list ruleset table ip filter { chain output { } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: set eval ctx for add/update statements with integer constantsFlorian Westphal2023-01-261-2/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric reports that nft asserts when using integer basetype constants with 'typeof' sets. Example: table netdev t { set s { typeof ether saddr . vlan id flags dynamic,timeout } chain c { } } loads fine. But adding a rule with add/update statement fails: nft 'add rule netdev t c set update ether saddr . 0 @s' nft: netlink_linearize.c:867: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed. When the 'ether saddr . 0' concat expression is processed, there is no set definition available anymore to deduce the required size of the integer constant. nft eval step then derives the required length using the data types. '0' has integer basetype, so the deduced length is 0. The assertion triggers because serialization step finds that it needs one more register. 2 are needed to store the ethernet address, another register is needed for the vlan id. Update eval step to make the expression context store the set key information when processing the preceeding set reference, then let stmt_evaluate_set() preserve the existing context instead of zeroing it again via stmt_evaluate_arg(). This makes concat expression evaluation compute the total size needed based on the sets key definition. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* Implement 'reset rule' and 'reset rules' commandsPhil Sutter2023-01-181-0/+2
| | | | | | | | Reset rule counters and quotas in kernel, i.e. without having to reload them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET message type. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: add gre supportPablo Neira Ayuso2023-01-021-12/+31
| | | | | | | | | | | | | GRE has a number of fields that are conditional based on flags, which requires custom dependency code similar to icmp and icmpv6. Matching on optional fields is not supported at this stage. Since this is a layer 3 tunnel protocol, an implicit dependency on NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this patch adds new infrastructure to remove an outer dependency based on the inner protocol from delinearize path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: display (inner) tag in --debug=proto-ctxPablo Neira Ayuso2023-01-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For easier debugging, add decoration on protocol context: # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter update link layer protocol context (inner): link layer : netdev <- network layer : none transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update transport layer protocol context (inner): link layer : netdev network layer : ip transport layer : icmp <- payload data : none Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add vxlan matching supportPablo Neira Ayuso2023-01-021-9/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the initial infrastructure to support for inner header tunnel matching and its first user: vxlan. A new struct proto_desc field for payload and meta expression to specify that the expression refers to inner header matching is used. The existing codebase to generate bytecode is fully reused, allowing for reusing existing supported layer 2, 3 and 4 protocols. Syntax requires to specify vxlan before the inner protocol field: ... vxlan ip protocol udp ... vxlan ip saddr 1.2.3.0/24 This also works with concatenations and anonymous sets, eg. ... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 } You have to restrict vxlan matching to udp traffic, otherwise it complains on missing transport protocol dependency, e.g. ... udp dport 4789 vxlan ip daddr 1.2.3.4 The bytecode that is generated uses the new inner expression: # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4 netdev x y [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000011 ] [ payload load 2b @ transport header + 2 => reg 1 ] [ cmp eq reg 1 0x0000b512 ] [ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ] [ cmp eq reg 1 0x00000008 ] [ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ] [ cmp eq reg 1 0x04030201 ] JSON support is not included in this patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add eval_proto_ctx()Pablo Neira Ayuso2023-01-021-73/+115
| | | | | | | | | | | Add eval_proto_ctx() to access protocol context (struct proto_ctx). Rename struct proto_ctx field to _pctx to highlight that this field is internal and the helper function should be used. This patch comes in preparation for supporting outer and inner protocol context. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: fix shift exponent underflow in concatenation evaluationPablo Neira Ayuso2022-12-221-1/+1
| | | | | | | | | | | | | | There is an underflow of the index that iterates over the concatenation: ../include/datatype.h:292:15: runtime error: shift exponent 4294967290 is too large for 32-bit type 'unsigned int' set the datatype to invalid which is fine to evaluate a concatenation in a set/map statement. Update b8e1940aa190 ("tests: add a test case for map update from packet path with concat") so it does not need a workaround to work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_linearize: fix timeout with map updatesFlorian Westphal2022-12-121-0/+3
| | | | | | | | | | | | | | | | Map updates can use timeouts, just like with sets, but the linearization step did not pass this info to the kernel. meta l4proto tcp update @pinned { ip saddr . ct original proto-src timeout 90s : ip daddr . tcp dport Listing this won't show the "timeout 90s" because kernel never saw it to begin with. Also update evaluation step to reject a timeout that was set on the data part: Timeouts are only allowed for the key-value pair as a whole. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: fix compilation warningPablo Neira Ayuso2022-12-121-2/+2
| | | | | | | | | | | | | | | | | | | Set pointer to list of expression to NULL and check that it is set on before using it. In function ‘expr_evaluate_concat’, inlined from ‘expr_evaluate’ at evaluate.c:2488:10: evaluate.c:1338:20: warning: ‘expressions’ may be used uninitialized [-Wmaybe-uninitialized] 1338 | if (runaway) { | ^ evaluate.c: In function ‘expr_evaluate’: evaluate.c:1321:33: note: ‘expressions’ was declared here 1321 | const struct list_head *expressions; | ^~~~~~~~~~~ Reported-by: Florian Westphal <fw@strlen.de> Fixes: 508f3a270531 ("netlink: swap byteorder of value component in concatenation of intervals") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: do not crash on runaway number of concatenation componentsPablo Neira Ayuso2022-12-081-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Display error message in case user specifies more data components than those defined by the concatenation of selectors. # cat example.nft table ip x { chain y { type filter hook prerouting priority 0; policy drop; ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept } } # nft -f example.nft example.nft:4:3-22: Error: too many concatenation components ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept ~~~~~~~~~~~~~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Without this patch, nft crashes: ==464771==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60d000000418 at pc 0x7fbc17513aa5 bp 0x7ffc73d33c90 sp 0x7ffc73d33c88 READ of size 8 at 0x60d000000418 thread T0 #0 0x7fbc17513aa4 in expr_evaluate_concat /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1348 #1 0x7fbc1752a9da in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2476 #2 0x7fbc175175e2 in expr_evaluate_set_elem /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1504 #3 0x7fbc1752aa22 in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2482 #4 0x7fbc17512cb5 in list_member_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1310 #5 0x7fbc17518ca0 in expr_evaluate_set /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1590 [...] Fixes: 64bb3f43bb96 ("src: allow to use typeof of raw expressions in set declaration") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for selectors with different byteorder with interval concatenationsPablo Neira Ayuso2022-11-301-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assuming the following interval set with concatenation: set test { typeof ip saddr . meta mark flags interval } then, the following rule: ip saddr . meta mark @test requires bytecode that swaps the byteorder for the meta mark selector in case the set contains intervals and concatenations. inet x y [ meta load nfproto => reg 1 ] [ cmp eq reg 1 0x00000002 ] [ payload load 4b @ network header + 12 => reg 1 ] [ meta load mark => reg 9 ] [ byteorder reg 9 = hton(reg 9, 4, 4) ] <----- this is required ! [ lookup reg 1 set test dreg 0 ] This patch updates byteorder_conversion() to add the unary expression that introduces the byteorder expression. Moreover, store the meta mark range component of the element tuple in the set in big endian as it is required for the range comparisons. Undo the byteorder swap in the netlink delinearize path to listing the meta mark values accordingly. Update tests/py to validate that byteorder expression is emitted in the bytecode. Update tests/shell to validate insertion and listing of a named map declaration. A similar commit 806ab081dc9a ("netlink: swap byteorder for host-endian concat data") already exists in the tree to handle this for strings with prefix (e.g. eth*). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: datatype memleak after binop transferPablo Neira Ayuso2022-10-061-1/+0
| | | | | | | | | | | | | | The following ruleset: ip version vmap { 4 : jump t3, 6 : jump t4 } results in a memleak. expr_evaluate_shift() overrides the datatype which results in a datatype memleak after the binop transfer that triggers a left-shift of the constant (in the map). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: bogus datatype assertion in binary operation evaluationPablo Neira Ayuso2022-10-061-1/+1
| | | | | | | | | | Use datatype_equal(), otherwise dynamically allocated datatype fails to fulfill the datatype pointer check, triggering the assertion: nft: evaluate.c:1249: expr_evaluate_binop: Assertion `expr_basetype(left) == expr_basetype(right)' failed. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1636 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: add ethernet header size offset for implicit vlan dependencyFlorian Westphal2022-09-291-1/+19
| | | | | | | | | | 'vlan id 1' must also add a ethernet header dep, else nft fetches the payload from header offset 0 instead of 14. Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: allow implicit ether -> vlan depFlorian Westphal2022-09-281-0/+1
| | | | | | | | | | nft add rule inet filter input vlan id 2 Error: conflicting protocols specified: ether vs. vlan Refresh the current dependency after superseding the dummy dependency to make this work. Signed-off-by: Florian Westphal <fw@strlen.de>
* doc, src: make some spelling and grammatical improvementsJeremy Sowden2022-09-221-2/+2
| | | | | | | | | | | | | Fix a couple of spelling mistakes: 'expresion' -> 'expression' and correct some non-native usages: 'allows to' -> 'allows one to' Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: un-break rule insert with intervalsFlorian Westphal2022-09-201-0/+1
| | | | | | | | | 'rule inet dscpclassify dscp_match meta l4proto { udp } th dport { 3478 } th sport { 3478-3497, 16384-16387 } goto ct_set_ef' works with 'nft add', but not 'nft insert', the latter yields: "BUG: unhandled op 4". Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Don't parse string as verdict in mapXiao Liang2022-08-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In verdict map, string values are accidentally treated as verdicts. For example: table t { map foo { type ipv4_addr : verdict elements = { 192.168.0.1 : bar } } chain output { type filter hook output priority mangle; ip daddr vmap @foo } } Though "bar" is not a valid verdict (should be "jump bar" or something), the string is taken as the element value. Then NFTA_DATA_VALUE is sent to the kernel instead of NFTA_DATA_VERDICT. This would be rejected by recent kernels. On older ones (e.g. v5.4.x) that don't validate the type, a warning can be seen when the rule is hit, because of the corrupted verdict value: [5120263.467627] WARNING: CPU: 12 PID: 303303 at net/netfilter/nf_tables_core.c:229 nft_do_chain+0x394/0x500 [nf_tables] Indeed, we don't parse verdicts during evaluation, but only chain names, which is of type string rather than verdict. For example, "jump $var" is a verdict while "$var" is a string. Fixes: c64457cff967 ("src: Allow goto and jump to a variable") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: search stacked header list for matching payload depFlorian Westphal2022-08-051-6/+15
| | | | | | | | | | | | | | "ether saddr 0:1:2:3:4:6 vlan id 2" works, but reverse fails: "vlan id 2 ether saddr 0:1:2:3:4:6" will give Error: conflicting protocols specified: vlan vs. ether After "proto: track full stack of seen l2 protocols, not just cumulative offset", we have a list of all l2 headers, so search those to see if we had this proto base in the past before rejecting this. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* proto: track full stack of seen l2 protocols, not just cumulative offsetFlorian Westphal2022-08-051-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | For input, a cumulative size counter of all pushed l2 headers is enough, because we have the full expression tree available to us. For delinearization we need to track all seen l2 headers, else we lose information that we might need at a later time. Consider: rule netdev nt nc set update ether saddr . vlan id during delinearization, the vlan proto_desc replaces the ethernet one, and by the time we try to split the concatenation apart we will search the ether saddr offset vs. the templates for proto_vlan. This replaces the offset with an array that stores the protocol descriptions seen. Then, if the payload offset is larger than our description, search the l2 stack and adjust the offset until we're within the expected offset boundary. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: report missing interval flag when using prefix/range in concatenationPablo Neira Ayuso2022-07-071-5/+20
| | | | | | | | If set declaration is missing the interval flag, and user specifies an element with either prefix or range, then bail out. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1592 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: fix segfault when adding elements to invalid setPeter Tirsek2022-06-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding elements to a set or map with an invalid definition causes nft to segfault. The following nftables.conf triggers the crash: flush ruleset create table inet filter set inet filter foo {} add element inet filter foo { foobar } Simply parsing and checking the config will trigger it: $ nft -c -f nftables.conf.crash Segmentation fault The error in the set/map definition is correctly caught and queued, but because the set is invalid and does not contain a key type, adding to it causes a NULL pointer dereference of set->key within setelem_evaluate(). I don't think it's necessary to queue another error since the underlying problem is correctly detected and reported when parsing the definition of the set. Simply checking the validity of set->key before using it seems to fix it, causing the error in the definition of the set to be reported properly. The element type error isn't caught, but that seems reasonable since the key type is invalid or unknown anyway: $ ./nft -c -f ~/nftables.conf.crash /home/pti/nftables.conf.crash:3:21-21: Error: set definition does not specify key set inet filter foo {} ^ [ Add tests to cover this case --pablo ] Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1597 Signed-off-by: Peter Tirsek <peter@tirsek.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: reset ctx->set after set interval evaluationPablo Neira Ayuso2022-06-011-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise bogus error reports on set datatype mismatch might occur, such as: Error: datatype mismatch, expected Internet protocol, expression has type IPv4 address meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1 ~~~~~~~~~~~~ ^^^^^^^^^^^^ with an unrelated set declaration. table ip test { set set_with_interval { type ipv4_addr flags interval } chain prerouting { type nat hook prerouting priority dstnat; policy accept; meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1 } } This bug has been introduced in the evaluation step. Reported-by: Roman Petrov <nwhisper@gmail.com> Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge)" Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: fix always-true assertionsFlorian Westphal2022-04-261-1/+1
| | | | | | | assert(1) is a no-op, this should be assert(0). Use BUG() instead. Add missing CATCHALL to avoid BUG(). Signed-off-by: Florian Westphal <fw@strlen.de>
* src: allow use of base integer types as set keys in concatenationsFlorian Westphal2022-04-181-7/+17
| | | | | | | | | | | | | | | | "typeof ip saddr . ipsec in reqid" won't work because reqid uses integer type, i.e. dtype->size is 0. With "typeof", the size can be derived from the expression length, via set->key. This computes the concat length based either on dtype->size or expression length. It also updates concat evaluation to permit a zero datatype size if the subkey expression has nonzero length (i.e., typeof was used). Signed-off-by: Florian Westphal <fw@strlen.de>
* intervals: support to partial deletion with automergePablo Neira Ayuso2022-04-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Splice the existing set element cache with the elements to be deleted and merge sort it. The elements to be deleted are identified by the EXPR_F_REMOVE flag. The set elements to be deleted is automerged in first place if the automerge flag is set on. There are four possible deletion scenarios: - Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set. - Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b. - Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a. - Split range, eg. delete [a-b] from range [x-y] where x < a and b < y. Update nft_evaluate() to use the safe list variant since new commands are dynamically registered to the list to update ranges. This patch also restores the set element existence check for Linux kernels <= 5.7. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: allow for zero length rangesPablo Neira Ayuso2022-04-131-1/+1
| | | | | | | | | Allow for ranges such as, eg. 30-30. This is required by the new intervals.c code, which normalize constant, prefix set elements to all ranges. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* intervals: add support to automerge with kernel elementsPablo Neira Ayuso2022-04-131-3/+5
| | | | | | | | | | | | | | | | | | Extend the interval codebase to support for merging elements in the kernel with userspace element updates. Add a list of elements to be purged to cmd and set objects. These elements representing outdated intervals are deleted before adding the updated ranges. This routine splices the list of userspace and kernel elements, then it mergesorts to identify overlapping and contiguous ranges. This splice operation is undone so the set userspace cache remains consistent. Incrementally update the elements in the cache, this allows to remove dd44081d91ce ("segtree: Fix add and delete of element in same batch"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: replace interval segment tree overlap and automergePablo Neira Ayuso2022-04-131-3/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a rewrite of the segtree interval codebase. This patch now splits the original set_to_interval() function in three routines: - add set_automerge() to merge overlapping and contiguous ranges. The elements, expressed either as single value, prefix and ranges are all first normalized to ranges. This elements expressed as ranges are mergesorted. Then, there is a linear list inspection to check for merge candidates. This code only merges elements in the same batch, ie. it does not merge elements in the kernela and the userspace batch. - add set_overlap() to check for overlapping set elements. Linux kernel >= 5.7 already checks for overlaps, older kernels still needs this code. This code checks for two conflict types: 1) between elements in this batch. 2) between elements in this batch and kernelspace. The elements in the kernel are temporarily merged into the list of elements in the batch to check for this overlaps. The EXPR_F_KERNEL flag allows us to restore the set cache after the overlap check has been performed. - set_to_interval() now only transforms set elements, expressed as range e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END flag notation to represent e.g. [a,b+1), where b+1 has the EXPR_F_INTERVAL_END flag set on. More relevant updates: - The overlap and automerge routines are now performed in the evaluation phase. - The userspace set object representation now stores a reference to the existing kernel set object (in case there is already a set with this same name in the kernel). This is required by the new overlap and automerge approach. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: string prefix expression must retain original lengthFlorian Westphal2022-04-131-1/+3
| | | | | | | | | | | | | | | | To make something like "eth*" work for interval sets (match eth0, eth1, and so on...) we must treat the string as a 128 bit integer. Without this, segtree will do the wrong thing when applying the prefix, because we generate the prefix based on 'eth*' as input, with a length of 3. The correct import needs to be done on "eth\0\0\0\0\0\0\0...", i.e., if the input buffer were an ipv6 address, it should look like "eth\0::", not "::eth". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: keep prefix expression lengthFlorian Westphal2022-04-131-0/+1
| | | | | | | | | | | | | Else, range_expr_value_high() will see a 0 length when doing: mpz_init_bitmask(tmp, expr->len - expr->prefix_len); This wasn't a problem so far because prefix expressions generated from "string*" were never passed down to the prefix->range conversion functions. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: make byteorder conversion on string base type a no-opFlorian Westphal2022-04-131-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prerequisite for support of interface names in interval sets: table inet filter { set s { type ifname flags interval elements = { "foo" } } chain input { type filter hook input priority filter; policy accept; iifname @s counter } } Will yield: "Byteorder mismatch: meta expected big endian, got host endian". This is because of: /* Data for range lookups needs to be in big endian order */ if (right->set->flags & NFT_SET_INTERVAL && byteorder_conversion(ctx, &rel->left, BYTEORDER_BIG_ENDIAN) < 0) It doesn't make sense to me to add checks to all callers of byteorder_conversion(), so treat this similar to EXPR_CONCAT and turn TYPE_STRING byteorder change into a no-op. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow to use integer type header fields via typeof set declarationPablo Neira Ayuso2022-03-291-1/+1
| | | | | | | | | | | | | | | Header fields such as udp length cannot be used in concatenations because it is using the generic integer_type: test.nft:3:10-19: Error: can not use variable sized data types (integer) in concat expressions typeof udp length . @th,32,32 ^^^^^^^^^^~~~~~~~~~~~~ This patch slightly extends ("src: allow to use typeof of raw expressions in set declaration") to set on NFTNL_UDATA_SET_KEY_PAYLOAD_LEN in userdata if TYPE_INTEGER is used. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow to use typeof of raw expressions in set declarationPablo Neira Ayuso2022-03-291-12/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set length and byteorder. Add missing information to the set userdata area for raw payload expressions which allows to rebuild the set typeof from the listing path. A few examples: - With anonymous sets: nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e } - With named sets: table x { set y { typeof ip saddr . @ih,32,32 elements = { 1.1.1.1 . 0x14 } } } Incremental updates are also supported, eg. nft add element x y { 3.3.3.3 . 0x28 } expr_evaluate_concat() is used to evaluate both set key definitions and set key values, using two different function might help to simplify this code in the future. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: copy field_count for anonymous object maps as wellFlorian Westphal2022-03-211-11/+17
| | | | | | | | | | without this test fails with: W: [FAILED] tests/shell/testcases/maps/anon_objmap_concat: got 134 BUG: invalid range expression type concat nft: expression.c:1452: range_expr_value_low: Assertion `0' failed. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: init cmd pointer for new on-stack contextFlorian Westphal2022-03-041-0/+1
| | | | | | | else, this will segfault when trying to print the "table 'x' doesn't exist" error message. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add tcp option reset supportFlorian Westphal2022-02-281-0/+7
| | | | | | | This allows to replace a tcp option with nops, similar to the TCPOPTSTRIP feature of iptables. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: attempt to set_eval flag if dynamic updates requestedFlorian Westphal2022-01-111-0/+10
| | | | | | | | | | | | | When passing no upper size limit, the dynset expression forces an internal 64k upperlimit. In some cases, this can result in 'nft -f' to restore the ruleset. Avoid this by always setting the EVAL flag on a set definition when we encounter packet-path update attempt in the batch. Reported-by: Yi Chen <yiche@redhat.com> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: reject: support ethernet as L2 protocol for inet tableJeremy Sowden2021-12-151-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are evaluating a `reject` statement in the `inet` family, we may have `ether` and `ip` or `ip6` as the L2 and L3 protocols in the evaluation context: table inet filter { chain input { type filter hook input priority filter; ether saddr aa:bb:cc:dd:ee:ff ip daddr 192.168.0.1 reject } } Since no `reject` option is given, nft attempts to infer one and fails: BUG: unsupported familynft: evaluate.c:2766:stmt_evaluate_reject_inet_family: Assertion `0' failed. Aborted The reason it fails is that the ethernet protocol numbers for IPv4 and IPv6 (`ETH_P_IP` and `ETH_P_IPV6`) do not match `NFPROTO_IPV4` and `NFPROTO_IPV6`. Add support for the ethernet protocol numbers. Replace the current `BUG("unsupported family")` error message with something more informative that tells the user to provide an explicit reject option. Add a Python test case. Fixes: 5fdd0b6a0600 ("nft: complete reject support") Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001360 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: correct typo'sJeremy Sowden2021-12-151-2/+2
| | | | | | | There are a couple of mistakes in comments. Fix them. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Fix payload statement mask on Big EndianPhil Sutter2021-11-301-2/+2
| | | | | | | | The mask used to select bits to keep must be exported in the same byteorder as the payload statement itself, also the length of the exported data must match the number of bytes extracted earlier. Signed-off-by: Phil Sutter <phil@nwl.cc>
* evaluate: grab reference in set expression evaluationPablo Neira Ayuso2021-11-081-2/+2
| | | | | | | Do not clone expression when evaluation a set expression, grabbing the reference counter to reuse the object is sufficient. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: clone variable expression if there is more than one referencePablo Neira Ayuso2021-11-081-1/+10
| | | | | | | | | Clone the expression that defines the variable value if there are multiple references to it in the ruleset. This saves heap memory consumption in case the variable defines a set with a huge number of elements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: raw payload match and mangle on inner header / payload dataPablo Neira Ayuso2021-11-081-0/+3
| | | | | | | | | | | | | | | This patch adds support to match on inner header / payload data: # nft add rule x y @ih,32,32 0x14000000 counter you can also mangle payload data: # nft add rule x y @ih,32,32 set 0x14000000 counter This update triggers a checksum update at the layer 4 header via csum_flags, mangling odd bytes is also aligned to 16-bits. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: postpone transport protocol match check after nat expression ↵Pablo Neira Ayuso2021-11-031-6/+7
| | | | | | | | | evaluation Fix bogus error report when using transport protocol as map key. Fixes: 50780456a01a ("evaluate: check for missing transport protocol match in nat map with concatenations") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>