summaryrefslogtreecommitdiffstats
path: root/src/evaluate.c
Commit message (Collapse)AuthorAgeFilesLines
* datatype: rename "dtype_clone()" to datatype_clone()Thomas Haller2023-09-081-2/+2
| | | | | | | | | | | | | The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()". This is a follow up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: fix get element for concatenated setFlorian Westphal2023-09-061-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | given: table ip filter { set test { type ipv4_addr . ether_addr . mark flags interval elements = { 198.51.100.0/25 . 00:0b:0c:ca:cc:10-c1:a0:c1:cc:10:00 . 0x0000006f, } } } We get lookup failure: nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f } Error: Could not process rule: No such file or directory Its possible to work around this via dummy range somewhere in the key, e.g. nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f-0x6f } but that shouldn't be needed, so make sure the INTERVAL flag is enabled for the queried element if the set is of interval type. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: revisit anonymous set with single element optimizationPablo Neira Ayuso2023-09-021-20/+40
| | | | | | | | | | | | | | | | | | | | | This patch reworks it to perform this optimization from the evaluation step of the relational expression. Hence, when optimizing for protocol flags, use OP_EQ instead of OP_IMPLICIT, that is: tcp flags { syn } becomes (to represent an exact match): tcp flags == syn given OP_IMPLICIT and OP_EQ are not equivalent for flags. 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element") disabled this optimization, which is enabled again after this patch. Fixes: 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: place byteorder conversion after numgen for IP address datatypesJorge Ortiz2023-09-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | The numgen extension generates numbers in little-endian. This can be very tricky when trying to combine it with IP addresses, which use big endian. This change adds a new byteorder operation to convert data type endianness. Before this patch: $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001 ip nat snat_chain [ numgen reg 1 = inc mod 7 offset 167772161 ] [ nat snat ip addr_min reg 1 ] After this patch: $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001 ip nat snat_chain [ numgen reg 1 = inc mod 7 offset 167772161 ] [ byteorder reg 1 = hton(reg 1, 4, 4) ] [ nat snat ip addr_min reg 1 ] Regression tests have been modified to include these new cases. Signed-off-by: Jorge Ortiz Escribano <jorge.ortiz.escribano@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: simplify chain_alloc()Pablo Neira Ayuso2023-08-311-1/+1
| | | | | | | Remove parameter to set the chain name which is only used from netlink path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: do not remove anonymous set with protocol flags and single elementPablo Neira Ayuso2023-08-301-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Set lookups with flags search for an exact match, however: tcp flags { syn } gets transformed into: tcp flags syn which is matching on the syn flag only (non-exact match). This optimization is safe for ct state though, because only one bit is ever set on in the ct state bitmask. Since protocol flags allow for combining flags, skip this optimization to retain exact match semantics. Another possible solution is to turn OP_IMPLICIT into OP_EQ for exact flag match to re-introduce this optimization and deal with this corner case. Fixes: fee6bda06403 ("evaluate: remove anon sets with exactly one element") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: don't needlessly clear full string buffer in ↵Thomas Haller2023-08-291-1/+4
| | | | | | | stmt_evaluate_log_prefix() Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: rework SNPRINTF_BUFFER_SIZE() and handle truncationThomas Haller2023-08-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before, the macro asserts against truncation. This is despite the callers still checked for truncation and tried to handle it. Probably for good reason. With stmt_evaluate_log_prefix() it's not clear that the code ensures that truncation cannot happen, so we must not assert against it, but handle it. Also, - wrap the macro in "do { ... } while(0)" to make it more function-like. - evaluate macro arguments exactly once, to make it more function-like. - take pointers to the arguments that are being modified. - use assert() instead of abort(). - use size_t type for arguments related to the buffer size. - drop "size". It was mostly redundant to "offset". We can know everything we want based on "len" and "offset" alone. - "offset" previously was incremented before checking for truncation. So it would point somewhere past the buffer. This behavior does not seem useful. Instead, on truncation "len" will be zero (as before) and "offset" will point one past the buffer (one past the terminating NUL). Thereby, also fix a warning from clang: evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size = 0; ^ meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size; ^ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: fix check for truncation in stmt_evaluate_log_prefix()Thomas Haller2023-08-291-1/+1
| | | | | | | | | | | | | | | | | | | Otherwise, nft crashes with prefix longer than 127 bytes: # nft add rule x y log prefix \"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\" ==159385==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffed5bf4a10 at pc 0x7f3134839269 bp 0x7ffed5bf48b0 sp 0x7ffed5bf4060 WRITE of size 129 at 0x7ffed5bf4a10 thread T0 #0 0x7f3134839268 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:778 #1 0x7f3133e3074e in __mpz_export_data /tmp/nftables/src/gmputil.c:110 #2 0x7f3133d21d3c in expr_to_string /tmp/nftables/src/expression.c:192 #3 0x7f3133ded103 in netlink_gen_log_stmt /tmp/nftables/src/netlink_linearize.c:1148 #4 0x7f3133df33a1 in netlink_gen_stmt /tmp/nftables/src/netlink_linearize.c:1682 [...] Fixes: e76bb3794018 ('src: allow for variables in the log prefix string') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: Drop dead code from expr_evaluate_mapping()Phil Sutter2023-08-291-11/+8
| | | | | | | | | | | | | Since commit 343a51702656a ("src: store expr, not dtype to track data in sets"), set->data is allocated for object maps in set_evaluate(), all other map types have set->data initialized by the parser already, set_evaluate() also checks that. Drop the confusing check, later in the function set->data is dereferenced unconditionally. Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets") Signed-off-by: Phil Sutter <phil@nwl.cc>
* include: include <std{bool,int}.h> via <nft.h>Thomas Haller2023-08-251-1/+0
| | | | | | | | | | | | | | | | | | | | There is a minimum base that all our sources will end up needing. This is what <nft.h> provides. Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything, without having "bool" and "uint32_t" types available. Yes, this means the internal headers are not self-contained, with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely to have <nft.h> included for them. They should not include <nft.h> themselves, because <nft.h> needs always be included as first. So when an internal header would include <nft.h> it would be unnecessary, because the header is *always* included already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add <nft.h> header and include it as firstThomas Haller2023-08-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <config.h> is generated by the configure script. As it contains our feature detection, it want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define need to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first. Instead of going through all source files and include <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first). This will also allow us later to prepare some common base, like include <stdbool.h> everywhere. We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file). Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures to include <config.h> exactly once. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blockingThomas Haller2023-08-241-2/+8
| | | | | | | | | | | | | | | | | | | getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense. Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo() and only accept plain IP addresses. We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are better aware of what we expect and can generate a better error message in case of failure. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: error out on meter overlap with an existing set/map declarationPablo Neira Ayuso2023-08-231-0/+18
| | | | | | | | | | | | | | | | | | | One of the problems with meters is that they use the set/map infrastructure behind the scenes which might be confusing to users. This patch errors out in case user declares a meter whose name overlaps with an existing set/map: meter.nft:15:18-91: Error: File exists; meter ‘syn4-meter’ overlaps an existing set ‘syn4-meter’ in family inet tcp dport 22 meter syn4-meter { ip saddr . tcp dport timeout 5m limit rate 20/minute } counter accept ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An old 5.10 kernel bails out simply with EEXIST, with this patch a better hint is provided. Dynamic sets are preferred over meters these days. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ct expectation: fix 'list object x' vs. 'list objects in table' confusionFlorian Westphal2023-07-311-0/+1
| | | | | | | | | | Just like "ct timeout", "ct expectation" is in need of the same fix, we get segfault on "nft list ct expectation table t", if table t exists. This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion"). Signed-off-by: Florian Westphal <fw@strlen.de>
* Implement 'reset {set,map,element}' commandsPhil Sutter2023-07-131-0/+5
| | | | | | | | | | | All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values. While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map. Signed-off-by: Phil Sutter <phil@nwl.cc>
* evaluate: Cache looked up set for list commandsPhil Sutter2023-07-131-0/+1
| | | | | | | | | | Evaluation phase checks the given table and set exist in cache. Relieve execution phase from having to perform the lookup again by storing the set reference in cmd->set. Just have to increase the ref counter so cmd_free() does the right thing (which lacked handling of MAP and METER objects for some reason). Signed-off-by: Phil Sutter <phil@nwl.cc>
* evaluate: Merge some cases in cmd_evaluate_list()Phil Sutter2023-07-131-32/+4
| | | | | | | | The code for set, map and meter were almost identical apart from the specific last check. Fold them together and make the distinction in that spot only. Signed-off-by: Phil Sutter <phil@nwl.cc>
* evaluate: place byteorder conversion before rshift in payload statementPablo Neira Ayuso2023-07-081-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For bitfield that spans more than one byte, such as ip6 dscp, byteorder conversion needs to be done before rshift. Add unary expression for this conversion only in the case of meta and ct statements. Before this patch: # nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp' ip6 x y [ payload load 2b @ network header + 0 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ] [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <--------- incorrect [ meta set mark with reg 1 ] After this patch: # nft --debug=netlink add rule ip6 x y 'meta mark set ip6 dscp' ip6 x y [ payload load 2b @ network header + 0 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ] [ byteorder reg 1 = ntoh(reg 1, 2, 2) ] <-------- correct [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ] [ meta set mark with reg 1 ] For the matching case, binary transfer already deals with the rshift to adjust left and right hand side of the expression, the unary conversion is not needed in such case. Fixes: 8221d86e616b ("tests: py: add test-cases for ct and packet mark payload expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ct timeout: fix 'list object x' vs. 'list objects in table' confusionFlorian Westphal2023-06-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | <empty ruleset> $ nft list ct timeout table t Error: No such file or directory list ct timeout table t ^ This is expected to list all 'ct timeout' objects. The failure is correct, the table 't' does not exist. But now lets add one: $ nft add table t $ nft list ct timeout table t Segmentation fault (core dumped) ... and thats not expected, nothing should be shown and nft should exit normally. Because of missing TIMEOUTS command enum, the backend thinks it should do an object lookup, but as frontend asked for 'list of objects' rather than 'show this object', handle.obj.name is NULL, which then results in this crash. Update the command enums so that backend knows what the frontend asked for. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: do not abort when prefix map has non-map elementFlorian Westphal2023-06-201-4/+13
| | | | | | | | | | | Before: nft: evaluate.c:1849: __mapping_expr_expand: Assertion `i->etype == EXPR_MAPPING' failed. after: Error: expected mapping, not set element snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24, 10.141.12.1 } Signed-off-by: Florian Westphal <fw@strlen.de>
* exthdr: add boolean DCCP option matchingJeremy Sowden2023-06-011-0/+1
| | | | | | | | | | Iptables supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend exthdr expressions to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: permit use of constant values in set lookup keysFlorian Westphal2023-05-241-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Something like: Given: set s { type ipv4_addr . ipv4_addr . inet_service .. } something like add rule ip saddr . 1.2.3.4 . 80 @s goto c1 fails with: "Error: Can't parse symbolic invalid expressions". This fails because the relational expression first evaluates the left hand side, so when concat evaluation sees '1.2.3.4' no key context is available. Check if the RHS is a set reference, and, if so, evaluate the right hand side. This sets a pointer to the set key in the evaluation context structure which then makes the concat evaluation step parse 1.2.3.4 and 80 as ipv4 address and 16bit port number. On delinearization, extend relop postprocessing to copy the datatype from the rhs (set reference, has proper datatype according to set->key) to the lhs (concat expression). Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: set NFT_SET_EVAL flag if dynamic set already existsPablo Neira Ayuso2023-05-181-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nft reports EEXIST when reading an existing set whose NFT_SET_EVAL has been previously inferred from the ruleset. # cat test.nft table ip test { set dlist { type ipv4_addr size 65535 } chain output { type filter hook output priority filter; policy accept; udp dport 1234 update @dlist { ip daddr } counter packets 0 bytes 0 } } # nft -f test.nft # nft -f test.nft test.nft:2:6-10: Error: Could not process rule: File exists set dlist { ^^^^^ Phil Sutter says: In the first call, the set lacking 'dynamic' flag does not exist and is therefore added to the cache. Consequently, both the 'add set' command and the set statement point at the same set object. In the second call, a set with same name exists already, so the object created for 'add set' command is not added to cache and consequently not updated with the missing flag. The kernel thus rejects the NEWSET request as the existing set differs from the new one. Set on the NFT_SET_EVAL flag if the existing set sets it on. Fixes: 8d443adfcc8c1 ("evaluate: attempt to set_eval flag if dynamic updates requested") Tested-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: skip optimization if anonymous set uses stateful statementPablo Neira Ayuso2023-05-101-1/+1
| | | | | | | | fee6bda06403 ("evaluate: remove anon sets with exactly one element") introduces an optimization to remove use of sets with single element. Skip this optimization if set element contains stateful statements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: allow stateful statements with anonymous verdict mapsPablo Neira Ayuso2023-05-101-1/+2
| | | | | | | | | | | | | | Evaluation fails to accept stateful statements in verdict maps, relax the following check for anonymous sets: test.nft:4:29-35: Error: missing statement in map declaration ip saddr vmap { 127.0.0.1 counter : drop, * counter : accept } ^^^^^^^ The existing code generates correctly the counter in the anonymous verdict map. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: bail out if new flowtable does not specify hook and priorityPablo Neira Ayuso2023-04-241-1/+5
| | | | | | | | | | | | | | | | | | | | If user forgets to specify the hook and priority and the flowtable does not exist, then bail out: # cat flowtable-incomplete.nft table t { flowtable f { devices = { lo } } } # nft -f /tmp/k flowtable-incomplete.nft:2:12-12: Error: missing hook and priority in flowtable declaration flowtable f { ^ Update one existing tests/shell to specify a hook and priority. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow for updating devices on existing netdev chainPablo Neira Ayuso2023-04-241-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to add/remove devices to an existing chain: # cat ruleset.nft table netdev x { chain y { type filter hook ingress devices = { eth0 } priority 0; policy accept; } } # nft -f ruleset.nft # nft add chain netdev x y '{ devices = { eth1 }; }' # nft list ruleset table netdev x { chain y { type filter hook ingress devices = { eth0, eth1 } priority 0; policy accept; } } # nft delete chain netdev x y '{ devices = { eth0 }; }' # nft list ruleset table netdev x { chain y { type filter hook ingress devices = { eth1 } priority 0; policy accept; } } This feature allows for creating an empty netdev chain, with no devices. In such case, no packets are seen until a device is registered. This patch includes extended netlink error reporting: # nft add chain netdev x y '{ devices = { x } ; }' Error: Could not process rule: No such file or directory add chain netdev x y { devices = { x } ; } ^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: bogus missing transport protocolPablo Neira Ayuso2023-04-051-3/+8
| | | | | | | | | | | | | | | | | | | Users have to specify a transport protocol match such as meta l4proto tcp before the redirect statement, even if the redirect statement already implicitly refers to the transport protocol, for instance: test.nft:3:16-53: Error: transport protocol mapping is only valid after transport protocol match redirect to :tcp dport map { 83 : 8083, 84 : 8084 } ~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Evaluate the redirect expression before the mandatory check for the transport protocol match, so protocol context already provides a transport protocol. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: honor statement length in bitwise evaluationPablo Neira Ayuso2023-03-281-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get length from statement, instead infering it from the expression that is used to set the value. In the particular case of {ct|meta} mark, this is 32 bits. Otherwise, bytecode generation is not correct: # nft -c --debug=netlink 'add rule ip6 x y ct mark set ip6 dscp << 2 | 0x10' [ payload load 2b @ network header + 0 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ] [ byteorder reg 1 = ntoh(reg 1, 2, 1) ] [ bitwise reg 1 = ( reg 1 << 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x00000fef ) ^ 0x00000010 ] <--- incorrect! [ ct set mark with reg 1 ] the previous bitwise shift already upgraded to 32-bits (not visible from the netlink debug output above). After this patch, the last | 0x10 uses 32-bits: [ bitwise reg 1 = ( reg 1 & 0xffffffef ) ^ 0x00000010 ] note that mask 0xffffffef is used instead of 0x00000fef. Patch ("evaluate: support shifts larger than the width of the left operand") provides the statement length through eval context. Use it to evaluate the bitwise expression accordingly, otherwise bytecode is incorrect: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000' ip x y [ payload load 1b @ network header + 1 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x1e000000 ) ^ 0x000000ff ] <-- incorrect byteorder for OR [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <-- no needed for single ip dscp byte [ ct set mark with reg 1 ] Correct bytecode: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000 ip x y [ payload load 1b @ network header + 1 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ] [ bitwise reg 1 = ( reg 1 & 0x0000001e ) ^ 0xff000000 ] [ ct set mark with reg 1 ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: honor statement length in integer evaluationPablo Neira Ayuso2023-03-281-2/+8
| | | | | | | | | | | | | | Otherwise, bogus error is reported: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000' Error: Value 4278190080 exceeds valid range 0-63 add rule ip x y ct mark set ip dscp & 0x0f << 1 | 0xff000000 ^^^^^^^^^^ Use the statement length as the maximum value in the mark statement expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: set up integer type to shift expressionPablo Neira Ayuso2023-03-281-0/+1
| | | | | | | | | | Otherwise expr_evaluate_value() fails with invalid datatype: # nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1' BUG: invalid basetype invalid nft: evaluate.c:440: expr_evaluate_value: Assertion `0' failed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: relax type-checking for integer arguments in mark statementsPablo Neira Ayuso2023-03-281-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base-type of the argument is integer and that the argument is small enough to fit. Moreover, swap expression byteorder before to make it compatible with the statement byteorder, to ensure rulesets are portable. # nft --debug=netlink add rule ip t c 'meta mark set ip saddr' ip t c [ payload load 4b @ network header + 12 => reg 1 ] [ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <----------- byteorder swap [ meta set mark with reg 1 ] Based on original work from Jeremy Sowden. The following patches are required for this to work: evaluate: get length from statement instead of lhs expression evaluate: don't eval unary arguments evaluate: support shifts larger than the width of the left operand netlink_delinearize: correct type and byte-order of shifts evaluate: insert byte-order conversions for expressions between 9 and 15 bits Add one testcase for tests/py. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: don't eval unary argumentsJeremy Sowden2023-03-281-4/+2
| | | | | | | | | | | | When a unary expression is inserted to implement a byte-order conversion, the expression being converted has already been evaluated and so `expr_evaluate_unary` doesn't need to do so. This is required by {ct|meta} statements with bitwise operations, which might result in byteorder conversion of the expression. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: support shifts larger than the width of the left operandPablo Neira Ayuso2023-03-281-18/+44
| | | | | | | | | | | | | | | | | | | | | | | | | If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus: add rule t c meta mark set ip dscp << 2 works, but: add rule t c meta mark set ip dscp << 8 does not, even though the lvalue is large enough to accommodate the result. Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than expression lvalue. Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand. Based on patch from Jeremy Sowden. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: insert byte-order conversions for expressions between 9 and 15 bitsJeremy Sowden2023-03-221-1/+1
| | | | | | | | | | | Round up expression lengths when determining whether to insert a byte-order conversion. For example, if one is masking a network header which spans a byte boundary, the mask will span two bytes and so it will need to be in NBO. Fixes: bb03cbcd18a1 ("evaluate: no need to swap byte-order for values of fewer than 16 bits.") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Revert "evaluate: relax type-checking for integer arguments in mark statements"Pablo Neira Ayuso2023-03-141-6/+2
| | | | | | | | | | | This patch reverts eab3eb7f146c ("evaluate: relax type-checking for integer arguments in mark statements") since it might cause ruleset portability issues when moving a ruleset from little to big endian host (and vice-versa). Let's revert this until we agree on what to do in this case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: improve error reporting for unsupported chain typePablo Neira Ayuso2023-03-111-9/+0
| | | | | | | | | | | | | | | | 8c75d3a16960 ("Reject invalid chain priority values in user space") provides error reporting from the evaluation phase. Instead, this patch infers the error after the kernel reports EOPNOTSUPP. test.nft:3:28-40: Error: Chains of type "nat" must have a priority value above -200 type nat hook prerouting priority -300; ^^^^^^^^^^^^^ This patch also adds another common issue for users compiling their own kernels if they forget to enable CONFIG_NFT_NAT in their .config file. Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Reject invalid chain priority values in user spacePhil Sutter2023-03-101-0/+9
| | | | | | | | The kernel doesn't accept nat type chains with a priority of -200 or below. Catch this and provide a better error message than the kernel's EOPNOTSUPP. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: add last statementPablo Neira Ayuso2023-02-281-0/+1
| | | | | | | | | | | | | | | | | | | | | This new statement allows you to know how long ago there was a matching packet. # nft list ruleset table ip x { chain y { [...] ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64 } } if this statement never sees a packet, then the listing says: ip protocol icmp last used never counter packets 0 bytes 0 Add tests/py in this patch too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: expand value to range when nat mapping contains intervalsPablo Neira Ayuso2023-02-281-2/+45
| | | | | | | | | | | | | | | | | | | | | | | | | If the data in the mapping contains a range, then upgrade value to range. Otherwise, the following error is displayed: /dev/stdin:11:57-75: Error: Could not process rule: Invalid argument dnat ip to iifname . ip saddr map { enp2s0 . 10.1.1.136 : 1.1.2.69, enp2s0 . 10.1.1.1-10.1.1.135 : 1.1.2.66-1.84.236.78 } ^^^^^^^^^^^^^^^^^^^ The kernel rejects this command because userspace sends a single value while the kernel expects the range that represents the min and the max IP address to be used for NAT. The upgrade is also done when concatenation with intervals is used in the rhs of the mapping. For anonymous sets, expansion cannot be done from expr_evaluate_mapping() because the EXPR_F_INTERVAL flag is inferred from the elements. For explicit sets, this can be done from expr_evaluate_mapping() because the user already specifies the interval flag in the rhs of the map definition. Update tests/shell and tests/py to improve testing coverage in this case. Fixes: 9599d9d25a6b ("src: NAT support for intervals in maps") Fixes: 66746e7dedeb ("src: support for nat with interval concatenation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: expand table command before evaluationPablo Neira Ayuso2023-02-241-39/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nested syntax notation results in one single table command which includes all other objects. This differs from the flat notation where there is usually one command per object. This patch adds a previous step to the evaluation phase to expand the objects that are contained in the table into independent commands, so both notations have similar representations. Remove the code to evaluate the nested representation in the evaluation phase since commands are independently evaluated after the expansion. The commands are expanded after the set element collapse step, in case that there is a long list of singleton element commands to be added to the set, to shorten the command list iteration. This approach also avoids interference with the object cache that is populated in the evaluation, which might refer to objects coming in the existing command list that is being processed. There is still a post_expand phase to detach the elements from the set which could be consolidated by updating the evaluation step to handle the CMD_OBJ_SETELEMS command type. This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that contains rules") which broke rule addition/insertion by index because the expansion code after the evaluation messes up the cache. Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: infer family from mappingPablo Neira Ayuso2023-02-211-5/+40
| | | | | | | If the key in the nat mapping is either ip or ip6, then set the nat family accordingly, no need for explicit family in the nat statement. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: print error on missing family in nat statementPablo Neira Ayuso2023-02-211-3/+29
| | | | | | | | | | | Print error message in case family cannot be inferred, before this patch, $? shows 1 after nft execution but no error message was printed. While at it, update error reporting for consistency in similar use cases. Fixes: e5c9c8fe0bcc ("evaluate: stmt_evaluate_nat_map() only if stmt->nat.ipportmap == true") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: relax type-checking for integer arguments in mark statementsJeremy Sowden2023-02-071-2/+6
| | | | | | | | | | | | In order to be able to set ct and meta marks to values derived from payload expressions, we need to relax the requirement that the type of the statement argument must match that of the statement key. Instead, we require that the base-type of the argument is integer and that the argument is small enough to fit. Add one testcase for tests/py. Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
* src: add support to command "destroy"Fernando F. Mancera2023-02-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | "destroy" command performs a deletion as "delete" command but does not fail if the object does not exist. As there is no NLM_F_* flag for ignoring such error, it needs to be ignored directly on error handling. Example of use: # nft list ruleset table ip filter { chain output { } } # nft destroy table ip missingtable # echo $? 0 # nft list ruleset table ip filter { chain output { } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: set eval ctx for add/update statements with integer constantsFlorian Westphal2023-01-261-2/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric reports that nft asserts when using integer basetype constants with 'typeof' sets. Example: table netdev t { set s { typeof ether saddr . vlan id flags dynamic,timeout } chain c { } } loads fine. But adding a rule with add/update statement fails: nft 'add rule netdev t c set update ether saddr . 0 @s' nft: netlink_linearize.c:867: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed. When the 'ether saddr . 0' concat expression is processed, there is no set definition available anymore to deduce the required size of the integer constant. nft eval step then derives the required length using the data types. '0' has integer basetype, so the deduced length is 0. The assertion triggers because serialization step finds that it needs one more register. 2 are needed to store the ethernet address, another register is needed for the vlan id. Update eval step to make the expression context store the set key information when processing the preceeding set reference, then let stmt_evaluate_set() preserve the existing context instead of zeroing it again via stmt_evaluate_arg(). This makes concat expression evaluation compute the total size needed based on the sets key definition. Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de>
* Implement 'reset rule' and 'reset rules' commandsPhil Sutter2023-01-181-0/+2
| | | | | | | | Reset rule counters and quotas in kernel, i.e. without having to reload them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET message type. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: add gre supportPablo Neira Ayuso2023-01-021-12/+31
| | | | | | | | | | | | | GRE has a number of fields that are conditional based on flags, which requires custom dependency code similar to icmp and icmpv6. Matching on optional fields is not supported at this stage. Since this is a layer 3 tunnel protocol, an implicit dependency on NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this patch adds new infrastructure to remove an outer dependency based on the inner protocol from delinearize path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: display (inner) tag in --debug=proto-ctxPablo Neira Ayuso2023-01-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For easier debugging, add decoration on protocol context: # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter update link layer protocol context (inner): link layer : netdev <- network layer : none transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update network layer protocol context (inner): link layer : netdev network layer : ip <- transport layer : none payload data : none update transport layer protocol context (inner): link layer : netdev network layer : ip transport layer : icmp <- payload data : none Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>