path: root/src/netlink_delinearize.c
Commit message | Author | Age | Files | Lines
* src: disentangle ICMP code types | Pablo Neira Ayuso | 2024-04-04 | 1 | -5/+5
    Currently, ICMP{v4,v6,inet} code datatypes only describe those that are supported by the reject statement, but they can also be used for icmp code matching. Moreover, ICMP code types go hand in hand with ICMP types, that is, ICMP code symbols depend on the ICMP type.

    Thus, the output of:

        nft describe icmp_code

    looks confusing because it only displays the values that are supported by the reject statement.

    Disentangle this by adding internal datatypes for the reject statement to handle the ICMP code symbol conversion to value as well as ruleset listing.

    The existing icmp_code, icmpv6_code and icmpx_code remain in place. For backward compatibility, a parser function is defined in case an existing ruleset relies on these symbols.

    As for the manpage, move existing ICMP code tables from the DATA TYPES section to the REJECT STATEMENT section, where they really belong. But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES section because they describe that this is an 8-bit integer field.

    After this patch:

        # nft describe icmp_code
        datatype icmp_code (icmp code) (basetype integer), 8 bits
        # nft describe icmpv6_code
        datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
        # nft describe icmpx_code
        datatype icmpx_code (icmpx code) (basetype integer), 8 bits

    do not display the symbol table of the reject statement anymore.

    icmpx_code_type is not used anymore, but keep it in place for backward compatibility reasons.

    And update tests/shell accordingly.

    Fixes: 5fdd0b6a0600 ("nft: complete reject support")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: unused code in reverse cross-day meta hour range | Pablo Neira Ayuso | 2024-04-02 | 1 | -8/+4
    f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'") reverses a cross-day range expressed as "22:00"-"02:00" UTC time into != "02:00"-"22:00" so meta hour ranges work.

    Listing is however confusing, hence, 44d144cd593e ("netlink_delinearize: reverse cross-day meta hour range") introduces code to reverse a cross-day range.

    However, it also adds code to reverse a range in == to-from form (assuming OP_IMPLICIT) which is never exercised from the listing path because the range expression is not currently used; instead, two instructions (cmp gte and cmp lte) are used to represent the range. Remove this branch, otherwise a reversed notation will be used to display meta hour ranges once the range instruction is used to represent this.

    Add test for cross-day scenario in EADT timezone.

    Fixes: 44d144cd593e ("netlink_delinearize: reverse cross-day meta hour range")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: reverse cross-day meta hour range | Pablo Neira Ayuso | 2024-03-20 | 1 | -0/+22
    f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'") reverses the hour range in case a cross-day range is used, e.g.

        meta hour "03:00"-"14:00" counter accept

    which results in (Sydney, Australia AEDT time):

        meta hour != "14:00"-"03:00" counter accept

    The kernel handles time in UTC, therefore a cross-day range may not be obvious according to local time.

    The ruleset listing above is not very intuitive to the reader depending on their timezone, therefore, complete the netlink delinearize path to reverse the cross-day meta range.

    Update manpage to recommend using a range expression when matching a meta hour range. Recommend range expression for meta time and meta day too.

    Extend testcases/listing/meta_time to cover this scenario.

    Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
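    The timezone effect described in this commit can be sketched in a few lines of C. This is only an illustration, not code from netlink_delinearize.c: the helper names and the fixed UTC+11 offset used for AEDT in the usage note are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECONDS_PER_DAY 86400

/* Hypothetical helper: convert a local second-of-day to UTC.
 * utc_offset is the zone offset east of UTC, in seconds. */
static uint32_t sketch_local_to_utc(uint32_t local_sec, int32_t utc_offset)
{
	int64_t t = (int64_t)local_sec - utc_offset;

	t %= SKETCH_SECONDS_PER_DAY;
	if (t < 0)
		t += SKETCH_SECONDS_PER_DAY;

	return (uint32_t)t;
}

/* A from-to range wraps past midnight when its start is after its end. */
static bool sketch_range_crosses_midnight(uint32_t from_utc, uint32_t to_utc)
{
	return from_utc > to_utc;
}
```

    With an offset of 11 hours, local "03:00"-"14:00" becomes 16:00-03:00 in UTC, so the range wraps past midnight even though it looks ordinary in local time.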
* netlink_delinearize: restore binop syntax when listing ruleset for flags | Pablo Neira Ayuso | 2024-03-20 | 1 | -46/+19
    c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations") provides a similar syntax to iptables using a prefix representation for flag matching.

    Restore original representation using binop when listing the ruleset. The parser still accepts the prefix notation for backward compatibility.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: permit use of host-endian constant values in set lookup keys | Pablo Neira Ayuso | 2024-02-13 | 1 | -1/+4
    AFL found the following crash:

        table ip filter {
                map ipsec_in {
                        typeof ipsec in reqid . iif : verdict
                        flags interval
                }

                chain INPUT {
                        type filter hook input priority filter; policy drop;
                        ipsec in reqid . 100 @ipsec_in
                }
        }

    Which yields:

        nft: evaluate.c:1213: expr_evaluate_unary: Assertion `!expr_is_constant(arg)' failed.

    All existing test cases with constant values use big endian values, but "iif" expects host endian values.

    As raw values were not supported before, concat byteorder conversion doesn't handle constants.

    Fix this:

    1. Add constant handling so that the number is converted in-place, without unary expression.

    2. Add the inverse handling on delinearization for non-interval set types. When dissecting the concat data soup, watch for integer constants where the datatype indicates host endian integer.

    Last, extend an existing test case with the afl input to cover in/output.

    A new test case is added to test linearization, delinearization and matching.

    Based on original patch from Florian Westphal, patch subject and description written by him.

    Fixes: b422b07ab2f9 ("src: permit use of constant values in set lookup keys")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: move concat and value postprocessing to helpers | Florian Westphal | 2024-02-13 | 1 | -35/+47
    No functional changes intended.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* include: include <string.h> in <nft.h> | Thomas Haller | 2023-09-28 | 1 | -1/+0
    <string.h> provides strcmp(), as such it's very basic and used everywhere.

    Signed-off-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* datatype: fix leak and cleanup reference counting for struct datatype | Thomas Haller | 2023-09-14 | 1 | -1/+1
    Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails:

        ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
        ==118==    at 0x484682C: calloc (vg_replace_malloc.c:1554)
        ==118==    by 0x48A39DD: xmalloc (utils.c:37)
        ==118==    by 0x48A39DD: xzalloc (utils.c:76)
        ==118==    by 0x487BDFD: datatype_alloc (datatype.c:1205)
        ==118==    by 0x487BDFD: concat_type_alloc (datatype.c:1288)
        ==118==    by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
        ==118==    by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
        ==118==    by 0x488229D: stmt_evaluate (evaluate.c:4450)
        ==118==    by 0x488328E: rule_evaluate (evaluate.c:4956)
        ==118==    by 0x48ADC71: nft_evaluate (libnftables.c:552)
        ==118==    by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
        ==118==    by 0x402983: main (main.c:534)

    I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes').

    We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP.

    Fix and rework.

    - Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak:

          if (--dtype->refcnt > 0)
                  return;

      While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free.

      Let datatype_clone() return references with refcnt set to 1 and in general always be clear about where we transfer ownership (take a reference) and where we need to release it.

    - set_datatype_alloc() needs to consistently return ownership to the reference. Previously, some code paths would and others wouldn't.

    - Replace

          datatype_set(key, set_datatype_alloc(dtype, key->byteorder))

      with a __datatype_set() which takes ownership.

    Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
    Signed-off-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
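    The counting scheme argued for in this commit (every alloc/get paired with exactly one free, new objects starting at refcnt 1) might look roughly like the following. This is a minimal sketch with hypothetical names (`sketch_dtype`, `sketch_alloc`, `sketch_get`, `sketch_put`), not the actual struct datatype code; the `sketch_live` counter exists only to make the behaviour observable.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dynamically allocated datatype. */
struct sketch_dtype {
	unsigned int refcnt;
};

static unsigned int sketch_live; /* live objects, for illustration only */

/* A new object is returned with one reference owned by the caller. */
static struct sketch_dtype *sketch_alloc(void)
{
	struct sketch_dtype *d = calloc(1, sizeof(*d));

	if (d == NULL)
		abort();
	d->refcnt = 1;
	sketch_live++;
	return d;
}

/* Take an additional reference. */
static struct sketch_dtype *sketch_get(struct sketch_dtype *d)
{
	d->refcnt++;
	return d;
}

/* Drop one reference; the last one frees the object. */
static void sketch_put(struct sketch_dtype *d)
{
	if (d == NULL)
		return;
	if (--d->refcnt > 0)
		return;
	free(d);
	sketch_live--;
}

/* What happens when the same decrement runs on a refcnt of zero:
 * unsigned arithmetic wraps instead of reaching "no references". */
static unsigned int sketch_decrement_from_zero(void)
{
	unsigned int refcnt = 0;

	return --refcnt;
}
```

    Starting at 1 keeps `sketch_put()` symmetric with `sketch_alloc()` and `sketch_get()`, which is the pairing the commit message asks for; the last helper shows why a zero-initialised counter turns the first free into a leak.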
* include: include <stdlib.h> in <nft.h> | Thomas Haller | 2023-09-11 | 1 | -1/+0
    It provides malloc()/free(), which is so basic that we need it everywhere. Include via <nft.h>.

    The ultimate purpose is to define more things in <nft.h>. While it has no corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that is a very basic dependency that will be needed for that.

    Signed-off-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: remove check for NULL before calling expr_free() | Pablo Neira Ayuso | 2023-08-31 | 1 | -2/+1
    expr_free() already handles NULL pointer, remove redundant check.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* include: include <std{bool,int}.h> via <nft.h> | Thomas Haller | 2023-08-25 | 1 | -1/+0
    There is a minimum base that all our sources will end up needing. This is what <nft.h> provides.

    Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything without having "bool" and "uint32_t" types available.

    Yes, this means the internal headers are not self-contained with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely on having <nft.h> included for them. They should not include <nft.h> themselves, because <nft.h> always needs to be included first. So when an internal header would include <nft.h> it would be unnecessary, because the header is *always* included already.

    Signed-off-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add <nft.h> header and include it as first | Thomas Haller | 2023-08-25 | 1 | -0/+2
    <config.h> is generated by the configure script. As it contains our feature detection, we want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define needs to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first.

    Instead of going through all source files and including <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first).

    This will also allow us later to prepare some common base, like including <stdbool.h> everywhere.

    We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having the <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file).

    Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures that <config.h> is included exactly once.

    Signed-off-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: delinearize: copy set keytype if needed | Florian Westphal | 2023-07-27 | 1 | -0/+2
    Output before:

        add @dynmark { 0xa020304 [invalid type] timeout 1s : 0x00000002 } comment "also check timeout-gc"

    after:

        add @dynmark { 10.2.3.4 timeout 1s : 0x00000002 } comment "also check timeout-gc"

    This is a followup to 76c358ccfea0 ("src: maps: update data expression dtype based on set"), which did fix the map expression, but not the key.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: permit use of constant values in set lookup keys | Florian Westphal | 2023-05-24 | 1 | -0/+9
    Given:

        set s { type ipv4_addr . ipv4_addr . inet_service .. }

    something like

        add rule ip saddr . 1.2.3.4 . 80 @s goto c1

    fails with: "Error: Can't parse symbolic invalid expressions".

    This fails because the relational expression first evaluates the left hand side, so when concat evaluation sees '1.2.3.4' no key context is available.

    Check if the RHS is a set reference, and, if so, evaluate the right hand side.

    This sets a pointer to the set key in the evaluation context structure which then makes the concat evaluation step parse 1.2.3.4 and 80 as ipv4 address and 16bit port number.

    On delinearization, extend relop postprocessing to copy the datatype from the rhs (set reference, has proper datatype according to set->key) to the lhs (concat expression).

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: do not reset protocol context for nat protocol expression | Pablo Neira Ayuso | 2023-04-05 | 1 | -3/+1
    This patch reverts 403b46ada490 ("netlink_delinearize: kill dependency before eval of 'redirect' stmt"). Since ("evaluate: bogus missing transport protocol"), this workaround is not required anymore.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: correct type and byte-order of shifts | Jeremy Sowden | 2023-03-28 | 1 | -2/+11
    Downgrade to base type integer instead of the specific type from the expression that is used in the shift operation.

    Without this, listing a rule like:

        ct mark set ip dscp lshift 2 or 0x10

    will return:

        ct mark set ip dscp << 2 | cs2

    because the type of the OR's right operand will be transitively derived from `ip dscp`. However, this is not valid syntax:

        # nft add rule t c ct mark set ip dscp '<<' 2 '|' cs2
        Error: Could not parse integer
        add rule t c ct mark set ip dscp << 2 | cs2
                                                ^^^

    Use xinteger_type to print the output in hexadecimal.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinerize: incorrect byteorder in mark statement listing | Pablo Neira Ayuso | 2023-03-28 | 1 | -4/+14
    When using ip dscp in combination with bitwise operation:

        # nft --debug=netlink add rule ip x y 'ct mark set ip dscp | 0x4'
        ip x y
          [ payload load 1b @ network header + 1 => reg 1 ]
          [ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
          [ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
          [ bitwise reg 1 = ( reg 1 & 0xfffffffb ) ^ 0x00000004 ]
          [ ct set mark with reg 1 ]

    the listing shows the value in the incorrect byteorder:

        # nft list ruleset
        table ip x {
                chain y {
                        ct mark set ip dscp | 0x4000000
                }
        }

    Handle AND and OR operations in host byteorder.

    The following command:

        # nft --debug=netlink add rule ip6 x y 'ct mark set ip6 dscp | 0x4'
        ip6 x y
          [ payload load 2b @ network header + 0 => reg 1 ]
          [ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
          [ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
          [ byteorder reg 1 = ntoh(reg 1, 2, 1) ]
          [ bitwise reg 1 = ( reg 1 & 0xfffffffb ) ^ 0x00000004 ]
          [ ct set mark with reg 1 ]

    works fine (without requiring this patch) because there is an explicit byteorder expression.

    However, ip dscp takes only 1 byte, so it does not require the byteorder expression. Use host byteorder if the rhs of bitwise AND/OR is larger than the lhs payload expression and such expression is equal to or less than 1 byte.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
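    The wrong listing value above is what a 32-bit byte swap of the constant produces. The helper below is a generic sketch (the name `sketch_bswap32` is made up for this example and is not an nftables function) that reproduces the arithmetic.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}
```

    Swapping 0x00000004 gives 0x04000000, which is the 0x4000000 shown in the listing before the fix; treating the constant as host byteorder avoids the swap.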
* evaluate: support shifts larger than the width of the left operand | Pablo Neira Ayuso | 2023-03-28 | 1 | -2/+2
    If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus:

        add rule t c meta mark set ip dscp << 2

    works, but:

        add rule t c meta mark set ip dscp << 8

    does not, even though the lvalue is large enough to accommodate the result.

    Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than expression lvalue.

    Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand.

    Based on patch from Jeremy Sowden.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
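    As a loose analogy in plain C (not how the evaluator is implemented; the function names are invented for the example), the difference between computing the shift at the operand's width and at the destination's width is:

```c
#include <stdint.h>

/* Shift evaluated at the width of the 8-bit operand: high bits are lost. */
static uint8_t sketch_shift_narrow(uint8_t dscp, unsigned int n)
{
	return (uint8_t)(dscp << n);
}

/* Shift evaluated at the width of the 32-bit destination (e.g. a mark). */
static uint32_t sketch_shift_wide(uint8_t dscp, unsigned int n)
{
	return (uint32_t)dscp << n;
}
```

    A shift by 2 fits either way, but a shift by 8 only keeps its bits when the wider destination length is used, which is why the maximum length is taken from the statement context.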
* meta: don't crash if meta key isn't known | Florian Westphal | 2023-03-27 | 1 | -1/+3
    If an older nft version is used for dumping, 'key' might be outside of the range of known templates.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add last statement | Pablo Neira Ayuso | 2023-02-28 | 1 | -0/+14
    This new statement allows you to know how long ago there was a matching packet.

        # nft list ruleset
        table ip x {
                chain y {
                        [...]
                        ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64
                }
        }

    If this statement never sees a packet, then the listing says:

        ip protocol icmp last used never counter packets 0 bytes 0

    Add tests/py in this patch too.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: Sanitize concat data element decoding | Phil Sutter | 2023-02-21 | 1 | -1/+1
    The call to netlink_get_register() might return NULL, catch this before dereferencing the pointer.

    Fixes: db59a5c1204c9 ("netlink_delinearize: fix decoding of concat data element")
    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Acked-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: add postprocessing for payload binops | Jeremy Sowden | 2023-02-07 | 1 | -0/+39
    If a user uses a payload expression as a statement argument:

        nft add rule t c meta mark set ip dscp lshift 2 or 0x10

    we may need to undo munging during delinearization.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
* src: add gretap support | Pablo Neira Ayuso | 2023-01-02 | 1 | -1/+2
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add gre support | Pablo Neira Ayuso | 2023-01-02 | 1 | -0/+48
    GRE has a number of fields that are conditional based on flags, which requires custom dependency code similar to icmp and icmpv6. Matching on optional fields is not supported at this stage.

    Since this is a layer 3 tunnel protocol, an implicit dependency on NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this patch adds new infrastructure to remove an outer dependency based on the inner protocol from delinearize path.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: display (inner) tag in --debug=proto-ctx | Pablo Neira Ayuso | 2023-01-02 | 1 | -2/+2
    For easier debugging, add decoration on protocol context:

        # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter
        update link layer protocol context (inner):
         link layer          : netdev <-
         network layer       : none
         transport layer     : none
         payload data        : none

        update network layer protocol context (inner):
         link layer          : netdev
         network layer       : ip <-
         transport layer     : none
         payload data        : none

        update network layer protocol context (inner):
         link layer          : netdev
         network layer       : ip <-
         transport layer     : none
         payload data        : none

        update transport layer protocol context (inner):
         link layer          : netdev
         network layer       : ip
         transport layer     : icmp <-
         payload data        : none

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add vxlan matching support | Pablo Neira Ayuso | 2023-01-02 | 1 | -6/+149
    This patch adds the initial infrastructure to support inner header tunnel matching and its first user: vxlan.

    A new struct proto_desc field for payload and meta expression to specify that the expression refers to inner header matching is used.

    The existing codebase to generate bytecode is fully reused, allowing for reusing existing supported layer 2, 3 and 4 protocols.

    Syntax requires specifying vxlan before the inner protocol field:

        ... vxlan ip protocol udp
        ... vxlan ip saddr 1.2.3.0/24

    This also works with concatenations and anonymous sets, e.g.

        ... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 }

    You have to restrict vxlan matching to udp traffic, otherwise it complains about a missing transport protocol dependency, e.g.

        ... udp dport 4789 vxlan ip daddr 1.2.3.4

    The bytecode that is generated uses the new inner expression:

        # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
        netdev x y
          [ meta load l4proto => reg 1 ]
          [ cmp eq reg 1 0x00000011 ]
          [ payload load 2b @ transport header + 2 => reg 1 ]
          [ cmp eq reg 1 0x0000b512 ]
          [ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
          [ cmp eq reg 1 0x00000008 ]
          [ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
          [ cmp eq reg 1 0x04030201 ]

    JSON support is not included in this patch.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
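    The comparison constants in the bytecode dump above look byte-reversed relative to the rule text. One way to read them, assuming the dump was taken on a little-endian host (an assumption, the commit message does not say so), is as network-order wire bytes interpreted as a little-endian integer. The helper name is invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret up to four wire bytes as a little-endian integer, which is
 * how a little-endian host would print a register holding those bytes. */
static uint32_t sketch_wire_as_le(const uint8_t *bytes, size_t len)
{
	uint32_t v = 0;

	for (size_t i = 0; i < len && i < 4; i++)
		v |= (uint32_t)bytes[i] << (8 * i);

	return v;
}
```

    UDP port 4789 is 0x12b5, so its wire bytes 12 b5 read back as 0x0000b512; the bytes of 1.2.3.4 read back as 0x04030201, and the IPv4 ethertype bytes 08 00 as 0x00000008, matching the three `cmp eq` constants.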
* src: add dl_proto_ctx() | Pablo Neira Ayuso | 2023-01-02 | 1 | -51/+68
    Add dl_proto_ctx() to access protocol context (struct proto_ctx and struct payload_dep_ctx) from the delinearize path.

    This patch comes in preparation for supporting outer and inner protocol context.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: fix decoding of concat data element | Florian Westphal | 2022-12-12 | 1 | -0/+8
    It's possible to use update as follows:

        meta l4proto tcp update @pinned { ip saddr . ct original proto-src : ip daddr . ct original proto-dst }

    ... but when listing, only the first element of the concatenation is shown.

    Check if the element size is too small and parse subsequent registers as well.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: complete payload expression in payload statement | Pablo Neira Ayuso | 2022-10-31 | 1 | -3/+4
    Call payload_expr_complete() to complete payload expression in payload statement, otherwise expr->payload.desc is set to proto_unknown.

    Call stmt_payload_binop_postprocess() introduced by 50ca788ca4d0 ("netlink: decode payload statment") if payload_expr_complete() fails to provide a protocol description (e.g. ip dscp).

    Follow up patch does not allow removing a redundant payload dependency if proto_unknown is used to deal with the raw payload expression case.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: do not transfer binary operation to non-anonymous sets | Pablo Neira Ayuso | 2022-10-12 | 1 | -0/+3
    Michael Braun says:

    This results for nft list ruleset in

        nft: netlink_delinearize.c:1945: binop_adjust_one: Assertion `value->len >= binop->right->len' failed.

    This is due to binop_adjust_one setting value->len to left->len, which is shorter than right->len.

    Additionally, it does not seem correct to alter set elements from parsing a rule, so remove that part altogether.

    Reported-by: Michael Braun <michael-dev@fami-braun.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* monitor: Sanitize startup race condition | Phil Sutter | 2022-09-30 | 1 | -1/+4
    During startup, 'nft monitor' first fetches the current ruleset and then keeps this cache up to date based on received events. This is racy, as any ruleset changes in between the initial fetch and the socket opening are not recognized.

    This script demonstrates the problem:

    | #!/bin/bash
    |
    | while true; do
    |     nft flush ruleset
    |     iptables-nft -A FORWARD
    | done &
    | maniploop=$!
    |
    | trap "kill $maniploop; kill \$!; wait" EXIT
    |
    | while true; do
    |     nft monitor rules >/dev/null &
    |     sleep 0.2
    |     kill $!
    | done

    If the table add event is missed, the rule add event callback fails to deserialize the rule and calls abort().

    Avoid the inconvenient program exit by returning NULL from netlink_delinearize_rule() instead of aborting and make callers check the return value.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
* netlink_delinearize: also postprocess OP_AND in set element context | Florian Westphal | 2022-08-05 | 1 | -0/+2
    Pablo reports:

        add rule netdev nt y update @macset { vlan id timeout 5s }

    listing still shows the raw expression:

        update @macset { @ll,112,16 & 0xfff timeout 5s }

    so also cover the 'set element' case.

    Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* proto: track full stack of seen l2 protocols, not just cumulative offset | Florian Westphal | 2022-08-05 | 1 | -5/+0
    For input, a cumulative size counter of all pushed l2 headers is enough, because we have the full expression tree available to us.

    For delinearization we need to track all seen l2 headers, else we lose information that we might need at a later time.

    Consider:

        rule netdev nt nc set update ether saddr . vlan id

    During delinearization, the vlan proto_desc replaces the ethernet one, and by the time we try to split the concatenation apart we will search the ether saddr offset vs. the templates for proto_vlan.

    This replaces the offset with an array that stores the protocol descriptions seen.

    Then, if the payload offset is larger than our description, search the l2 stack and adjust the offset until we're within the expected offset boundary.

    Reported-by: Eric Garver <eric@garver.life>
    Signed-off-by: Florian Westphal <fw@strlen.de>
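    The search over the l2 stack described above can be pictured with a small sketch. The struct and function names are hypothetical and the header lengths (112 bits for ethernet, 32 bits for a vlan tag) are supplied by the example, not read from nftables' protocol descriptions.

```c
#include <stddef.h>

/* Hypothetical record of one link-layer header seen so far. */
struct sketch_l2 {
	const char *name;
	unsigned int len_bits;
};

/* Walk the stack of seen l2 headers and find the one that contains the
 * absolute bit offset. On success, returns the stack index and stores the
 * offset relative to that header in *rel; returns -1 if out of range. */
static int sketch_l2_locate(const struct sketch_l2 *stack, size_t n,
			    unsigned int offset, unsigned int *rel)
{
	for (size_t i = 0; i < n; i++) {
		if (offset < stack[i].len_bits) {
			*rel = offset;
			return (int)i;
		}
		offset -= stack[i].len_bits;
	}
	return -1;
}
```

    With a stack of ethernet followed by vlan, `@ll,48,48` resolves to offset 48 inside the ethernet header (ether saddr), while `@ll,112,16` resolves to offset 0 inside the vlan header. A single cumulative offset cannot make that distinction once the vlan description has replaced the ethernet one.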
* netlink_delinearize: postprocess binary ands in concatenations | Florian Westphal | 2022-08-05 | 1 | -5/+40
    Input:

        update ether saddr . vlan id timeout 5s @macset
        ether saddr . vlan id @macset

    Before this patch, gets rendered as:

        update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s }
        @ll,48,48 . @ll,112,16 & 0xfff @macset

    After this, listing will show:

        update @macset { @ll,48,48 . vlan id timeout 5s }
        @ll,48,48 . vlan id @macset

    The @ll, ... is due to vlan description replacing the ethernet one, so payload decode fails to take the concatenation apart (the ethernet header payload info is matched vs. vlan template).

    This will be adjusted by a followup patch.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: allow postprocessing on concatenated elements | Florian Westphal | 2022-08-05 | 1 | -1/+6
    Currently there is no case where the individual expressions inside a mapped concatenation need to be munged.

    However, to support proper delinearization for an input like

        rule netdev nt nc set update ether saddr . vlan id timeout 5s @macset

    we need to allow this.

    Right now, this gets listed as:

        update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s }

    because the ethernet protocol is replaced by vlan beforehand, so we fail to map @ll,48,48 to a vlan protocol.

    Likewise, we can't map the vlan info either because we cannot cope with the 'and' operation properly, nor is it removed.

    Prepare for this by deleting and re-adding so that we do not corrupt the linked list.

    After this, the list can be safely changed and a followup patch can start to delete/reallocate expressions.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: memleak when parsing concatenation data | Pablo Neira Ayuso | 2022-06-23 | 1 | -0/+1
    netlink_get_register() clones the expression in the register, release after using it.

    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: release last register on exit | Pablo Neira Ayuso | 2022-05-16 | 1 | -1/+1
    netlink_release_registers() does not release the expression in the last 32-bit register.

        struct netlink_parse_ctx {
                ...
                struct expr *registers[MAX_REGS + 1];

    This array is MAX_REGS + 1 (verdict register + 16 32-bit registers).

    Fixes: 371c3a0bc3c2 ("netlink_delinearize: release expressions in context registers")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
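    The off-by-one described here comes down to the loop bound over an array of MAX_REGS + 1 slots. The sketch below uses hypothetical names and plain heap pointers instead of struct expr, and takes 16 as the register count from the commit message.

```c
#include <stdlib.h>

#define SKETCH_MAX_REGS 16 /* 16 32-bit registers, plus the verdict register */

struct sketch_ctx {
	void *registers[SKETCH_MAX_REGS + 1];
};

/* Release every slot, including the last one at index SKETCH_MAX_REGS.
 * A loop written with "i < SKETCH_MAX_REGS" would skip that slot.
 * Returns the number of slots that held something. */
static unsigned int sketch_release_registers(struct sketch_ctx *ctx)
{
	unsigned int released = 0;

	for (unsigned int i = 0; i <= SKETCH_MAX_REGS; i++) {
		if (ctx->registers[i] == NULL)
			continue;
		free(ctx->registers[i]);
		ctx->registers[i] = NULL;
		released++;
	}
	return released;
}
```

    The inclusive bound is the whole point of the example: with 17 slots, index 16 is valid and must be visited.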
* src: add tcp option reset support | Florian Westphal | 2022-02-28 | 1 | -0/+4
    This allows replacing a tcp option with nops, similar to the TCPOPTSTRIP feature of iptables.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: store more than one payload dependency | Jeremy Sowden | 2022-01-15 | 1 | -5/+9
    Change the payload-dependency context to store a dependency for every protocol layer. This allows us to eliminate more redundant protocol expressions.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add a helper that returns a payload dependency for a particular base | Jeremy Sowden | 2022-01-15 | 1 | -2/+2
    Currently, with only one base and dependency stored this is superfluous, but it will become more useful when the next commit adds support for storing a payload for every base.

    Remove redundant `ctx->pbase` check.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: 'nft list chain' prints anonymous chains correctly | Pablo Neira Ayuso | 2022-01-15 | 1 | -0/+8
    If the user is requesting a chain listing, e.g. nft list chain x y and a rule refers to an anonymous chain that cannot be found in the cache, then fetch such anonymous chain and its ruleset.

    Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: simplify logic governing storing payload dependencies | Jeremy Sowden | 2022-01-15 | 1 | -13/+4
    There are several places where we check whether `ctx->pdctx.pbase` is equal to `PROTO_BASE_INVALID` and don't bother trying to free the dependency if so. However, these checks are redundant.

    In `payload_match_expand` and `trace_gen_stmts`, we skip a call to `payload_dependency_kill`, but that calls `payload_dependency_exists` to check a dependency exists before doing anything else.

    In `ct_meta_common_postprocess`, we skip an open-coded equivalent to `payload_dependency_kill` which performs some different checks, but the first is the same: a call to `payload_dependency_exists`.

    Therefore, we can drop the redundant checks and simplify the flow-control in the functions.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: reduce indentation | Jeremy Sowden | 2022-01-15 | 1 | -7/+3
    Re-arrange some switch-cases and conditionals to reduce levels of indentation.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* src: remove arithmetic on booleans | Jeremy Sowden | 2022-01-15 | 1 | -4/+4
    Instead of subtracting a boolean from the protocol base for stacked payloads, just decrement the base variable itself.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: fix typo | Jeremy Sowden | 2022-01-15 | 1 | -1/+1
    Correct spelling in comment.

    Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: zero shift removal | Florian Westphal | 2021-12-09 | 1 | -0/+21
    Remove shifts-by-0. These can occur after binop postprocessing has adjusted the RHS value to account for a mask operation.

    Example: frag frag-off @s4

    Is internally represented via:

        [ exthdr load ipv6 2b @ 44 + 2 => reg 1 ]
        [ bitwise reg 1 = ( reg 1 & 0x0000f8ff ) ^ 0x00000000 ]
        [ bitwise reg 1 = ( reg 1 >> 0x00000003 ) ]
        [ lookup reg 1 set s ]

    First binop masks out unwanted parts of the 16-bit field. Second binop needs to left-shift so that lookups in the set will work.

    When decoding, the first binop is removed after the exthdr load has been adjusted accordingly. Constant propagation adjusts the shift-value to 0 on removal. This change then gets rid of the shift-by-0 entirely.

    After this change, 'frag frag-off @s4' input is shown as-is.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: and/shift postprocessing | Florian Westphal | 2021-12-09 | 1 | -0/+7
    Before this patch:

        in: frag frag-off @s4
        in: ip version @s8

        out: (@nh,0,8 & 0xf0) >> 4 == @s8
        out: (frag unknown & 0xfff8 [invalid type]) >> 3 == @s4

    after:

        out: frag frag-off >> 0 == @s4
        out: ip version >> 0 == @s8

    Next patch adds support for zero-shift removal.

    Signed-off-by: Florian Westphal <fw@strlen.de>
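    The raw forms shown in the "before" output are ordinary mask-and-shift field extractions. A small sketch (the helper name is made up, and the sample byte 0x45 is just a typical first byte of an IPv4 header) shows what they compute and why a shift by zero is a no-op that can be dropped from the listing:

```c
#include <stdint.h>

/* Extract a bit field: keep the masked bits, then move them down. */
static uint32_t sketch_extract(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```

    For example, `(0x45 & 0xf0) >> 4` yields 4, the `ip version` value, and `(x & 0xfff8) >> 3` yields the 13-bit fragment offset. Once the payload expression itself names the field, the remaining `>> 0` leaves the value unchanged.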
* netlink_delinearize: binop: make accesses to expr->left/right conditional | Florian Westphal | 2021-12-01 | 1 | -19/+31
    This function can be called for different expression types, including some (EXPR_MAP) where expr->left/right alias to different member variables.

    This makes accesses to those members conditional by checking the expression type ahead of the access.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: rename misleading variable | Florian Westphal | 2021-12-01 | 1 | -12/+12
    relational_binop_postprocess() is called for EXPR_RELATIONAL, so "expr->right" is safe to use.

    But the RHS can be something other than a value. This has been extended to handle other types, so rename to 'right'.

    No code changes intended.

    Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: use correct member type | Florian Westphal | 2021-12-01 | 1 | -1/+1
    expr is a map, so this should use expr->map, not expr->left. These fields are aliased, so this would break if that is ever changed.

    Signed-off-by: Florian Westphal <fw@strlen.de>