nftables - nft command line tool

	Commit message (Collapse)	Author	Age	Files	Lines
*	libnftables: move init-once guard inside xt_init()	Thomas Haller	2023-09-19	2	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A library should not restrict being used by multiple threads or make assumptions about how it's being used. Hence a "init_once" pattern without no locking is racy, a code smell and should be avoided. Note that libxtables is full of global variables and when linking against it, libnftables cannot be used from multiple threads either. That is not easy to fix. Move the ugliness of "init_once" away from nft_ctx_new(), so that the problem is concentrated closer to libxtables. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	libnftables: drop gmp_init() and mp_set_memory_functions()	Thomas Haller	2023-09-19	2	-11/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting global handles for libgmp via mp_set_memory_functions() is very ugly. When we don't use mini-gmp, then potentially there are other users of the library in the same process, and every process fighting about the allocation functions is not gonna work. It also means, we must not reset the allocation functions after somebody already allocated GMP data with them. Which we cannot ensure, as we don't know what other parts of the process are doing. It's also unnecessary. The default allocation functions for gmp and mini-gmp already abort the process on allocation failure ([1], [2]), just like our xmalloc(). Just don't do this. [1] https://gmplib.org/repo/gmp/file/8225bdfc499f/memory.c#l37 [2] https://git.netfilter.org/nftables/tree/src/mini-gmp.c?id=6d19a902c1d77cb51b940b1ce65f31b1cad38b74#n286 Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: perform mark datatype compatibility check from maps	Pablo Neira Ayuso	2023-09-19	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Wrap datatype compatibility check into a helper function and use it for map evaluation, otherwise the following bogus error message is displayed: Error: datatype mismatch, map expects packet mark, mapping expression has type integer Add unit tests to improve coverage for this usecase. Fixes: 5d8e33ddb112 ("evaluate: relax type-checking for integer arguments in mark statements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: expand sets and maps before evaluation	Pablo Neira Ayuso	2023-09-19	4	-40/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3975430b12d9 ("src: expand table command before evaluation") moved ruleset expansion before evaluation, except for sets and maps. For sets and maps there is still a post_expand() phase. This patch moves sets and map expansion to allocate an independent CMD_OBJ_SETELEMS command to add elements to named set and maps which is evaluated, this consolidates the ruleset expansion to happen always before the evaluation step for all objects, except for anonymous sets and maps. This approach avoids an interference with the set interval code which detects overlaps and merges of adjacents ranges. This set interval routine uses set->init to maintain a cache of existing elements. Then, the post_expand() phase incorrectly expands set->init cache and it triggers a bogus ENOENT errors due to incorrect bytecode (placing element addition before set creation) in combination with user declared sets using the flat syntax notation. Since the evaluation step (coming after the expansion) creates implicit/anonymous sets and maps, those are not expanded anymore. These anonymous sets still need to be evaluated from set_evaluate() path and the netlink bytecode generation path, ie. do_add_set(), needs to deal with anonymous sets. Note that, for named sets, do_add_set() does not use set->init. Such content is part of the existing cache, and the CMD_OBJ_SETELEMS command is responsible for adding elements to named sets. Fixes: 3975430b12d9 ("src: expand table command before evaluation") Reported-by: Jann Haber <jannh@selfnet.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix memleak in prefix evaluation with wildcard interface name	Pablo Neira Ayuso	2023-09-19	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following ruleset: table ip x { chain y { meta iifname { abcde*, xyz } } } triggers the following memleak: ==6871== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==6871== at 0x483877F: malloc (vg_replace_malloc.c:307) ==6871== by 0x48AD898: xmalloc (utils.c:37) ==6871== by 0x4BC8B22: __gmpz_init2 (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.4.1) ==6871== by 0x4887E67: constant_expr_alloc (expression.c:424) ==6871== by 0x488EF1F: expr_evaluate_prefix (evaluate.c:1138) ==6871== by 0x488EF1F: expr_evaluate (evaluate.c:2725) ==6871== by 0x488E76D: expr_evaluate_set_elem (evaluate.c:1662) ==6871== by 0x488E76D: expr_evaluate (evaluate.c:2739) ==6871== by 0x4891033: list_member_evaluate (evaluate.c:1454) ==6871== by 0x488E2B6: expr_evaluate_set (evaluate.c:1757) ==6871== by 0x488E2B6: expr_evaluate (evaluate.c:2737) ==6871== by 0x48910D0: elems_evaluate (evaluate.c:4605) ==6871== by 0x4891432: set_evaluate (evaluate.c:4711) ==6871== by 0x48915BC: implicit_set_declaration (evaluate.c:122) ==6871== by 0x488F18A: expr_evaluate_relational (evaluate.c:2503) ==6871== by 0x488F18A: expr_evaluate (evaluate.c:2745) expr_evaluate_prefix() calls constant_expr_alloc() which have already called mpz_init2(), the second call to mpz_init2() overlaps the existing mpz_t data memory area. Remove extra mpz_init2() call to fix this memleak. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: fix leaking typeof_expr_data/typeof_expr_key in ↵	Thomas Haller	2023-09-19	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	netlink_delinearize_set() There are various code paths that return without freeing typeof_expr_data and typeof_expr_key. It's not at all obvious, that there isn't a leak that way. Quite possibly there is a leak. Fix it, or at least make the code more obviously correct. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	6	-50/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails: ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3 ==118== at 0x484682C: calloc (vg_replace_malloc.c:1554) ==118== by 0x48A39DD: xmalloc (utils.c:37) ==118== by 0x48A39DD: xzalloc (utils.c:76) ==118== by 0x487BDFD: datatype_alloc (datatype.c:1205) ==118== by 0x487BDFD: concat_type_alloc (datatype.c:1288) ==118== by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786) ==118== by 0x488229D: stmt_evaluate_nat (evaluate.c:3892) ==118== by 0x488229D: stmt_evaluate (evaluate.c:4450) ==118== by 0x488328E: rule_evaluate (evaluate.c:4956) ==118== by 0x48ADC71: nft_evaluate (libnftables.c:552) ==118== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595) ==118== by 0x402983: main (main.c:534) I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'). We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP. Fix and rework. - Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak: if (--dtype->refcnt > 0) return; While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free. Let datatype_clone() return references with refcnt set 1 and in general be always clear about where we transfer ownership (take a reference) and where we need to release it. - set_datatype_alloc() needs to consistently return ownership to the reference. Previously, some code paths would and others wouldn't. - Replace datatype_set(key, set_datatype_alloc(dtype, key->byteorder)) with a __datatype_set() with takes ownership. Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	30	-30/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It provides malloc()/free(), which is so basic that we need it everywhere. Include via <nft.h>. The ultimate purpose is to define more things in <nft.h>. While it has not corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that is a very basic dependency, that will be needed for that. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	parser_bison: include <nft.h> for base C environment to "parser_bison.y"	Thomas Haller	2023-09-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	All our C sources should include <nft.h> as first. This prepares an environment of things that we expect to have available in all our C sources (and indirectly in our internal header files, because internal header files are always indirectly from a C source). Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: rename "dtype_clone()" to datatype_clone()	Thomas Haller	2023-09-08	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()". This is a follow up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	cache: avoid accessing uninitialized varible in implicit_chain_cache()	Thomas Haller	2023-09-08	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	$ ./tests/shell/run-tests.sh -V tests/shell/testcases/cache/0010_implicit_chain_0 Gives: ==59== Conditional jump or move depends on uninitialised value(s) ==59== at 0x48A6A6B: mnl_nft_rule_dump (mnl.c:695) ==59== by 0x48778EA: rule_cache_dump (cache.c:664) ==59== by 0x487797D: rule_init_cache (cache.c:997) ==59== by 0x4877ABF: implicit_chain_cache.isra.0 (cache.c:1032) ==59== by 0x48785C9: cache_init_objects (cache.c:1132) ==59== by 0x48785C9: nft_cache_init (cache.c:1166) ==59== by 0x48785C9: nft_cache_update (cache.c:1224) ==59== by 0x48ADBE1: nft_evaluate (libnftables.c:530) ==59== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:596) ==59== by 0x402983: main (main.c:535) Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: fix get element for concatenated set	Florian Westphal	2023-09-06	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	given: table ip filter { set test { type ipv4_addr . ether_addr . mark flags interval elements = { 198.51.100.0/25 . 00:0b:0c:ca:cc:10-c1:a0:c1:cc:10:00 . 0x0000006f, } } } We get lookup failure: nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f } Error: Could not process rule: No such file or directory Its possible to work around this via dummy range somewhere in the key, e.g. nft get element ip filter test { 198.51.100.1 . 00:0b:0c:ca:cc:10 . 0x6f-0x6f } but that shouldn't be needed, so make sure the INTERVAL flag is enabled for the queried element if the set is of interval type. Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: revisit anonymous set with single element optimization	Pablo Neira Ayuso	2023-09-02	1	-20/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch reworks it to perform this optimization from the evaluation step of the relational expression. Hence, when optimizing for protocol flags, use OP_EQ instead of OP_IMPLICIT, that is: tcp flags { syn } becomes (to represent an exact match): tcp flags == syn given OP_IMPLICIT and OP_EQ are not equivalent for flags. 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element") disabled this optimization, which is enabled again after this patch. Fixes: 01167c393a12 ("evaluate: do not remove anonymous set with protocol flags and single element") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: place byteorder conversion after numgen for IP address datatypes	Jorge Ortiz	2023-09-01	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The numgen extension generates numbers in little-endian. This can be very tricky when trying to combine it with IP addresses, which use big endian. This change adds a new byteorder operation to convert data type endianness. Before this patch: $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001 ip nat snat_chain [ numgen reg 1 = inc mod 7 offset 167772161 ] [ nat snat ip addr_min reg 1 ] After this patch: $ sudo nft -d netlink add rule nat snat_chain snat to numgen inc mod 7 offset 0x0a000001 ip nat snat_chain [ numgen reg 1 = inc mod 7 offset 167772161 ] [ byteorder reg 1 = hton(reg 1, 4, 4) ] [ nat snat ip addr_min reg 1 ] Regression tests have been modified to include these new cases. Signed-off-by: Jorge Ortiz Escribano <jorge.ortiz.escribano@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	rule: set internal_location for table and chain	Pablo Neira Ayuso	2023-08-31	1	-0/+2
\| \| \| \| \| \| \|	JSON parser does not seem to set on this, better provide a default location. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: simplify chain_alloc()	Pablo Neira Ayuso	2023-08-31	5	-8/+8
\| \| \| \| \| \| \|	Remove parameter to set the chain name which is only used from netlink path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove check for NULL before calling expr_free()	Pablo Neira Ayuso	2023-08-31	2	-4/+3
\| \| \| \| \| \|	expr_free() already handles NULL pointer, remove redundant check. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: use internal_location for unspecified location at allocation time	Pablo Neira Ayuso	2023-08-31	3	-12/+19
\| \| \| \| \| \| \|	Set location to internal_location instead of NULL to ensure this is always set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	xt: avoid "-Wmissing-field-initializers" for "original_opts"	Thomas Haller	2023-08-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid this warning with clang: CC src/xt.lo src/xt.c:353:9: error: missing field 'has_arg' initializer [-Werror,-Wmissing-field-initializers] { NULL }, ^ The warning seems not very useful, because it's well understood that specifying only some initializers leaves the remaining fields initialized with the default. However, as this warning is only hit once in the code base, it doesn't seem that we violate this style frequently. Hence, fix it instead of disabling the warning. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: silence "implicit-fallthrough" warnings	Thomas Haller	2023-08-30	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gcc with "-Wextra" warns: CC segtree.lo segtree.c: In function 'get_set_interval_find': segtree.c:129:28: error: this statement may fall through [-Werror=implicit-fallthrough=] 129 \| if (expr_basetype(i->key)->type != TYPE_STRING) \| ^ segtree.c:134:17: note: here 134 \| case EXPR_PREFIX: \| ^~~~ CC optimize.lo optimize.c: In function 'rule_collect_stmts': optimize.c:396:28: error: this statement may fall through [-Werror=implicit-fallthrough=] 396 \| if (stmt->expr->left->etype == EXPR_CONCAT) { \| ^ optimize.c:400:17: note: here 400 \| case STMT_VERDICT: \| ^~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	rule: fix "const static" declaration	Thomas Haller	2023-08-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gcc warns against this with "-Wextra": src/rule.c:869:1: error: static is not at beginning of declaration [-Werror=old-style-declaration] 869 \| const static struct prio_tag std_prios[] = { \| ^~~~~ src/rule.c:878:1: error: static is not at beginning of declaration [-Werror=old-style-declaration] 878 \| const static struct prio_tag bridge_std_prios[] = { \| ^~~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	proto: use hexadecimal to display ip frag-off field	Pablo Neira Ayuso	2023-08-30	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	The ip frag-off field in the protocol definition is 16-bits long and it contains the DF (0x2000) and MF (0x4000) bits too. iptables-translate also suggests: ip frag-off & 0x1ffff != 0 to match on fragments. Use hexadecimal for listing this header field. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: do not remove anonymous set with protocol flags and single element	Pablo Neira Ayuso	2023-08-30	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set lookups with flags search for an exact match, however: tcp flags { syn } gets transformed into: tcp flags syn which is matching on the syn flag only (non-exact match). This optimization is safe for ct state though, because only one bit is ever set on in the ct state bitmask. Since protocol flags allow for combining flags, skip this optimization to retain exact match semantics. Another possible solution is to turn OP_IMPLICIT into OP_EQ for exact flag match to re-introduce this optimization and deal with this corner case. Fixes: fee6bda06403 ("evaluate: remove anon sets with exactly one element") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: suppress "-Wunused-but-set-variable" warning with "parser_bison.c"	Thomas Haller	2023-08-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clang warns: parser_bison.c:7606:9: error: variable 'nft_nerrs' set but not used [-Werror,-Wunused-but-set-variable] int yynerrs = 0; ^ parser_bison.c:72:25: note: expanded from macro 'yynerrs' #define yynerrs nft_nerrs ^ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: don't needlessly clear full string buffer in ↵	Thomas Haller	2023-08-29	1	-1/+4
\| \| \| \| \| \| \|	stmt_evaluate_log_prefix() Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: rework SNPRINTF_BUFFER_SIZE() and handle truncation	Thomas Haller	2023-08-29	2	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, the macro asserts against truncation. This is despite the callers still checked for truncation and tried to handle it. Probably for good reason. With stmt_evaluate_log_prefix() it's not clear that the code ensures that truncation cannot happen, so we must not assert against it, but handle it. Also, - wrap the macro in "do { ... } while(0)" to make it more function-like. - evaluate macro arguments exactly once, to make it more function-like. - take pointers to the arguments that are being modified. - use assert() instead of abort(). - use size_t type for arguments related to the buffer size. - drop "size". It was mostly redundant to "offset". We can know everything we want based on "len" and "offset" alone. - "offset" previously was incremented before checking for truncation. So it would point somewhere past the buffer. This behavior does not seem useful. Instead, on truncation "len" will be zero (as before) and "offset" will point one past the buffer (one past the terminating NUL). Thereby, also fix a warning from clang: evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size = 0; ^ meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size; ^ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix check for truncation in stmt_evaluate_log_prefix()	Thomas Haller	2023-08-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, nft crashes with prefix longer than 127 bytes: # nft add rule x y log prefix \"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\" ==159385==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffed5bf4a10 at pc 0x7f3134839269 bp 0x7ffed5bf48b0 sp 0x7ffed5bf4060 WRITE of size 129 at 0x7ffed5bf4a10 thread T0 #0 0x7f3134839268 in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:778 #1 0x7f3133e3074e in __mpz_export_data /tmp/nftables/src/gmputil.c:110 #2 0x7f3133d21d3c in expr_to_string /tmp/nftables/src/expression.c:192 #3 0x7f3133ded103 in netlink_gen_log_stmt /tmp/nftables/src/netlink_linearize.c:1148 #4 0x7f3133df33a1 in netlink_gen_stmt /tmp/nftables/src/netlink_linearize.c:1682 [...] Fixes: e76bb3794018 ('src: allow for variables in the log prefix string') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: avoid cast-align warning with struct sockaddr result from ↵	Thomas Haller	2023-08-29	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getaddrinfo() With CC=clang we get datatype.c:625:11: error: cast from 'struct sockaddr ' to 'struct sockaddr_in ' increases required alignment from 2 to 4 [-Werror,-Wcast-align] addr = ((struct sockaddr_in )ai->ai_addr)->sin_addr; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ datatype.c:690:11: error: cast from 'struct sockaddr ' to 'struct sockaddr_in6 ' increases required alignment from 2 to 4 [-Werror,-Wcast-align] addr = ((struct sockaddr_in6 )ai->ai_addr)->sin6_addr; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ datatype.c:826:11: error: cast from 'struct sockaddr ' to 'struct sockaddr_in ' increases required alignment from 2 to 4 [-Werror,-Wcast-align] port = ((struct sockaddr_in )ai->ai_addr)->sin_port; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix that by casting to (void) first. Also, add an assertion that the type is as expected. For inet_service_type_parse(), differentiate between AF_INET and AF_INET6. It might not have been a problem in practice, because the struct offsets of sin_port/sin6_port are identical. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: avoid "-Wenum-conversion" warning in parser_bison.y	Thomas Haller	2023-08-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clang warns: parser_bison.y:3658:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion] { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_SNAT); } ~~~~~~~~~~~~~~ ^~~~~~~~~~~~ parser_bison.y:3659:83: error: implicit conversion from enumeration type 'enum nft_nat_types' to different enumeration type 'enum nft_nat_etypes' [-Werror,-Wenum-conversion] { (yyval.stmt) = nat_stmt_alloc(&(yyloc), NFT_NAT_DNAT); } ~~~~~~~~~~~~~~ ^~~~~~~~~~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: avoid "-Wenum-conversion" warning in dtype_map_from_kernel()	Thomas Haller	2023-08-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Clang warns: netlink.c:806:26: error: implicit conversion from enumeration type 'enum nft_data_types' to different enumeration type 'enum datatypes' [-Werror,-Wenum-conversion] return datatype_lookup(type); ~~~~~~~~~~~~~~~ ^~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	parser: permit gc-interval in map declarations	Florian Westphal	2023-08-29	1	-0/+5
\| \| \| \| \| \| \|	Maps support timeouts, so allow to set the gc interval as well. Fixes: 949cc39eb93f ("parser: support of maps with timeout") Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: Drop dead code from expr_evaluate_mapping()	Phil Sutter	2023-08-29	1	-11/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 343a51702656a ("src: store expr, not dtype to track data in sets"), set->data is allocated for object maps in set_evaluate(), all other map types have set->data initialized by the parser already, set_evaluate() also checks that. Drop the confusing check, later in the function set->data is dereferenced unconditionally. Fixes: 343a51702656a ("src: store expr, not dtype to track data in sets") Signed-off-by: Phil Sutter <phil@nwl.cc>
*	include: include <std{bool,int}.h> via <nft.h>	Thomas Haller	2023-08-25	17	-17/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a minimum base that all our sources will end up needing. This is what <nft.h> provides. Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything, without having "bool" and "uint32_t" types available. Yes, this means the internal headers are not self-contained, with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely to have <nft.h> included for them. They should not include <nft.h> themselves, because <nft.h> needs always be included as first. So when an internal header would include <nft.h> it would be unnecessary, because the header is always included already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	49	-8/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<config.h> is generated by the configure script. As it contains our feature detection, it want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define need to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first. Instead of going through all source files and include <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first). This will also allow us later to prepare some common base, like include <stdbool.h> everywhere. We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file). Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures to include <config.h> exactly once. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	meta: define _GNU_SOURCE to get strptime() from <time.h>	Thomas Haller	2023-08-25	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To use `strptime()`, the documentation indicates #define _XOPEN_SOURCE #include <time.h> However, previously this was done wrongly. For example, when building with musl we got a warning: CC meta.lo meta.c:40: warning: "_XOPEN_SOURCE" redefined 40 \| #define _XOPEN_SOURCE \| In file included from /usr/include/errno.h:8, from meta.c:13: /usr/include/features.h:16: note: this is the location of the previous definition 16 \| #define _XOPEN_SOURCE 700 \| Defining "__USE_XOPEN" is wrong. This is a glibc internal define not for the user. Note that if we just set _XOPEN_SOURCE (or _XOPEN_SOURCE=700), we won't get other things like "struct tm.tm_gmtoff". Instead, we already define _GNU_SOURCE at other places. Do that here too, it will give us strptime() and all is good. Also, those directives should be defined as first thing (or via "-D" command line). See [1]. This is also important, because to use "time_t" in a header, we would need to include <time.h>. That only works, if we get the feature test macros right. That is, define the _?_SOURCE macro as first thing. [1] https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_JSON to enable JSON parsing	Thomas Haller	2023-08-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, the input is parsed using the nftables grammar. When setting NFT_CTX_OUTPUT_JSON flag, nftables will first try to parse the input as JSON before falling back to the nftables grammar. But NFT_CTX_OUTPUT_JSON flag also turns on JSON for the output. Add a flag NFT_CTX_INPUT_JSON which allows to treat only the input as JSON, but keep the output mode unchanged. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking	Thomas Haller	2023-08-24	2	-30/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense. Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo() and only accept plain IP addresses. We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are better aware of what we expect and can generate a better error message in case of failure. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flags for nft_ctx	Thomas Haller	2023-08-24	2	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \|	Similar to the existing output flags, add input flags. No flags are yet implemented, that will follow. One difference to nft_ctx_output_set_flags(), is that the setter for input flags returns the previously set flags. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: error out on meter overlap with an existing set/map declaration	Pablo Neira Ayuso	2023-08-23	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One of the problems with meters is that they use the set/map infrastructure behind the scenes which might be confusing to users. This patch errors out in case user declares a meter whose name overlaps with an existing set/map: meter.nft:15:18-91: Error: File exists; meter ‘syn4-meter’ overlaps an existing set ‘syn4-meter’ in family inet tcp dport 22 meter syn4-meter { ip saddr . tcp dport timeout 5m limit rate 20/minute } counter accept ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ An old 5.10 kernel bails out simply with EEXIST, with this patch a better hint is provided. Dynamic sets are preferred over meters these days. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	cache: chain listing implicitly sets on terse option	Pablo Neira Ayuso	2023-08-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If user specifies a chain to be listed (which is internally handled via filtering options), then toggle NFT_CACHE_TERSE to skip fetching set content from kernel for non-anonymous sets. With a large IPv6 set with bogons, before this patch: # time nft list chain inet raw x table inet raw { chain x { ip6 saddr @bogons6 ip6 saddr { aaaa::, bbbb:: } } } real 0m2,913s user 0m1,345s sys 0m1,568s After this patch: # time nft list chain inet raw prerouting table inet raw { chain x { ip6 saddr @bogons6 ip6 saddr { aaaa::, bbbb:: } } } real 0m0,056s user 0m0,018s sys 0m0,039s This speeds up chain listing in the presence of a large set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	meta: use reentrant localtime_r()/gmtime_r() functions	Thomas Haller	2023-08-22	1	-19/+22
\| \| \| \| \| \| \| \| \| \|	These functions are POSIX.1-2001. We should have them in all environments we care about. Use them as they are thread-safe. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	meta: don't assume time_t is 64 bit in date_type_print()	Thomas Haller	2023-08-22	1	-5/+8
\| \| \| \| \| \| \| \| \|	time_t on 32bit arch is not uint64_t. Even if it always were, it would be ugly to make such an assumption (without a static assert). Copy the value to a time_t variable first. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	nftutils: add and use wrappers for getprotoby{name,number}_r(), ↵	Thomas Haller	2023-08-20	6	-30/+157
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getservbyport_r() We should aim to use the thread-safe variants of getprotoby{name,number} and getservbyport(). However, they may not be available with other libc, so it requires a configure check. As that is cumbersome, add wrappers that do that at one place. These wrappers are thread-safe, if libc provides the reentrant versions. Use them. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	json: use strtok_r() instead of strtok()	Thomas Haller	2023-08-18	1	-2/+3
\| \| \| \| \| \| \| \| \|	strtok_r() is probably(?) everywhere available where we care. Use it. It is thread-safe, and libnftables shouldn't make assumptions about what other threads of the process are doing. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	parser: deduplicate map with data interval	Florian Westphal	2023-08-03	1	-13/+7
\| \| \| \| \| \| \| \| \| \|	Its copypasted, the copy is same as original except that it specifies a map key that maps to an interval. Add an exra rule that returns 0 or EXPR_F_INTERVAL, then use that in a single rule. Signed-off-by: Florian Westphal <fw@strlen.de>
*	parser: allow ct timeouts to use time_spec values	Florian Westphal	2023-08-03	2	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For some reason the parser only allows raw numbers (seconds) for ct timeouts, e.g. ct timeout ttcp { protocol tcp; policy = { syn_sent : 3, ... Also permit time_spec, e.g. "established : 5d". Print the nicer time formats on output, but retain raw numbers support on input for compatibility. Signed-off-by: Florian Westphal <fw@strlen.de>
*	libnftables: Drop cache in -c/--check mode	Pablo Neira Ayuso	2023-08-01	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend e0aace943412 ("libnftables: Drop cache in error case") to also drop the cache with -c/--check, this is a dry run mode and kernel does not get any update. This fixes a bug with -o/--optimize, which first runs in an implicit -c/--check mode to validate that the ruleset is correct, then it provides the proposed optimization. In this case, if the cache is not emptied, old objects in the cache refer to scanner data that was already released, which triggers BUG like this: BUG: invalid input descriptor type 151665524 nft: erec.c:161: erec_print: Assertion `0' failed. Aborted This bug was triggered in a ruleset that contains a set for geoip filtering. This patch also extends tests/shell to cover this case. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	ct expectation: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-07-31	4	-1/+4
\| \| \| \| \| \| \| \| \| \|	Just like "ct timeout", "ct expectation" is in need of the same fix, we get segfault on "nft list ct expectation table t", if table t exists. This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion"). Signed-off-by: Florian Westphal <fw@strlen.de>
*	rule: allow src/dstnat prios in input and output	Florian Westphal	2023-07-31	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dan Winship says: The "dnat" command is usable from either "prerouting" or "output", but the "dstnat" priority is only usable from "prerouting". (Likewise, "snat" is usable from either "postrouting" or "input", but "srcnat" is only usable from "postrouting".) No need to restrict those priorities to pre/postrouting. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1694 Signed-off-by: Florian Westphal <fw@strlen.de>
*	netlink: delinearize: copy set keytype if needed	Florian Westphal	2023-07-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Output before: add @dynmark { 0xa020304 [invalid type] timeout 1s : 0x00000002 } comment "also check timeout-gc" after: add @dynmark { 10.2.3.4 timeout 1s : 0x00000002 } comment "also check timeout-gc" This is a followup to 76c358ccfea0 ("src: maps: update data expression dtype based on set"), which did fix the map expression, but not the key. Signed-off-by: Florian Westphal <fw@strlen.de>