nftables - nft command line tool

	Commit message (Collapse)	Author	Age	Files	Lines
*	src: remove xfree() and use plain free()	Thomas Haller	2023-11-09	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xmalloc() (and similar x-functions) are used for allocation. They wrap malloc()/realloc() but will abort the program on ENOMEM. The meaning of xmalloc() is that it wraps malloc() but aborts on failure. I don't think x-functions should have the notion, that this were potentially a different memory allocator that must be paired with a particular xfree(). Even if the original intent was that the allocator is abstracted (and possibly not backed by standard malloc()/free()), then that doesn't seem a good idea. Nowadays libc allocators are pretty good, and we would need a very special use cases to switch to something else. In other words, it will never happen that xmalloc() is not backed by malloc(). Also there were a few places, where a xmalloc() was already "wrongly" paired with free() (for example, iface_cache_release(), exit_cookie(), nft_run_cmd_from_buffer()). Or note how pid2name() returns an allocated string from fscanf(), which needs to be freed with free() (and not xfree()). This requirement bubbles up the callers portid2name() and name_by_portid(). This case was actually handled correctly and the buffer was freed with free(). But it shows that mixing different allocators is cumbersome to get right. Of course, we don't actually have different allocators and whether to use free() or xfree() makes no different. The point is that xfree() serves no actual purpose except raising irrelevant questions about whether x-functions are correctly paired with xfree(). Note that xfree() also used to accept const pointers. It is bad to unconditionally for all deallocations. Instead prefer to use plain free(). To free a const pointer use free_const() which obviously wraps free, as indicated by the name. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add free_const() and use it instead of xfree()	Thomas Haller	2023-11-09	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Almost everywhere xmalloc() and friends is used instead of malloc(). This is almost everywhere paired with xfree(). xfree() has two problems. First, it brings the wrong notion that xmalloc() should be paired with xfree(), as if xmalloc() would not use the plain malloc() allocator. In practices, xfree() just wraps free(), and it wouldn't make sense any other way. xfree() should go away. This will be addressed in the next commit. The problem addressed by this commit is that xfree() accepts a const pointer. Paired with the practice of almost always using xfree() instead of free(), all our calls to xfree() cast away constness of the pointer, regardless whether that is necessary. Declaring a pointer as const should help us to catch wrong uses. If the xfree() function always casts aways const, the compiler doesn't help. There are many places that rightly cast away const during free. But not all of them. Add a free_const() macro, which is like free(), but accepts const pointers. We should always make an intentional choice whether to use free() or free_const(). Having a free_const() macro makes this very common choice clearer, instead of adding a (void*) cast at many places. Note that we now pair xmalloc() allocations with a free() call (instead of xfree(). That inconsistency will be resolved in the next commit. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	gmputil: add nft_gmp_free() to free strings from mpz_get_str()	Thomas Haller	2023-11-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mpz_get_str() (with NULL as first argument) will allocate a buffer using the allocator functions (mp_set_memory_functions()). We should free those buffers with the corresponding free function. Add nft_gmp_free() for that and use it. The name nft_gmp_free() is chosen because "mini-gmp.c" already has an internal define called gmp_free(). There wouldn't be a direct conflict, but using the same name is confusing. And maybe our own defines should have a clear nft prefix. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	build: no recursive-make for "include/**/Makefile.am"	Thomas Haller	2023-11-02	8	-70/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Switch from recursive-make to a single top-level Makefile. This is the first step, the following patches will continue this. Unlike meson's subdir() or C's #include, automake's SUBDIRS= does not include a Makefile. Instead, it calls `make -C $dir`. https://www.gnu.org/software/make/manual/html_node/Recursion.html https://www.gnu.org/software/automake/manual/html_node/Subdirectories.html See also, "Recursive Make Considered Harmful". https://accu.org/journals/overload/14/71/miller_2004/ This has several problems, which we an avoid with a single Makefile: - recursive-make is harder to maintain and understand as a whole. Recursive-make makes sense, when there are truly independent sub-projects. Which is not the case here. The project needs to be considered as a whole and not one directory at a time. When we add unit tests (which we should), those would reside in separate directories but have dependencies between directories. With a single Makefile, we see all at once. The build setup has an inherent complexity, and that complexity is not necessarily reduced by splitting it into more files. On the contrary it helps to have it all in once place, provided that it's sensibly structured, named and organized. - typing `make` prints irrelevant "Entering directory" messages. So much so, that at the end of the build, the terminal is filled with such messages and we have to scroll to see what even happened. - with recursive-make, during build we see: make[3]: Entering directory '.../nftables/src' CC meta.lo meta.c:13:2: error: #warning hello test [-Werror=cpp] 13 \| #warning hello test \| ^~~~~~~ With a single Makefile we get CC src/meta.lo src/meta.c:13:2: error: #warning hello test [-Werror=cpp] 13 \| #warning hello test \| ^~~~~~~ This shows the full filename -- assuming that the developer works from the top level directory. The full name is useful, for example to copy+paste into the terminal. - single Makefile is also faster: $ make && perf stat -r 200 -B make -j I measure 35msec vs. 80msec. - recursive-make limits parallel make. You have to craft the SUBDIRS= in the correct order. The dependencies between directories are limited, as make only sees "LDADD = $(top_builddir)/src/libnftables.la" and not the deeper dependencies for the library. - I presume, some people like recursive-make because of `make -C $subdir` to only rebuild one directory. Rebuilding the entire tree is already very fast, so this feature seems not relevant. Also, as dependency handling is limited, we might wrongly not rebuild a target. For example, make check touch src/meta.c make -C examples check does not rebuild "examples/nft-json-file". What we now can do with single Makefile (and better than before), is `make examples/nft-json-file`, which works as desired and rebuilds all dependencies. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	icmpv6: Allow matching target address in NS/NA, redirect and MLD	Nicolas Cavallari	2023-10-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was currently not possible to match the target address of a neighbor solicitation or neighbor advertisement against a dynamic set, unlike in IPv4. Since they are many ICMPv6 messages with an address at the same offset, allow filtering on the target address for all icmp types that have one. While at it, also allow matching the destination address of an ICMPv6 redirect. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Signed-off-by: Florian Westphal <fw@strlen.de>
*	json: add missing map statement stub	Pablo Neira Ayuso	2023-09-28	1	-0/+1
\| \| \| \| \| \| \|	Add map statement stub to restore compilation without json support. Fixes: 27a2da23d508 ("netlink_linearize: skip set element expression in map statement key") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <string.h> in <nft.h>	Thomas Haller	2023-09-28	3	-3/+1
\| \| \| \| \| \| \| \|	<string.h> provides strcmp(), as such it's very basic and used everywhere. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_linearize: skip set element expression in map statement key	Pablo Neira Ayuso	2023-09-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fix is similar to 22d201010919 ("netlink_linearize: skip set element expression in set statement key") to fix map statement. netlink_gen_map_stmt() relies on the map key, that is expressed as a set element. Use the set element key instead to skip the set element wrap, otherwise get_register() abort execution: nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed. This includes JSON support to make this feature complete and it updates tests/shell to cover for this support. Reported-by: Luci Stanescu <luci@cnix.ro> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	expression: cleanup expr_ops_by_type() and handle u32 input	Thomas Haller	2023-09-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make fewer assumptions about the underlying integer type of the enum. Instead, be clear about where we have an untrusted uint32_t from netlink and an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32() to make this clearer. Later we might make the enum as packed, when this starts to matter more. Also, only the code path expr_ops() wants strict validation and assert against valid enum values. Move the assertion out of __expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to duplicate the handling of EXPR_INVALID. We still need to duplicate the check against EXPR_MAX, to ensure that the uint32_t value can be cast to an enum value. [ Remove cast on EXPR_MAX. --pablo ] Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: return const pointer from datatype_get()	Thomas Haller	2023-09-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	"struct datatype" is for the most part immutable, and most callers deal with const pointers. That's why datatype_get() accepts a const pointer to increase the reference count (mutating the refcnt field). It should also return a const pointer. In fact, all callers are fine with that already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use "enum byteorder" instead of int in set_datatype_alloc()	Thomas Haller	2023-09-20	1	-1/+1
\| \| \| \| \| \| \|	Use the enum types as we have them. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: fix missing definitions in <cache.h>/<headers.h>	Thomas Haller	2023-09-20	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	The headers should be self-contained so they can be included in any order. With exception of <nft.h>, which any internal header can rely on. Some fixes for <cache.h>/<headers.h>. In case of <cache.h>, forward declare some of the structs instead of including the headers. <headers.h> uses struct in6_addr. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_EVENTBIT slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Matching on ct event makes no sense since this is mostly used as statement to globally filter out ctnetlink events, but do not crash if it is used from concatenations. Add the missing slot in the datatype array so this does not crash. Fixes: 2595b9ad6840 ("ct: add conntrack event mask support") Reported-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_LABEL slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, ct label with concatenations such as: table ip x { chain y { ct label . ct mark { 0x1 . 0x1 } } } crashes: ../include/datatype.h:196:11: runtime error: member access within null pointer of type 'const struct datatype' AddressSanitizer:DEADLYSIGNAL ================================================================= ==640948==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc970d3199b bp 0x7fffd1f20560 sp 0x7fffd1f20540 T0) ==640948==The signal is caused by a READ memory access. ==640948==Hint: address points to the zero page. sudo #0 0x7fc970d3199b in datatype_equal ../include/datatype.h:196 Fixes: 2fcce8b0677b ("ct: connlabel matching support") Reported-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	libnftables: drop gmp_init() and mp_set_memory_functions()	Thomas Haller	2023-09-19	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting global handles for libgmp via mp_set_memory_functions() is very ugly. When we don't use mini-gmp, then potentially there are other users of the library in the same process, and every process fighting about the allocation functions is not gonna work. It also means, we must not reset the allocation functions after somebody already allocated GMP data with them. Which we cannot ensure, as we don't know what other parts of the process are doing. It's also unnecessary. The default allocation functions for gmp and mini-gmp already abort the process on allocation failure ([1], [2]), just like our xmalloc(). Just don't do this. [1] https://gmplib.org/repo/gmp/file/8225bdfc499f/memory.c#l37 [2] https://git.netfilter.org/nftables/tree/src/mini-gmp.c?id=6d19a902c1d77cb51b940b1ce65f31b1cad38b74#n286 Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	2	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails: ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3 ==118== at 0x484682C: calloc (vg_replace_malloc.c:1554) ==118== by 0x48A39DD: xmalloc (utils.c:37) ==118== by 0x48A39DD: xzalloc (utils.c:76) ==118== by 0x487BDFD: datatype_alloc (datatype.c:1205) ==118== by 0x487BDFD: concat_type_alloc (datatype.c:1288) ==118== by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786) ==118== by 0x488229D: stmt_evaluate_nat (evaluate.c:3892) ==118== by 0x488229D: stmt_evaluate (evaluate.c:4450) ==118== by 0x488328E: rule_evaluate (evaluate.c:4956) ==118== by 0x48ADC71: nft_evaluate (libnftables.c:552) ==118== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595) ==118== by 0x402983: main (main.c:534) I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'). We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP. Fix and rework. - Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak: if (--dtype->refcnt > 0) return; While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free. Let datatype_clone() return references with refcnt set 1 and in general be always clear about where we transfer ownership (take a reference) and where we need to release it. - set_datatype_alloc() needs to consistently return ownership to the reference. Previously, some code paths would and others wouldn't. - Replace datatype_set(key, set_datatype_alloc(dtype, key->byteorder)) with a __datatype_set() with takes ownership. Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	2	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It provides malloc()/free(), which is so basic that we need it everywhere. Include via <nft.h>. The ultimate purpose is to define more things in <nft.h>. While it has not corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that is a very basic dependency, that will be needed for that. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: rename "dtype_clone()" to datatype_clone()	Thomas Haller	2023-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()". This is a follow up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: simplify chain_alloc()	Pablo Neira Ayuso	2023-08-31	1	-1/+1
\| \| \| \| \| \| \|	Remove parameter to set the chain name which is only used from netlink path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	utils: call abort() after BUG() macro	Thomas Haller	2023-08-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, we get spurious warnings. The compiler should be aware that there is no return from BUG(). Call abort() there, which is marked as __attribute__ ((__noreturn__)). In file included from ./include/nftables.h:6, from ./include/rule.h:4, from src/payload.c:26: src/payload.c: In function 'icmp_dep_to_type': ./include/utils.h:39:34: error: this statement may fall through [-Werror=implicit-fallthrough=] 39 \| #define BUG(fmt, arg...) ({ fprintf(stderr, "BUG: " fmt, ##arg); assert(0); }) \| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/payload.c:791:17: note: in expansion of macro 'BUG' 791 \| BUG("Invalid map for simple dependency"); \| ^~~ src/payload.c:792:9: note: here 792 \| case PROTO_ICMP_ECHO: return ICMP_ECHO; \| ^~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: drop "format" attribute from nft_gmp_print()	Thomas Haller	2023-08-29	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nft_gmp_print() passes the format string and arguments to gmp_vfprintf(). Note that the format string is then interpreted by gmp, which also understand special specifiers like "%Zx". Note that with clang we get various compiler warnings: datatype.c:299:26: error: invalid conversion specifier 'Z' [-Werror,-Wformat-invalid-specifier] nft_gmp_print(octx, "0x%Zx [invalid type]", expr->value); ~^ gcc doesn't warn, because to gcc 'Z' is a deprecated alias for 'z' and because the 3rd argument of the attribute((format())) is zero (so gcc doesn't validate the arguments). But Z specifier in gmp expects a "mpz_t" value and not a size_t. It's really not the same thing. The correct solution is not to mark the function to accept a printf format string. Fixes: 2535ba7006f2 ('src: get rid of printf') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: rework SNPRINTF_BUFFER_SIZE() and handle truncation	Thomas Haller	2023-08-29	1	-9/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, the macro asserts against truncation. This is despite the callers still checked for truncation and tried to handle it. Probably for good reason. With stmt_evaluate_log_prefix() it's not clear that the code ensures that truncation cannot happen, so we must not assert against it, but handle it. Also, - wrap the macro in "do { ... } while(0)" to make it more function-like. - evaluate macro arguments exactly once, to make it more function-like. - take pointers to the arguments that are being modified. - use assert() instead of abort(). - use size_t type for arguments related to the buffer size. - drop "size". It was mostly redundant to "offset". We can know everything we want based on "len" and "offset" alone. - "offset" previously was incremented before checking for truncation. So it would point somewhere past the buffer. This behavior does not seem useful. Instead, on truncation "len" will be zero (as before) and "offset" will point one past the buffer (one past the terminating NUL). Thereby, also fix a warning from clang: evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size = 0; ^ meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size; ^ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <std{bool,int}.h> via <nft.h>	Thomas Haller	2023-08-25	7	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a minimum base that all our sources will end up needing. This is what <nft.h> provides. Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything, without having "bool" and "uint32_t" types available. Yes, this means the internal headers are not self-contained, with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely to have <nft.h> included for them. They should not include <nft.h> themselves, because <nft.h> needs always be included as first. So when an internal header would include <nft.h> it would be unnecessary, because the header is always included already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	configure: use AC_USE_SYSTEM_EXTENSIONS to get _GNU_SOURCE	Thomas Haller	2023-08-25	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let "configure" detect which features are available. Also, nftables is a Linux project, so portability beyond gcc/clang and glibc/musl is less relevant. And even if it were, then feature detection by "configure" would still be preferable. Use AC_USE_SYSTEM_EXTENSIONS ([1]). Available since autoconf 2.60, from 2006 ([2]). [1] https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Posix-Variants.html#index-AC_005fUSE_005fSYSTEM_005fEXTENSIONS-1046 [2] https://lists.gnu.org/archive/html/autoconf/2006-06/msg00111.html Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: don't define _GNU_SOURCE in public header	Thomas Haller	2023-08-25	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	_GNU_SOURCE is supposed to be defined as first thing, before including any libc headers. Defining it in the public header of nftables is wrong, because it would only (somewhat) work if the user includes the nftables header as first thing too. But that is not what users commonly would do, in particular with autotools projects, where users would include <config.h> first. It's also unnecessary. Nothing in "nftables/libnftables.h" itself requires _GNU_SOURCE. Drop it. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	5	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<config.h> is generated by the configure script. As it contains our feature detection, it want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define need to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first. Instead of going through all source files and include <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first). This will also allow us later to prepare some common base, like include <stdbool.h> everywhere. We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file). Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures to include <config.h> exactly once. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_JSON to enable JSON parsing	Thomas Haller	2023-08-24	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, the input is parsed using the nftables grammar. When setting NFT_CTX_OUTPUT_JSON flag, nftables will first try to parse the input as JSON before falling back to the nftables grammar. But NFT_CTX_OUTPUT_JSON flag also turns on JSON for the output. Add a flag NFT_CTX_INPUT_JSON which allows to treat only the input as JSON, but keep the output mode unchanged. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking	Thomas Haller	2023-08-24	3	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense. Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo() and only accept plain IP addresses. We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are better aware of what we expect and can generate a better error message in case of failure. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flags for nft_ctx	Thomas Haller	2023-08-24	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Similar to the existing output flags, add input flags. No flags are yet implemented, that will follow. One difference to nft_ctx_output_set_flags(), is that the setter for input flags returns the previously set flags. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	ct expectation: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-07-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Just like "ct timeout", "ct expectation" is in need of the same fix, we get segfault on "nft list ct expectation table t", if table t exists. This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion"). Signed-off-by: Florian Westphal <fw@strlen.de>
*	include: missing dccpopt.h breaks make distcheck	Pablo Neira Ayuso	2023-07-14	1	-0/+1
\| \| \| \| \| \|	Add it to Makefile.am. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Implement 'reset {set,map,element}' commands	Phil Sutter	2023-07-13	3	-4/+9
\| \| \| \| \| \| \| \| \| \| \|	All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values. While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	cli: Make cli_init() return to caller	Phil Sutter	2023-07-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid direct exit() calls as that leaves the caller-allocated nft_ctx object in place. Making sure it is freed helps with valgrind-analyses at least. To signal desired exit from CLI, introduce global cli_quit boolean and make all cli_exit() implementations also set cli_rc variable to the appropriate return code. The logic is to finish CLI only if cli_quit is true which asserts proper cleanup as it is set only by the respective cli_exit() function. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: avoid IPPROTO_MAX for array definitions	Florian Westphal	2023-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip header can only accomodate 8but value, but IPPROTO_MAX has been bumped due to uapi reasons to support MPTCP (262, which is used to toggle on multipath support in tcp). This results in: exthdr.c:349:11: warning: result of comparison of constant 263 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare] if (type < array_size(exthdr_protocols)) ~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ redude array sizes back to what can be used on-wire. Signed-off-by: Florian Westphal <fw@strlen.de>
*	ct timeout: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-06-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<empty ruleset> $ nft list ct timeout table t Error: No such file or directory list ct timeout table t ^ This is expected to list all 'ct timeout' objects. The failure is correct, the table 't' does not exist. But now lets add one: $ nft add table t $ nft list ct timeout table t Segmentation fault (core dumped) ... and thats not expected, nothing should be shown and nft should exit normally. Because of missing TIMEOUTS command enum, the backend thinks it should do an object lookup, but as frontend asked for 'list of objects' rather than 'show this object', handle.obj.name is NULL, which then results in this crash. Update the command enums so that backend knows what the frontend asked for. Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add json support for last statement	Pablo Neira Ayuso	2023-06-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This patch adds json support for the last statement, it works for me here. However, tests/py still displays a warning: any/last.t: WARNING: line 12: '{"nftables": [{"add": {"rule": {"family": "ip", "table": "test-ip4", "chain": "input", "expr": [{"last": {"used": 300000}}]}}}]}': '[{"last": {"used": 300000}}]' mismatches '[{"last": null}]' Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	exthdr: add boolean DCCP option matching	Jeremy Sowden	2023-06-01	3	-0/+45
\| \| \| \| \| \| \| \| \| \|	Iptables supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend exthdr expressions to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	mnl: support bpf id decode in nft list hooks	Florian Westphal	2023-05-22	1	-3/+21
\| \| \| \| \| \| \| \| \| \| \|	This allows 'nft list hooks' to also display the bpf program id attached. Example: hook input { -0000000128 nf_hook_run_bpf id 6 .. Signed-off-by: Florian Westphal <fw@strlen.de>
*	datatype: add hint error handler	Pablo Neira Ayuso	2023-05-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If user provides a symbol that cannot be parsed and the datatype provides an error handler, provide a hint through the misspell infrastructure. For instance: # cat test.nft table ip x { map y { typeof ip saddr : verdict elements = { 1.2.3.4 : filter_server1 } } } # nft -f test.nft test.nft:4:26-39: Error: Could not parse netfilter verdict; did you mean `jump filter_server1'? elements = { 1.2.3.4 : filter_server1 } ^^^^^^^^^^^^^^ While at it, normalize error to "Could not parse symbolic %s expression". Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	meta: introduce meta broute support	Sriram Yagnaraman	2023-04-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Can be used in bridge prerouting hook to divert a packet to the ip stack for routing. This is a replacement for "ebtables -t broute" functionality. Link: https://patchwork.ozlabs.org/project/netfilter-devel/patch/20230224095251.11249-1-sriram.yagnaraman@est.tech/ Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: fix enum/integer mismatches	Florian Westphal	2023-04-29	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gcc 13 complains about type confusion: cache.c:1178:5: warning: conflicting types for 'nft_cache_update' due to enum/integer mismatch; have 'int(struct nft_ctx , unsigned int, struct list_head , const struct nft_cache_filter )' [-Wenum-int-mismatch] cache.h:74:5: note: previous declaration of 'nft_cache_update' with type 'int(struct nft_ctx , enum cmd_ops, struct list_head , const struct nft_cache_filter )' Same for: rule.c:1915:13: warning: conflicting types for 'obj_type_name' due to enum/integer mismatch; have 'const char (enum stmt_types)' [-Wenum-int-mismatch] 1915 \| const char obj_type_name(enum stmt_types type) \| ^~~~~~~~~~~~~ expression.c:1543:24: warning: conflicting types for 'expr_ops_by_type' due to enum/integer mismatch; have 'const struct expr_ops (uint32_t)' {aka 'const struct expr_ops (unsigned int)'} [-Wenum-int-mismatch] 1543 \| const struct expr_ops *expr_ops_by_type(uint32_t value) \| ^~~~~~~~~~~~~~~~ Convert to the stricter type (enum) where possible. Signed-off-by: Florian Westphal <fw@strlen.de>
*	mnl: set SO_SNDBUF before SO_SNDBUFFORCE	Pablo Neira Ayuso	2023-04-24	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set SO_SNDBUF before SO_SNDBUFFORCE: Unpriviledged user namespace does not have CAP_NET_ADMIN on the host (user_init_ns) namespace. SO_SNDBUF always succeeds in Linux, always try SO_SNDBUFFORCE after it. Moreover, suggest the user to bump socket limits if EMSGSIZE after having see EPERM previously, when calling SO_SNDBUFFORCE. Provide a hint to the user too: # nft -f test.nft netlink: Error: Could not process rule: Message too long Please, rise /proc/sys/net/core/wmem_max on the host namespace. Hint: 4194304 bytes Dave Pfike says: Prior to this patch, nft inside a systemd-nspawn container was failing to install my ruleset (which includes a large-ish map), with the error netlink: Error: Could not process rule: Message too long strace reveals: setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted) This is despite the nspawn process supposedly having CAP_NET_ADMIN. A web search reveals at least one other user having the same issue: https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/ Reported-by: Dave Pifke <dave@pifke.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: support shifts larger than the width of the left operand	Pablo Neira Ayuso	2023-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we want to left-shift a value of narrower type and assign the result to a variable of a wider type, we are constrained to only shifting up to the width of the narrower type. Thus: add rule t c meta mark set ip dscp << 2 works, but: add rule t c meta mark set ip dscp << 8 does not, even though the lvalue is large enough to accommodate the result. Upgrade the maximum length based on the statement datatype length, which is provided via context, if it is larger than expression lvalue. Update netlink_delinearize.c to handle the case where the length of a shift expression does not match that of its left-hand operand. Based on patch from Jeremy Sowden. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: fix a couple of typo's in comments	Jeremy Sowden	2023-03-12	1	-1/+1
\| \| \| \| \|	Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
*	cmd: move command functions to src/cmd.c	Pablo Neira Ayuso	2023-03-11	2	-6/+6
\| \| \| \| \| \|	Move several command functions to src/cmd.c to debloat src/rule.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add last statement	Pablo Neira Ayuso	2023-02-28	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new statement allows you to know how long ago there was a matching packet. # nft list ruleset table ip x { chain y { [...] ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64 } } if this statement never sees a packet, then the listing says: ip protocol icmp last used never counter packets 0 bytes 0 Add tests/py in this patch too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: expand table command before evaluation	Pablo Neira Ayuso	2023-02-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The nested syntax notation results in one single table command which includes all other objects. This differs from the flat notation where there is usually one command per object. This patch adds a previous step to the evaluation phase to expand the objects that are contained in the table into independent commands, so both notations have similar representations. Remove the code to evaluate the nested representation in the evaluation phase since commands are independently evaluated after the expansion. The commands are expanded after the set element collapse step, in case that there is a long list of singleton element commands to be added to the set, to shorten the command list iteration. This approach also avoids interference with the object cache that is populated in the evaluation, which might refer to objects coming in the existing command list that is being processed. There is still a post_expand phase to detach the elements from the set which could be consolidated by updating the evaluation step to handle the CMD_OBJ_SETELEMS command type. This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that contains rules") which broke rule addition/insertion by index because the expansion code after the evaluation messes up the cache. Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: use start condition with new destroy command	Pablo Neira Ayuso	2023-02-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	tests/py reports the following problem: any/ct.t: ERROR: line 116: add rule ip test-ip4 output ct event set new \| related \| destroy \| label: This rule should not have failed. any/ct.t: ERROR: line 117: add rule ip test-ip4 output ct event set new,related,destroy,label: This rule should not have failed. any/ct.t: ERROR: line 118: add rule ip test-ip4 output ct event set new,destroy: This rule should not have failed. Use start condition and update parser to handle 'destroy' keyword. Fixes: e1dfd5cc4c46 ("src: add support to command "destroy") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add support to command "destroy"	Fernando F. Mancera	2023-02-06	2	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"destroy" command performs a deletion as "delete" command but does not fail if the object does not exist. As there is no NLM_F_* flag for ignoring such error, it needs to be ignored directly on error handling. Example of use: # nft list ruleset table ip filter { chain output { } } # nft destroy table ip missingtable # echo $? 0 # nft list ruleset table ip filter { chain output { } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Implement 'reset rule' and 'reset rules' commands	Phil Sutter	2023-01-18	5	-1/+15
\| \| \| \| \| \| \| \|	Reset rule counters and quotas in kernel, i.e. without having to reload them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET message type. Signed-off-by: Phil Sutter <phil@nwl.cc>