nftables - nft command line tool

	Commit message (Collapse)	Author	Age	Files	Lines
*	src: disentangle ICMP code types	Pablo Neira Ayuso	2024-04-04	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, ICMP{v4,v6,inet} code datatypes only describe those that are supported by the reject statement, but they can also be used for icmp code matching. Moreover, ICMP code types go hand-to-hand with ICMP types, that is, ICMP code symbols depend on the ICMP type. Thus, the output of: nft describe icmp_code look confusing because that only displays the values that are supported by the reject statement. Disentangle this by adding internal datatypes for the reject statement to handle the ICMP code symbol conversion to value as well as ruleset listing. The existing icmp_code, icmpv6_code and icmpx_code remain in place. For backward compatibility, a parser function is defined in case an existing ruleset relies on these symbols. As for the manpage, move existing ICMP code tables from the DATA TYPES section to the REJECT STATEMENT section, where this really belongs to. But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES section because that describe that this is an 8-bit integer field. After this patch: # nft describe icmp_code datatype icmp_code (icmp code) (basetype integer), 8 bits # nft describe icmpv6_code datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits # nft describe icmpx_code datatype icmpx_code (icmpx code) (basetype integer), 8 bits do not display the symbol table of the reject statement anymore. icmpx_code_type is not used anymore, but keep it in place for backward compatibility reasons. And update tests/shell accordingly. Fixes: 5fdd0b6a0600 ("nft: complete reject support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_delinearize: reverse cross-day meta hour range	Pablo Neira Ayuso	2024-03-20	2	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'") reverses the hour range in case that a cross-day range is used, eg. meta hour "03:00"-"14:00" counter accept which results in (Sidney, Australia AEDT time): meta hour != "14:00"-"03:00" counter accept kernel handles time in UTC, therefore, cross-day range may not be obvious according to local time. The ruleset listing above is not very intuitive to the reader depending on their timezone, therefore, complete netlink delinearize path to reverse the cross-day meta range. Update manpage to recommend to use a range expression when matching meta hour range. Recommend range expression for meta time and meta day too. Extend testcases/listing/meta_time to cover for this scenario. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: do not merge a set with a erroneous one	Florian Westphal	2024-03-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The included sample causes a crash because we attempt to range-merge a prefix expression with a symbolic expression. The first set is evaluated, the symbol expression evaluation fails and nft queues an error message ("Could not resolve hostname"). However, nft continues evaluation. nft then encounters the same set definition again and merges the new content with the preceeding one. But the first set structure is dodgy, it still contains the unresolved symbolic expression. That then makes nft crash (assert) in the set internals. There are various different incarnations of this issue, but the low level set processing code does not allow for any partially transformed expressions to still remain. Before: nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop BUG: invalid range expression type binop nft: src/expression.c:1479: range_expr_value_low: Assertion `0' failed. After: nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop invalid_range_expr_type_binop:4:18-25: Error: Could not resolve hostname: Name or service not known elements = { 1&.141.0.1 - 192.168.0.2} ^^^^^^^^ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: translate meter into dynamic set	Pablo Neira Ayuso	2024-03-12	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	129f9d153279 ("nft: migrate man page examples with `meter` directive to sets") already replaced meters by dynamic sets. This patch removes NFT_SET_ANONYMOUS flag from the implicit set that is instantiated via meter, so the listing shows a dynamic set instead which is the recommended approach these days. Therefore, a batch like this: add table t add chain t c add rule t c tcp dport 80 meter m size 128 { ip saddr timeout 1s limit rate 10/second } gets translated to a dynamic set: table ip t { set m { type ipv4_addr size 128 flags dynamic,timeout } chain c { tcp dport 80 update @m { ip saddr timeout 1s limit rate 10/second burst 5 packets } } } Check for NFT_SET_ANONYMOUS flag is also relaxed for list and flush meter commands: # nft list meter ip t m table ip t { set m { type ipv4_addr size 128 flags dynamic,timeout } } # nft flush meter ip t m As a side effect the legacy 'list meter' and 'flush meter' commands allow to flush a dynamic set to retain backward compatibility. This patch updates testcases/sets/0022type_selective_flush_0 and testcases/sets/0038meter_list_0 as well as the json output which now uses the dynamic set representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: permit use of host-endian constant values in set lookup keys	Pablo Neira Ayuso	2024-02-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AFL found following crash: table ip filter { map ipsec_in { typeof ipsec in reqid . iif : verdict flags interval } chain INPUT { type filter hook input priority filter; policy drop; ipsec in reqid . 100 @ipsec_in } } Which yields: nft: evaluate.c:1213: expr_evaluate_unary: Assertion `!expr_is_constant(arg)' failed. All existing test cases with constant values use big endian values, but "iif" expects host endian values. As raw values were not supported before, concat byteorder conversion doesn't handle constants. Fix this: 1. Add constant handling so that the number is converted in-place, without unary expression. 2. Add the inverse handling on delinearization for non-interval set types. When dissecting the concat data soup, watch for integer constants where the datatype indicates host endian integer. Last, extend an existing test case with the afl input to cover in/output. A new test case is added to test linearization, delinearization and matching. Based on original patch from Florian Westphal, patch subject and description wrote by him. Fixes: b422b07ab2f9 ("src: permit use of constant values in set lookup keys") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: Describe rt symbol tables	Phil Sutter	2024-01-02	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a symbol_table_print() wrapper for the run-time populated rt_symbol_tables which formats output similar to expr_describe() and includes the data source. Since these tables reside in struct output_ctx there is no implicit connection between data type and therefore providing callbacks for relevant datat types which feed the data into said wrapper is a simpler solution than extending expr_describe() itself. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: do not allow to chain more than 16 binops	Florian Westphal	2023-12-22	2	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	netlink_linearize.c has never supported more than 16 chained binops. Adding more is possible but overwrites the stack in netlink_gen_bitwise(). Add a recursion counter to catch this at eval stage. Its not enough to just abort once the counter hits NFT_MAX_EXPR_RECURSION. This is because there are valid test cases that exceed this. For example, evaluation of 1 \| 2 will merge the constans, so even if there are a dozen recursive eval calls this will not end up with large binop chain post-evaluation. v2: allow more than 16 binops iff the evaluation function did constant-merging. Signed-off-by: Florian Westphal <fw@strlen.de>
*	intervals: set_to_range can be static	Florian Westphal	2023-12-16	1	-1/+0
\| \| \| \|	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: reject large raw payload and concat expressions	Florian Westphal	2023-12-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel will reject this too, but unfortunately nft may try to cram the data into the underlying libnftnl expr. This causes heap corruption or BUG: nld buffer overflow: want to copy 132, max 64 After: Error: Concatenation of size 544 exceeds maximum size of 512 udp length . @th,0,512 . @th,512,512 { 47-63 . 0xe373135363130 . 0x33131303735353203 } ^^^^^^^^^ resp. same warning for an over-sized raw expression. Signed-off-by: Florian Westphal <fw@strlen.de>
*	netlink: add and use nft_data_memcpy helper	Florian Westphal	2023-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a stack overflow somewhere in this code, we end up memcpy'ing a way too large expr into a fixed-size on-stack buffer. This is hard to diagnose, most of this code gets inlined so the crash happens later on return from alloc_nftnl_setelem. Condense the mempy into a helper and add a BUG so we can catch the overflow before it occurs. ->value is too small (4, should be 16), but for normal cases (well-formed data must fit into max reg space, i.e. 64 byte) the chain buffer that comes after value in the structure provides a cushion. In order to have the new BUG() not trigger on valid data, bump value to the correct size, this is userspace so the additional 60 bytes of stack usage is no concern. Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: reset statement length context before evaluating statement	Pablo Neira Ayuso	2023-12-08	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch consolidates ctx->stmt_len reset in stmt_evaluate() to avoid this problem. Note that stmt_evaluate_meta() and stmt_evaluate_ct() already reset it after the statement evaluation. Moreover, statement dependency can be generated while evaluating a meta and ct statement. Payload statement dependency already manually stashes this before calling stmt_evaluate(). Add a new stmt_dependency_evaluate() function to stash statement length context when evaluating a new statement dependency and use it for all of the existing statement dependencies. Florian also says: 'meta mark set vlan id map { 1 : 0x00000001, 4095 : 0x00004095 }' will crash. Reason is that the l2 dependency generated here is errounously expanded to a 32bit-one, so the evaluation path won't recognize this as a L2 dependency. Therefore, pctx->stacked_ll_count is 0 and __expr_evaluate_payload() crashes with a null deref when dereferencing pctx->stacked_ll[0]. nft-test.py gains a fugly hack to tolerate '!map typeof vlan id : meta mark'. For more generic support we should find something more acceptable, e.g. !map typeof( everything here is a key or data ) timeout ... tests/py update and assert(pctx->stacked_ll_count) by Florian Westphal. Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: remove xfree() and use plain free()	Thomas Haller	2023-11-09	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xmalloc() (and similar x-functions) are used for allocation. They wrap malloc()/realloc() but will abort the program on ENOMEM. The meaning of xmalloc() is that it wraps malloc() but aborts on failure. I don't think x-functions should have the notion, that this were potentially a different memory allocator that must be paired with a particular xfree(). Even if the original intent was that the allocator is abstracted (and possibly not backed by standard malloc()/free()), then that doesn't seem a good idea. Nowadays libc allocators are pretty good, and we would need a very special use cases to switch to something else. In other words, it will never happen that xmalloc() is not backed by malloc(). Also there were a few places, where a xmalloc() was already "wrongly" paired with free() (for example, iface_cache_release(), exit_cookie(), nft_run_cmd_from_buffer()). Or note how pid2name() returns an allocated string from fscanf(), which needs to be freed with free() (and not xfree()). This requirement bubbles up the callers portid2name() and name_by_portid(). This case was actually handled correctly and the buffer was freed with free(). But it shows that mixing different allocators is cumbersome to get right. Of course, we don't actually have different allocators and whether to use free() or xfree() makes no different. The point is that xfree() serves no actual purpose except raising irrelevant questions about whether x-functions are correctly paired with xfree(). Note that xfree() also used to accept const pointers. It is bad to unconditionally for all deallocations. Instead prefer to use plain free(). To free a const pointer use free_const() which obviously wraps free, as indicated by the name. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add free_const() and use it instead of xfree()	Thomas Haller	2023-11-09	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Almost everywhere xmalloc() and friends is used instead of malloc(). This is almost everywhere paired with xfree(). xfree() has two problems. First, it brings the wrong notion that xmalloc() should be paired with xfree(), as if xmalloc() would not use the plain malloc() allocator. In practices, xfree() just wraps free(), and it wouldn't make sense any other way. xfree() should go away. This will be addressed in the next commit. The problem addressed by this commit is that xfree() accepts a const pointer. Paired with the practice of almost always using xfree() instead of free(), all our calls to xfree() cast away constness of the pointer, regardless whether that is necessary. Declaring a pointer as const should help us to catch wrong uses. If the xfree() function always casts aways const, the compiler doesn't help. There are many places that rightly cast away const during free. But not all of them. Add a free_const() macro, which is like free(), but accepts const pointers. We should always make an intentional choice whether to use free() or free_const(). Having a free_const() macro makes this very common choice clearer, instead of adding a (void*) cast at many places. Note that we now pair xmalloc() allocations with a free() call (instead of xfree(). That inconsistency will be resolved in the next commit. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	gmputil: add nft_gmp_free() to free strings from mpz_get_str()	Thomas Haller	2023-11-09	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mpz_get_str() (with NULL as first argument) will allocate a buffer using the allocator functions (mp_set_memory_functions()). We should free those buffers with the corresponding free function. Add nft_gmp_free() for that and use it. The name nft_gmp_free() is chosen because "mini-gmp.c" already has an internal define called gmp_free(). There wouldn't be a direct conflict, but using the same name is confusing. And maybe our own defines should have a clear nft prefix. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	build: no recursive-make for "include/**/Makefile.am"	Thomas Haller	2023-11-02	8	-70/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Switch from recursive-make to a single top-level Makefile. This is the first step, the following patches will continue this. Unlike meson's subdir() or C's #include, automake's SUBDIRS= does not include a Makefile. Instead, it calls `make -C $dir`. https://www.gnu.org/software/make/manual/html_node/Recursion.html https://www.gnu.org/software/automake/manual/html_node/Subdirectories.html See also, "Recursive Make Considered Harmful". https://accu.org/journals/overload/14/71/miller_2004/ This has several problems, which we an avoid with a single Makefile: - recursive-make is harder to maintain and understand as a whole. Recursive-make makes sense, when there are truly independent sub-projects. Which is not the case here. The project needs to be considered as a whole and not one directory at a time. When we add unit tests (which we should), those would reside in separate directories but have dependencies between directories. With a single Makefile, we see all at once. The build setup has an inherent complexity, and that complexity is not necessarily reduced by splitting it into more files. On the contrary it helps to have it all in once place, provided that it's sensibly structured, named and organized. - typing `make` prints irrelevant "Entering directory" messages. So much so, that at the end of the build, the terminal is filled with such messages and we have to scroll to see what even happened. - with recursive-make, during build we see: make[3]: Entering directory '.../nftables/src' CC meta.lo meta.c:13:2: error: #warning hello test [-Werror=cpp] 13 \| #warning hello test \| ^~~~~~~ With a single Makefile we get CC src/meta.lo src/meta.c:13:2: error: #warning hello test [-Werror=cpp] 13 \| #warning hello test \| ^~~~~~~ This shows the full filename -- assuming that the developer works from the top level directory. The full name is useful, for example to copy+paste into the terminal. - single Makefile is also faster: $ make && perf stat -r 200 -B make -j I measure 35msec vs. 80msec. - recursive-make limits parallel make. You have to craft the SUBDIRS= in the correct order. The dependencies between directories are limited, as make only sees "LDADD = $(top_builddir)/src/libnftables.la" and not the deeper dependencies for the library. - I presume, some people like recursive-make because of `make -C $subdir` to only rebuild one directory. Rebuilding the entire tree is already very fast, so this feature seems not relevant. Also, as dependency handling is limited, we might wrongly not rebuild a target. For example, make check touch src/meta.c make -C examples check does not rebuild "examples/nft-json-file". What we now can do with single Makefile (and better than before), is `make examples/nft-json-file`, which works as desired and rebuilds all dependencies. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	icmpv6: Allow matching target address in NS/NA, redirect and MLD	Nicolas Cavallari	2023-10-06	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was currently not possible to match the target address of a neighbor solicitation or neighbor advertisement against a dynamic set, unlike in IPv4. Since they are many ICMPv6 messages with an address at the same offset, allow filtering on the target address for all icmp types that have one. While at it, also allow matching the destination address of an ICMPv6 redirect. Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Signed-off-by: Florian Westphal <fw@strlen.de>
*	json: add missing map statement stub	Pablo Neira Ayuso	2023-09-28	1	-0/+1
\| \| \| \| \| \| \|	Add map statement stub to restore compilation without json support. Fixes: 27a2da23d508 ("netlink_linearize: skip set element expression in map statement key") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <string.h> in <nft.h>	Thomas Haller	2023-09-28	3	-3/+1
\| \| \| \| \| \| \| \|	<string.h> provides strcmp(), as such it's very basic and used everywhere. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink_linearize: skip set element expression in map statement key	Pablo Neira Ayuso	2023-09-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fix is similar to 22d201010919 ("netlink_linearize: skip set element expression in set statement key") to fix map statement. netlink_gen_map_stmt() relies on the map key, that is expressed as a set element. Use the set element key instead to skip the set element wrap, otherwise get_register() abort execution: nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed. This includes JSON support to make this feature complete and it updates tests/shell to cover for this support. Reported-by: Luci Stanescu <luci@cnix.ro> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	expression: cleanup expr_ops_by_type() and handle u32 input	Thomas Haller	2023-09-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make fewer assumptions about the underlying integer type of the enum. Instead, be clear about where we have an untrusted uint32_t from netlink and an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32() to make this clearer. Later we might make the enum as packed, when this starts to matter more. Also, only the code path expr_ops() wants strict validation and assert against valid enum values. Move the assertion out of __expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to duplicate the handling of EXPR_INVALID. We still need to duplicate the check against EXPR_MAX, to ensure that the uint32_t value can be cast to an enum value. [ Remove cast on EXPR_MAX. --pablo ] Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: return const pointer from datatype_get()	Thomas Haller	2023-09-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	"struct datatype" is for the most part immutable, and most callers deal with const pointers. That's why datatype_get() accepts a const pointer to increase the reference count (mutating the refcnt field). It should also return a const pointer. In fact, all callers are fine with that already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use "enum byteorder" instead of int in set_datatype_alloc()	Thomas Haller	2023-09-20	1	-1/+1
\| \| \| \| \| \| \|	Use the enum types as we have them. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: fix missing definitions in <cache.h>/<headers.h>	Thomas Haller	2023-09-20	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	The headers should be self-contained so they can be included in any order. With exception of <nft.h>, which any internal header can rely on. Some fixes for <cache.h>/<headers.h>. In case of <cache.h>, forward declare some of the structs instead of including the headers. <headers.h> uses struct in6_addr. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_EVENTBIT slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Matching on ct event makes no sense since this is mostly used as statement to globally filter out ctnetlink events, but do not crash if it is used from concatenations. Add the missing slot in the datatype array so this does not crash. Fixes: 2595b9ad6840 ("ct: add conntrack event mask support") Reported-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_LABEL slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, ct label with concatenations such as: table ip x { chain y { ct label . ct mark { 0x1 . 0x1 } } } crashes: ../include/datatype.h:196:11: runtime error: member access within null pointer of type 'const struct datatype' AddressSanitizer:DEADLYSIGNAL ================================================================= ==640948==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc970d3199b bp 0x7fffd1f20560 sp 0x7fffd1f20540 T0) ==640948==The signal is caused by a READ memory access. ==640948==Hint: address points to the zero page. sudo #0 0x7fc970d3199b in datatype_equal ../include/datatype.h:196 Fixes: 2fcce8b0677b ("ct: connlabel matching support") Reported-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	libnftables: drop gmp_init() and mp_set_memory_functions()	Thomas Haller	2023-09-19	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting global handles for libgmp via mp_set_memory_functions() is very ugly. When we don't use mini-gmp, then potentially there are other users of the library in the same process, and every process fighting about the allocation functions is not gonna work. It also means, we must not reset the allocation functions after somebody already allocated GMP data with them. Which we cannot ensure, as we don't know what other parts of the process are doing. It's also unnecessary. The default allocation functions for gmp and mini-gmp already abort the process on allocation failure ([1], [2]), just like our xmalloc(). Just don't do this. [1] https://gmplib.org/repo/gmp/file/8225bdfc499f/memory.c#l37 [2] https://git.netfilter.org/nftables/tree/src/mini-gmp.c?id=6d19a902c1d77cb51b940b1ce65f31b1cad38b74#n286 Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	2	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails: ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3 ==118== at 0x484682C: calloc (vg_replace_malloc.c:1554) ==118== by 0x48A39DD: xmalloc (utils.c:37) ==118== by 0x48A39DD: xzalloc (utils.c:76) ==118== by 0x487BDFD: datatype_alloc (datatype.c:1205) ==118== by 0x487BDFD: concat_type_alloc (datatype.c:1288) ==118== by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786) ==118== by 0x488229D: stmt_evaluate_nat (evaluate.c:3892) ==118== by 0x488229D: stmt_evaluate (evaluate.c:4450) ==118== by 0x488328E: rule_evaluate (evaluate.c:4956) ==118== by 0x48ADC71: nft_evaluate (libnftables.c:552) ==118== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595) ==118== by 0x402983: main (main.c:534) I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'). We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP. Fix and rework. - Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak: if (--dtype->refcnt > 0) return; While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free. Let datatype_clone() return references with refcnt set 1 and in general be always clear about where we transfer ownership (take a reference) and where we need to release it. - set_datatype_alloc() needs to consistently return ownership to the reference. Previously, some code paths would and others wouldn't. - Replace datatype_set(key, set_datatype_alloc(dtype, key->byteorder)) with a __datatype_set() with takes ownership. Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	2	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It provides malloc()/free(), which is so basic that we need it everywhere. Include via <nft.h>. The ultimate purpose is to define more things in <nft.h>. While it has not corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that is a very basic dependency, that will be needed for that. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: rename "dtype_clone()" to datatype_clone()	Thomas Haller	2023-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()". This is a follow up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: simplify chain_alloc()	Pablo Neira Ayuso	2023-08-31	1	-1/+1
\| \| \| \| \| \| \|	Remove parameter to set the chain name which is only used from netlink path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	utils: call abort() after BUG() macro	Thomas Haller	2023-08-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, we get spurious warnings. The compiler should be aware that there is no return from BUG(). Call abort() there, which is marked as __attribute__ ((__noreturn__)). In file included from ./include/nftables.h:6, from ./include/rule.h:4, from src/payload.c:26: src/payload.c: In function 'icmp_dep_to_type': ./include/utils.h:39:34: error: this statement may fall through [-Werror=implicit-fallthrough=] 39 \| #define BUG(fmt, arg...) ({ fprintf(stderr, "BUG: " fmt, ##arg); assert(0); }) \| ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/payload.c:791:17: note: in expansion of macro 'BUG' 791 \| BUG("Invalid map for simple dependency"); \| ^~~ src/payload.c:792:9: note: here 792 \| case PROTO_ICMP_ECHO: return ICMP_ECHO; \| ^~~~ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: drop "format" attribute from nft_gmp_print()	Thomas Haller	2023-08-29	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nft_gmp_print() passes the format string and arguments to gmp_vfprintf(). Note that the format string is then interpreted by gmp, which also understand special specifiers like "%Zx". Note that with clang we get various compiler warnings: datatype.c:299:26: error: invalid conversion specifier 'Z' [-Werror,-Wformat-invalid-specifier] nft_gmp_print(octx, "0x%Zx [invalid type]", expr->value); ~^ gcc doesn't warn, because to gcc 'Z' is a deprecated alias for 'z' and because the 3rd argument of the attribute((format())) is zero (so gcc doesn't validate the arguments). But Z specifier in gmp expects a "mpz_t" value and not a size_t. It's really not the same thing. The correct solution is not to mark the function to accept a printf format string. Fixes: 2535ba7006f2 ('src: get rid of printf') Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: rework SNPRINTF_BUFFER_SIZE() and handle truncation	Thomas Haller	2023-08-29	1	-9/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, the macro asserts against truncation. This is despite the callers still checked for truncation and tried to handle it. Probably for good reason. With stmt_evaluate_log_prefix() it's not clear that the code ensures that truncation cannot happen, so we must not assert against it, but handle it. Also, - wrap the macro in "do { ... } while(0)" to make it more function-like. - evaluate macro arguments exactly once, to make it more function-like. - take pointers to the arguments that are being modified. - use assert() instead of abort(). - use size_t type for arguments related to the buffer size. - drop "size". It was mostly redundant to "offset". We can know everything we want based on "len" and "offset" alone. - "offset" previously was incremented before checking for truncation. So it would point somewhere past the buffer. This behavior does not seem useful. Instead, on truncation "len" will be zero (as before) and "offset" will point one past the buffer (one past the terminating NUL). Thereby, also fix a warning from clang: evaluate.c:4134:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size = 0; ^ meta.c:1006:9: error: variable 'size' set but not used [-Werror,-Wunused-but-set-variable] size_t size; ^ Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <std{bool,int}.h> via <nft.h>	Thomas Haller	2023-08-25	7	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a minimum base that all our sources will end up needing. This is what <nft.h> provides. Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to implement anything, without having "bool" and "uint32_t" types available. Yes, this means the internal headers are not self-contained, with respect to what <nft.h> provides. This is the exception to the rule, and our internal headers should rely to have <nft.h> included for them. They should not include <nft.h> themselves, because <nft.h> needs always be included as first. So when an internal header would include <nft.h> it would be unnecessary, because the header is always included already. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	configure: use AC_USE_SYSTEM_EXTENSIONS to get _GNU_SOURCE	Thomas Haller	2023-08-25	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let "configure" detect which features are available. Also, nftables is a Linux project, so portability beyond gcc/clang and glibc/musl is less relevant. And even if it were, then feature detection by "configure" would still be preferable. Use AC_USE_SYSTEM_EXTENSIONS ([1]). Available since autoconf 2.60, from 2006 ([2]). [1] https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Posix-Variants.html#index-AC_005fUSE_005fSYSTEM_005fEXTENSIONS-1046 [2] https://lists.gnu.org/archive/html/autoconf/2006-06/msg00111.html Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: don't define _GNU_SOURCE in public header	Thomas Haller	2023-08-25	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	_GNU_SOURCE is supposed to be defined as first thing, before including any libc headers. Defining it in the public header of nftables is wrong, because it would only (somewhat) work if the user includes the nftables header as first thing too. But that is not what users commonly would do, in particular with autotools projects, where users would include <config.h> first. It's also unnecessary. Nothing in "nftables/libnftables.h" itself requires _GNU_SOURCE. Drop it. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	5	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<config.h> is generated by the configure script. As it contains our feature detection, it want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define need to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first. Instead of going through all source files and include <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first). This will also allow us later to prepare some common base, like include <stdbool.h> everywhere. We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file). Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures to include <config.h> exactly once. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_JSON to enable JSON parsing	Thomas Haller	2023-08-24	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, the input is parsed using the nftables grammar. When setting NFT_CTX_OUTPUT_JSON flag, nftables will first try to parse the input as JSON before falling back to the nftables grammar. But NFT_CTX_OUTPUT_JSON flag also turns on JSON for the output. Add a flag NFT_CTX_INPUT_JSON which allows to treat only the input as JSON, but keep the output mode unchanged. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking	Thomas Haller	2023-08-24	3	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense. Add a new input flag NFT_CTX_INPUT_NO_DNS to opt-out from getaddrinfo() and only accept plain IP addresses. We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are better aware of what we expect and can generate a better error message in case of failure. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flags for nft_ctx	Thomas Haller	2023-08-24	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Similar to the existing output flags, add input flags. No flags are yet implemented, that will follow. One difference to nft_ctx_output_set_flags(), is that the setter for input flags returns the previously set flags. Signed-off-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	ct expectation: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-07-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Just like "ct timeout", "ct expectation" is in need of the same fix, we get segfault on "nft list ct expectation table t", if table t exists. This is the exact same pattern as resolved for "ct timeout" in commit 1d2e22fc0521 ("ct timeout: fix 'list object x' vs. 'list objects in table' confusion"). Signed-off-by: Florian Westphal <fw@strlen.de>
*	include: missing dccpopt.h breaks make distcheck	Pablo Neira Ayuso	2023-07-14	1	-0/+1
\| \| \| \| \| \|	Add it to Makefile.am. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Implement 'reset {set,map,element}' commands	Phil Sutter	2023-07-13	3	-4/+9
\| \| \| \| \| \| \| \| \| \| \|	All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values. While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	cli: Make cli_init() return to caller	Phil Sutter	2023-07-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid direct exit() calls as that leaves the caller-allocated nft_ctx object in place. Making sure it is freed helps with valgrind-analyses at least. To signal desired exit from CLI, introduce global cli_quit boolean and make all cli_exit() implementations also set cli_rc variable to the appropriate return code. The logic is to finish CLI only if cli_quit is true which asserts proper cleanup as it is set only by the respective cli_exit() function. Signed-off-by: Phil Sutter <phil@nwl.cc>
*	src: avoid IPPROTO_MAX for array definitions	Florian Westphal	2023-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ip header can only accomodate 8but value, but IPPROTO_MAX has been bumped due to uapi reasons to support MPTCP (262, which is used to toggle on multipath support in tcp). This results in: exthdr.c:349:11: warning: result of comparison of constant 263 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare] if (type < array_size(exthdr_protocols)) ~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ redude array sizes back to what can be used on-wire. Signed-off-by: Florian Westphal <fw@strlen.de>
*	ct timeout: fix 'list object x' vs. 'list objects in table' confusion	Florian Westphal	2023-06-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<empty ruleset> $ nft list ct timeout table t Error: No such file or directory list ct timeout table t ^ This is expected to list all 'ct timeout' objects. The failure is correct, the table 't' does not exist. But now lets add one: $ nft add table t $ nft list ct timeout table t Segmentation fault (core dumped) ... and thats not expected, nothing should be shown and nft should exit normally. Because of missing TIMEOUTS command enum, the backend thinks it should do an object lookup, but as frontend asked for 'list of objects' rather than 'show this object', handle.obj.name is NULL, which then results in this crash. Update the command enums so that backend knows what the frontend asked for. Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add json support for last statement	Pablo Neira Ayuso	2023-06-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This patch adds json support for the last statement, it works for me here. However, tests/py still displays a warning: any/last.t: WARNING: line 12: '{"nftables": [{"add": {"rule": {"family": "ip", "table": "test-ip4", "chain": "input", "expr": [{"last": {"used": 300000}}]}}}]}': '[{"last": {"used": 300000}}]' mismatches '[{"last": null}]' Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	exthdr: add boolean DCCP option matching	Jeremy Sowden	2023-06-01	3	-0/+45
\| \| \| \| \| \| \| \| \| \|	Iptables supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend exthdr expressions to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	mnl: support bpf id decode in nft list hooks	Florian Westphal	2023-05-22	1	-3/+21
\| \| \| \| \| \| \| \| \| \| \|	This allows 'nft list hooks' to also display the bpf program id attached. Example: hook input { -0000000128 nf_hook_run_bpf id 6 .. Signed-off-by: Florian Westphal <fw@strlen.de>
*	datatype: add hint error handler	Pablo Neira Ayuso	2023-05-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If user provides a symbol that cannot be parsed and the datatype provides an error handler, provide a hint through the misspell infrastructure. For instance: # cat test.nft table ip x { map y { typeof ip saddr : verdict elements = { 1.2.3.4 : filter_server1 } } } # nft -f test.nft test.nft:4:26-39: Error: Could not parse netfilter verdict; did you mean `jump filter_server1'? elements = { 1.2.3.4 : filter_server1 } ^^^^^^^^^^^^^^ While at it, normalize error to "Could not parse symbolic %s expression". Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>