nftables - nft command line tool

	Commit message	Author	Age	Files	Lines
*	src: disentangle ICMP code types	Pablo Neira Ayuso	2024-04-04	1	-7/+64
Currently, ICMP{v4,v6,inet} code datatypes only describe the codes supported by the reject statement, but they can also be used for icmp code matching. Moreover, ICMP code types go hand in hand with ICMP types, that is, ICMP code symbols depend on the ICMP type. Thus, the output of "nft describe icmp_code" looks confusing, because it only displays the values supported by the reject statement.

Disentangle this by adding internal datatypes for the reject statement to handle the conversion of ICMP code symbols to values, as well as ruleset listing.

The existing icmp_code, icmpv6_code and icmpx_code remain in place. For backward compatibility, a parser function is defined in case an existing ruleset relies on these symbols.

As for the manpage, move the existing ICMP code tables from the DATA TYPES section to the REJECT STATEMENT section, where they really belong. The icmp_code and icmpv6_code table stubs remain in the DATA TYPES section, because they describe an 8-bit integer field.

After this patch:

# nft describe icmp_code
datatype icmp_code (icmp code) (basetype integer), 8 bits
# nft describe icmpv6_code
datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
# nft describe icmpx_code
datatype icmpx_code (icmpx code) (basetype integer), 8 bits

These no longer display the symbol table of the reject statement.

icmpx_code_type is not used anymore, but keep it in place for backward compatibility reasons.

And update tests/shell accordingly.

Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use DTYPE_F_PREFIX only for IP address datatype	Pablo Neira Ayuso	2024-03-21	1	-1/+0
The DTYPE_F_PREFIX flag provides a hint to the netlink delinearize path to use prefix notation. Its use for meta mark seems to cause confusion: users expect to see prefixes in the listing only for IP address datatypes. Untoggle this flag so that the (more lengthy) binop output, such as:

meta mark & 0xffffff00 == 0xffffff00

is used instead.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1739
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
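Prefix notation only works when the mask is a contiguous run of leading one bits. For instance, the mask 0xffffff00 above corresponds to a /24. The following minimal sketch shows that check. mask_to_prefix_len() is a hypothetical helper, not the delinearize code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the prefix length if mask is a contiguous
 * run of leading one bits (e.g. 0xffffff00 -> 24), or -1 otherwise. */
static int mask_to_prefix_len(uint32_t mask)
{
	uint32_t host_bits = ~mask;
	int len = 0;

	/* For a prefix mask the inverted value looks like 0...01...1, so
	 * adding one carries through all of its set bits and clears them. */
	if ((host_bits & (host_bits + 1)) != 0)
		return -1;

	/* Count the leading one bits. */
	while (mask & 0x80000000u) {
		len++;
		mask <<= 1;
	}
	return len;
}
```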
*	datatype: display 0s time datatype	Pablo Neira Ayuso	2024-02-07	1	-5/+19
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: do not assert when value exceeds expected width	Florian Westphal	2024-01-08	1	-2/+4
Inputs:

ip protocol . th dport { tcp / 22, }

or

th dport . ip protocol { tcp / 22, }

are not rejected at this time. 'list ruleset' yields:

ip protocol & nft: src/gmputil.c:77: mpz_get_uint8: Assertion `cnt <= 1' failed.

or

th dport & nft: src/gmputil.c:87: mpz_get_be16: Assertion `cnt <= 1' failed.

While this should be caught at input too, the print path should be more robust, e.g. when there are direct nfnetlink users.

After this patch, the print functions fall back to 'integer_type_print', which can handle large numbers too.

Note that the output printed this way cannot be read back by nft; it will dump something like:

tcp dport & 18446739675663040512 . ip protocol 0 . 0

but that's better than assert().

v2: the same problem exists for service too.

Signed-off-by: Florian Westphal <fw@strlen.de>
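The fallback idea can be sketched as follows, under one simplifying assumption: nftables stores constants as GMP integers, whereas this sketch uses a uint64_t. print_proto() and its two-entry symbol table are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical printer for an 8-bit protocol field. A value wider than
 * the field would previously have tripped an assertion. Here it falls
 * back to printing the raw integer instead. */
static void print_proto(char *buf, size_t len, uint64_t value)
{
	if (value > UINT8_MAX) {
		/* Fallback: cannot be a protocol symbol, print as integer. */
		snprintf(buf, len, "%" PRIu64, value);
		return;
	}

	switch (value) {
	case 6:
		snprintf(buf, len, "tcp");
		break;
	case 17:
		snprintf(buf, len, "udp");
		break;
	default:
		snprintf(buf, len, "%" PRIu64, value);
		break;
	}
}
```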
*	datatype: Describe rt symbol tables	Phil Sutter	2024-01-02	1	-0/+34
Implement a symbol_table_print() wrapper for the run-time populated rt_symbol_tables, which formats output similar to expr_describe() and includes the data source.

Since these tables reside in struct output_ctx, there is no implicit connection between them and the data types. Therefore, providing callbacks for the relevant data types, which feed the data into said wrapper, is a simpler solution than extending expr_describe() itself.

Signed-off-by: Phil Sutter <phil@nwl.cc>
*	datatype: Initialize rt_symbol_tables' base field	Phil Sutter	2024-01-02	1	-4/+8
It is unconditionally accessed in symbol_table_print(), so make sure it is initialized: either to BASE_DECIMAL (arbitrary) for empty or non-existent source files, or to a proper value depending on the entry number format.

Signed-off-by: Phil Sutter <phil@nwl.cc>
*	datatype: rt_symbol_table_init() to search for iproute2 configs	Phil Sutter	2024-01-02	1	-4/+34
There is an ongoing effort among various distributions to tidy up /etc. The idea is to reduce its contents to just what the admin manually inserted to customize the system; anything else shall move out to /usr (or so). The various files in /etc/iproute2 fall into that category, as they are seldom modified.

The crux, though, is that the iproute2 project seems not quite sure yet where the files should go. While v6.6.0 installs them into /usr/lib/iproute2, the current mast^Wmain branch uses /usr/share/iproute2. Assume this is going to stay, as /(usr/)lib does not seem right for such files.

Note that rt_symbol_table_init() is not just used for iproute2-maintained configs but also for connlabel.conf, so retain the old behaviour when passed an absolute path.

Signed-off-by: Phil Sutter <phil@nwl.cc>
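Here is a minimal sketch of such a lookup order. It assumes /etc/iproute2 is still tried before /usr/share/iproute2 and that absolute paths bypass the search entirely. find_rt_config() is hypothetical and is not the actual rt_symbol_table_init() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical lookup: an absolute path (e.g. for connlabel.conf) is used
 * as-is. A relative name is tried below each directory in order, and the
 * first readable candidate wins. Returns 0 with the path in buf, or -1. */
static int find_rt_config(const char *name, char *buf, size_t len)
{
	static const char *const dirs[] = {
		"/etc/iproute2",	/* admin overrides */
		"/usr/share/iproute2",	/* distribution defaults */
	};
	size_t i;

	if (name[0] == '/') {
		snprintf(buf, len, "%s", name);
		return 0;
	}

	for (i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++) {
		snprintf(buf, len, "%s/%s", dirs[i], name);
		if (access(buf, R_OK) == 0)
			return 0;
	}
	return -1;
}
```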
*	src: remove xfree() and use plain free()	Thomas Haller	2023-11-09	1	-2/+2
xmalloc() (and similar x-functions) are used for allocation. They wrap malloc()/realloc() but abort the program on ENOMEM.

The meaning of xmalloc() is that it wraps malloc() but aborts on failure. I don't think x-functions should carry the notion that this is potentially a different memory allocator that must be paired with a particular xfree().

Even if the original intent was that the allocator is abstracted (and possibly not backed by standard malloc()/free()), that doesn't seem a good idea. Nowadays libc allocators are pretty good, and we would need very special use cases to switch to something else. In other words, it will never happen that xmalloc() is not backed by malloc().

Also, there were a few places where an xmalloc() was already "wrongly" paired with free() (for example, iface_cache_release(), exit_cookie(), nft_run_cmd_from_buffer()).

Or note how pid2name() returns an allocated string from fscanf(), which needs to be freed with free() (and not xfree()). This requirement bubbles up to the callers portid2name() and name_by_portid(). This case was actually handled correctly, and the buffer was freed with free(). But it shows that mixing different allocators is cumbersome to get right. Of course, we don't actually have different allocators, and whether to use free() or xfree() makes no difference. The point is that xfree() serves no actual purpose except raising irrelevant questions about whether x-functions are correctly paired with xfree().

Note that xfree() also used to accept const pointers. It is bad to do that unconditionally for all deallocations. Instead, prefer plain free(). To free a const pointer, use free_const(), which obviously wraps free(), as indicated by the name.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add free_const() and use it instead of xfree()	Thomas Haller	2023-11-09	1	-4/+4
Almost everywhere, xmalloc() and friends are used instead of malloc(), and this is almost always paired with xfree().

xfree() has two problems. First, it conveys the wrong notion that xmalloc() should be paired with xfree(), as if xmalloc() would not use the plain malloc() allocator. In practice, xfree() just wraps free(), and it wouldn't make sense any other way. xfree() should go away; this will be addressed in the next commit.

The problem addressed by this commit is that xfree() accepts a const pointer. Paired with the practice of almost always using xfree() instead of free(), all our calls to xfree() cast away the constness of the pointer, regardless of whether that is necessary. Declaring a pointer as const should help us catch wrong uses. If the xfree() function always casts away const, the compiler doesn't help.

There are many places that rightly cast away const during free, but not all of them. Add a free_const() macro, which is like free() but accepts const pointers. We should always make an intentional choice whether to use free() or free_const(). Having a free_const() macro makes this very common choice clearer, instead of adding a (void*) cast in many places.

Note that we now pair xmalloc() allocations with a free() call (instead of xfree()). That inconsistency will be resolved in the next commit.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
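Here is a minimal sketch of what such a macro can look like. It assumes the macro does nothing more than cast away const before calling free(); the exact definition in nftables may differ. dup_label() is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Like free(), but accepts a pointer to const. The single explicit cast
 * documents the intent to discard constness at the point of release. */
#define free_const(ptr) free((void *)(ptr))

/* Hypothetical helper: returns a heap copy that the caller must free. */
static char *dup_label(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}
```

A caller that only reads the copy can keep it in a const pointer and still release it without an ad-hoc cast, e.g. `const char *name = dup_label("filter"); ... free_const(name);`.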
*	datatype: don't return a const string from cgroupv2_get_path()	Thomas Haller	2023-11-09	1	-3/+3
The caller is supposed to free the allocated string. Return a non-const string to make that clearer.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use xmalloc() for allocating datatype in datatype_clone()	Thomas Haller	2023-09-28	1	-1/+1
The returned memory will be initialized, so there is no need to zero it first. Use xmalloc() instead of xzalloc().

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <string.h> in <nft.h>	Thomas Haller	2023-09-28	1	-1/+0
<string.h> provides strcmp(); as such, it's very basic and used everywhere.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: return const pointer from datatype_get()	Thomas Haller	2023-09-21	1	-1/+1
"struct datatype" is for the most part immutable, and most callers deal with const pointers. That's why datatype_get() accepts a const pointer to increase the reference count (mutating the refcnt field).

It should also return a const pointer. In fact, all callers are fine with that already.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: use "enum byteorder" instead of int in set_datatype_alloc()	Thomas Haller	2023-09-20	1	-1/+1
Use the enum types as we have them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_EVENTBIT slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
Matching on ct event makes no sense, since this is mostly used as a statement to globally filter out ctnetlink events, but nft should not crash if it is used from concatenations.

Add the missing slot in the datatype array so this does not crash.

Fixes: 2595b9ad6840 ("ct: add conntrack event mask support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: initialize TYPE_CT_LABEL slot in datatype array	Pablo Neira Ayuso	2023-09-20	1	-0/+1
Otherwise, ct label with concatenations such as:

table ip x {
	chain y {
		ct label . ct mark { 0x1 . 0x1 }
	}
}

crashes:

../include/datatype.h:196:11: runtime error: member access within null pointer of type 'const struct datatype'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==640948==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc970d3199b bp 0x7fffd1f20560 sp 0x7fffd1f20540 T0)
==640948==The signal is caused by a READ memory access.
==640948==Hint: address points to the zero page.
    #0 0x7fc970d3199b in datatype_equal ../include/datatype.h:196

Fixes: 2fcce8b0677b ("ct: connlabel matching support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	1	-6/+17
Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails:

==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==118==    at 0x484682C: calloc (vg_replace_malloc.c:1554)
==118==    by 0x48A39DD: xmalloc (utils.c:37)
==118==    by 0x48A39DD: xzalloc (utils.c:76)
==118==    by 0x487BDFD: datatype_alloc (datatype.c:1205)
==118==    by 0x487BDFD: concat_type_alloc (datatype.c:1288)
==118==    by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
==118==    by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
==118==    by 0x488229D: stmt_evaluate (evaluate.c:4450)
==118==    by 0x488328E: rule_evaluate (evaluate.c:4956)
==118==    by 0x48ADC71: nft_evaluate (libnftables.c:552)
==118==    by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
==118==    by 0x402983: main (main.c:534)

I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes').

We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP.

Fix and rework.

- Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak:

      if (--dtype->refcnt > 0)
          return;

  While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free.

  Let datatype_clone() return references with refcnt set to 1 and, in general, always be clear about where we transfer ownership (take a reference) and where we need to release it.

- set_datatype_alloc() needs to consistently return ownership of the reference. Previously, some code paths would and others wouldn't.

- Replace

      datatype_set(key, set_datatype_alloc(dtype, key->byteorder))

  with a __datatype_set() which takes ownership.

Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
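The ownership rule described above can be sketched as follows: a clone starts with one reference owned by the caller, every get is paired with exactly one free, and static instances are never counted. struct dt, DT_F_ALLOC and the dt_*() functions are hypothetical simplifications of struct datatype and its helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct datatype. */
struct dt {
	unsigned int refcnt;
	unsigned int flags;
};

#define DT_F_ALLOC 0x1 /* dynamically allocated, reference counted */

/* A clone starts with exactly one reference, owned by the caller. */
static struct dt *dt_clone(const struct dt *orig)
{
	struct dt *d = malloc(sizeof(*d));

	if (!d)
		abort(); /* mimics xmalloc() aborting on ENOMEM */
	*d = *orig;
	d->flags |= DT_F_ALLOC;
	d->refcnt = 1;
	return d;
}

/* Take one more reference; static instances are not counted. */
static struct dt *dt_get(struct dt *d)
{
	if (d && (d->flags & DT_F_ALLOC))
		d->refcnt++;
	return d;
}

/* Drop one reference; dropping the last one frees the object. */
static void dt_free(struct dt *d)
{
	if (!d || !(d->flags & DT_F_ALLOC))
		return;
	assert(d->refcnt > 0); /* catches the underflow described above */
	if (--d->refcnt > 0)
		return;
	free(d);
}
```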
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	1	-1/+0
It provides malloc()/free(), which is so basic that we need it everywhere. Include it via <nft.h>.

The ultimate purpose is to define more things in <nft.h>. While it has no corresponding C sources, <nft.h> can contain macros and static inline functions, and it is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, it is a very basic dependency that will be needed for that.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: rename "dtype_clone()" to datatype_clone()	Thomas Haller	2023-09-08	1	-4/+4
The struct is called "datatype" and related functions have the fitting "datatype_" prefix. Rename. Also rename the internal "dtype_alloc()" to "datatype_alloc()".

This is a follow-up to commit 01a13882bb59 ('src: add reference counter for dynamic datatypes'), which started adding "datatype_*()" functions.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
*	datatype: avoid cast-align warning with struct sockaddr result from getaddrinfo()	Thomas Haller	2023-08-29	1	-3/+11
With CC=clang we get:

datatype.c:625:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
    addr = ((struct sockaddr_in *)ai->ai_addr)->sin_addr;
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
datatype.c:690:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in6 *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
    addr = ((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
datatype.c:826:11: error: cast from 'struct sockaddr *' to 'struct sockaddr_in *' increases required alignment from 2 to 4 [-Werror,-Wcast-align]
    port = ((struct sockaddr_in *)ai->ai_addr)->sin_port;
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix that by casting to (void *) first. Also, add an assertion that the type is as expected.

For inet_service_type_parse(), differentiate between AF_INET and AF_INET6. It might not have been a problem in practice, because the struct offsets of sin_port/sin6_port are identical.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
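Here is a minimal sketch of the pattern. It resolves a numeric IPv4 literal with AI_NUMERICHOST, so no network access is needed. parse_ipv4() is a hypothetical helper, not the code from datatype.c.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve a numeric IPv4 literal and extract its address. Casting
 * ai_addr through (void *) avoids -Wcast-align. The assertion documents
 * that getaddrinfo() really returned an IPv4 socket address. */
static int parse_ipv4(const char *str, struct in_addr *out)
{
	struct addrinfo hints, *ai;
	const struct sockaddr_in *sin;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;
	hints.ai_flags = AI_NUMERICHOST; /* never query a resolver */

	if (getaddrinfo(str, NULL, &hints, &ai) != 0)
		return -1;

	assert(ai->ai_family == AF_INET);
	sin = (const struct sockaddr_in *)(void *)ai->ai_addr;
	*out = sin->sin_addr; /* copy before freeing the result list */
	freeaddrinfo(ai);
	return 0;
}
```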
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	1	-0/+2
<config.h> is generated by the configure script. As it contains our feature detection, we want to use it everywhere.

Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define needs to come before anything else, and it would be confusing if different source files adhered to different C variants.

It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included first.

Instead of going through all source files and including <config.h> first, add a new header "include/nft.h", which is supposed to be included in all our sources (and first). This will also allow us later to prepare some common base, like including <stdbool.h> everywhere.

We aim for headers to be self-contained, so that they can be included in any order. That, by the way, already didn't work, because some headers define _GNU_SOURCE, which only works if the header is included first. <nft.h> is, however, an exception to the rule: everything we compile shall rely on having the <nft.h> header included first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file).

Note that <config.h> has no include guards, which makes including it multiple times at least ugly. It doesn't cause problems in practice, because it only contains defines, and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures that <config.h> is included exactly once.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add input flag NFT_CTX_INPUT_NO_DNS to avoid blocking	Thomas Haller	2023-08-24	1	-28/+40
getaddrinfo() blocks while trying to resolve the name. Blocking the caller of the library is in many cases undesirable. Also, while reconfiguring the firewall, it's not clear that resolving names via the network will work or makes sense.

Add a new input flag NFT_CTX_INPUT_NO_DNS to opt out of getaddrinfo() and only accept plain IP addresses.

We could also use AI_NUMERICHOST with getaddrinfo() instead of inet_pton(). By parsing via inet_pton(), we are more aware of what we expect and can generate a better error message in case of failure.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
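Here is a sketch of the two parsing paths. It uses a hypothetical INPUT_NO_DNS flag that stands in for NFT_CTX_INPUT_NO_DNS. ipv4_parse() and its error strings are illustrative, not the libnftables implementation.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical flag standing in for NFT_CTX_INPUT_NO_DNS. */
#define INPUT_NO_DNS (1u << 0)

/* Parse an IPv4 address into *out. Returns 0 on success, or -1 with an
 * error message in errbuf. */
static int ipv4_parse(const char *sym, unsigned int flags,
		      struct in_addr *out, char *errbuf, size_t len)
{
	struct addrinfo hints, *ai;
	int err;

	if (flags & INPUT_NO_DNS) {
		/* Numeric literals only: the resolver is never consulted. */
		if (inet_pton(AF_INET, sym, out) == 1)
			return 0;
		snprintf(errbuf, len, "Invalid IPv4 address: %s", sym);
		return -1;
	}

	/* Default path: may block while resolving a hostname. */
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;
	err = getaddrinfo(sym, NULL, &hints, &ai);
	if (err != 0) {
		snprintf(errbuf, len, "Could not resolve hostname: %s",
			 gai_strerror(err));
		return -1;
	}
	*out = ((const struct sockaddr_in *)(void *)ai->ai_addr)->sin_addr;
	freeaddrinfo(ai);
	return 0;
}
```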
*	nftutils: add and use wrappers for getprotoby{name,number}_r(), getservbyport_r()	Thomas Haller	2023-08-20	1	-16/+17
We should aim to use the thread-safe variants of getprotoby{name,number}() and getservbyport(). However, they may not be available with other libcs, so using them requires a configure check. As that is cumbersome, add wrappers that do this in one place.

These wrappers are thread-safe if libc provides the reentrant versions. Use them.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: add hint error handler	Pablo Neira Ayuso	2023-05-11	1	-2/+39
If the user provides a symbol that cannot be parsed and the datatype provides an error handler, provide a hint through the misspell infrastructure.

For instance:

# cat test.nft
table ip x {
	map y {
		typeof ip saddr : verdict
		elements = { 1.2.3.4 : filter_server1 }
	}
}
# nft -f test.nft
test.nft:4:26-39: Error: Could not parse netfilter verdict; did you mean `jump filter_server1'?
		elements = { 1.2.3.4 : filter_server1 }
		                       ^^^^^^^^^^^^^^

While at it, normalize the error to "Could not parse symbolic %s expression".

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: misspell support with symbol table parser for error reporting	Pablo Neira Ayuso	2023-05-11	1	-2/+48
Some datatypes provide a symbol table that is parsed as an integer. Improve error reporting by using the misspell infrastructure to provide a hint to the user whenever possible.

If the base datatype (usually the integer datatype) fails to parse the symbol, try a fuzzy match on the symbol table to provide a hint in case the user has mistyped it.

For instance:

test.nft:3:11-14: Error: Could not parse Differentiated Services Code Point expression; did you you mean `cs0`?
		ip dscp ccs0
		        ^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
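The fuzzy-match step can be sketched with a plain Levenshtein edit distance. This is an assumption: the metric used by nftables' misspell code may differ. edit_distance() and suggest_symbol() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYM_LEN 31

/* Levenshtein edit distance between two short strings (at most
 * MAX_SYM_LEN characters each); returns ~0u for longer inputs. */
static unsigned int edit_distance(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), i, j;
	unsigned int prev[MAX_SYM_LEN + 1], cur[MAX_SYM_LEN + 1];

	if (la > MAX_SYM_LEN || lb > MAX_SYM_LEN)
		return ~0u;

	for (j = 0; j <= lb; j++)
		prev[j] = (unsigned int)j;

	for (i = 1; i <= la; i++) {
		cur[0] = (unsigned int)i;
		for (j = 1; j <= lb; j++) {
			unsigned int del = prev[j] + 1;
			unsigned int ins = cur[j - 1] + 1;
			unsigned int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
			unsigned int best = del < ins ? del : ins;

			cur[j] = best < sub ? best : sub;
		}
		memcpy(prev, cur, (lb + 1) * sizeof(prev[0]));
	}
	return prev[lb];
}

/* Return the closest symbol within max_dist edits, or NULL if none. */
static const char *suggest_symbol(const char *input,
				  const char *const *symbols, size_t n,
				  unsigned int max_dist)
{
	const char *best = NULL;
	unsigned int best_dist = 0;
	size_t k;

	for (k = 0; k < n; k++) {
		unsigned int d = edit_distance(input, symbols[k]);

		if (d > max_dist)
			continue;
		if (!best || d < best_dist) {
			best = symbols[k];
			best_dist = d;
		}
	}
	return best;
}
```

With a table such as { "cs0", "cs1", "af11", "ef" }, the input "ccs0" is one deletion away from "cs0", which matches the hint shown in the error message above.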
*	src: Don't parse string as verdict in map	Xiao Liang	2022-08-19	1	-12/+0
In a verdict map, string values are accidentally treated as verdicts.

For example:

table t {
	map foo {
		type ipv4_addr : verdict
		elements = { 192.168.0.1 : bar }
	}
	chain output {
		type filter hook output priority mangle;
		ip daddr vmap @foo
	}
}

Though "bar" is not a valid verdict (it should be "jump bar" or something similar), the string is taken as the element value. Then NFTA_DATA_VALUE is sent to the kernel instead of NFTA_DATA_VERDICT. This would be rejected by recent kernels. On older ones (e.g. v5.4.x) that don't validate the type, a warning can be seen when the rule is hit, because of the corrupted verdict value:

[5120263.467627] WARNING: CPU: 12 PID: 303303 at net/netfilter/nf_tables_core.c:229 nft_do_chain+0x394/0x500 [nf_tables]

Indeed, we don't parse verdicts during evaluation, only chain names, which are of type string rather than verdict. For example, "jump $var" is a verdict while "$var" is a string.

Fixes: c64457cff967 ("src: Allow goto and jump to a variable")
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: allow to use typeof of raw expressions in set declaration	Pablo Neira Ayuso	2022-03-29	1	-1/+1
Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set its length and byteorder. Add the missing information to the set userdata area for raw payload expressions, which allows rebuilding the set typeof from the listing path.

A few examples:

- With anonymous sets:

  nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e }

- With named sets:

  table x {
    set y {
      typeof ip saddr . @ih,32,32
      elements = { 1.1.1.1 . 0x14 }
    }
  }

Incremental updates are also supported, e.g.:

  nft add element x y { 3.3.3.3 . 0x28 }

expr_evaluate_concat() is used to evaluate both set key definitions and set key values; using two different functions might help simplify this code in the future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: Fix size of time_type	Phil Sutter	2021-11-30	1	-2/+4
Used by 'ct expiration', time_type is supposed to be 32 bits. Passing a 64-bit variable to constant_expr_alloc() causes the value to always be zero on Big Endian.

Fixes: 0974fa84f162a ("datatype: seperate time parsing/printing from time_type")
Signed-off-by: Phil Sutter <phil@nwl.cc>
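The failure mode can be reproduced in isolation. Copying the first four bytes of a 64-bit variable yields the low half on little-endian hosts, but on big-endian hosts it yields the high half, which is zero for small values. load_u32() is a hypothetical stand-in for building a 32-bit constant from a raw buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: builds a 32-bit value from the first len bytes at data,
 * the way a fixed-width constant is filled from a raw buffer. */
static uint32_t load_u32(const void *data, size_t len)
{
	uint32_t v = 0;

	memcpy(&v, data, len < sizeof(v) ? len : sizeof(v));
	return v;
}

/* Returns 1 on big-endian hosts, 0 on little-endian ones. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 0;
}
```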
*	datatype: add xinteger_type alias to print in hexadecimal	Pablo Neira Ayuso	2021-11-03	1	-0/+16
Add an alias of the integer type to print raw payload expressions in hexadecimal.

Update tests/py.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: time_print() ignores -T	Pablo Neira Ayuso	2021-09-06	1	-0/+5
Honor NFT_CTX_OUTPUT_NUMERIC_TIME.

# nft list ruleset
table ip x {
	set y {
		type ipv4_addr
		flags timeout
		elements = { 1.1.1.1 timeout 5m expires 1m49s40ms }
	}
}
# sudo nft -T list ruleset
table ip x {
	set y {
		type ipv4_addr
		flags timeout
		elements = { 1.1.1.1 timeout 300s expires 108s }
	}
}

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1561
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
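Here is a sketch of the two output modes. It assumes durations are in milliseconds, uses the unit suffixes seen in the listing above, and prints whole seconds in numeric mode. format_time() is hypothetical, not time_print(). It also prints "0s" for a zero duration, which is what the "display 0s time datatype" entry refers to.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: prints a duration given in milliseconds either
 * as whole seconds (numeric mode, like nft -T) or split into d/h/m/s/ms
 * components, skipping components that are zero. Assumes len > 0. */
static void format_time(char *buf, size_t len, uint64_t ms, bool numeric)
{
	static const struct {
		const char *suffix;
		uint64_t ms;
	} units[] = {
		{ "d", 86400000 }, { "h", 3600000 }, { "m", 60000 },
		{ "s", 1000 }, { "ms", 1 },
	};
	size_t off = 0, i;

	if (numeric) {
		snprintf(buf, len, "%" PRIu64 "s", ms / 1000);
		return;
	}
	if (ms == 0) {
		snprintf(buf, len, "0s");
		return;
	}

	buf[0] = '\0';
	for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		uint64_t n = ms / units[i].ms;
		int written;

		if (n == 0)
			continue;
		ms %= units[i].ms;
		written = snprintf(buf + off, len - off, "%" PRIu64 "%s",
				   n, units[i].suffix);
		if (written < 0 || (size_t)written >= len - off)
			return; /* output truncated */
		off += (size_t)written;
	}
}
```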
*	datatype: skip cgroupv2 rootfs in listing	Pablo Neira Ayuso	2021-05-18	1	-1/+2
The cgroupv2 path is expressed relative to the /sys/fs/cgroup folder, so update the listing to skip it.

# nft add rule x y socket cgroupv2 level 1 "user.slice" counter
# nft list ruleset
table ip x {
	chain y {
		type filter hook input priority filter; policy accept;
		socket cgroupv2 level 1 "user.slice" counter
	}
}

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
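Here is a minimal sketch of the prefix stripping. It assumes cgroupv2 is mounted at /sys/fs/cgroup, as stated above. cgroupv2_relative() is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CGROUP_ROOT "/sys/fs/cgroup/"

/* Return the part of an absolute cgroupv2 path below the mount point, so
 * that "/sys/fs/cgroup/user.slice" is listed as "user.slice". Paths
 * outside the mount point are returned unchanged. */
static const char *cgroupv2_relative(const char *path)
{
	size_t len = strlen(CGROUP_ROOT);

	if (strncmp(path, CGROUP_ROOT, len) == 0)
		return path + len;
	return path;
}
```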
*	src: use PRIu64 format	Pablo Neira Ayuso	2021-05-18	1	-1/+1
Fix the following compilation warnings on x86_32.

datatype.c: In function ‘cgroupv2_type_print’:
datatype.c:1387:22: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Wformat=]
  nft_print(octx, "%lu", id);
                   ~~^   ~~
                   %llu
meta.c: In function ‘date_type_print’:
meta.c:411:21: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Wformat=]
 nft_print(octx, "%lu", tstamp);
                  ~~^   ~~~~~~
                  %llu

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
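The portable fix uses the <inttypes.h> conversion macros. Here is a minimal sketch: format_id() is hypothetical, and nft_print() is replaced by snprintf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "%lu" is only correct where uint64_t is unsigned long (e.g. x86_64);
 * on x86_32 it is unsigned long long. PRIu64 expands to the right
 * conversion specifier for the target. */
static int format_id(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, "%" PRIu64, id);
}
```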
*	src: add cgroupsv2 support	Pablo Neira Ayuso	2021-05-03	1	-0/+91
Add support for matching on cgroups version 2.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add datatype->describe()	Pablo Neira Ayuso	2021-03-25	1	-0/+15
Add datatype->describe() as an alternative way to print the datatype values when no symbol table is available. Use it to print the protocols available via getprotobynumber(), which actually refers to /etc/protocols.

This is not very efficient, since getprotobynumber() causes a series of open()/close() calls on /etc/protocols, but it is called from a non-critical path.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1503
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: convert chain name from gmp value to string	Pablo Neira Ayuso	2020-07-15	1	-8/+13
Add the expr_chain_export() helper function to convert the chain name, which is stored in a gmp value variable, to a string.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: add frag-needed (ipv4) to reject options	Michael Braun	2020-05-28	1	-0/+1
This enables sending ICMP frag-needed messages using the reject target.

I have a bridge which connects a gretap tunnel with an Ethernet LAN. On the gretap device I use ignore-df to avoid packets being lost without an ICMP reject to the sender of the bridged packet.

Still, I want to avoid fragmentation of the gretap packets. So I thought about adding an nftables rule like this:

nft insert rule bridge filter FORWARD \
  ip protocol tcp \
  ip length > 1400 \
  ip frag-off & 0x4000 != 0 \
  reject with icmp type frag-needed

This would reject all TCP packets with the IP don't-fragment bit set that are bigger than some threshold (here 1400 bytes). The sender would then receive ICMP unreachable - fragmentation needed and reduce its packet size (as defined by PMTU).

[ pablo: update tests/py ]

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix double-free resulting in use-after-free in datatype_free	Michael Braun	2020-05-01	1	-0/+2
nft list table bridge t
table bridge t {
	set s4 {
		typeof ip saddr . ip daddr
		elements = { 1.0.0.1 . 2.0.0.2 }
	}
}
=================================================================
==24334==ERROR: AddressSanitizer: heap-use-after-free on address 0x6080000000a8 at pc 0x7fe0e67df0ad bp 0x7ffff83e88c0 sp 0x7ffff83e88b8
READ of size 4 at 0x6080000000a8 thread T0
    #0 0x7fe0e67df0ac in datatype_free nftables/src/datatype.c:1110
    #1 0x7fe0e67e2092 in expr_free nftables/src/expression.c:89
    #2 0x7fe0e67a855e in set_free nftables/src/rule.c:359
    #3 0x7fe0e67b2f3e in table_free nftables/src/rule.c:1263
    #4 0x7fe0e67a70ce in __cache_flush nftables/src/rule.c:299
    #5 0x7fe0e67a71c7 in cache_release nftables/src/rule.c:305
    #6 0x7fe0e68dbfa9 in nft_ctx_free nftables/src/libnftables.c:292
    #7 0x55f00fbe0051 in main nftables/src/main.c:469
    #8 0x7fe0e553309a in __libc_start_main ../csu/libc-start.c:308
    #9 0x55f00fbdd429 in _start (nftables/src/.libs/nft+0x9429)

0x6080000000a8 is located 8 bytes inside of 96-byte region [0x6080000000a0,0x608000000100)
freed by thread T0 here:
    #0 0x7fe0e6e70fb0 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe8fb0)
    #1 0x7fe0e68b8122 in xfree nftables/src/utils.c:29
    #2 0x7fe0e67df2e5 in datatype_free nftables/src/datatype.c:1117
    #3 0x7fe0e67e2092 in expr_free nftables/src/expression.c:89
    #4 0x7fe0e67a83fe in set_free nftables/src/rule.c:356
    #5 0x7fe0e67b2f3e in table_free nftables/src/rule.c:1263
    #6 0x7fe0e67a70ce in __cache_flush nftables/src/rule.c:299
    #7 0x7fe0e67a71c7 in cache_release nftables/src/rule.c:305
    #8 0x7fe0e68dbfa9 in nft_ctx_free nftables/src/libnftables.c:292
    #9 0x55f00fbe0051 in main nftables/src/main.c:469
    #10 0x7fe0e553309a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7fe0e6e71330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
    #1 0x7fe0e68b813d in xmalloc nftables/src/utils.c:36
    #2 0x7fe0e68b8296 in xzalloc nftables/src/utils.c:65
    #3 0x7fe0e67de7d5 in dtype_alloc nftables/src/datatype.c:1065
    #4 0x7fe0e67df862 in concat_type_alloc nftables/src/datatype.c:1146
    #5 0x7fe0e67ea852 in concat_expr_parse_udata nftables/src/expression.c:954
    #6 0x7fe0e685dc94 in set_make_key nftables/src/netlink.c:718
    #7 0x7fe0e685e177 in netlink_delinearize_set nftables/src/netlink.c:770
    #8 0x7fe0e685f667 in list_set_cb nftables/src/netlink.c:895
    #9 0x7fe0e4f95a03 in nftnl_set_list_foreach src/set.c:904

SUMMARY: AddressSanitizer: heap-use-after-free nftables/src/datatype.c:1110 in datatype_free
Shadow bytes around the buggy address:
  0x0c107fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c107fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c107fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c107fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c107fff8000: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c107fff8010: fa fa fa fa fd[fd]fd fd fd fd fd fd fd fd fd fd
  0x0c107fff8020: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c107fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c107fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c107fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c107fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==24334==ABORTING

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: compute mnemonic port name much easier	Jan Engelhardt	2020-02-07	1	-27/+6
\| \| \| \| \|	Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: store expr, not dtype to track data in sets	Florian Westphal	2019-12-16	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will be needed once we add support for the 'typeof' keyword to handle maps that could e.g. store 'ct helper' "type" values. Instead of: set foo { type ipv4_addr . mark; this would allow set foo { typeof(ip saddr) . typeof(ct mark); (exact syntax TBD). This would be needed to allow sets that store variable-sized data types (string, integer and the like) that can't be used at the moment. Adding special data types for everything is problematic due to the large number of different types needed. For anonymous sets, e.g. "string" can be used because the needed size can be inferred from the statement, e.g. 'osf name { "Windows", "Linux" }', but in case of named sets that won't work because 'type string' lacks the context needed to derive the size information. With 'typeof(osf name)' the context is there, but at the moment it won't help because the expression is discarded instantly and only the data type is retained. Signed-off-by: Florian Westphal <fw@strlen.de>
*	datatype: display description for header field < 8 bits	Pablo Neira Ayuso	2019-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	# nft describe ip dscp
payload expression, datatype dscp (Differentiated Services Code Point) (basetype integer), 6 bits

pre-defined symbolic constants (in hexadecimal):
nft: datatype.c:209: switch_byteorder: Assertion `len > 0' failed.
Aborted
Fixes: c89a0801d077 ("datatype: Display pre-defined inet_service values in host byte order") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
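One plausible reading of the assertion above, offered as a sketch rather than the actual patch: if a field width in bits is turned into bytes with truncating division, a 6-bit field such as ip dscp has zero whole bytes, so a check of the form len > 0 fails. Rounding up gives the one byte the field really occupies. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8

/* Hypothetical: truncating conversion, so a 6-bit field becomes 0 bytes. */
static size_t bits_to_bytes_trunc(unsigned int bits)
{
	return bits / BITS_PER_BYTE;
}

/* Hypothetical: round up so a sub-byte field still counts as one byte. */
static size_t bits_to_bytes_roundup(unsigned int bits)
{
	assert(bits > 0);	/* a zero-width field has no storage at all */
	return (bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

With the truncating helper, the 6-bit dscp field yields 0 and would trip a len > 0 check; the rounded-up helper yields 1, while 8-bit and 16-bit fields are unaffected.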
*	meta: Introduce new conditions 'time', 'day' and 'hour'	Ander Juaristi	2019-09-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These keywords introduce new checks for a timestamp, an absolute date (which is converted to a timestamp), an hour in the day (which is converted to the number of seconds since midnight) and a day of the week. When converting an ISO date (e.g. 2019-06-06 17:00) to a timestamp, we need to subtract the GMT difference in seconds from it, that is, the value of the 'tm_gmtoff' field in the tm structure. This is because the kernel doesn't know about time zones, and hence the kernel works with different timestamps than those advertised in userspace when running, for instance, date +%s. The same conversion is needed when converting hours (e.g. 17:00) to seconds since midnight. The result needs to be computed modulo 86400 in case the GMT offset (difference in seconds from UTC) is negative. We also introduce a new command line option (-t, --seconds) to show the actual timestamps when printing the values, rather than the ISO dates or hours. Some usage examples: time < "2019-06-06 17:00" drop; time < "2019-06-06 17:20:20" drop; time < 12341234 drop; day "Saturday" drop; day 6 drop; hour >= 17:00 drop; hour >= "17:00:01" drop; hour >= 63000 drop; We need to convert an ISO date to a timestamp without taking the time zone offset into account, since the comparison is done in kernel space and there is no time zone information there. Overwriting TZ is portable, but causes problems when parsing a ruleset that has both 'time' and 'hour' rules: parsing an 'hour' type must not do time zone conversion, but that conversion happens automatically if TZ has been overwritten to UTC. Hence, we use timegm() to parse the 'time' type, even though it's not portable. Overwriting TZ seems to be a much worse solution. 
Finally, be aware that timestamps are converted to nanoseconds when transferring to the kernel (as comparison is done with nanosecond precision), and back to seconds when retrieving them for printing. We swap left and right values in a range to properly handle cross-day hour ranges (e.g. 23:15-03:22). Signed-off-by: Ander Juaristi <a@juaristi.eus> Reviewed-by: Florian Westphal <fw@strlen.de>
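The hour conversion described above can be sketched as follows. This is not the nftables parser: the function name is hypothetical, and the local offset from UTC is passed in as a parameter instead of being read from tm_gmtoff (a glibc/BSD extension of struct tm). It only shows the subtract-then-wrap arithmetic; since C's % operator can return a negative remainder, the result is folded back into [0, 86400) explicitly.

```c
#include <assert.h>
#include <stdio.h>

#define SECONDS_PER_DAY 86400L

/*
 * Hypothetical helper: convert a local "HH:MM" or "HH:MM:SS" wall-clock
 * time into seconds since midnight UTC. gmtoff is the local offset from
 * UTC in seconds (positive east of Greenwich). Returns -1 on bad input.
 */
static long hour_to_utc_seconds(const char *hhmm, long gmtoff)
{
	int h, m, s = 0;
	long secs;

	assert(hhmm != NULL);
	if (sscanf(hhmm, "%d:%d:%d", &h, &m, &s) < 2)
		return -1;
	if (h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 59)
		return -1;

	/* Local time minus the offset gives UTC, possibly outside one day. */
	secs = h * 3600L + m * 60L + s - gmtoff;

	/* Wrap into [0, 86400): % may yield a negative remainder in C. */
	secs %= SECONDS_PER_DAY;
	if (secs < 0)
		secs += SECONDS_PER_DAY;
	return secs;
}
```

For a host at UTC+2, hour_to_utc_seconds("17:00", 7200) gives 54000, i.e. 15:00 UTC. At UTC+1, "00:30" would be -1800 before wrapping and becomes 84600 (23:30 UTC the previous day).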
*	src: fix jumps on bigendian arches	Florian Westphal	2019-08-14	1	-9/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ruleset table bla { chain foo { } chain bar { jump foo } } fails to restore on big-endian platforms: jump.nft:5:2-9: Error: Could not process rule: No such file or directory jump foo. nft passes a 0-length name to the kernel. This is because when we export the value (the string), we provide the size of the destination buffer. In earlier versions, the parser allocated the name with the same fixed size and all was fine. After commit 142350f154c78, the export places the name in the wrong location in the destination buffer. This makes tests/shell/testcases/chains/0001jumps_0 work on s390x. v2: convert one error check to a BUG(); it should not happen unless the kernel ABI is broken. Fixes: 142350f154c78 ("src: invalid read when importing chain name") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
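A minimal sketch of the failure mode, under stated assumptions: it models the chain name as a number exported in host byte order into a destination buffer wider than the name, as the commit describes. The two helpers are hypothetical and copy bytes directly rather than using the multiprecision export nft relies on internally. A big-endian number is right-aligned, so the zero padding lands in front of the name and the buffer reads as an empty C string; on a little-endian host the padding trails the name and nothing breaks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: big-endian export, zero padding before the value bytes. */
static void export_number_be(uint8_t *buf, size_t bufsize,
			     const uint8_t *val, size_t len)
{
	assert(len <= bufsize);
	memset(buf, 0, bufsize);
	memcpy(buf + (bufsize - len), val, len);
}

/* Hypothetical: little-endian export, zero padding after the value bytes. */
static void export_number_le(uint8_t *buf, size_t bufsize,
			     const uint8_t *val, size_t len)
{
	assert(len <= bufsize);
	memset(buf, 0, bufsize);
	memcpy(buf, val, len);
}
```

Exporting "foo" into a 16-byte buffer with the big-endian helper leaves buf[0] == 0, so strlen() reports a 0-length name, matching the symptom above.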
*	src: allow variable in chain policy	Fernando Fernandez Mancera	2019-08-08	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	This patch allows you to use variables in chain policy definition, e.g. define default_policy = "accept" add table ip foo add chain ip foo bar {type filter hook input priority filter; policy $default_policy} Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow variables in the chain priority specification	Fernando Fernandez Mancera	2019-08-08	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows you to use variables in chain priority definitions, e.g. define prio = filter define prionum = 10 define prioffset = "filter - 150" add table ip foo add chain ip foo bar { type filter hook input priority $prio; } add chain ip foo ber { type filter hook input priority $prionum; } add chain ip foo bor { type filter hook input priority $prioffset; } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove global symbol_table	Pablo Neira Ayuso	2019-08-08	1	-9/+7
\| \| \| \| \| \| \| \| \|	Store symbol tables in a context object instead. Use the nft_ctx object to store the dynamic symbol table. Pass it on to the parse_ctx object so it can be accessed from the parse routines. This dynamic symbol table is also accessible from the output_ctx object for print routines. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
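A minimal sketch of the design idea, with every name hypothetical (these are not the nftables structures): the symbol table hangs off a context object that callers pass down, so two contexts can resolve the same symbol differently and lookups never touch a process-wide global.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym_entry {
	const char *name;
	unsigned int value;
};

struct sym_table {
	size_t n;
	const struct sym_entry *entries;
};

/* Hypothetical stand-in for a parse context carrying its own table. */
struct parse_ctx_sketch {
	const struct sym_table *tbl;
};

/* Resolve name through the context; returns 0 on success, -1 if absent. */
static int sym_lookup(const struct parse_ctx_sketch *ctx, const char *name,
		      unsigned int *value)
{
	assert(ctx != NULL && ctx->tbl != NULL);
	for (size_t i = 0; i < ctx->tbl->n; i++) {
		if (strcmp(ctx->tbl->entries[i].name, name) == 0) {
			*value = ctx->tbl->entries[i].value;
			return 0;
		}
	}
	return -1;
}
```

Because the table is reached only through the context pointer, two library handles loaded from different files can coexist without interfering.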
*	src: add parse_ctx object	Pablo Neira Ayuso	2019-08-08	1	-17/+29
\| \| \| \| \| \| \| \|	This object stores the dynamic symbol tables that are loaded from files. Pass this object to datatype parse functions. The new parameter is not used yet; this is just a preparation patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix print of raw numerical symbol values	Florian Westphal	2019-06-17	1	-11/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The two rules: arp operation 1-2 accept arp operation 256-512 accept are both shown as 256-512: chain in_public { arp operation 256-512 accept arp operation 256-512 accept meta mark "1" tcp flags 2,4 } This is because the range expression enforces numeric output, yet nft_print doesn't respect byte order. Behave as if we had no symbol in the first place and call the base type print function instead. This means we now respect the format specifier as well: chain in_public { arp operation 1-2 accept arp operation 256-512 accept meta mark 0x00000001 tcp flags 0x2,0x4 } Without the fix, the added test case fails: 'add rule arp test-arp input arp operation 1-2': 'arp operation 1-2' mismatches 'arp operation 256-512' v2: in case of -n, also elide quotation marks, just as if we had not found a symbolic name. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
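The 256-512 listing is what ignoring byte order produces. The ARP operation field is 16 bits wide and stored in network (big-endian) byte order, so reading its raw bytes as a native integer on a little-endian host turns 1 into 256 and 2 into 512. The sketch below uses two hypothetical decoders to show the arithmetic; it is not the nft_print code path.

```c
#include <assert.h>
#include <stdint.h>

/* Decode 2 bytes most-significant first (network byte order), which is
 * how the 16-bit ARP operation field is laid out on the wire. */
static uint16_t decode_be16(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Decode the same bytes least-significant first, which is what reading
 * them as a native integer amounts to on a little-endian host. */
static uint16_t decode_le16(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

For the bytes { 0x00, 0x01 } and { 0x00, 0x02 } (operations 1 and 2), decode_be16() yields 1 and 2, while decode_le16() yields 256 and 512, the bogus range seen in the listing.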
*	evaluate: double datatype_free() with dynamic integer datatypes	Pablo Neira Ayuso	2019-06-14	1	-5/+0
\| \| \| \| \| \|	datatype_set() already deals with this case, so remove this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: dtype_clone() should clone flags too	Pablo Neira Ayuso	2019-06-13	1	-1/+1
\| \| \| \| \| \|	Clone original flags too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add reference counter for dynamic datatypes	Pablo Neira Ayuso	2019-06-13	1	-10/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two datatypes that use runtime datatype allocation: * Concatenations. * Integers that require byteorder adjustment. From the evaluation / postprocess step, transformations are common, hence expressions may end up fetching (inferring) datatypes from an existing one. This patch adds a reference counter to release the dynamic datatype object when it is shared. The API includes the following helper functions: * datatype_set(expr, datatype), to assign a datatype to an expression. This helper already deals with reference counting for dynamic datatypes. It also drops the reference counter of any previous datatype (to deal with the datatype replacement case). * datatype_get(datatype) bumps the reference counter. This function also deals with NULL pointers, which occur when the datatype is unset. * datatype_free() drops the reference counter, and it also releases the datatype if there are no more clients of it. Rule of thumb: the reference counter of any newly allocated datatype is set to zero. This patch also updates every spot to use datatype_set() for non-dynamic datatypes, for consistency. In this case, the helper just makes a simple assignment. Note that expr_alloc() has been updated to call datatype_get() on the datatype that is assigned to this new expression. Moreover, expr_free() calls datatype_free(). 
This fixes valgrind reports like this one:
==28352== 1,350 (440 direct, 910 indirect) bytes in 5 blocks are definitely lost in loss record 3 of 3
==28352==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==28352==    by 0x4E79558: xmalloc (utils.c:36)
==28352==    by 0x4E7963D: xzalloc (utils.c:65)
==28352==    by 0x4E6029B: dtype_alloc (datatype.c:1073)
==28352==    by 0x4E6029B: concat_type_alloc (datatype.c:1127)
==28352==    by 0x4E6D3B3: netlink_delinearize_set (netlink.c:578)
==28352==    by 0x4E6D68E: list_set_cb (netlink.c:648)
==28352==    by 0x5D74023: nftnl_set_list_foreach (set.c:780)
==28352==    by 0x4E6D6F3: netlink_list_sets (netlink.c:669)
==28352==    by 0x4E5A7A3: cache_init_objects (rule.c:159)
==28352==    by 0x4E5A7A3: cache_init (rule.c:216)
==28352==    by 0x4E5A7A3: cache_update (rule.c:266)
==28352==    by 0x4E7E0EE: nft_evaluate (libnftables.c:388)
==28352==    by 0x4E7EADD: nft_run_cmd_from_filename (libnftables.c:479)
==28352==    by 0x109A53: main (main.c:310)
This patch also removes the DTYPE_F_CLONE flag, which is broken and no longer needed now that proper reference counting is in place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
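The reference counting rules above can be sketched in a few lines. Everything here is hypothetical and simplified (the struct layout, the DT_F_DYNAMIC flag and the dt_* names are not the nftables definitions); dt_set(), dt_get() and dt_put() only mirror the described behaviour of datatype_set(), datatype_get() and datatype_free(). Note that dt_set() takes the new reference before dropping the old one, so re-assigning the same datatype can never free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag: marks a datatype allocated at runtime. */
#define DT_F_DYNAMIC 0x1

struct dtype_sketch {
	unsigned int flags;
	unsigned int refcnt;	/* only meaningful for DT_F_DYNAMIC types */
};

struct expr_sketch {
	struct dtype_sketch *dtype;
};

/* Counts releases so the behaviour can be observed. */
static unsigned int dt_released;

/* New dynamic types start at refcnt 0, per the rule of thumb above. */
static struct dtype_sketch *dt_alloc_dynamic(void)
{
	struct dtype_sketch *dt = calloc(1, sizeof(*dt));

	if (dt != NULL)
		dt->flags = DT_F_DYNAMIC;
	return dt;
}

/* Mirrors datatype_get(): tolerates NULL, ignores static datatypes. */
static struct dtype_sketch *dt_get(struct dtype_sketch *dt)
{
	if (dt != NULL && (dt->flags & DT_F_DYNAMIC))
		dt->refcnt++;
	return dt;
}

/* Mirrors datatype_free(): drop one reference, free on the last one. */
static void dt_put(struct dtype_sketch *dt)
{
	if (dt == NULL || !(dt->flags & DT_F_DYNAMIC))
		return;
	assert(dt->refcnt > 0);
	if (--dt->refcnt == 0) {
		free(dt);
		dt_released++;
	}
}

/* Mirrors datatype_set(): reference the new type, release the old one. */
static void dt_set(struct expr_sketch *e, struct dtype_sketch *dt)
{
	struct dtype_sketch *old = e->dtype;

	e->dtype = dt_get(dt);
	dt_put(old);
}
```

A concat type shared by two expressions ends up with refcnt 2 and is released only when the second expression drops it; a statically defined datatype passes through dt_set() as a plain assignment.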