nftables - nft command line tool

	Commit message	Author	Age	Files	Lines
*	netlink: allow typeof keywords with objref maps during listing	Florian Westphal	2024-03-01	1	-1/+7
	Without this,
	    typeof meta l4proto . ip saddr . tcp sport : limit
	... is shown as
	    type inet_proto . ipv4_addr . inet_service : limit
	The "data" element is a value (the object type number). It doesn't support userinfo data. There is no reason to add it, the value is the object type number that the object-reference map stores.
	So, if we have an objref map, DO NOT discard the key part, as we do for normal maps.
	For normal maps, we support either typeof notation, i.e.:
	    typeof meta l4proto . ip saddr . tcp sport : ip saddr
	or the data type version:
	    type inet_proto . ipv4_addr . inet_service : ipv4_addr
	... but not a mix; a hypothetical
	    typeof meta l4proto . ip saddr . tcp sport : ipv4_addr
	... does not work.
	If nft finds no udata attached to the data element, for the normal map case, it has to fall back to the "type" form. But for objref maps this is expected: udata for key but not for data. Hence, for the objref case, keep the typeof part if it's valid.
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	netlink: fix stack overflow due to erroneous rounding	Florian Westphal	2023-12-20	1	-3/+8
	The byteorder switch in this function may undersize the conversion buffer by one byte; this needs to use div_round_up().
	Signed-off-by: Florian Westphal <fw@strlen.de>
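The rounding problem described above can be illustrated with a small sketch. The macro is written out here for illustration, and `bits_to_bytes()` is a hypothetical helper name, not a function from the nftables tree.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_BYTE 8

/* Round-up integer division: the smallest q with q * d >= n (for d > 0). */
#define div_round_up(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: number of bytes needed to hold `bits` bits.
 * A truncating division (bits / 8) would undersize the buffer by one
 * byte whenever `bits` is not a multiple of 8.
 */
static size_t bits_to_bytes(size_t bits)
{
	return div_round_up(bits, BITS_PER_BYTE);
}
```

With truncating division a 12-bit value would get a 1-byte buffer; rounding up yields the 2 bytes it actually occupies.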
*	netlink: fix stack buffer overflow with sub-reg sized prefixes	Florian Westphal	2023-12-15	1	-2/+5
	The calculation of the dynamic on-stack array is incorrect, the scratch space can be too small, which gives stack corruption:
	    AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffdb454f064..
	    #1 0x7fabe92aaac4 in __mpz_export_data src/gmputil.c:108
	    #2 0x7fabe92d71b1 in netlink_export_pad src/netlink.c:251
	    #3 0x7fabe92d91d8 in netlink_gen_prefix src/netlink.c:476
	div_round_up() cannot be used here, it fails to account for register padding. A 16 bit prefix will need 2 registers (start, end -- 8 bytes in total).
	Remove the dynamic sizing and add an assertion in case the upper layer ever passes invalid expr sizes down to us.
	After this fix, the combination is rejected by the kernel because of the map's wrong data size; before the fix, userspace may crash first.
	Signed-off-by: Florian Westphal <fw@strlen.de>
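The register-padding arithmetic behind "a 16 bit prefix will need 2 registers" can be sketched as follows. This only illustrates the size calculation; the actual fix removed the dynamic sizing and added an assertion instead. `REG_SIZE` and both function names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define REG_SIZE 4 /* assumed register width in bytes */

/* Round a byte count up to a whole number of registers. */
static size_t reg_padded_len(size_t bytes)
{
	return (bytes + REG_SIZE - 1) / REG_SIZE * REG_SIZE;
}

/*
 * Hypothetical helper: scratch space for a prefix, which is exported as
 * two values (start and end), each padded to a register boundary.
 */
static size_t prefix_scratch_len(size_t bits)
{
	size_t bytes = (bits + 7) / 8;

	return 2 * reg_padded_len(bytes);
}
```

A plain byte round-up gives 2 * 2 = 4 bytes for a 16-bit prefix, while the padded layout needs 8.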
*	netlink: add and use nft_data_memcpy helper	Florian Westphal	2023-12-12	1	-10/+15
	There is a stack overflow somewhere in this code, we end up memcpy'ing a way too large expr into a fixed-size on-stack buffer.
	This is hard to diagnose, most of this code gets inlined so the crash happens later on return from alloc_nftnl_setelem.
	Condense the memcpy into a helper and add a BUG so we can catch the overflow before it occurs.
	->value is too small (4, should be 16), but for normal cases (well-formed data must fit into max reg space, i.e. 64 bytes) the chain buffer that comes after value in the structure provides a cushion.
	In order to have the new BUG() not trigger on valid data, bump value to the correct size; this is userspace, so the additional 60 bytes of stack usage are no concern.
	Signed-off-by: Florian Westphal <fw@strlen.de>
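A bounds-checked copy helper of the kind described above might look roughly like this. It is a sketch only: `struct data_buf`, `DATA_VALUE_MAXLEN` and `data_buf_memcpy()` are hypothetical names, and `abort()` stands in for the project's BUG() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DATA_VALUE_MAXLEN 64 /* assumed maximum register space in bytes */

/* Hypothetical fixed-size destination, standing in for the on-stack data. */
struct data_buf {
	unsigned char value[DATA_VALUE_MAXLEN];
	size_t len;
};

/*
 * Hypothetical helper: copy `len` bytes into the fixed buffer and abort
 * before the copy if it would not fit, so the failure is reported at the
 * point of the overflow rather than on a later function return.
 */
static void data_buf_memcpy(struct data_buf *dst, const void *src, size_t len)
{
	if (len > sizeof(dst->value)) {
		fprintf(stderr, "BUG: %zu bytes do not fit into %zu byte buffer\n",
			len, sizeof(dst->value));
		abort();
	}
	memcpy(dst->value, src, len);
	dst->len = len;
}
```

Routing every copy through one helper means the size check lives in a single place.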
*	netlink: fix buffer size for user data in netlink_delinearize_chain()	Thomas Haller	2023-11-09	1	-1/+1
	The correct define is NFTNL_UDATA_CHAIN_MAX and not NFTNL_UDATA_OBJ_MAX. In current libnftnl, they both are defined as 1, so (with current libnftnl) there is no difference.
	Fixes: 702ac2b72c0e ("src: add comment support for chains")
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	include: include <string.h> in <nft.h>	Thomas Haller	2023-09-28	1	-1/+0
	<string.h> provides strcmp(), as such it's very basic and used everywhere.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	expression: cleanup expr_ops_by_type() and handle u32 input	Thomas Haller	2023-09-25	1	-2/+2
	Make fewer assumptions about the underlying integer type of the enum. Instead, be clear about where we have an untrusted uint32_t from netlink and where we have an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32() to make this clearer. Later we might make the enum packed, when this starts to matter more.
	Also, only the code path expr_ops() wants strict validation and asserts against valid enum values. Move the assertion out of __expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to duplicate the handling of EXPR_INVALID. We still need to duplicate the check against EXPR_MAX, to ensure that the uint32_t value can be cast to an enum value.
	[ Remove cast on EXPR_MAX. --pablo ]
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
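The pattern of range-checking an untrusted 32-bit value before casting it to an enum can be sketched as below. All names (`enum expr_kind`, `kind_ops_by_u32()` and so on) are made up for the example and do not mirror the real expression tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum standing in for the expression types. */
enum expr_kind {
	KIND_INVALID,
	KIND_A,
	KIND_B,
	__KIND_MAX
};
#define KIND_MAX (__KIND_MAX - 1)

struct kind_ops {
	const char *name;
};

static const struct kind_ops ops_a = { .name = "a" };
static const struct kind_ops ops_b = { .name = "b" };

/* Lookup on a value that is already known to be a valid enum member. */
static const struct kind_ops *kind_ops_lookup(enum expr_kind kind)
{
	switch (kind) {
	case KIND_A:
		return &ops_a;
	case KIND_B:
		return &ops_b;
	default:
		return NULL;
	}
}

/*
 * Lookup on an untrusted uint32_t (e.g. parsed from a netlink attribute):
 * check the range first, so that the cast to the enum is well defined
 * whatever integer type the compiler picked for it.
 */
static const struct kind_ops *kind_ops_by_u32(uint32_t value)
{
	if (value > (uint32_t)KIND_MAX)
		return NULL;
	return kind_ops_lookup((enum expr_kind)value);
}
```

Callers that receive NULL can treat the input as malformed instead of tripping an assertion.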
*	netlink: handle invalid etype in set_make_key()	Thomas Haller	2023-09-20	1	-0/+2
	It's not clear to me what ensures that the etype is always valid. Handle a NULL.
	Fixes: 6e48df5329ea ('src: add "typeof" build/parse/print support')
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: fix leaking typeof_expr_data/typeof_expr_key in netlink_delinearize_set()	Thomas Haller	2023-09-19	1	-6/+6
	There are various code paths that return without freeing typeof_expr_data and typeof_expr_key. It's not at all obvious that there isn't a leak that way. Quite possibly there is a leak. Fix it, or at least make the code more obviously correct.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	datatype: fix leak and cleanup reference counting for struct datatype	Thomas Haller	2023-09-14	1	-14/+17
	Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port` fails:
	    ==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
	    ==118== at 0x484682C: calloc (vg_replace_malloc.c:1554)
	    ==118== by 0x48A39DD: xmalloc (utils.c:37)
	    ==118== by 0x48A39DD: xzalloc (utils.c:76)
	    ==118== by 0x487BDFD: datatype_alloc (datatype.c:1205)
	    ==118== by 0x487BDFD: concat_type_alloc (datatype.c:1288)
	    ==118== by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
	    ==118== by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
	    ==118== by 0x488229D: stmt_evaluate (evaluate.c:4450)
	    ==118== by 0x488328E: rule_evaluate (evaluate.c:4956)
	    ==118== by 0x48ADC71: nft_evaluate (libnftables.c:552)
	    ==118== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
	    ==118== by 0x402983: main (main.c:534)
	I think the reference handling for datatype is wrong. It was introduced by commit 01a13882bb59 ('src: add reference counter for dynamic datatypes').
	We don't notice it most of the time, because instances are statically allocated, where datatype_get()/datatype_free() is a NOP.
	Fix and rework.
	- Commit 01a13882bb59 comments "The reference counter of any newly allocated datatype is set to zero". That seems not workable. Previously, functions like datatype_clone() would have returned the refcnt set to zero. Some callers would then set the refcnt to one, but some wouldn't (set_datatype_alloc()). Calling datatype_free() with a refcnt of zero will overflow to UINT_MAX and leak:
	    if (--dtype->refcnt > 0)
	        return;
	  While there could be schemes with such asymmetric counting that juggle the appropriate number of datatype_get() and datatype_free() calls, this is confusing and error prone. The common pattern is that every alloc/clone/get/ref is paired with exactly one unref/free. Let datatype_clone() return references with refcnt set to 1 and in general always be clear about where we transfer ownership (take a reference) and where we need to release it.
	- set_datatype_alloc() needs to consistently return ownership of the reference. Previously, some code paths would and others wouldn't.
	- Replace datatype_set(key, set_datatype_alloc(dtype, key->byteorder)) with a __datatype_set() which takes ownership.
	Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
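The symmetric counting scheme argued for above (every alloc/get paired with exactly one release, new objects starting at a count of 1) can be sketched in a few lines. `struct dtype` and the `dtype_*()` functions are simplified stand-ins invented for this example, not the real datatype API.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a dynamically allocated datatype. */
struct dtype {
	unsigned int refcnt;
};

/* A new object starts with one reference, owned by the caller. */
static struct dtype *dtype_alloc(void)
{
	struct dtype *d = calloc(1, sizeof(*d));

	if (!d)
		abort();
	d->refcnt = 1;
	return d;
}

/* Take an additional reference. */
static struct dtype *dtype_get(struct dtype *d)
{
	d->refcnt++;
	return d;
}

/*
 * Release one reference; returns 1 if the object was freed.
 * With an unsigned counter, releasing at refcnt == 0 would wrap to
 * UINT_MAX and the object would never be freed, hence the assertion.
 */
static int dtype_put(struct dtype *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt > 0)
		return 0;
	free(d);
	return 1;
}
```

Starting at 1 means a caller that only allocates and releases needs no extra bookkeeping, and a function that "takes ownership" simply does not call `dtype_get()`.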
*	include: include <stdlib.h> in <nft.h>	Thomas Haller	2023-09-11	1	-1/+0
	It provides malloc()/free(), which is so basic that we need it everywhere. Include via <nft.h>.
	The ultimate purpose is to define more things in <nft.h>. While it has no corresponding C sources, <nft.h> can contain macros and static inline functions, and is a good place for things that we shall have everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that is a very basic dependency that will be needed for that.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: simplify chain_alloc()	Pablo Neira Ayuso	2023-08-31	1	-1/+3
	Remove the parameter to set the chain name, which is only used from the netlink path.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: avoid "-Wenum-conversion" warning in dtype_map_from_kernel()	Thomas Haller	2023-08-29	1	-1/+1
	Clang warns:
	    netlink.c:806:26: error: implicit conversion from enumeration type 'enum nft_data_types' to different enumeration type 'enum datatypes' [-Werror,-Wenum-conversion]
	    return datatype_lookup(type);
	           ~~~~~~~~~~~~~~~ ^~~~
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add <nft.h> header and include it as first	Thomas Haller	2023-08-25	1	-0/+2
	<config.h> is generated by the configure script. As it contains our feature detection, we want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define needs to come before anything else, and it would be confusing if different source files adhered to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included first.
	Instead of going through all source files and including <config.h> first, add a new header "include/nft.h", which is supposed to be included in all our sources (and first).
	This will also allow us later to prepare some common base, like including <stdbool.h> everywhere.
	We aim for headers to be self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included first. <nft.h> is however an exception to the rule: everything we compile shall rely on having the <nft.h> header included first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file).
	Note that <config.h> has no include guards, which makes it at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures that <config.h> is included exactly once.
	Signed-off-by: Thomas Haller <thaller@redhat.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Implement 'reset {set,map,element}' commands	Phil Sutter	2023-07-13	1	-4/+4
	All these are used to reset state in set/map elements, i.e. reset the timeout or zero quota and counter values.
	While 'reset element' expects a (list of) elements to be specified which should be reset, 'reset set/map' will reset all elements in the given set/map.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: restore typeof interval map data type	Florian Westphal	2023-05-02	1	-1/+6
	When "typeof ... : interval ..." gets used, existing logic failed to validate the expressions.
	"interval" means that the kernel reserves twice the size, so consider this when validating and restoring.
	Also fix up the dump file of the existing test case to be symmetrical.
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	Implement 'reset rule' and 'reset rules' commands	Phil Sutter	2023-01-18	1	-0/+49
	Reset rule counters and quotas in the kernel, i.e. without having to reload them. Requires the respective kernel patch to support the NFT_MSG_GETRULE_RESET message type.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: Fix for potential NULL-pointer deref	Phil Sutter	2023-01-13	1	-1/+2
	If memory allocation fails, calloc() returns NULL which was not checked for. The code seems to expect zero array size though, so simply replacing this call by one of the x*calloc() ones won't work. So guard the call also by a check for 'len'.
	Fixes: db0697ce7f602 ("src: support for flowtable listing")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
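One way to read the fix described above is: skip the allocation when the array is empty, and treat a NULL result as fatal only when something was actually requested. The sketch below follows that reading; `dev_array_alloc()` is a hypothetical name and the real patch may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a zeroed array of `len` string pointers.
 * A length of zero is a valid "empty array" and yields NULL without an
 * error; a failed allocation for len > 0 is fatal.
 */
static const char **dev_array_alloc(size_t len)
{
	const char **array;

	if (len == 0)
		return NULL;

	array = calloc(len, sizeof(*array));
	if (!array) {
		fprintf(stderr, "memory allocation failed\n");
		abort();
	}
	return array;
}
```

Separating the two cases matters because calloc() with a zero size may legitimately return NULL, which an unconditional "abort on NULL" wrapper would misreport as an allocation failure.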
*	src: display (inner) tag in --debug=proto-ctx	Pablo Neira Ayuso	2023-01-02	1	-1/+1
	For easier debugging, add decoration on protocol context:
	    # nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter
	    update link layer protocol context (inner):
	     link layer : netdev <-
	     network layer : none
	     transport layer : none
	     payload data : none
	    update network layer protocol context (inner):
	     link layer : netdev
	     network layer : ip <-
	     transport layer : none
	     payload data : none
	    update network layer protocol context (inner):
	     link layer : netdev
	     network layer : ip <-
	     transport layer : none
	     payload data : none
	    update transport layer protocol context (inner):
	     link layer : netdev
	     network layer : ip
	     transport layer : icmp <-
	     payload data : none
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: unfold function to generate concatenations for keys and data	Pablo Neira Ayuso	2022-12-10	1	-10/+53
	Add a specific function to generate concatenations with and without intervals in maps. This restores the original function added by 8ac2f3b2fca3 ("src: Add support for concatenated set ranges") which is used by 66746e7dedeb ("src: support for nat with interval concatenation") to generate the data concatenations in maps.
	Only the set element key requires the byteswap introduced by 1017d323cafa ("src: support for selectors with different byteorder with interval concatenations"). Therefore, better not to reuse the same function for key and data, as the future might bring support for more kinds of concatenations in data maps.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: add function to generate set element key data	Pablo Neira Ayuso	2022-12-10	1	-4/+22
	Add netlink_gen_key(), it is just like __netlink_gen_data() with no EXPR_VERDICT case, which should not ever happen for set element keys.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: statify __netlink_gen_data()	Pablo Neira Ayuso	2022-12-10	1	-4/+4
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: swap byteorder of value component in concatenation of intervals	Pablo Neira Ayuso	2022-12-08	1	-9/+25
	Commit 1017d323cafa ("src: support for selectors with different byteorder with interval concatenations") was incomplete.
	Switch byteorder of singleton values in a set that contains concatenation of intervals. This singleton value is actually represented as a range in the kernel.
	After this patch, if the set represents a concatenation of intervals:
	- EXPR_F_INTERVAL denotes the lhs of the interval.
	- EXPR_F_INTERVAL_END denotes the rhs of the interval (this flag was already used in this way before this patch).
	If none of these flags is set, then the set contains concatenations of singleton values; in such a case, no byteorder swap is required.
	Update tests/shell and tests/py to cover the use-case breakage reported by Eric.
	Fixes: 1017d323cafa ("src: support for selectors with different byteorder with interval concatenations")
	Reported-by: Eric Garver <eric@garver.life>
	Tested-by: Eric Garver <eric@garver.life>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: support for selectors with different byteorder with interval concatenations	Pablo Neira Ayuso	2022-11-30	1	-6/+19
	Assuming the following interval set with concatenation:
	    set test {
	        typeof ip saddr . meta mark
	        flags interval
	    }
	then, the following rule:
	    ip saddr . meta mark @test
	requires bytecode that swaps the byteorder for the meta mark selector in case the set contains intervals and concatenations.
	    inet x y
	     [ meta load nfproto => reg 1 ]
	     [ cmp eq reg 1 0x00000002 ]
	     [ payload load 4b @ network header + 12 => reg 1 ]
	     [ meta load mark => reg 9 ]
	     [ byteorder reg 9 = hton(reg 9, 4, 4) ] <----- this is required !
	     [ lookup reg 1 set test dreg 0 ]
	This patch updates byteorder_conversion() to add the unary expression that introduces the byteorder expression.
	Moreover, store the meta mark range component of the element tuple in the set in big endian, as it is required for the range comparisons. Undo the byteorder swap in the netlink delinearize path to list the meta mark values accordingly.
	Update tests/py to validate that the byteorder expression is emitted in the bytecode. Update tests/shell to validate insertion and listing of a named map declaration.
	A similar commit 806ab081dc9a ("netlink: swap byteorder for host-endian concat data") already exists in the tree to handle this for strings with prefix (e.g. eth*).
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow anon set concatenation with ether and vlan	Florian Westphal	2022-08-05	1	-3/+7
	vlan id uses integer type (which has a length of 0).
	Using it was possible, but listing would assert:
	    python: mergesort.c:24: concat_expr_msort_value: Assertion `ilen > 0' failed.
	There are two reasons for this. The first reason is that the udata/typeof information lacks the 'vlan id' part, because internally this is 'payload . binop(payload AND mask)'.
	binop lacks an udata store. It makes little sense to store it, the 'typeof' keyword expects normal match syntax.
	So, when storing udata, store the left hand side of the binary operation, i.e. the load of the 2-byte key.
	With that resolved, delinearization could work, but concat_elem_expr() would splice 12 bits off the element's value, but it should be 16 (on a byte boundary).
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	netlink: swap byteorder for host-endian concat data	Florian Westphal	2022-05-09	1	-0/+4
	All data must be passed in network byte order, else matching won't work, or the kernel will reject the interval because it thinks that start is after end.
	This is needed to allow use of 'ppp*' in interval sets with concatenations.
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	netlink: remove unused argument from helper function	Florian Westphal	2022-04-18	1	-3/+3
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add EXPR_F_KERNEL to identify expression in the kernel	Pablo Neira Ayuso	2022-04-13	1	-0/+1
	This allows identifying the set elements that reside in the kernel.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow to use typeof of raw expressions in set declaration	Pablo Neira Ayuso	2022-03-29	1	-10/+23
	Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set length and byteorder. Add missing information to the set userdata area for raw payload expressions, which allows rebuilding the set typeof from the listing path.
	A few examples:
	- With anonymous sets:
	    nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e }
	- With named sets:
	    table x {
	        set y {
	            typeof ip saddr . @ih,32,32
	            elements = { 1.1.1.1 . 0x14 }
	        }
	    }
	Incremental updates are also supported, e.g.
	    nft add element x y { 3.3.3.3 . 0x28 }
	expr_evaluate_concat() is used to evaluate both set key definitions and set key values; using two different functions might help to simplify this code in the future.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: check key is EXPR_CONCAT before accessing field	Pablo Neira Ayuso	2022-02-17	1	-1/+2
	alloc_nftnl_setelem() needs to check for EXPR_CONCAT before accessing field_count.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: Use abort() in case of netlink_abi_error	Eugene Crosser	2022-01-26	1	-1/+1
	Library functions should not use exit(): an application that uses the library may contain an error handling path that cannot be executed if a library function calls exit(). For truly fatal errors, using abort() is more acceptable than exit().
	Signed-off-by: Eugene Crosser <crosser@average.org>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: simplify logic governing storing payload dependencies	Jeremy Sowden	2022-01-15	1	-9/+4
	There are several places where we check whether `ctx->pdctx.pbase` is equal to `PROTO_BASE_INVALID` and don't bother trying to free the dependency if so. However, these checks are redundant.
	In `payload_match_expand` and `trace_gen_stmts`, we skip a call to `payload_dependency_kill`, but that calls `payload_dependency_exists` to check a dependency exists before doing anything else.
	In `ct_meta_common_postprocess`, we skip an open-coded equivalent to `payload_dependency_kill` which performs some different checks, but the first is the same: a call to `payload_dependency_exists`.
	Therefore, we can drop the redundant checks and simplify the flow-control in the functions.
	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: remove arithmetic on booleans	Jeremy Sowden	2022-01-15	1	-4/+6
	Instead of subtracting a boolean from the protocol base for stacked payloads, just decrement the base variable itself.
	Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
	Signed-off-by: Florian Westphal <fw@strlen.de>
*	cache: Support filtering for a specific flowtable	Phil Sutter	2021-12-03	1	-1/+2
	Extend nft_cache_filter to hold a flowtable name so the 'list flowtable' command fetches only the requested flowtable.
	Dump flowtables just once instead of once per table, and merely assign the fetched data to tables inside the loop.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	cache: Filter tables on kernel side	Phil Sutter	2021-12-03	1	-2/+10
	Instead of requesting a dump of all tables and filtering the data in user space, construct a non-dump request if the filter contains a table, so the kernel returns only that single table. This should improve nft performance in rulesets with many tables present.
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	cache: filter out rules by chain	Pablo Neira Ayuso	2021-11-11	1	-42/+0
	With an autogenerated ruleset with ~20k chains:
	    # time nft list ruleset &> /dev/null
	    real 0m1,712s
	    user 0m1,258s
	    sys 0m0,454s
	Speed up listing of a specific chain:
	    # time nft list chain nat MWDG-UGR-234PNG3YBUOTS5QD &> /dev/null
	    real 0m0,542s
	    user 0m0,251s
	    sys 0m0,292s
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	mnl: do not build nftnl_set element list	Pablo Neira Ayuso	2021-11-08	1	-2/+2
	Do not call alloc_setelem_cache() to build the set element list in nftnl_set. Instead, translate one single set element expression to an nftnl_set_elem object at a time and use this object to build the netlink header.
	Using a huge test set containing a 1.1 million element blocklist, this patch reduces userspace memory consumption by 40%.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: reset temporary set element stmt list after list splice	Pablo Neira Ayuso	2021-09-16	1	-1/+1
	Reset temporary stmt list to deal with the key_end case which might result in a jump backward to handle the rhs of the interval.
	Reported-by: Martin Zatloukal <slezi2@pvfree.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: rework range_expr_to_prefix()	Pablo Neira Ayuso	2021-09-09	1	-30/+36
	Consolidate prefix calculation in range_expr_is_prefix().
	Add tests/py for 9208fb30dc49 ("src: Check range bounds before converting to prefix").
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Check range bounds before converting to prefix	Xiao Liang	2021-09-06	1	-6/+9
	The lower bound must be the first value of the prefix to be converted. For example, range "10.0.0.15-10.0.0.240" can not be converted to "10.0.0.15/24". Validate it by checking if the lower bound value has enough trailing zeros.
	Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
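The trailing-zeros check can be sketched for 32-bit addresses as below. This is a simplification for illustration: the nftables code works on arbitrary-precision values, and `range_to_prefix_len()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the prefix length (0..32) if the inclusive
 * range [lo, hi] covers exactly one prefix, or -1 if it does not.
 */
static int range_to_prefix_len(uint32_t lo, uint32_t hi)
{
	uint32_t diff;
	int host_bits = 0;

	if (lo > hi)
		return -1;

	/* The differing bits must form a contiguous low mask (2^k - 1). */
	diff = lo ^ hi;
	if (diff & (diff + 1))
		return -1;

	/* The lower bound must have all of those host bits cleared, i.e.
	 * it must have at least k trailing zero bits. */
	if (lo & diff)
		return -1;

	while (diff) {
		host_bits++;
		diff >>= 1;
	}
	return 32 - host_bits;
}
```

For 10.0.0.15-10.0.0.240 the XOR of the bounds is 0xff, which on its own looks like a /24; only the second check, on the lower bound, rejects it.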
*	src: support for nat with interval concatenation	Pablo Neira Ayuso	2021-07-13	1	-26/+135
	This patch allows you to combine concatenation and interval in NAT mappings, e.g.
	    add rule x y dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 }
	This generates the following NAT expression:
	    [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ]
	which expects to obtain the following tuple:
	    IP address (min), source port (min), IP address (max), source port (max)
	to be obtained from the map. This representation simplifies the delinearize path, since the datatype is specified as:
	    ipv4_addr . inet_service
	A few more notes on this update:
	- alloc_nftnl_setelem() needs a variant of netlink_gen_data() to deal with the representation of the range on the rhs of the mapping. In contrast to interval concatenation on the key side, where the range is expressed as two netlink attributes, the data side of the set element mapping stores the interval concatenation in a contiguous memory area, see __netlink_gen_concat_expand() for reference.
	- add range_expr_postprocess() to postprocess the data mapping range. If either one single IP address or port is used, then the minimum and maximum value in the range is the same value; e.g. to avoid listing 80-80, this routine simplifies the range. This also invokes the range to prefix conversion routine.
	- add concat_elem_expr() helper function to consolidate code to build the concatenation expression on the rhs element data side.
	This patch also adds tests/py and tests/shell.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	rule: memleak of list of timeout policies	Pablo Neira Ayuso	2021-06-18	1	-0/+1
	Release list of ct timeout policy when object is freed.
	    Direct leak of 160 byte(s) in 2 object(s) allocated from:
	    #0 0x7fc0273ad330 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe9330)
	    #1 0x7fc0231377c4 in xmalloc /home/.../devel/nftables/src/utils.c:36
	    #2 0x7fc023137983 in xzalloc /home/.../devel/nftables/src/utils.c:75
	    #3 0x7fc0231f64d6 in nft_parse /home/.../devel/nftables/src/parser_bison.y:4448
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: Avoid memleak in error path of netlink_delinearize_obj()	Phil Sutter	2021-06-14	1	-0/+1
	If parsing udata fails, 'obj' has to be freed before returning to caller.
	Fixes: 293c9b114faef ("src: add comment support for objects")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: Avoid memleak in error path of netlink_delinearize_table()	Phil Sutter	2021-06-14	1	-0/+1
	If parsing udata fails, 'table' has to be freed before returning to caller.
	Fixes: c156232a530b3 ("src: add comment support when adding tables")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: Avoid memleak in error path of netlink_delinearize_chain()	Phil Sutter	2021-06-14	1	-0/+1
	If parsing udata fails, 'chain' has to be freed before returning to caller.
	Fixes: 702ac2b72c0e8 ("src: add comment support for chains")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: Avoid memleak in error path of netlink_delinearize_set()	Phil Sutter	2021-06-14	1	-2/+2
	Duplicate string 'comment' later, at a point where the function cannot fail anymore.
	Fixes: 0864c2d49ee8a ("src: add comment support for set declarations")
	Signed-off-by: Phil Sutter <phil@nwl.cc>
*	netlink: quick sort array of devices	Pablo Neira Ayuso	2021-06-08	1	-0/+18
	Provide an ordered list of devices for (netdev) chain and flowtable.
	Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1525
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
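Sorting an array of device names with the C library's qsort() can be sketched as follows. The function names are assumptions for this example, and the comparison used by the actual patch may differ from plain strcmp() ordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each element is a pointer to a C string. */
static int dev_name_cmp(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return strcmp(*x, *y);
}

/* Hypothetical helper: sort device names in place, in strcmp() order. */
static void dev_array_sort(const char **devs, size_t len)
{
	qsort(devs, len, sizeof(*devs), dev_name_cmp);
}
```

A stable, sorted order makes listings reproducible regardless of the order in which devices were reported.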
*	libnftables: location-based error reporting for chain type	Pablo Neira Ayuso	2021-05-20	1	-1/+1
	Store the location of the chain type for better error reporting.
	Several users that compile custom kernels reported that error reporting is misleading when accidentally selecting CONFIG_NFT_NAT=n.
	After this patch, a better hint is provided:
	    # nft 'add chain x y { type nat hook prerouting priority dstnat; }'
	    Error: Could not process rule: No such file or directory
	    add chain x y { type nat hook prerouting priority dstnat; }
	                         ^^^
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add set element catch-all support	Pablo Neira Ayuso	2021-05-11	1	-26/+45
	Add a catchall expression (EXPR_SET_ELEM_CATCHALL).
	Use the asterisk (*) to represent the catch-all set element, e.g.
	    table x {
	        set y {
	            type ipv4_addr
	            counter
	            elements = { 1.2.3.4 counter packets 0 bytes 0, * counter packets 0 bytes 0 }
	        }
	    }
	Special handling for segtree: zap the catch-all element from the set element list and re-add it after processing.
	Remove wildcard_expr deadcode in src/parser_bison.y
	This patch also adds several tests for the tests/py and tests/shell infrastructures.
	Acked-by: Phil Sutter <phil@nwl.cc>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	cache: add hashtable cache for table	Pablo Neira Ayuso	2021-05-02	1	-1/+1
	Add a hashtable for fast table lookups.
	Tables that reside in the cache use the table->cache_hlist and table->cache_list heads.
	Tables that are created from command line / ruleset are also added to the cache.
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>