Move NFTNL_SET_ELEM_F_INTERVAL_OPEN flag to the existing flags field in
struct expr.
This saves 4 bytes in struct expr, shrinking it to 128 bytes according to
pahole. This reworks:
6089630f54ce ("segtree: Introduce flag for half-open range elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hitherto, the kernel has required constant values for the `xor` and
`mask` attributes of boolean bitwise expressions. This has meant that
the right-hand operand of a boolean binop must be constant. Now that the
kernel has support for AND, OR and XOR operations with right-hand
operands passed via registers, we can relax this restriction. Allow
non-constant right-hand operands if the left-hand operand is not
constant, e.g.:
ct mark & 0xffff0000 | meta mark & 0xffff
The kernel now supports performing AND, OR and XOR operations directly
on one register and an immediate value, or on two registers, so we need
to be able to generate and parse bitwise boolean expressions of this
form.
If a boolean operation has a constant RHS, we continue to send a
mask-and-xor expression to the kernel.
Add tests for {ct,meta} mark with variable RHS operands.
JSON support is also included.
This requires Linux kernel >= 6.13-rc.
[ Originally posted as patches 1/8 and 6/8, which have been collapsed and
simplified to focus on initial {ct,meta} mark support. Tests have
been extracted from 8/8 including a tests/py fix to payload output
due to incorrect output in original patchset. JSON support has been
extracted from patch 7/8 --pablo]
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
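An illustrative sketch of a rule that exercises the new non-constant RHS path (the table, chain and hook setup are assumptions, not taken from the patch):
table inet mangle {
        chain pre {
                type filter hook prerouting priority mangle; policy accept;
                # the OR combines two non-constant, masked operands; the
                # constant-mask ANDs still go out as mask-and-xor expressions
                meta mark set ct mark & 0xffff0000 | meta mark & 0xffff
        }
}
Loading such a rule needs the kernel support mentioned above (Linux >= 6.13-rc).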
Allow specifying a numeric queue id as part of a map.
The parser side is easy, but the reverse direction (listing) is not:
'queue' is a statement, it doesn't have an expression.
Add a generic 'queue_type' datatype as a shim to the real basetype with
constant expressions. This is used only for udata build/parse: it stores
the "key" (the parser token, here "queue") as udata in the kernel and can
then restore the original key.
Add a dumpfile to validate parser & output.
JSON support is missing because JSON has allowed typeof only since quite
recently.
Joint work with Pablo.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1455
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Leftover unused struct datatype field, remove it.
Fixes: e35aabd511c4 ("datatype: replace DTYPE_F_ALLOC by bitfield")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These were entirely ignored before; add the necessary code, analogous to
e.g. objects.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Large sets can expand into several netlink messages; use the sequence number
and attribute offset to correlate the set element with its location.
When a set element command expands into several netlink messages,
increment the sequence number for each netlink message. Update struct cmd to
store the range of netlink messages that result from this command.
struct nlerr_loc remains the same size on x86_64.
# nft -f set-65535.nft
set-65535.nft:65029:22-32: Error: Could not process rule: File exists
create element x y { 1.1.254.253 }
^^^^^^^^^^^
Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The maximum netlink message length (nlh->nlmsg_len) is uint32_t; struct
nlerr_loc stores the offset to the netlink attribute, which must therefore be
uint32_t, not uint16_t.
While at it, remove the check for a zero netlink attribute offset in
nft_cmd_error(), which should never happen; likely this check was
there to prevent the uint16_t offset overflow.
Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
To prepare for a fix for very large sets.
No functional change is intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rename mnl_seqnum_alloc() to mnl_seqnum_inc().
No functional change is intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
498a5f0c219d ("rule: collapse set element commands") does not help to
reduce memory consumption in the case of large sets defined by one
element per line:
add element ip x y { 1.1.1.1 }
add element ip x y { 1.1.1.2 }
...
This patch reduces memory consumption by ~75%: set elements are
collapsed into an existing cmd object wherever possible to reduce the
number of cmd objects.
This patch also adds a special case for variables in sets, similar to:
be055af5c58d ("cmd: skip variable set elements when collapsing commands")
This patch requires this small kernel fix:
commit b53c116642502b0c85ecef78bff4f826a7dd4145
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri May 20 00:02:06 2022 +0200
netfilter: nf_tables: set element extended ACK reporting support
which is already included in recent -stable kernels:
# cat ruleset.nft
add table ip x
add chain ip x y
add set ip x y { type ipv4_addr; }
create element ip x y { 1.1.1.1 }
create element ip x y { 1.1.1.1 }
# nft -f ruleset.nft
ruleset.nft:5:25-31: Error: Could not process rule: File exists
create element ip x y { 1.1.1.1 }
^^^^^^^
Since there is no need to relate commands via sequence numbers anymore,
this also allows removing the uncollapse step.
Fixes: 498a5f0c219d ("rule: collapse set element commands")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow specifying elements that never expire in sets with a global
timeout.
set x {
        typeof ip saddr
        timeout 1m
        elements = { 1.1.1.1 timeout never,
                     2.2.2.2,
                     3.3.3.3 timeout 2m }
}
In the example above:
- 1.1.1.1 is a permanent element
- 2.2.2.2 expires after 1 minute (uses default set timeout)
- 3.3.3.3 expires after 2 minutes (uses specified timeout override)
Use the internal NFT_NEVER_TIMEOUT marker (UINT64_MAX) to differentiate
between using the default set timeout and 'timeout never' when "timeout N" is
used in the set declaration. The maximum supported timeout in milliseconds,
which is conveyed within a netlink attribute, is 0x10c6f7a0b5ec.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
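For illustration, the same element syntax can also be used when adding elements at run time; the family and table name below are assumptions:
# this element never expires despite the set-wide "timeout 1m"
nft add element ip filter x { 4.4.4.4 timeout never }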
The reset command does not utilize the cache infrastructure.
This implicitly fixes a crash with anonymous sets because elements are
not fetched. I initially tried to fix it by toggling the missing cache
flags, but then ASAN reports memleaks.
To address these issues, rely on Phil's list filtering infrastructure,
which is expanded to accommodate the filtering requirements of the
reset commands, such as 'reset table ip' where only the family is sent
to the kernel.
After this update, tests/shell reports a few inconsistencies between
reset and list commands:
- reset rules chain t c2
displays sets, but it should only list the given chain.
- reset rules table t
reset rules ip
do not list elements in the set. In both cases, a given table or
family is listed in full, so elements should be included.
The consolidation also ensures list and reset will not differ.
A few more notes:
- CMD_OBJ_TABLE is used for:
rules family table
from the parser, due to the lack of a better enum; the same applies to
CMD_OBJ_CHAIN.
- CMD_OBJ_ELEMENTS still does not use the cache, but the same occurs in
the CMD_GET command case, which needs to be consolidated.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1763
Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
Fixes: 1694df2de79f ("Implement 'reset rule' and 'reset rules' commands")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
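A usage sketch of the reset scopes discussed above, now served by the same cache filtering as list (family, table, chain and set names are placeholders):
nft reset rules ip               # every rule in the ip family
nft reset rules table ip t       # every rule in one table
nft reset rules chain ip t c2    # a single chain
nft reset set ip t s             # stateful data attached to the elements of one set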
Currently, the full ruleset flag is set in order to fetch objects.
Follow a similar approach to these patches from Phil:
de961b930660 ("cache: Filter set list on server side") and
cb4b07d0b628 ("cache: Support filtering for a specific flowtable")
in preparation for updating the reset command to use the cache
infrastructure.
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The only user of the datatype flags field is DTYPE_F_ALLOC; replace it by a
bitfield and squash byteorder to 8 bits, which is sufficient.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Only the ipv4 and ipv6 datatypes support this; add a datatype_prefix_notation()
helper function to report that a datatype prefers prefix notation, if
possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of not returning any results for
nft list hooks netdev
iterate over all interfaces and then query all of them.
Signed-off-by: Florian Westphal <fw@strlen.de>
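Usage sketch (output depends on the interfaces present and the hooks registered):
# previously printed nothing unless restricted to a device; now every
# netdev interface is queried in turn
nft list hooks netdev
# listing hooks for all families still works as before
nft list hooks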
Updates on verdict maps that require many non-base chains are slowed
down due to fetching existing non-base chains into the cache.
Chains are only required for error reporting hints if the kernel reports
ENOENT. Populate the cache from this error path only.
A similar approach already exists for the rule ENOENT error path since:
deb7c5927fad ("cmd: add misspelling suggestions for rule commands")
However, NFT_CACHE_CHAIN was toggled unconditionally for rule
commands, rendering this on-demand cache population useless.
Before this patch, running Neels' nft_slew benchmark (peak values):
created idx 4992 in 52587950 ns (128 in 7122 ms)
...
deleted idx 128 in 43542500 ns (127 in 6187 ms)
after this patch:
created idx 4992 in 11361299 ns (128 in 1612 ms)
...
deleted idx 1664 in 5239633 ns (128 in 733 ms)
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
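For context, a minimal sketch of the kind of update that benefits; the table, map and chain names are made up:
# adding an element to a verdict map no longer pulls all non-base chains
# into the cache up front; chains are fetched only if the kernel returns
# ENOENT and a misspelling hint is needed
nft add element inet filter tcp_services { 22 : jump chain_ssh }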
Since commit b98fee20bfe2 ("mnl: revisit hook listing"), handle.chain is
never set in this path, so 'hook' is always set to -1 and the hook arg
can be dropped.
Signed-off-by: Florian Westphal <fw@strlen.de>
Removed two years ago with v6.1; ditch this from the hook list code as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Add a string preprocessor to identify and replace variables in a string.
Rework the existing support for variables in log prefix strings to use it.
Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
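A sketch of the construct the preprocessor has to handle; the ruleset below is illustrative, with made-up names:
define site = "dmz"
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
                # $site is identified and replaced inside the quoted prefix string
                tcp dport 22 log prefix "$site ssh: " counter
        }
}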
The Bison parser lacked support for passing multiple flags; the JSON parser
did not support table flags at all.
Also document the 'owner' flag and describe their relationship in nft.8.
Signed-off-by: Phil Sutter <phil@nwl.cc>
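A sketch of what the Bison parser now accepts; the particular flag combination is an assumption:
table inet t {
        # more than one flag in a single declaration; previously only a
        # single flag could be given here
        flags owner,persist
}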
Currently, the ICMP{v4,v6,inet} code datatypes only describe the codes that
are supported by the reject statement, but they can also be used for icmp
code matching. Moreover, ICMP code types go hand in hand with ICMP
types, that is, ICMP code symbols depend on the ICMP type.
Thus, the output of:
nft describe icmp_code
looks confusing because it only displays the values that are supported
by the reject statement.
Disentangle this by adding internal datatypes for the reject statement
to handle the ICMP code symbol conversion to value as well as ruleset
listing.
The existing icmp_code, icmpv6_code and icmpx_code remain in place. For
backward compatibility, a parser function is defined in case an existing
ruleset relies on these symbols.
As for the manpage, move the existing ICMP code tables from the DATA TYPES
section to the REJECT STATEMENT section, where they really belong.
But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES
section because they describe that these are 8-bit integer fields.
After this patch:
# nft describe icmp_code
datatype icmp_code (icmp code) (basetype integer), 8 bits
# nft describe icmpv6_code
datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
# nft describe icmpx_code
datatype icmpx_code (icmpx code) (basetype integer), 8 bits
do not display the symbol table of the reject statement anymore.
icmpx_code_type is not used anymore, but keep it in place for backward
compatibility reasons.
And update tests/shell accordingly.
Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'")
reverses the hour range in case a cross-day range is used, e.g.
meta hour "03:00"-"14:00" counter accept
which results in (Sidney, Australia AEDT time):
meta hour != "14:00"-"03:00" counter accept
The kernel handles time in UTC, therefore a cross-day range may not be
obvious according to local time.
The ruleset listing above is not very intuitive to the reader depending
on their timezone, therefore complete the netlink delinearize path to
reverse the cross-day meta range.
Update the manpage to recommend using a range expression when matching a
meta hour range. Recommend a range expression for meta time and meta day too.
Extend testcases/listing/meta_time to cover this scenario.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
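The recommended form keeps the local-time range explicit and lets nft handle the cross-day wrap internally; a minimal rule sketch (table and chain setup are assumed):
table inet filter {
        chain input {
                type filter hook input priority filter; policy accept;
                # cross-day in UTC, but written and listed in local time
                meta hour "03:00"-"14:00" counter accept
        }
}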
The included sample causes a crash because we attempt to
range-merge a prefix expression with a symbolic expression.
The first set is evaluated, the symbol expression evaluation fails
and nft queues an error message ("Could not resolve hostname").
However, nft continues evaluation.
nft then encounters the same set definition again and merges the
new content with the preceding one.
But the first set structure is dodgy: it still contains the
unresolved symbolic expression.
That then makes nft crash (assert) in the set internals.
There are various incarnations of this issue, but the low-level
set processing code does not allow for any partially transformed
expressions to still remain.
Before:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
BUG: invalid range expression type binop
nft: src/expression.c:1479: range_expr_value_low: Assertion `0' failed.
After:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
invalid_range_expr_type_binop:4:18-25: Error: Could not resolve hostname: Name or service not known
elements = { 1&.141.0.1 - 192.168.0.2}
^^^^^^^^
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
129f9d153279 ("nft: migrate man page examples with `meter` directive to
sets") already replaced meters by dynamic sets.
This patch removes the NFT_SET_ANONYMOUS flag from the implicit set that is
instantiated via meter, so the listing shows a dynamic set instead, which
is the recommended approach these days.
Therefore, a batch like this:
add table t
add chain t c
add rule t c tcp dport 80 meter m size 128 { ip saddr timeout 1s limit rate 10/second }
gets translated to a dynamic set:
table ip t {
        set m {
                type ipv4_addr
                size 128
                flags dynamic,timeout
        }
        chain c {
                tcp dport 80 update @m { ip saddr timeout 1s limit rate 10/second burst 5 packets }
        }
}
The check for the NFT_SET_ANONYMOUS flag is also relaxed for the list and
flush meter commands:
# nft list meter ip t m
table ip t {
        set m {
                type ipv4_addr
                size 128
                flags dynamic,timeout
        }
}
# nft flush meter ip t m
As a side effect, the legacy 'list meter' and 'flush meter' commands allow
flushing a dynamic set to retain backward compatibility.
This patch updates testcases/sets/0022type_selective_flush_0 and
testcases/sets/0038meter_list_0 as well as the JSON output, which now
uses the dynamic set representation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
AFL found the following crash:
table ip filter {
        map ipsec_in {
                typeof ipsec in reqid . iif : verdict
                flags interval
        }
        chain INPUT {
                type filter hook input priority filter; policy drop;
                ipsec in reqid . 100 @ipsec_in
        }
}
Which yields:
nft: evaluate.c:1213: expr_evaluate_unary: Assertion `!expr_is_constant(arg)' failed.
All existing test cases with constant values use big endian values, but
"iif" expects host endian values.
As raw values were not supported before, concat byteorder conversion
doesn't handle constants.
Fix this:
1. Add constant handling so that the number is converted in-place,
without unary expression.
2. Add the inverse handling on delinearization for non-interval set
types.
When dissecting the concat data soup, watch for integer constants where
the datatype indicates a host endian integer.
Lastly, extend an existing test case with the AFL input to cover
input and output.
A new test case is added to test linearization, delinearization and
matching.
Based on an original patch from Florian Westphal; patch subject and
description written by him.
Fixes: b422b07ab2f9 ("src: permit use of constant values in set lookup keys")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Implement a symbol_table_print() wrapper for the run-time populated
rt_symbol_tables which formats output similarly to expr_describe() and
includes the data source.
Since these tables reside in struct output_ctx, there is no implicit
connection to the data type; therefore, providing callbacks for the
relevant data types which feed the data into said wrapper is a simpler
solution than extending expr_describe() itself.
Signed-off-by: Phil Sutter <phil@nwl.cc>
netlink_linearize.c has never supported more than 16 chained binops.
Adding more is possible but overwrites the stack in
netlink_gen_bitwise().
Add a recursion counter to catch this at eval stage.
It's not enough to just abort once the counter hits
NFT_MAX_EXPR_RECURSION.
This is because there are valid test cases that exceed this.
For example, evaluation of 1 | 2 will merge the constants, so even
if there are a dozen recursive eval calls this will not end up
with a large binop chain post-evaluation.
v2: allow more than 16 binops iff the evaluation function
did constant-merging.
Signed-off-by: Florian Westphal <fw@strlen.de>
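To illustrate the constant-merging case mentioned above (table and chain names are assumptions):
# the constant operands 0x1 | 0x2 | 0x4 | 0x8 are folded into a single
# value during evaluation, so only one bitwise operation remains and the
# recursion limit is not hit
nft add rule ip t c 'meta mark set meta mark | 0x1 | 0x2 | 0x4 | 0x8'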
Signed-off-by: Florian Westphal <fw@strlen.de>
The kernel will reject this too, but unfortunately nft may try
to cram the data into the underlying libnftnl expr.
This causes heap corruption or
BUG: nld buffer overflow: want to copy 132, max 64
After:
Error: Concatenation of size 544 exceeds maximum size of 512
udp length . @th,0,512 . @th,512,512 { 47-63 . 0xe373135363130 . 0x33131303735353203 }
^^^^^^^^^
or, respectively, the same warning for an over-sized raw expression.
Signed-off-by: Florian Westphal <fw@strlen.de>
There is a stack overflow somewhere in this code: we end
up memcpy'ing a way too large expr into a fixed-size on-stack
buffer.
This is hard to diagnose; most of this code gets inlined, so
the crash happens later, on return from alloc_nftnl_setelem.
Condense the memcpy into a helper and add a BUG so we can catch
the overflow before it occurs.
->value is too small (4, should be 16), but for normal
cases (well-formed data must fit into max reg space, i.e.
64 bytes) the chain buffer that comes after value in the
structure provides a cushion.
In order to have the new BUG() not trigger on valid data,
bump value to the correct size; this is userspace, so the additional
60 bytes of stack usage is no concern.
Signed-off-by: Florian Westphal <fw@strlen.de>
This patch consolidates ctx->stmt_len reset in stmt_evaluate() to avoid
this problem. Note that stmt_evaluate_meta() and stmt_evaluate_ct()
already reset it after the statement evaluation.
Moreover, a statement dependency can be generated while evaluating a meta
or ct statement. The payload statement dependency already manually stashes
this before calling stmt_evaluate(). Add a new stmt_dependency_evaluate()
function to stash the statement length context when evaluating a new statement
dependency and use it for all of the existing statement dependencies.
Florian also says:
'meta mark set vlan id map { 1 : 0x00000001, 4095 : 0x00004095 }' will
crash. The reason is that the l2 dependency generated here is erroneously
expanded to a 32-bit one, so the evaluation path won't recognize this
as an L2 dependency. Therefore, pctx->stacked_ll_count is 0 and
__expr_evaluate_payload() crashes with a null deref when
dereferencing pctx->stacked_ll[0].
nft-test.py gains a fugly hack to tolerate '!map typeof vlan id : meta mark'.
For more generic support we should find something more acceptable, e.g.
!map typeof( everything here is a key or data ) timeout ...
tests/py update and assert(pctx->stacked_ll_count) by Florian Westphal.
Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
xmalloc() (and similar x-functions) are used for allocation. They wrap
malloc()/realloc() but will abort the program on ENOMEM.
The meaning of xmalloc() is that it wraps malloc() but aborts on
failure. I don't think x-functions should carry the notion that this
is potentially a different memory allocator that must be paired
with a particular xfree().
Even if the original intent was that the allocator is abstracted (and
possibly not backed by standard malloc()/free()), that doesn't seem like
a good idea. Nowadays libc allocators are pretty good, and we would need
a very special use case to switch to something else. In other words,
it will never happen that xmalloc() is not backed by malloc().
Also, there were a few places where an xmalloc() was already "wrongly"
paired with free() (for example, iface_cache_release(), exit_cookie(),
nft_run_cmd_from_buffer()).
Or note how pid2name() returns an allocated string from fscanf(), which
needs to be freed with free() (and not xfree()). This requirement
bubbles up to the callers portid2name() and name_by_portid(). This case was
actually handled correctly and the buffer was freed with free(). But it
shows that mixing different allocators is cumbersome to get right. Of
course, we don't actually have different allocators, and whether to use
free() or xfree() makes no difference. The point is that xfree() serves
no actual purpose except raising irrelevant questions about whether
x-functions are correctly paired with xfree().
Note that xfree() also used to accept const pointers. It is bad to cast
away const unconditionally for all deallocations. Instead, prefer to use plain
free(). To free a const pointer, use free_const(), which obviously wraps
free(), as indicated by the name.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Almost everywhere, xmalloc() and friends are used instead of malloc().
This is almost everywhere paired with xfree().
xfree() has two problems. First, it brings the wrong notion that
xmalloc() should be paired with xfree(), as if xmalloc() would not use
the plain malloc() allocator. In practice, xfree() just wraps free(),
and it wouldn't make sense any other way. xfree() should go away. This
will be addressed in the next commit.
The problem addressed by this commit is that xfree() accepts a const
pointer. Paired with the practice of almost always using xfree() instead
of free(), all our calls to xfree() cast away constness of the pointer,
regardless whether that is necessary. Declaring a pointer as const
should help us to catch wrong uses. If the xfree() function always casts
aways const, the compiler doesn't help.
There are many places that rightly cast away const during free. But not
all of them. Add a free_const() macro, which is like free(), but accepts
const pointers. We should always make an intentional choice whether to
use free() or free_const(). Having a free_const() macro makes this very
common choice clearer, instead of adding a (void*) cast at many places.
Note that we now pair xmalloc() allocations with a free() call (instead
of xfree()). That inconsistency will be resolved in the next commit.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
mpz_get_str() (with NULL as first argument) will allocate a buffer using
the allocator functions (mp_set_memory_functions()). We should free
those buffers with the corresponding free function.
Add nft_gmp_free() for that and use it.
The name nft_gmp_free() is chosen because "mini-gmp.c" already has an
internal define called gmp_free(). There wouldn't be a direct conflict,
but using the same name is confusing. And maybe our own defines should
have a clear nft prefix.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Switch from recursive-make to a single top-level Makefile. This is the
first step; the following patches will continue this.
Unlike meson's subdir() or C's #include, automake's SUBDIRS= does not
include a Makefile. Instead, it calls `make -C $dir`.
https://www.gnu.org/software/make/manual/html_node/Recursion.html
https://www.gnu.org/software/automake/manual/html_node/Subdirectories.html
See also, "Recursive Make Considered Harmful".
https://accu.org/journals/overload/14/71/miller_2004/
This has several problems, which we can avoid with a single Makefile:
- recursive-make is harder to maintain and understand as a whole.
Recursive-make makes sense when there are truly independent
sub-projects, which is not the case here. The project needs to be
considered as a whole and not one directory at a time. When
we add unit tests (which we should), those would reside in separate
directories but have dependencies between directories. With a single
Makefile, we see it all at once. The build setup has an inherent complexity,
and that complexity is not necessarily reduced by splitting it into more files.
On the contrary, it helps to have it all in one place, provided that it's
sensibly structured, named and organized.
- typing `make` prints irrelevant "Entering directory" messages. So much
so that at the end of the build, the terminal is filled with such
messages and we have to scroll to see what even happened.
- with recursive-make, during build we see:
make[3]: Entering directory '.../nftables/src'
CC meta.lo
meta.c:13:2: error: #warning hello test [-Werror=cpp]
13 | #warning hello test
| ^~~~~~~
With a single Makefile we get
CC src/meta.lo
src/meta.c:13:2: error: #warning hello test [-Werror=cpp]
13 | #warning hello test
| ^~~~~~~
This shows the full filename -- assuming that the developer works from
the top level directory. The full name is useful, for example to
copy+paste into the terminal.
- a single Makefile is also faster:
$ make && perf stat -r 200 -B make -j
I measure 35msec vs. 80msec.
- recursive-make limits parallel make. You have to craft the SUBDIRS= in
the correct order. The dependencies between directories are limited,
as make only sees "LDADD = $(top_builddir)/src/libnftables.la" and
not the deeper dependencies for the library.
- I presume some people like recursive-make because of `make -C $subdir`,
to only rebuild one directory. Rebuilding the entire tree is already very
fast, so this feature does not seem relevant. Also, as dependency handling
is limited, we might wrongly not rebuild a target. For example,
make check
touch src/meta.c
make -C examples check
does not rebuild "examples/nft-json-file".
What we can now do with a single Makefile (and better than before) is
`make examples/nft-json-file`, which works as desired and rebuilds all
dependencies.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
It was not possible to match the target address of a neighbor
solicitation or neighbor advertisement against a dynamic set, unlike in
IPv4.
Since there are many ICMPv6 messages with an address at the same offset,
allow filtering on the target address for all ICMP types that have one.
While at it, also allow matching the destination address of an ICMPv6
redirect.
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Signed-off-by: Florian Westphal <fw@strlen.de>
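A sketch of a rule this enables; the set name, chain setup and timeout are assumptions:
table ip6 filter {
        set seen_targets {
                type ipv6_addr
                flags dynamic,timeout
        }
        chain input {
                type filter hook input priority filter; policy accept;
                # record the target address carried in neighbour solicitations
                icmpv6 type nd-neighbor-solicit update @seen_targets { icmpv6 taddr timeout 30s }
        }
}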
Add map statement stub to restore compilation without json support.
Fixes: 27a2da23d508 ("netlink_linearize: skip set element expression in map statement key")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
<string.h> provides strcmp(); as such, it's very basic and used
everywhere.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This fix is similar to 22d201010919 ("netlink_linearize: skip set element
expression in set statement key") to fix map statement.
netlink_gen_map_stmt() relies on the map key, which is expressed as a set
element. Use the set element key instead to skip the set element wrap;
otherwise get_register() aborts execution:
nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.
This includes JSON support to make this feature complete, and it updates
tests/shell to cover this support.
Reported-by: Luci Stanescu <luci@cnix.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make fewer assumptions about the underlying integer type of the enum.
Instead, be clear about where we have an untrusted uint32_t from netlink
and where we have an enum. Rename expr_ops_by_type() to expr_ops_by_type_u32()
to make this clearer. Later we might mark the enum as packed, when this starts
to matter more.
Also, only the expr_ops() code path wants strict validation and asserts
against valid enum values. Move the assertion out of
__expr_ops_by_type(). Then expr_ops_by_type_u32() does not need to
duplicate the handling of EXPR_INVALID. We still need to duplicate the
check against EXPR_MAX, to ensure that the uint32_t value can be cast to
an enum value.
[ Remove cast on EXPR_MAX. --pablo ]
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
"struct datatype" is for the most part immutable, and most callers deal
with const pointers. That's why datatype_get() accepts a const pointer
to increase the reference count (mutating the refcnt field).
It should also return a const pointer. In fact, all callers are fine
with that already.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the enum types as we have them.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The headers should be self-contained so they can be included in any
order. With the exception of <nft.h>, which any internal header can rely on.
Some fixes for <cache.h>/<headers.h>.
In the case of <cache.h>, forward-declare some of the structs instead of
including the headers. <headers.h> uses struct in6_addr.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Matching on ct event makes no sense since this is mostly used as a
statement to globally filter out ctnetlink events, but do not crash
if it is used from concatenations.
Add the missing slot in the datatype array so this does not crash.
Fixes: 2595b9ad6840 ("ct: add conntrack event mask support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Otherwise, ct label with concatenations such as:
table ip x {
        chain y {
                ct label . ct mark { 0x1 . 0x1 }
        }
}
crashes:
../include/datatype.h:196:11: runtime error: member access within null pointer of type 'const struct datatype'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==640948==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fc970d3199b bp 0x7fffd1f20560 sp 0x7fffd1f20540 T0)
==640948==The signal is caused by a READ memory access.
==640948==Hint: address points to the zero page.
#0 0x7fc970d3199b in datatype_equal ../include/datatype.h:196
Fixes: 2fcce8b0677b ("ct: connlabel matching support")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Setting global handles for libgmp via mp_set_memory_functions() is very
ugly. When we don't use mini-gmp, there are potentially other users
of the library in the same process, and everybody fighting over the
allocation functions is not going to work.
It also means we must not reset the allocation functions after somebody
has already allocated GMP data with them, which we cannot ensure, as we
don't know what other parts of the process are doing.
It's also unnecessary. The default allocation functions for gmp and
mini-gmp already abort the process on allocation failure ([1], [2]),
just like our xmalloc().
Just don't do this.
[1] https://gmplib.org/repo/gmp/file/8225bdfc499f/memory.c#l37
[2] https://git.netfilter.org/nftables/tree/src/mini-gmp.c?id=6d19a902c1d77cb51b940b1ce65f31b1cad38b74#n286
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Test `./tests/shell/run-tests.sh -V tests/shell/testcases/maps/nat_addr_port`
fails:
==118== 195 (112 direct, 83 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==118== at 0x484682C: calloc (vg_replace_malloc.c:1554)
==118== by 0x48A39DD: xmalloc (utils.c:37)
==118== by 0x48A39DD: xzalloc (utils.c:76)
==118== by 0x487BDFD: datatype_alloc (datatype.c:1205)
==118== by 0x487BDFD: concat_type_alloc (datatype.c:1288)
==118== by 0x488229D: stmt_evaluate_nat_map (evaluate.c:3786)
==118== by 0x488229D: stmt_evaluate_nat (evaluate.c:3892)
==118== by 0x488229D: stmt_evaluate (evaluate.c:4450)
==118== by 0x488328E: rule_evaluate (evaluate.c:4956)
==118== by 0x48ADC71: nft_evaluate (libnftables.c:552)
==118== by 0x48AEC29: nft_run_cmd_from_buffer (libnftables.c:595)
==118== by 0x402983: main (main.c:534)
I think the reference handling for datatype is wrong. It was introduced
by commit 01a13882bb59 ('src: add reference counter for dynamic
datatypes').
We don't notice it most of the time, because instances are statically
allocated, where datatype_get()/datatype_free() is a NOP.
Fix and rework.
- Commit 01a13882bb59 comments "The reference counter of any newly
allocated datatype is set to zero". That does not seem workable.
Previously, functions like datatype_clone() would have returned the
refcnt set to zero. Some callers would then set the refcnt to one, but
some wouldn't (set_datatype_alloc()). Calling datatype_free() with a
refcnt of zero will overflow to UINT_MAX and leak:
if (--dtype->refcnt > 0)
return;
While there could be schemes with such asymmetric counting that juggle the
appropriate number of datatype_get() and datatype_free() calls, this is
confusing and error prone. The common pattern is that every
alloc/clone/get/ref is paired with exactly one unref/free.
Let datatype_clone() return references with the refcnt set to 1 and in
general always be clear about where we transfer ownership (take a
reference) and where we need to release it.
- set_datatype_alloc() needs to consistently return ownership of the
reference. Previously, some code paths would and others wouldn't.
- Replace
datatype_set(key, set_datatype_alloc(dtype, key->byteorder))
with a __datatype_set() which takes ownership.
Fixes: 01a13882bb59 ('src: add reference counter for dynamic datatypes')
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It provides malloc()/free(), which is so basic that we need it
everywhere. Include via <nft.h>.
The ultimate purpose is to define more things in <nft.h>. While it has
no corresponding C sources, <nft.h> can contain macros and static
inline functions, and is a good place for things that we shall have
everywhere. Since <stdlib.h> provides malloc()/free() and size_t, that
is a very basic dependency that will be needed for that.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The struct is called "datatype" and related functions have the fitting
"datatype_" prefix. Rename.
Also rename the internal "dtype_alloc()" to "datatype_alloc()".
This is a follow-up to commit 01a13882bb59 ('src: add reference counter
for dynamic datatypes'), which started adding "datatype_*()" functions.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>