| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This expression is not used anymore, since:
("src: transform flag match expression to binop expression from parser")
remove it.
This completes the revert of c3d57114f119 ("parser_bison: add shortcut
syntax for matching flags without binary operations"), except the parser
chunk for backwards compatibility.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Transform flagcmp expression to a relational with binop on the left hand
side, ie.
relational
/ \
binop value
/ \
payload mask
Add list_expr_to_binop() to make this transformation.
Goal is two-fold:
- Allow -o/--optimize to pick up on this representation.
- Remove the flagcmp expression in a follow up patch.
This prepare for the removal of the flagcmp expression added by:
c3d57114f119 ("parser_bison: add shortcut syntax for matching flags without binary operations")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
EXPR_MAX was never updated to the newest expression, add __EXPR_MAX and
use it to define EXPR_MAX.
Add case to expr_ops() other gcc complains with a warning on the
__EXPR_MAX case is not handled.
Fixes: 347039f64509 ("src: add symbol range expression to further compact intervals")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shrink struct stmt in 8 bytes.
__stmt_ops_by_type() provides an operation for STMT_INVALID since this
is required by -o/--optimize.
There are many checks for stmt->ops->type, which is the most accessed
field, that can be trivially replaced.
BUG() uses statement type enum instead of name.
Similar to:
68e76238749f ("src: expr: add and use expr_name helper").
72931553828a ("src: expr: add expression etype")
2cc91e6198e7 ("src: expr: add and use internal expr_ops helper")
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the set element:
- represents a mapping
- has a timeout
- has a comment
- has counter/quota/limit
- concatenation (already printed in a single line before this patch)
ie. if the set element requires several words, then print it in one
single line.
Before this patch:
table ip x {
set y {
typeof ip saddr
counter
elements = { 192.168.10.35 counter packets 0 bytes 0, 192.168.10.101 counter packets 0 bytes 0,
192.168.10.135 counter packets 0 bytes 0 }
}
}
After this patch:
table ip x {
set y {
typeof ip saddr
counter
elements = { 192.168.10.35 counter packets 0 bytes 0,
192.168.10.101 counter packets 0 bytes 0,
192.168.10.135 counter packets 0 bytes 0 }
}
}
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
operation
Add a new function to deal with payload statement delinearization with
binop expression.
Infer the payload offset from the mask, then walk the template list to
determine if estimated offset falls within a matching header field. If
so, then validate that this is not a raw expression but an actual
bitfield matching. Finally, trim the payload expression length
accordingly and adjust the payload offset.
instead of:
@nh,8,5 set 0x0
it displays:
ip dscp and 0x1
Update tests/py to cover for this enhancement.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nft can be used t match on specific multipath tcp subtypes:
tcp option mptcp subtype 0
However, depending on which subtype to match, users need to look up the
type/value to use in rfc8684. Add support for mnemonics and
"nft describe tcp option mptcp subtype" to get the subtype list.
Because the number of unique 'enum datatypes' is limited by ABI contraints
this adds a new mptcp suboption type as integer alias.
After this patch, nft supports all of the following:
add element t s { mp-capable }
add rule t c tcp option mptcp subtype mp-capable
add rule t c tcp option mptcp subtype { mp-capable, mp-fail }
For the 3rd case, listing will break because unlike for named sets, nft
lacks the type information needed to pretty-print the integer values,
i.e. nft will print the 3rd rule as 'subtype { 0, 6 }'.
This is resolved in a followup patch.
Other problematic constructs are:
set s1 {
typeof tcp option mptcp subtype . ip saddr
elements = { mp-fail . 1.2.3.4 }
}
Followed by:
tcp option mptcp subtype . ip saddr @s1
nft will print this as:
tcp option mptcp unknown & 240) >> 4 . ip saddr @s1
All of these issues are not related to this patch, however, they also occur
with other bit-sized extheader fields.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update parser to use a new symbol range expression with smaller memory
footprint than range expression + two symbol expressions.
The evaluation step translates this into EXPR_RANGE_VALUE for interval
sets.
Note that maps or concatenations still use the less compact range
expressions representation, those require more work to use this new
symbol range expression. The parser also uses the classic range
expression if variables are used.
Testing with a 100k intervals, worst case scenario: no prefix or
singleton elements. This shows a reduction from 49.58 Mbytes to
35.47 Mbytes (-29.56% memory footprint for this case).
This follow up work to previous commits:
91dc281a82ea ("src: rework singleton interval transformation to reduce memory consumption")
c9ee9032b0ee ("src: add EXPR_RANGE_VALUE expression and use it")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previous commit fixed erroneous handling of raw expressions when RHS sets
a zero value.
Input: @ih,58,6 set 0 @ih,86,6 set 0 @ih,170,22 set 0
Output:@ih,48,16 set @ih,48,16 & 0xffc0 @ih,80,16 set \
@ih,80,16 & 0xfc0f @ih,160,32 set @ih,160,32 & 0xffc00000
After this patch, this will instead display:
@ih,58,6 set 0x0 @ih,86,6 set 0x0 @ih,170,22 set 0x0
payload_expr_trim_force() only works when the payload has no known
protocol (template) attached, i.e. will be printed as raw payload syntax.
It performs sanity checks on @mask and then adjusts the payload expression
length and offset according to the mask.
Also add this check in __binop_postprocess() so we can also discard masks
when matching, e.g.
'@ih,7,5 2' becomes '@ih,7,5 0x2', not '@ih,0,16 & 0xffc0 == 0x20'.
binop_postprocess now returns if it performed an action or not; if this
returns true then arguments might have been freed so callers must no longer
refer to any of the expressions attached to the binop.
Next patch adds test cases for this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
set_to_intervals() expands range expressions into a list of singleton
elements before building the netlink message that is sent to userspace.
This is because the kernel expects this list of singleton elements where
EXPR_F_INTERVAL_END denotes a closing interval. This expansion
significantly increases memory consumption in userspace.
This patch updates the logic to transform the range expression up to two
temporary singleton element expressions through setelem_to_interval().
Then, these two elements are used to allocate the nftnl_set_elem objects
through alloc_nftnl_setelem_interval() to build the netlink message,
finally all these temporary objects are released. For anonymous sets,
when adjacent ranges are found, the end element is not added to the set
to pack the set representation as in the original set_to_intervals()
routine.
After this update, set_to_intervals() only deals with adding the
non-matching all zero element to the interval set when it is not there
as the kernel expects.
In combination with the new EXPR_RANGE_VALUE expression, this shrinks
runtime userspace memory consumption from 70.50 Mbytes to 43.38 Mbytes
for a 100k intervals set sample.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
This is read-only, constify it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
set element with range takes 4 instances of struct expr:
EXPR_SET_ELEM -> EXPR_RANGE -> (2) EXPR_VALUE
where EXPR_RANGE represents two references to struct expr with constant
value.
This new EXPR_RANGE_VALUE trims it down to two expressions:
EXPR_SET_ELEM -> EXPR_RANGE_VALUE
with two direct low and high values that represent the range:
struct {
mpz_t low;
mpz_t high;
};
this two new direct values in struct expr do not modify its size.
setelem_expr_to_range() translates EXPR_RANGE to EXPR_RANGE_VALUE, this
conversion happens at a later stage.
constant_range_expr_print() translates this structure to constant values
to reuse the existing datatype_print() which relies in singleton values.
The automerge routine has been updated to use EXPR_RANGE_VALUE.
This requires a follow up patch to rework the conversion from range
expression to singleton element to provide a noticeable memory
consumption reduction.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
line_offset of 2^32 bytes should be enough.
This requires the removal of the last_line field (in a previous patch) to
shrink struct expr to 112 bytes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
This 4 bytes field is never used, remove it.
This does not shrink struct location in x86_64 due to alignment.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
This saves 8 bytes in x86_64 in struct location which is embedded in
every expression.
This shrinks struct expr to 120 bytes according to pahole.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move NFTNL_SET_ELEM_F_INTERVAL_OPEN flag to the existing flags field in
struct expr.
This saves 4 bytes in struct expr, shrinking it to 128 bytes according to
pahole. This reworks:
6089630f54ce ("segtree: Introduce flag for half-open range elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hitherto, the kernel has required constant values for the `xor` and
`mask` attributes of boolean bitwise expressions. This has meant that
the right-hand operand of a boolean binop must be constant. Now the
kernel has support for AND, OR and XOR operations with right-hand
operands passed via registers, we can relax this restriction. Allow
non-constant right-hand operands if the left-hand operand is not
constant, e.g.:
ct mark & 0xffff0000 | meta mark & 0xffff
The kernel now supports performing AND, OR and XOR operations directly,
on one register and an immediate value or on two registers, so we need
to be able to generate and parse bitwise boolean expressions of this
form.
If a boolean operation has a constant RHS, we continue to send a
mask-and-xor expression to the kernel.
Add tests for {ct,meta} mark with variable RHS operands.
JSON support is also included.
This requires Linux kernel >= 6.13-rc.
[ Originally posted as patch 1/8 and 6/8 which has been collapsed and
simplified to focus on initial {ct,meta} mark support. Tests have
been extracted from 8/8 including a tests/py fix to payload output
due to incorrect output in original patchset. JSON support has been
extracted from patch 7/8 --pablo]
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow to specify a numeric queue id as part of a map.
The parser side is easy, but the reverse direction (listing) is not.
'queue' is a statement, it doesn't have an expression.
Add a generic 'queue_type' datatype as a shim to the real basetype with
constant expressions, this is used only for udata build/parse, it stores
the "key" (the parser token, here "queue") as udata in kernel and can
then restore the original key.
Add a dumpfile to validate parser & output.
JSON support is missing because JSON allow typeof only since quite
recently.
Joint work with Pablo.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1455
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
Leftover unused struct datatype field, remove it.
Fixes: e35aabd511c4 ("datatype: replace DTYPE_F_ALLOC by bitfield")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
These were entirely ignored before, add the necessary code analogous to
e.g. objects.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Large sets can expand into several netlink messages, use sequence number
and attribute offset to correlate the set element and the location.
When set element command expands into several netlink messages,
increment sequence number for each netlink message. Update struct cmd to
store the range of netlink messages that result from this command.
struct nlerr_loc remains in the same size in x86_64.
# nft -f set-65535.nft
set-65535.nft:65029:22-32: Error: Could not process rule: File exists
create element x y { 1.1.254.253 }
^^^^^^^^^^^
Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The maximum netlink message length (nlh->nlmsg_len) is uint32_t, struct
nlerr_loc stores the offset to the netlink attribute which must be
uint32_t, not uint16_t.
While at it, remove check for zero netlink attribute offset in
nft_cmd_error() which should not ever happen, likely this check was
there to prevent the uint16_t offset overflow.
Fixes: f8aec603aa7e ("src: initial extended netlink error reporting")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
To prepare for a fix for very large sets.
No functional change is intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
rename mnl_seqnum_alloc() to mnl_seqnum_inc().
No functional change is intended.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
498a5f0c219d ("rule: collapse set element commands") does not help to
reduce memory consumption in the case of large sets defined by one
element per line:
add element ip x y { 1.1.1.1 }
add element ip x y { 1.1.1.2 }
...
This patch reduces memory consumption by ~75%, set elements are
collapsed into an existing cmd object wherever possible to reduce the
number of cmd objects.
This patch also adds a special case for variables for sets similar to:
be055af5c58d ("cmd: skip variable set elements when collapsing commands")
This patch requires this small kernel fix:
commit b53c116642502b0c85ecef78bff4f826a7dd4145
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri May 20 00:02:06 2022 +0200
netfilter: nf_tables: set element extended ACK reporting support
which is already included in recent -stable kernels:
# cat ruleset.nft
add table ip x
add chain ip x y
add set ip x y { type ipv4_addr; }
create element ip x y { 1.1.1.1 }
create element ip x y { 1.1.1.1 }
# nft -f ruleset.nft
ruleset.nft:5:25-31: Error: Could not process rule: File exists
create element ip x y { 1.1.1.1 }
^^^^^^^
since there is no need to relate commands via sequence number anymore,
this allows also removes the uncollapse step.
Fixes: 498a5f0c219d ("rule: collapse set element commands")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow to specify elements that never expire in sets with global
timeout.
set x {
typeof ip saddr
timeout 1m
elements = { 1.1.1.1 timeout never,
2.2.2.2,
3.3.3.3 timeout 2m }
}
in this example above:
- 1.1.1.1 is a permanent element
- 2.2.2.2 expires after 1 minute (uses default set timeout)
- 3.3.3.3 expires after 2 minutes (uses specified timeout override)
Use internal NFT_NEVER_TIMEOUT marker as UINT64_MAX to differenciate
between use default set timeout and timeout never if "timeout N" is used
in set declaration. Maximum supported timeout in milliseconds which is
conveyed within a netlink attribute is 0x10c6f7a0b5ec.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reset command does not utilize the cache infrastructure.
This implicitly fixes a crash with anonymous sets because elements are
not fetched. I initially tried to fix it by toggling the missing cache
flags, but then ASAN reports memleaks.
To address these issues relies on Phil's list filtering infrastructure
which updates is expanded to accomodate filtering requirements of the
reset commands, such as 'reset table ip' where only the family is sent
to the kernel.
After this update, tests/shell reports a few inconsistencies between
reset and list commands:
- reset rules chain t c2
display sets, but it should only list the given chain.
- reset rules table t
reset rules ip
do not list elements in the set. In both cases, these are fully
listing a given table and family, elements should be included.
The consolidation also ensures list and reset will not differ.
A few more notes:
- CMD_OBJ_TABLE is used for:
rules family table
from the parser, due to the lack of a better enum, same applies to
CMD_OBJ_CHAIN.
- CMD_OBJ_ELEMENTS still does not use the cache, but same occurs in
the CMD_GET command case which needs to be consolidated.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1763
Fixes: 83e0f4402fb7 ("Implement 'reset {set,map,element}' commands")
Fixes: 1694df2de79f ("Implement 'reset rule' and 'reset rules' commands")
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, full ruleset flag is set on to fetch objects.
Follow a similar approach to these patches from Phil:
de961b930660 ("cache: Filter set list on server side") and
cb4b07d0b628 ("cache: Support filtering for a specific flowtable")
in preparation to update the reset command to use the cache
infrastructure.
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
Only user of the datatype flags field is DTYPE_F_ALLOC, replace it by
bitfield, squash byteorder to 8 bits which is sufficient.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
only ipv4 and ipv6 datatype support this, add datatype_prefix_notation()
helper function to report that datatype prefers prefix notation, if
possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Instead of not returning any results for
nft list hooks netdev
Iterate all interfaces and then query all of them.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updates on verdict maps that require many non-base chains are slowed
down due to fetching existing non-base chains into the cache.
Chains are only required for error reporting hints if kernel reports
ENOENT. Populate the cache from this error path only.
Similar approach already exists from rule ENOENT error path since:
deb7c5927fad ("cmd: add misspelling suggestions for rule commands")
however, NFT_CACHE_CHAIN was toggled inconditionally for rule
commands, rendering this on-demand cache population useless.
before this patch, running Neels' nft_slew benchmark (peak values):
created idx 4992 in 52587950 ns (128 in 7122 ms)
...
deleted idx 128 in 43542500 ns (127 in 6187 ms)
after this patch:
created idx 4992 in 11361299 ns (128 in 1612 ms)
...
deleted idx 1664 in 5239633 ns (128 in 733 ms)
Tested-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
since commit b98fee20bfe2 ("mnl: revisit hook listing"), handle.chain is
never set in this path, so 'hook' is always set to -1, so the hook arg
can be dropped.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
| |
Removed two years ago with v6.1, ditch this from hook list code as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Add a string preprocessor to identify and replace variables in a string.
Rework existing support to variables in log prefix strings to use it.
Fixes: e76bb3794018 ("src: allow for variables in the log prefix string")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Bison parser lacked support for passing multiple flags, JSON parser
did not support table flags at all.
Document also 'owner' flag (and describe their relationship in nft.8.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, ICMP{v4,v6,inet} code datatypes only describe those that are
supported by the reject statement, but they can also be used for icmp
code matching. Moreover, ICMP code types go hand-to-hand with ICMP
types, that is, ICMP code symbols depend on the ICMP type.
Thus, the output of:
nft describe icmp_code
look confusing because that only displays the values that are supported
by the reject statement.
Disentangle this by adding internal datatypes for the reject statement
to handle the ICMP code symbol conversion to value as well as ruleset
listing.
The existing icmp_code, icmpv6_code and icmpx_code remain in place. For
backward compatibility, a parser function is defined in case an existing
ruleset relies on these symbols.
As for the manpage, move existing ICMP code tables from the DATA TYPES
section to the REJECT STATEMENT section, where this really belongs to.
But the icmp_code and icmpv6_code table stubs remain in the DATA TYPES
section because that describe that this is an 8-bit integer field.
After this patch:
# nft describe icmp_code
datatype icmp_code (icmp code) (basetype integer), 8 bits
# nft describe icmpv6_code
datatype icmpv6_code (icmpv6 code) (basetype integer), 8 bits
# nft describe icmpx_code
datatype icmpx_code (icmpx code) (basetype integer), 8 bits
do not display the symbol table of the reject statement anymore.
icmpx_code_type is not used anymore, but keep it in place for backward
compatibility reasons.
And update tests/shell accordingly.
Fixes: 5fdd0b6a0600 ("nft: complete reject support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
f8f32deda31d ("meta: Introduce new conditions 'time', 'day' and 'hour'")
reverses the hour range in case that a cross-day range is used, eg.
meta hour "03:00"-"14:00" counter accept
which results in (Sidney, Australia AEDT time):
meta hour != "14:00"-"03:00" counter accept
kernel handles time in UTC, therefore, cross-day range may not be
obvious according to local time.
The ruleset listing above is not very intuitive to the reader depending
on their timezone, therefore, complete netlink delinearize path to
reverse the cross-day meta range.
Update manpage to recommend to use a range expression when matching meta
hour range. Recommend range expression for meta time and meta day too.
Extend testcases/listing/meta_time to cover for this scenario.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1737
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The included sample causes a crash because we attempt to
range-merge a prefix expression with a symbolic expression.
The first set is evaluated, the symbol expression evaluation fails
and nft queues an error message ("Could not resolve hostname").
However, nft continues evaluation.
nft then encounters the same set definition again and merges the
new content with the preceeding one.
But the first set structure is dodgy, it still contains the
unresolved symbolic expression.
That then makes nft crash (assert) in the set internals.
There are various different incarnations of this issue, but the low
level set processing code does not allow for any partially transformed
expressions to still remain.
Before:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
BUG: invalid range expression type binop
nft: src/expression.c:1479: range_expr_value_low: Assertion `0' failed.
After:
nft --check -f tests/shell/testcases/bogons/nft-f/invalid_range_expr_type_binop
invalid_range_expr_type_binop:4:18-25: Error: Could not resolve hostname: Name or service not known
elements = { 1&.141.0.1 - 192.168.0.2}
^^^^^^^^
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
129f9d153279 ("nft: migrate man page examples with `meter` directive to
sets") already replaced meters by dynamic sets.
This patch removes NFT_SET_ANONYMOUS flag from the implicit set that is
instantiated via meter, so the listing shows a dynamic set instead which
is the recommended approach these days.
Therefore, a batch like this:
add table t
add chain t c
add rule t c tcp dport 80 meter m size 128 { ip saddr timeout 1s limit rate 10/second }
gets translated to a dynamic set:
table ip t {
set m {
type ipv4_addr
size 128
flags dynamic,timeout
}
chain c {
tcp dport 80 update @m { ip saddr timeout 1s limit rate 10/second burst 5 packets }
}
}
Check for NFT_SET_ANONYMOUS flag is also relaxed for list and flush
meter commands:
# nft list meter ip t m
table ip t {
set m {
type ipv4_addr
size 128
flags dynamic,timeout
}
}
# nft flush meter ip t m
As a side effect the legacy 'list meter' and 'flush meter' commands allow
to flush a dynamic set to retain backward compatibility.
This patch updates testcases/sets/0022type_selective_flush_0 and
testcases/sets/0038meter_list_0 as well as the json output which now
uses the dynamic set representation.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AFL found following crash:
table ip filter {
map ipsec_in {
typeof ipsec in reqid . iif : verdict
flags interval
}
chain INPUT {
type filter hook input priority filter; policy drop;
ipsec in reqid . 100 @ipsec_in
}
}
Which yields:
nft: evaluate.c:1213: expr_evaluate_unary: Assertion `!expr_is_constant(arg)' failed.
All existing test cases with constant values use big endian values, but
"iif" expects host endian values.
As raw values were not supported before, concat byteorder conversion
doesn't handle constants.
Fix this:
1. Add constant handling so that the number is converted in-place,
without unary expression.
2. Add the inverse handling on delinearization for non-interval set
types.
When dissecting the concat data soup, watch for integer constants where
the datatype indicates host endian integer.
Last, extend an existing test case with the afl input to cover
in/output.
A new test case is added to test linearization, delinearization and
matching.
Based on original patch from Florian Westphal, patch subject and
description wrote by him.
Fixes: b422b07ab2f9 ("src: permit use of constant values in set lookup keys")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement a symbol_table_print() wrapper for the run-time populated
rt_symbol_tables which formats output similar to expr_describe() and
includes the data source.
Since these tables reside in struct output_ctx there is no implicit
connection between data type and therefore providing callbacks for
relevant datat types which feed the data into said wrapper is a simpler
solution than extending expr_describe() itself.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
netlink_linearize.c has never supported more than 16 chained binops.
Adding more is possible but overwrites the stack in
netlink_gen_bitwise().
Add a recursion counter to catch this at eval stage.
Its not enough to just abort once the counter hits
NFT_MAX_EXPR_RECURSION.
This is because there are valid test cases that exceed this.
For example, evaluation of 1 | 2 will merge the constans, so even
if there are a dozen recursive eval calls this will not end up
with large binop chain post-evaluation.
v2: allow more than 16 binops iff the evaluation function
did constant-merging.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
| |
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel will reject this too, but unfortunately nft may try
to cram the data into the underlying libnftnl expr.
This causes heap corruption or
BUG: nld buffer overflow: want to copy 132, max 64
After:
Error: Concatenation of size 544 exceeds maximum size of 512
udp length . @th,0,512 . @th,512,512 { 47-63 . 0xe373135363130 . 0x33131303735353203 }
^^^^^^^^^
resp. same warning for an over-sized raw expression.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a stack overflow somewhere in this code, we end
up memcpy'ing a way too large expr into a fixed-size on-stack
buffer.
This is hard to diagnose, most of this code gets inlined so
the crash happens later on return from alloc_nftnl_setelem.
Condense the mempy into a helper and add a BUG so we can catch
the overflow before it occurs.
->value is too small (4, should be 16), but for normal
cases (well-formed data must fit into max reg space, i.e.
64 byte) the chain buffer that comes after value in the
structure provides a cushion.
In order to have the new BUG() not trigger on valid data,
bump value to the correct size, this is userspace so the additional
60 bytes of stack usage is no concern.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch consolidates ctx->stmt_len reset in stmt_evaluate() to avoid
this problem. Note that stmt_evaluate_meta() and stmt_evaluate_ct()
already reset it after the statement evaluation.
Moreover, statement dependency can be generated while evaluating a meta
and ct statement. Payload statement dependency already manually stashes
this before calling stmt_evaluate(). Add a new stmt_dependency_evaluate()
function to stash statement length context when evaluating a new statement
dependency and use it for all of the existing statement dependencies.
Florian also says:
'meta mark set vlan id map { 1 : 0x00000001, 4095 : 0x00004095 }' will
crash. Reason is that the l2 dependency generated here is errounously
expanded to a 32bit-one, so the evaluation path won't recognize this
as a L2 dependency. Therefore, pctx->stacked_ll_count is 0 and
__expr_evaluate_payload() crashes with a null deref when
dereferencing pctx->stacked_ll[0].
nft-test.py gains a fugly hack to tolerate '!map typeof vlan id : meta mark'.
For more generic support we should find something more acceptable, e.g.
!map typeof( everything here is a key or data ) timeout ...
tests/py update and assert(pctx->stacked_ll_count) by Florian Westphal.
Fixes: edecd58755a8 ("evaluate: support shifts larger than the width of the left operand")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
xmalloc() (and similar x-functions) are used for allocation. They wrap
malloc()/realloc() but will abort the program on ENOMEM.
The meaning of xmalloc() is that it wraps malloc() but aborts on
failure. I don't think x-functions should have the notion, that this
were potentially a different memory allocator that must be paired
with a particular xfree().
Even if the original intent was that the allocator is abstracted (and
possibly not backed by standard malloc()/free()), then that doesn't seem
a good idea. Nowadays libc allocators are pretty good, and we would need
a very special use cases to switch to something else. In other words,
it will never happen that xmalloc() is not backed by malloc().
Also there were a few places, where a xmalloc() was already "wrongly"
paired with free() (for example, iface_cache_release(), exit_cookie(),
nft_run_cmd_from_buffer()).
Or note how pid2name() returns an allocated string from fscanf(), which
needs to be freed with free() (and not xfree()). This requirement
bubbles up the callers portid2name() and name_by_portid(). This case was
actually handled correctly and the buffer was freed with free(). But it
shows that mixing different allocators is cumbersome to get right. Of
course, we don't actually have different allocators and whether to use
free() or xfree() makes no different. The point is that xfree() serves
no actual purpose except raising irrelevant questions about whether
x-functions are correctly paired with xfree().
Note that xfree() also used to accept const pointers. It is bad to
unconditionally for all deallocations. Instead prefer to use plain
free(). To free a const pointer use free_const() which obviously wraps
free, as indicated by the name.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Almost everywhere xmalloc() and friends is used instead of malloc().
This is almost everywhere paired with xfree().
xfree() has two problems. First, it brings the wrong notion that
xmalloc() should be paired with xfree(), as if xmalloc() would not use
the plain malloc() allocator. In practices, xfree() just wraps free(),
and it wouldn't make sense any other way. xfree() should go away. This
will be addressed in the next commit.
The problem addressed by this commit is that xfree() accepts a const
pointer. Paired with the practice of almost always using xfree() instead
of free(), all our calls to xfree() cast away constness of the pointer,
regardless whether that is necessary. Declaring a pointer as const
should help us to catch wrong uses. If the xfree() function always casts
aways const, the compiler doesn't help.
There are many places that rightly cast away const during free. But not
all of them. Add a free_const() macro, which is like free(), but accepts
const pointers. We should always make an intentional choice whether to
use free() or free_const(). Having a free_const() macro makes this very
common choice clearer, instead of adding a (void*) cast at many places.
Note that we now pair xmalloc() allocations with a free() call (instead
of xfree(). That inconsistency will be resolved in the next commit.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mpz_get_str() (with NULL as first argument) will allocate a buffer using
the allocator functions (mp_set_memory_functions()). We should free
those buffers with the corresponding free function.
Add nft_gmp_free() for that and use it.
The name nft_gmp_free() is chosen because "mini-gmp.c" already has an
internal define called gmp_free(). There wouldn't be a direct conflict,
but using the same name is confusing. And maybe our own defines should
have a clear nft prefix.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|