| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If user forgets to specify the hook and priority and the flowtable does
not exist, then bail out:
# cat flowtable-incomplete.nft
table t {
flowtable f {
devices = { lo }
}
}
# nft -f /tmp/k
flowtable-incomplete.nft:2:12-12: Error: missing hook and priority in flowtable declaration
flowtable f {
^
Update one existing tests/shell to specify a hook and priority.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to add/remove devices to an existing chain:
# cat ruleset.nft
table netdev x {
chain y {
type filter hook ingress devices = { eth0 } priority 0; policy accept;
}
}
# nft -f ruleset.nft
# nft add chain netdev x y '{ devices = { eth1 }; }'
# nft list ruleset
table netdev x {
chain y {
type filter hook ingress devices = { eth0, eth1 } priority 0; policy accept;
}
}
# nft delete chain netdev x y '{ devices = { eth0 }; }'
# nft list ruleset
table netdev x {
chain y {
type filter hook ingress devices = { eth1 } priority 0; policy accept;
}
}
This feature allows for creating an empty netdev chain, with no devices.
In such case, no packets are seen until a device is registered.
This patch includes extended netlink error reporting:
# nft add chain netdev x y '{ devices = { x } ; }'
Error: Could not process rule: No such file or directory
add chain netdev x y { devices = { x } ; }
^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Users have to specify a transport protocol match such as
meta l4proto tcp
before the redirect statement, even if the redirect statement already
implicitly refers to the transport protocol, for instance:
test.nft:3:16-53: Error: transport protocol mapping is only valid after transport protocol match
redirect to :tcp dport map { 83 : 8083, 84 : 8084 }
~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Evaluate the redirect expression before the mandatory check for the
transport protocol match, so protocol context already provides a
transport protocol.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Get length from statement, instead infering it from the expression that
is used to set the value. In the particular case of {ct|meta} mark, this
is 32 bits.
Otherwise, bytecode generation is not correct:
# nft -c --debug=netlink 'add rule ip6 x y ct mark set ip6 dscp << 2 | 0x10'
[ payload load 2b @ network header + 0 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x0000c00f ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000006 ) ]
[ byteorder reg 1 = ntoh(reg 1, 2, 1) ]
[ bitwise reg 1 = ( reg 1 << 0x00000002 ) ]
[ bitwise reg 1 = ( reg 1 & 0x00000fef ) ^ 0x00000010 ] <--- incorrect!
[ ct set mark with reg 1 ]
the previous bitwise shift already upgraded to 32-bits (not visible from
the netlink debug output above).
After this patch, the last | 0x10 uses 32-bits:
[ bitwise reg 1 = ( reg 1 & 0xffffffef ) ^ 0x00000010 ]
note that mask 0xffffffef is used instead of 0x00000fef.
Patch ("evaluate: support shifts larger than the width of the left operand")
provides the statement length through eval context. Use it to evaluate the
bitwise expression accordingly, otherwise bytecode is incorrect:
# nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
ip x y
[ payload load 1b @ network header + 1 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
[ bitwise reg 1 = ( reg 1 & 0x1e000000 ) ^ 0x000000ff ] <-- incorrect byteorder for OR
[ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <-- no needed for single ip dscp byte
[ ct set mark with reg 1 ]
Correct bytecode:
# nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000
ip x y
[ payload load 1b @ network header + 1 => reg 1 ]
[ bitwise reg 1 = ( reg 1 & 0x000000fc ) ^ 0x00000000 ]
[ bitwise reg 1 = ( reg 1 >> 0x00000002 ) ]
[ bitwise reg 1 = ( reg 1 & 0x0000001e ) ^ 0xff000000 ]
[ ct set mark with reg 1 ]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, bogus error is reported:
# nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1 | 0xff000000'
Error: Value 4278190080 exceeds valid range 0-63
add rule ip x y ct mark set ip dscp & 0x0f << 1 | 0xff000000
^^^^^^^^^^
Use the statement length as the maximum value in the mark statement
expression.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
Otherwise expr_evaluate_value() fails with invalid datatype:
# nft --debug=netlink add rule ip x y 'ct mark set ip dscp & 0x0f << 1'
BUG: invalid basetype invalid
nft: evaluate.c:440: expr_evaluate_value: Assertion `0' failed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to be able to set ct and meta marks to values derived from
payload expressions, we need to relax the requirement that the type of
the statement argument must match that of the statement key. Instead,
we require that the base-type of the argument is integer and that the
argument is small enough to fit.
Moreover, swap expression byteorder before to make it compatible with
the statement byteorder, to ensure rulesets are portable.
# nft --debug=netlink add rule ip t c 'meta mark set ip saddr'
ip t c
[ payload load 4b @ network header + 12 => reg 1 ]
[ byteorder reg 1 = ntoh(reg 1, 4, 4) ] <----------- byteorder swap
[ meta set mark with reg 1 ]
Based on original work from Jeremy Sowden.
The following patches are required for this to work:
evaluate: get length from statement instead of lhs expression
evaluate: don't eval unary arguments
evaluate: support shifts larger than the width of the left operand
netlink_delinearize: correct type and byte-order of shifts
evaluate: insert byte-order conversions for expressions between 9 and 15 bits
Add one testcase for tests/py.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a unary expression is inserted to implement a byte-order
conversion, the expression being converted has already been evaluated
and so `expr_evaluate_unary` doesn't need to do so.
This is required by {ct|meta} statements with bitwise operations, which
might result in byteorder conversion of the expression.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we want to left-shift a value of narrower type and assign the result
to a variable of a wider type, we are constrained to only shifting up to
the width of the narrower type. Thus:
add rule t c meta mark set ip dscp << 2
works, but:
add rule t c meta mark set ip dscp << 8
does not, even though the lvalue is large enough to accommodate the
result.
Upgrade the maximum length based on the statement datatype length, which
is provided via context, if it is larger than expression lvalue.
Update netlink_delinearize.c to handle the case where the length of a
shift expression does not match that of its left-hand operand.
Based on patch from Jeremy Sowden.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Round up expression lengths when determining whether to insert a
byte-order conversion. For example, if one is masking a network header
which spans a byte boundary, the mask will span two bytes and so it will
need to be in NBO.
Fixes: bb03cbcd18a1 ("evaluate: no need to swap byte-order for values of fewer than 16 bits.")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts eab3eb7f146c ("evaluate: relax type-checking for
integer arguments in mark statements") since it might cause ruleset
portability issues when moving a ruleset from little to big endian
host (and vice-versa).
Let's revert this until we agree on what to do in this case.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
8c75d3a16960 ("Reject invalid chain priority values in user space")
provides error reporting from the evaluation phase. Instead, this patch
infers the error after the kernel reports EOPNOTSUPP.
test.nft:3:28-40: Error: Chains of type "nat" must have a priority value above -200
type nat hook prerouting priority -300;
^^^^^^^^^^^^^
This patch also adds another common issue for users compiling their own
kernels if they forget to enable CONFIG_NFT_NAT in their .config file.
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
The kernel doesn't accept nat type chains with a priority of -200 or
below. Catch this and provide a better error message than the kernel's
EOPNOTSUPP.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new statement allows you to know how long ago there was a matching
packet.
# nft list ruleset
table ip x {
chain y {
[...]
ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64
}
}
if this statement never sees a packet, then the listing says:
ip protocol icmp last used never counter packets 0 bytes 0
Add tests/py in this patch too.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the data in the mapping contains a range, then upgrade value to range.
Otherwise, the following error is displayed:
/dev/stdin:11:57-75: Error: Could not process rule: Invalid argument
dnat ip to iifname . ip saddr map { enp2s0 . 10.1.1.136 : 1.1.2.69, enp2s0 . 10.1.1.1-10.1.1.135 : 1.1.2.66-1.84.236.78 }
^^^^^^^^^^^^^^^^^^^
The kernel rejects this command because userspace sends a single value
while the kernel expects the range that represents the min and the max
IP address to be used for NAT. The upgrade is also done when concatenation
with intervals is used in the rhs of the mapping.
For anonymous sets, expansion cannot be done from expr_evaluate_mapping()
because the EXPR_F_INTERVAL flag is inferred from the elements. For
explicit sets, this can be done from expr_evaluate_mapping() because the
user already specifies the interval flag in the rhs of the map definition.
Update tests/shell and tests/py to improve testing coverage in this case.
Fixes: 9599d9d25a6b ("src: NAT support for intervals in maps")
Fixes: 66746e7dedeb ("src: support for nat with interval concatenation")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nested syntax notation results in one single table command which
includes all other objects. This differs from the flat notation where
there is usually one command per object.
This patch adds a previous step to the evaluation phase to expand the
objects that are contained in the table into independent commands, so
both notations have similar representations.
Remove the code to evaluate the nested representation in the evaluation
phase since commands are independently evaluated after the expansion.
The commands are expanded after the set element collapse step, in case
that there is a long list of singleton element commands to be added to
the set, to shorten the command list iteration.
This approach also avoids interference with the object cache that is
populated in the evaluation, which might refer to objects coming in the
existing command list that is being processed.
There is still a post_expand phase to detach the elements from the set
which could be consolidated by updating the evaluation step to handle
the CMD_OBJ_SETELEMS command type.
This patch fixes 27c753e4a8d4 ("rule: expand standalone chain that
contains rules") which broke rule addition/insertion by index because
the expansion code after the evaluation messes up the cache.
Fixes: 27c753e4a8d4 ("rule: expand standalone chain that contains rules")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
If the key in the nat mapping is either ip or ip6, then set the nat
family accordingly, no need for explicit family in the nat statement.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Print error message in case family cannot be inferred, before this
patch, $? shows 1 after nft execution but no error message was printed.
While at it, update error reporting for consistency in similar use
cases.
Fixes: e5c9c8fe0bcc ("evaluate: stmt_evaluate_nat_map() only if stmt->nat.ipportmap == true")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to be able to set ct and meta marks to values derived from
payload expressions, we need to relax the requirement that the type of
the statement argument must match that of the statement key. Instead,
we require that the base-type of the argument is integer and that the
argument is small enough to fit.
Add one testcase for tests/py.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"destroy" command performs a deletion as "delete" command but does not fail
if the object does not exist. As there is no NLM_F_* flag for ignoring such
error, it needs to be ignored directly on error handling.
Example of use:
# nft list ruleset
table ip filter {
chain output {
}
}
# nft destroy table ip missingtable
# echo $?
0
# nft list ruleset
table ip filter {
chain output {
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eric reports that nft asserts when using integer basetype constants with
'typeof' sets. Example:
table netdev t {
set s {
typeof ether saddr . vlan id
flags dynamic,timeout
}
chain c { }
}
loads fine. But adding a rule with add/update statement fails:
nft 'add rule netdev t c set update ether saddr . 0 @s'
nft: netlink_linearize.c:867: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.
When the 'ether saddr . 0' concat expression is processed, there is
no set definition available anymore to deduce the required size of the
integer constant.
nft eval step then derives the required length using the data types.
'0' has integer basetype, so the deduced length is 0.
The assertion triggers because serialization step finds that it
needs one more register.
2 are needed to store the ethernet address, another register is
needed for the vlan id.
Update eval step to make the expression context store the set key
information when processing the preceeding set reference, then
let stmt_evaluate_set() preserve the existing context instead of
zeroing it again via stmt_evaluate_arg().
This makes concat expression evaluation compute the total size
needed based on the sets key definition.
Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Reset rule counters and quotas in kernel, i.e. without having to reload
them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET
message type.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GRE has a number of fields that are conditional based on flags,
which requires custom dependency code similar to icmp and icmpv6.
Matching on optional fields is not supported at this stage.
Since this is a layer 3 tunnel protocol, an implicit dependency on
NFT_META_L4PROTO for IPPROTO_GRE is generated. To achieve this, this
patch adds new infrastructure to remove an outer dependency based on
the inner protocol from delinearize path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For easier debugging, add decoration on protocol context:
# nft --debug=proto-ctx add rule netdev x y udp dport 4789 vxlan ip protocol icmp counter
update link layer protocol context (inner):
link layer : netdev <-
network layer : none
transport layer : none
payload data : none
update network layer protocol context (inner):
link layer : netdev
network layer : ip <-
transport layer : none
payload data : none
update network layer protocol context (inner):
link layer : netdev
network layer : ip <-
transport layer : none
payload data : none
update transport layer protocol context (inner):
link layer : netdev
network layer : ip
transport layer : icmp <-
payload data : none
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the initial infrastructure to support for inner header
tunnel matching and its first user: vxlan.
A new struct proto_desc field for payload and meta expression to specify
that the expression refers to inner header matching is used.
The existing codebase to generate bytecode is fully reused, allowing for
reusing existing supported layer 2, 3 and 4 protocols.
Syntax requires to specify vxlan before the inner protocol field:
... vxlan ip protocol udp
... vxlan ip saddr 1.2.3.0/24
This also works with concatenations and anonymous sets, eg.
... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 }
You have to restrict vxlan matching to udp traffic, otherwise it
complains on missing transport protocol dependency, e.g.
... udp dport 4789 vxlan ip daddr 1.2.3.4
The bytecode that is generated uses the new inner expression:
# nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
netdev x y
[ meta load l4proto => reg 1 ]
[ cmp eq reg 1 0x00000011 ]
[ payload load 2b @ transport header + 2 => reg 1 ]
[ cmp eq reg 1 0x0000b512 ]
[ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
[ cmp eq reg 1 0x00000008 ]
[ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
[ cmp eq reg 1 0x04030201 ]
JSON support is not included in this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Add eval_proto_ctx() to access protocol context (struct proto_ctx).
Rename struct proto_ctx field to _pctx to highlight that this field
is internal and the helper function should be used.
This patch comes in preparation for supporting outer and inner
protocol context.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an underflow of the index that iterates over the concatenation:
../include/datatype.h:292:15: runtime error: shift exponent 4294967290 is too large for 32-bit type 'unsigned int'
set the datatype to invalid which is fine to evaluate a concatenation
in a set/map statement.
Update b8e1940aa190 ("tests: add a test case for map update from packet
path with concat") so it does not need a workaround to work.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Map updates can use timeouts, just like with sets, but the
linearization step did not pass this info to the kernel.
meta l4proto tcp update @pinned { ip saddr . ct original proto-src timeout 90s : ip daddr . tcp dport
Listing this won't show the "timeout 90s" because kernel never saw it to
begin with.
Also update evaluation step to reject a timeout that was set on
the data part: Timeouts are only allowed for the key-value pair
as a whole.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set pointer to list of expression to NULL and check that it is set on
before using it.
In function ‘expr_evaluate_concat’,
inlined from ‘expr_evaluate’ at evaluate.c:2488:10:
evaluate.c:1338:20: warning: ‘expressions’ may be used uninitialized [-Wmaybe-uninitialized]
1338 | if (runaway) {
| ^
evaluate.c: In function ‘expr_evaluate’:
evaluate.c:1321:33: note: ‘expressions’ was declared here
1321 | const struct list_head *expressions;
| ^~~~~~~~~~~
Reported-by: Florian Westphal <fw@strlen.de>
Fixes: 508f3a270531 ("netlink: swap byteorder of value component in concatenation of intervals")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Display error message in case user specifies more data components than
those defined by the concatenation of selectors.
# cat example.nft
table ip x {
chain y {
type filter hook prerouting priority 0; policy drop;
ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept
}
}
# nft -f example.nft
example.nft:4:3-22: Error: too many concatenation components
ip saddr . meta mark { 1.2.3.4 . 0x00000100 . 1.2.3.6-1.2.3.8 } accept
~~~~~~~~~~~~~~~~~~~~ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Without this patch, nft crashes:
==464771==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60d000000418 at pc 0x7fbc17513aa5 bp 0x7ffc73d33c90 sp 0x7ffc73d33c88
READ of size 8 at 0x60d000000418 thread T0
#0 0x7fbc17513aa4 in expr_evaluate_concat /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1348
#1 0x7fbc1752a9da in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2476
#2 0x7fbc175175e2 in expr_evaluate_set_elem /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1504
#3 0x7fbc1752aa22 in expr_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:2482
#4 0x7fbc17512cb5 in list_member_evaluate /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1310
#5 0x7fbc17518ca0 in expr_evaluate_set /home/pablo/devel/scm/git-netfilter/nftables/src/evaluate.c:1590
[...]
Fixes: 64bb3f43bb96 ("src: allow to use typeof of raw expressions in set declaration")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Assuming the following interval set with concatenation:
set test {
typeof ip saddr . meta mark
flags interval
}
then, the following rule:
ip saddr . meta mark @test
requires bytecode that swaps the byteorder for the meta mark selector in
case the set contains intervals and concatenations.
inet x y
[ meta load nfproto => reg 1 ]
[ cmp eq reg 1 0x00000002 ]
[ payload load 4b @ network header + 12 => reg 1 ]
[ meta load mark => reg 9 ]
[ byteorder reg 9 = hton(reg 9, 4, 4) ] <----- this is required !
[ lookup reg 1 set test dreg 0 ]
This patch updates byteorder_conversion() to add the unary expression
that introduces the byteorder expression.
Moreover, store the meta mark range component of the element tuple in
the set in big endian as it is required for the range comparisons. Undo
the byteorder swap in the netlink delinearize path to listing the meta
mark values accordingly.
Update tests/py to validate that byteorder expression is emitted in the
bytecode. Update tests/shell to validate insertion and listing of a
named map declaration.
A similar commit 806ab081dc9a ("netlink: swap byteorder for
host-endian concat data") already exists in the tree to handle this for
strings with prefix (e.g. eth*).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following ruleset:
ip version vmap { 4 : jump t3, 6 : jump t4 }
results in a memleak.
expr_evaluate_shift() overrides the datatype which results in a datatype
memleak after the binop transfer that triggers a left-shift of the
constant (in the map).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
Use datatype_equal(), otherwise dynamically allocated datatype fails
to fulfill the datatype pointer check, triggering the assertion:
nft: evaluate.c:1249: expr_evaluate_binop: Assertion `expr_basetype(left) == expr_basetype(right)' failed.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1636
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
'vlan id 1'
must also add a ethernet header dep, else nft fetches the payload from
header offset 0 instead of 14.
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
| |
nft add rule inet filter input vlan id 2
Error: conflicting protocols specified: ether vs. vlan
Refresh the current dependency after superseding the dummy
dependency to make this work.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a couple of spelling mistakes:
'expresion' -> 'expression'
and correct some non-native usages:
'allows to' -> 'allows one to'
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
| |
'rule inet dscpclassify dscp_match meta l4proto { udp } th dport { 3478 } th sport { 3478-3497, 16384-16387 } goto ct_set_ef'
works with 'nft add', but not 'nft insert', the latter yields: "BUG: unhandled op 4".
Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In verdict map, string values are accidentally treated as verdicts.
For example:
table t {
map foo {
type ipv4_addr : verdict
elements = {
192.168.0.1 : bar
}
}
chain output {
type filter hook output priority mangle;
ip daddr vmap @foo
}
}
Though "bar" is not a valid verdict (should be "jump bar" or something),
the string is taken as the element value. Then NFTA_DATA_VALUE is sent
to the kernel instead of NFTA_DATA_VERDICT. This would be rejected by
recent kernels. On older ones (e.g. v5.4.x) that don't validate the
type, a warning can be seen when the rule is hit, because of the
corrupted verdict value:
[5120263.467627] WARNING: CPU: 12 PID: 303303 at net/netfilter/nf_tables_core.c:229 nft_do_chain+0x394/0x500 [nf_tables]
Indeed, we don't parse verdicts during evaluation, but only chain names,
which is of type string rather than verdict. For example, "jump $var" is
a verdict while "$var" is a string.
Fixes: c64457cff967 ("src: Allow goto and jump to a variable")
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"ether saddr 0:1:2:3:4:6 vlan id 2" works, but reverse fails:
"vlan id 2 ether saddr 0:1:2:3:4:6" will give
Error: conflicting protocols specified: vlan vs. ether
After "proto: track full stack of seen l2 protocols, not just cumulative offset",
we have a list of all l2 headers, so search those to see if we had this
proto base in the past before rejecting this.
Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For input, a cumulative size counter of all pushed l2 headers is enough,
because we have the full expression tree available to us.
For delinearization we need to track all seen l2 headers, else we lose
information that we might need at a later time.
Consider:
rule netdev nt nc set update ether saddr . vlan id
during delinearization, the vlan proto_desc replaces the ethernet one,
and by the time we try to split the concatenation apart we will search
the ether saddr offset vs. the templates for proto_vlan.
This replaces the offset with an array that stores the protocol
descriptions seen.
Then, if the payload offset is larger than our description, search the
l2 stack and adjust the offset until we're within the expected offset
boundary.
Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
If set declaration is missing the interval flag, and user specifies an
element with either prefix or range, then bail out.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1592
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding elements to a set or map with an invalid definition causes nft to
segfault. The following nftables.conf triggers the crash:
flush ruleset
create table inet filter
set inet filter foo {}
add element inet filter foo { foobar }
Simply parsing and checking the config will trigger it:
$ nft -c -f nftables.conf.crash
Segmentation fault
The error in the set/map definition is correctly caught and queued, but
because the set is invalid and does not contain a key type, adding to it
causes a NULL pointer dereference of set->key within setelem_evaluate().
I don't think it's necessary to queue another error since the underlying
problem is correctly detected and reported when parsing the definition
of the set. Simply checking the validity of set->key before using it
seems to fix it, causing the error in the definition of the set to be
reported properly. The element type error isn't caught, but that seems
reasonable since the key type is invalid or unknown anyway:
$ ./nft -c -f ~/nftables.conf.crash
/home/pti/nftables.conf.crash:3:21-21: Error: set definition does not specify key
set inet filter foo {}
^
[ Add tests to cover this case --pablo ]
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1597
Signed-off-by: Peter Tirsek <peter@tirsek.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise bogus error reports on set datatype mismatch might occur, such as:
Error: datatype mismatch, expected Internet protocol, expression has type IPv4 address
meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
~~~~~~~~~~~~ ^^^^^^^^^^^^
with an unrelated set declaration.
table ip test {
set set_with_interval {
type ipv4_addr
flags interval
}
chain prerouting {
type nat hook prerouting priority dstnat; policy accept;
meta l4proto { tcp, udp } th dport 443 dnat to 10.0.0.1
}
}
This bug has been introduced in the evaluation step.
Reported-by: Roman Petrov <nwhisper@gmail.com>
Fixes: 81e36530fcac ("src: replace interval segment tree overlap and automerge)"
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
assert(1) is a no-op, this should be assert(0). Use BUG() instead.
Add missing CATCHALL to avoid BUG().
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"typeof ip saddr . ipsec in reqid" won't work because reqid uses
integer type, i.e. dtype->size is 0.
With "typeof", the size can be derived from the expression length,
via set->key.
This computes the concat length based either on dtype->size or
expression length.
It also updates concat evaluation to permit a zero datatype size
if the subkey expression has nonzero length (i.e., typeof was used).
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splice the existing set element cache with the elements to be deleted
and merge sort it. The elements to be deleted are identified by the
EXPR_F_REMOVE flag.
The set elements to be deleted is automerged in first place if the
automerge flag is set on.
There are four possible deletion scenarios:
- Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set.
- Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b.
- Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a.
- Split range, eg. delete [a-b] from range [x-y] where x < a and b < y.
Update nft_evaluate() to use the safe list variant since new commands
are dynamically registered to the list to update ranges.
This patch also restores the set element existence check for Linux
kernels <= 5.7.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Allow for ranges such as, eg. 30-30.
This is required by the new intervals.c code, which normalize constant,
prefix set elements to all ranges.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the interval codebase to support for merging elements in the
kernel with userspace element updates.
Add a list of elements to be purged to cmd and set objects. These
elements representing outdated intervals are deleted before adding the
updated ranges.
This routine splices the list of userspace and kernel elements, then it
mergesorts to identify overlapping and contiguous ranges. This splice
operation is undone so the set userspace cache remains consistent.
Incrementally update the elements in the cache, this allows to remove
dd44081d91ce ("segtree: Fix add and delete of element in same batch").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a rewrite of the segtree interval codebase.
This patch now splits the original set_to_interval() function in three
routines:
- add set_automerge() to merge overlapping and contiguous ranges.
The elements, expressed either as single value, prefix and ranges are
all first normalized to ranges. This elements expressed as ranges are
mergesorted. Then, there is a linear list inspection to check for
merge candidates. This code only merges elements in the same batch,
ie. it does not merge elements in the kernela and the userspace batch.
- add set_overlap() to check for overlapping set elements. Linux
kernel >= 5.7 already checks for overlaps, older kernels still needs
this code. This code checks for two conflict types:
1) between elements in this batch.
2) between elements in this batch and kernelspace.
The elements in the kernel are temporarily merged into the list of
elements in the batch to check for this overlaps. The EXPR_F_KERNEL
flag allows us to restore the set cache after the overlap check has
been performed.
- set_to_interval() now only transforms set elements, expressed as range
e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END
flag notation to represent e.g. [a,b+1), where b+1 has the
EXPR_F_INTERVAL_END flag set on.
More relevant updates:
- The overlap and automerge routines are now performed in the evaluation
phase.
- The userspace set object representation now stores a reference to the
existing kernel set object (in case there is already a set with this
same name in the kernel). This is required by the new overlap and
automerge approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To make something like "eth*" work for interval sets (match
eth0, eth1, and so on...) we must treat the string as a 128 bit
integer.
Without this, segtree will do the wrong thing when applying the prefix,
because we generate the prefix based on 'eth*' as input, with a length of 3.
The correct import needs to be done on "eth\0\0\0\0\0\0\0...", i.e., if
the input buffer were an ipv6 address, it should look like "eth\0::",
not "::eth".
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|