| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
All these are used to reset state in set/map elements, i.e. reset the
timeout or zero quota and counter values.
While 'reset element' expects a (list of) elements to be specified which
should be reset, 'reset set/map' will reset all elements in the given
set/map.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set SO_SNDBUF before SO_SNDBUFFORCE: Unpriviledged user namespace does
not have CAP_NET_ADMIN on the host (user_init_ns) namespace.
SO_SNDBUF always succeeds in Linux, always try SO_SNDBUFFORCE after it.
Moreover, suggest the user to bump socket limits if EMSGSIZE after
having see EPERM previously, when calling SO_SNDBUFFORCE.
Provide a hint to the user too:
# nft -f test.nft
netlink: Error: Could not process rule: Message too long
Please, rise /proc/sys/net/core/wmem_max on the host namespace. Hint: 4194304 bytes
Dave Pfike says:
Prior to this patch, nft inside a systemd-nspawn container was failing
to install my ruleset (which includes a large-ish map), with the error
netlink: Error: Could not process rule: Message too long
strace reveals:
setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
This is despite the nspawn process supposedly having CAP_NET_ADMIN.
A web search reveals at least one other user having the same issue:
https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
Reported-by: Dave Pifke <dave@pifke.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
Reset rule counters and quotas in kernel, i.e. without having to reload
them. Requires respective kernel patch to support NFT_MSG_GETRULE_RESET
message type.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the initial infrastructure to support for inner header
tunnel matching and its first user: vxlan.
A new struct proto_desc field for payload and meta expression to specify
that the expression refers to inner header matching is used.
The existing codebase to generate bytecode is fully reused, allowing for
reusing existing supported layer 2, 3 and 4 protocols.
Syntax requires to specify vxlan before the inner protocol field:
... vxlan ip protocol udp
... vxlan ip saddr 1.2.3.0/24
This also works with concatenations and anonymous sets, eg.
... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 }
You have to restrict vxlan matching to udp traffic, otherwise it
complains on missing transport protocol dependency, e.g.
... udp dport 4789 vxlan ip daddr 1.2.3.4
The bytecode that is generated uses the new inner expression:
# nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
netdev x y
[ meta load l4proto => reg 1 ]
[ cmp eq reg 1 0x00000011 ]
[ payload load 2b @ transport header + 2 => reg 1 ]
[ cmp eq reg 1 0x0000b512 ]
[ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
[ cmp eq reg 1 0x00000008 ]
[ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
[ cmp eq reg 1 0x04030201 ]
JSON support is not included in this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
Add dl_proto_ctx() to access protocol context (struct proto_ctx and
struct payload_dep_ctx) from the delinearize path.
This patch comes in preparation for supporting outer and inner
protocol context.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pablo reports:
add rule netdev nt y update @macset { vlan id timeout 5s }
listing still shows the raw expression:
update @macset { @ll,112,16 & 0xfff timeout 5s }
so also cover the 'set element' case.
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Input:
update ether saddr . vlan id timeout 5s @macset
ether saddr . vlan id @macset
Before this patch, gets rendered as:
update @macset { @ll,48,48 . @ll,112,16 & 0xfff timeout 5s }
@ll,48,48 . @ll,112,16 & 0xfff @macset
After this, listing will show:
update @macset { @ll,48,48 . vlan id timeout 5s }
@ll,48,48 . vlan id @macset
The @ll, ... is due to vlan description replacing the ethernet one,
so payload decode fails to take the concatenation apart (the ethernet
header payload info is matched vs. vlan template).
This will be adjusted by a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
| |
If the user is requesting a chain listing, e.g. nft list chain x y
and a rule refers to an anonymous chain that cannot be found in the cache,
then fetch such anonymous chain and its ruleset.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of requesting a dump of all tables and filtering the data in
user space, construct a non-dump request if filter contains a table so
kernel returns only that single table.
This should improve nft performance in rulesets with many tables
present.
Signed-off-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With an autogenerated ruleset with ~20k chains.
# time nft list ruleset &> /dev/null
real 0m1,712s
user 0m1,258s
sys 0m0,454s
Speed up listing of a specific chain:
# time nft list chain nat MWDG-UGR-234PNG3YBUOTS5QD &> /dev/null
real 0m0,542s
user 0m0,251s
sys 0m0,292s
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not call alloc_setelem_cache() to build the set element list in
nftnl_set. Instead, translate one single set element expression to
nftnl_set_elem object at a time and use this object to build the netlink
header.
Using a huge test set containing 1.1 million element blocklist, this
patch is reducing userspace memory consumption by 40%.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Partially revert 913979f882d1 ("src: add expression handler hashtable")
which is causing a crash with two instances of the nftables handler.
$ sudo python
[sudo] password for echerkashin:
Python 3.9.7 (default, Sep 3 2021, 06:18:44)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nftables import Nftables
>>> n1=Nftables()
>>> n2=Nftables()
>>> <Ctrl-D>
double free or corruption (top)
Aborted
Reported-by: Eugene Crosser <crosser@average.org>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refer to chain, not table.
Error: No such file or directory; did you mean table ‘z’ in family ip?
add chain x y { type filter nat prerouting priority dstnat; }
^
It should say instead:
Error: No such file or directory; did you mean chain ‘z’ in table ip ‘x’?
[ Florian added args check for fmt to the netlink_io_error() prototype. ]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
Add flowtable hashtable cache.
Actually I am not expecting that many flowtables to benefit from the
hashtable to be created by streamline this code with tables, chains,
sets and policy objects.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a hashtable for set lookups.
This patch also splits table->sets in two:
- Sets that reside in the cache are stored in the new
tables->cache_set and tables->cache_set_ht.
- Set that defined via command line / ruleset file reside in
tables->set.
Sets in the cache (already in the kernel) are not placed in the
table->sets list.
By keeping separated lists, sets defined via command line / ruleset file
can be added to cache.
Adding 10000 sets, before:
# time nft -f x
real 0m6,415s
user 0m3,126s
sys 0m3,284s
After:
# time nft -f x
real 0m3,949s
user 0m0,743s
sys 0m3,205s
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Kernel provides information regarding expression since
83d9dcba06c5 ("netfilter: nf_tables: extended netlink error reporting for
expressions").
A common mistake is to refer a chain which does not exist, e.g.
# nft add rule x y jump test
Error: Could not process rule: No such file or directory
add rule x y jump test
^^^^
Use the existing netlink extended error reporting infrastructure to
provide better error reporting as in the example above.
Requires Linux kernel patch 83d9dcba06c5 ("netfilter: nf_tables:
extended netlink error reporting for expressions").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This significantly improves ruleset listing time with large rulesets
(~50k rules) with _lots_ of non-base chains.
# time nft list ruleset &> /dev/null
Before this patch:
real 0m11,172s
user 0m6,810s
sys 0m4,220s
After this patch:
real 0m4,747s
user 0m0,802s
sys 0m3,912s
This patch also removes list_bindings from netlink_ctx since there is no
need to keep a temporary list of chains anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
netlink_parsers is actually small, but update this code to use a
hashtable instead since more expressions may come in the future.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new field to the cmd structure for elements to store a
reference to the set. This saves an extra lookup in the netlink bytecode
generation step.
This patch also allows to incrementally update during the evaluation
phase according to the command actions, which is required by the follow
up ("evaluate: remove table from cache on delete table") bugfix patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to group rules in a subchain, e.g.
table inet x {
chain y {
type filter hook input priority 0;
tcp dport 22 jump {
ip saddr { 127.0.0.0/8, 172.23.0.0/16, 192.168.13.0/24 } accept
ip6 saddr ::1/128 accept;
}
}
}
This also supports for the `goto' chain verdict.
This patch adds a new chain binding list to avoid a chain list lookup from the
delinearize path for the usual chains. This can be simplified later on with a
single hashtable per table for all chains.
From the shell, you have to use the explicit separator ';', in bash you
have to escape this:
# nft add rule inet x y tcp dport 80 jump { ip saddr 127.0.0.1 accept\; ip6 saddr ::1 accept \; }
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to restore counters in dynamic sets:
table ip test {
set test {
type ipv4_addr
size 65535
flags dynamic,timeout
timeout 30d
gc-interval 1d
elements = { 192.168.10.13 expires 19d23h52m27s576ms counter packets 51 bytes 17265 }
}
chain output {
type filter hook output priority 0;
update @test { ip saddr }
}
}
You can also add counters to elements from the control place, ie.
table ip test {
set test {
type ipv4_addr
size 65535
elements = { 192.168.2.1 counter packets 75 bytes 19043 }
}
chain output {
type filter hook output priority filter; policy accept;
ip daddr @test
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch correlates the in-kernel extended netlink error offset and
the location information.
Assuming 'foo' table does not exist, then error reporting shows:
# nft delete table foo
Error: Could not process rule: No such file or directory
delete table foo
^^^
Similarly, if table uniquely identified by handle '1234' does not exist,
then error reporting shows:
# nft delete table handle 1234
Error: Could not process rule: No such file or directory
delete table handle 1234
^^^^
Assuming 'bar' chain does not exists in the kernel, while 'foo' does:
# nft delete chain foo bar
Error: Could not process rule: No such file or directory
delete chain foo bar
^^^
This also gives us a hint when adding rules:
# nft add rule ip foo bar counter
Error: Could not process rule: No such file or directory
add rule ip foo bar counter
^^^
This is based on ("src: basic support for extended netlink errors") from
Florian Westphal, posted in 2018, with no netlink offset correlation
support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This will be needed once we add support for the 'typeof' keyword to
handle maps that could e.g. store 'ct helper' "type" values.
Instead of:
set foo {
type ipv4_addr . mark;
this would allow
set foo {
typeof(ip saddr) . typeof(ct mark);
(exact syntax TBD).
This would be needed to allow sets that store variable-sized data types
(string, integer and the like) that can't be used at at the moment.
Adding special data types for everything is problematic due to the
large amount of different types needed.
For anonymous sets, e.g. "string" can be used because the needed size can
be inferred from the statement, e.g. 'osf name { "Windows", "Linux }',
but in case of named sets that won't work because 'type string' lacks the
context needed to derive the size information.
With 'typeof(osf name)' the context is there, but at the moment it won't
help because the expression is discarded instantly and only the data
type is retained.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check from the delinearize set element path if the nul-root element
already exists in the interval set. Hence, the element insertion path
skips the implicit nul-root interval insertion.
Under some circunstances, nft bogusly fails to delete the last element
of the interval set and to create an element in an existing empty
internal set. This patch includes a test that reproduces the issue.
Fixes: 4935a0d561b5 ("segtree: special handling for the first non-matching segment")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
If --echo is passed, then the cache already contains the commands that
have been sent to the kernel. However, anonymous sets are an exception
since the cache needs to be updated in this case.
Remove the old cache logic from the monitor code that has been replaced
by 01e5c6f0ed03 ("src: add cache level flags").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
|
|
|
|
|
|
| |
Remove this wrapper, call netlink_list_rules() instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
netlink_release_registers() needs to go a bit further to release the
expressions in the register array. This should be safe since
netlink_get_register() clones expressions in the context registers.
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update parser to display this error message:
# nft export json
Error: JSON export is no longer supported, use 'nft -j list ruleset' instead
export json
^^^^^^^^^^^^
Just like:
# nft export vm json
Error: JSON export is no longer supported, use 'nft -j list ruleset' instead
export vm json
^^^^^^^^^^^^^^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The basic principle is to not return a JSON object freshly created from
netlink responses, but just update the existing user-provided one to
make sure callers get back exactly what they expect.
To achieve that, keep the parsed JSON object around in a global variable
('cur_root') and provide a custom callback to insert handles into it
from received netlink messages. The tricky bit here is updating rules
since unique identification is problematic. Therefore drop possibly
present handles from input and later assume updates are received in
order so the first rule not having a handle set is the right one.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Replace it by direct call to mnl_batch_talk().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
We can remove alloc_nftnl_flowtable() and consolidate infrastructure in
the src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
We can remove alloc_nftnl_obj() and consolidate infrastructure in the
src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
These functions are part of the mnl backend, move them there. Remove
netlink_close_sock(), use direct call to mnl_socket_close().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Otherwise we keep using the old netlink socket if we hit EINTR.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
| |
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
| |
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
Just a simple wrapper function, replace it by direct call to
mnl_nft_rule_del().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
We can remove alloc_nftnl_set() and consolidate infrastructure in the
src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
We can remove alloc_nftnl_rule() and consolidate infrastructure in the
src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
The netlink layer sits in between the mnl and the rule layers, remove
it. We can remove alloc_nftnl_chain() and consolidate infrastructure in
the src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
The netlink layer sits in between the mnl and the rule layers, remove
it. We can remove alloc_nftnl_table() and consolidate infrastructure in
the src/mnl.c file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
| |
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
This is called from cache population path, remove netlink_io_error()
call since this is not needed. Rename it for consistency with similar
netlink_list_*() NLM_F_DUMP functions. Get rid of location parameter.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
| |
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Not needed anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Simplify function footprint.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Follow up after cc8c5fd02448 ("netlink: remove non-batching routine").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
netlink.c is rather large file, move the monitor code to its own file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
You need a Linux kernel >= 4.15 to use this feature.
This patch allows us to dump the content of an existing set.
# nft list ruleset
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 3.3.3.3,
5.5.5.5-6.6.6.6 }
}
}
You check if a single element exists in the set:
# nft get element x x { 1.1.1.5 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
Output means '1.1.1.5' belongs to the '1.1.1.1-2.2.2.2' interval.
You can also check for intervals:
# nft get element x x { 1.1.1.1-2.2.2.2 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
If you try to check for an element that doesn't exist, an error is
displayed.
# nft get element x x { 1.1.1.0 }
Error: Could not receive set elements: No such file or directory
get element x x { 1.1.1.0 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can also check for multiple elements in one go:
# nft get element x x { 1.1.1.5, 5.5.5.10 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 5.5.5.5-6.6.6.6 }
}
}
You can also use this to fetch the existing timeout for specific
elements, in case you have a set with timeouts in place:
# nft get element w z { 2.2.2.2 }
table ip w {
set z {
type ipv4_addr
timeout 30s
elements = { 2.2.2.2 expires 17s }
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
This patch allows you to delete an existing flowtable:
# nft delete flowtable x m
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|