When adding element(s) to a non-empty set, the code merged the two lists
and sorted the result. With many individual 'add element' commands this
causes substantial overhead. Make use of the fact that
existing_set->init is already sorted: sort only the list of new elements
and use list_splice_sorted() to merge the two sorted lists.
Add set_sort_splice() and use it for set element overlap detection and
automerge.
A test case adding ~25k elements in individual commands completes in
about a quarter of the time with this patch applied.
Joint work with Pablo.
Fixes: 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
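
The idea above can be sketched in a few lines: sort only the new elements, then do one linear merge with the already sorted cache. This is a simplified illustration on a singly linked list of integer keys. `struct elem`, `is_sorted()` and `merge_sorted()` are hypothetical names, not the nftables list_splice_sorted() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set element, reduced to an integer key. */
struct elem {
	uint32_t key;
	struct elem *next;
};

static bool is_sorted(const struct elem *e)
{
	for (; e != NULL && e->next != NULL; e = e->next)
		if (e->next->key < e->key)
			return false;
	return true;
}

/*
 * Merge two lists that are each already sorted by key (e.g. the existing
 * cache and the freshly sorted new elements) into one sorted list in
 * O(n + m), relinking the existing nodes instead of re-sorting everything.
 */
static struct elem *merge_sorted(struct elem *a, struct elem *b)
{
	struct elem head = { 0, NULL };	/* dummy head simplifies appending */
	struct elem *tail = &head;

	assert(is_sorted(a) && is_sorted(b));	/* precondition of the merge */

	while (a != NULL && b != NULL) {
		/* take from 'a' on ties so existing elements stay first */
		if (b->key < a->key) {
			tail->next = b;
			b = b->next;
		} else {
			tail->next = a;
			a = a->next;
		}
		tail = tail->next;
	}
	tail->next = (a != NULL) ? a : b;	/* append the leftover run */
	return head.next;
}
```

Only the small list of new elements pays for a sort. Merging it into a cache of n elements then costs O(n) per command instead of a full O(n log n) sort.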
Robots might generate a long list of singleton element commands such as:
    add element t s { 1.0.1.0/24 }
    ...
    add element t s { 1.0.2.0/23 }
Collapse them into a single command before the evaluation step, i.e.
    add element t s { 1.0.1.0/24, ..., 1.0.2.0/23 }
This speeds up overlap detection and set element automerge operations in
this worst-case scenario.
Since 3da9643fb9ff9 ("intervals: add support to automerge with kernel
elements"), the new interval tracking relies on mergesort. The pattern
above triggers sorting the set once per element.
This patch adds a list to cmd objects that stores the collapsed commands.
Moreover, expressions keep a reference to their original command, so the
commands can be uncollapsed after the evaluation step. This ensures that
error reporting works as expected (commands and netlink messages are
mapped 1:1).
For the record:
- nftables versions <= 1.0.2 did not perform any kind of overlap
  check for the scenario described above (because the set cache only
  contained the elements in the kernel in this case). This is a problem
  for kernels < 5.7, which rely on userspace to detect overlaps.
- The overlap detection could be skipped for kernels >= 5.7.
- The extended netlink error reporting for set elements, available
  since 5.19-rc, might allow removing the uncollapse step, since with it
  error reporting does not rely on the netlink sequence number to refer
  to the command triggering the problem.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
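
The collapse step can be sketched with a much simpler command representation than nftables uses. `struct cmd`, `MAX_ELEMS` and `collapse_cmds()` are hypothetical. Each element records the index of the command it came from, standing in for the reference to the original command described above. That index is what lets errors be mapped back to the original commands after evaluation.

```c
#include <assert.h>
#include <string.h>

#define MAX_ELEMS 64

/*
 * Hypothetical, simplified 'add element' command. origin[i] holds the
 * index of the original command that element i came from; the caller
 * initializes it to the command's own index before collapsing.
 */
struct cmd {
	const char *table;
	const char *set;
	int nelems;
	int elems[MAX_ELEMS];
	int origin[MAX_ELEMS];
};

/*
 * Fold each run of consecutive commands that target the same table and
 * set into the first command of the run. Returns the new command count.
 */
static int collapse_cmds(struct cmd *cmds, int n)
{
	int out = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		struct cmd *last = out > 0 ? &cmds[out - 1] : NULL;

		if (last != NULL &&
		    strcmp(last->table, cmds[i].table) == 0 &&
		    strcmp(last->set, cmds[i].set) == 0 &&
		    last->nelems + cmds[i].nelems <= MAX_ELEMS) {
			/* same target: append elements, keep their origin */
			for (int j = 0; j < cmds[i].nelems; j++) {
				last->elems[last->nelems] = cmds[i].elems[j];
				last->origin[last->nelems] = cmds[i].origin[j];
				last->nelems++;
			}
		} else {
			/* different target (or full): keep as its own command */
			if (out != i)
				cmds[out] = cmds[i];
			out++;
		}
	}
	return out;
}
```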
Excess nesting of scanner scopes is very fragile and error prone:
rule `iif != lo ip daddr 127.0.0.1/8 counter limit rate 1/second log flags all prefix "nft_lo4 " drop`
fails with `Error: No symbol type information` hinting at `prefix`
The problem is that we nest via:
counter
limit
log
flags
By the time 'prefix' is scanned, the state is still stuck in 'counter'
due to this nesting. Working around "prefix" isn't enough: any other
keyword, e.g. "level" in 'flags all level debug', will be parsed as a
string too.
So, revert this.
Fixes: a16697097e2b ("scanner: flags: move to own scope")
Reported-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Splice the existing set element cache with the elements to be deleted
and merge sort it. The elements to be deleted are identified by the
EXPR_F_REMOVE flag.
The set elements to be deleted are automerged first if the automerge
flag is set.
There are four possible deletion scenarios:
- Exact match, e.g. delete [a-b] and there is a [a-b] range in the kernel set.
- Adjust left side of range, e.g. delete [a-b] from range [a-x] where x > b.
- Adjust right side of range, e.g. delete [a-b] from range [x-b] where x < a.
- Split range, e.g. delete [a-b] from range [x-y] where x < a and b < y.
Update nft_evaluate() to use the safe list variant, since new commands
are dynamically added to the list to update ranges.
This patch also restores the set element existence check for Linux
kernels <= 5.7.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
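
The four scenarios reduce to keeping whatever remains of the existing range on either side of the deleted one. The sketch below works on closed integer intervals and assumes the deleted range lies fully inside the existing one. `struct range` and `range_delete()` are hypothetical. The real code works on set element expressions flagged with EXPR_F_REMOVE and registers new commands for the adjusted ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Closed interval [lo, hi]. */
struct range {
	uint32_t lo;
	uint32_t hi;
};

/*
 * Remove 'del' from 'r', assuming del lies fully inside r. Writes the
 * 0, 1 or 2 remaining ranges to out[] and returns how many were written:
 *   0 -> exact match
 *   1 -> left or right side adjusted
 *   2 -> range split in two
 */
static int range_delete(struct range r, struct range del, struct range out[2])
{
	int n = 0;

	assert(r.lo <= del.lo && del.lo <= del.hi && del.hi <= r.hi);

	if (r.lo < del.lo)	/* something survives left of del */
		out[n++] = (struct range){ r.lo, del.lo - 1 };
	if (del.hi < r.hi)	/* something survives right of del */
		out[n++] = (struct range){ del.hi + 1, r.hi };
	return n;
}
```

For example, deleting [10-20] from [5-30] hits the split case and leaves [5-9] and [21-30].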
Extend the interval codebase to support merging elements in the
kernel with userspace element updates.
Add a list of elements to be purged to cmd and set objects. These
elements, which represent outdated intervals, are deleted before adding
the updated ranges.
This routine splices the list of userspace and kernel elements, then it
mergesorts them to identify overlapping and contiguous ranges. This
splice operation is undone afterwards so the set userspace cache remains
consistent.
Incrementally update the elements in the cache; this allows removing
dd44081d91ce ("segtree: Fix add and delete of element in same batch").
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pass handle and element list as parameters to allow for code reuse.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Not used by anyone anymore, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a rewrite of the segtree interval codebase.
This patch now splits the original set_to_interval() function into three
routines:
- add set_automerge() to merge overlapping and contiguous ranges.
  The elements, whether expressed as single values, prefixes or ranges,
  are all first normalized to ranges. These ranges are then
  mergesorted. Then, a linear list inspection checks for merge
  candidates. This code only merges elements in the same batch,
  i.e. it does not merge elements in the kernel with those in the
  userspace batch.
- add set_overlap() to check for overlapping set elements. Linux
  kernels >= 5.7 already check for overlaps; older kernels still need
  this code. This code checks for two conflict types:
  1) between elements in this batch.
  2) between elements in this batch and kernelspace.
  The elements in the kernel are temporarily merged into the list of
  elements in the batch to check for these overlaps. The EXPR_F_KERNEL
  flag allows us to restore the set cache after the overlap check has
  been performed.
- set_to_interval() now only transforms set elements expressed as ranges,
  e.g. [a,b], into individual set elements using the EXPR_F_INTERVAL_END
  flag notation, i.e. [a,b+1), where b+1 has the EXPR_F_INTERVAL_END
  flag set.
More relevant updates:
- The overlap and automerge routines are now performed in the evaluation
phase.
- The userspace set object representation now stores a reference to the
existing kernel set object (in case there is already a set with this
same name in the kernel). This is required by the new overlap and
automerge approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
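
The three routines can be illustrated on plain integer ranges. This is a simplified sketch, not the nftables code. `struct range`, `struct interval_elem`, `has_overlap()`, `automerge()` and `to_interval()` are hypothetical, and keys are reduced to `uint32_t`. A plain array sorted with qsort() stands in for the mergesorted expression list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Closed interval [lo, hi]; single values and prefixes normalize to this. */
struct range {
	uint32_t lo, hi;
};

/* Interval-end-flag notation: [lo, hi] becomes key lo plus key hi + 1
 * carrying the end flag, i.e. [lo, hi + 1). */
struct interval_elem {
	uint32_t key;
	bool interval_end;
};

static int range_cmp(const void *pa, const void *pb)
{
	const struct range *a = pa, *b = pb;

	if (a->lo != b->lo)
		return a->lo < b->lo ? -1 : 1;
	if (a->hi != b->hi)
		return a->hi < b->hi ? -1 : 1;
	return 0;
}

/* Overlap check: once sorted by start, any overlap shows up between
 * neighbours. Sorts r in place. */
static bool has_overlap(struct range *r, size_t n)
{
	qsort(r, n, sizeof(r[0]), range_cmp);
	for (size_t i = 1; i < n; i++)
		if (r[i].lo <= r[i - 1].hi)
			return true;
	return false;
}

/* Automerge: sort, then fold overlapping or contiguous ranges in one
 * linear pass. Returns the new number of ranges. */
static size_t automerge(struct range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(r, n, sizeof(r[0]), range_cmp);
	for (size_t i = 1; i < n; i++) {
		if (r[i].lo <= r[out].hi || r[i].lo == r[out].hi + 1) {
			if (r[i].hi > r[out].hi)
				r[out].hi = r[i].hi;	/* extend current range */
		} else {
			r[++out] = r[i];		/* start a new range */
		}
	}
	return out + 1;
}

/* Convert ranges to interval-end-flag elements; out must hold 2 * n
 * entries. Assumption of this sketch: a range ending at UINT32_MAX
 * simply gets no end element. Returns the number of elements written. */
static size_t to_interval(const struct range *r, size_t n,
			  struct interval_elem *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		out[k++] = (struct interval_elem){ r[i].lo, false };
		if (r[i].hi != UINT32_MAX)
			out[k++] = (struct interval_elem){ r[i].hi + 1, true };
	}
	return k;
}
```

For instance, { [20,30], [1,5], [6,10], [25,40], [50,50] } overlaps and automerges to [1,10], [20,40], [50,50]. That in turn becomes the keys 1, 11(end), 20, 41(end), 50, 51(end).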
This allows identifying the set elements that reside in the kernel.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
datatype.h uses bool and so should include <stdbool.h>.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set
length and byteorder. Add missing information to the set userdata area
for raw payload expressions, which allows rebuilding the set typeof from
the listing path.
A few examples:
- With anonymous sets:
    nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e }
- With named sets:
    table x {
        set y {
            typeof ip saddr . @ih,32,32
            elements = { 1.1.1.1 . 0x14 }
        }
    }
Incremental updates are also supported, e.g.
    nft add element x y { 3.3.3.3 . 0x28 }
expr_evaluate_concat() is used to evaluate both set key definitions
and set key values; using two different functions might help to simplify
this code in the future.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With these three scopes in place, keyword 'to' may be isolated.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This allows isolating the 'length' and 'protocol' keywords, which are
shared by other scopes as well.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Modification of raw TCP option rule is a bit more complicated to avoid
pushing tcp_hdr_option_type into the introduced scope by accident.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Unify nat, masquerade and redirect statements; they largely share their
syntax.
Note the workaround of adding "prefix" to SCANSTATE_IP. This is required
to fix 'snat ip prefix ...' style expressions.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Isolate 'performance' and 'memory' keywords.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This isolates at least 'constant', 'dynamic' and 'all' keywords.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Two more keywords isolated.
Signed-off-by: Phil Sutter <phil@nwl.cc>
In theory, one could use a common scope for both import and export
commands, their parameters are identical.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Isolate two more keywords shared with list command.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Some keywords are shared with list command.
Signed-off-by: Phil Sutter <phil@nwl.cc>
As a side effect, this fixes the use of 'classid' as a set data type.
Signed-off-by: Phil Sutter <phil@nwl.cc>
These are the remaining IPv6 extension header expressions, only rt
expression was scoped already.
Signed-off-by: Phil Sutter <phil@nwl.cc>
They share 'sequence' keyword with icmp and tcp expressions.
Signed-off-by: Phil Sutter <phil@nwl.cc>
It shares two keywords with PARSER_SC_IP.
Signed-off-by: Phil Sutter <phil@nwl.cc>
With them in place, heavily shared keywords 'sport' and 'dport' may be
isolated.
Signed-off-by: Phil Sutter <phil@nwl.cc>
All used keywords are shared with others, so no separation for now, apart
from 'csumcov', which was actually missing from scanner.l.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Isolates only 'cpi' keyword for now.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Quite a few keywords are shared with PARSER_SC_TCP.
Signed-off-by: Phil Sutter <phil@nwl.cc>
At least isolates 'mrt' and 'group' keywords, the latter is shared with
log statement.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Unify the two, header fields are almost identical.
Signed-off-by: Phil Sutter <phil@nwl.cc>
This allows replacing a tcp option with nops, similar
to the TCPOPTSTRIP feature of iptables.
Signed-off-by: Florian Westphal <fw@strlen.de>
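
At the byte level, replacing an option with NOPs keeps the option area the same size while making the option vanish for the receiver. Below is a userspace sketch of that effect using the hypothetical helper `tcpopt_nop_out()`. Option kinds 0 (end of option list) and 1 (NOP) are the standard TCP values. Unlike this sketch, code that rewrites live packets must also update the TCP checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL 0	/* end of option list */
#define TCPOPT_NOP 1	/* single-byte no-operation */

/*
 * Overwrite every instance of option 'kind' in a TCP options area with
 * NOP bytes, leaving the size and offsets of all other options intact.
 * Returns the number of options replaced, or -1 on a malformed option.
 */
static int tcpopt_nop_out(uint8_t *opt, size_t len, uint8_t kind)
{
	size_t i = 0;
	int replaced = 0;

	while (i < len) {
		uint8_t k = opt[i];
		size_t olen;

		if (k == TCPOPT_EOL)
			break;
		if (k == TCPOPT_NOP) {
			i++;
			continue;
		}
		/* kind, length, data: length covers the whole option */
		if (i + 1 >= len || opt[i + 1] < 2 || i + opt[i + 1] > len)
			return -1;
		olen = opt[i + 1];
		if (k == kind) {
			for (size_t j = 0; j < olen; j++)
				opt[i + j] = TCPOPT_NOP;
			replaced++;
		}
		i += olen;
	}
	return replaced;
}
```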
The flow statement has no JSON export; it is shown as:
".. }, "flow add @ft" ] } }"
With this patch:
".. }, {"flow": {"op": "add", "flowtable": "@ft"}}]}}"
Signed-off-by: Florian Westphal <fw@strlen.de>
Change the payload-dependency context to store a dependency for every
protocol layer. This allows us to eliminate more redundant protocol
expressions.
Signed-off-by: Florian Westphal <fw@strlen.de>
Currently, with only one base and dependency stored this is superfluous,
but it will become more useful when the next commit adds support for
storing a payload for every base.
Remove redundant `ctx->pbase` check.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
If the user is requesting a chain listing, e.g. nft list chain x y,
and a rule refers to an anonymous chain that cannot be found in the cache,
then fetch that anonymous chain and its ruleset.
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1577
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pass the table and chain strings to mnl_nft_rule_dump() instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a new -o/--optimize option to enable ruleset
optimization.
You can combine this option with the dry run mode (--check) to review
the proposed ruleset updates without actually loading the ruleset, e.g.
# nft -c -o -f ruleset.test
Merging:
ruleset.nft:16:3-37: ip daddr 192.168.0.1 counter accept
ruleset.nft:17:3-37: ip daddr 192.168.0.2 counter accept
ruleset.nft:18:3-37: ip daddr 192.168.0.3 counter accept
into:
ip daddr { 192.168.0.1, 192.168.0.2, 192.168.0.3 } counter packets 0 bytes 0 accept
This infrastructure collects the common statements that are used in
rules, then it builds a matrix of rules vs. statements. Then, it looks
for common statements in consecutive rules, which allows merging rules.
This ruleset optimization always performs an implicit dry run to
validate that the original ruleset is correct. Then, on a second pass,
it performs the ruleset optimization and adds the rules into the kernel
(unless --check has been specified by the user).
From the libnftables perspective, there is a new API to enable
this feature:
    uint32_t nft_ctx_get_optimize(struct nft_ctx *ctx);
    void nft_ctx_set_optimize(struct nft_ctx *ctx, uint32_t flags);
This patch adds support for the first optimization: collapse a linear
list of rules matching on a single selector into a set, as shown in the
example above.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
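
The first optimization merges consecutive rules that differ only in the value of one selector. That simplest case can be sketched without the rules-vs-statements matrix. This is a hypothetical simplification: `struct rule` treats everything after the selector as one opaque string, and `optimize()` only compares neighbouring rules. Merging non-adjacent rules could change which rule matches a packet first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SET 16

/* Hypothetical rule: one selector, its value, and all remaining
 * statements kept as one opaque string. */
struct rule {
	const char *selector;	/* e.g. "ip daddr" */
	const char *value;	/* e.g. "192.168.0.1" */
	const char *stmts;	/* e.g. "counter accept" */
};

/* Result: selector { values... } stmts */
struct merged {
	const char *selector;
	const char *values[MAX_SET];
	int nvalues;
	const char *stmts;
};

/* Two rules can share one set if only the selector value differs. */
static bool mergeable(const struct rule *a, const struct rule *b)
{
	return strcmp(a->selector, b->selector) == 0 &&
	       strcmp(a->stmts, b->stmts) == 0;
}

/* Collapse each run of consecutive mergeable rules into a single rule
 * matching on a set of values. Returns the number of resulting rules. */
static int optimize(const struct rule *r, int n, struct merged *out)
{
	int m = 0;

	for (int i = 0; i < n; i++) {
		/* m > 0 implies i > 0, so r[i - 1] is valid here */
		if (m > 0 && mergeable(&r[i - 1], &r[i]) &&
		    out[m - 1].nvalues < MAX_SET) {
			out[m - 1].values[out[m - 1].nvalues++] = r[i].value;
			continue;
		}
		out[m] = (struct merged){ .selector = r[i].selector,
					  .stmts = r[i].stmts, .nvalues = 1 };
		out[m].values[0] = r[i].value;
		m++;
	}
	return m;
}
```

Fed the three `ip daddr ... counter accept` rules from the example, this yields one rule with three values in its set.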
Reading from stdin requires storing the ruleset in a buffer so that error
reporting works as expected, e.g.
# cat ruleset.nft | nft -f -
/dev/stdin:3:13-13: Error: unknown identifier 'x'
ip saddr $x
^
The error reporting infrastructure performs an fseek() on the input
stream, which does not work in this case since the data has already been
consumed.
This patch adds a new stdin input descriptor to perform this special
handling, which consists of re-routing this request through the buffer
functions.
Fixes: 935f82e7dd49 ("Support 'nft -f -' to read from stdin")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
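
The principle of the fix is to keep the consumed input in memory, so that error reporting can look lines up there instead of seeking in a stream that cannot be rewound. Below is a sketch in standard C stdio. `slurp()` and `buf_line()` are hypothetical helpers, not the nftables input descriptor code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read a whole stream into a NUL-terminated heap buffer, so that later
 * lookups never need to seek in the (already consumed) stream.
 * Returns NULL on allocation or read error; the caller frees the buffer.
 */
static char *slurp(FILE *fp, size_t *lenp)
{
	size_t cap = 4096, len = 0;
	char *buf = malloc(cap);

	if (buf == NULL)
		return NULL;
	for (;;) {
		size_t n;

		if (cap - len < 2) {	/* keep room for data plus the NUL */
			char *tmp = realloc(buf, cap * 2);

			if (tmp == NULL) {
				free(buf);
				return NULL;
			}
			buf = tmp;
			cap *= 2;
		}
		n = fread(buf + len, 1, cap - len - 1, fp);
		len += n;
		if (n == 0) {
			if (ferror(fp)) {
				free(buf);
				return NULL;
			}
			break;		/* end of input */
		}
	}
	buf[len] = '\0';
	*lenp = len;
	return buf;
}

/*
 * Find 1-based line 'lineno' in buf, e.g. to print the offending line of
 * an error message. Stores its length (without the newline) in *lenp.
 * Returns NULL if the buffer has fewer lines.
 */
static const char *buf_line(const char *buf, unsigned int lineno, size_t *lenp)
{
	const char *p = buf;

	assert(lineno >= 1);
	for (unsigned int i = 1; i < lineno; i++) {
		p = strchr(p, '\n');
		if (p == NULL)
			return NULL;
		p++;
	}
	*lenp = strcspn(p, "\n");
	return p;
}
```

A caller would read the input once, e.g. `char *rules = slurp(stdin, &len);`, parse from the buffer, and pass the same buffer to the error printer.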
Add a few helper functions to reuse code in the new rule optimization
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It's always 0, so remove it.
Looks like this was intended to support variable options that have
array-like members, but so far this isn't implemented; better to remove
the dead code and implement it properly when such support is needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Extend nft_cache_filter to hold a flowtable name so 'list flowtable'
command causes fetching the requested flowtable only.
Dump flowtables just once instead of once per table, and merely assign
the fetched data to tables inside the loop.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Fetch either all tables' sets at once, a specific table's sets or even a
specific set if needed instead of iterating over the list of previously
fetched tables and fetching for each, then ignoring anything returned
that doesn't match the filter.
Signed-off-by: Phil Sutter <phil@nwl.cc>
When operating on a specific chain, add a payload to NFT_MSG_GETCHAIN so
the kernel returns only relevant data. Since ENOENT is an expected return
code, do not treat it as an error.
While at it, improve code in chain_cache_cb() a bit:
- Check chain's family first, it is a less expensive check than
comparing table names.
- Do not extract chain name of uninteresting chains.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Instead of fetching all existing rules in the kernel's ruleset and
filtering in user space, add a payload to the dump request specifying the
table and chain to filter for.
Since list_rule_cb() no longer needs the filter, pass only netlink_ctx
to the callback and drop struct rule_cache_dump_ctx.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Instead of requesting a dump of all tables and filtering the data in
user space, construct a non-dump request if the filter contains a table,
so the kernel returns only that single table.
This should improve nft performance in rulesets with many tables
present.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Unify binop handling for ipv6 extension header, ip option and tcp option
processing.
Pass the real offset and length expected, not the ones used in the kernel.
This was already done for extension headers and ip options, but tcp
option parsing did not do this.
This was fine before because no existing tcp option template
had a non-byte-sized member.
With the mptcp addition this isn't the case anymore: the subtype field is
only 4 bits wide, but tcp option delinearization passed 8 bits instead.
Pass the offset and mask delta, just like ip option/ipv6 exthdr.
This makes nft show 'tcp option mptcp subtype 1' instead of
'tcp option mptcp unknown & 240 == 16'.
Signed-off-by: Florian Westphal <fw@strlen.de>
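
The raw listing can be decoded by hand. 240 is 0xf0, which selects the upper 4 bits of the byte holding the subtype, and 16 is 1 << 4. So `& 240 == 16` means subtype 1. The sketch below shows that mask-and-shift arithmetic; `field_get()` and `field_raw()` are hypothetical helpers, not nftables functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a sub-byte field: 'mask' selects the field bits in place and
 * 'shift' is the number of bits below the field. For a field in the
 * upper 4 bits, such as the mptcp subtype, mask is 0xf0 (240), shift 4.
 */
static uint8_t field_get(uint8_t byte, uint8_t mask, unsigned int shift)
{
	assert(shift < 8);
	return (uint8_t)((byte & mask) >> shift);
}

/* Inverse: the raw value that 'byte & mask' is compared against. */
static uint8_t field_raw(uint8_t value, uint8_t mask, unsigned int shift)
{
	assert(shift < 8);
	return (uint8_t)((value << shift) & mask);
}
```

Delinearization has to apply exactly this inverse step, using the offset and mask delta, to turn `unknown & 240 == 16` back into `subtype 1`.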
MPTCP multiplexes the various mptcp signalling data using the
first 4 bits of the mptcp option.
This allows matching on the mptcp subtype via:
tcp option mptcp subtype 1
This still lacks delinearization support: mptcp subtype is the first tcp
option field that is shorter than one byte.
Serialization processing will add a binop for this, but netlink
delinearization can't remove it yet.
It also lacks a new datatype/symbol table that would allow using
mnemonics like 'mp_join' instead of raw numbers.
For this reason, no tests are added yet.
Signed-off-by: Florian Westphal <fw@strlen.de>
Allow to use "fastopen", "md5sig" and "mptcp" mnemonics rather than the
raw option numbers.
These new keywords are only recognized while scanner is in tcp state.
Signed-off-by: Florian Westphal <fw@strlen.de>
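
The mnemonics correspond to IANA-assigned TCP option kinds. These are 19 for the MD5 signature option (RFC 2385), 30 for Multipath TCP and 34 for the TCP Fast Open cookie (RFC 7413). nftables recognizes them as scanner keywords. The lookup table below is only a hypothetical sketch of the name-to-number mapping, with `tcpopt_lookup()` as an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* TCP option kinds as assigned by IANA. */
static const struct {
	const char *name;
	int kind;
} tcpopt_names[] = {
	{ "md5sig",   19 },	/* TCP MD5 Signature, RFC 2385 */
	{ "mptcp",    30 },	/* Multipath TCP, RFC 8684 */
	{ "fastopen", 34 },	/* TCP Fast Open Cookie, RFC 7413 */
};

/* Map a mnemonic to its option number, or return -1 if unknown. */
static int tcpopt_lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(tcpopt_names) / sizeof(tcpopt_names[0]); i++)
		if (strcmp(tcpopt_names[i].name, name) == 0)
			return tcpopt_names[i].kind;
	return -1;
}
```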
This moves tcp options not used anywhere else (e.g. in synproxy) to a
distinct scope. This will also make it possible to avoid exposing new
option keywords in the ruleset context.
Signed-off-by: Florian Westphal <fw@strlen.de>