| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a minimum base that all our sources will end up needing. This
is what <nft.h> provides.
Add <stdbool.h> and <stdint.h> there. It's unlikely that we want to
implement anything, without having "bool" and "uint32_t" types
available.
Yes, this means the internal headers are not self-contained, with
respect to what <nft.h> provides. This is the exception to the rule, and
our internal headers should rely to have <nft.h> included for them.
They should not include <nft.h> themselves, because <nft.h> needs always
be included as first. So when an internal header would include <nft.h>
it would be unnecessary, because the header is *always* included
already.
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the initial infrastructure to support for inner header
tunnel matching and its first user: vxlan.
A new struct proto_desc field for payload and meta expression to specify
that the expression refers to inner header matching is used.
The existing codebase to generate bytecode is fully reused, allowing for
reusing existing supported layer 2, 3 and 4 protocols.
Syntax requires to specify vxlan before the inner protocol field:
... vxlan ip protocol udp
... vxlan ip saddr 1.2.3.0/24
This also works with concatenations and anonymous sets, eg.
... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 }
You have to restrict vxlan matching to udp traffic, otherwise it
complains on missing transport protocol dependency, e.g.
... udp dport 4789 vxlan ip daddr 1.2.3.4
The bytecode that is generated uses the new inner expression:
# nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
netdev x y
[ meta load l4proto => reg 1 ]
[ cmp eq reg 1 0x00000011 ]
[ payload load 2b @ transport header + 2 => reg 1 ]
[ cmp eq reg 1 0x0000b512 ]
[ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
[ cmp eq reg 1 0x00000008 ]
[ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
[ cmp eq reg 1 0x04030201 ]
JSON support is not included in this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
WHen flagcmp and catchall expressions got added the EXPR_MAX definition
wasn't changed.
Should have no impact in practice however, this value is only checked to
prevent crash when old nft release is used to list a ruleset generated
by a newer nft release and a unknown 'typeof' expression.
v2: Pablo suggested to add EXPR_MAX into enum so its easier to spot.
Adding __EXPR_MAX + define EXPR_MAX (__EXPR_MAX - 1) causes '__EXPR_MAX
not handled in switch' warnings, hence the 'EXPR_MAX =' solution.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding element(s) to a non-empty set, code merged the two lists and
sorted the result. With many individual 'add element' commands this
causes substantial overhead. Make use of the fact that
existing_set->init is sorted already, sort only the list of new elements
and use list_splice_sorted() to merge the two sorted lists.
Add set_sort_splice() and use it for set element overlap detection and
automerge.
A test case adding ~25k elements in individual commands completes in
about 1/4th of the time with this patch applied.
Joint work with Pablo.
Fixes: 3da9643fb9ff9 ("intervals: add support to automerge with kernel elements")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Robots might generate a long list of singleton element commands such as:
add element t s { 1.0.1.0/24 }
...
add element t s { 1.0.2.0/23 }
collapse them into one single command before the evaluation step, ie.
add element t s { 1.0.1.0/24, ..., 1.0.2.0/23 }
this speeds up overlap detection and set element automerge operations in
this worst case scenario.
Since 3da9643fb9ff9 ("intervals: add support to automerge with kernel
elements"), the new interval tracking relies on mergesort. The pattern
above triggers the set sorting for each element.
This patch adds a list to cmd objects that store collapsed commands.
Moreover, expressions also contain a reference to the original command,
to uncollapse the commands after the evaluation step.
These commands are uncollapsed after the evaluation step to ensure error
reporting works as expected (command and netlink message are mapped
1:1).
For the record:
- nftables versions <= 1.0.2 did not perform any kind of overlap
check for the described scenario above (because set cache only contained
elements in the kernel in this case). This is a problem for kernels < 5.7
which rely on userspace to detect overlaps.
- the overlap detection could be skipped for kernels >= 5.7.
- The extended netlink error reporting available for set elements
since 5.19-rc might allow to remove the uncollapse step, in this case,
error reporting does not rely on the netlink sequence to refer to the
command triggering the problem.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splice the existing set element cache with the elements to be deleted
and merge sort it. The elements to be deleted are identified by the
EXPR_F_REMOVE flag.
The set elements to be deleted is automerged in first place if the
automerge flag is set on.
There are four possible deletion scenarios:
- Exact match, eg. delete [a-b] and there is a [a-b] range in the kernel set.
- Adjust left side of range, eg. delete [a-b] from range [a-x] where x > b.
- Adjust right side of range, eg. delete [a-b] from range [x-b] where x < a.
- Split range, eg. delete [a-b] from range [x-y] where x < a and b < y.
Update nft_evaluate() to use the safe list variant since new commands
are dynamically registered to the list to update ranges.
This patch also restores the set element existence check for Linux
kernels <= 5.7.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a rewrite of the segtree interval codebase.
This patch now splits the original set_to_interval() function in three
routines:
- add set_automerge() to merge overlapping and contiguous ranges.
The elements, expressed either as single value, prefix and ranges are
all first normalized to ranges. This elements expressed as ranges are
mergesorted. Then, there is a linear list inspection to check for
merge candidates. This code only merges elements in the same batch,
ie. it does not merge elements in the kernela and the userspace batch.
- add set_overlap() to check for overlapping set elements. Linux
kernel >= 5.7 already checks for overlaps, older kernels still needs
this code. This code checks for two conflict types:
1) between elements in this batch.
2) between elements in this batch and kernelspace.
The elements in the kernel are temporarily merged into the list of
elements in the batch to check for this overlaps. The EXPR_F_KERNEL
flag allows us to restore the set cache after the overlap check has
been performed.
- set_to_interval() now only transforms set elements, expressed as range
e.g. [a,b], to individual set elements using the EXPR_F_INTERVAL_END
flag notation to represent e.g. [a,b+1), where b+1 has the
EXPR_F_INTERVAL_END flag set on.
More relevant updates:
- The overlap and automerge routines are now performed in the evaluation
phase.
- The userspace set object representation now stores a reference to the
existing kernel set object (in case there is already a set with this
same name in the kernel). This is required by the new overlap and
automerge approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
This allows to identify the set elements that reside in the kernel.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the dynamic datatype to allocate an instance of TYPE_INTEGER and set
length and byteorder. Add missing information to the set userdata area
for raw payload expressions which allows to rebuild the set typeof from
the listing path.
A few examples:
- With anonymous sets:
nft add rule x y ip saddr . @ih,32,32 { 1.1.1.1 . 0x14, 2.2.2.2 . 0x1e }
- With named sets:
table x {
set y {
typeof ip saddr . @ih,32,32
elements = { 1.1.1.1 . 0x14 }
}
}
Incremental updates are also supported, eg.
nft add element x y { 3.3.3.3 . 0x28 }
expr_evaluate_concat() is used to evaluate both set key definitions
and set key values, using two different function might help to simplify
this code in the future.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the following shortcut syntax:
expression flags / flags
instead of:
expression and flags == flags
For example:
tcp flags syn,ack / syn,ack,fin,rst
^^^^^^^ ^^^^^^^^^^^^^^^
value mask
instead of:
tcp flags and (syn|ack|fin|rst) == syn|ack
The second list of comma-separated flags represents the mask which are
examined and the first list of comma-separated flags must be set.
You can also use the != operator with this syntax:
tcp flags != fin,rst / syn,ack,fin,rst
This shortcut is based on the prefix notation, but it is also similar to
the iptables tcp matching syntax.
This patch introduces the flagcmp expression to print the tcp flags in
this new notation. The delinearize path transforms the binary expression
to this new flagcmp expression whenever possible.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a catchall expression (EXPR_SET_ELEM_CATCHALL).
Use the asterisk (*) to represent the catch-all set element, e.g.
table x {
set y {
type ipv4_addr
counter
elements = { 1.2.3.4 counter packets 0 bytes 0, * counter packets 0 bytes 0 }
}
}
Special handling for segtree: zap the catch-all element from the set
element list and re-add it after processing.
Remove wildcard_expr deadcode in src/parser_bison.y
This patch also adds several tests for the tests/py and tests/shell
infrastructures.
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Add support for matching on the cgroups version 2.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch provides a shortcut for:
ct status and dnat == 0
which allows to check for the packet whose dnat bit is unset:
# nft add rule x y ct status ! dnat counter
This operation is only available for expression with a bitmask basetype, eg.
# nft describe ct status
ct expression, datatype ct_status (conntrack status) (basetype bitmask, integer), 32 bits
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
Extend the set element infrastructure to support for several statements.
This patch places the statements right after the key when printing it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nft currently doesn't allow to check for presence of arbitrary tcp options.
Only known options where nft provides a template can be tested for.
This allows to test for presence of raw protocol values as well.
Example:
tcp option 42 exists
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the protocol context infrastructure to track multiple
transport protocols when they are specified from sets.
This removes errors like:
"transport protocol mapping is only valid after transport protocol match"
when invoking:
# nft add rule x z meta l4proto { tcp, udp } dnat to 1.1.1.1:80
This patch also catches conflicts like:
# nft add rule x z ip protocol { tcp, udp } tcp dport 20 dnat to 1.1.1.1:80
Error: conflicting protocols specified: udp vs. tcp
add rule x z ip protocol { tcp, udp } tcp dport 20 dnat to 1.1.1.1:80
^^^^^^^^^
and:
# nft add rule x z meta l4proto { tcp, udp } tcp dport 20 dnat to 1.1.1.1:80
Error: conflicting protocols specified: udp vs. tcp
add rule x z meta l4proto { tcp, udp } tcp dport 20 dnat to 1.1.1.1:80
^^^^^^^^^
Note that:
- the singleton protocol context tracker is left in place until the
existing users are updated to use this new multiprotocol tracker.
Moving forward, it would be good to consolidate things around this new
multiprotocol context tracker infrastructure.
- link and network layers are not updated to use this infrastructure
yet. The code that deals with vlan conflicts relies on forcing
protocol context updates to the singleton protocol base.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new field to the cmd structure for elements to store a
reference to the set. This saves an extra lookup in the netlink bytecode
generation step.
This patch also allows to incrementally update during the evaluation
phase according to the command actions, which is required by the follow
up ("evaluate: remove table from cache on delete table") bugfix patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to group rules in a subchain, e.g.
table inet x {
chain y {
type filter hook input priority 0;
tcp dport 22 jump {
ip saddr { 127.0.0.0/8, 172.23.0.0/16, 192.168.13.0/24 } accept
ip6 saddr ::1/128 accept;
}
}
}
This also supports for the `goto' chain verdict.
This patch adds a new chain binding list to avoid a chain list lookup from the
delinearize path for the usual chains. This can be simplified later on with a
single hashtable per table for all chains.
From the shell, you have to use the explicit separator ';', in bash you
have to escape this:
# nft add rule inet x y tcp dport 80 jump { ip saddr 127.0.0.1 accept\; ip6 saddr ::1 accept \; }
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Intsead of using an array of char.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
This patch transform a range of IP addresses to prefix when listing the
ruleset.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to specify an interval of IP address in maps.
table ip x {
chain y {
type nat hook postrouting priority srcnat; policy accept;
snat ip interval to ip saddr map { 10.141.11.4 : 192.168.2.2-192.168.2.4 }
}
}
The example above performs SNAT to packets that comes from 10.141.11.4
to an interval of IP addresses from 192.168.2.2 to 192.168.2.4 (both
included).
You can also combine this with dynamic maps:
table ip x {
map y {
type ipv4_addr : interval ipv4_addr
flags interval
elements = { 10.141.10.0/24 : 192.168.2.2-192.168.2.4 }
}
chain y {
type nat hook postrouting priority srcnat; policy accept;
snat ip interval to ip saddr map @y
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Payload munging means that evaluation of payload expressions may not be
idempotent. Add a flag to prevent them from being evaluated more than
once.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
Useless duplication. Also, this avoids bloating expr_ops_by_type()
when it needs to cope with more expressions.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After exporting field lengths via NFTNL_SET_DESC_CONCAT attributes,
we now need to adjust parsing of user input and generation of
netlink key data to complete support for concatenation of set
ranges.
Instead of using separate elements for start and end of a range,
denoting the end element by the NFT_SET_ELEM_INTERVAL_END flag,
as it's currently done for ranges without concatenation, we'll use
the new attribute NFTNL_SET_ELEM_KEY_END as suggested by Pablo. It
behaves in the same way as NFTNL_SET_ELEM_KEY, but it indicates
that the included key represents the upper bound of a range.
For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", needs to be
expressed as a single element with two keys:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 22
NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25
To achieve this, we need to:
- adjust the lexer rules to allow multiton expressions as elements
of a concatenation. As wildcards are not allowed (semantics would
be ambiguous), exclude wildcards expressions from the set of
possible multiton expressions, and allow them directly where
needed. Concatenations now admit prefixes and ranges
- generate, for each element in a range concatenation, a second key
attribute, that includes the upper bound for the range
- also expand prefixes and non-ranged values in the concatenation
to ranges: given a set with interval and concatenation support,
the kernel has no way to tell which elements are ranged, so they
all need to be. For example, 192.0.2.0 . 192.0.2.9 : 1024 is
sent as:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 1024
NFTA_SET_ELEM_KEY_END: 192.0.2.9 . 1024
- aggregate ranges when elements received by the kernel represent
concatenated ranges, see concat_range_aggregate()
- perform a few minor adjustments where interval expressions
are already handled: we have intervals in these sets, but
the set specification isn't just an interval, so we can't
just aggregate and deaggregate interval ranges linearly
v4: No changes
v3:
- rework to use a separate key for closing element of range instead of
a separate element with EXPR_F_INTERVAL_END set (Pablo Neira Ayuso)
v2:
- reworked netlink_gen_concat_data(), moved loop body to a new function,
netlink_gen_concat_data_expr() (Phil Sutter)
- dropped repeated pattern in bison file, replaced by a new helper,
compound_expr_alloc_or_add() (Phil Sutter)
- added set_is_nonconcat_range() helper (Phil Sutter)
- in expr_evaluate_set(), we need to set NFT_SET_SUBKEY also on empty
sets where the set in the context already has the flag
- dropped additional 'end' parameter from netlink_gen_data(),
temporarily set EXPR_F_INTERVAL_END on expressions and use that from
netlink_gen_concat_data() to figure out we need to add the 'end'
element (Phil Sutter)
- replace range_mask_len() by a simplified version, as we don't need
to actually store the composing masks of a range (Phil Sutter)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support arbitrary range concatenations, the kernel needs to know
how long each field in the concatenation is. The new libnftnl
NFTNL_SET_DESC_CONCAT set attribute describes this as an array of
lengths, in bytes, of concatenated fields.
While evaluating concatenated expressions, export the datatype size
into the new field_len array, and hand the data over via libnftnl.
Similarly, when data is passed back from libnftnl, parse it into
the set description.
When set data is cloned, we now need to copy the additional fields
in set_clone(), too.
This change depends on the libnftnl patch with title:
set: Add support for NFTA_SET_DESC_CONCAT attributes
v4: No changes
v3: Rework to use set description data instead of a stand-alone
attribute
v2: No changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two new expression operations to build and to parse the
userdata area that describe the set key and data typeof definitions.
For maps, the grammar enforces either
"type data_type : data_type" or or "typeof expression : expression".
Check both key and data for valid user typeof info first.
If they check out, flag set->key_typeof_valid as true and use it for
printing the key info.
This patch comes with initial support for using payload expressions
with the 'typeof' keyword, followup patches will add support for other
expressions as well.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
Fetch expression operation from the expression type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
ops has been removed, and etype has been added
Signed-off-by: Brett Mastbergen <bmastbergen@untangle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two datatypes are using runtime datatype allocation:
* Concatenations.
* Integer, that require byteorder adjustment.
From the evaluation / postprocess step, transformations are common,
hence expressions may end up fetching (infering) datatypes from an
existing one.
This patch adds a reference counter to release the dynamic datatype
object when it is shared.
The API includes the following helper functions:
* datatype_set(expr, datatype), to assign a datatype to an expression.
This helper already deals with reference counting for dynamic
datatypes. This also drops the reference counter of any previous
datatype (to deal with the datatype replacement case).
* datatype_get(datatype) bumps the reference counter. This function also
deals with nul-pointers, that occurs when the datatype is unset.
* datatype_free() drops the reference counter, and it also releases the
datatype if there are not more clients of it.
Rule of thumb is: The reference counter of any newly allocated datatype
is set to zero.
This patch also updates every spot to use datatype_set() for non-dynamic
datatypes, for consistency. In this case, the helper just makes an
simple assignment.
Note that expr_alloc() has been updated to call datatype_get() on the
datatype that is assigned to this new expression. Moreover, expr_free()
calls datatype_free().
This fixes valgrind reports like this one:
==28352== 1,350 (440 direct, 910 indirect) bytes in 5 blocks are definitely lost in loss recor 3 of 3
==28352== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==28352== by 0x4E79558: xmalloc (utils.c:36)
==28352== by 0x4E7963D: xzalloc (utils.c:65)
==28352== by 0x4E6029B: dtype_alloc (datatype.c:1073)
==28352== by 0x4E6029B: concat_type_alloc (datatype.c:1127)
==28352== by 0x4E6D3B3: netlink_delinearize_set (netlink.c:578)
==28352== by 0x4E6D68E: list_set_cb (netlink.c:648)
==28352== by 0x5D74023: nftnl_set_list_foreach (set.c:780)
==28352== by 0x4E6D6F3: netlink_list_sets (netlink.c:669)
==28352== by 0x4E5A7A3: cache_init_objects (rule.c:159)
==28352== by 0x4E5A7A3: cache_init (rule.c:216)
==28352== by 0x4E5A7A3: cache_update (rule.c:266)
==28352== by 0x4E7E0EE: nft_evaluate (libnftables.c:388)
==28352== by 0x4E7EADD: nft_run_cmd_from_filename (libnftables.c:479)
==28352== by 0x109A53: main (main.c:310)
This patch also removes the DTYPE_F_CLONE flag which is broken and not
needed anymore since proper reference counting is in place.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Introduce expressions as a chain in jump and goto statements.
This is going to be used to support variables as a chain in the
following patches.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for version fingerprint in "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority filter; policy accept;
osf ttl skip name "Linux"
osf ttl skip version "Linux:4.20"
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
Breaks custom-defined configuration in library mode, ie. user may want
to store output in a file, instead of stderr.
Fixes: 35f6cd327c2e ("src: Pass stateless, numeric, ip2name and handle variables as structure members.")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Fixes: e3f195777ee54 ("src: expr: remove expr_ops from struct expr")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
size of struct expr changes from 144 to 128 bytes on x86_64.
This doesn't look like much, but large rulesets can have tens of thousands
of expressions (each set element is represented by an expression).
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Temporary kludge to remove all the expr->ops->type == ... patterns.
Followup patch will remove expr->ops, and make expr_ops() lookup
the correct expr_ops struct instead to reduce struct expr size.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Currently callers use expr->ops->name, but follouwp patch will remove the
ops pointer from struct expr. So add this helper and use it everywhere.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for ttl option in "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority filter; policy accept;
osf ttl skip name "Linux"
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
table ip x {
set y {
type inet_service
flags interval
elements = { 10, 20-30, 40, 50-60 }
}
}
# nft get element x y { 20-40 }
table ip x {
set y {
type inet_service
flags interval
elements = { 20-40 }
}
}
20 and 40 exist in the tree, but they are part of different ranges.
This patch adds a new get_set_decompose() function to validate that the
left and the right side of the range.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows matching on ipsec tunnel/beet addresses in xfrm state
associated with a packet, ipsec request id and the SPI.
Examples:
ipsec in ip saddr 192.168.1.0/24
ipsec out ip6 daddr @endpoints
ipsec in spi 1-65536
Joint work with Florian Westphal.
Cc: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority 0; policy accept;
osf name "Linux" counter packets 3 bytes 132
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For now it can only match sockets with IP(V6)_TRANSPARENT socket option
set. Example:
table inet sockin {
chain sockchain {
type filter hook prerouting priority -150; policy accept;
socket transparent 1 mark set 0x00000001 nftrace set 1 counter packets 9 bytes 504 accept
}
}
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although technically there already is support for JSON output via 'nft
export json' command, it is hardly useable since it exports all the gory
details of nftables VM. Also, libnftables has no control over what is
exported since the content comes directly from libnftnl.
Instead, implement JSON format support for regular 'nft list' commands.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes static flag and adds header prototype for the following
functions:
* must_print_eq_op() from src/expression.c
* fib_result_str() from src/fib.c
* set_policy2str() and chain_policy2str from src/rule.c
In fib.h, include linux/netfilter/nf_tables.h to make sure enum
nft_fib_result is known when including this file.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a bit of code reorganization, relational meta OPs OP_RANGE,
OP_FLAGCMP and OP_LOOKUP become unused and can be removed. The only meta
OP left is OP_IMPLICIT which is usually treated as alias to OP_EQ.
Though it needs to stay in place for one reason: When matching against a
bitmask (e.g. TCP flags or conntrack states), it has a different
meaning:
| nft --debug=netlink add rule ip t c tcp flags syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ bitwise reg 1 = (reg=1 & 0x00000002 ) ^ 0x00000000 ]
| [ cmp neq reg 1 0x00000000 ]
| nft --debug=netlink add rule ip t c tcp flags == syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ cmp eq reg 1 0x00000002 ]
OP_IMPLICIT creates a match which just checks the given flag is present,
while OP_EQ creates a match which ensures the given flag and no other is
present.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
You need a Linux kernel >= 4.15 to use this feature.
This patch allows us to dump the content of an existing set.
# nft list ruleset
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 3.3.3.3,
5.5.5.5-6.6.6.6 }
}
}
You check if a single element exists in the set:
# nft get element x x { 1.1.1.5 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
Output means '1.1.1.5' belongs to the '1.1.1.1-2.2.2.2' interval.
You can also check for intervals:
# nft get element x x { 1.1.1.1-2.2.2.2 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
If you try to check for an element that doesn't exist, an error is
displayed.
# nft get element x x { 1.1.1.0 }
Error: Could not receive set elements: No such file or directory
get element x x { 1.1.1.0 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can also check for multiple elements in one go:
# nft get element x x { 1.1.1.5, 5.5.5.10 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 5.5.5.5-6.6.6.6 }
}
}
You can also use this to fetch the existing timeout for specific
elements, in case you have a set with timeouts in place:
# nft get element w z { 2.2.2.2 }
table ip w {
set z {
type ipv4_addr
timeout 30s
elements = { 2.2.2.2 expires 17s }
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to create flowtable:
# nft add table x
# nft add flowtable x m { hook ingress priority 10\; devices = { eth0, wlan0 }\; }
You have to specify hook and priority. So far, only the ingress hook is
supported. The priority represents where this flowtable is placed in the
ingress hook, which is registered to the devices that the user
specifies.
You can also use the 'create' command instead to bail out in case that
there is an existing flowtable with this name.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new variable expression that we can use to attach symbols in
runtime, this allows us to redefine variables via new keyword, eg.
table ip x {
chain y {
define address = { 1.1.1.1, 2.2.2.2 }
ip saddr $address
redefine address = { 3.3.3.3 }
ip saddr $address
}
}
# nft list ruleset
table ip x {
chain y {
ip saddr { 1.1.1.1, 2.2.2.2 }
ip saddr { 3.3.3.3 }
}
}
Note that redefinition just places a new symbol version before the
existing one, so symbol lookups always find the latest version. The
undefine keyword decrements the reference counter and removes the symbol
from the list, so it cannot be used anymore. Still, previous references
to this symbol via variable expression are still valid.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
make syntax consistent between print and parse.
No dependency handling -- once you use raw expression, you need
to make sure the raw expression only sees the packets that you'd
want it to see.
based on an earlier patch from Laurent Fasnacht <l@libres.ch>.
Laurents patch added a different syntax:
@<protocol>,<base>,<data type>,<offset>,<length>
data_type is useful to make nftables not err when
asking for "@payload,32,32 192.168.0.1", this patch still requires
manual convsersion to an integer type (hex or decimal notation).
data_type should probably be added later by adding an explicit
cast expression, independent of the raw payload syntax.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, when adding multiple ranges to a set they were merged if
overlapping or adjacent. This might cause inconvenience though since it
is afterwards not easily possible anymore to remove one of the merged
ranges again while keeping the others in place.
Since it is not possible to have overlapping ranges, this patch adds a
check for newly added ranges to make sure they don't overlap if merging
is turned off.
Note that it is not possible (yet?) to enable range merging using nft
tool.
Testsuite had to be adjusted as well: One test in tests/py changed avoid
adding overlapping ranges and the test in tests/shell which explicitly
tests for this feature dropped.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an obscure bug on big-endian systems when trying to list a rule
containing the expression 'ct helper tftp' which triggers the assert()
call in mpz_get_type().
Florian identified the cause: ct_expr_pctx_update() is called for the
relational expression which calls mpz_get_uint32() to get RHS value
(assuming it is a protocol number). On big-endian systems, the
misinterpreted value exceeds UINT_MAX.
Expressions' pctx_update() callback should only be called for protocol
matches, so ct_meta_common_postprocess() lacked a check for 'left->flags
& EXPR_F_PROTOCOL' like the one already present in
payload_expr_pctx_update().
In order to fix this in a clean way, this patch introduces a wrapper
relational_expr_pctx_update() to be used instead of directly calling
LHS's pctx_update() callback which unifies the necessary checks (and
adds one more assert):
- assert(expr->ops->type == EXPR_RELATIONAL)
-> This is new, just to ensure the wrapper is called properly.
- assert(expr->op == OP_EQ)
-> This was moved from {ct,meta,payload}_expr_pctx_update().
- left->ops->pctx_update != NULL
-> This was taken from expr_evaluate_relational(), a necessary
requirement for the introduced wrapper to function at all.
- (left->flags & EXPR_F_PROTOCOL) != 0
-> The crucial missing check which led to the problem.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|