| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Payload munging means that evaluation of payload expressions may not be
idempotent. Add a flag to prevent them from being evaluated more than
once.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
Useless duplication. Also, this avoids bloating expr_ops_by_type()
when it needs to cope with more expressions.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After exporting field lengths via NFTNL_SET_DESC_CONCAT attributes,
we now need to adjust parsing of user input and generation of
netlink key data to complete support for concatenation of set
ranges.
Instead of using separate elements for start and end of a range,
denoting the end element by the NFT_SET_ELEM_INTERVAL_END flag,
as it's currently done for ranges without concatenation, we'll use
the new attribute NFTNL_SET_ELEM_KEY_END as suggested by Pablo. It
behaves in the same way as NFTNL_SET_ELEM_KEY, but it indicates
that the included key represents the upper bound of a range.
For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", needs to be
expressed as a single element with two keys:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 22
NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25
To achieve this, we need to:
- adjust the lexer rules to allow multiton expressions as elements
of a concatenation. As wildcards are not allowed (semantics would
be ambiguous), exclude wildcards expressions from the set of
possible multiton expressions, and allow them directly where
needed. Concatenations now admit prefixes and ranges
- generate, for each element in a range concatenation, a second key
attribute, that includes the upper bound for the range
- also expand prefixes and non-ranged values in the concatenation
to ranges: given a set with interval and concatenation support,
the kernel has no way to tell which elements are ranged, so they
all need to be. For example, 192.0.2.0 . 192.0.2.9 : 1024 is
sent as:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 1024
NFTA_SET_ELEM_KEY_END: 192.0.2.9 . 1024
- aggregate ranges when elements received by the kernel represent
concatenated ranges, see concat_range_aggregate()
- perform a few minor adjustments where interval expressions
are already handled: we have intervals in these sets, but
the set specification isn't just an interval, so we can't
just aggregate and deaggregate interval ranges linearly
v4: No changes
v3:
- rework to use a separate key for closing element of range instead of
a separate element with EXPR_F_INTERVAL_END set (Pablo Neira Ayuso)
v2:
- reworked netlink_gen_concat_data(), moved loop body to a new function,
netlink_gen_concat_data_expr() (Phil Sutter)
- dropped repeated pattern in bison file, replaced by a new helper,
compound_expr_alloc_or_add() (Phil Sutter)
- added set_is_nonconcat_range() helper (Phil Sutter)
- in expr_evaluate_set(), we need to set NFT_SET_SUBKEY also on empty
sets where the set in the context already has the flag
- dropped additional 'end' parameter from netlink_gen_data(),
temporarily set EXPR_F_INTERVAL_END on expressions and use that from
netlink_gen_concat_data() to figure out we need to add the 'end'
element (Phil Sutter)
- replace range_mask_len() by a simplified version, as we don't need
to actually store the composing masks of a range (Phil Sutter)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support arbitrary range concatenations, the kernel needs to know
how long each field in the concatenation is. The new libnftnl
NFTNL_SET_DESC_CONCAT set attribute describes this as an array of
lengths, in bytes, of concatenated fields.
While evaluating concatenated expressions, export the datatype size
into the new field_len array, and hand the data over via libnftnl.
Similarly, when data is passed back from libnftnl, parse it into
the set description.
When set data is cloned, we now need to copy the additional fields
in set_clone(), too.
This change depends on the libnftnl patch with title:
set: Add support for NFTA_SET_DESC_CONCAT attributes
v4: No changes
v3: Rework to use set description data instead of a stand-alone
attribute
v2: No changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two new expression operations to build and to parse the
userdata area that describe the set key and data typeof definitions.
For maps, the grammar enforces either
"type data_type : data_type" or or "typeof expression : expression".
Check both key and data for valid user typeof info first.
If they check out, flag set->key_typeof_valid as true and use it for
printing the key info.
This patch comes with initial support for using payload expressions
with the 'typeof' keyword, followup patches will add support for other
expressions as well.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
Fetch expression operation from the expression type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
| |
ops has been removed, and etype has been added
Signed-off-by: Brett Mastbergen <bmastbergen@untangle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two datatypes are using runtime datatype allocation:
* Concatenations.
* Integer, that require byteorder adjustment.
From the evaluation / postprocess step, transformations are common,
hence expressions may end up fetching (infering) datatypes from an
existing one.
This patch adds a reference counter to release the dynamic datatype
object when it is shared.
The API includes the following helper functions:
* datatype_set(expr, datatype), to assign a datatype to an expression.
This helper already deals with reference counting for dynamic
datatypes. This also drops the reference counter of any previous
datatype (to deal with the datatype replacement case).
* datatype_get(datatype) bumps the reference counter. This function also
deals with nul-pointers, that occurs when the datatype is unset.
* datatype_free() drops the reference counter, and it also releases the
datatype if there are not more clients of it.
Rule of thumb is: The reference counter of any newly allocated datatype
is set to zero.
This patch also updates every spot to use datatype_set() for non-dynamic
datatypes, for consistency. In this case, the helper just makes an
simple assignment.
Note that expr_alloc() has been updated to call datatype_get() on the
datatype that is assigned to this new expression. Moreover, expr_free()
calls datatype_free().
This fixes valgrind reports like this one:
==28352== 1,350 (440 direct, 910 indirect) bytes in 5 blocks are definitely lost in loss recor 3 of 3
==28352== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==28352== by 0x4E79558: xmalloc (utils.c:36)
==28352== by 0x4E7963D: xzalloc (utils.c:65)
==28352== by 0x4E6029B: dtype_alloc (datatype.c:1073)
==28352== by 0x4E6029B: concat_type_alloc (datatype.c:1127)
==28352== by 0x4E6D3B3: netlink_delinearize_set (netlink.c:578)
==28352== by 0x4E6D68E: list_set_cb (netlink.c:648)
==28352== by 0x5D74023: nftnl_set_list_foreach (set.c:780)
==28352== by 0x4E6D6F3: netlink_list_sets (netlink.c:669)
==28352== by 0x4E5A7A3: cache_init_objects (rule.c:159)
==28352== by 0x4E5A7A3: cache_init (rule.c:216)
==28352== by 0x4E5A7A3: cache_update (rule.c:266)
==28352== by 0x4E7E0EE: nft_evaluate (libnftables.c:388)
==28352== by 0x4E7EADD: nft_run_cmd_from_filename (libnftables.c:479)
==28352== by 0x109A53: main (main.c:310)
This patch also removes the DTYPE_F_CLONE flag which is broken and not
needed anymore since proper reference counting is in place.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
Introduce expressions as a chain in jump and goto statements.
This is going to be used to support variables as a chain in the
following patches.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for version fingerprint in "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority filter; policy accept;
osf ttl skip name "Linux"
osf ttl skip version "Linux:4.20"
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
| |
Breaks custom-defined configuration in library mode, ie. user may want
to store output in a file, instead of stderr.
Fixes: 35f6cd327c2e ("src: Pass stateless, numeric, ip2name and handle variables as structure members.")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
| |
Fixes: e3f195777ee54 ("src: expr: remove expr_ops from struct expr")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
size of struct expr changes from 144 to 128 bytes on x86_64.
This doesn't look like much, but large rulesets can have tens of thousands
of expressions (each set element is represented by an expression).
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Temporary kludge to remove all the expr->ops->type == ... patterns.
Followup patch will remove expr->ops, and make expr_ops() lookup
the correct expr_ops struct instead to reduce struct expr size.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
| |
Currently callers use expr->ops->name, but follouwp patch will remove the
ops pointer from struct expr. So add this helper and use it everywhere.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for ttl option in "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority filter; policy accept;
osf ttl skip name "Linux"
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
table ip x {
set y {
type inet_service
flags interval
elements = { 10, 20-30, 40, 50-60 }
}
}
# nft get element x y { 20-40 }
table ip x {
set y {
type inet_service
flags interval
elements = { 20-40 }
}
}
20 and 40 exist in the tree, but they are part of different ranges.
This patch adds a new get_set_decompose() function to validate that the
left and the right side of the range.
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows matching on ipsec tunnel/beet addresses in xfrm state
associated with a packet, ipsec request id and the SPI.
Examples:
ipsec in ip saddr 192.168.1.0/24
ipsec out ip6 daddr @endpoints
ipsec in spi 1-65536
Joint work with Florian Westphal.
Cc: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for "osf" expression. Example:
table ip foo {
chain bar {
type filter hook input priority 0; policy accept;
osf name "Linux" counter packets 3 bytes 132
}
}
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For now it can only match sockets with IP(V6)_TRANSPARENT socket option
set. Example:
table inet sockin {
chain sockchain {
type filter hook prerouting priority -150; policy accept;
socket transparent 1 mark set 0x00000001 nftrace set 1 counter packets 9 bytes 504 accept
}
}
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although technically there already is support for JSON output via 'nft
export json' command, it is hardly useable since it exports all the gory
details of nftables VM. Also, libnftables has no control over what is
exported since the content comes directly from libnftnl.
Instead, implement JSON format support for regular 'nft list' commands.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This removes static flag and adds header prototype for the following
functions:
* must_print_eq_op() from src/expression.c
* fib_result_str() from src/fib.c
* set_policy2str() and chain_policy2str from src/rule.c
In fib.h, include linux/netfilter/nf_tables.h to make sure enum
nft_fib_result is known when including this file.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a bit of code reorganization, relational meta OPs OP_RANGE,
OP_FLAGCMP and OP_LOOKUP become unused and can be removed. The only meta
OP left is OP_IMPLICIT which is usually treated as alias to OP_EQ.
Though it needs to stay in place for one reason: When matching against a
bitmask (e.g. TCP flags or conntrack states), it has a different
meaning:
| nft --debug=netlink add rule ip t c tcp flags syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ bitwise reg 1 = (reg=1 & 0x00000002 ) ^ 0x00000000 ]
| [ cmp neq reg 1 0x00000000 ]
| nft --debug=netlink add rule ip t c tcp flags == syn
| ip t c
| [ meta load l4proto => reg 1 ]
| [ cmp eq reg 1 0x00000006 ]
| [ payload load 1b @ transport header + 13 => reg 1 ]
| [ cmp eq reg 1 0x00000002 ]
OP_IMPLICIT creates a match which just checks the given flag is present,
while OP_EQ creates a match which ensures the given flag and no other is
present.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
You need a Linux kernel >= 4.15 to use this feature.
This patch allows us to dump the content of an existing set.
# nft list ruleset
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 3.3.3.3,
5.5.5.5-6.6.6.6 }
}
}
You check if a single element exists in the set:
# nft get element x x { 1.1.1.5 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
Output means '1.1.1.5' belongs to the '1.1.1.1-2.2.2.2' interval.
You can also check for intervals:
# nft get element x x { 1.1.1.1-2.2.2.2 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2 }
}
}
If you try to check for an element that doesn't exist, an error is
displayed.
# nft get element x x { 1.1.1.0 }
Error: Could not receive set elements: No such file or directory
get element x x { 1.1.1.0 }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can also check for multiple elements in one go:
# nft get element x x { 1.1.1.5, 5.5.5.10 }
table ip x {
set x {
type ipv4_addr
flags interval
elements = { 1.1.1.1-2.2.2.2, 5.5.5.5-6.6.6.6 }
}
}
You can also use this to fetch the existing timeout for specific
elements, in case you have a set with timeouts in place:
# nft get element w z { 2.2.2.2 }
table ip w {
set z {
type ipv4_addr
timeout 30s
elements = { 2.2.2.2 expires 17s }
}
}
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows you to create flowtable:
# nft add table x
# nft add flowtable x m { hook ingress priority 10\; devices = { eth0, wlan0 }\; }
You have to specify hook and priority. So far, only the ingress hook is
supported. The priority represents where this flowtable is placed in the
ingress hook, which is registered to the devices that the user
specifies.
You can also use the 'create' command instead to bail out in case that
there is an existing flowtable with this name.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new variable expression that we can use to attach symbols in
runtime, this allows us to redefine variables via new keyword, eg.
table ip x {
chain y {
define address = { 1.1.1.1, 2.2.2.2 }
ip saddr $address
redefine address = { 3.3.3.3 }
ip saddr $address
}
}
# nft list ruleset
table ip x {
chain y {
ip saddr { 1.1.1.1, 2.2.2.2 }
ip saddr { 3.3.3.3 }
}
}
Note that redefinition just places a new symbol version before the
existing one, so symbol lookups always find the latest version. The
undefine keyword decrements the reference counter and removes the symbol
from the list, so it cannot be used anymore. Still, previous references
to this symbol via variable expression are still valid.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
make syntax consistent between print and parse.
No dependency handling -- once you use raw expression, you need
to make sure the raw expression only sees the packets that you'd
want it to see.
based on an earlier patch from Laurent Fasnacht <l@libres.ch>.
Laurents patch added a different syntax:
@<protocol>,<base>,<data type>,<offset>,<length>
data_type is useful to make nftables not err when
asking for "@payload,32,32 192.168.0.1", this patch still requires
manual convsersion to an integer type (hex or decimal notation).
data_type should probably be added later by adding an explicit
cast expression, independent of the raw payload syntax.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, when adding multiple ranges to a set they were merged if
overlapping or adjacent. This might cause inconvenience though since it
is afterwards not easily possible anymore to remove one of the merged
ranges again while keeping the others in place.
Since it is not possible to have overlapping ranges, this patch adds a
check for newly added ranges to make sure they don't overlap if merging
is turned off.
Note that it is not possible (yet?) to enable range merging using nft
tool.
Testsuite had to be adjusted as well: One test in tests/py changed avoid
adding overlapping ranges and the test in tests/shell which explicitly
tests for this feature dropped.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is an obscure bug on big-endian systems when trying to list a rule
containing the expression 'ct helper tftp' which triggers the assert()
call in mpz_get_type().
Florian identified the cause: ct_expr_pctx_update() is called for the
relational expression which calls mpz_get_uint32() to get RHS value
(assuming it is a protocol number). On big-endian systems, the
misinterpreted value exceeds UINT_MAX.
Expressions' pctx_update() callback should only be called for protocol
matches, so ct_meta_common_postprocess() lacked a check for 'left->flags
& EXPR_F_PROTOCOL' like the one already present in
payload_expr_pctx_update().
In order to fix this in a clean way, this patch introduces a wrapper
relational_expr_pctx_update() to be used instead of directly calling
LHS's pctx_update() callback which unifies the necessary checks (and
adds one more assert):
- assert(expr->ops->type == EXPR_RELATIONAL)
-> This is new, just to ensure the wrapper is called properly.
- assert(expr->op == OP_EQ)
-> This was moved from {ct,meta,payload}_expr_pctx_update().
- left->ops->pctx_update != NULL
-> This was taken from expr_evaluate_relational(), a necessary
requirement for the introduced wrapper to function at all.
- (left->flags & EXPR_F_PROTOCOL) != 0
-> The crucial missing check which led to the problem.
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
| |
ct keys can match on network and tranasport header protocol
elements, such as port numbers or ip addresses.
Store this base type so a followup commit can store and kill
dependencies, e.g. if bsae is network header we might be able
to kill an earlier expression because the dependency is implicit.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
current syntax is:
ct original saddr $address
problem is that in inet, bridge etc. we lack context to
figure out if this should fetch ipv6 or ipv4 from the conntrack
structure.
$address might not exist, rhs could e.g. be a set reference.
One way to do this is to have users manually specifiy the dependeny:
ct l3proto ipv4 ct original saddr $address
Thats ugly, and, moreover, only needed for table families
other than ip or ipv6.
Pablo suggested to instead specify ip saddr, ip6 saddr:
ct original ip saddr $address
and let nft handle the dependency injection.
This adds the required parts to the scanner and the grammar, next
commit adds code to eval step to make use of this.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces nft_print()/nft_gmp_print() functions which have
to be used instead of printf to output information that were previously
send to stdout. These functions print to a FILE pointer defined in
struct output_ctx. It is set by calling:
| old_fp = nft_ctx_set_output(ctx, new_fp);
Having an application-defined FILE pointer is actually quite flexible:
Using fmemopen() or even fopencookie(), an application gains full
control over what is printed and where it should go to.
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
So this toggle is not global anymore. Update name that fits better with
the semantics of this variable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
This flag is required by userspace only, so can live within userdata.
It's sole purpose is for 'nft monitor' to detect half-open ranges (which
are comprised of a single element only).
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Usually one wants to at least initialize set_flags from the parent, so
make allocation of a set's set expression more convenient.
The idea to do this came when fixing an issue with output formatting of
larger anonymous sets in nft monitor: Since
netlink_events_cache_addset() didn't initialize set_flags,
calculate_delim() didn't detect it's an anonymous set and therefore
added newlines to the output.
Reported-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
Fixes: a9dc3ceabc10f ("expression: print sets and maps in pretty format")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libnftables library will be created soon. So declare numeric_output,
stateless_output, ip2name_output and handle_output as members of
structure output_ctx, instead of global variables. Rename these
variables as following,
numeric_output -> numeric
stateless_output -> stateless
ip2name_output -> ip2name
handle_output -> handle
Also add struct output_ctx *octx as member of struct netlink_ctx.
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Typing the "nft add rule x y ct mark set jhash ip saddr mod 2" will
not generate a random seed, instead, the seed will always be zero.
So if seed option is empty, we shoulde not set the NFTA_HASH_SEED
attribute, then a random seed will be generated in the kernel.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows to check whether a FIB entry exists for a given packet by
comparing the expression with a boolean keyword like so:
| fib daddr oif exists
The implementation requires introduction of a generic expression flag
EXPR_F_BOOLEAN which allows relational expression to signal it's LHS
that a boolean comparison is being done (indicated by boolean type on
RHS). In contrast to exthdr existence checks, fib expression can't know
this in beforehand because the LHS syntax is absolutely identical to a
non-boolean comparison.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
This allows to have custom flags in exthdr expression, which is
necessary for upcoming existence checks (of both IPv6 extension headers
as well as TCP options).
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch provides symmetric hash support according to source
ip address and port, and destination ip address and port.
The new attribute NFTA_HASH_TYPE has been included to support
different types of hashing functions. Currently supported
NFT_HASH_JENKINS through jhash and NFT_HASH_SYM through symhash.
The main difference between both types are:
- jhash requires an expression with sreg, symhash doesn't.
- symhash supports modulus and offset, but not seed.
Examples:
nft add rule ip nat prerouting ct mark set jhash ip saddr mod 2
nft add rule ip nat prerouting ct mark set symhash mod 2
Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables nft to match against TCP options.
Currently these TCP options are supported:
* End of Option List (eol)
* No-Operation (noop)
* Maximum Segment Size (maxseg)
* Window Scale (window)
* SACK Permitted (sack_permitted)
* SACK (sack)
* Timestamps (timestamp)
Syntax: tcp options $option_name [$offset] $field_name
Example:
# count all incoming packets with a specific maximum segment size `x`
# nft add rule filter input tcp option maxseg size x counter
# count all incoming packets with a SACK TCP option where the third
# (counted from zero) left field is greater `x`.
# nft add rule filter input tcp option sack 2 left \> x counter
If the offset (the `2` in the example above) is zero, it can optionally
be omitted.
For all non-SACK TCP options it is always zero, thus can be left out.
Option names and field names are parsed from templates, similar to meta
and ct options rather than via keywords to prevent adding more keywords
than necessary.
Signed-off-by: Manuel Messner <mm@skelett.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So users can better track their ruleset via git.
Without sorting, the elements can be listed in a different order
every time the set is created, generating unnecessary git changes.
Mergesort is used. Doesn't sort sets with 'flags interval' set on.
Pablo appends to this changelog description:
Currently these interval set elements are dumped in order. We'll likely
get new representations soon that may not guarantee this anymore, so
let's revisit this later in case we need it.
Without this patch, nft list ruleset with a set containing 40000
elements takes on my laptop:
real 0m2.742s
user 0m0.112s
sys 0m0.280s
With this patch:
real 0m2.846s
user 0m0.180s
sys 0m0.284s
Difference is small, so don't get nft more complicated with yet another
getopt() option, enable this by default.
Signed-off-by: Elise Lennion <elise.lennion@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Add support to add an offset to the hash generator, eg.
ct mark set hash ip saddr mod 10 offset 100
This will generate marks with series between 100-109.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the 'fib' expression which can be used to
obtain the output interface from the route table based on either
source or destination address of a packet.
This can be used to e.g. add reverse path filtering:
# drop if not coming from the same interface packet
# arrived on
# nft add rule x prerouting fib saddr . iif oif eq 0 drop
# accept only if from eth0
# nft add rule x prerouting fib saddr . iif oif eq "eth0" accept
# accept if from any valid interface
# nft add rule x prerouting fib saddr oif accept
Querying of address type is also supported. This can be used
to e.g. only accept packets to addresses configured in the same
interface:
# fib daddr . iif type local
Its also possible to use mark and verdict map, e.g.:
# nft add rule x prerouting meta mark set 0xdead fib daddr . mark type vmap {
blackhole : drop,
prohibit : drop,
unicast : accept
}
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce rt expression for routing related data with support for nexthop
(i.e. the directly connected IP address that an outgoing packet is sent
to), which can be used either for matching or accounting, eg.
# nft add rule filter postrouting \
ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.
# nft add rule filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
# nft add rule ip6 filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
These rules count outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10 minutes.
# nft add rule inet filter postrouting \
ether type ip \
flow table acct { rt nexthop timeout 600s counter }
# nft add rule inet filter postrouting \
ether type ip6 \
flow table acct { rt nexthop timeout 600s counter }
Same as above, but via the inet family, where the ether type must be
specified explicitly.
"rt classid" is also implemented identical to "meta rtclassid", since it
is more logical to have this match in the routing expression going forward.
Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support to add an offset to the numgen generated value.
Example:
ct mark set numgen inc mod 2 offset 100
This will generate marks with serie like 100, 101, 100, ...
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can validate that values don't get over the maximum datatype
length, this is expressed in number of bits, so the maximum value
is always power of 2.
However, since we got the hash and numgen expressions, the user should
not set a value higher that what the specified modulus option, which
may not be power of 2. This patch extends the expression context with
a new optional field to store the maximum value.
After this patch, nft bails out if the user specifies non-sense rules
like those below:
# nft add rule x y jhash ip saddr mod 10 seed 0xa 10
<cmdline>:1:45-46: Error: Value 10 exceeds valid range 0-9
add rule x y jhash ip saddr mod 10 seed 0xa 10
^^
The modulus sets a valid value range of [0, n), so n is out of the valid
value range.
# nft add rule x y numgen inc mod 10 eq 12
<cmdline>:1:35-36: Error: Value 12 exceeds valid range 0-9
add rule x y numgen inc mod 10 eq 12
^^
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is special expression that transforms an input expression into a
32-bit unsigned integer. This expression takes a modulus parameter to
scale the result and the random seed so the hash result becomes harder
to predict.
You can use it to set the packet mark, eg.
# nft add rule x y meta mark set jhash ip saddr . ip daddr mod 2 seed 0xdeadbeef
You can combine this with maps too, eg.
# nft add rule x y dnat to jhash ip saddr mod 2 seed 0xdeadbeef map { \
0 : 192.168.20.100, \
1 : 192.168.30.100 \
}
Currently, this expression implements the jenkins hash implementation
available in the Linux kernel:
http://lxr.free-electrons.com/source/include/linux/jhash.h
But it should be possible to extend it to support any other hash
function type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new expression allows us to generate incremental and random numbers
bound to a specified modulus value.
The following rule sets the conntrack mark of 0 to the first packet seen,
then 1 to second packet, then 0 again to the third packet and so on:
# nft add rule x y ct mark set numgen inc mod 2
A more useful example is a simple load balancing scenario, where you can
also use maps to set the destination NAT address based on this new numgen
expression:
# nft add rule nat prerouting \
dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }
So this is distributing new connections in a round-robin fashion between
192.168.10.100 and 192.168.20.200. Don't forget the special NAT chain
semantics: Only the first packet evaluates the rule, follow up packets
rely on conntrack to apply the NAT information.
You can also emulate flow distribution with different backend weights
using intervals:
# nft add rule nat prerouting \
dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }
So 192.168.10.100 gets 60% of the workload, while 192.168.20.200 gets 40%.
We can also be mixed with dynamic sets, thus weight can be updated in
runtime.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The flow statement allows to instantiate per flow statements for user
defined flows. This can so far be used for per flow accounting or limiting,
similar to what the iptables hashlimit provides. Flows can be aged using
the timeout option.
Examples:
# nft filter input flow ip saddr . tcp dport limit rate 10/second
# nft filter input flow table acct iif . ip saddr timeout 60s counter
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|