| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
During delinearization we attempt to remove masks, for instance
ip saddr $x/32. (mask matches the entire size).
However, in some special cases the lhs size is unknown (0), this
happens f.e. with
'ct saddr original 1.2.3.4/24' which had its '/24' chopped off.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
expr->len 0 can appear for some data types whose size can be different
based on some external state, e.g. the conntrack src/dst addresses.
The nft type is 'invalid/0-length' in the template definition, the
size is set (on linearization) based on the network base family,
i.e. the type is changed to ip or ipv6 address at a later stage.
For delinarization, skip zero-length expression as concat type
and give expr_postprocess a chance to fix the types.
Without this change the previous patch will result in nft consuming all
available memory when trying to display e.g. a 'ct saddr' rule.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few keys in the ct expression are directional, i.e.
we need to tell kernel if it should fetch REPLY or ORIGINAL direction.
Split ct_keys into ct_keys & ct_keys_dir, the latter are those keys
that the kernel rejects unless also given a direction.
During postprocessing we also need to invoke ct_expr_update_type,
problem is that e.g. ct saddr can be any family (ip, ipv6) so we need
to update the expected data type based on the network base.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
564b0e7c13f9 ("netlink_delinearize: postprocess expression before range
merge") crashes nft when the previous statement is removed via
payload_dependency_kill() as this pointer is not valid anymore.
Move the pointer to the previous statement to rule_pp_ctx and invalidate
it when required.
Reported-by: "Pablo M. Bermudo Garay" <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reported-by: "Pablo M. Bermudo Garay" <pablombg@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Dependency statement go away after postprocess, so we should consider
them for possible range merges.
This problem was uncovered when adding support for sub-byte payload
ranges.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update bitfield definitions to match according to the way they are
expressed in RFC and IEEE specifications.
This required a bit of update for c3f0501 ("src: netlink_linearize:
handle sub-byte lengths").
>From the linearize step, to calculate the shift based on the bitfield
offset, we need to obtain the length of the word in bytes:
len = round_up(expr->len, BITS_PER_BYTE);
Then, we substract the offset bits and the bitfield length.
shift = len - (offset + expr->len);
From the delinearize, payload_expr_trim() needs to obtain the real
offset through:
off = round_up(mask->len, BITS_PER_BYTE) - mask_len;
For vlan id (offset 12), this gets the position of the last bit set in
the mask (ie. 12), then we substract the length we fetch in bytes (16),
so we obtain the real bitfield offset (4).
Then, we add that to the original payload offset that was expressed in
bytes:
payload_offset += off;
Note that payload_expr_trim() now also adjusts the payload expression to
its real length and offset so we don't need to propagate the mask
expression.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
We have to clone the payload expression before attaching it to the lhs
of the relational expression, this payload expression is located at the
lhs of the binary operation that is released thereafter.
Fixes: 39f15c2 ("nft: support listing expressions that use non-byte header fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
| |
The conversion to the net libnftnl API has left a lot of indentation damage
in the netlink functions. Fix it up.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
| |
Add support for payload mangling using the payload statement. The syntax
is similar to the other data changing statements:
nft filter output tcp dport set 25
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
| |
The comment does not belong to the handle, it belongs to the rule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Contrary to iptables, we use the asterisk character '*' as wildcard.
# nft --debug=netlink add rule test test iifname eth\*
ip test test
[ meta load iifname => reg 1 ]
[ cmp eq reg 1 0x00687465 ]
Note that this generates an optimized comparison without bitwise.
In case you want to match a device that contains an asterisk, you have
to escape the asterisk, ie.
# nft add rule test test iifname eth\\*
The wildcard string handling occurs from the evaluation step, where we
convert from:
relational
/ \
/ \
meta value
oifname eth*
to:
relational
/ \
/ \
meta prefix
ofiname
As Patrick suggested, this not actually a wildcard but a prefix since it
only applies to the string when placed at the end.
More comments:
* This relaxes the left->size > right->size from netlink_parse_cmp()
for strings since the optimization that this patch applies may now
result in bogus errors.
* This patch can be later on extended to apply a similar optimization to
payload expressions when:
expr->len % BITS_PER_BYTE == 0
For meta and ct, the kernel checks for the exact length of the attributes
(it expects integer 32 bits) so we can't do it unless we relax that.
* Wildcard strings are not supported from sets and maps yet. Error
reporting is not very good at this stage since expr_evaluate_prefix()
doesn't have enough context (ctx->set is NULL, the set object is
currently created later after evaluating the lhs and rhs of the
relational). I'll be following up on this later.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
This allows you to clone packets to destination address, eg.
... dup to 172.20.0.2
... dup to 172.20.0.2 device eth1
... dup to ip saddr map { 192.168.0.2 : 172.20.0.2, ... } device eth1
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
... limit rate 1024 mbytes/second burst 10240 bytes
... limit rate 1/second burst 3 packets
This parameter is optional.
You need a Linux kernel >= 4.3-rc1.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
| |
This example show how to accept packets below the ratelimit:
... limit rate 1024 mbytes/second counter accept
You need a Linux kernel >= 4.3-rc1.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
|
|
|
|
|
|
|
| |
This allows to list rules that check fields that are not aligned on byte
boundary.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently 'vlan id 42' or even 'vlan type ip' doesn't work since
we expect ethernet header but get vlan.
So if we want to add another protocol header to the same base, we
attempt to figure out if the new header can fit on top of the existing
one (i.e. proto_find_num gives a protocol number when asking to find
link between the two).
We also annotate protocol description for eth and vlan with the full
header size and track the offset from the current base.
Otherwise, 'vlan type ip' fetches the protocol field from mac header
offset 0, which is some mac address.
Instead, we must consider full size of ethernet header.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
|
|
|
|
|
|
|
| |
Adapt the nftables code to use the new symbols in libnftnl. This patch contains
quite some renaming to reserve the nft_ prefix for our high level library.
Explicitly request libnftnl 1.0.5 at configure stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Florian Westphal says:
09565a4b1ed4863d44c4509a93c50f44efd12771 ("netlink_delinearize: consolidate
range printing") causes nft to segfault on 32bit machine when printing l4proto
ranges.
The problem is that meta_expr_pctx_update() assumes that right is a value, but
after this change it can also be a range.
Thus, expr->value contents are undefined (its union). On x86_64 this is also
broken but by virtue of struct layout and pointer sizes, value->_mp_size will
almost always be 0 so mpz_get_uint8() returns 0.
But on x86-32 _mp_size will be huge value (contains expr->right pointer of
range), so we crash in libgmp.
Pablo says:
We shouldn't call pctx_update(), before the transformation we had
there a expr->op == { OP_GT, OP_GTE, OP_LT, OP_LTE }. So we never
entered that path as the assert in payload_expr_pctx_update()
indicates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Florian Westphal <fw@strlen.de>
|
|
|
|
| |
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the RHS length differs from the LHS length (which is only the
first expression), both expressions are assumed to be concat expressions.
The LHS concat expression is reconstructed from the available register
values, advancing by the number of registers required by the subexpressions'
register space, until the RHS length has been reached.
The RHS concat expression is reconstructed by splitting the data value
into multiple subexpressions based on the LHS concat expressions types.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce a helper function to translate register numbers from the kernel
from the compat values to the NFT_REG32 values.
Internally we use the register numbers 0-16:
* 0 is the verdict register in both old and new addressing modes.
* 1-16 are the 32 bit data registers
The NFT_REG32_00 values are mapped to 1-16, the NFT_REG_1-NFT_REG_4
values are each use up 4 registers starting at 1 (1, 5, 9, 13).
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
before:
table ip filter {
chain test {
cpu { 67108864, 50331648, 33554432}
}
}
after:
table ip filter {
chain test {
cpu { 4, 3, 2 }
}
}
Related to 525323352904 ("expr: add set_elem_expr as container for set element
attributes").
We'll have to revisit this once we have support to use integer datatypes from
set declarations, see: http://patchwork.ozlabs.org/patch/480068/
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|\ \ \
| |_|/
|/| | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a routine to the postprocess stage to check if the previous
expression statement and the current actually represent a range, so we can
provide a more compact listing, eg.
# nft -nn list table test
table ip test {
chain test {
tcp dport 22
tcp dport 22-23
tcp dport != 22-23
ct mark != 0x00000016-0x00000017
ct mark 0x00000016-0x00000017
mark 0x00000016-0x00000017
mark != 0x00000016-0x00000017
}
}
To do so, the context state stores a pointer to the current statement. This
pointer needs to be invalidated in case the current statement is replaced.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This function encapsulates the payload expansion logic. This change in required
by the follow up patch to consolidate range printing.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is required by the range postprocess routine that comes in follow up
patches.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |/
| |
| |
| |
| |
| | |
Instead of a copy of the context variable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The set statement is used to dynamically add or update elements in a set.
Syntax:
# nft filter input set add tcp dport @myset
# nft filter input set add ip saddr timeout 10s @myset
# nft filter input set update ip saddr timeout 10s @myset
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add a new expression type "set_elem_expr" that is used as container for
the key in order to attach different attributes, such as timeout values,
to the key.
The expression hierarchy is as follows:
Sets:
elem
|
key
Maps:
mapping
/ \
elem data
|
key
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|/
|
|
|
|
|
|
| |
The FIXME was related to exclusion of string types from cmp length checks.
Since with fixed sized helper names the last case where this could happen
is gone, remove the FIXME and perform length checks on strings as well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
payload_dependency_kill() does not properly handle dependencies for link
layer expressions. Since those dependencies are logically defined on an
even lower layer (device layer), we don't have a payload base for them,
meaning they will use PROTO_BASE_INVALID, which is skipped.
So instead of storing the payload base on which the dependency is defined,
we store the base of the layer for which the dependency applies, meaning
dependencies defined by the device layer will properly work.
This fixes killing the dependency of ether saddr, instead of
iiftype ether ether ether saddr ...
we now only display
ether saddr ...
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
| |
Add a helper function to parse netlink register numbers in preparation
of concat support.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
|
| |
The netlink parsing code is full of long function calls spawning multiple
lines and in some cases parses netlink attributes multiple times.
Use local variables for the registers and other data required to
reconstruct the expressions and statements and reorder the code in
some cases to move related processing next to each other.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
| |
These are really badly chosen names, use parse_expr and parse_stmt instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
|
| |
netlink_delinearize is prepared to deal with malformed expressions from
the kernel that it doesn't understand. However since expressions are now
cloned unconditionally by netlink_get_register(), we crash before such
errors can be detected for invalid inputs.
Fix by only cloning non-NULL expressions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
| |
integer_type
nft list table filter
...
cpu { 50331648, 33554432, 0, 16777216} counter packets 8 bytes 344
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we add this rule:
nft add rule filter input meta length 33-55
the listing shows:
meta length >= 33 meta length <= 754974720
The two meta statements share the same left-hand side, thus, only the
first one is converted from network byte order to host byte order.
Update netlink_get_register() to return a clone so each left-hand side
has its own left-hand side.
Moreover, release the existing register before overriding it with fresh
expressions in netlink_set_register().
Thefore, if you manipulate a register from any of the existing parse
functions, you have to re-set it again to place fresh modified clone.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds redirect support for nft.
The syntax is:
% nft add rule nat prerouting redirect [port] [nat_flags]
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you add the rule:
nft add rule inet filter input reject with icmpx type host-unreachable
nft list table inet filter
shows:
table inet filter {
chain input {
reject with icmpx type 2
}
}
We have to attach the icmpx datatype when we list the rules that use it. With
this patch if we list the ruleset, the output is:
table inet filter {
chain input {
reject with icmpx type host-unreachable
}
}
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds masquerade support for nft.
The syntax is:
% nft add rule nat postrouting masquerade [flags]
Currently, flags are:
random, random-fully, persistent
Example:
% nft add rule nat postrouting masquerade random,persistent
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds more configuration options to the nat expression.
The syntax is as follow:
% nft add rule nat postrouting <snat|dnat> <nat_arguments> [flags]
Flags are: random, persistent, random-fully.
Example:
% nft add rule nat postrouting dnat 1.1.1.1 random,persistent
A requirement is to cache some [recent] copies of kernel headers.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows to use the reject action in rules. For example:
nft add rule filter input udp dport 22 reject
In this rule, we assume that the reason is network unreachable. Also
we can specify the reason with the option "with" and the reason. For example:
nft add rule filter input tcp dport 22 reject with icmp type host-unreachable
In the bridge tables and inet tables, we can use this action too. For example:
nft add rule inet filter input reject with icmp type host-unreachable
In this rule above, this generates a meta nfproto dependency to match
ipv4 traffic because we use a icmpv4 reason to reject.
If the reason is not specified, we infer it from the context.
Moreover, we have the new icmpx datatype. You can use this datatype for
the bridge and the inet tables to simplify your ruleset. For example:
nft add rule inet filter input reject with icmpx type host-unreachable
We have four icmpx reason and the mapping is:
ICMPX reason | ICMPv6 | ICMPv4
| |
admin-prohibited | admin-prohibited | admin-prohibited
port-unreachable | port-unreachable | port-unreachable
no-route | no-route | net-unreachable
host-unreachable | addr-unreachable | host-unreachable
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename keyword tokens to their actual keyword
- Change the grammar to follow the standard schema for statements and arguments
- Use actual expression for the queue numbers to support using normal range
expressions, symbolic expression and so on.
- restore comma seperation of flag keywords
The result is that its possible to use standard ranges, prefix expressions,
symbolic expressions etc for the queue number. We get checks for overflow,
negative ranges and so on automatically.
The comma seperation of flags is more similar to what we have for other
flag values. It is still possible to use spaces, however this could be
removed since we never had a release supporting that.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is required if you use upcoming Linux kernels >= 3.17
which come with a complete logging support for nf_tables.
If you use 'log' without options, the kernel logging buffer is used:
nft> add rule filter input log
You can also specify the logging prefix string:
nft> add rule filter input log prefix "input: "
You may want to specify the log level:
nft> add rule filter input log prefix "input: " level notice
By default, if not specified, the default level is 'warn' (just like
in iptables).
If you specify the group, then nft uses the nfnetlink_log instead:
nft> add rule filter input log prefix "input: " group 10
You can also specify the snaplen and qthreshold for the nfnetlink_log.
But you cannot mix level and group at the same time, they are mutually
exclusive.
Default values for both snaplen and qthreshold are 0 (just like in
iptables).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts Alvaro's 34040b1 ("reject: add ICMP code parameter
for indicating the type of error") and 11b2bb2 ("reject: Use protocol
context for indicating the reject type").
These patches are flawed by two things:
1) IPv6 support is broken, only ICMP codes are considered.
2) If you don't specify any transport context, the utility exits without
adding the rule, eg. nft add rule ip filter input reject.
The kernel is also flawed when it comes to the inet table. Let's revert
this until we can provide decent reject reason support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows to indicate the ICMP code field in case that we
use to reject. Before, we have always sent network unreachable error
as ICMP code, now we can explicitly indicate the ICMP code that
we want to use. Examples:
nft add rule filter input tcp dport 22 reject with host-unreach
nft add rule filter input udp dport 22 reject with host-unreach
In this case, it will use the host unreachable code to reject traffic.
The default code field still is network unreachable and we can also
use the rules without the with like that:
nft add rule filter input udp dport 22 reject
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
|
|
|
|
|
|
|
| |
This patch uses the protocol context to initialize the reject type
considering if the transport protocol is tcp, udp, etc. Before this
patch, this was left unset.
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|