summaryrefslogtreecommitdiffstats
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/Makefile.am30
-rw-r--r--doc/data-types.txt70
-rw-r--r--doc/libnftables-json.adoc73
-rw-r--r--doc/libnftables.adoc49
-rw-r--r--doc/nft.txt111
-rw-r--r--doc/payload-expression.txt194
-rw-r--r--doc/primary-expression.txt20
-rw-r--r--doc/stateful-objects.txt2
-rw-r--r--doc/statements.txt198
9 files changed, 567 insertions, 180 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
deleted file mode 100644
index 21482320..00000000
--- a/doc/Makefile.am
+++ /dev/null
@@ -1,30 +0,0 @@
-if BUILD_MAN
-man_MANS = nft.8 libnftables-json.5 libnftables.3
-
-A2X_OPTS_MANPAGE = -L --doctype manpage --format manpage -D ${builddir}
-
-ASCIIDOC_MAIN = nft.txt
-ASCIIDOC_INCLUDES = \
- data-types.txt \
- payload-expression.txt \
- primary-expression.txt \
- stateful-objects.txt \
- statements.txt
-ASCIIDOCS = ${ASCIIDOC_MAIN} ${ASCIIDOC_INCLUDES}
-
-EXTRA_DIST = ${ASCIIDOCS} ${man_MANS} libnftables-json.adoc libnftables.adoc
-
-CLEANFILES = \
- *~
-
-nft.8: ${ASCIIDOCS}
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-.adoc.3:
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-.adoc.5:
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-CLEANFILES += ${man_MANS}
-endif
diff --git a/doc/data-types.txt b/doc/data-types.txt
index 961fc624..6c0e2f94 100644
--- a/doc/data-types.txt
+++ b/doc/data-types.txt
@@ -242,35 +242,13 @@ integer
The ICMP Code type is used to conveniently specify the ICMP header's code field.
-.Keywords may be used when specifying the ICMP code
-[options="header"]
-|==================
-|Keyword | Value
-|net-unreachable |
-0
-|host-unreachable |
-1
-|prot-unreachable|
-2
-|port-unreachable|
-3
-|frag-needed|
-4
-|net-prohibited|
-9
-|host-prohibited|
-10
-|admin-prohibited|
-13
-|===================
-
ICMPV6 TYPE TYPE
~~~~~~~~~~~~~~~~
[options="header"]
|==================
|Name | Keyword | Size | Base type
|ICMPv6 Type |
-icmpx_code |
+icmpv6_type |
8 bit |
integer
|===================
@@ -340,52 +318,6 @@ integer
The ICMPv6 Code type is used to conveniently specify the ICMPv6 header's code field.
-.keywords may be used when specifying the ICMPv6 code
-[options="header"]
-|==================
-|Keyword |Value
-|no-route|
-0
-|admin-prohibited|
-1
-|addr-unreachable|
-3
-|port-unreachable|
-4
-|policy-fail|
-5
-|reject-route|
-6
-|==================
-
-ICMPVX CODE TYPE
-~~~~~~~~~~~~~~~~
-[options="header"]
-|==================
-|Name | Keyword | Size | Base type
-|ICMPvX Code |
-icmpv6_type |
-8 bit |
-integer
-|===================
-
-The ICMPvX Code type abstraction is a set of values which overlap between ICMP
-and ICMPv6 Code types to be used from the inet family.
-
-.keywords may be used when specifying the ICMPvX code
-[options="header"]
-|==================
-|Keyword |Value
-|no-route|
-0
-|port-unreachable|
-1
-|host-unreachable|
-2
-|admin-prohibited|
-3
-|=================
-
CONNTRACK TYPES
~~~~~~~~~~~~~~~
diff --git a/doc/libnftables-json.adoc b/doc/libnftables-json.adoc
index 9cc17ff2..a8a6165f 100644
--- a/doc/libnftables-json.adoc
+++ b/doc/libnftables-json.adoc
@@ -175,7 +175,7 @@ kind, optionally filtered by *family* and for some, also *table*.
____
*{ "reset":* 'RESET_OBJECT' *}*
-'RESET_OBJECT' := 'COUNTER' | 'COUNTERS' | 'QUOTA' | 'QUOTAS'
+'RESET_OBJECT' := 'COUNTER' | 'COUNTERS' | 'QUOTA' | 'QUOTAS' | 'RULE' | 'RULES' | 'SET' | 'MAP' | 'ELEMENT'
____
Reset state in suitable objects, i.e. zero their internal counter.
@@ -202,12 +202,19 @@ Rename a chain. The new name is expected in a dedicated property named
=== TABLE
[verse]
+____
*{ "table": {
"family":* 'STRING'*,
"name":* 'STRING'*,
- "handle":* 'NUMBER'
+ "handle":* 'NUMBER'*,
+ "flags":* 'TABLE_FLAGS'
*}}*
+'TABLE_FLAGS' := 'TABLE_FLAG' | *[* 'TABLE_FLAG_LIST' *]*
+'TABLE_FLAG_LIST' := 'TABLE_FLAG' [*,* 'TABLE_FLAG_LIST' ]
+'TABLE_FLAG' := *"dormant"* | *"owner"* | *"persist"*
+____
+
This object describes a table.
*family*::
@@ -217,6 +224,8 @@ This object describes a table.
*handle*::
The table's handle. In input, it is used only in *delete* command as
alternative to *name*.
+*flags*::
+ The table's flags.
=== CHAIN
[verse]
@@ -312,7 +321,8 @@ ____
"elem":* 'SET_ELEMENTS'*,
"timeout":* 'NUMBER'*,
"gc-interval":* 'NUMBER'*,
- "size":* 'NUMBER'
+ "size":* 'NUMBER'*,
+ "auto-merge":* 'BOOLEAN'
*}}*
*{ "map": {
@@ -327,7 +337,8 @@ ____
"elem":* 'SET_ELEMENTS'*,
"timeout":* 'NUMBER'*,
"gc-interval":* 'NUMBER'*,
- "size":* 'NUMBER'
+ "size":* 'NUMBER'*,
+ "auto-merge":* 'BOOLEAN'
*}}*
'SET_TYPE' := 'STRING' | *[* 'SET_TYPE_LIST' *]*
@@ -366,6 +377,8 @@ that they translate a unique key to a value.
Garbage collector interval in seconds.
*size*::
Maximum number of elements supported.
+*auto-merge*::
+ Automatic merging of adjacent/overlapping set elements in interval sets.
==== TYPE
The set type might be a string, such as *"ipv4_addr"* or an array
@@ -682,11 +695,6 @@ processing continues with the next rule in the same chain.
==== OPERATORS
[horizontal]
-*&*:: Binary AND
-*|*:: Binary OR
-*^*:: Binary XOR
-*<<*:: Left shift
-*>>*:: Right shift
*==*:: Equal
*!=*:: Not equal
*<*:: Less than
@@ -1059,10 +1067,22 @@ Assign connection tracking expectation.
=== XT
[verse]
-*{ "xt": null }*
+____
+*{ "xt": {
+ "type":* 'TYPENAME'*,
+ "name":* 'STRING'
+*}}*
+
+'TYPENAME' := *match* | *target* | *watcher*
+____
-This represents an xt statement from xtables compat interface. Sadly, at this
-point, it is not possible to provide any further information about its content.
+This represents an xt statement from xtables compat interface. It is a
+fallback if translation is not available or not complete.
+
+Seeing this means the ruleset (or parts of it) were created by *iptables-nft*
+and one should use that to manage it.
+
+*BEWARE:* nftables won't restore these statements.
== EXPRESSIONS
Expressions are the building blocks of (most) statements. In their most basic
@@ -1172,7 +1192,7 @@ point (*base*). The following *base* values are accepted:
*"th"*::
The offset is relative to Transport Layer header start offset.
-The second form allows to reference a field by name (*field*) in a named packet
+The second form allows one to reference a field by name (*field*) in a named packet
header (*protocol*).
=== EXTHDR
@@ -1214,6 +1234,17 @@ If the *field* property is not given, the expression is to be used as an SCTP
chunk existence check in a *match* statement with a boolean on the right hand
side.
+=== DCCP OPTION
+[verse]
+*{ "dccp option": {
+ "type":* 'NUMBER'*
+*}}*
+
+Create a reference to a DCCP option (*type*).
+
+The expression is to be used as a DCCP option existence check in a *match*
+statement with a boolean on the right hand side.
+
=== META
[verse]
____
@@ -1321,15 +1352,17 @@ Perform kernel Forwarding Information Base lookups.
=== BINARY OPERATION
[verse]
-*{ "|": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "^": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "&": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "+<<+": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ ">>": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
+*{ "|": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "^": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "&": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "+<<+": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ ">>": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+'EXPRESSIONS' := 'EXPRESSION' | 'EXPRESSION'*,* 'EXPRESSIONS'
-All binary operations expect an array of exactly two expressions, of which the
+All binary operations expect an array of at least two expressions, of which the
first element denotes the left hand side and the second one the right hand
-side.
+side. Extra elements are accepted in the given array and appended to the term
+accordingly.
=== VERDICT
[verse]
diff --git a/doc/libnftables.adoc b/doc/libnftables.adoc
index 3abb9595..2cf78d7a 100644
--- a/doc/libnftables.adoc
+++ b/doc/libnftables.adoc
@@ -18,6 +18,9 @@ void nft_ctx_free(struct nft_ctx* '\*ctx'*);
bool nft_ctx_get_dry_run(struct nft_ctx* '\*ctx'*);
void nft_ctx_set_dry_run(struct nft_ctx* '\*ctx'*, bool* 'dry'*);
+unsigned int nft_ctx_input_get_flags(struct nft_ctx* '\*ctx'*);
+unsigned int nft_ctx_input_set_flags(struct nft_ctx* '\*ctx'*, unsigned int* 'flags'*);
+
unsigned int nft_ctx_output_get_flags(struct nft_ctx* '\*ctx'*);
void nft_ctx_output_set_flags(struct nft_ctx* '\*ctx'*, unsigned int* 'flags'*);
@@ -37,6 +40,9 @@ const char *nft_ctx_get_error_buffer(struct nft_ctx* '\*ctx'*);
int nft_ctx_add_include_path(struct nft_ctx* '\*ctx'*, const char* '\*path'*);
void nft_ctx_clear_include_paths(struct nft_ctx* '\*ctx'*);
+int nft_ctx_add_var(struct nft_ctx* '\*ctx'*, const char* '\*var'*);
+void nft_ctx_clear_vars(struct nft_ctx '\*ctx'*);
+
int nft_run_cmd_from_buffer(struct nft_ctx* '\*nft'*, const char* '\*buf'*);
int nft_run_cmd_from_filename(struct nft_ctx* '\*nft'*,
const char* '\*filename'*);*
@@ -68,13 +74,37 @@ The *nft_ctx_free*() function frees the context object pointed to by 'ctx', incl
=== nft_ctx_get_dry_run() and nft_ctx_set_dry_run()
Dry-run setting controls whether ruleset changes are actually committed on kernel side or not.
-It allows to check whether a given operation would succeed without making actual changes to the ruleset.
+It allows one to check whether a given operation would succeed without making actual changes to the ruleset.
The default setting is *false*.
The *nft_ctx_get_dry_run*() function returns the dry-run setting's value contained in 'ctx'.
The *nft_ctx_set_dry_run*() function sets the dry-run setting in 'ctx' to the value of 'dry'.
+=== nft_ctx_input_get_flags() and nft_ctx_input_set_flags()
+The flags setting controls the input format.
+
+----
+enum {
+ NFT_CTX_INPUT_NO_DNS = (1 << 0),
+ NFT_CTX_INPUT_JSON = (1 << 1),
+};
+----
+
+NFT_CTX_INPUT_NO_DNS::
+ Avoid resolving IP addresses with blocking getaddrinfo(). In that case,
+ only plain IP addresses are accepted.
+
+NFT_CTX_INPUT_JSON:
+ When parsing the input, first try to interpret the input as JSON before
+ falling back to the nftables format. This behavior is implied when setting
+ the NFT_CTX_OUTPUT_JSON flag.
+
+The *nft_ctx_input_get_flags*() function returns the input flags setting's value in 'ctx'.
+
+The *nft_ctx_input_set_flags*() function sets the input flags setting in 'ctx' to the value of 'val'
+and returns the previous flags.
+
=== nft_ctx_output_get_flags() and nft_ctx_output_set_flags()
The flags setting controls the output format.
@@ -115,10 +145,11 @@ NFT_CTX_OUTPUT_HANDLE::
NFT_CTX_OUTPUT_JSON::
If enabled at compile-time, libnftables accepts input in JSON format and is able to print output in JSON format as well.
See *libnftables-json*(5) for a description of the supported schema.
- This flag controls JSON output format, input is auto-detected.
+ This flag enables JSON output format. If the flag is set, the input will first be tried as JSON format,
+ before falling back to nftables format. This flag implies NFT_CTX_INPUT_JSON.
NFT_CTX_OUTPUT_ECHO::
The echo setting makes libnftables print the changes once they are committed to the kernel, just like a running instance of *nft monitor* would.
- Amongst other things, this allows to retrieve an added rule's handle atomically.
+ Amongst other things, this allows one to retrieve an added rule's handle atomically.
NFT_CTX_OUTPUT_GUID::
Display UID and GID as described in the /etc/passwd and /etc/group files.
NFT_CTX_OUTPUT_NUMERIC_PROTO::
@@ -196,9 +227,9 @@ On failure, the functions return non-zero which may only happen if buffering was
The *nft_ctx_get_output_buffer*() and *nft_ctx_get_error_buffer*() functions return a pointer to the buffered output (which may be empty).
=== nft_ctx_add_include_path() and nft_ctx_clear_include_path()
-The *include* command in nftables rulesets allows to outsource parts of the ruleset into a different file.
+The *include* command in nftables rulesets allows one to outsource parts of the ruleset into a different file.
The include path defines where these files are searched for.
-Libnftables allows to have a list of those paths which are searched in order.
+Libnftables allows one to have a list of those paths which are searched in order.
The default include path list contains a single compile-time defined entry (typically '/etc/').
The *nft_ctx_add_include_path*() function extends the list of include paths in 'ctx' by the one given in 'path'.
@@ -206,6 +237,14 @@ The function returns zero on success or non-zero if memory allocation failed.
The *nft_ctx_clear_include_paths*() function removes all include paths, even the built-in default one.
+=== nft_ctx_add_var() and nft_ctx_clear_vars()
+The *define* command in nftables ruleset allows one to define variables.
+
+The *nft_ctx_add_var*() function extends the list of variables in 'ctx'. The variable must be given in the format 'key=value'.
+The function returns zero on success or non-zero if the variable is malformed.
+
+The *nft_ctx_clear_vars*() function removes all variables.
+
=== nft_run_cmd_from_buffer() and nft_run_cmd_from_filename()
These functions perform the actual work of parsing user input into nftables commands and executing them.
diff --git a/doc/nft.txt b/doc/nft.txt
index e4ed9824..2080c073 100644
--- a/doc/nft.txt
+++ b/doc/nft.txt
@@ -9,7 +9,7 @@ nft - Administration tool of the nftables framework for packet filtering and cla
SYNOPSIS
--------
[verse]
-*nft* [ *-nNscaeSupyjt* ] [ *-I* 'directory' ] [ *-f* 'filename' | *-i* | 'cmd' ...]
+*nft* [ *-nNscaeSupyjtT* ] [ *-I* 'directory' ] [ *-f* 'filename' | *-i* | 'cmd' ...]
*nft* *-h*
*nft* *-v*
@@ -62,6 +62,11 @@ understanding of their meaning. You can get information about options by running
*--check*::
Check commands validity without actually applying the changes.
+*-o*::
+*--optimize*::
+ Optimize your ruleset. You can combine this option with '-c' to inspect
+ the proposed optimizations.
+
.Ruleset list output formatting that modify the output of the list ruleset command:
*-a*::
@@ -165,17 +170,23 @@ SYMBOLIC VARIABLES
~~~~~~~~~~~~~~~~~~
[verse]
*define* 'variable' *=* 'expr'
+*undefine* 'variable'
+*redefine* 'variable' *=* 'expr'
*$variable*
Symbolic variables can be defined using the *define* statement. Variable
references are expressions and can be used to initialize other variables. The scope
of a definition is the current block and all blocks contained within.
+Symbolic variables can be undefined using the *undefine* statement, and modified
+using the *redefine* statement.
.Using symbolic variables
---------------------------------------
define int_if1 = eth0
define int_if2 = eth1
define int_ifs = { $int_if1, $int_if2 }
+redefine int_if2 = wlan0
+undefine int_if2
filter input iif $int_ifs accept
---------------------------------------
@@ -310,10 +321,11 @@ Effectively, this is the nft-equivalent of *iptables-save* and
TABLES
------
[verse]
-{*add* | *create*} *table* ['family'] 'table' [ {*comment* 'comment' *;*'} *{ flags* 'flags' *; }*]
-{*delete* | *list* | *flush*} *table* ['family'] 'table'
+{*add* | *create*} *table* ['family'] 'table' [*{* [*comment* 'comment' *;*] [*flags* 'flags' *;*] *}*]
+{*delete* | *destroy* | *list* | *flush*} *table* ['family'] 'table'
*list tables* ['family']
*delete table* ['family'] *handle* 'handle'
+*destroy table* ['family'] *handle* 'handle'
Tables are containers for chains, sets and stateful objects. They are identified
by their address family and their name. The address family must be one of *ip*,
@@ -331,8 +343,17 @@ return an error.
|Flag | Description
|dormant |
table is not evaluated any more (base chains are unregistered).
+|owner |
+table is owned by the creating process.
+|persist |
+table shall outlive the owning process.
|=================
+Creating a table with flag *owner* excludes other processes from manipulating
+it or its contents. By default, it will be removed when the process exits.
+Setting flag *persist* will prevent this and the resulting orphaned table will
+accept a new owner, e.g. a restarting daemon maintaining the table.
+
.*Add, change, delete a table*
---------------------------------------
# start nft in interactive mode
@@ -357,16 +378,18 @@ add table inet mytable
[horizontal]
*add*:: Add a new table for the given family with the given name.
*delete*:: Delete the specified table.
+*destroy*:: Delete the specified table, it does not fail if it does not exist.
*list*:: List all chains and rules of the specified table.
*flush*:: Flush all chains and rules of the specified table.
CHAINS
------
[verse]
-{*add* | *create*} *chain* ['family'] 'table' 'chain' [*{ type* 'type' *hook* 'hook' [*device* 'device'] *priority* 'priority' *;* [*policy* 'policy' *;*] [*comment* 'comment' *;*'] *}*]
-{*delete* | *list* | *flush*} *chain* ['family'] 'table' 'chain'
+{*add* | *create*} *chain* ['family'] 'table' 'chain' [*{ type* 'type' *hook* 'hook' [*device* 'device'] *priority* 'priority' *;* [*policy* 'policy' *;*] [*comment* 'comment' *;*] *}*]
+{*delete* | *destroy* | *list* | *flush*} *chain* ['family'] 'table' 'chain'
*list chains* ['family']
*delete chain* ['family'] 'table' *handle* 'handle'
+*destroy chain* ['family'] 'table' *handle* 'handle'
*rename chain* ['family'] 'table' 'chain' 'newname'
Chains are containers for rules. They exist in two kinds, base chains and
@@ -379,6 +402,7 @@ organization.
are specified, the chain is created as a base chain and hooked up to the networking stack.
*create*:: Similar to the *add* command, but returns an error if the chain already exists.
*delete*:: Delete the specified chain. The chain must not contain any rules or be used as jump target.
+*destroy*:: Delete the specified chain, it does not fail if it does not exist. The chain must not contain any rules or be used as jump target.
*rename*:: Rename the specified chain.
*list*:: List all rules of the specified chain.
*flush*:: Flush all rules of the specified chain.
@@ -400,7 +424,7 @@ statements for instance).
|route | ip, ip6 | output |
If a packet has traversed a chain of this type and is about to be accepted, a
new route lookup is performed if relevant parts of the IP header have changed.
-This allows to e.g. implement policy routing selectors in nftables.
+This allows one to e.g. implement policy routing selectors in nftables.
|=================
Apart from the special cases illustrated above (e.g. *nat* type not supporting
@@ -419,11 +443,19 @@ further quirks worth noticing:
*prerouting*, *input*, *forward*, *output*, *postrouting* and this *ingress*
hook.
+The *device* parameter accepts a network interface name as a string, and is
+required when adding a base chain that filters traffic on the ingress or
+egress hooks. Any ingress or egress chains will only filter traffic from the
+interface specified in the *device* parameter.
+
The *priority* parameter accepts a signed integer value or a standard priority
name which specifies the order in which chains with the same *hook* value are
traversed. The ordering is ascending, i.e. lower priority values have precedence
over higher ones.
+With *nat* type chains, there's a lower excluding limit of -200 for *priority*
+values, because conntrack hooks at this priority and NAT requires it.
+
Standard priority values can be replaced with easily memorizable names. Not all
names make sense in every family with every hook (see the compatibility matrices
below) but their numerical value can still be used for prioritizing chains.
@@ -461,7 +493,7 @@ with these standard names to ease relative prioritizing, e.g. *mangle - 5* stand
for *-155*. Values will also be printed like this until the value is not
further than 10 from the standard value.
-Base chains also allow to set the chain's *policy*, i.e. what happens to
+Base chains also allow one to set the chain's *policy*, i.e. what happens to
packets not explicitly accepted or refused in contained rules. Supported policy
values are *accept* (which is the default) or *drop*.
@@ -470,7 +502,9 @@ RULES
[verse]
{*add* | *insert*} *rule* ['family'] 'table' 'chain' [*handle* 'handle' | *index* 'index'] 'statement' ... [*comment* 'comment']
*replace rule* ['family'] 'table' 'chain' *handle* 'handle' 'statement' ... [*comment* 'comment']
-*delete rule* ['family'] 'table' 'chain' *handle* 'handle'
+{*delete* | *reset*} *rule* ['family'] 'table' 'chain' *handle* 'handle'
+*destroy rule* ['family'] 'table' 'chain' *handle* 'handle'
+*reset rules* ['family'] ['table' ['chain']]
Rules are added to chains in the given table. If the family is not specified, the
ip family is used. Rules are constructed from two kinds of components according
@@ -498,6 +532,8 @@ case the rule is inserted after the specified rule.
beginning of the chain or before the specified rule.
*replace*:: Similar to *add*, but the rule replaces the specified rule.
*delete*:: Delete the specified rule.
+*destroy*:: Delete the specified rule, it does not fail if it does not exist.
+*reset*:: Reset rule-contained state, e.g. counter and quota statement values.
.*add a rule to ip table output chain*
-------------
@@ -548,10 +584,10 @@ section describes nft set syntax in more detail.
[verse]
*add set* ['family'] 'table' 'set' *{ type* 'type' | *typeof* 'expression' *;* [*flags* 'flags' *;*] [*timeout* 'timeout' *;*] [*gc-interval* 'gc-interval' *;*] [*elements = {* 'element'[*,* ...] *} ;*] [*size* 'size' *;*] [*comment* 'comment' *;*'] [*policy* 'policy' *;*] [*auto-merge ;*] *}*
-{*delete* | *list* | *flush*} *set* ['family'] 'table' 'set'
+{*delete* | *destroy* | *list* | *flush* | *reset* } *set* ['family'] 'table' 'set'
*list sets* ['family']
*delete set* ['family'] 'table' *handle* 'handle'
-{*add* | *delete*} *element* ['family'] 'table' 'set' *{* 'element'[*,* ...] *}*
+{*add* | *delete* | *destroy* } *element* ['family'] 'table' 'set' *{* 'element'[*,* ...] *}*
Sets are element containers of a user-defined data type, they are uniquely
identified by a user-defined name and attached to tables. Their behaviour can
@@ -560,8 +596,10 @@ be tuned with the flags that can be specified at set creation time.
[horizontal]
*add*:: Add a new set in the specified table. See the Set specification table below for more information about how to specify properties of a set.
*delete*:: Delete the specified set.
+*destroy*:: Delete the specified set, it does not fail if it does not exist.
*list*:: Display the elements in the specified set.
*flush*:: Remove all elements from the specified set.
+*reset*:: Reset state in all contained elements, e.g. counter and quota statement values.
.Set specifications
[options="header"]
@@ -574,8 +612,7 @@ string: ipv4_addr, ipv6_addr, ether_addr, inet_proto, inet_service, mark
data type of set element |
expression to derive the data type from
|flags |
-set flags |
-string: constant, dynamic, interval, timeout
+set flags | string: constant, dynamic, interval, timeout. Used to describe the sets properties.
|timeout |
time an element stays in the set, mandatory if set is added to from the packet path (ruleset)|
string, decimal followed by unit. Units are: d, h, m, s
@@ -601,7 +638,7 @@ MAPS
-----
[verse]
*add map* ['family'] 'table' 'map' *{ type* 'type' | *typeof* 'expression' [*flags* 'flags' *;*] [*elements = {* 'element'[*,* ...] *} ;*] [*size* 'size' *;*] [*comment* 'comment' *;*'] [*policy* 'policy' *;*] *}*
-{*delete* | *list* | *flush*} *map* ['family'] 'table' 'map'
+{*delete* | *destroy* | *list* | *flush* | *reset* } *map* ['family'] 'table' 'map'
*list maps* ['family']
Maps store data based on some specific key used as input. They are uniquely identified by a user-defined name and attached to tables.
@@ -609,10 +646,10 @@ Maps store data based on some specific key used as input. They are uniquely iden
[horizontal]
*add*:: Add a new map in the specified table.
*delete*:: Delete the specified map.
+*destroy*:: Delete the specified map, it does not fail if it does not exist.
*list*:: Display the elements in the specified map.
*flush*:: Remove all elements from the specified map.
-*add element*:: Comma-separated list of elements to add into the specified map.
-*delete element*:: Comma-separated list of element keys to delete from the specified map.
+*reset*:: Reset state in all contained elements, e.g. counter and quota statement values.
.Map specifications
[options="header"]
@@ -626,7 +663,7 @@ data type of set element |
expression to derive the data type from
|flags |
map flags |
-string: constant, interval
+string, same as set flags
|elements |
elements contained by the map |
map data type
@@ -638,18 +675,34 @@ map policy |
string: performance [default], memory
|=================
+Users can specifiy the properties/features that the set/map must support.
+This allows the kernel to pick an optimal internal representation.
+If a required flag is missing, the ruleset might still work, as
+nftables will auto-enable features if it can infer this from the ruleset.
+This may not work for all cases, however, so it is recommended to
+specify all required features in the set/map definition manually.
+
+.Set and Map flags
+[options="header"]
+|=================
+|Flag | Description
+|constant | Set contents will never change after creation
+|dynamic | Set must support updates from the packet path with the *add*, *update* or *delete* keywords.
+|interval | Set must be able to store intervals (ranges)
+|timeout | Set must support element timeouts (auto-removal of elements once they expire).
+|=================
ELEMENTS
--------
[verse]
____
-{*add* | *create* | *delete* | *get* } *element* ['family'] 'table' 'set' *{* 'ELEMENT'[*,* ...] *}*
+{*add* | *create* | *delete* | *destroy* | *get* | *reset* } *element* ['family'] 'table' 'set' *{* 'ELEMENT'[*,* ...] *}*
'ELEMENT' := 'key_expression' 'OPTIONS' [*:* 'value_expression']
'OPTIONS' := [*timeout* 'TIMESPEC'] [*expires* 'TIMESPEC'] [*comment* 'string']
'TIMESPEC' := ['num'*d*]['num'*h*]['num'*m*]['num'[*s*]]
____
-Element-related commands allow to change contents of named sets and maps.
+Element-related commands allow one to change contents of named sets and maps.
'key_expression' is typically a value matching the set type.
'value_expression' is not allowed in sets but mandatory when adding to maps, where it
matches the data part in its type definition. When deleting from maps, it may
@@ -663,6 +716,9 @@ listed elements may already exist.
be non-trivial in very large and/or interval sets. In the latter case, the
containing interval is returned instead of just the element itself.
+*reset* command resets state attached to the given element(s), e.g. counter and
+quota statement values.
+
.Element options
[options="header"]
|=================
@@ -681,7 +737,7 @@ FLOWTABLES
[verse]
{*add* | *create*} *flowtable* ['family'] 'table' 'flowtable' *{ hook* 'hook' *priority* 'priority' *; devices = {* 'device'[*,* ...] *} ; }*
*list flowtables* ['family']
-{*delete* | *list*} *flowtable* ['family'] 'table' 'flowtable'
+{*delete* | *destroy* | *list*} *flowtable* ['family'] 'table' 'flowtable'
*delete* *flowtable* ['family'] 'table' *handle* 'handle'
Flowtables allow you to accelerate packet forwarding in software. Flowtables
@@ -705,6 +761,7 @@ and subtraction can be used to set relative priority, e.g. filter + 5 equals to
[horizontal]
*add*:: Add a new flowtable for the given family with the given name.
*delete*:: Delete the specified flowtable.
+*destroy*:: Delete the specified flowtable, it does not fail if it does not exist.
*list*:: List all flowtables.
LISTING
@@ -721,11 +778,22 @@ kernel modules, such as nf_conntrack.
STATEFUL OBJECTS
----------------
[verse]
-{*add* | *delete* | *list* | *reset*} 'type' ['family'] 'table' 'object'
-*delete* 'type' ['family'] 'table' *handle* 'handle'
+{*add* | *delete* | *destroy* | *list* | *reset*} *counter* ['family'] 'table' 'object'
+{*add* | *delete* | *destroy* | *list* | *reset*} *quota* ['family'] 'table' 'object'
+{*add* | *delete* | *destroy* | *list*} *limit* ['family'] 'table' 'object'
+*delete* 'counter' ['family'] 'table' *handle* 'handle'
+*delete* 'quota' ['family'] 'table' *handle* 'handle'
+*delete* 'limit' ['family'] 'table' *handle* 'handle'
+*destroy* 'counter' ['family'] 'table' *handle* 'handle'
+*destroy* 'quota' ['family'] 'table' *handle* 'handle'
+*destroy* 'limit' ['family'] 'table' *handle* 'handle'
*list counters* ['family']
*list quotas* ['family']
*list limits* ['family']
+*reset counters* ['family']
+*reset quotas* ['family']
+*reset counters* ['family'] 'table'
+*reset quotas* ['family'] 'table'
Stateful objects are attached to tables and are identified by a unique name.
They group stateful information from rules, to reference them in rules the
@@ -734,6 +802,7 @@ keywords "type name" are used e.g. "counter name".
[horizontal]
*add*:: Add a new stateful object in the specified table.
*delete*:: Delete the specified object.
+*destroy*:: Delete the specified object, it does not fail if it does not exist.
*list*:: Display stateful information the object holds.
*reset*:: List-and-reset stateful object.
diff --git a/doc/payload-expression.txt b/doc/payload-expression.txt
index 106ff74c..c7c267da 100644
--- a/doc/payload-expression.txt
+++ b/doc/payload-expression.txt
@@ -23,6 +23,14 @@ VLAN HEADER EXPRESSION
[verse]
*vlan* {*id* | *dei* | *pcp* | *type*}
+The vlan expression is used to match on the vlan header fields.
+This expression will not work in the *ip*, *ip6* and *inet* families,
+unless the vlan interface is configured with the *reorder_hdr off* setting.
+The default is *reorder_hdr on* which will automatically remove the vlan tag
+from the packet. See ip-link(8) for more information.
+For these families its easier to match the vlan interface name
+instead, using the *meta iif* or *meta iifname* expression.
+
.VLAN header expression
[options="header"]
|==================
@@ -126,6 +134,14 @@ Destination address |
ipv4_addr
|======================
+Careful with matching on *ip length*: If GRO/GSO is enabled, then the Linux
+kernel might aggregate several packets into one big packet that is larger than
+MTU. Moreover, if GRO/GSO maximum size is larger than 65535 (see man ip-link(8),
+specifically gro_ipv6_max_size and gso_ipv6_max_size), then *ip length* might
+be 0 for such jumbo packets. *meta length* allows you to match on the packet
+length including the IP header size. If you want to perform heuristics on the
+*ip length* field, then disable GRO/GSO.
+
ICMP HEADER EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~
[verse]
@@ -236,6 +252,14 @@ Destination address |
ipv6_addr
|=======================
+Careful with matching on *ip6 length*: If GRO/GSO is enabled, then the Linux
+kernel might aggregate several packets into one big packet that is larger than
+MTU. Moreover, if GRO/GSO maximum size is larger than 65535 (see man ip-link(8),
+specifically gro_ipv6_max_size and gso_ipv6_max_size), then *ip6 length* might
+be 0 for such jumbo packets. *meta length* allows you to match on the packet
+length including the IP header size. If you want to perform heuristics on the
+*ip6 length* field, then disable GRO/GSO.
+
.Using ip6 header expressions
-----------------------------
# matching if first extension header indicates a fragment
@@ -245,7 +269,7 @@ ip6 nexthdr ipv6-frag
ICMPV6 HEADER EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~~~
[verse]
-*icmpv6* {*type* | *code* | *checksum* | *parameter-problem* | *packet-too-big* | *id* | *sequence* | *max-delay*}
+*icmpv6* {*type* | *code* | *checksum* | *parameter-problem* | *packet-too-big* | *id* | *sequence* | *max-delay* | *taddr* | *daddr*}
This expression refers to ICMPv6 header fields. When using it in *inet*,
*bridge* or *netdev* families, it will cause an implicit dependency on IPv6 to
@@ -280,6 +304,12 @@ integer (16 bit)
|max-delay|
maximum response delay of MLD queries|
integer (16 bit)
+|taddr|
+target address of neighbor solicit/advert, redirect or MLD|
+ipv6_addr
+|daddr|
+destination address of redirect|
+ipv6_addr
|==============================
TCP HEADER EXPRESSION
@@ -524,6 +554,160 @@ compression Parameter Index |
integer (16 bit)
|============================
+GRE HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*gre* {*flags* | *version* | *protocol*}
+*gre* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*gre* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+
+The gre expression is used to match on the gre header fields. This expression
+also allows to match on the IPv4 or IPv6 packet within the gre header.
+
+.GRE header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|flags|
+checksum, routing, key, sequence and strict source route flags|
+integer (5 bit)
+|version|
+gre version field, 0 for GRE and 1 for PPTP|
+integer (3 bit)
+|protocol|
+EtherType of encapsulated packet|
+integer (16 bit)
+|==================
+
+.Matching inner IPv4 destination address encapsulated in gre
+------------------------------------------------------------
+netdev filter ingress gre ip daddr 9.9.9.9 counter
+------------------------------------------------------------
+
+GENEVE HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*geneve* {*vni* | *flags*}
+*geneve* *ether* {*daddr* | *saddr* | *type*}
+*geneve* *vlan* {*id* | *dei* | *pcp* | *type*}
+*geneve* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*geneve* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*geneve* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*geneve* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The geneve expression is used to match on the geneve header fields. The geneve
+header encapsulates a ethernet frame within a *udp* packet. This expression
+requires that you restrict the matching to *udp* packets (usually at
+port 6081 according to IANA-assigned ports).
+
+.GENEVE header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|protocol|
+EtherType of encapsulated packet|
+integer (16 bit)
+|vni|
+Virtual Network ID (VNI)|
+integer (24 bit)
+|==================
+
+.Matching inner TCP destination port encapsulated in geneve
+----------------------------------------------------------
+netdev filter ingress udp dport 4789 geneve tcp dport 80 counter
+----------------------------------------------------------
+
+GRETAP HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*gretap* {*vni* | *flags*}
+*gretap* *ether* {*daddr* | *saddr* | *type*}
+*gretap* *vlan* {*id* | *dei* | *pcp* | *type*}
+*gretap* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*gretap* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*gretap* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*gretap* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The gretap expression is used to match on the encapsulated ethernet frame
+within the gre header. Use the *gre* expression to match on the *gre* header
+fields.
+
+.Matching inner TCP destination port encapsulated in gretap
+----------------------------------------------------------
+netdev filter ingress gretap tcp dport 80 counter
+----------------------------------------------------------
+
+VXLAN HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*vxlan* {*vni* | *flags*}
+*vxlan* *ether* {*daddr* | *saddr* | *type*}
+*vxlan* *vlan* {*id* | *dei* | *pcp* | *type*}
+*vxlan* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*vxlan* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*vxlan* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*vxlan* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The vxlan expression is used to match on the vxlan header fields. The vxlan
+header encapsulates a ethernet frame within a *udp* packet. This expression
+requires that you restrict the matching to *udp* packets (usually at
+port 4789 according to IANA-assigned ports).
+
+.VXLAN header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|flags|
+vxlan flags|
+integer (8 bit)
+|vni|
+Virtual Network ID (VNI)|
+integer (24 bit)
+|==================
+
+.Matching inner TCP destination port encapsulated in vxlan
+----------------------------------------------------------
+netdev filter ingress udp dport 4789 vxlan tcp dport 80 counter
+----------------------------------------------------------
+
+ARP HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*arp* {*htype* | *ptype* | *hlen* | *plen* | *operation* | *saddr* { *ip* | *ether* } | *daddr* { *ip* | *ether* }
+
+.ARP header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|htype|
+ARP hardware type|
+integer (16 bit)
+|ptype|
+EtherType|
+ether_type
+|hlen|
+Hardware address len|
+integer (8 bit)
+|plen|
+Protocol address len |
+integer (8 bit)
+|operation|
+Operation |
+arp_op
+|saddr ether|
+Ethernet sender address|
+ether_addr
+|daddr ether|
+Ethernet target address|
+ether_addr
+|saddr ip|
+IPv4 sender address|
+ipv4_addr
+|daddr ip|
+IPv4 target address|
+ipv4_addr
+|======================
+
RAW PAYLOAD EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~
[verse]
@@ -548,6 +732,8 @@ Link layer, for example the Ethernet header
Network header, for example IPv4 or IPv6
|th|
Transport Header, for example TCP
+|ih|
+Inner Header / Payload, i.e. after the L4 transport level header
|==============================
.Matching destination port of both UDP and TCP
@@ -589,6 +775,7 @@ The following syntaxes are valid only in a relational expression with boolean ty
*exthdr* {*hbh* | *frag* | *rt* | *dst* | *mh*}
*tcp option* {*eol* | *nop* | *maxseg* | *window* | *sack-perm* | *sack* | *sack0* | *sack1* | *sack2* | *sack3* | *timestamp*}
*ip option* { lsrr | ra | rr | ssrr }
+*dccp option* 'dccp_option_type'
.IPv6 extension headers
[options="header"]
@@ -691,6 +878,11 @@ ip6 filter input frag more-fragments 1 counter
filter input ip option lsrr exists counter
---------------------------------------
+.finding DCCP option
+------------------
+filter input dccp option 40 exists counter
+---------------------------------------
+
CONNTRACK EXPRESSIONS
~~~~~~~~~~~~~~~~~~~~~
Conntrack expressions refer to meta data of the connection tracking entry associated with a packet. +
diff --git a/doc/primary-expression.txt b/doc/primary-expression.txt
index f97778b9..782494bd 100644
--- a/doc/primary-expression.txt
+++ b/doc/primary-expression.txt
@@ -168,15 +168,18 @@ Either an integer or a date in ISO format. For example: "2019-06-06 17:00".
Hour and seconds are optional and can be omitted if desired. If omitted,
midnight will be assumed.
The following three would be equivalent: "2019-06-06", "2019-06-06 00:00"
-and "2019-06-06 00:00:00".
+and "2019-06-06 00:00:00". Use a range expression such as
+"2019-06-06 10:00"-"2019-06-10 14:00" for matching a time range.
When an integer is given, it is assumed to be a UNIX timestamp.
|day|
Either a day of week ("Monday", "Tuesday", etc.), or an integer between 0 and 6.
Strings are matched case-insensitively, and a full match is not expected (e.g. "Mon" would match "Monday").
-When an integer is given, 0 is Sunday and 6 is Saturday.
+When an integer is given, 0 is Sunday and 6 is Saturday. Use a range expression
+such as "Monday"-"Wednesday" for matching a week day range.
|hour|
A string representing an hour in 24-hour format. Seconds can optionally be specified.
-For example, 17:00 and 17:00:00 would be equivalent.
+For example, 17:00 and 17:00:00 would be equivalent. Use a range expression such
+as "17:00"-"19:00" for matching a time range.
|=============================
.Using meta expressions
@@ -190,6 +193,9 @@ filter output oif eth0
# incoming packet was subject to ipsec processing
raw prerouting meta ipsec exists accept
+
+# match incoming packet from 03:00 to 14:00 local time
+raw prerouting meta hour "03:00"-"14:00" counter accept
-----------------------
SOCKET EXPRESSION
@@ -428,6 +434,10 @@ Destination address of the tunnel|
ipv4_addr/ipv6_addr
|=================================
+*Note:* When using xfrm_interface, this expression is not useable in output
+hook as the plain packet does not traverse it with IPsec info attached - use a
+chain in postrouting hook instead.
+
NUMGEN EXPRESSION
~~~~~~~~~~~~~~~~~
@@ -438,7 +448,7 @@ Create a number generator. The *inc* or *random* keywords control its
operation mode: In *inc* mode, the last returned value is simply incremented.
In *random* mode, a new random number is returned. The value after *mod*
keyword specifies an upper boundary (read: modulus) which is not reached by
-returned numbers. The optional *offset* allows to increment the returned value
+returned numbers. The optional *offset* allows one to increment the returned value
by a fixed offset.
A typical use-case for *numgen* is load-balancing:
@@ -468,7 +478,7 @@ header to apply the hashing, concatenations are possible as well. The value
after *mod* keyword specifies an upper boundary (read: modulus) which is
not reached by returned numbers. The optional *seed* is used to specify an
init value used as seed in the hashing function. The optional *offset*
-allows to increment the returned value by a fixed offset.
+allows one to increment the returned value by a fixed offset.
A typical use-case for *jhash* and *symhash* is load-balancing:
diff --git a/doc/stateful-objects.txt b/doc/stateful-objects.txt
index e3c79220..00d3c5f1 100644
--- a/doc/stateful-objects.txt
+++ b/doc/stateful-objects.txt
@@ -94,7 +94,7 @@ table ip filter {
ct timeout customtimeout {
protocol tcp;
l3proto ip
- policy = { established: 120, close: 20 }
+ policy = { established: 2m, close: 20s }
}
chain output {
diff --git a/doc/statements.txt b/doc/statements.txt
index 8675892a..39b31fd2 100644
--- a/doc/statements.txt
+++ b/doc/statements.txt
@@ -11,7 +11,7 @@ The verdict statement alters control flow in the ruleset and issues policy decis
[horizontal]
*accept*:: Terminate ruleset evaluation and accept the packet.
The packet can still be dropped later by another hook, for instance accept
-in the forward hook still allows to drop the packet later in the postrouting hook,
+in the forward hook still allows one to drop the packet later in the postrouting hook,
or another forward base chain that has a higher priority number and is evaluated
afterwards in the processing pipeline.
*drop*:: Terminate ruleset evaluation and drop the packet.
@@ -71,7 +71,7 @@ EXTENSION HEADER STATEMENT
The extension header statement alters packet content in variable-sized headers.
This can currently be used to alter the TCP Maximum segment size of packets,
-similar to TCPMSS.
+similar to the TCPMSS target in iptables.
.change tcp mss
---------------
@@ -80,6 +80,13 @@ tcp flags syn tcp option maxseg size set 1360
tcp flags syn tcp option maxseg size set rt mtu
---------------
+You can also remove tcp options via reset keyword.
+
+.remove tcp option
+---------------
+tcp flags syn reset tcp option sack-perm
+---------------
+
LOG STATEMENT
~~~~~~~~~~~~~
[verse]
@@ -164,37 +171,77 @@ REJECT STATEMENT
____
*reject* [ *with* 'REJECT_WITH' ]
-'REJECT_WITH' := *icmp* 'icmp_code' |
- *icmpv6* 'icmpv6_code' |
- *icmpx* 'icmpx_code' |
+'REJECT_WITH' := *icmp* 'icmp_reject_code' |
+ *icmpv6* 'icmpv6_reject_code' |
+ *icmpx* 'icmpx_reject_code' |
*tcp reset*
____
A reject statement is used to send back an error packet in response to the
matched packet otherwise it is equivalent to drop so it is a terminating
statement, ending rule traversal. This statement is only valid in base chains
-using the *input*,
+using the *prerouting*, *input*,
*forward* or *output* hooks, and user-defined chains which are only called from
those chains.
-.different ICMP reject variants are meant for use in different table families
+.Keywords may be used to reject when specifying the ICMP code
[options="header"]
|==================
-|Variant |Family | Type
-|icmp|
-ip|
-icmp_code
-|icmpv6|
-ip6|
-icmpv6_code
-|icmpx|
-inet|
-icmpx_code
+|Keyword | Value
+|net-unreachable |
+0
+|host-unreachable |
+1
+|prot-unreachable|
+2
+|port-unreachable|
+3
+|frag-needed|
+4
+|net-prohibited|
+9
+|host-prohibited|
+10
+|admin-prohibited|
+13
+|===================
+
+.keywords may be used to reject when specifying the ICMPv6 code
+[options="header"]
+|==================
+|Keyword |Value
+|no-route|
+0
+|admin-prohibited|
+1
+|addr-unreachable|
+3
+|port-unreachable|
+4
+|policy-fail|
+5
+|reject-route|
+6
|==================
-For a description of the different types and a list of supported keywords refer
-to DATA TYPES section above. The common default reject value is
-*port-unreachable*. +
+The ICMPvX Code type abstraction is a set of values which overlap between ICMP
+and ICMPv6 Code types to be used from the inet family.
+
+.keywords may be used when specifying the ICMPvX code
+[options="header"]
+|==================
+|Keyword |Value
+|no-route|
+0
+|port-unreachable|
+1
+|host-unreachable|
+2
+|admin-prohibited|
+3
+|=================
+
+The common default ICMP code to reject is *port-unreachable*.
Note that in bridge family, reject statement is only allowed in base chains
which hook into input or prerouting.
@@ -271,7 +318,7 @@ ct event set new,related,destroy
NOTRACK STATEMENT
~~~~~~~~~~~~~~~~~
-The notrack statement allows to disable connection tracking for certain
+The notrack statement allows one to disable connection tracking for certain
packets.
[verse]
@@ -289,7 +336,7 @@ A meta statement sets the value of a meta expression. The existing meta fields
are: priority, mark, pkttype, nftrace. +
[verse]
-*meta* {*mark* | *priority* | *pkttype* | *nftrace*} *set* 'value'
+*meta* {*mark* | *priority* | *pkttype* | *nftrace* | *broute*} *set* 'value'
A meta statement sets meta data associated with a packet. +
@@ -309,6 +356,9 @@ pkt_type
|nftrace |
ruleset packet tracing on/off. Use *monitor trace* command to watch traces|
0, 1
+|broute |
+broute on/off. packets are routed instead of being bridged|
+0, 1
|==========================
LIMIT STATEMENT
@@ -325,8 +375,13 @@ ____
A limit statement matches at a limited rate using a token bucket filter. A rule
using this statement will match until this limit is reached. It can be used in
combination with the log statement to give limited logging. The optional
-*over* keyword makes it match over the specified rate. Default *burst* is 5.
-if you specify *burst*, it must be non-zero value.
+*over* keyword makes it match over the specified rate.
+
+The *burst* value influences the bucket size, i.e. jitter tolerance. With
+packet-based *limit*, the bucket holds exactly *burst* packets, by default
+five. If you specify packet *burst*, it must be a non-zero value. With
+byte-based *limit*, the bucket's minimum size is the given rate's byte value
+and the *burst* value adds to that, by default zero bytes.
.limit statement values
[options="header"]
@@ -344,8 +399,8 @@ NAT STATEMENTS
~~~~~~~~~~~~~~
[verse]
____
-*snat* [[*ip* | *ip6*] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
-*dnat* [[*ip* | *ip6*] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*snat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*dnat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
*masquerade* [*to :*'PORT_SPEC'] ['FLAGS']
*redirect* [*to :*'PORT_SPEC'] ['FLAGS']
@@ -383,6 +438,9 @@ Before kernel 4.18 nat statements require both prerouting and postrouting base c
to be present since otherwise packets on the return path won't be seen by
netfilter and therefore no reverse translation will take place.
+The optional *prefix* keyword allows to map to map *n* source addresses to *n*
+destination addresses. See 'Advanced NAT examples' below.
+
.NAT statement values
[options="header"]
|==================
@@ -393,7 +451,7 @@ You may specify a mapping to relate a list of tuples composed of arbitrary
expression key with address value. |
ipv4_addr, ipv6_addr, e.g. abcd::1234, or you can use a mapping, e.g. meta mark map { 10 : 192.168.1.2, 20 : 192.168.1.3 }
|port|
-Specifies that the source/destination address of the packet should be modified. |
+Specifies that the source/destination port of the packet should be modified. |
port number (16 bit)
|===============================
@@ -442,6 +500,52 @@ add rule inet nat postrouting meta oif ppp0 masquerade
------------------------
+.Advanced NAT examples
+----------------------
+
+# map prefixes in one network to that of another, e.g. 10.141.11.4 is mangled to 192.168.2.4,
+# 10.141.11.5 is mangled to 192.168.2.5 and so on.
+add rule nat postrouting snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24 }
+
+# map a source address, source port combination to a pool of destination addresses and ports:
+add rule nat postrouting dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 }
+
+# The above example generates the following NAT expression:
+#
+# [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ]
+#
+# which expects to obtain the following tuple:
+# IP address (min), source port (min), IP address (max), source port (max)
+# to be obtained from the map. The given addresses and ports are inclusive.
+
+# This also works with named maps and in combination with both concatenations and ranges:
+table ip nat {
+ map ipportmap {
+ typeof ip saddr : interval ip daddr . tcp dport
+ flags interval
+ elements = { 192.168.1.2 : 10.141.10.1-10.141.10.3 . 8888-8999, 192.168.2.0/24 : 10.141.11.5-10.141.11.20 . 8888-8999 }
+ }
+
+ chain prerouting {
+ type nat hook prerouting priority dstnat; policy accept;
+ ip protocol tcp dnat ip to ip saddr map @ipportmap
+ }
+}
+
+@ipportmap maps network prefixes to a range of hosts and ports.
+The new destination is taken from the range provided by the map element.
+Same for the destination port.
+
+Note the use of the "interval" keyword in the typeof description.
+This is required so nftables knows that it has to ask for twice the
+amount of storage for each key-value pair in the map.
+
+": ipv4_addr . inet_service" would allow associating one address and one port
+with each key. But for this case, for each key, two addresses and two ports
+(The minimum and maximum values for both) have to be stored.
+
+------------------------
+
TPROXY STATEMENT
~~~~~~~~~~~~~~~~
Tproxy redirects the packet to a local socket without changing the packet header
@@ -601,7 +705,7 @@ ____
QUEUE_EXPRESSION can be used to compute a queue number
at run-time with the hash or numgen expressions. It also
-allows to use the map statement to assign fixed queue numbers
+allows one to use the map statement to assign fixed queue numbers
based on external inputs such as the source ip address or interface names.
.queue statement values
@@ -671,7 +775,24 @@ The fwd statement is used to redirect a raw packet to another interface. It is
only available in the netdev family ingress and egress hooks. It is similar to
the dup statement except that no copy is made.
+You can also specify the address of the next hop and the device to forward the
+packet to. This updates the source and destination MAC address of the packet by
+transmitting it through the neighboring layer. This also decrements the ttl
+field of the IP packet. This provides a way to effectively bypass the classical
+forwarding path, thus skipping the fib (forwarding information base) lookup.
+
+[verse]
*fwd to* 'device'
+*fwd* [*ip* | *ip6*] *to* 'address' *device* 'device'
+
+.Using the fwd statement
+------------------------
+# redirect raw packet to device
+netdev ingress fwd to "eth0"
+
+# forward packet to next hop 192.168.200.1 via eth0 device
+netdev ingress ether saddr set fwd ip to 192.168.200.1 device "eth0"
+-----------------------------------
SET STATEMENT
~~~~~~~~~~~~~
@@ -687,6 +808,10 @@ will not grow indefinitely) either from the set definition or from the statement
that adds or updates them. The set statement can be used to e.g. create dynamic
blacklists.
+Dynamic updates are also supported with maps. In this case, the *add* or
+*update* rule needs to provide both the key and the data element (value),
+separated via ':'.
+
[verse]
{*add* | *update*} *@*'setname' *{* 'expression' [*timeout* 'timeout'] [*comment* 'string'] *}*
@@ -771,3 +896,20 @@ ____
# jump to different chains depending on layer 4 protocol type:
nft add rule ip filter input ip protocol vmap { tcp : jump tcp-chain, udp : jump udp-chain , icmp : jump icmp-chain }
------------------------
+
+XT STATEMENT
+~~~~~~~~~~~~
+This represents an xt statement from xtables compat interface. It is a
+fallback if translation is not available or not complete.
+
+[verse]
+____
+*xt* 'TYPE' 'NAME'
+
+'TYPE' := *match* | *target* | *watcher*
+____
+
+Seeing this means the ruleset (or parts of it) were created by *iptables-nft*
+and one should use that to manage it.
+
+*BEWARE:* nftables won't restore these statements.