summaryrefslogtreecommitdiffstats
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/Makefile.am30
-rw-r--r--doc/data-types.txt70
-rw-r--r--doc/libnftables-json.adoc71
-rw-r--r--doc/libnftables.adoc30
-rw-r--r--doc/nft.txt94
-rw-r--r--doc/payload-expression.txt148
-rw-r--r--doc/primary-expression.txt12
-rw-r--r--doc/stateful-objects.txt4
-rw-r--r--doc/statements.txt174
9 files changed, 464 insertions, 169 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
deleted file mode 100644
index 21482320..00000000
--- a/doc/Makefile.am
+++ /dev/null
@@ -1,30 +0,0 @@
-if BUILD_MAN
-man_MANS = nft.8 libnftables-json.5 libnftables.3
-
-A2X_OPTS_MANPAGE = -L --doctype manpage --format manpage -D ${builddir}
-
-ASCIIDOC_MAIN = nft.txt
-ASCIIDOC_INCLUDES = \
- data-types.txt \
- payload-expression.txt \
- primary-expression.txt \
- stateful-objects.txt \
- statements.txt
-ASCIIDOCS = ${ASCIIDOC_MAIN} ${ASCIIDOC_INCLUDES}
-
-EXTRA_DIST = ${ASCIIDOCS} ${man_MANS} libnftables-json.adoc libnftables.adoc
-
-CLEANFILES = \
- *~
-
-nft.8: ${ASCIIDOCS}
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-.adoc.3:
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-.adoc.5:
- ${AM_V_GEN}${A2X} ${A2X_OPTS_MANPAGE} $<
-
-CLEANFILES += ${man_MANS}
-endif
diff --git a/doc/data-types.txt b/doc/data-types.txt
index 961fc624..6c0e2f94 100644
--- a/doc/data-types.txt
+++ b/doc/data-types.txt
@@ -242,35 +242,13 @@ integer
The ICMP Code type is used to conveniently specify the ICMP header's code field.
-.Keywords may be used when specifying the ICMP code
-[options="header"]
-|==================
-|Keyword | Value
-|net-unreachable |
-0
-|host-unreachable |
-1
-|prot-unreachable|
-2
-|port-unreachable|
-3
-|frag-needed|
-4
-|net-prohibited|
-9
-|host-prohibited|
-10
-|admin-prohibited|
-13
-|===================
-
ICMPV6 TYPE TYPE
~~~~~~~~~~~~~~~~
[options="header"]
|==================
|Name | Keyword | Size | Base type
|ICMPv6 Type |
-icmpx_code |
+icmpv6_type |
8 bit |
integer
|===================
@@ -340,52 +318,6 @@ integer
The ICMPv6 Code type is used to conveniently specify the ICMPv6 header's code field.
-.keywords may be used when specifying the ICMPv6 code
-[options="header"]
-|==================
-|Keyword |Value
-|no-route|
-0
-|admin-prohibited|
-1
-|addr-unreachable|
-3
-|port-unreachable|
-4
-|policy-fail|
-5
-|reject-route|
-6
-|==================
-
-ICMPVX CODE TYPE
-~~~~~~~~~~~~~~~~
-[options="header"]
-|==================
-|Name | Keyword | Size | Base type
-|ICMPvX Code |
-icmpv6_type |
-8 bit |
-integer
-|===================
-
-The ICMPvX Code type abstraction is a set of values which overlap between ICMP
-and ICMPv6 Code types to be used from the inet family.
-
-.keywords may be used when specifying the ICMPvX code
-[options="header"]
-|==================
-|Keyword |Value
-|no-route|
-0
-|port-unreachable|
-1
-|host-unreachable|
-2
-|admin-prohibited|
-3
-|=================
-
CONNTRACK TYPES
~~~~~~~~~~~~~~~
diff --git a/doc/libnftables-json.adoc b/doc/libnftables-json.adoc
index bb59945f..a8a6165f 100644
--- a/doc/libnftables-json.adoc
+++ b/doc/libnftables-json.adoc
@@ -175,7 +175,7 @@ kind, optionally filtered by *family* and for some, also *table*.
____
*{ "reset":* 'RESET_OBJECT' *}*
-'RESET_OBJECT' := 'COUNTER' | 'COUNTERS' | 'QUOTA' | 'QUOTAS'
+'RESET_OBJECT' := 'COUNTER' | 'COUNTERS' | 'QUOTA' | 'QUOTAS' | 'RULE' | 'RULES' | 'SET' | 'MAP' | 'ELEMENT'
____
Reset state in suitable objects, i.e. zero their internal counter.
@@ -202,12 +202,19 @@ Rename a chain. The new name is expected in a dedicated property named
=== TABLE
[verse]
+____
*{ "table": {
"family":* 'STRING'*,
"name":* 'STRING'*,
- "handle":* 'NUMBER'
+ "handle":* 'NUMBER'*,
+ "flags":* 'TABLE_FLAGS'
*}}*
+'TABLE_FLAGS' := 'TABLE_FLAG' | *[* 'TABLE_FLAG_LIST' *]*
+'TABLE_FLAG_LIST' := 'TABLE_FLAG' [*,* 'TABLE_FLAG_LIST' ]
+'TABLE_FLAG' := *"dormant"* | *"owner"* | *"persist"*
+____
+
This object describes a table.
*family*::
@@ -217,6 +224,8 @@ This object describes a table.
*handle*::
The table's handle. In input, it is used only in *delete* command as
alternative to *name*.
+*flags*::
+ The table's flags.
=== CHAIN
[verse]
@@ -312,7 +321,8 @@ ____
"elem":* 'SET_ELEMENTS'*,
"timeout":* 'NUMBER'*,
"gc-interval":* 'NUMBER'*,
- "size":* 'NUMBER'
+ "size":* 'NUMBER'*,
+ "auto-merge":* 'BOOLEAN'
*}}*
*{ "map": {
@@ -327,7 +337,8 @@ ____
"elem":* 'SET_ELEMENTS'*,
"timeout":* 'NUMBER'*,
"gc-interval":* 'NUMBER'*,
- "size":* 'NUMBER'
+ "size":* 'NUMBER'*,
+ "auto-merge":* 'BOOLEAN'
*}}*
'SET_TYPE' := 'STRING' | *[* 'SET_TYPE_LIST' *]*
@@ -366,6 +377,8 @@ that they translate a unique key to a value.
Garbage collector interval in seconds.
*size*::
Maximum number of elements supported.
+*auto-merge*::
+ Automatic merging of adjacent/overlapping set elements in interval sets.
==== TYPE
The set type might be a string, such as *"ipv4_addr"* or an array
@@ -682,11 +695,6 @@ processing continues with the next rule in the same chain.
==== OPERATORS
[horizontal]
-*&*:: Binary AND
-*|*:: Binary OR
-*^*:: Binary XOR
-*<<*:: Left shift
-*>>*:: Right shift
*==*:: Equal
*!=*:: Not equal
*<*:: Less than
@@ -1059,10 +1067,22 @@ Assign connection tracking expectation.
=== XT
[verse]
-*{ "xt": null }*
+____
+*{ "xt": {
+ "type":* 'TYPENAME'*,
+ "name":* 'STRING'
+*}}*
+
+'TYPENAME' := *match* | *target* | *watcher*
+____
-This represents an xt statement from xtables compat interface. Sadly, at this
-point, it is not possible to provide any further information about its content.
+This represents an xt statement from xtables compat interface. It is a
+fallback if translation is not available or not complete.
+
+Seeing this means the ruleset (or parts of it) were created by *iptables-nft*
+and one should use that to manage it.
+
+*BEWARE:* nftables won't restore these statements.
== EXPRESSIONS
Expressions are the building blocks of (most) statements. In their most basic
@@ -1214,6 +1234,17 @@ If the *field* property is not given, the expression is to be used as an SCTP
chunk existence check in a *match* statement with a boolean on the right hand
side.
+=== DCCP OPTION
+[verse]
+*{ "dccp option": {
+ "type":* 'NUMBER'*
+*}}*
+
+Create a reference to a DCCP option (*type*).
+
+The expression is to be used as a DCCP option existence check in a *match*
+statement with a boolean on the right hand side.
+
=== META
[verse]
____
@@ -1321,15 +1352,17 @@ Perform kernel Forwarding Information Base lookups.
=== BINARY OPERATION
[verse]
-*{ "|": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "^": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "&": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ "+<<+": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
-*{ ">>": [* 'EXPRESSION'*,* 'EXPRESSION' *] }*
+*{ "|": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "^": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "&": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ "+<<+": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+*{ ">>": [* 'EXPRESSION'*,* 'EXPRESSIONS' *] }*
+'EXPRESSIONS' := 'EXPRESSION' | 'EXPRESSION'*,* 'EXPRESSIONS'
-All binary operations expect an array of exactly two expressions, of which the
+All binary operations expect an array of at least two expressions, of which the
first element denotes the left hand side and the second one the right hand
-side.
+side. Extra elements are accepted in the given array and appended to the term
+accordingly.
=== VERDICT
[verse]
diff --git a/doc/libnftables.adoc b/doc/libnftables.adoc
index 7ea0d56e..2cf78d7a 100644
--- a/doc/libnftables.adoc
+++ b/doc/libnftables.adoc
@@ -18,6 +18,9 @@ void nft_ctx_free(struct nft_ctx* '\*ctx'*);
bool nft_ctx_get_dry_run(struct nft_ctx* '\*ctx'*);
void nft_ctx_set_dry_run(struct nft_ctx* '\*ctx'*, bool* 'dry'*);
+unsigned int nft_ctx_input_get_flags(struct nft_ctx* '\*ctx'*);
+unsigned int nft_ctx_input_set_flags(struct nft_ctx* '\*ctx'*, unsigned int* 'flags'*);
+
unsigned int nft_ctx_output_get_flags(struct nft_ctx* '\*ctx'*);
void nft_ctx_output_set_flags(struct nft_ctx* '\*ctx'*, unsigned int* 'flags'*);
@@ -78,6 +81,30 @@ The *nft_ctx_get_dry_run*() function returns the dry-run setting's value contain
The *nft_ctx_set_dry_run*() function sets the dry-run setting in 'ctx' to the value of 'dry'.
+=== nft_ctx_input_get_flags() and nft_ctx_input_set_flags()
+The flags setting controls the input format.
+
+----
+enum {
+ NFT_CTX_INPUT_NO_DNS = (1 << 0),
+ NFT_CTX_INPUT_JSON = (1 << 1),
+};
+----
+
+NFT_CTX_INPUT_NO_DNS::
+ Avoid resolving IP addresses with blocking getaddrinfo(). In that case,
+ only plain IP addresses are accepted.
+
+NFT_CTX_INPUT_JSON:
+ When parsing the input, first try to interpret the input as JSON before
+ falling back to the nftables format. This behavior is implied when setting
+ the NFT_CTX_OUTPUT_JSON flag.
+
+The *nft_ctx_input_get_flags*() function returns the input flags setting's value in 'ctx'.
+
+The *nft_ctx_input_set_flags*() function sets the input flags setting in 'ctx' to the value of 'val'
+and returns the previous flags.
+
=== nft_ctx_output_get_flags() and nft_ctx_output_set_flags()
The flags setting controls the output format.
@@ -118,7 +145,8 @@ NFT_CTX_OUTPUT_HANDLE::
NFT_CTX_OUTPUT_JSON::
If enabled at compile-time, libnftables accepts input in JSON format and is able to print output in JSON format as well.
See *libnftables-json*(5) for a description of the supported schema.
- This flag controls JSON output format, input is auto-detected.
+ This flag enables JSON output format. If the flag is set, the input will first be tried as JSON format,
+ before falling back to nftables format. This flag implies NFT_CTX_INPUT_JSON.
NFT_CTX_OUTPUT_ECHO::
The echo setting makes libnftables print the changes once they are committed to the kernel, just like a running instance of *nft monitor* would.
Amongst other things, this allows one to retrieve an added rule's handle atomically.
diff --git a/doc/nft.txt b/doc/nft.txt
index eb8df1d9..e4eb982e 100644
--- a/doc/nft.txt
+++ b/doc/nft.txt
@@ -321,10 +321,11 @@ Effectively, this is the nft-equivalent of *iptables-save* and
TABLES
------
[verse]
-{*add* | *create*} *table* ['family'] 'table' [ {*comment* 'comment' *;*'} *{ flags* 'flags' *; }*]
-{*delete* | *list* | *flush*} *table* ['family'] 'table'
+{*add* | *create*} *table* ['family'] 'table' [*{* [*comment* 'comment' *;*] [*flags* 'flags' *;*] *}*]
+{*delete* | *destroy* | *list* | *flush*} *table* ['family'] 'table'
*list tables* ['family']
*delete table* ['family'] *handle* 'handle'
+*destroy table* ['family'] *handle* 'handle'
Tables are containers for chains, sets and stateful objects. They are identified
by their address family and their name. The address family must be one of *ip*,
@@ -342,8 +343,17 @@ return an error.
|Flag | Description
|dormant |
table is not evaluated any more (base chains are unregistered).
+|owner |
+table is owned by the creating process.
+|persist |
+table shall outlive the owning process.
|=================
+Creating a table with flag *owner* excludes other processes from manipulating
+it or its contents. By default, it will be removed when the process exits.
+Setting flag *persist* will prevent this and the resulting orphaned table will
+accept a new owner, e.g. a restarting daemon maintaining the table.
+
.*Add, change, delete a table*
---------------------------------------
# start nft in interactive mode
@@ -368,16 +378,18 @@ add table inet mytable
[horizontal]
*add*:: Add a new table for the given family with the given name.
*delete*:: Delete the specified table.
+*destroy*:: Delete the specified table, it does not fail if it does not exist.
*list*:: List all chains and rules of the specified table.
*flush*:: Flush all chains and rules of the specified table.
CHAINS
------
[verse]
-{*add* | *create*} *chain* ['family'] 'table' 'chain' [*{ type* 'type' *hook* 'hook' [*device* 'device'] *priority* 'priority' *;* [*policy* 'policy' *;*] [*comment* 'comment' *;*'] *}*]
-{*delete* | *list* | *flush*} *chain* ['family'] 'table' 'chain'
+{*add* | *create*} *chain* ['family'] 'table' 'chain' [*{ type* 'type' *hook* 'hook' [*device* 'device'] *priority* 'priority' *;* [*policy* 'policy' *;*] [*comment* 'comment' *;*] *}*]
+{*delete* | *destroy* | *list* | *flush*} *chain* ['family'] 'table' 'chain'
*list chains* ['family']
*delete chain* ['family'] 'table' *handle* 'handle'
+*destroy chain* ['family'] 'table' *handle* 'handle'
*rename chain* ['family'] 'table' 'chain' 'newname'
Chains are containers for rules. They exist in two kinds, base chains and
@@ -390,6 +402,7 @@ organization.
are specified, the chain is created as a base chain and hooked up to the networking stack.
*create*:: Similar to the *add* command, but returns an error if the chain already exists.
*delete*:: Delete the specified chain. The chain must not contain any rules or be used as jump target.
+*destroy*:: Delete the specified chain, it does not fail if it does not exist. The chain must not contain any rules or be used as jump target.
*rename*:: Rename the specified chain.
*list*:: List all rules of the specified chain.
*flush*:: Flush all rules of the specified chain.
@@ -430,11 +443,19 @@ further quirks worth noticing:
*prerouting*, *input*, *forward*, *output*, *postrouting* and this *ingress*
hook.
+The *device* parameter accepts a network interface name as a string, and is
+required when adding a base chain that filters traffic on the ingress or
+egress hooks. Any ingress or egress chains will only filter traffic from the
+interface specified in the *device* parameter.
+
The *priority* parameter accepts a signed integer value or a standard priority
name which specifies the order in which chains with the same *hook* value are
traversed. The ordering is ascending, i.e. lower priority values have precedence
over higher ones.
+With *nat* type chains, there's a lower excluding limit of -200 for *priority*
+values, because conntrack hooks at this priority and NAT requires it.
+
Standard priority values can be replaced with easily memorizable names. Not all
names make sense in every family with every hook (see the compatibility matrices
below) but their numerical value can still be used for prioritizing chains.
@@ -481,7 +502,9 @@ RULES
[verse]
{*add* | *insert*} *rule* ['family'] 'table' 'chain' [*handle* 'handle' | *index* 'index'] 'statement' ... [*comment* 'comment']
*replace rule* ['family'] 'table' 'chain' *handle* 'handle' 'statement' ... [*comment* 'comment']
-*delete rule* ['family'] 'table' 'chain' *handle* 'handle'
+{*delete* | *reset*} *rule* ['family'] 'table' 'chain' *handle* 'handle'
+*destroy rule* ['family'] 'table' 'chain' *handle* 'handle'
+*reset rules* ['family'] ['table' ['chain']]
Rules are added to chains in the given table. If the family is not specified, the
ip family is used. Rules are constructed from two kinds of components according
@@ -509,6 +532,8 @@ case the rule is inserted after the specified rule.
beginning of the chain or before the specified rule.
*replace*:: Similar to *add*, but the rule replaces the specified rule.
*delete*:: Delete the specified rule.
+*destroy*:: Delete the specified rule, it does not fail if it does not exist.
+*reset*:: Reset rule-contained state, e.g. counter and quota statement values.
.*add a rule to ip table output chain*
-------------
@@ -559,10 +584,10 @@ section describes nft set syntax in more detail.
[verse]
*add set* ['family'] 'table' 'set' *{ type* 'type' | *typeof* 'expression' *;* [*flags* 'flags' *;*] [*timeout* 'timeout' *;*] [*gc-interval* 'gc-interval' *;*] [*elements = {* 'element'[*,* ...] *} ;*] [*size* 'size' *;*] [*comment* 'comment' *;*'] [*policy* 'policy' *;*] [*auto-merge ;*] *}*
-{*delete* | *list* | *flush*} *set* ['family'] 'table' 'set'
+{*delete* | *destroy* | *list* | *flush* | *reset* } *set* ['family'] 'table' 'set'
*list sets* ['family']
*delete set* ['family'] 'table' *handle* 'handle'
-{*add* | *delete*} *element* ['family'] 'table' 'set' *{* 'element'[*,* ...] *}*
+{*add* | *delete* | *destroy* } *element* ['family'] 'table' 'set' *{* 'element'[*,* ...] *}*
Sets are element containers of a user-defined data type, they are uniquely
identified by a user-defined name and attached to tables. Their behaviour can
@@ -571,8 +596,10 @@ be tuned with the flags that can be specified at set creation time.
[horizontal]
*add*:: Add a new set in the specified table. See the Set specification table below for more information about how to specify properties of a set.
*delete*:: Delete the specified set.
+*destroy*:: Delete the specified set, it does not fail if it does not exist.
*list*:: Display the elements in the specified set.
*flush*:: Remove all elements from the specified set.
+*reset*:: Reset state in all contained elements, e.g. counter and quota statement values.
.Set specifications
[options="header"]
@@ -585,8 +612,7 @@ string: ipv4_addr, ipv6_addr, ether_addr, inet_proto, inet_service, mark
data type of set element |
expression to derive the data type from
|flags |
-set flags |
-string: constant, dynamic, interval, timeout
+set flags | string: constant, dynamic, interval, timeout. Used to describe the sets properties.
|timeout |
time an element stays in the set, mandatory if set is added to from the packet path (ruleset)|
string, decimal followed by unit. Units are: d, h, m, s
@@ -612,7 +638,7 @@ MAPS
-----
[verse]
*add map* ['family'] 'table' 'map' *{ type* 'type' | *typeof* 'expression' [*flags* 'flags' *;*] [*elements = {* 'element'[*,* ...] *} ;*] [*size* 'size' *;*] [*comment* 'comment' *;*'] [*policy* 'policy' *;*] *}*
-{*delete* | *list* | *flush*} *map* ['family'] 'table' 'map'
+{*delete* | *destroy* | *list* | *flush* | *reset* } *map* ['family'] 'table' 'map'
*list maps* ['family']
Maps store data based on some specific key used as input. They are uniquely identified by a user-defined name and attached to tables.
@@ -620,10 +646,10 @@ Maps store data based on some specific key used as input. They are uniquely iden
[horizontal]
*add*:: Add a new map in the specified table.
*delete*:: Delete the specified map.
+*destroy*:: Delete the specified map, it does not fail if it does not exist.
*list*:: Display the elements in the specified map.
*flush*:: Remove all elements from the specified map.
-*add element*:: Comma-separated list of elements to add into the specified map.
-*delete element*:: Comma-separated list of element keys to delete from the specified map.
+*reset*:: Reset state in all contained elements, e.g. counter and quota statement values.
.Map specifications
[options="header"]
@@ -637,7 +663,7 @@ data type of set element |
expression to derive the data type from
|flags |
map flags |
-string: constant, interval
+string, same as set flags
|elements |
elements contained by the map |
map data type
@@ -649,12 +675,28 @@ map policy |
string: performance [default], memory
|=================
+Users can specifiy the properties/features that the set/map must support.
+This allows the kernel to pick an optimal internal representation.
+If a required flag is missing, the ruleset might still work, as
+nftables will auto-enable features if it can infer this from the ruleset.
+This may not work for all cases, however, so it is recommended to
+specify all required features in the set/map definition manually.
+
+.Set and Map flags
+[options="header"]
+|=================
+|Flag | Description
+|constant | Set contents will never change after creation
+|dynamic | Set must support updates from the packet path with the *add*, *update* or *delete* keywords.
+|interval | Set must be able to store intervals (ranges)
+|timeout | Set must support element timeouts (auto-removal of elements once they expire).
+|=================
ELEMENTS
--------
[verse]
____
-{*add* | *create* | *delete* | *get* } *element* ['family'] 'table' 'set' *{* 'ELEMENT'[*,* ...] *}*
+{*add* | *create* | *delete* | *destroy* | *get* | *reset* } *element* ['family'] 'table' 'set' *{* 'ELEMENT'[*,* ...] *}*
'ELEMENT' := 'key_expression' 'OPTIONS' [*:* 'value_expression']
'OPTIONS' := [*timeout* 'TIMESPEC'] [*expires* 'TIMESPEC'] [*comment* 'string']
@@ -674,6 +716,9 @@ listed elements may already exist.
be non-trivial in very large and/or interval sets. In the latter case, the
containing interval is returned instead of just the element itself.
+*reset* command resets state attached to the given element(s), e.g. counter and
+quota statement values.
+
.Element options
[options="header"]
|=================
@@ -692,7 +737,7 @@ FLOWTABLES
[verse]
{*add* | *create*} *flowtable* ['family'] 'table' 'flowtable' *{ hook* 'hook' *priority* 'priority' *; devices = {* 'device'[*,* ...] *} ; }*
*list flowtables* ['family']
-{*delete* | *list*} *flowtable* ['family'] 'table' 'flowtable'
+{*delete* | *destroy* | *list*} *flowtable* ['family'] 'table' 'flowtable'
*delete* *flowtable* ['family'] 'table' *handle* 'handle'
Flowtables allow you to accelerate packet forwarding in software. Flowtables
@@ -702,8 +747,8 @@ protocols. Each entry also caches the destination interface and the gateway
address - to update the destination link-layer address - to forward packets.
The ttl and hoplimit fields are also decremented. Hence, flowtables provides an
alternative path that allow packets to bypass the classic forwarding path.
-Flowtables reside in the ingress hook that is located before the prerouting
-hook. You can select which flows you want to offload through the flow
+Flowtables reside in the ingress *hook* that is located before the prerouting
+*hook*. You can select which flows you want to offload through the flow
expression from the forward chain. Flowtables are identified by their address
family and their name. The address family must be one of ip, ip6, or inet. The inet
address family is a dummy family which is used to create hybrid IPv4/IPv6
@@ -716,6 +761,7 @@ and subtraction can be used to set relative priority, e.g. filter + 5 equals to
[horizontal]
*add*:: Add a new flowtable for the given family with the given name.
*delete*:: Delete the specified flowtable.
+*destroy*:: Delete the specified flowtable, it does not fail if it does not exist.
*list*:: List all flowtables.
LISTING
@@ -732,19 +778,22 @@ kernel modules, such as nf_conntrack.
STATEFUL OBJECTS
----------------
[verse]
-{*add* | *delete* | *list* | *reset*} *counter* ['family'] 'table' 'object'
-{*add* | *delete* | *list* | *reset*} *quota* ['family'] 'table' 'object'
-{*add* | *delete* | *list*} *limit* ['family'] 'table' 'object'
+{*add* | *delete* | *destroy* | *list* | *reset*} *counter* ['family'] 'table' 'object'
+{*add* | *delete* | *destroy* | *list* | *reset*} *quota* ['family'] 'table' 'object'
+{*add* | *delete* | *destroy* | *list*} *limit* ['family'] 'table' 'object'
*delete* 'counter' ['family'] 'table' *handle* 'handle'
*delete* 'quota' ['family'] 'table' *handle* 'handle'
*delete* 'limit' ['family'] 'table' *handle* 'handle'
+*destroy* 'counter' ['family'] 'table' *handle* 'handle'
+*destroy* 'quota' ['family'] 'table' *handle* 'handle'
+*destroy* 'limit' ['family'] 'table' *handle* 'handle'
*list counters* ['family']
*list quotas* ['family']
*list limits* ['family']
*reset counters* ['family']
*reset quotas* ['family']
-*reset counters* ['family'] *table* 'table'
-*reset quotas* ['family'] *table* 'table'
+*reset counters* ['family'] 'table'
+*reset quotas* ['family'] 'table'
Stateful objects are attached to tables and are identified by a unique name.
They group stateful information from rules, to reference them in rules the
@@ -753,6 +802,7 @@ keywords "type name" are used e.g. "counter name".
[horizontal]
*add*:: Add a new stateful object in the specified table.
*delete*:: Delete the specified object.
+*destroy*:: Delete the specified object, it does not fail if it does not exist.
*list*:: Display stateful information the object holds.
*reset*:: List-and-reset stateful object.
diff --git a/doc/payload-expression.txt b/doc/payload-expression.txt
index 113f5bfc..7bc24a8a 100644
--- a/doc/payload-expression.txt
+++ b/doc/payload-expression.txt
@@ -134,6 +134,14 @@ Destination address |
ipv4_addr
|======================
+Careful with matching on *ip length*: If GRO/GSO is enabled, then the Linux
+kernel might aggregate several packets into one big packet that is larger than
+MTU. Moreover, if GRO/GSO maximum size is larger than 65535 (see man ip-link(8),
+specifically gro_ipv6_max_size and gso_ipv6_max_size), then *ip length* might
+be 0 for such jumbo packets. *meta length* allows you to match on the packet
+length including the IP header size. If you want to perform heuristics on the
+*ip length* field, then disable GRO/GSO.
+
ICMP HEADER EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~
[verse]
@@ -244,6 +252,14 @@ Destination address |
ipv6_addr
|=======================
+Careful with matching on *ip6 length*: If GRO/GSO is enabled, then the Linux
+kernel might aggregate several packets into one big packet that is larger than
+MTU. Moreover, if GRO/GSO maximum size is larger than 65535 (see man ip-link(8),
+specifically gro_ipv6_max_size and gso_ipv6_max_size), then *ip6 length* might
+be 0 for such jumbo packets. *meta length* allows you to match on the packet
+length including the IP header size. If you want to perform heuristics on the
+*ip6 length* field, then disable GRO/GSO.
+
.Using ip6 header expressions
-----------------------------
# matching if first extension header indicates a fragment
@@ -253,7 +269,7 @@ ip6 nexthdr ipv6-frag
ICMPV6 HEADER EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~~~
[verse]
-*icmpv6* {*type* | *code* | *checksum* | *parameter-problem* | *packet-too-big* | *id* | *sequence* | *max-delay*}
+*icmpv6* {*type* | *code* | *checksum* | *parameter-problem* | *packet-too-big* | *id* | *sequence* | *max-delay* | *taddr* | *daddr*}
This expression refers to ICMPv6 header fields. When using it in *inet*,
*bridge* or *netdev* families, it will cause an implicit dependency on IPv6 to
@@ -288,6 +304,12 @@ integer (16 bit)
|max-delay|
maximum response delay of MLD queries|
integer (16 bit)
+|taddr|
+target address of neighbor solicit/advert, redirect or MLD|
+ipv6_addr
+|daddr|
+destination address of redirect|
+ipv6_addr
|==============================
TCP HEADER EXPRESSION
@@ -532,6 +554,122 @@ compression Parameter Index |
integer (16 bit)
|============================
+GRE HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*gre* {*flags* | *version* | *protocol*}
+*gre* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*gre* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+
+The gre expression is used to match on the gre header fields. This expression
+also allows to match on the IPv4 or IPv6 packet within the gre header.
+
+.GRE header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|flags|
+checksum, routing, key, sequence and strict source route flags|
+integer (5 bit)
+|version|
+gre version field, 0 for GRE and 1 for PPTP|
+integer (3 bit)
+|protocol|
+EtherType of encapsulated packet|
+integer (16 bit)
+|==================
+
+.Matching inner IPv4 destination address encapsulated in gre
+------------------------------------------------------------
+netdev filter ingress gre ip daddr 9.9.9.9 counter
+------------------------------------------------------------
+
+GENEVE HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*geneve* {*vni* | *flags*}
+*geneve* *ether* {*daddr* | *saddr* | *type*}
+*geneve* *vlan* {*id* | *dei* | *pcp* | *type*}
+*geneve* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*geneve* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*geneve* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*geneve* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The geneve expression is used to match on the geneve header fields. The geneve
+header encapsulates a ethernet frame within a *udp* packet. This expression
+requires that you restrict the matching to *udp* packets (usually at
+port 6081 according to IANA-assigned ports).
+
+.GENEVE header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|protocol|
+EtherType of encapsulated packet|
+integer (16 bit)
+|vni|
+Virtual Network ID (VNI)|
+integer (24 bit)
+|==================
+
+.Matching inner TCP destination port encapsulated in geneve
+----------------------------------------------------------
+netdev filter ingress udp dport 4789 geneve tcp dport 80 counter
+----------------------------------------------------------
+
+GRETAP HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*gretap* {*vni* | *flags*}
+*gretap* *ether* {*daddr* | *saddr* | *type*}
+*gretap* *vlan* {*id* | *dei* | *pcp* | *type*}
+*gretap* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*gretap* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*gretap* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*gretap* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The gretap expression is used to match on the encapsulated ethernet frame
+within the gre header. Use the *gre* expression to match on the *gre* header
+fields.
+
+.Matching inner TCP destination port encapsulated in gretap
+----------------------------------------------------------
+netdev filter ingress gretap tcp dport 80 counter
+----------------------------------------------------------
+
+VXLAN HEADER EXPRESSION
+~~~~~~~~~~~~~~~~~~~~~~~
+[verse]
+*vxlan* {*vni* | *flags*}
+*vxlan* *ether* {*daddr* | *saddr* | *type*}
+*vxlan* *vlan* {*id* | *dei* | *pcp* | *type*}
+*vxlan* *ip* {*version* | *hdrlength* | *dscp* | *ecn* | *length* | *id* | *frag-off* | *ttl* | *protocol* | *checksum* | *saddr* | *daddr* }
+*vxlan* *ip6* {*version* | *dscp* | *ecn* | *flowlabel* | *length* | *nexthdr* | *hoplimit* | *saddr* | *daddr*}
+*vxlan* *tcp* {*sport* | *dport* | *sequence* | *ackseq* | *doff* | *reserved* | *flags* | *window* | *checksum* | *urgptr*}
+*vxlan* *udp* {*sport* | *dport* | *length* | *checksum*}
+
+The vxlan expression is used to match on the vxlan header fields. The vxlan
+header encapsulates a ethernet frame within a *udp* packet. This expression
+requires that you restrict the matching to *udp* packets (usually at
+port 4789 according to IANA-assigned ports).
+
+.VXLAN header expression
+[options="header"]
+|==================
+|Keyword| Description| Type
+|flags|
+vxlan flags|
+integer (8 bit)
+|vni|
+Virtual Network ID (VNI)|
+integer (24 bit)
+|==================
+
+.Matching inner TCP destination port encapsulated in vxlan
+----------------------------------------------------------
+netdev filter ingress udp dport 4789 vxlan tcp dport 80 counter
+----------------------------------------------------------
+
RAW PAYLOAD EXPRESSION
~~~~~~~~~~~~~~~~~~~~~~
[verse]
@@ -556,6 +694,8 @@ Link layer, for example the Ethernet header
Network header, for example IPv4 or IPv6
|th|
Transport Header, for example TCP
+|ih|
+Inner Header / Payload, i.e. after the L4 transport level header
|==============================
.Matching destination port of both UDP and TCP
@@ -597,6 +737,7 @@ The following syntaxes are valid only in a relational expression with boolean ty
*exthdr* {*hbh* | *frag* | *rt* | *dst* | *mh*}
*tcp option* {*eol* | *nop* | *maxseg* | *window* | *sack-perm* | *sack* | *sack0* | *sack1* | *sack2* | *sack3* | *timestamp*}
*ip option* { lsrr | ra | rr | ssrr }
+*dccp option* 'dccp_option_type'
.IPv6 extension headers
[options="header"]
@@ -699,6 +840,11 @@ ip6 filter input frag more-fragments 1 counter
filter input ip option lsrr exists counter
---------------------------------------
+.finding DCCP option
+------------------
+filter input dccp option 40 exists counter
+---------------------------------------
+
CONNTRACK EXPRESSIONS
~~~~~~~~~~~~~~~~~~~~~
Conntrack expressions refer to meta data of the connection tracking entry associated with a packet. +
diff --git a/doc/primary-expression.txt b/doc/primary-expression.txt
index e13970cf..782494bd 100644
--- a/doc/primary-expression.txt
+++ b/doc/primary-expression.txt
@@ -168,15 +168,18 @@ Either an integer or a date in ISO format. For example: "2019-06-06 17:00".
Hour and seconds are optional and can be omitted if desired. If omitted,
midnight will be assumed.
The following three would be equivalent: "2019-06-06", "2019-06-06 00:00"
-and "2019-06-06 00:00:00".
+and "2019-06-06 00:00:00". Use a range expression such as
+"2019-06-06 10:00"-"2019-06-10 14:00" for matching a time range.
When an integer is given, it is assumed to be a UNIX timestamp.
|day|
Either a day of week ("Monday", "Tuesday", etc.), or an integer between 0 and 6.
Strings are matched case-insensitively, and a full match is not expected (e.g. "Mon" would match "Monday").
-When an integer is given, 0 is Sunday and 6 is Saturday.
+When an integer is given, 0 is Sunday and 6 is Saturday. Use a range expression
+such as "Monday"-"Wednesday" for matching a week day range.
|hour|
A string representing an hour in 24-hour format. Seconds can optionally be specified.
-For example, 17:00 and 17:00:00 would be equivalent.
+For example, 17:00 and 17:00:00 would be equivalent. Use a range expression such
+as "17:00"-"19:00" for matching a time range.
|=============================
.Using meta expressions
@@ -190,6 +193,9 @@ filter output oif eth0
# incoming packet was subject to ipsec processing
raw prerouting meta ipsec exists accept
+
+# match incoming packet from 03:00 to 14:00 local time
+raw prerouting meta hour "03:00"-"14:00" counter accept
-----------------------
SOCKET EXPRESSION
diff --git a/doc/stateful-objects.txt b/doc/stateful-objects.txt
index e3c79220..5824d53a 100644
--- a/doc/stateful-objects.txt
+++ b/doc/stateful-objects.txt
@@ -94,7 +94,7 @@ table ip filter {
ct timeout customtimeout {
protocol tcp;
l3proto ip
- policy = { established: 120, close: 20 }
+ policy = { established: 2m, close: 20s }
}
chain output {
@@ -119,7 +119,7 @@ sport=41360 dport=22
CT EXPECTATION
~~~~~~~~~~~~~~
[verse]
-*add* *ct expectation* ['family'] 'table' 'name' *{ protocol* 'protocol' *; dport* 'dport' *; timeout* 'timeout' *; size* 'size' *; [*l3proto* 'family' *;*] *}*
+*add* *ct expectation* ['family'] 'table' 'name' *{ protocol* 'protocol' *; dport* 'dport' *; timeout* 'timeout' *; size* 'size' *;* [*l3proto* 'family' *;*] *}*
*delete* *ct expectation* ['family'] 'table' 'name'
*list* *ct expectations*
diff --git a/doc/statements.txt b/doc/statements.txt
index 8076c21c..39b31fd2 100644
--- a/doc/statements.txt
+++ b/doc/statements.txt
@@ -171,37 +171,77 @@ REJECT STATEMENT
____
*reject* [ *with* 'REJECT_WITH' ]
-'REJECT_WITH' := *icmp* 'icmp_code' |
- *icmpv6* 'icmpv6_code' |
- *icmpx* 'icmpx_code' |
+'REJECT_WITH' := *icmp* 'icmp_reject_code' |
+ *icmpv6* 'icmpv6_reject_code' |
+ *icmpx* 'icmpx_reject_code' |
*tcp reset*
____
A reject statement is used to send back an error packet in response to the
matched packet otherwise it is equivalent to drop so it is a terminating
statement, ending rule traversal. This statement is only valid in base chains
-using the *input*,
+using the *prerouting*, *input*,
*forward* or *output* hooks, and user-defined chains which are only called from
those chains.
-.different ICMP reject variants are meant for use in different table families
+.Keywords may be used to reject when specifying the ICMP code
[options="header"]
|==================
-|Variant |Family | Type
-|icmp|
-ip|
-icmp_code
-|icmpv6|
-ip6|
-icmpv6_code
-|icmpx|
-inet|
-icmpx_code
+|Keyword | Value
+|net-unreachable |
+0
+|host-unreachable |
+1
+|prot-unreachable|
+2
+|port-unreachable|
+3
+|frag-needed|
+4
+|net-prohibited|
+9
+|host-prohibited|
+10
+|admin-prohibited|
+13
+|===================
+
+.keywords may be used to reject when specifying the ICMPv6 code
+[options="header"]
|==================
+|Keyword |Value
+|no-route|
+0
+|admin-prohibited|
+1
+|addr-unreachable|
+3
+|port-unreachable|
+4
+|policy-fail|
+5
+|reject-route|
+6
+|==================
+
+The ICMPvX Code type abstraction is a set of values which overlap between ICMP
+and ICMPv6 Code types to be used from the inet family.
+
+.keywords may be used when specifying the ICMPvX code
+[options="header"]
+|==================
+|Keyword |Value
+|no-route|
+0
+|port-unreachable|
+1
+|host-unreachable|
+2
+|admin-prohibited|
+3
+|=================
-For a description of the different types and a list of supported keywords refer
-to DATA TYPES section above. The common default reject value is
-*port-unreachable*. +
+The common default ICMP code to reject is *port-unreachable*.
Note that in bridge family, reject statement is only allowed in base chains
which hook into input or prerouting.
@@ -296,7 +336,7 @@ A meta statement sets the value of a meta expression. The existing meta fields
are: priority, mark, pkttype, nftrace. +
[verse]
-*meta* {*mark* | *priority* | *pkttype* | *nftrace*} *set* 'value'
+*meta* {*mark* | *priority* | *pkttype* | *nftrace* | *broute*} *set* 'value'
A meta statement sets meta data associated with a packet. +
@@ -316,6 +356,9 @@ pkt_type
|nftrace |
ruleset packet tracing on/off. Use *monitor trace* command to watch traces|
0, 1
+|broute |
+broute on/off. packets are routed instead of being bridged|
+0, 1
|==========================
LIMIT STATEMENT
@@ -356,8 +399,8 @@ NAT STATEMENTS
~~~~~~~~~~~~~~
[verse]
____
-*snat* [[*ip* | *ip6*] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
-*dnat* [[*ip* | *ip6*] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*snat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
+*dnat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS']
*masquerade* [*to :*'PORT_SPEC'] ['FLAGS']
*redirect* [*to :*'PORT_SPEC'] ['FLAGS']
@@ -395,6 +438,9 @@ Before kernel 4.18 nat statements require both prerouting and postrouting base c
to be present since otherwise packets on the return path won't be seen by
netfilter and therefore no reverse translation will take place.
+The optional *prefix* keyword allows to map to map *n* source addresses to *n*
+destination addresses. See 'Advanced NAT examples' below.
+
.NAT statement values
[options="header"]
|==================
@@ -405,7 +451,7 @@ You may specify a mapping to relate a list of tuples composed of arbitrary
expression key with address value. |
ipv4_addr, ipv6_addr, e.g. abcd::1234, or you can use a mapping, e.g. meta mark map { 10 : 192.168.1.2, 20 : 192.168.1.3 }
|port|
-Specifies that the source/destination address of the packet should be modified. |
+Specifies that the source/destination port of the packet should be modified. |
port number (16 bit)
|===============================
@@ -454,6 +500,52 @@ add rule inet nat postrouting meta oif ppp0 masquerade
------------------------
+.Advanced NAT examples
+----------------------
+
+# map prefixes in one network to that of another, e.g. 10.141.11.4 is mangled to 192.168.2.4,
+# 10.141.11.5 is mangled to 192.168.2.5 and so on.
+add rule nat postrouting snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24 }
+
+# map a source address, source port combination to a pool of destination addresses and ports:
+add rule nat postrouting dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 }
+
+# The above example generates the following NAT expression:
+#
+# [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ]
+#
+# which expects to obtain the following tuple:
+# IP address (min), source port (min), IP address (max), source port (max)
+# to be obtained from the map. The given addresses and ports are inclusive.
+
+# This also works with named maps and in combination with both concatenations and ranges:
+table ip nat {
+ map ipportmap {
+ typeof ip saddr : interval ip daddr . tcp dport
+ flags interval
+ elements = { 192.168.1.2 : 10.141.10.1-10.141.10.3 . 8888-8999, 192.168.2.0/24 : 10.141.11.5-10.141.11.20 . 8888-8999 }
+ }
+
+ chain prerouting {
+ type nat hook prerouting priority dstnat; policy accept;
+ ip protocol tcp dnat ip to ip saddr map @ipportmap
+ }
+}
+
+@ipportmap maps network prefixes to a range of hosts and ports.
+The new destination is taken from the range provided by the map element.
+Same for the destination port.
+
+Note the use of the "interval" keyword in the typeof description.
+This is required so nftables knows that it has to ask for twice the
+amount of storage for each key-value pair in the map.
+
+": ipv4_addr . inet_service" would allow associating one address and one port
+with each key. But for this case, for each key, two addresses and two ports
+(The minimum and maximum values for both) have to be stored.
+
+------------------------
+
TPROXY STATEMENT
~~~~~~~~~~~~~~~~
Tproxy redirects the packet to a local socket without changing the packet header
@@ -683,7 +775,24 @@ The fwd statement is used to redirect a raw packet to another interface. It is
only available in the netdev family ingress and egress hooks. It is similar to
the dup statement except that no copy is made.
+You can also specify the address of the next hop and the device to forward the
+packet to. This updates the source and destination MAC address of the packet by
+transmitting it through the neighboring layer. This also decrements the ttl
+field of the IP packet. This provides a way to effectively bypass the classical
+forwarding path, thus skipping the fib (forwarding information base) lookup.
+
+[verse]
*fwd to* 'device'
+*fwd* [*ip* | *ip6*] *to* 'address' *device* 'device'
+
+.Using the fwd statement
+------------------------
+# redirect raw packet to device
+netdev ingress fwd to "eth0"
+
+# forward packet to next hop 192.168.200.1 via eth0 device
+netdev ingress ether saddr set fwd ip to 192.168.200.1 device "eth0"
+-----------------------------------
SET STATEMENT
~~~~~~~~~~~~~
@@ -699,6 +808,10 @@ will not grow indefinitely) either from the set definition or from the statement
that adds or updates them. The set statement can be used to e.g. create dynamic
blacklists.
+Dynamic updates are also supported with maps. In this case, the *add* or
+*update* rule needs to provide both the key and the data element (value),
+separated via ':'.
+
[verse]
{*add* | *update*} *@*'setname' *{* 'expression' [*timeout* 'timeout'] [*comment* 'string'] *}*
@@ -783,3 +896,20 @@ ____
# jump to different chains depending on layer 4 protocol type:
nft add rule ip filter input ip protocol vmap { tcp : jump tcp-chain, udp : jump udp-chain , icmp : jump icmp-chain }
------------------------
+
+XT STATEMENT
+~~~~~~~~~~~~
+This represents an xt statement from xtables compat interface. It is a
+fallback if translation is not available or not complete.
+
+[verse]
+____
+*xt* 'TYPE' 'NAME'
+
+'TYPE' := *match* | *target* | *watcher*
+____
+
+Seeing this means the ruleset (or parts of it) were created by *iptables-nft*
+and one should use that to manage it.
+
+*BEWARE:* nftables won't restore these statements.