Diffstat (limited to 'doc/statements.txt')
-rw-r--r-- | doc/statements.txt | 306 |
1 files changed, 223 insertions, 83 deletions
diff --git a/doc/statements.txt b/doc/statements.txt index 9155f286..39b31fd2 100644 --- a/doc/statements.txt +++ b/doc/statements.txt @@ -11,7 +11,7 @@ The verdict statement alters control flow in the ruleset and issues policy decis [horizontal] *accept*:: Terminate ruleset evaluation and accept the packet. The packet can still be dropped later by another hook, for instance accept -in the forward hook still allows to drop the packet later in the postrouting hook, +in the forward hook still allows one to drop the packet later in the postrouting hook, or another forward base chain that has a higher priority number and is evaluated afterwards in the processing pipeline. *drop*:: Terminate ruleset evaluation and drop the packet. @@ -71,7 +71,7 @@ EXTENSION HEADER STATEMENT The extension header statement alters packet content in variable-sized headers. This can currently be used to alter the TCP Maximum segment size of packets, -similar to TCPMSS. +similar to the TCPMSS target in iptables. .change tcp mss --------------- @@ -80,6 +80,13 @@ tcp flags syn tcp option maxseg size set 1360 tcp flags syn tcp option maxseg size set rt mtu --------------- +You can also remove tcp options via reset keyword. + +.remove tcp option +--------------- +tcp flags syn reset tcp option sack-perm +--------------- + LOG STATEMENT ~~~~~~~~~~~~~ [verse] @@ -93,10 +100,11 @@ packets, such as header fields, via the kernel log (where it can be read with dmesg(1) or read in the syslog). In the second form of invocation (if 'nflog_group' is specified), the Linux -kernel will pass the packet to nfnetlink_log which will multicast the packet -through a netlink socket to the specified multicast group. One or more userspace -processes may subscribe to the group to receive the packets, see -libnetfilter_queue documentation for details. +kernel will pass the packet to nfnetlink_log which will send the log through a +netlink socket to the specified group. 
One userspace process may subscribe to +the group to receive the logs, see man(8) ulogd for the Netfilter userspace log +daemon and libnetfilter_log documentation for details in case you would like to +develop a custom program to digest your logs. In the third form of invocation (if level audit is specified), the Linux kernel writes a message into the audit buffer suitably formatted for reading @@ -163,37 +171,77 @@ REJECT STATEMENT ____ *reject* [ *with* 'REJECT_WITH' ] -'REJECT_WITH' := *icmp type* 'icmp_code' | - *icmpv6 type* 'icmpv6_code' | - *icmpx type* 'icmpx_code' | +'REJECT_WITH' := *icmp* 'icmp_reject_code' | + *icmpv6* 'icmpv6_reject_code' | + *icmpx* 'icmpx_reject_code' | *tcp reset* ____ A reject statement is used to send back an error packet in response to the matched packet otherwise it is equivalent to drop so it is a terminating statement, ending rule traversal. This statement is only valid in base chains -using the *input*, +using the *prerouting*, *input*, *forward* or *output* hooks, and user-defined chains which are only called from those chains. -.different ICMP reject variants are meant for use in different table families +.Keywords may be used to reject when specifying the ICMP code [options="header"] |================== -|Variant |Family | Type -|icmp| -ip| -icmp_code -|icmpv6| -ip6| -icmpv6_code -|icmpx| -inet| -icmpx_code +|Keyword | Value +|net-unreachable | +0 +|host-unreachable | +1 +|prot-unreachable| +2 +|port-unreachable| +3 +|frag-needed| +4 +|net-prohibited| +9 +|host-prohibited| +10 +|admin-prohibited| +13 +|=================== + +.keywords may be used to reject when specifying the ICMPv6 code +[options="header"] |================== +|Keyword |Value +|no-route| +0 +|admin-prohibited| +1 +|addr-unreachable| +3 +|port-unreachable| +4 +|policy-fail| +5 +|reject-route| +6 +|================== + +The ICMPvX Code type abstraction is a set of values which overlap between ICMP +and ICMPv6 Code types to be used from the inet family. 
+ +.keywords may be used when specifying the ICMPvX code +[options="header"] +|================== +|Keyword |Value +|no-route| +0 +|port-unreachable| +1 +|host-unreachable| +2 +|admin-prohibited| +3 +|================= -For a description of the different types and a list of supported keywords refer -to DATA TYPES section above. The common default reject value is -*port-unreachable*. + +The common default ICMP code to reject is *port-unreachable*. Note that in bridge family, reject statement is only allowed in base chains which hook into input or prerouting. @@ -216,7 +264,7 @@ The conntrack statement can be used to set the conntrack mark and conntrack labe The ct statement sets meta data associated with a connection. The zone id has to be assigned before a conntrack lookup takes place, i.e. this has to be done in prerouting and possibly output (if locally generated packets need to be -placed in a distinct zone), with a hook priority of -300. +placed in a distinct zone), with a hook priority of *raw* (-300). Unlike iptables, where the helper assignment happens in the raw table, the helper needs to be assigned after a conntrack entry has been @@ -253,11 +301,11 @@ ct mark set meta mark ------------------------------ table inet raw { chain prerouting { - type filter hook prerouting priority -300; + type filter hook prerouting priority raw; ct zone set iif map { "eth1" : 1, "veth1" : 2 } } chain output { - type filter hook output priority -300; + type filter hook output priority raw; ct zone set oif map { "eth1" : 1, "veth1" : 2 } } } @@ -270,7 +318,7 @@ ct event set new,related,destroy NOTRACK STATEMENT ~~~~~~~~~~~~~~~~~ -The notrack statement allows to disable connection tracking for certain +The notrack statement allows one to disable connection tracking for certain packets. [verse] @@ -278,7 +326,7 @@ packets. Note that for this statement to be effective, it has to be applied to packets before a conntrack lookup happens. 
Therefore, it needs to sit in a chain with -either prerouting or output hook and a hook priority of -300 or less. +either prerouting or output hook and a hook priority of -300 (*raw*) or less. See SYNPROXY STATEMENT for an example usage. @@ -288,7 +336,7 @@ A meta statement sets the value of a meta expression. The existing meta fields are: priority, mark, pkttype, nftrace. + [verse] -*meta* {*mark* | *priority* | *pkttype* | *nftrace*} *set* 'value' +*meta* {*mark* | *priority* | *pkttype* | *nftrace* | *broute*} *set* 'value' A meta statement sets meta data associated with a packet. + @@ -308,6 +356,9 @@ pkt_type |nftrace | ruleset packet tracing on/off. Use *monitor trace* command to watch traces| 0, 1 +|broute | +broute on/off. packets are routed instead of being bridged| +0, 1 |========================== LIMIT STATEMENT @@ -326,6 +377,12 @@ using this statement will match until this limit is reached. It can be used in combination with the log statement to give limited logging. The optional *over* keyword makes it match over the specified rate. +The *burst* value influences the bucket size, i.e. jitter tolerance. With +packet-based *limit*, the bucket holds exactly *burst* packets, by default +five. If you specify packet *burst*, it must be a non-zero value. With +byte-based *limit*, the bucket's minimum size is the given rate's byte value +and the *burst* value adds to that, by default zero bytes. 
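As a reading aid for the *burst* paragraph just added, the bucket sizing it describes can be sketched in a few lines of Python. This is only an illustration of the wording in the hunk, not the kernel implementation; the function name `bucket_capacity` and the `"packets"`/`"bytes"` labels are hypothetical and chosen for this sketch.

```python
from typing import Optional

DEFAULT_PACKET_BURST = 5  # "by default five" in the text
DEFAULT_BYTE_BURST = 0    # "by default zero bytes" in the text


def bucket_capacity(unit: str, rate: int, burst: Optional[int] = None) -> int:
    """Return the bucket size implied by the documentation text.

    unit  -- "packets" or "bytes" (hypothetical labels for this sketch)
    rate  -- numeric part of the configured rate, in packets or bytes
    burst -- configured burst, or None to use the documented default
    """
    if unit == "packets":
        # Packet-based limit: the bucket holds exactly `burst` packets.
        if burst is None:
            return DEFAULT_PACKET_BURST
        if burst <= 0:
            # The text requires an explicit packet burst to be non-zero;
            # negative values are rejected here as well.
            raise ValueError("an explicit packet burst must be non-zero")
        return burst
    if unit == "bytes":
        # Byte-based limit: minimum size is the rate's byte value,
        # and the burst value is added on top of it.
        return rate + (DEFAULT_BYTE_BURST if burst is None else burst)
    raise ValueError(f"unknown unit: {unit!r}")
```

Under these assumptions, a byte-based limit of 1000 bytes with a burst of 500 bytes gives a bucket of 1500 bytes, while a packet-based limit ignores the rate when sizing the bucket.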
+ .limit statement values [options="header"] |================== @@ -342,21 +399,16 @@ NAT STATEMENTS ~~~~~~~~~~~~~~ [verse] ____ -*snat to* 'address' [*:*'port'] ['PRF_FLAGS'] -*snat to* 'address' *-* 'address' [*:*'port' *-* 'port'] ['PRF_FLAGS'] -*snat* { *ip* | *ip6* } *to* 'address' *-* 'address' [*:*'port' *-* 'port'] ['PR_FLAGS'] -*dnat to* 'address' [*:*'port'] ['PRF_FLAGS'] -*dnat to* 'address' [*:*'port' *-* 'port'] ['PR_FLAGS'] -*dnat* { *ip* | *ip6* } *to* 'address' [*:*'port' *-* 'port'] ['PR_FLAGS'] -*masquerade to* [*:*'port'] ['PRF_FLAGS'] -*masquerade to* [*:*'port' *-* 'port'] ['PRF_FLAGS'] -*redirect to* [*:*'port'] ['PRF_FLAGS'] -*redirect to* [*:*'port' *-* 'port'] ['PRF_FLAGS'] - -'PRF_FLAGS' := 'PRF_FLAG' [*,* 'PRF_FLAGS'] -'PR_FLAGS' := 'PR_FLAG' [*,* 'PR_FLAGS'] -'PRF_FLAG' := 'PR_FLAG' | *fully-random* -'PR_FLAG' := *persistent* | *random* +*snat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS'] +*dnat* [[*ip* | *ip6*] [ *prefix* ] *to*] 'ADDR_SPEC' [*:*'PORT_SPEC'] ['FLAGS'] +*masquerade* [*to :*'PORT_SPEC'] ['FLAGS'] +*redirect* [*to :*'PORT_SPEC'] ['FLAGS'] + +'ADDR_SPEC' := 'address' | 'address' *-* 'address' +'PORT_SPEC' := 'port' | 'port' *-* 'port' + +'FLAGS' := 'FLAG' [*,* 'FLAGS'] +'FLAG' := *persistent* | *random* | *fully-random* ____ The nat statements are only valid from nat chain types. + @@ -386,6 +438,9 @@ Before kernel 4.18 nat statements require both prerouting and postrouting base c to be present since otherwise packets on the return path won't be seen by netfilter and therefore no reverse translation will take place. +The optional *prefix* keyword allows to map to map *n* source addresses to *n* +destination addresses. See 'Advanced NAT examples' below. + .NAT statement values [options="header"] |================== @@ -396,7 +451,7 @@ You may specify a mapping to relate a list of tuples composed of arbitrary expression key with address value. | ipv4_addr, ipv6_addr, e.g. 
abcd::1234, or you can use a mapping, e.g. meta mark map { 10 : 192.168.1.2, 20 : 192.168.1.3 } |port| -Specifies that the source/destination address of the packet should be modified. | +Specifies that the source/destination port of the packet should be modified. | port number (16 bit) |=============================== @@ -419,8 +474,8 @@ If used then port mapping is generated based on a 32-bit pseudo-random algorithm --------------------- # create a suitable table/chain setup for all further examples add table nat -add chain nat prerouting { type nat hook prerouting priority 0; } -add chain nat postrouting { type nat hook postrouting priority 100; } +add chain nat prerouting { type nat hook prerouting priority dstnat; } +add chain nat postrouting { type nat hook postrouting priority srcnat; } # translate source addresses of all packets leaving via eth0 to address 1.2.3.4 add rule nat postrouting oif eth0 snat to 1.2.3.4 @@ -445,6 +500,52 @@ add rule inet nat postrouting meta oif ppp0 masquerade ------------------------ +.Advanced NAT examples +---------------------- + +# map prefixes in one network to that of another, e.g. 10.141.11.4 is mangled to 192.168.2.4, +# 10.141.11.5 is mangled to 192.168.2.5 and so on. +add rule nat postrouting snat ip prefix to ip saddr map { 10.141.11.0/24 : 192.168.2.0/24 } + +# map a source address, source port combination to a pool of destination addresses and ports: +add rule nat postrouting dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 } + +# The above example generates the following NAT expression: +# +# [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ] +# +# which expects to obtain the following tuple: +# IP address (min), source port (min), IP address (max), source port (max) +# to be obtained from the map. The given addresses and ports are inclusive. 
+ +# This also works with named maps and in combination with both concatenations and ranges: +table ip nat { + map ipportmap { + typeof ip saddr : interval ip daddr . tcp dport + flags interval + elements = { 192.168.1.2 : 10.141.10.1-10.141.10.3 . 8888-8999, 192.168.2.0/24 : 10.141.11.5-10.141.11.20 . 8888-8999 } + } + + chain prerouting { + type nat hook prerouting priority dstnat; policy accept; + ip protocol tcp dnat ip to ip saddr map @ipportmap + } +} + +@ipportmap maps network prefixes to a range of hosts and ports. +The new destination is taken from the range provided by the map element. +Same for the destination port. + +Note the use of the "interval" keyword in the typeof description. +This is required so nftables knows that it has to ask for twice the +amount of storage for each key-value pair in the map. + +": ipv4_addr . inet_service" would allow associating one address and one port +with each key. But for this case, for each key, two addresses and two ports +(The minimum and maximum values for both) have to be stored. + +------------------------ + TPROXY STATEMENT ~~~~~~~~~~~~~~~~ Tproxy redirects the packet to a local socket without changing the packet header @@ -481,21 +582,21 @@ this case the rule will match for both families. 
------------------------------------- table ip x { chain y { - type filter hook prerouting priority -150; policy accept; + type filter hook prerouting priority mangle; policy accept; tcp dport ntp tproxy to 1.1.1.1 udp dport ssh tproxy to :2222 } } table ip6 x { chain y { - type filter hook prerouting priority -150; policy accept; + type filter hook prerouting priority mangle; policy accept; tcp dport ntp tproxy to [dead::beef] udp dport ssh tproxy to :2222 } } table inet x { chain y { - type filter hook prerouting priority -150; policy accept; + type filter hook prerouting priority mangle; policy accept; tcp dport 321 tproxy to :ssh tcp dport 99 tproxy ip to 1.1.1.1:999 udp dport 155 tproxy ip6 to [dead::beef]:smux @@ -566,28 +667,13 @@ drop incorrect cookies. Flags combinations not expected during 3WHS will not match and continue (e.g. SYN+FIN, SYN+ACK). Finally, drop invalid packets, this will be out-of-flow packets that were not matched by SYNPROXY. - table ip foo { + table ip x { chain z { type filter hook input priority filter; policy accept; - ct state { invalid, untracked } synproxy mss 1460 wscale 9 timestamp sack-perm + ct state invalid, untracked synproxy mss 1460 wscale 9 timestamp sack-perm ct state invalid drop } } - -The outcome ruleset of the steps above should be similar to the one below. - - table ip x { - chain y { - type filter hook prerouting priority raw; policy accept; - tcp flags syn notrack - } - - chain z { - type filter hook input priority filter; policy accept; - ct state { invalid, untracked } synproxy mss 1460 wscale 9 timestamp sack-perm - ct state invalid drop - } - } --------------------------------------- FLOW STATEMENT @@ -608,13 +694,19 @@ for details. 
[verse] ____ -*queue* [*num* 'queue_number'] [*bypass*] -*queue* [*num* 'queue_number_from' - 'queue_number_to'] ['QUEUE_FLAGS'] +*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'queue_number'] +*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'queue_number_from' - 'queue_number_to'] +*queue* [*flags* 'QUEUE_FLAGS'] [*to* 'QUEUE_EXPRESSION' ] 'QUEUE_FLAGS' := 'QUEUE_FLAG' [*,* 'QUEUE_FLAGS'] 'QUEUE_FLAG' := *bypass* | *fanout* +'QUEUE_EXPRESSION' := *numgen* | *hash* | *symhash* | *MAP STATEMENT* ____ +QUEUE_EXPRESSION can be used to compute a queue number +at run-time with the hash or numgen expressions. It also +allows one to use the map statement to assign fixed queue numbers +based on external inputs such as the source ip address or interface names. .queue statement values [options="header"] @@ -670,7 +762,7 @@ string ip filter forward dup to 10.2.3.4 device "eth0" # copy raw frame to another interface -netdetv ingress dup to "eth0" +netdev ingress dup to "eth0" dup to "eth0" # combine with map dst addr to gateways @@ -680,10 +772,27 @@ dup to ip daddr map { 192.168.7.1 : "eth0", 192.168.7.2 : "eth1" } FWD STATEMENT ~~~~~~~~~~~~~ The fwd statement is used to redirect a raw packet to another interface. It is -only available in the netdev family ingress hook. It is similar to the dup -statement except that no copy is made. +only available in the netdev family ingress and egress hooks. It is similar to +the dup statement except that no copy is made. +You can also specify the address of the next hop and the device to forward the +packet to. This updates the source and destination MAC address of the packet by +transmitting it through the neighboring layer. This also decrements the ttl +field of the IP packet. This provides a way to effectively bypass the classical +forwarding path, thus skipping the fib (forwarding information base) lookup. 
+ +[verse] *fwd to* 'device' +*fwd* [*ip* | *ip6*] *to* 'address' *device* 'device' + +.Using the fwd statement +------------------------ +# redirect raw packet to device +netdev ingress fwd to "eth0" + +# forward packet to next hop 192.168.200.1 via eth0 device +netdev ingress ether saddr set fwd ip to 192.168.200.1 device "eth0" +----------------------------------- SET STATEMENT ~~~~~~~~~~~~~ @@ -699,13 +808,26 @@ will not grow indefinitely) either from the set definition or from the statement that adds or updates them. The set statement can be used to e.g. create dynamic blacklists. +Dynamic updates are also supported with maps. In this case, the *add* or +*update* rule needs to provide both the key and the data element (value), +separated via ':'. + [verse] {*add* | *update*} *@*'setname' *{* 'expression' [*timeout* 'timeout'] [*comment* 'string'] *}* .Example for simple blacklist ----------------------------- -# declare a set, bound to table "filter", in family "ip". Timeout and size are mandatory because we will add elements from packet path. -nft add set ip filter blackhole "{ type ipv4_addr; flags timeout; size 65536; }" +# declare a set, bound to table "filter", in family "ip". +# Timeout and size are mandatory because we will add elements from packet path. +# Entries will timeout after one minute, after which they might be +# re-added if limit condition persists. +nft add set ip filter blackhole \ + "{ type ipv4_addr; flags dynamic; timeout 1m; size 65536; }" + +# declare a set to store the limit per saddr. +# This must be separate from blackhole since the timeout is different +nft add set ip filter flood \ + "{ type ipv4_addr; flags dynamic; timeout 10s; size 128000; }" # whitelist internal interface. nft add rule ip filter input meta iifname "internal" accept @@ -713,17 +835,18 @@ nft add rule ip filter input meta iifname "internal" accept # drop packets coming from blacklisted ip addresses. 
nft add rule ip filter input ip saddr @blackhole counter drop -# add source ip addresses to the blacklist if more than 10 tcp connection requests occurred per second and ip address. -# entries will timeout after one minute, after which they might be re-added if limit condition persists. -nft add rule ip filter input tcp flags syn tcp dport ssh meter flood size 128000 { ip saddr timeout 10s limit rate over 10/second} add @blackhole { ip saddr timeout 1m } drop +# add source ip addresses to the blacklist if more than 10 tcp connection +# requests occurred per second and ip address. +nft add rule ip filter input tcp flags syn tcp dport ssh \ + add @flood { ip saddr limit rate over 10/second } \ + add @blackhole { ip saddr } \ + drop -# inspect state of the rate limit meter: -nft list meter ip filter flood - -# inspect content of blackhole: +# inspect state of the sets. +nft list set ip filter flood nft list set ip filter blackhole -# manually add two addresses to the set: +# manually add two addresses to the blackhole. nft add element filter blackhole { 10.2.3.4, 10.23.1.42 } ----------------------------------------------- @@ -773,3 +896,20 @@ ____ # jump to different chains depending on layer 4 protocol type: nft add rule ip filter input ip protocol vmap { tcp : jump tcp-chain, udp : jump udp-chain , icmp : jump icmp-chain } ------------------------ + +XT STATEMENT +~~~~~~~~~~~~ +This represents an xt statement from xtables compat interface. It is a +fallback if translation is not available or not complete. + +[verse] +____ +*xt* 'TYPE' 'NAME' + +'TYPE' := *match* | *target* | *watcher* +____ + +Seeing this means the ruleset (or parts of it) were created by *iptables-nft* +and one should use that to manage it. + +*BEWARE:* nftables won't restore these statements. |
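
Review note on the *prefix* keyword added in the NAT hunk above: the comment in the new 'Advanced NAT examples' says 10.141.11.4 is mangled to 192.168.2.4, which suggests the host part of the address is kept and only the network part is swapped. A minimal Python sketch of that arithmetic follows, assuming both prefixes have the same length. The helper name `map_prefix` is hypothetical, and this models the documented example only, not how the kernel performs the translation.

```python
import ipaddress


def map_prefix(addr: str, src_net: str, dst_net: str) -> str:
    """Map addr from src_net into dst_net, keeping the host bits."""
    src = ipaddress.ip_network(src_net)
    dst = ipaddress.ip_network(dst_net)
    ip = ipaddress.ip_address(addr)

    # Assumption of this sketch: same address family and prefix length,
    # so that n source addresses map onto n destination addresses.
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("prefixes must match in family and length")
    if ip.version != src.version or ip not in src:
        raise ValueError(f"{addr} is not inside {src_net}")

    # Offset of the address within the source prefix (the host bits).
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


print(map_prefix("10.141.11.4", "10.141.11.0/24", "192.168.2.0/24"))
print(map_prefix("10.141.11.5", "10.141.11.0/24", "192.168.2.0/24"))
```

The two calls reproduce the pair of translations named in the example comment (10.141.11.4 and 10.141.11.5).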