nftables - nft command line tool

	Commit message (Collapse)	Author	Age	Files	Lines
*	src: implement add/create/delete for ct helper objects	Florian Westphal	2017-03-16	1	-0/+4
\| \| \| \| \|	Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: allow listing all ct helpers	Florian Westphal	2017-03-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	this implements nft list ct helpers table filter table ip filter { ct helper ftp-standard { .. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: refactor CMD_OBJ_QUOTA/COUNTER handling	Florian Westphal	2017-03-16	1	-12/+20
\| \| \| \| \| \| \|	... to make adding CMD_OBJ_CT_HELPER support easier. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	fib: Support existence check	Phil Sutter	2017-03-13	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows to check whether a FIB entry exists for a given packet by comparing the expression with a boolean keyword like so: \| fib daddr oif exists The implementation requires introduction of a generic expression flag EXPR_F_BOOLEAN which allows relational expression to signal it's LHS that a boolean comparison is being done (indicated by boolean type on RHS). In contrast to exthdr existence checks, fib expression can't know this in beforehand because the LHS syntax is absolutely identical to a non-boolean comparison. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	exthdr: Implement existence check	Phil Sutter	2017-03-10	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \|	This allows to check for existence of an IPv6 extension or TCP option header by using the following syntax: \| exthdr frag exists \| tcpopt window exists Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: hash: support of symmetric hash	Laura Garcia Liebana	2017-03-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch provides symmetric hash support according to source ip address and port, and destination ip address and port. The new attribute NFTA_HASH_TYPE has been included to support different types of hashing functions. Currently supported NFT_HASH_JENKINS through jhash and NFT_HASH_SYM through symhash. The main difference between both types are: - jhash requires an expression with sreg, symhash doesn't. - symhash supports modulus and offset, but not seed. Examples: nft add rule ip nat prerouting ct mark set jhash ip saddr mod 2 nft add rule ip nat prerouting ct mark set symhash mod 2 Signed-off-by: Laura Garcia Liebana <laura.garcia@zevenet.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: store byteorder for set data	Pablo Neira Ayuso	2017-02-28	1	-1/+3
\| \| \| \| \| \| \| \| \|	Add new UDATA_SET_DATABYTEORDER attribute for NFTA_SET_UDATA to store the datatype byteorder. This is required if integer_type is used on the rhs of the mapping given that this datatype comes with no specific byteorder. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: rename set_keytype_alloc() to set_datatype_alloc()	Pablo Neira Ayuso	2017-02-28	1	-1/+1
\| \| \| \| \| \| \|	This function can be used either side of the map, so rename it to something generic. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: set byteorder as lhs expression context in stmt_evaluate_arg()	Pablo Neira Ayuso	2017-02-28	1	-9/+15
\| \| \| \| \| \| \|	stmt_evaluate_arg() needs to take the lhs map expression byteorder in order to evaluate the lhs of mappings accordingly. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: store byteorder for set keys	Pablo Neira Ayuso	2017-02-25	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Selectors that rely on the integer type and expect host endian byteorder don't work properly. We need to keep the byteorder around based on the left hand size expression that provides the context, so store the byteorder when evaluating the map. Before this patch. # nft --debug=netlink add rule x y meta mark set meta cpu map { 0 : 1, 1 : 2 } __map%d x b __map%d x 0 element 00000000 : 00000001 0 [end] element 01000000 : 00000002 0 [end] ^^^^^^^^ This is expressed in network byteorder, because the invalid byteorder defaults on this. After this patch: # nft --debug=netlink add rule x y meta mark set meta cpu map { 0 : 1, 1 : 2 } __map%d x b __map%d x 0 element 00000000 : 00000001 0 [end] element 00000001 : 00000002 0 [end] ^^^^^^^^ This is in host byteorder, as the key selector in the map mandates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add TCP option matching	Manuel Messner	2017-02-12	1	-1/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables nft to match against TCP options. Currently these TCP options are supported: * End of Option List (eol) * No-Operation (noop) * Maximum Segment Size (maxseg) * Window Scale (window) * SACK Permitted (sack_permitted) * SACK (sack) * Timestamps (timestamp) Syntax: tcp options $option_name [$offset] $field_name Example: # count all incoming packets with a specific maximum segment size `x` # nft add rule filter input tcp option maxseg size x counter # count all incoming packets with a SACK TCP option where the third # (counted from zero) left field is greater `x`. # nft add rule filter input tcp option sack 2 left \> x counter If the offset (the `2` in the example above) is zero, it can optionally be omitted. For all non-SACK TCP options it is always zero, thus can be left out. Option names and field names are parsed from templates, similar to meta and ct options rather than via keywords to prevent adding more keywords than necessary. Signed-off-by: Manuel Messner <mm@skelett.io> Signed-off-by: Florian Westphal <fw@strlen.de>
*	exthdr: prepare exthdr_gen_dependency for tcp support	Manuel Messner	2017-02-12	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \|	currently exthdr always needs ipv6 dependency (i.e. link layer), but with upcomming TCP option matching we also need to include TCP at the network layer. This patch prepares this change by adding two parameters to exthdr_gen_dependency. Signed-off-by: Manuel Messner <mm@skelett.io> Signed-off-by: Florian Westphal <fw@strlen.de>
*	evaluate: fix typo	Manuel Messner	2017-02-02	1	-1/+1
\| \| \| \| \|	Signed-off-by: Manuel Messner <mm@skelett.io> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: Allow list stateful objects in a table	Elise Lennion	2017-01-27	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, stateful objects can be listed by: listing all objects in all tables; listing a single object in a table. Now it's allowed to list all objects in a table. $ nft list counters table filter table ip filter { counter https-traffic { packets 14825 bytes 950063 } counter http-traffic { packets 117 bytes 9340 } } $ nft list quotas table filter table ip filter { quota https-quota { 25 mbytes used 2 mbytes } quota http-quota { 25 mbytes used 10 kbytes } } Signed-off-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Evaluate table name before reset stateful objects in a table	Elise Lennion	2017-01-27	1	-3/+4
\| \| \| \| \| \| \| \|	Reseting stateful objects in a single table is already implemented and cmd_evaluate_reset() now tests for the table name. Signed-off-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Allow list single stateful object	Elise Lennion	2017-01-27	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the stateful objects can only be listed in groups. With this patch listing a single object is allowed: $ nft list counter filter https-traffic table ip filter { counter https-traffic { packets 4014 bytes 228948 } } $ nft list quota filter https-quota table ip filter { quota https-quota { 25 mbytes used 278 kbytes } } Signed-off-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Allow reset single stateful object	Elise Lennion	2017-01-27	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the stateful objects can only be reseted in groups. With this patch reseting a single object is allowed: $ nft reset counter filter https-traffic table ip filter { counter https-traffic { packets 8774 bytes 542668 } } $ nft list counter filter https-traffic table ip filter { counter https-traffic { packets 0 bytes 0 } } Heavily based on work from Pablo Neira Ayuso <pablo@netfilter.org>. Signed-off-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: fix export length and data corruption	Florian Westphal	2017-01-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pablo reported that ipv6 tests would fail on some systems: WARNING: 'add rule --debug=netlink ip6 test-ip6 input ip6 flowlabel set 0': '[ bitwise reg 1 = (reg=1 & 0x000000f0 ) ^ 0x00000000 ]' mismatches '[ bitwise reg 1 = (reg=1 & 0x00000000 ) ^ 0x00000000 ]' ^ should be 'f' Problem is that mpz_export_data expects the size of the output buffer in bytes, but this gave bit-based size. Then, when mpz_export_data clears the output buffer it will also clear 8 extra bytes on stack; depending on compiler version (stack layout) this will then clear the bitmask value that we want to export. Fixes: 78936d50f306c ("evaluate: add support to set IPv6 non-byte header fields") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add support for stateful object maps	Pablo Neira Ayuso	2017-01-03	1	-1/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	You can create these maps using explicit map declarations: # nft add table filter # nft add chain filter input { type filter hook input priority 0\; } # nft add map filter badguys { type ipv4_addr : counter \; } # nft add rule filter input counter name ip saddr map @badguys # nft add counter filter badguy1 # nft add counter filter badguy2 # nft add element filter badguys { 192.168.2.3 : "badguy1" } # nft add element filter badguys { 192.168.2.4 : "badguy2" } Or through implicit map definitions: table ip filter { counter http-traffic { packets 8 bytes 672 } chain input { type filter hook input priority 0; policy accept; counter name tcp dport map { 80 : "http-traffic", 443 : "http-traffic"} } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add stateful object reference expression	Pablo Neira Ayuso	2017-01-03	1	-0/+16
\| \| \| \| \| \| \| \| \|	This patch adds a new objref statement to refer to existing stateful objects from rules, eg. # nft add rule filter input counter name test counter Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: reset internal stateful objects	Pablo Neira Ayuso	2017-01-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows you to atomically dump and reset stateful objects, eg. # nft list counters table ip filter { counter test { packets 1024 bytes 100000 } } # nft reset quotas table filter counter test { packets 1024 bytes 100000 } # nft reset quotas table filter counter test { packets 0 bytes 0 } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add/create/delete stateful objects	Pablo Neira Ayuso	2017-01-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows you to add and to delete objects, eg. # nft add quota filter test 1234567 bytes # nft list quotas table ip filter { quota test { 1234567 bytes } } # nft delete quota filter test Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: listing of stateful objects	Pablo Neira Ayuso	2017-01-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch allows you to dump existing stateful objects, eg. # nft list ruleset table ip filter { counter test { packets 64 bytes 1268 } quota test { over 1 mbytes used 1268 bytes } chain input { type filter hook input priority 0; policy accept; quota name test drop counter name test } } # nft list quotas table ip filter { quota test { over 1 mbytes used 1268 bytes } } # nft list counters table ip filter { counter test { packets 64 bytes 1268 } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: remove SET_F_* flag definitions	Pablo Neira Ayuso	2017-01-03	1	-17/+17
\| \| \| \| \| \| \|	They map exactly one to one to we have in the kernel headers, so use kernel definitions instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add support to flush sets	Pablo Neira Ayuso	2016-12-05	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	You can use this new command to remove all existing elements in a set: # nft flush set filter xyz After this command, the set 'xyz' in table 'filter' becomes empty. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: return ctx->table from table_lookup_global()	Pablo Neira Ayuso	2016-12-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Instead of returning ctx->cmd->table. Note that ctx->cmd->table and ctx->table points to the same object when all commands are embedded into the table definition. But this is not true if we mix table definitions with linear list commands in one file that we load via nft -f. Reported-by: Martin Bednar <martin@serafean.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Update cache on flush ruleset	Anatole Denis	2016-12-01	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	After a flush, the cache should be empty, otherwise the cache and the expected state are desynced, causing unwarranted errors. See tests/shell/testcases/cache/0002_interval_0. `flush table` and `flush chain` don't empty sets or destroy chains, so the cache does not need an update in those cases, since only chain names and set contents are held in cache for commands other than "list" Reported-by: Leon Merten Lohse <leon@green-side.de> Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Interpret OP_NEQ against a set as OP_LOOKUP	Anatole Denis	2016-11-29	1	-0/+14
\| \| \| \| \| \| \| \| \| \|	Now that the support for inverted matching is in the kernel and in libnftnl, add it to nftables too. This fixes bug #888 Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Revert "evaluate: check for NULL datatype in rhs in lookup expr"	Anatole Denis	2016-11-29	1	-23/+8
\| \| \| \| \| \| \| \| \|	This reverts commit 5afa5a164ff1c066af1ec56d875b91562882bd50. This commit is obsoleted by removing the possibility for a NULL right->dtype in the first place, at set declaration. Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Add set to cache only when well-formed	Anatole Denis	2016-11-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When creating a set (in set_evaluate), it is added to the table cache before being checked for correctness. When the set is ill-formed, the function returns without removing the (non-existent, since the function returned) set. Further references to this set will not result in an error (since the set is in the lookup table), but the malformed set will probably cause a segfault. The symptom (the segfault) was fixed by checking for NULL when evaluating a reference to the set (commit 5afa5a164ff1c066af1ec56d875b91562882bd50), this should fix the root cause. Signed-off-by: Anatole Denis <anatole@rezel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add log flags syntax support	Liping Zhang	2016-11-24	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now NF_LOG_XXX is exposed to the userspace, we can set it explicitly. Like iptables LOG target, we can log TCP sequence numbers, TCP options, IP options, UID owning local socket and decode MAC header. Note the log flags are mutually exclusive with group. Some examples are listed below: # nft add rule t c log flags tcp sequence,options # nft add rule t c log flags ip options # nft add rule t c log flags skuid # nft add rule t c log flags ether # nft add rule t c log flags all # nft add rule t c log flags all group 1 <cmdline>:1:14-16: Error: flags and group are mutually exclusive add rule t c log flags all group 1 ^^^ Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add notrack support	Pablo Neira Ayuso	2016-11-14	1	-0/+1
\| \| \| \| \| \| \|	This patch adds the notrack statement, to skip connection tracking for certain packets. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Allow concatenation of rt nexthop etc.	Anders K. Pedersen	2016-10-31	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Concatenations of rt nexthop or ct {orignal \| reply} {saddr \| daddr} fail due to # nft add rule ip filter postrouting flow table acct \{ ip saddr . rt nexthop counter \} <cmdline>:1:61-70: Error: can not use variable sized data types (invalid) in concat expressions add rule ip filter postrouting flow table acct { ip saddr . rt nexthop counter } ~~~~~~~~~~~^^^^^^^^^^ Fix this by reordering the check for variable size data types in expr_evaluate_concat() to happen after expr_evaluate() has been called (via list_member_evaluate()) for the sub expression. This allows expr_evaluate_[cr]t() to call [cr]t_expr_update_type() and set the data type before the check. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	Replace tests/files/expr-rt with Python based tests, and replace ether type	Anders K. Pedersen	2016-10-29	1	-2/+2
\| \| \| \| \| \| \|	with meta nfproto, which generates a bit fewer instructions. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add fib expression	Florian Westphal	2016-10-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the 'fib' expression which can be used to obtain the output interface from the route table based on either source or destination address of a packet. This can be used to e.g. add reverse path filtering: # drop if not coming from the same interface packet # arrived on # nft add rule x prerouting fib saddr . iif oif eq 0 drop # accept only if from eth0 # nft add rule x prerouting fib saddr . iif oif eq "eth0" accept # accept if from any valid interface # nft add rule x prerouting fib saddr oif accept Querying of address type is also supported. This can be used to e.g. only accept packets to addresses configured in the same interface: # fib daddr . iif type local Its also possible to use mark and verdict map, e.g.: # nft add rule x prerouting meta mark set 0xdead fib daddr . mark type vmap { blackhole : drop, prohibit : drop, unicast : accept } Signed-off-by: Florian Westphal <fw@strlen.de>
*	rt: introduce routing expression	Anders K. Pedersen	2016-10-28	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce rt expression for routing related data with support for nexthop (i.e. the directly connected IP address that an outgoing packet is sent to), which can be used either for matching or accounting, eg. # nft add rule filter postrouting \ ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop This will drop any traffic to 192.168.1.0/24 that is not routed via 192.168.0.1. # nft add rule filter postrouting \ flow table acct { rt nexthop timeout 600s counter } # nft add rule ip6 filter postrouting \ flow table acct { rt nexthop timeout 600s counter } These rules count outgoing traffic per nexthop. Note that the timeout releases an entry if no traffic is seen for this nexthop within 10 minutes. # nft add rule inet filter postrouting \ ether type ip \ flow table acct { rt nexthop timeout 600s counter } # nft add rule inet filter postrouting \ ether type ip6 \ flow table acct { rt nexthop timeout 600s counter } Same as above, but via the inet family, where the ether type must be specified explicitly. "rt classid" is also implemented identical to "meta rtclassid", since it is more logical to have this match in the routing expression going forward. Signed-off-by: Anders K. Pedersen <akp@cohaesio.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: display expression, statement and command name on debug	Pablo Neira Ayuso	2016-09-05	1	-3/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend debugging knob for evaluation to display the command, the expression and statement names. # nft --debug=eval add rule x y ip saddr 1.1.1.1 counter <cmdline>:1:1-37: Evaluate add add rule x y ip saddr 1.1.1.1 counter ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <cmdline>:1:14-29: Evaluate expression add rule x y ip saddr 1.1.1.1 counter ^^^^^^^^^^^^^^^^ ip saddr $1.1.1.1 <cmdline>:1:14-29: Evaluate relational add rule x y ip saddr 1.1.1.1 counter ^^^^^^^^^^^^^^^^ ip saddr $1.1.1.1 <cmdline>:1:14-21: Evaluate payload add rule x y ip saddr 1.1.1.1 counter ^^^^^^^^ ip saddr <cmdline>:1:23-29: Evaluate symbol add rule x y ip saddr 1.1.1.1 counter ^^^^^^^ <cmdline>:1:23-29: Evaluate value add rule x y ip saddr 1.1.1.1 counter ^^^^^^^ 1.1.1.1 <cmdline>:1:31-37: Evaluate counter add rule x y ip saddr 1.1.1.1 counter ^^^^^^^ counter packets 0 bytes 0 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Avoid undefined behaviour in concat_subtype_id()	Phil Sutter	2016-09-05	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	For the left side of a concat expression, dtype is NULL and therefore off is 0. In that case the code expects to get a datatype of TYPE_INVALID, but this is fragile as the output of concat_subtype_id() is undefined for n > 32 / TYPE_BITS. To fix this, call datatype_lookup() directly passing the expected TYPE_INVALID as argument if off is 0. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: reject: Have a generic fix for missing network context	Phil Sutter	2016-09-05	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 17b495957b29e ("evaluate: reject: fix crash if we have transport protocol conflict from inet") took care of a crash when using inet or bridge families, but since then netdev family has been added which also does not implicitly define the network context. Therefore the crash can be reproduced again using the following example: nft add rule netdev filter e1000-ingress \ meta l4proto udp reject with tcp reset In order to fix this in a more generic way, have stmt_evaluate_reset() fall back to the generic proto_inet_service irrespective of the actual proto context. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: Fix datalen checks in expr_evaluate_string()	Phil Sutter	2016-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have been told that the flex scanner won't return empty strings, so strlen(data) should always be greater 0. To avoid a hard to debug issue though, add an assert() to make sure this is always the case before risking an unsigned variable underrun. A real issue though is the check for 'datalen - 1 >= 0', which will never fail due to datalen being unsigned. Fix this by incrementing both sides by one, hence checking 'datalen >= 1'. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: validate maximum hash and numgen value	Pablo Neira Ayuso	2016-08-29	1	-6/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can validate that values don't get over the maximum datatype length, this is expressed in number of bits, so the maximum value is always power of 2. However, since we got the hash and numgen expressions, the user should not set a value higher that what the specified modulus option, which may not be power of 2. This patch extends the expression context with a new optional field to store the maximum value. After this patch, nft bails out if the user specifies non-sense rules like those below: # nft add rule x y jhash ip saddr mod 10 seed 0xa 10 <cmdline>:1:45-46: Error: Value 10 exceeds valid range 0-9 add rule x y jhash ip saddr mod 10 seed 0xa 10 ^^ The modulus sets a valid value range of [0, n), so n is out of the valid value range. # nft add rule x y numgen inc mod 10 eq 12 <cmdline>:1:35-36: Error: Value 12 exceeds valid range 0-9 add rule x y numgen inc mod 10 eq 12 ^^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: add expr_evaluate_integer()	Pablo Neira Ayuso	2016-08-29	1	-15/+23
\| \| \| \| \| \|	Add a helper function to wrap the integer evaluation code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add hash expression	Pablo Neira Ayuso	2016-08-29	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is special expression that transforms an input expression into a 32-bit unsigned integer. This expression takes a modulus parameter to scale the result and the random seed so the hash result becomes harder to predict. You can use it to set the packet mark, eg. # nft add rule x y meta mark set jhash ip saddr . ip daddr mod 2 seed 0xdeadbeef You can combine this with maps too, eg. # nft add rule x y dnat to jhash ip saddr mod 2 seed 0xdeadbeef map { \ 0 : 192.168.20.100, \ 1 : 192.168.30.100 \ } Currently, this expression implements the jenkins hash implementation available in the Linux kernel: http://lxr.free-electrons.com/source/include/linux/jhash.h But it should be possible to extend it to support any other hash function type. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add numgen expression	Pablo Neira Ayuso	2016-08-29	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new expression allows us to generate incremental and random numbers bound to a specified modulus value. The following rule sets the conntrack mark of 0 to the first packet seen, then 1 to second packet, then 0 again to the third packet and so on: # nft add rule x y ct mark set numgen inc mod 2 A more useful example is a simple load balancing scenario, where you can also use maps to set the destination NAT address based on this new numgen expression: # nft add rule nat prerouting \ dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 } So this is distributing new connections in a round-robin fashion between 192.168.10.100 and 192.168.20.200. Don't forget the special NAT chain semantics: Only the first packet evaluates the rule, follow up packets rely on conntrack to apply the NAT information. You can also emulate flow distribution with different backend weights using intervals: # nft add rule nat prerouting \ dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 } So 192.168.10.100 gets 60% of the workload, while 192.168.20.200 gets 40%. We can also be mixed with dynamic sets, thus weight can be updated in runtime. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: add quota statement	Pablo Neira Ayuso	2016-08-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This new statement is stateful, so it can be used from flow tables, eg. # nft add rule filter input \ flow table http { ip saddr timeout 60s quota over 50 mbytes } drop This basically sets a quota per source IP address of 50 mbytes after which packets are dropped. Note that the timeout releases the entry if no traffic is seen from this IP after 60 seconds. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	src: Simplify parser rule_spec tree	Carlos Falgueras García	2016-08-23	1	-67/+1
\| \| \| \| \| \| \| \| \| \|	This patch separates the rule identification from the rule localization, so the logic moves from the evaluator to the parser. This allows to revert the patch "evaluate: improve rule managment checks" (4176c7d30c2ff1b3f52468fc9c08b8df83f979a8) and saves a lot of code. Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netlink: make checksum fixup work with odd-sized header fields	Florian Westphal	2016-08-01	1	-4/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel checksum functions want even-sized lengths except for the last block at the end of the data. This means that nft --debug=netlink add rule filter output ip ecn set 1 must generate a two byte read and a two byte write: [ payload load 2b @ network header + 0 => reg 1 ] [ bitwise reg 1 = (reg=1 & 0x0000fcff ) ^ 0x00000100 ] [ payload write reg 1 => 2b @ network header + 0 csum_type 1 csum_off 10 ] Otherwise, while a one-byte write is enough, the kernel will generate invalid checksums (unless checksum is offloaded). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: add support to set IPv6 non-byte header fields	Florian Westphal	2016-08-01	1	-4/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'ip6 ecn set 1' will generate a zero-sized write operation. Just like when matching on bit-sized header fields we need to round up to a byte-sized quantity and add a mask to retain those bits outside of the header bits that we want to change. Example: ip6 ecn set ce [ payload load 1b @ network header + 1 => reg 1 ] [ bitwise reg 1 = (reg=1 & 0x000000cf ) ^ 0x00000030 ] [ payload write reg 1 => 1b @ network header + 1 csum_type 0 csum_off 0 ] 1. Load the full byte containing the ecn bits 2. Mask out everything BUT the ecn bits 3. Set the CE mark This patch only works if the protocol doesn't need a checksum fixup. Will address this in a followup patch. This also doesn't yet include the needed reverse translation. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	evaluate: add small helper to check if payload expr needs binop adjustment	Florian Westphal	2016-08-01	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \|	kernel can only deal with byte-sized and byte-aligned payload expressions. If the payload expression doesn't fit this requirement userspace has to add explicit binop masks to remove the unwanted part(s). Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
*	src: add xt compat support	Pablo Neira Ayuso	2016-07-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At compilation time, you have to pass this option. # ./configure --with-xtables And libxtables needs to be installed in your system. This patch allows to list a ruleset containing xt extensions loaded through iptables-compat-restore tool. Example: $ iptables-save > ruleset $ cat ruleset *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] -A INPUT -p tcp -m multiport --dports 80,81 -j REJECT COMMIT $ sudo iptables-compat-restore ruleset $ sudo nft list rulseset table ip filter { chain INPUT { type filter hook input priority 0; policy accept; ip protocol tcp tcp dport { 80,81} counter packets 0 bytes 0 reject } chain FORWARD { type filter hook forward priority 0; policy drop; } chain OUTPUT { type filter hook output priority 0; policy accept; } } A translation of the extension is shown if this is available. In other case, match or target definition is preceded by a hash. For example, classify target has not translation: $ sudo nft list chain mangle POSTROUTING table ip mangle { chain POSTROUTING { type filter hook postrouting priority -150; policy accept; ip protocol tcp tcp dport 80 counter packets 0 bytes 0 # CLASSIFY set 20:10 ^^^ } } If the whole ruleset is translatable, the users can (re)load it using "nft -f" and get nft native support for all their rules. This patch is joint work by the authors listed below. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>