summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* cache: add hashtable cache for setsPablo Neira Ayuso2021-04-037-72/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a hashtable for set lookups. This patch also splits table->sets in two: - Sets that reside in the cache are stored in the new tables->cache_set and tables->cache_set_ht. - Set that defined via command line / ruleset file reside in tables->set. Sets in the cache (already in the kernel) are not placed in the table->sets list. By keeping separated lists, sets defined via command line / ruleset file can be added to cache. Adding 10000 sets, before: # time nft -f x real 0m6,415s user 0m3,126s sys 0m3,284s After: # time nft -f x real 0m3,949s user 0m0,743s sys 0m3,205s Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: check for NULL chain in cache_init()Pablo Neira Ayuso2021-04-031-0/+5
| | | | | | | | | | | | | Another process might race to add chains after chain_cache_init(). The generation check does not help since it comes after cache_init(). NLM_F_DUMP_INTR only guarantees consistency within one single netlink dump operation, so it does not help either (cache population requires several netlink dump commands). Let's be safe and do not assume the chain exists in the cache when populating the rule cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: statify chain_cache_dump()Pablo Neira Ayuso2021-04-031-1/+2
| | | | | | Only used internally in cache.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: use chain hashtable for lookupsPablo Neira Ayuso2021-04-033-17/+6
| | | | | | | | | | | | | | | | | | | | | Instead of the linear list lookup. Before this patch: real 0m21,735s user 0m20,329s sys 0m1,384s After: real 0m10,910s user 0m9,448s sys 0m1,434s chain_lookup() is removed since linear list lookups are only used by the fuzzy chain name matching for error reporting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: split chain list in tablePablo Neira Ayuso2021-04-033-13/+17
| | | | | | | | | | | | | | | | | | | This patch splits table->lists in two: - Chains that reside in the cache are stored in the new tables->cache_chain and tables->cache_chain_ht. The hashtable chain cache allows for fast chain lookups. - Chains that defined via command line / ruleset file reside in tables->chains. Note that chains in the cache (already in the kernel) are not placed in the table->chains. By keeping separated lists, chains defined via command line / ruleset file can be added to cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: rename chain_htable to cache_chain_htPablo Neira Ayuso2021-04-032-6/+6
| | | | | | Rename the hashtable chain that is used for fast cache lookups. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* proto: replace vlan ether type with 8021qFlorian Westphal2021-04-032-1/+5
| | | | | | | | | | | | | Previous patches added "8021ad" mnemonic for IEEE 802.1AD frame type. This adds the 8021q shorthand for the existing 'vlan' frame type. nft will continue to recognize 'ether type vlan', but listing will now print 8021q. Adjust all test cases accordingly. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* payload: be careful on vlan dependency removalFlorian Westphal2021-04-031-3/+26
| | | | | | | 'vlan ...' implies 8021Q frame. In case the expression tests something else (802.1AD for example) its not an implictly added one, so keep it. Signed-off-by: Florian Westphal <fw@strlen.de>
* proto: add 8021ad as mnemonic for IEEE 802.1AD (0x88a8) ether typeFlorian Westphal2021-04-032-0/+2
| | | | | Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: vlan: allow matching vlan id insider 802.1ad frameFlorian Westphal2021-04-031-0/+3
| | | | | | | | | This makes "ether type 0x88a8 vlan id 342" work. Before this change, nft would still insert a dependency on 802.1q so the rule would never match. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink: don't crash when set elements are not evaluated as expectedFlorian Westphal2021-04-012-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | define foo = 2001:db8:123::/48 table inet filter { set foo { typeof ip6 saddr elements = $foo } } gives crash. This now exits with: stdin:1:14-30: Error: Unexpected initial set type prefix define foo = 2001:db8:123::/48 ^^^^^^^^^^^^^^^^^ For literals, bison parser protects us, as it enforces 'elements = { 2001:... '. For 'elements = $foo' we can't detect it at parsing stage as the '$foo' symbol might as well evaluate to "{ 2001, ...}" (i.e. we can't do a set element allocation). So at least detect this from set instantiaton. Signed-off-by: Florian Westphal <fw@strlen.de>
* parser_bison: simplify flowtable offload flag parserPablo Neira Ayuso2021-03-311-7/+4
| | | | | | Remove ft_flags_spec rule. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: do not set flowtable flags twicePablo Neira Ayuso2021-03-311-5/+0
| | | | | | | | Flags are already set on from mnl_nft_flowtable_add(), remove duplicated code. Fixes: e6cc9f37385 ("nftables: add flags offload to flowtable") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* rule: remove semicolon in flowtable offloadPablo Neira Ayuso2021-03-251-1/+1
| | | | | | opts->stmt_separator already prints the semicolon when needed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: fix scope closure of COUNTER tokenFlorian Westphal2021-03-251-3/+3
| | | | | | | | | It is closed after allocation, which is too early: this stopped 'packets' and 'bytes' from getting parsed correctly. Also add a test case for this. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add datatype->describe()Pablo Neira Ayuso2021-03-252-0/+17
| | | | | | | | | | | | As an alternative to print the datatype values when no symbol table is available. Use it to print protocols available via getprotobynumber() which actually refers to /etc/protocols. Not very efficient, getprotobynumber() causes a series of open()/close() calls on /etc/protocols, but this is called from a non-critical path. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1503 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* nftables: add flags offload to flowtableFrank Wunderlich2021-03-254-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allow flags (currently only offload) in flowtables like it is stated here: https://lwn.net/Articles/804384/ tested on mt7622/Bananapi-R64 table ip filter { flowtable f { hook ingress priority filter + 1 devices = { lan3, lan0, wan } flags offload; } chain forward { type filter hook forward priority filter; policy accept; ip protocol { tcp, udp } flow add @f } } table ip nat { chain post { type nat hook postrouting priority filter; policy accept; oifname "wan" masquerade } } Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* segtree: release single element already contained in an intervalPablo Neira Ayuso2021-03-241-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch: table ip x { chain y { ip saddr { 1.1.1.1-1.1.1.2, 1.1.1.1 } } } results in: table ip x { chain y { ip saddr { 1.1.1.1 } } } due to incorrect interval merge logic. If the element 1.1.1.1 is already contained in an existing interval 1.1.1.1-1.1.1.2, release it. Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1512 Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: add missing scope_close annotation for RT keywordFlorian Westphal2021-03-241-1/+1
| | | | Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: log: move to own scopeFlorian Westphal2021-03-242-5/+11
| | | | | | | GROUP and PREFIX are used by igmp and nat, so they can't be moved out of INITIAL scope yet. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: counter: move to own scopeFlorian Westphal2021-03-242-18/+20
| | | | | | move bytes/packets away from initial state. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: add support for scope nestingFlorian Westphal2021-03-241-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding a COUNTER scope introduces parsing errors. Example: add rule ... counter ip saddr 1.2.3.4 This is supposed to be COUNTER IP SADDR SYMBOL but it will be parsed as COUNTER IP STRING SYMBOL ... and rule fails with unknown saddr. This is because IP state change gets popped right after it was pushed. bison parser invokes scanner_pop_start_cond() helper via 'close_scope_counter' rule after it has processed the entire 'counter' rule. But that happens *after* flex has executed the 'IP' rule. IOW, the sequence of events is not the exepcted "COUNTER close_scope_counter IP SADDR SYMBOL close_scope_ip", it is "COUNTER IP close_scope_counter". close_scope_counter pops the just-pushed SCANSTATE_IP and returns the scanner to SCANSTATE_COUNTER, so next input token (saddr) gets parsed as a string, which gets then rejected from bison. To resolve this, defer the pop operation until the current state is done. scanner_pop_start_cond() already gets the scope that it has been completed as an argument, so we can compare it to the active state. If those are not the same, just defer the pop operation until the bison reports its done with the active flex scope. This leads to following sequence of events: 1. flex switches to SCANSTATE_COUNTER 2. flex switches to SCANSTATE_IP 3. bison calls scanner_pop_start_cond(SCANSTATE_COUNTER) 4. flex remains in SCANSTATE_IP, bison continues 5. bison calls scanner_pop_start_cond(SCANSTATE_IP) once the entire ip rule has completed: this pops both IP and COUNTER. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: avoid -fasan heap overflow warningsFlorian Westphal2021-03-181-1/+1
| | | | | Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: secmark: move to own scopeFlorian Westphal2021-03-162-10/+12
| | | | Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: move until,over,used keywords away from init stateFlorian Westphal2021-03-161-3/+5
| | | | | | Only applicable for limit and quota. "ct count" also needs 'over'. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: quota: move to own scopeFlorian Westphal2021-03-162-12/+14
| | | | | | ... and move "used" keyword to it. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: limit: move to own scopeFlorian Westphal2021-03-162-15/+19
| | | | | | Moves rate and burst out of INITIAL. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: vlan: move to own scopeFlorian Westphal2021-03-162-5/+9
| | | | | | ID needs to remain exposed as its used by ct, icmp, icmp6 and so on. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: remove saddr/daddr from initial stateFlorian Westphal2021-03-161-2/+4
| | | | | | This can now be reduced to expressions that can expect saddr/daddr tokens. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: arp: move to own scopeFlorian Westphal2021-03-162-9/+13
| | | | | | allows to move the arp specific tokens out of the INITIAL scope. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: add ether scopeFlorian Westphal2021-03-162-6/+8
| | | | | | | just like previous change: useless as-is, but prepares for removal of saddr/daddr from INITIAL scope. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: add fib scopeFlorian Westphal2021-03-162-2/+4
| | | | | | | | | makes no sense as-is because all keywords need to stay in the INITIAL scope. This can be changed after all saddr/daddr users have been scoped. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: ip6: move to own scopeFlorian Westphal2021-03-162-13/+17
| | | | | | move flowlabel and hoplimit. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: ip: move to own scopeFlorian Westphal2021-03-162-18/+22
| | | | | | Move the ip option names (rr, lsrr, ...) out of INITIAL scope. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: ct: move to own scopeFlorian Westphal2021-03-162-38/+42
| | | | | | | | | | | | This allows moving multiple ct specific keywords out of INITIAL scope. Next few patches follow same pattern: 1. add a scope_close_XXX rule 2. add a SCANSTATE_XXX & make flex switch to it when encountering XXX keyword 3. make bison leave SCANSTATE_XXXX when it has seen the complete expression. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: move remaining cache functions in rule.c to cache.cPablo Neira Ayuso2021-03-112-203/+205
| | | | | | Move all the cache logic to src/cache.c Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* scanner: socket: move to own scopeFlorian Westphal2021-03-112-5/+8
| | | | Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: rt: move to own scopeFlorian Westphal2021-03-112-6/+10
| | | | | | | | classid and nexthop can be moved out of INIT scope. Rest are still needed because tehy are used by other expressions as well. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: ipsec: move to own scopeFlorian Westphal2021-03-112-9/+13
| | | | | | ... and hide the ipsec specific tokens from the INITITAL scope. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: queue: move to own scopeFlorian Westphal2021-03-112-7/+10
| | | | | | allows to remove 3 queue specific keywords from INITIAL scope. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: introduce start condition stackFlorian Westphal2021-03-112-11/+36
| | | | | | | | | | | | | | | | | | | | Add a small initial chunk of flex start conditionals. This starts with two low-hanging fruits, numgen and j/symhash. NUMGEN and HASH start conditions are entered from flex when the corresponding expression token is encountered. Flex returns to the INIT condition when the bison parser has seen a complete numgen/hash statement. This intentionally uses a stack rather than BEGIN() to eventually support nested states. The scanner_pop_start_cond() function argument is not used yet, but will need to be used later to deal with nesting. Signed-off-by: Florian Westphal <fw@strlen.de>
* scanner: remove unused tokensFlorian Westphal2021-03-092-12/+0
| | | | Signed-off-by: Florian Westphal <fw@strlen.de>
* nftables: xt: fix misprint in nft_xt_compatible_revisionPavel Tikhomirov2021-03-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rev variable is used here instead of opt obviously by mistake. Please see iptables:nft_compatible_revision() for an example how it should be. This breaks revision compatibility checks completely when reading compat-target rules from nft utility. That's why nftables can't work on "old" kernels which don't support new revisons. That's a problem for containers. E.g.: 0 and 1 is supported but not 2: https://git.sw.ru/projects/VZS/repos/vzkernel/browse/net/netfilter/xt_nat.c#111 Reproduce of the problem on Virtuozzo 7 kernel 3.10.0-1160.11.1.vz7.172.18 in centos 8 container: iptables-nft -t nat -N TEST iptables-nft -t nat -A TEST -j DNAT --to-destination 172.19.0.2 nft list ruleset > nft.ruleset nft -f - < nft.ruleset #/dev/stdin:19:67-81: Error: Range has zero or negative size # meta l4proto tcp tcp dport 81 counter packets 0 bytes 0 dnat to 3.0.0.0-0.0.0.0 # ^^^^^^^^^^^^^^^ nft -v #nftables v0.9.3 (Topsy) iptables-nft -v #iptables v1.8.7 (nf_tables) Kernel returns ip range in rev 0 format: crash> p *((struct nf_nat_ipv4_multi_range_compat *) 0xffff8ca2fabb3068) $5 = { rangesize = 1, range = {{ flags = 3, min_ip = 33559468, max_ip = 33559468, But nft reads this as rev 2 format (nf_nat_range2) which does not have rangesize, and thus flugs 3 is treated as ip 3.0.0.0, which is wrong and can't be restored later. (Should probably be the same on Centos 7 kernel 3.10.0-1160.11.1) Fixes: fbc0768cb696 ("nftables: xt: don't use hard-coded AF_INET") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: Set NFTNL_SET_DATA_TYPE before dumping set elementsPhil Sutter2021-03-091-0/+3
| | | | | | | | In combination with libnftnl's commit "set_elem: Fix printing of verdict map elements", This adds the vmap target to netlink dumps. Adjust dumps in tests/py accordingly. Signed-off-by: Phil Sutter <phil@nwl.cc>
* parser: compact ct obj list typesFlorian Westphal2021-03-061-11/+8
| | | | | | Add new ct_cmd_type and avoid copypaste of the ct cmd_list rules. Signed-off-by: Florian Westphal <fw@strlen.de>
* parser: compact map RHS typeFlorian Westphal2021-03-061-29/+9
| | | | | | Similar to previous patch, we can avoid duplication. Signed-off-by: Florian Westphal <fw@strlen.de>
* parser: squash duplicated spec/specid rulesFlorian Westphal2021-03-061-44/+38
| | | | | | | No need to have duplicate CMD rules for spec and specid: add and use a common rule for those cases. Signed-off-by: Florian Westphal <fw@strlen.de>
* expression: memleak in verdict_expr_parse_udata()Pablo Neira Ayuso2021-03-051-1/+1
| | | | | | | Remove unnecessary verdict_expr_alloc() invocation. Fixes: 4ab1e5e60779 ("src: allow use of 'verdict' in typeof definitions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: memleak list of chainPablo Neira Ayuso2021-03-051-13/+26
| | | | | | Release chain list from the error path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: remove nft_mnl_socket_reopen()Pablo Neira Ayuso2021-03-052-16/+19
| | | | | | | | | | | | | | nft_mnl_socket_reopen() was introduced to deal with the EINTR case. By reopening the netlink socket, pending netlink messages that are part of a stale netlink dump are implicitly drop. This patch replaces the nft_mnl_socket_reopen() strategy by pulling out all of the remaining netlink message to restart in a clean state. This is implicitly fixing up a bug in the table ownership support, which assumes that the netlink socket remains open until nft_ctx_free() is invoked. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>