summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* cache: Support filtering for a specific flowtablePhil Sutter2021-12-033-18/+54
| | | | | | | | | | Extend nft_cache_filter to hold a flowtable name so 'list flowtable' command causes fetching the requested flowtable only. Dump flowtables just once instead of for each table, merely assign fetched data to tables inside the loop. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cache: Filter set list on server sidePhil Sutter2021-12-032-30/+48
| | | | | | | | | Fetch either all tables' sets at once, a specific table's sets or even a specific set if needed instead of iterating over the list of previously fetched tables and fetching for each, then ignoring anything returned that doesn't match the filter. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cache: Filter chain list on kernel sidePhil Sutter2021-12-032-22/+37
| | | | | | | | | | | | | When operating on a specific chain, add payload to NFT_MSG_GETCHAIN so kernel returns only relevant data. Since ENOENT is an expected return code, do not treat this as error. While being at it, improve code in chain_cache_cb() a bit: - Check chain's family first, it is a less expensive check than comparing table names. - Do not extract chain name of uninteresting chains. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cache: Filter rule list on kernel sidePhil Sutter2021-12-032-22/+22
| | | | | | | | | | | Instead of fetching all existing rules in kernel's ruleset and filtering in user space, add payload to the dump request specifying the table and chain to filter for. Since list_rule_cb() no longer needs the filter, pass only netlink_ctx to the callback and drop struct rule_cache_dump_ctx. Signed-off-by: Phil Sutter <phil@nwl.cc>
* cache: Filter tables on kernel sidePhil Sutter2021-12-033-13/+30
| | | | | | | | | | | Instead of requesting a dump of all tables and filtering the data in user space, construct a non-dump request if filter contains a table so kernel returns only that single table. This should improve nft performance in rulesets with many tables present. Signed-off-by: Phil Sutter <phil@nwl.cc>
* exthdr: fix tcpopt_find_template to use length after mask adjustmentFlorian Westphal2021-12-012-28/+25
| | | | | | | | | | | | | | | | | | | | | | Unify binop handling for ipv6 extension header, ip option and tcp option processing. Pass the real offset and length expected, not the one used in the kernel. This was already done for extension headers and ip options, but tcp option parsing did not do this. This was fine before because no existing tcp option template had a non-byte sized member. With mptcp addition this isn't the case anymore, subtype field is only 4 bits wide, but tcp option delinearization passed 8bits instead. Pass the offset and mask delta, just like ip option/ipv6 exthdr. This makes nft show 'tcp option mptcp subtype 1' instead of 'tcp option mptcp unknown & 240 == 16'. Signed-off-by: Florian Westphal <fw@strlen.de>
* mptcp: add subtype matchingFlorian Westphal2021-12-013-1/+12
| | | | | | | | | | | | | | | | | | | | | | MPTCP multiplexes the various mptcp signalling data using the first 4 bits of the mptcp option. This allows to match on the mptcp subtype via: tcp option mptcp subtype 1 This misses delinearization support. mptcp subtype is the first tcp option field that has a length of less than one byte. Serialization processing will add a binop for this, but netlink delinearization can't remove them, yet. Also misses a new datatype/symbol table to allow to use mnemonics like 'mp_join' instead of raw numbers. For this reason, no tests are added yet. Signed-off-by: Florian Westphal <fw@strlen.de>
* tcpopt: add md5sig, fastopen and mptcp optionsFlorian Westphal2021-12-013-2/+41
| | | | | | | | | Allow to use "fastopen", "md5sig" and "mptcp" mnemonics rather than the raw option numbers. These new keywords are only recognized while scanner is in tcp state. Signed-off-by: Florian Westphal <fw@strlen.de>
* parser: split tcp option rulesFlorian Westphal2021-12-011-19/+61
| | | | | | | | | | | | | | | At this time the parser will accept nonsensical input like tcp option mss left 2 which will be treated as 'tcp option maxseg size 2'. This is because the enum space overlaps. Split the rules so that 'tcp option mss' will only accept field names specific to the mss/maxseg option kind. Signed-off-by: Florian Westphal <fw@strlen.de> (cherry picked from commit 46168852c03d73c29b557c93029dc512ca6e233a)
* scanner: add tcp flex scopeFlorian Westphal2021-12-012-11/+17
| | | | | | | | This moves tcp options not used anywhere else (e.g. in synproxy) to a distinct scope. This will also allow to avoid exposing new option keywords in the ruleset context. Signed-off-by: Florian Westphal <fw@strlen.de>
* tcpopt: remove KIND keywordFlorian Westphal2021-12-012-4/+1
| | | | | | | | | | | | | | | | tcp option <foo> kind ... never makes any sense, as "tcp option <foo>" already tells the kernel to look for the foo <kind>. "tcp option sack kind 5" matches if the sack option is present; its a more complicated form of the simpler "tcp option sack exists". "tcp option sack kind 1" (or any other value than 5) will never match. So remove this. Test cases are converted to "exists". Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: binop: make accesses to expr->left/right conditionalFlorian Westphal2021-12-011-19/+31
| | | | | | | | | | | This function can be called for different expression types, including some (EXPR_MAP) where expr->left/right alias to different member variables. This makes accesses to those members conditional by checking the expression type ahead of the access. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: rename misleading variableFlorian Westphal2021-12-011-12/+12
| | | | | | | | | | | | relational_binop_postprocess() is called for EXPR_RELATIONAL, so "expr->right" is safe to use. But the RHS can be something other than a value. This has been extended to handle other types, so rename to 'right'. No code changes intended. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_delinearize: use correct member typeFlorian Westphal2021-12-011-1/+1
| | | | | | | expr is a map, so this should use expr->map, not expr->left. These fields are aliased, so this would break if that is ever changed. Signed-off-by: Florian Westphal <fw@strlen.de>
* cli: save history on ctrl-d with editlinePablo Neira Ayuso2021-11-301-7/+14
| | | | | | | | | | | Missing call to cli_exit() to save the history when ctrl-d is pressed in nft -i. Moreover, remove call to rl_callback_handler_remove() in cli_exit() for editline cli since it does not call rl_callback_handler_install(). Fixes: bc2d5f79c2ea ("cli: use plain readline() interface with libedit") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_delinearize: Fix for escaped asterisk strings on Big EndianPhil Sutter2021-11-301-40/+17
| | | | | | | | | | | The original nul-char detection was not functional on Big Endian. Instead, go a simpler route by exporting the string and working on the exported data to check for a nul-char and escape a trailing asterisk if present. With the data export already happening in the caller, fold escaped_string_wildcard_expr_alloc() into it as well. Fixes: b851ba4731d9f ("src: add interface wildcard matching") Signed-off-by: Phil Sutter <phil@nwl.cc>
* ct: Fix ct label value parserPhil Sutter2021-11-301-3/+2
| | | | | | | | Size of array to export the bit value into was eight times too large, so on Big Endian the data written into the data reg was always zero. Fixes: 2fcce8b0677b3 ("ct: connlabel matching support") Signed-off-by: Phil Sutter <phil@nwl.cc>
* datatype: Fix size of time_typePhil Sutter2021-11-301-2/+4
| | | | | | | | | Used by 'ct expiration', time_type is supposed to be 32bits. Passing a 64bits variable to constant_expr_alloc() causes the value to be always zero on Big Endian. Fixes: 0974fa84f162a ("datatype: seperate time parsing/printing from time_type") Signed-off-by: Phil Sutter <phil@nwl.cc>
* meta: Fix hour_type sizePhil Sutter2021-11-301-4/+5
| | | | | | | | | | | | In kernel as well as when parsing, hour_type is assumed to be 32bits. Having the struct datatype field set to 64bits breaks Big Endian and so does passing a 64bit value and 32 as length to constant_expr_alloc() as it makes it import the upper 32bits. Fix this by turning 'result' into a uint32_t and introduce a temporary uint64_t just for the call to time_parse() which expects that. Fixes: f8f32deda31df ("meta: Introduce new conditions 'time', 'day' and 'hour'") Signed-off-by: Phil Sutter <phil@nwl.cc>
* meta: Fix {g,u}id_type on Big EndianPhil Sutter2021-11-301-6/+10
| | | | | | | | | | | | Using a 64bit variable to temporarily hold the parsed value works only on Little Endian. uid_t and gid_t (and therefore also pw->pw_uid and gr->gr_gid) are 32bit. To fix this, use uid_t/gid_t for the temporary variable but keep the 64bit one for numeric parsing so values exceeding 32bits are still detected. Fixes: e0ed4c45d9ad2 ("meta: relax restriction on UID/GID parsing") Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: Fix payload statement mask on Big EndianPhil Sutter2021-11-301-2/+2
| | | | | | | | The mask used to select bits to keep must be exported in the same byteorder as the payload statement itself, also the length of the exported data must match the number of bytes extracted earlier. Signed-off-by: Phil Sutter <phil@nwl.cc>
* mnl: Fix for missing info in rule dumpsPhil Sutter2021-11-301-1/+12
| | | | | | | | | | | Commit 0e52cab1e64ab improved error reporting by adding rule's table and chain names to netlink message directly, prefixed by their location info. This in turn caused netlink dumps of the rule to not contain table and chain name anymore. Fix this by inserting the missing info before dumping and remove it afterwards to not cause duplicated entries in netlink message. Signed-off-by: Phil Sutter <phil@nwl.cc>
* exthdr: Fix for segfault with unknown exthdrPhil Sutter2021-11-301-5/+7
| | | | | | | | | Unknown exthdr type with NFT_EXTHDR_F_PRESENT flag set caused NULL-pointer deref. Fix this by moving the conditional exthdr.desc deref atop the function and use the result in all cases. Fixes: e02bd59c4009b ("exthdr: Implement existence check") Signed-off-by: Phil Sutter <phil@nwl.cc>
* exthdr: fix type number saved in udataFlorian Westphal2021-11-301-3/+1
| | | | | | | | | This should store the index of the protocol template, but &x[i] - &x[0] is always i, so remove the divide. Also add test case. Fixes: 01fbc1574b9e ("exthdr: add parse and build userdata interface") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc>
* cli: remove #include <editline/history.h>Pablo Neira Ayuso2021-11-221-1/+0
| | | | | | | | This header is not required to compile nftables with editline, remove it, this unbreak compilation in several distros which have no symlink from history.h to editline.h Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: different signedness compilation warningPablo Neira Ayuso2021-11-191-1/+1
| | | | | | | | | mnl.c: In function ‘mnl_batch_talk’: mnl.c:417:17: warning: comparison of integer expressions of different signedness: ‘unsigned in’ and ‘long int’ [-Wsign-compare] if (rcvbufsiz < NFT_MNL_ECHO_RCVBUFF_DEFAULT) ^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: do not skip populating anonymous set with -tPablo Neira Ayuso2021-11-181-4/+7
| | | | | | | | | | | | | | | | | | | --terse does not apply to anonymous set, add a NFT_CACHE_TERSE bit to skip named sets only. Moreover, prioritize specific listing filter over --terse to avoid a bogus: netlink: Error: Unknown set '__set0' in lookup expression when invoking: # nft -ta list set inet filter example Extend existing test to improve coverage. Fixes: 9628d52e46ac ("cache: disable NFT_CACHE_SETELEM_BIT on --terse listing only") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* monitor: do not call interval_map_decompose() for concat intervalsFlorian Westphal2021-11-171-1/+6
| | | | | | | | | | | | | | Without this, nft monitor will either print garbage or even segfault when encountering a concat set because we pass expr->value to libgmp helpers for concat (non-value) expressions. Also, for concat case, we need to call concat_range_aggregate() helper. Add a test case for this. Without this patch, it gives: tests/monitor/run-tests.sh: line 98: 1163 Segmentation fault (core dumped) $nft -nn -e -f $command_file > $echo_output Signed-off-by: Florian Westphal <fw@strlen.de>
* parser_json: add raw payload inner header match supportPablo Neira Ayuso2021-11-171-0/+2
| | | | | | | Add missing "ih" base raw payload and extend tests/py to cover this new usecase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: allow for string raw payload basePablo Neira Ayuso2021-11-162-3/+11
| | | | | | | Remove new 'ih' token, allow to represent the raw payload base with a string instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: filter out rules by chainPablo Neira Ayuso2021-11-112-45/+81
| | | | | | | | | | | | | | | | | | | | With an autogenerated ruleset with ~20k chains. # time nft list ruleset &> /dev/null real 0m1,712s user 0m1,258s sys 0m0,454s Speed up listing of a specific chain: # time nft list chain nat MWDG-UGR-234PNG3YBUOTS5QD &> /dev/null real 0m0,542s user 0m0,251s sys 0m0,292s Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: missing family in cache filteringPablo Neira Ayuso2021-11-111-4/+8
| | | | | | | | Check family when filtering out listing of tables and sets. Fixes: 3f1d3912c3a6 ("cache: filter out tables that are not requested") Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: do not populate cache if it is going to be flushedPablo Neira Ayuso2021-11-112-5/+77
| | | | | | | Skip set element netlink dump if set is flushed, this speeds up set flush + add element operation in a batch file for an existing set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: move list filter under structPablo Neira Ayuso2021-11-111-11/+11
| | | | | | | Wrap the table and set fields for list filtering to prepare for the introduction element filters. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* doc: update ct timeout section with the state namesFlorian Westphal2021-11-081-1/+1
| | | | | | | | docs are too terse and did not have the list of valid timeout states. While at it, adjust default stream timeout of udp to 120, this is the current kernel default. Signed-off-by: Florian Westphal <fw@strlen.de>
* evaluate: grab reference in set expression evaluationPablo Neira Ayuso2021-11-081-2/+2
| | | | | | | Do not clone expression when evaluation a set expression, grabbing the reference counter to reuse the object is sufficient. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: clone variable expression if there is more than one referencePablo Neira Ayuso2021-11-081-1/+10
| | | | | | | | | Clone the expression that defines the variable value if there are multiple references to it in the ruleset. This saves heap memory consumption in case the variable defines a set with a huge number of elements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* mnl: do not build nftnl_set element listPablo Neira Ayuso2021-11-082-25/+91
| | | | | | | | | | | | Do not call alloc_setelem_cache() to build the set element list in nftnl_set. Instead, translate one single set element expression to nftnl_set_elem object at a time and use this object to build the netlink header. Using a huge test set containing 1.1 million element blocklist, this patch is reducing userspace memory consumption by 40%. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: raw payload match and mangle on inner header / payload dataPablo Neira Ayuso2021-11-085-2/+11
| | | | | | | | | | | | | | | This patch adds support to match on inner header / payload data: # nft add rule x y @ih,32,32 0x14000000 counter you can also mangle payload data: # nft add rule x y @ih,32,32 set 0x14000000 counter This update triggers a checksum update at the layer 4 header via csum_flags, mangling odd bytes is also aligned to 16-bits. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* datatype: add xinteger_type alias to print in hexadecimalPablo Neira Ayuso2021-11-032-1/+17
| | | | | | | | | Add an alias of the integer type to print raw payload expressions in hexadecimal. Update tests/py. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* evaluate: postpone transport protocol match check after nat expression ↵Pablo Neira Ayuso2021-11-031-6/+7
| | | | | | | | | evaluation Fix bogus error report when using transport protocol as map key. Fixes: 50780456a01a ("evaluate: check for missing transport protocol match in nat map with concatenations") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: extend limit syntaxJeremy Sowden2021-11-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation describes the syntax of limit statements thus: limit rate [over] packet_number / TIME_UNIT [burst packet_number packets] limit rate [over] byte_number BYTE_UNIT / TIME_UNIT [burst byte_number BYTE_UNIT] TIME_UNIT := second | minute | hour | day BYTE_UNIT := bytes | kbytes | mbytes From this one might infer that a limit may be specified by any of the following: limit rate 1048576/second limit rate 1048576 mbytes/second limit rate 1048576 / second limit rate 1048576 mbytes / second However, the last does not currently parse: $ sudo /usr/sbin/nft add filter input limit rate 1048576 mbytes / second Error: wrong rate format add filter input limit rate 1048576 mbytes / second ^^^^^^^^^^^^^^^^^^^^^^^^^ Extend the `limit_rate_bytes` parser rule to support it, and add some new Python test-cases. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: add `limit_rate_pkts` and `limit_rate_bytes` rulesJeremy Sowden2021-11-031-62/+59
| | | | | | | | | Factor the `N / time-unit` and `N byte-unit / time-unit` expressions from limit expressions out into separate `limit_rate_pkts` and `limit_rate_bytes` rules respectively. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* parser: add new `limit_bytes` ruleJeremy Sowden2021-11-031-6/+9
| | | | | | | | Refactor the `N byte-unit` expression out of the `limit_bytes_burst` rule into a separate `limit_bytes` rule. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Support netdev egress hookLukas Wunner2021-10-282-0/+5
| | | | | | | | | Add userspace support for the netdev egress hook which is queued up for v5.16-rc1, complete with documentation and tests. Usage is identical to the ingress hook. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: disable NFT_CACHE_SETELEM_BIT on --terse listing onlyPablo Neira Ayuso2021-10-281-2/+2
| | | | | | | Instead of NFT_CACHE_SETELEM which also disables set dump. Fixes: 6bcd0d576a60 ("cache: unset NFT_CACHE_SETELEM with --terse listing") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: ensure evaluate_cache_list flags are set correctlyChris Arges2021-10-271-0/+1
| | | | | | | | | This change ensures that when listing rulesets with the terse flag that the terse flag is maintained. Fixes: 6bcd0d576a60 ("cache: unset NFT_CACHE_SETELEM with --terse listing") Signed-off-by: Chris Arges <carges@cloudflare.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: honor table in set filteringPablo Neira Ayuso2021-10-271-1/+2
| | | | | | | | Check if table mismatch, in case the same set name is used in different tables. Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: honor filter in set listing commandsPablo Neira Ayuso2021-10-271-0/+2
| | | | | | | | Fetch table, set and set elements only for set listing commands, e.g. nft list set inet filter ipv4_bogons. Fixes: 635ee1cad8aa ("cache: filter out sets and maps that are not requested") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* cache: always set on NFT_CACHE_REFRESH for listingPablo Neira Ayuso2021-10-271-6/+7
| | | | | | | | This flag forces a refresh of the cache on list commands, several object types are missing this flag, this fixes nft --interactive mode. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>