summaryrefslogtreecommitdiffstats
path: root/src/netlink_linearize.c
Commit message (Collapse)AuthorAgeFilesLines
* netlink_linearize: add assertion to catch for buggy byteorderPablo Neira Ayuso2024-02-091-0/+2
| | | | | | | Add assertion to catch buggy bytecode where unary expression is present with 1-byte selectors, where no byteorder conversion is required. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: do not allow to chain more than 16 binopsFlorian Westphal2023-12-221-2/+5
| | | | | | | | | | | | | | | | | | | | | netlink_linearize.c has never supported more than 16 chained binops. Adding more is possible but overwrites the stack in netlink_gen_bitwise(). Add a recursion counter to catch this at eval stage. Its not enough to just abort once the counter hits NFT_MAX_EXPR_RECURSION. This is because there are valid test cases that exceed this. For example, evaluation of 1 | 2 will merge the constans, so even if there are a dozen recursive eval calls this will not end up with large binop chain post-evaluation. v2: allow more than 16 binops iff the evaluation function did constant-merging. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink: don't crash if prefix for < byte is requestedFlorian Westphal2023-12-141-1/+2
| | | | | | | | | | | | | | If prefix is used with a datatype that has less than 8 bits an assertion is triggered: src/netlink.c:243: netlink_gen_raw_data: Assertion `len > 0' failed. This is esoteric, the alternative would be to restrict prefixes to ipv4/ipv6 addresses. Simpler fix is to use round_up instead of divide. Signed-off-by: Florian Westphal <fw@strlen.de>
* netlink_linearize: avoid strict-overflow warning in netlink_gen_bitwise()Thomas Haller2023-12-121-4/+3
| | | | | | | | | | | | | | | | | | With gcc-13.2.1-1.fc38.x86_64: $ gcc -Iinclude -c -o tmp.o src/netlink_linearize.c -Werror -Wstrict-overflow=5 -O3 src/netlink_linearize.c: In function ‘netlink_gen_bitwise’: src/netlink_linearize.c:1790:1: error: assuming signed overflow does not occur when changing X +- C1 cmp C2 to X cmp C2 -+ C1 [-Werror=strict-overflow] 1790 | } | ^ cc1: all warnings being treated as errors It also makes more sense this way, where "n" is the hight of the "binops" stack, and we check for a non-empty stack with "n > 0" and pop the last element with "binops[--n]". Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: remove xfree() and use plain free()Thomas Haller2023-11-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xmalloc() (and similar x-functions) are used for allocation. They wrap malloc()/realloc() but will abort the program on ENOMEM. The meaning of xmalloc() is that it wraps malloc() but aborts on failure. I don't think x-functions should have the notion, that this were potentially a different memory allocator that must be paired with a particular xfree(). Even if the original intent was that the allocator is abstracted (and possibly not backed by standard malloc()/free()), then that doesn't seem a good idea. Nowadays libc allocators are pretty good, and we would need a very special use cases to switch to something else. In other words, it will never happen that xmalloc() is not backed by malloc(). Also there were a few places, where a xmalloc() was already "wrongly" paired with free() (for example, iface_cache_release(), exit_cookie(), nft_run_cmd_from_buffer()). Or note how pid2name() returns an allocated string from fscanf(), which needs to be freed with free() (and not xfree()). This requirement bubbles up the callers portid2name() and name_by_portid(). This case was actually handled correctly and the buffer was freed with free(). But it shows that mixing different allocators is cumbersome to get right. Of course, we don't actually have different allocators and whether to use free() or xfree() makes no different. The point is that xfree() serves no actual purpose except raising irrelevant questions about whether x-functions are correctly paired with xfree(). Note that xfree() also used to accept const pointers. It is bad to unconditionally for all deallocations. Instead prefer to use plain free(). To free a const pointer use free_const() which obviously wraps free, as indicated by the name. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* include: include <string.h> in <nft.h>Thomas Haller2023-09-281-1/+0
| | | | | | | | <string.h> provides strcmp(), as such it's very basic and used everywhere. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_linearize: skip set element expression in map statement keyPablo Neira Ayuso2023-09-271-3/+3
| | | | | | | | | | | | | | | | | This fix is similar to 22d201010919 ("netlink_linearize: skip set element expression in set statement key") to fix map statement. netlink_gen_map_stmt() relies on the map key, that is expressed as a set element. Use the set element key instead to skip the set element wrap, otherwise get_register() abort execution: nft: netlink_linearize.c:650: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed. This includes JSON support to make this feature complete and it updates tests/shell to cover for this support. Reported-by: Luci Stanescu <luci@cnix.ro> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add <nft.h> header and include it as firstThomas Haller2023-08-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <config.h> is generated by the configure script. As it contains our feature detection, it want to use it everywhere. Likewise, in some of our sources, we define _GNU_SOURCE. This defines the C variant we want to use. Such a define need to come before anything else, and it would be confusing if different source files adhere to a different C variant. It would be good to use autoconf's AC_USE_SYSTEM_EXTENSIONS, in which case we would also need to ensure that <config.h> is always included as first. Instead of going through all source files and include <config.h> as first, add a new header "include/nft.h", which is supposed to be included in all our sources (and as first). This will also allow us later to prepare some common base, like include <stdbool.h> everywhere. We aim that headers are self-contained, so that they can be included in any order. Which, by the way, already didn't work because some headers define _GNU_SOURCE, which would only work if the header gets included as first. <nft.h> is however an exception to the rule: everything we compile shall rely on having <nft.h> header included as first. This applies to source files (which explicitly include <nft.h>) and to internal header files (which are only compiled indirectly, by being included from a source file). Note that <config.h> has no include guards, which is at least ugly to include multiple times. It doesn't cause problems in practice, because it only contains defines and the compiler doesn't warn about redefining a macro with the same value. Still, <nft.h> also ensures to include <config.h> exactly once. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_linearize: use div_round_up in byteorder lengthPablo Neira Ayuso2023-07-061-1/+1
| | | | | | | | | Use div_round_up() to calculate the byteorder length, otherwise fields that take % BITS_PER_BYTE != 0 are not considered by the byteorder expression. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add last statementPablo Neira Ayuso2023-02-281-0/+14
| | | | | | | | | | | | | | | | | | | | | This new statement allows you to know how long ago there was a matching packet. # nft list ruleset table ip x { chain y { [...] ip protocol icmp last used 49m54s884ms counter packets 1 bytes 64 } } if this statement never sees a packet, then the listing says: ip protocol icmp last used never counter packets 0 bytes 0 Add tests/py in this patch too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add vxlan matching supportPablo Neira Ayuso2023-01-021-6/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the initial infrastructure to support for inner header tunnel matching and its first user: vxlan. A new struct proto_desc field for payload and meta expression to specify that the expression refers to inner header matching is used. The existing codebase to generate bytecode is fully reused, allowing for reusing existing supported layer 2, 3 and 4 protocols. Syntax requires to specify vxlan before the inner protocol field: ... vxlan ip protocol udp ... vxlan ip saddr 1.2.3.0/24 This also works with concatenations and anonymous sets, eg. ... vxlan ip saddr . vxlan ip daddr { 1.2.3.4 . 4.3.2.1 } You have to restrict vxlan matching to udp traffic, otherwise it complains on missing transport protocol dependency, e.g. ... udp dport 4789 vxlan ip daddr 1.2.3.4 The bytecode that is generated uses the new inner expression: # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4 netdev x y [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000011 ] [ payload load 2b @ transport header + 2 => reg 1 ] [ cmp eq reg 1 0x0000b512 ] [ inner type 1 hdrsize 8 flags f [ meta load protocol => reg 1 ] ] [ cmp eq reg 1 0x00000008 ] [ inner type 1 hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ] [ cmp eq reg 1 0x04030201 ] JSON support is not included in this patch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_linearize: fix timeout with map updatesFlorian Westphal2022-12-121-0/+4
| | | | | | | | | | | | | | | | Map updates can use timeouts, just like with sets, but the linearization step did not pass this info to the kernel. meta l4proto tcp update @pinned { ip saddr . ct original proto-src timeout 90s : ip daddr . tcp dport Listing this won't show the "timeout 90s" because kernel never saw it to begin with. Also update evaluation step to reject a timeout that was set on the data part: Timeouts are only allowed for the key-value pair as a whole. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add tcp option reset supportFlorian Westphal2022-02-281-1/+15
| | | | | | | This allows to replace a tcp option with nops, similar to the TCPOPTSTRIP feature of iptables. Signed-off-by: Florian Westphal <fw@strlen.de>
* mnl: Fix for missing info in rule dumpsPhil Sutter2021-11-301-1/+12
| | | | | | | | | | | Commit 0e52cab1e64ab improved error reporting by adding rule's table and chain names to netlink message directly, prefixed by their location info. This in turn caused netlink dumps of the rule to not contain table and chain name anymore. Fix this by inserting the missing info before dumping and remove it afterwards to not cause duplicated entries in netlink message. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: raw payload match and mangle on inner header / payload dataPablo Neira Ayuso2021-11-081-2/+3
| | | | | | | | | | | | | | | This patch adds support to match on inner header / payload data: # nft add rule x y @ih,32,32 0x14000000 counter you can also mangle payload data: # nft add rule x y @ih,32,32 set 0x14000000 counter This update triggers a checksum update at the layer 4 header via csum_flags, mangling odd bytes is also aligned to 16-bits. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Optimize prefix match only if is big-endianXiao Liang2021-08-231-1/+2
| | | | | | | | | | | A prefix of integer type is big-endian in nature. Prefix match can be optimized to truncated 'cmp' only if it is big-endian. [ Add one tests/py for this use-case --pablo ] Fixes: 25338cdb6c77 ("src: Optimize prefix matches on byte-boundaries") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink_linearize: incorrect netlink bytecode with binary operation and flagsPablo Neira Ayuso2021-07-271-15/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | nft generates incorrect bytecode when combining flag datatype and binary operations: # nft --debug=netlink add rule meh tcp_flags 'tcp flags & (fin | syn | rst | ack) syn' ip [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000006 ] [ payload load 1b @ transport header + 13 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x00000017 ) ^ 0x00000000 ] [ bitwise reg 1 = ( reg 1 & 0x00000002 ) ^ 0x00000000 ] [ cmp neq reg 1 0x00000000 ] Note the double bitwise expression. The last two expressions are not correct either since it should match on the syn flag, ie. 0x2. After this patch, netlink bytecode generation looks correct: # nft --debug=netlink add rule meh tcp_flags 'tcp flags & (fin | syn | rst | ack) syn' ip [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000006 ] [ payload load 1b @ transport header + 13 => reg 1 ] [ bitwise reg 1 = ( reg 1 & 0x00000017 ) ^ 0x00000000 ] [ cmp eq reg 1 0x00000002 ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for nat with interval concatenationPablo Neira Ayuso2021-07-131-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to combine concatenation and interval in NAT mappings, e.g. add rule x y dnat to ip saddr . tcp dport map { 192.168.1.2 . 80 : 10.141.10.2-10.141.10.5 . 8888-8999 } This generates the following NAT expression: [ nat dnat ip addr_min reg 1 addr_max reg 10 proto_min reg 9 proto_max reg 11 ] which expects to obtain the following tuple: IP address (min), source port (min), IP address (max), source port (max) to be obtained from the map. This representation simplifies the delinearize path, since the datatype is specified as: ipv4_addr . inet_service. A few more notes on this update: - alloc_nftnl_setelem() needs a variant netlink_gen_data() to deal with the representation of the range on the rhs of the mapping. In contrast to interval concatenation in the key side, where the range is expressed as two netlink attributes, the data side of the set element mapping stores the interval concatenation in a contiguos memory area, see __netlink_gen_concat_expand() for reference. - add range_expr_postprocess() to postprocess the data mapping range. If either one single IP address or port is used, then the minimum and maximum value in the range is the same value, e.g. to avoid listing 80-80, this round simplify the range. This also invokes the range to prefix conversion routine. - add concat_elem_expr() helper function to consolidate code to build the concatenation expression on the rhs element data side. This patch also adds tests/py and tests/shell. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: queue: allow use of arbitrary queue expressionsFlorian Westphal2021-06-211-5/+23
| | | | | | | | | | | | | | | | | | | | | back in 2016 Liping Zhang added support to kernel and libnftnl to specify a source register containing the queue number to use. This was never added to nft itself, so allow this. On linearization side, check if attached expression is a range. If its not, allocate a new register and set NFTNL_EXPR_QUEUE_SREG_QNUM attribute after generating the lowlevel expressions for the kernel. On delinarization we need to check for presence of NFTNL_EXPR_QUEUE_SREG_QNUM and decode the expression(s) when present. Also need to do postprocessing for STMT_QUEUE so that the protocol context is set correctly, without this only raw payload expressions will be shown (@nh,32,...) instead of 'ip ...'. Next patch adds test cases. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: add cgroupsv2 supportPablo Neira Ayuso2021-05-031-0/+1
| | | | | | Add support for matching on the cgroups version 2. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add negation match on singleton bitmask valuePablo Neira Ayuso2021-02-051-2/+7
| | | | | | | | | | | | | | | | | This patch provides a shortcut for: ct status and dnat == 0 which allows to check for the packet whose dnat bit is unset: # nft add rule x y ct status ! dnat counter This operation is only available for expression with a bitmask basetype, eg. # nft describe ct status ct expression, datatype ct_status (conntrack status) (basetype bitmask, integer), 32 bits Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: set on flags to request multi-statement supportPablo Neira Ayuso2021-01-041-0/+2
| | | | | | | Old kernel reject requests for element with multiple statements because userspace sets on the flags for multi-statements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add support for multi-statement in dynamic sets and mapsPablo Neira Ayuso2020-12-171-8/+33
| | | | | | | | This patch allows for two statements for dynamic set updates, e.g. nft rule x y add @y { ip daddr limit rate 1/second counter } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* tcpopt: allow to check for presence of any tcp optionFlorian Westphal2020-11-091-1/+1
| | | | | | | | | | | | | nft currently doesn't allow to check for presence of arbitrary tcp options. Only known options where nft provides a template can be tested for. This allows to test for presence of raw protocol values as well. Example: tcp option 42 exists Signed-off-by: Florian Westphal <fw@strlen.de>
* tcpopt: split tcpopt_hdr_fields into per-option enumFlorian Westphal2020-11-091-2/+2
| | | | | | | | | | | | | | | | | | Currently we're limited to ten template fields in exthdr_desc struct. Using a single enum for all tpc option fields thus won't work indefinitely (TCPOPTHDR_FIELD_TSECR is 9) when new option templates get added. Fortunately we can just use one enum per tcp option to avoid this. As a side effect this also allows to simplify the sack offset calculations. Rather than computing that on-the-fly, just add extra fields to the SACK template. expr->exthdr.offset now holds the 'raw' value, filled in from the option template. This would ease implementation of 'raw option matching' using offset and length to load from the option. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: Optimize prefix matches on byte-boundariesPhil Sutter2020-11-041-1/+3
| | | | | | | | | | | | | | | | If a prefix expression's length is on a byte-boundary, it is sufficient to just reduce the length passed to "cmp" expression. No need for explicit bitwise modification of data on LHS. The relevant code is already there, used for string prefix matches. There is one exception though, namely zero-length prefixes: Kernel doesn't accept zero-length "cmp" expressions, so keep them in the old code-path for now. This patch depends upon the previous one to correctly parse odd-sized payload matches but has to extend support for non-payload LHS as well. In practice, this is needed for "ct" expressions as they allow matching against IP address prefixes, too. Signed-off-by: Phil Sutter <phil@nwl.cc>
* src: improve rule error reportingPablo Neira Ayuso2020-10-201-58/+113
| | | | | | | | | | | | | | | | | | | | | Kernel provides information regarding expression since 83d9dcba06c5 ("netfilter: nf_tables: extended netlink error reporting for expressions"). A common mistake is to refer a chain which does not exist, e.g. # nft add rule x y jump test Error: Could not process rule: No such file or directory add rule x y jump test ^^^^ Use the existing netlink extended error reporting infrastructure to provide better error reporting as in the example above. Requires Linux kernel patch 83d9dcba06c5 ("netfilter: nf_tables: extended netlink error reporting for expressions"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* proto: add sctp crc32 checksum fixupFlorian Westphal2020-10-151-1/+1
| | | | | | | | | | Stateless SCTP header mangling doesn't work reliably. This tells the kernel to update the checksum field using the sctp crc32 algorithm. Note that this needs additional kernel support to work. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: support for implicit chain bindingsPablo Neira Ayuso2020-07-151-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to group rules in a subchain, e.g. table inet x { chain y { type filter hook input priority 0; tcp dport 22 jump { ip saddr { 127.0.0.0/8, 172.23.0.0/16, 192.168.13.0/24 } accept ip6 saddr ::1/128 accept; } } } This also supports for the `goto' chain verdict. This patch adds a new chain binding list to avoid a chain list lookup from the delinearize path for the usual chains. This can be simplified later on with a single hashtable per table for all chains. From the shell, you have to use the explicit separator ';', in bash you have to escape this: # nft add rule inet x y tcp dport 80 jump { ip saddr 127.0.0.1 accept\; ip6 saddr ::1 accept \; } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: use expression to store the log prefixPablo Neira Ayuso2020-07-081-2/+5
| | | | | | Intsead of using an array of char. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add STMT_NAT_F_CONCAT flag and use itPablo Neira Ayuso2020-04-281-3/+3
| | | | | | Replace ipportmap boolean field by flags. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: NAT support for intervals in mapsPablo Neira Ayuso2020-04-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to specify an interval of IP address in maps. table ip x { chain y { type nat hook postrouting priority srcnat; policy accept; snat ip interval to ip saddr map { 10.141.11.4 : 192.168.2.2-192.168.2.4 } } } The example above performs SNAT to packets that comes from 10.141.11.4 to an interval of IP addresses from 192.168.2.2 to 192.168.2.4 (both included). You can also combine this with dynamic maps: table ip x { map y { type ipv4_addr : interval ipv4_addr flags interval elements = { 10.141.10.0/24 : 192.168.2.2-192.168.2.4 } } chain y { type nat hook postrouting priority srcnat; policy accept; snat ip interval to ip saddr map @y } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: support for restoring element countersPablo Neira Ayuso2020-03-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows you to restore counters in dynamic sets: table ip test { set test { type ipv4_addr size 65535 flags dynamic,timeout timeout 30d gc-interval 1d elements = { 192.168.10.13 expires 19d23h52m27s576ms counter packets 51 bytes 17265 } } chain output { type filter hook output priority 0; update @test { ip saddr } } } You can also add counters to elements from the control place, ie. table ip test { set test { type ipv4_addr size 65535 elements = { 192.168.2.1 counter packets 75 bytes 19043 } } chain output { type filter hook output priority filter; policy accept; ip daddr @test } } Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: remove unused parameter from netlink_gen_stmt_stateful()Pablo Neira Ayuso2020-03-181-23/+13
| | | | | | Remove context from netlink_gen_stmt_stateful(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: allow nat maps containing both ip(6) address and portFlorian Westphal2020-02-241-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nft will now be able to handle map destinations { type ipv4_addr . inet_service : ipv4_addr . inet_service } chain f { dnat to ip daddr . tcp dport map @destinations } Something like this won't work though: meta l4proto tcp dnat ip6 to numgen inc mod 4 map { 0 : dead::f001 . 8080, .. as we lack the type info to properly dissect "dead::f001" as an ipv6 address. For the named map case, this info is available in the map definition, but for the anon case we'd need to resort to guesswork. Support is added by peeking into the map definition when evaluating a nat statement with a map. Right now, when a map is provided as address, we will only check that the mapped-to data type matches the expected size (of an ipv4 or ipv6 address). After this patch, if the mapped-to type is a concatenation, it will take a peek at the individual concat expressions. If its a combination of address and service, nft will translate this so that the kernel nat expression looks at the returned register that would store the inet_service part of the octet soup returned from the lookup expression. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: add support for handling shift expressions.Jeremy Sowden2020-01-281-3/+47
| | | | | | | | | The kernel supports bitwise shift operations, so add support to the netlink linearization and delinearization code. The number of bits (the righthand operand) is expected to be a 32-bit value in host endianness. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: white-space fixes.Jeremy Sowden2020-01-281-1/+1
| | | | | | | Remove some trailing white-space and fix some indentation. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netlink: Avoid potential NULL-pointer deref in netlink_gen_payload_stmt()Phil Sutter2020-01-221-1/+1
| | | | | | | | | | With payload_needs_l4csum_update_pseudohdr() unconditionally dereferencing passed 'desc' parameter and a previous check for it to be non-NULL, make sure to call the function only if input is sane. Fixes: 68de70f2b3fc6 ("netlink_linearize: fix IPv6 layer 4 checksum mangling") Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: introduce SYNPROXY matchingFernando Fernandez Mancera2019-07-171-0/+17
| | | | | | | | | | | | | | | | | | | | Add support for "synproxy" statement. For example (for TCP port 8888): table ip x { chain y { type filter hook prerouting priority raw; policy accept; tcp dport 8888 tcp flags syn notrack } chain z { type filter hook input priority filter; policy accept; tcp dport 8888 ct state invalid,untracked synproxy mss 1460 wscale 7 timestamp sack-perm ct state invalid drop } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: use UDATA defines from libnftnlPhil Sutter2019-05-031-1/+1
| | | | | | | | | | | | | Userdata attribute names have been added to libnftnl, use them instead of the local copy. While being at it, rename udata_get_comment() in netlink_delinearize.c and the callback it uses since the function is specific to rules. Also integrate the existence check for NFTNL_RULE_USERDATA into it along with the call to nftnl_rule_get_data(). Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add nat support for the inet familyFlorian Westphal2019-04-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | consider a simple ip6 nat table: table ip6 nat { chain output { type nat hook output priority 0; policy accept; dnat to dead:2::99 } Now consider same ruleset, but using 'table inet nat': nft now lacks context to determine address family to parse 'to $address'. This adds code to make the following work: table inet nat { [ .. ] # detect af from network protocol context: ip6 daddr dead::2::1 dnat to dead:2::99 # use new dnat ip6 keyword: dnat ip6 to dead:2::99 } On list side, the keyword is only shown in the inet family, else the short version (dnat to ...) is used as the family is redundant when the table already mandates the ip protocol version supported. Address mismatches such as table ip6 { .. dnat ip to 1.2.3.4 are detected/handled during the evaluation phase. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* osf: add version fingerprint supportFernando Fernandez Mancera2019-04-081-0/+1
| | | | | | | | | | | | | | | Add support for version fingerprint in "osf" expression. Example: table ip foo { chain bar { type filter hook input priority filter; policy accept; osf ttl skip name "Linux" osf ttl skip version "Linux:4.20" } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: expr: add expression etypeFlorian Westphal2019-02-081-13/+13
| | | | | | | | Temporary kludge to remove all the expr->ops->type == ... patterns. Followup patch will remove expr->ops, and make expr_ops() lookup the correct expr_ops struct instead to reduce struct expr size. Signed-off-by: Florian Westphal <fw@strlen.de>
* src: expr: add and use expr_name helperFlorian Westphal2019-02-081-1/+1
| | | | | | | | Currently callers use expr->ops->name, but follouwp patch will remove the ops pointer from struct expr. So add this helper and use it everywhere. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
* osf: add ttl option supportFernando Fernandez Mancera2018-10-231-0/+1
| | | | | | | | | | | | | | Add support for ttl option in "osf" expression. Example: table ip foo { chain bar { type filter hook input priority filter; policy accept; osf ttl skip name "Linux" } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: add ipsec (xfrm) expressionMáté Eckl2018-09-211-0/+16
| | | | | | | | | | | | | | | | This allows matching on ipsec tunnel/beet addresses in xfrm state associated with a packet, ipsec request id and the SPI. Examples: ipsec in ip saddr 192.168.1.0/24 ipsec out ip6 daddr @endpoints ipsec in spi 1-65536 Joint work with Florian Westphal. Cc: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
* src: integrate stateful expressions into sets and mapsPablo Neira Ayuso2018-08-241-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following example shows how to populate a set from the packet path using the destination IP address, for each entry there is a counter. The entry expires after the 1 hour timeout if no packets matching this entry are seen. table ip x { set xyz { type ipv4_addr size 65535 flags dynamic,timeout timeout 1h } chain y { type filter hook output priority filter; policy accept; update @xyz { ip daddr counter } counter } } Similar example, that creates a mapping better IP address and mark, where the mark is assigned using an incremental sequence generator from 0 to 1 inclusive. table ip x { map xyz { type ipv4_addr : mark size 65535 flags dynamic,timeout timeout 1h } chain y { type filter hook input priority filter; policy accept; update @xyz { ip saddr counter : numgen inc mod 2 } } } Supported stateful statements are: limit, quota, counter and connlimit. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: simplify map statementPablo Neira Ayuso2018-08-241-10/+11
| | | | | | | Instead of using the map expression, store dynamic key and data separately since they need special handling than constant maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: introduce passive OS fingerprint matchingFernando Fernandez Mancera2018-08-041-0/+13
| | | | | | | | | | | | | | Add support for "osf" expression. Example: table ip foo { chain bar { type filter hook input priority 0; policy accept; osf name "Linux" counter packets 3 bytes 132 } } Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* src: Add tproxy supportMáté Eckl2018-08-031-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for transparent proxy functionality which is supported in ip, ip6 and inet tables. The syntax is the following: tproxy [{|ip|ip6}] to {<ip address>|:<port>|<ip address>:<port>} It looks for a socket listening on the specified address or port and assigns it to the matching packet. In an inet table, a packet matches for both families until address is specified. Network protocol family has to be specified **only** in inet tables if address is specified. As transparent proxy support is implemented for sockets with layer 4 information, a transport protocol header criterion has to be set in the same rule. eg. 'meta l4proto tcp' or 'udp dport 4444' Example ruleset: table ip x { chain y { type filter hook prerouting priority -150; policy accept; tcp dport ntp tproxy to 1.1.1.1 udp dport ssh tproxy to :2222 } } table ip6 x { chain y { type filter hook prerouting priority -150; policy accept; tcp dport ntp tproxy to [dead::beef] udp dport ssh tproxy to :2222 } } table inet x { chain y { type filter hook prerouting priority -150; policy accept; tcp dport 321 tproxy to :ssh tcp dport 99 tproxy ip to 1.1.1.1:999 udp dport 155 tproxy ip6 to [dead::beef]:smux } } Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>