META EXPRESSIONS
~~~~~~~~~~~~~~~~
[verse]
*meta* {*length* | *nfproto* | *l4proto* | *protocol* | *priority*}
[*meta*] {*mark* | *iif* | *iifname* | *iiftype* | *oif* | *oifname* | *oiftype* | *skuid* | *skgid* | *nftrace* | *rtclassid* | *ibrname* | *obrname* | *pkttype* | *cpu* | *iifgroup* | *oifgroup* | *cgroup* | *random* | *ipsec* | *iifkind* | *oifkind* | *time* | *hour* | *day* }

A meta expression refers to metadata associated with a packet.

There are two types of meta expressions: unqualified and qualified meta
expressions. Qualified meta expressions require the meta keyword before the
meta key. Unqualified meta expressions can be specified either by using the
meta key directly or, like qualified ones, by prefixing it with the meta
keyword. Meta l4proto is useful to match a particular transport protocol that
is part of either an IPv4 or IPv6 packet. It also skips any IPv6 extension
headers present in an IPv6 packet.

The meta keys iif, oif, iifname and oifname are used to match the interface a
packet arrived on or is about to be sent out on.

iif and oif match on the interface index, whereas iifname and oifname match
on the interface name. The two are not equivalent. Consider the rule

  filter input meta iif "foo"

This rule can only be added if the interface "foo" exists. It also keeps
matching even if the interface "foo" is renamed to "bar", because the
interface index is used internally.
For dynamically created interfaces, such as tun/tap or dialup interfaces (ppp
for example), it might be better to use iifname or oifname instead.

In these cases the name is used, so the interface does not have to exist when
the rule is added. The rule stops matching if the interface is renamed, and it
matches again if the interface is deleted and a new interface with the same
name is created later.
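As a sketch of the difference, the rules below match once on an interface index and once on an interface name. The table name filter and the interface names lo and ppp0 are placeholders chosen for illustration.

.Matching on interface index versus interface name
------------------------------
# "lo" must exist when the rule is added; its index is resolved at that point
filter input meta iif "lo" accept

# "ppp0" may not exist yet; the rule compares the interface name instead
filter input meta iifname "ppp0" accept
------------------------------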

As with iptables, wildcard matching on interface name prefixes is available for
*iifname* and *oifname* matches by appending an asterisk (*) character. Note,
however, that unlike iptables, nftables does not accept interface names
consisting of the wildcard character only; users are expected to simply omit
such always-matching expressions. To match a literal asterisk character,
escape it with a backslash (\).
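A short illustration of prefix and literal matching, assuming the rules are part of a ruleset file loaded with *nft -f*; when a rule is typed on a shell command line, the quotes and the backslash may need additional shell escaping. The interface names are examples only.

.Wildcard and literal asterisk matching
---------------------------------------
# matches any input interface whose name starts with "eth" (eth0, eth1, ...)
filter input iifname "eth*" accept

# matches only an interface literally named "eth*"
filter input iifname "eth\*" accept
---------------------------------------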

.Meta expression types
[options="header"]
|==================
|Keyword | Description | Type
|length|
Length of the packet in bytes|
integer (32 bit)
|nfproto|
Real hook protocol family, useful only in inet tables|
integer (32 bit)
|l4proto|
Layer 4 protocol, skips IPv6 extension headers|
integer (8 bit)
|protocol|
EtherType protocol value|
ether_type
|priority|
TC packet priority|
tc_handle
|mark|
Packet mark|
mark
|iif|
Input interface index|
iface_index
|iifname|
Input interface name|
ifname
|iiftype|
Input interface hardware type|
iface_type
|oif|
Output interface index|
iface_index
|oifname|
Output interface name|
ifname
|oiftype|
Output interface hardware type|
iface_type
|sdif|
Slave device input interface index|
iface_index
|sdifname|
Slave device input interface name|
ifname
|skuid|
UID associated with the originating socket|
uid
|skgid|
GID associated with the originating socket|
gid
|rtclassid|
Routing realm|
realm
|ibrname|
Input bridge interface name|
ifname
|obrname|
Output bridge interface name|
ifname
|pkttype|
Packet type|
pkt_type
|cpu|
CPU number processing the packet|
integer (32 bit)
|iifgroup|
Incoming device group|
devgroup
|oifgroup|
Outgoing device group|
devgroup
|cgroup|
Control group ID|
integer (32 bit)
|random|
Pseudo-random number|
integer (32 bit)
|ipsec|
True if the packet was subject to IPsec processing|
boolean (1 bit)
|iifkind|
Input interface kind|
ifkind
|oifkind|
Output interface kind|
ifkind
|time|
Absolute time of packet reception|
integer (32 bit) or string
|day|
Day of week|
integer (8 bit) or string
|hour|
Hour of day|
string
|====================

.Meta expression specific types
[options="header"]
|==================
|Type | Description
|iface_index |
Interface index (32 bit number). Can be specified numerically or as the name of an existing interface.
|ifname|
Interface name (16 byte string). Does not have to exist.
|iface_type|
Interface type (16 bit number).
|uid|
User ID (32 bit number). Can be specified numerically or as user name.
|gid|
Group ID (32 bit number). Can be specified numerically or as group name.
|realm|
Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
|devgroup|
Device group (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/group.
|pkt_type|
Packet type: *host* (addressed to local host), *broadcast* (to all),
*multicast* (to group), *other* (addressed to another host).
|ifkind|
Interface kind (16 byte string). See TYPES in ip-link(8) for a list.
|time|
Either an integer or a date in ISO format. For example: "2019-06-06 17:00".
The time of day is optional, and so are the seconds within it. If the time of
day is omitted, midnight is assumed.
The following three are equivalent: "2019-06-06", "2019-06-06 00:00"
and "2019-06-06 00:00:00".
When an integer is given, it is assumed to be a UNIX timestamp.
|day|
Either a day of week ("Monday", "Tuesday", etc.), or an integer between 0 and 6.
Strings are matched case-insensitively, and a full match is not required (e.g. "Mon" would match "Monday").
When an integer is given, 0 is Sunday and 6 is Saturday.
|hour|
A string representing an hour in 24-hour format. Seconds can optionally be specified.
For example, 17:00 and 17:00:00 would be equivalent.
|=============================

.Using meta expressions
-----------------------
# qualified meta expression
filter output meta oif eth0
filter forward meta iifkind { "tun", "veth" }

# unqualified meta expression
filter output oif eth0

# incoming packet was subject to ipsec processing
raw prerouting meta ipsec exists accept
-----------------------
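The time-related keys can be sketched as follows, based on the type descriptions above. Using a range for *hour* is an assumption about the accepted syntax, and the table and chain names are placeholders.

.Matching on the time of packet reception
-----------------------------------------
# drop packets received on Saturdays ("Sat" or 6 would match as well)
filter input meta day "Saturday" drop

# accept packets received between 09:00 and 17:00
filter input meta hour "09:00"-"17:00" accept
-----------------------------------------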

SOCKET EXPRESSION
~~~~~~~~~~~~~~~~~
[verse]
*socket* {*transparent* | *mark* | *wildcard*}
*socket* *cgroupv2* *level* 'NUM'

The socket expression can be used to search for an existing open TCP/UDP socket
associated with a packet and to match on its attributes. It looks for an
established or non-zero bound listening socket (possibly with a non-local
address). You can also use it to match on the socket's cgroupv2 at a given
ancestor level: e.g. if the socket belongs to cgroupv2 'a/b', ancestor level 1
checks for a match on cgroup 'a' and ancestor level 2 checks for a match on
cgroup 'b'.

.Available socket attributes
[options="header"]
|==================
|Name |Description| Type
|transparent|
Value of the IP_TRANSPARENT socket option in the found socket. It can be 0 or 1.|
boolean (1 bit)
|mark| Value of the socket mark (SOL_SOCKET, SO_MARK). | mark
|wildcard|
Indicates whether the socket is wildcard-bound (e.g. 0.0.0.0 or ::0). |
boolean (1 bit)
|cgroupv2|
cgroup version 2 for this socket (path from /sys/fs/cgroup)|
cgroupv2
|==================

.Using socket expression
------------------------
# Mark packets that correspond to a transparent socket. "socket wildcard 0"
# means that zero-bound listener sockets are NOT matched (which is usually
# exactly what you want).
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        socket transparent 1 socket wildcard 0 mark set 0x00000001 accept
    }
}

# Trace packets that correspond to a socket with a mark value of 15
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        socket mark 0x0000000f nftrace set 1
    }
}

# Set packet mark to socket mark
table inet x {
    chain y {
        type filter hook prerouting priority mangle; policy accept;
        tcp dport 8080 mark set socket mark
    }
}

# Count packets for cgroupv2 "user.slice" at level 1
table inet x {
    chain y {
        type filter hook input priority filter; policy accept;
        socket cgroupv2 level 1 "user.slice" counter
    }
}
----------------------

OSF EXPRESSION
~~~~~~~~~~~~~~
[verse]
*osf* [*ttl* {*loose* | *skip*}] {*name* | *version*}

The osf expression does passive operating system fingerprinting. This
expression compares some data (Window Size, MSS, options and their order, DF,
and others) from packets with the SYN bit set.

.Available osf attributes
[options="header"]
|==================
|Name |Description| Type
|ttl|
Do TTL checks on the packet to determine the operating system.|
string
|version|
Do OS version checks on the packet.|
string
|name|
Name of the OS signature to match. All signatures can be found in the pf.os file.
Use "unknown" for OS signatures that the expression could not detect.|
string
|==================

.Available ttl values
---------------------
If no TTL attribute is given, the TTL in the IP header must exactly match the TTL of the fingerprint. This generally works for LANs.

* loose: Check whether the IP header's TTL is less than the fingerprint's TTL. Works for globally routable addresses.
* skip: Do not compare the TTL at all.
---------------------

.Using osf expression
---------------------
# Match packets against the "Linux" OS genre signature without comparing TTL.
table inet x {
    chain y {
        type filter hook input priority filter; policy accept;
        osf ttl skip name "Linux"
    }
}
-----------------------
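Two further osf rules, given as a sketch: the first relaxes the TTL comparison with *loose*, which the TTL description above recommends for globally routable addresses, and the second counts packets whose signature could not be identified. The table and chain names follow the example above.

.More osf examples
------------------
table inet x {
    chain y {
        type filter hook input priority filter; policy accept;
        # compare TTL loosely, e.g. for traffic from remote, routed hosts
        osf ttl loose name "Linux" counter
        # count packets that do not match any known signature
        osf name "unknown" counter
    }
}
------------------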

FIB EXPRESSIONS
~~~~~~~~~~~~~~~
[verse]
*fib* {*saddr* | *daddr* | *mark* | *iif* | *oif*} [*.* ...] {*oif* | *oifname* | *type*}

A fib expression queries the fib (forwarding information base) to obtain
information such as the output interface index a particular address would use.
The input is a tuple of elements that is used as input to the fib lookup
functions.

.fib expression specific types
[options="header"]
|==================
|Keyword| Description| Type
|oif|
Output interface index|
integer (32 bit)
|oifname|
Output interface name|
string
|type|
Address type |
fib_addrtype
|=======================

Use *nft* *describe* *fib_addrtype* to get a list of all address types.

.Using fib expressions
----------------------
# drop packets without a reverse path
filter prerouting fib saddr . iif oif missing drop

# In this example, 'saddr . iif' looks up routing information based on the
# source address and the input interface. oif picks the output interface index
# from the routing information. If no route was found for the source
# address/input interface combination, the output interface index is zero.
# If the input interface is part of the lookup key, the output interface index
# is always either the same as the input interface index or zero.
# If only 'saddr oif' is given, oif can be any interface index or zero.

# drop packets to address not configured on incoming interface
filter prerouting fib daddr . iif type != { local, broadcast, multicast } drop

# perform lookup in a specific 'blackhole' table (0xdead, needs an appropriate ip rule)
filter prerouting meta mark set 0xdead fib daddr . mark type vmap { blackhole : drop, prohibit : jump prohibited, unreachable : drop }
----------------------

ROUTING EXPRESSIONS
~~~~~~~~~~~~~~~~~~~
[verse]
*rt* [*ip* | *ip6*] {*classid* | *nexthop* | *mtu* | *ipsec*}

A routing expression refers to routing data associated with a packet.

.Routing expression types
[options="header"]
|=======================
|Keyword| Description| Type
|classid|
Routing realm|
realm
|nexthop|
Routing nexthop|
ipv4_addr/ipv6_addr
|mtu|
TCP maximum segment size of route |
integer (16 bit)
|ipsec|
route via ipsec tunnel or transport |
boolean
|=================================

.Routing expression specific types
[options="header"]
|=======================
|Type| Description
|realm|
Routing Realm (32 bit number). Can be specified numerically or as symbolic name defined in /etc/iproute2/rt_realms.
|========================

.Using routing expressions
--------------------------
# IP family independent rt expression
filter output rt classid 10

# IP family dependent rt expressions
ip filter output rt nexthop 192.168.0.1
ip6 filter output rt nexthop fd00::1
inet filter output rt ip nexthop 192.168.0.1
inet filter output rt ip6 nexthop fd00::1

# outgoing packet will be encapsulated/encrypted by ipsec
filter output rt ipsec exists
--------------------------
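A common use of *rt mtu* is clamping the TCP MSS of forwarded SYN packets to the route. The following is a sketch only: the *tcp option maxseg* part is an extension header expression rather than a routing expression, and the table and chain names are placeholders.

.Clamping the TCP MSS to the route
----------------------------------
# rewrite the MSS option of forwarded SYN packets to fit the route
filter forward tcp flags syn tcp option maxseg size set rt mtu
----------------------------------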

IPSEC EXPRESSIONS
~~~~~~~~~~~~~~~~~

[verse]
*ipsec* {*in* | *out*} [ *spnum* 'NUM' ]  {*reqid* | *spi*}
*ipsec* {*in* | *out*} [ *spnum* 'NUM' ]  {*ip* | *ip6*} {*saddr* | *daddr*}

An ipsec expression refers to ipsec data associated with a packet.

The 'in' or 'out' keyword specifies whether the expression examines inbound or
outbound policies. The 'in' keyword can be used in the prerouting, input and
forward hooks. The 'out' keyword applies to the forward, output and
postrouting hooks.
The optional keyword spnum can be used to match a specific state in a chain;
it defaults to 0.

.IPsec expression types
[options="header"]
|=======================
|Keyword| Description| Type
|reqid|
Request ID|
integer (32 bit)
|spi|
Security Parameter Index|
integer (32 bit)
|saddr|
Source address of the tunnel|
ipv4_addr/ipv6_addr
|daddr|
Destination address of the tunnel|
ipv4_addr/ipv6_addr
|=================================

*Note:* When using xfrm_interface, this expression is not usable in the output
hook, as the plain packet does not traverse it with IPsec info attached; use a
chain in the postrouting hook instead.
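The following rules are a sketch based on the synopsis above. The reqid, SPI and tunnel address are placeholder values, and the chain names assume a table named filter; the outbound rule sits in postrouting, which also works with xfrm_interface as described in the note.

.Using ipsec expressions
------------------------
# accept inbound packets that were processed with reqid 1
filter input ipsec in reqid 1 accept

# match inbound packets whose tunnel source address was 192.168.1.1
filter input ipsec in ip saddr 192.168.1.1 accept

# outbound: match the SPI of state 1 instead of the default state 0
filter postrouting ipsec out spnum 1 spi 0x100 accept
------------------------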

NUMGEN EXPRESSION
~~~~~~~~~~~~~~~~~

[verse]
*numgen* {*inc* | *random*} *mod* 'NUM' [ *offset* 'NUM' ]

Create a number generator. The *inc* or *random* keywords control its
operation mode: In *inc* mode, the last returned value is simply incremented.
In *random* mode, a new random number is returned. The value after the *mod*
keyword specifies an upper boundary (read: modulus) which is not reached by
returned numbers. The optional *offset* allows one to increment the returned
value by a fixed offset.

A typical use-case for *numgen* is load-balancing:

.Using numgen expression
------------------------
# round-robin between 192.168.10.100 and 192.168.20.200:
add rule nat prerouting dnat to numgen inc mod 2 map \
	{ 0 : 192.168.10.100, 1 : 192.168.20.200 }

# probability-based with a 30/70 split using intervals:
add rule nat prerouting dnat to numgen random mod 10 map \
        { 0-2 : 192.168.10.100, 3-9 : 192.168.20.200 }
------------------------
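The optional *offset* is not used in the examples above. As a sketch, adding *offset 1* shifts the generated values from 0-1 to 1-2, so the map keys have to be shifted as well; the nat table and prerouting chain are the same placeholders as above.

.Using numgen with an offset
----------------------------
# round-robin as above, but numgen now returns 1 or 2
add rule nat prerouting dnat to numgen inc mod 2 offset 1 map \
        { 1 : 192.168.10.100, 2 : 192.168.20.200 }
----------------------------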

HASH EXPRESSIONS
~~~~~~~~~~~~~~~~

[verse]
*jhash* {*ip saddr* | *ip6 daddr* | *tcp dport* | *udp sport* | *ether saddr*} [*.* ...] *mod* 'NUM' [ *seed* 'NUM' ] [ *offset* 'NUM' ]
*symhash* *mod* 'NUM' [ *offset* 'NUM' ]

Use a hashing function to generate a number. The functions available are
*jhash*, known as Jenkins Hash, and *symhash*, for Symmetric Hash. The
*jhash* function requires an expression that selects the packet header fields
to hash; concatenations are possible as well. The value after the *mod*
keyword specifies an upper boundary (read: modulus) which is not reached by
returned numbers. The optional *seed* specifies an initial value used as seed
in the hashing function. The optional *offset* allows one to increment the
returned value by a fixed offset.

A typical use-case for *jhash* and *symhash* is load-balancing:

.Using hash expressions
------------------------
# load balance based on source ip between 2 ip addresses:
add rule nat prerouting dnat to jhash ip saddr mod 2 map \
	{ 0 : 192.168.10.100, 1 : 192.168.20.200 }

# symmetric load balancing between 2 ip addresses:
add rule nat prerouting dnat to symhash mod 2 map \
        { 0 : 192.168.10.100, 1 : 192.168.20.200 }
------------------------
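A further sketch combining a concatenation, an explicit *seed* and an *offset*. The seed value 0xdeadbeef is arbitrary, and the map keys start at 1 because of the offset.

.Using jhash with concatenation, seed and offset
------------------------------------------------
# balance on source address and source port; returned values are 1 or 2
add rule nat prerouting dnat to jhash ip saddr . tcp sport mod 2 seed 0xdeadbeef offset 1 map \
        { 1 : 192.168.10.100, 2 : 192.168.20.200 }
------------------------------------------------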