<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML><HEAD><TITLE>How bridge/ebtables/iptables interaction works</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-15">
<STYLE type=text/css>H1 {
	FONT: bold 25pt Times, serif; TEXT-ALIGN: center; TEXT-DECORATION: underline
}
P {
	FONT: 20pt Times, serif
}
LI {
	MARGIN-BOTTOM: 2em; FONT: 22pt 'Times New Roman', serif
}
PRE {
	FONT: 18pt Courier, monospace
}
.statement {
	TEXT-DECORATION: underline
}
.section {
	FONT: bold 22pt Times
}
.case {
	FONT-STYLE: italic
}
</STYLE>

<META content="MSHTML 6.00.2505.0" name=GENERATOR></HEAD>
<BODY>
<H1>How bridge/ebtables/iptables interaction works</H1>

<P class=section>1. How frames traverse the <EM>ebtables</EM> chains:</P>
<P>This section considers only <EM>ebtables</EM>, not <EM>iptables</EM>.</P>
<PRE>
     Route
       ^
       |
I  +--------+ Bridge  +----------+                     +-------+      +-----------+   O
N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD| ---->|POSTROUTING|-->U
   +--------+         +----------+      [DECISION]     +-------+      +-----------+   T
                                             |                              ^ 
                                             v                              |
                                          +-----+                      +----------+
                                          |INPUT|                      |OUTPUT (2)|
                                          +-----+                      +----------+
                                             |                              ^
                                             |                              |
                                             |                         +----------+
                                             |                         |OUTPUT (1)|
                                             |                         +----------+
                                             |                              ^
                                             +------->Local Process---------+
</PRE>
<P>
The first thing to keep in mind is that we are talking about the ethernet layer here,
i.e. OSI layer 2. A frame destined for the local computer according to the bridge
(which works on the ethernet layer) isn't necessarily destined for the local computer
according to the ip layer. That's how routing works (the MAC destination is the router, the ip
destination is the actual box you want to communicate with).</P>
<P>
<EM>Ebtables</EM> currently has three tables: filter, nat and broute. The filter table has a
FORWARD, INPUT and OUTPUT chain. The nat table has a PREROUTING, OUTPUT and POSTROUTING chain.
The broute table has the BROUTING chain. In the figure the filter OUTPUT chain has (2)
appended and the nat OUTPUT chain has (1) appended. So these two OUTPUT chains are not
the same (and have a different intended use).</P>
<P>
When a nic enslaved to a bridge receives a frame, the frame will first go through the BROUTING
chain. In this special chain one can choose whether to route or bridge frames. The default
is bridging and we will assume the decision in this chain is 'bridge'. So, next the frame
passes through the PREROUTING chain. This chain is intended for you to be able to alter the
destination MAC address of
frames (DNAT). If the frame passes this chain, the bridging code will decide where the
frame should be sent. The bridge does this by looking at the destination MAC address; it
doesn't care about the OSI layer 3 addresses (e.g. ip address). Note that frames coming in
on non-forwarding ports of a bridge will not be seen by <EM>ebtables</EM>, not even by the BROUTING
chain.</P>
<P>
If the bridge decides the frame is for the bridging computer, the frame will go through the
INPUT chain. In this chain you can filter frames destined for the bridge box. After passing
the INPUT chain, the frame will be given to the code on layer 3 (i.e. it will be passed up),
e.g. to the ip code. So, a routed ip packet will go through the <EM>ebtables</EM> INPUT chain, not
through the <EM>ebtables</EM> FORWARD chain. This is logical: on the ethernet layer the frame is
addressed to the bridge box itself.</P>
<P>
Otherwise the frame should possibly be sent onto another side of the bridge. If it should, the
frame will go through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain one
can filter frames that will be bridged; the POSTROUTING chain is intended for changing
the MAC source address (SNAT).</P>
<P>
Frames that originate from the bridge box itself will go, after the bridging decision, through the
nat OUTPUT chain, through the filter OUTPUT chain and the POSTROUTING chain. The
nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain
allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is
traversed after the bridging decision, so actually too late. We should change this. The POSTROUTING
chain is the same one as described above. Note that it is also possible for routed frames to go
through these chains; this happens when the destination device is a logical bridge device.</P>
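<P>
The traversal described in this section can be summarized in a few lines of Python. This is only a
sketch of the text above, not code taken from <EM>ebtables</EM>; the function name, its arguments and the
string values are made up for illustration.</P>
<PRE>
```python
def ebtables_path(origin, brouting="bridge", bridging_decision="forward"):
    """Return the ebtables chains a frame passes, in order (section 1 only)."""
    if origin == "local":
        # Frames that originate from the bridge box itself.
        return ["nat OUTPUT", "filter OUTPUT", "POSTROUTING"]
    if origin != "received":
        raise ValueError("origin must be 'received' or 'local'")

    path = ["BROUTING"]
    if brouting == "route":
        # The frame is handed to the routing code instead of being bridged.
        return path

    path.append("PREROUTING")
    if bridging_decision == "local":
        # The frame is for the bridge box: filter it, then pass it up.
        path.append("INPUT")
    elif bridging_decision == "forward":
        # The frame goes out on another side of the bridge.
        path += ["FORWARD", "POSTROUTING"]
    # Any other decision (e.g. "ignore"): no further chains are traversed.
    return path


if __name__ == "__main__":
    print(ebtables_path("received"))
    print(ebtables_path("received", bridging_decision="local"))
    print(ebtables_path("local"))
```
</PRE>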
<P class=section>
2. A machine used as a bridge and a router (not a brouter):</P>
<P>
It's possible to see a single ip packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT
and POSTROUTING <EM>ebtables</EM> chains.</P>
<P>
This can happen when the bridge is also used as a router. The ethernet frame(s) containing that
ip packet will have the bridge's destination MAC address, while the destination ip address is not
that of the bridge. Including the <EM>iptables</EM> chains, this is how the ip packet runs through the
bridge/router (eb=ebtables, ip=iptables):</P>
<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
<P>
This assumes that the routing decision sends the packet to a bridge interface. If the routing
decision sends the packet to a physical network card, this is what happens:</P>
<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet</PRE>
<P>
What is obviously "asymmetric" here is that the <EM>iptables</EM> PREROUTING chain is traversed before
the <EM>ebtables</EM> INPUT chain; however, this cannot be helped. See the next section.</P>
<P class=section>
3. DNATing bridged packets:</P>
<P>
Take an ip packet received by the bridge; it enters the bridge code. Let's assume we want to do
some ip DNAT on it. Changing the destination address of the packet (ip address and MAC address)
has to happen before the bridge code decides what to do with the packet. The bridge code can decide
to bridge it (if the destination MAC address is on another side of the bridge), flood it over all
the forwarding bridge ports (the position of the box with the destination MAC is unknown to the bridge),
give it to the higher protocol code (here, the ip code) if the destination MAC address is that of the
bridge, or ignore it (the destination MAC address is located on the same side of the bridge).</P>
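<P>
The four outcomes listed above can be sketched as follows. This is a simplified illustration, not the
actual bridge code: the function name, the plain dictionary used as forwarding database and the return
values are assumptions, and things like broadcast addresses are left out.</P>
<PRE>
```python
def bridging_decision(dst_mac, in_port, bridge_mac, fdb):
    """Decide what the bridge does with a frame.

    fdb maps known MAC addresses to the bridge port they are located on.
    """
    if dst_mac == bridge_mac:
        # Destination is the bridge itself: give it to the higher protocol code.
        return "pass up"
    out_port = fdb.get(dst_mac)
    if out_port is None:
        # Position of the destination box is unknown: use all forwarding ports.
        return "flood"
    if out_port == in_port:
        # Destination sits on the side the frame came from.
        return "ignore"
    return "bridge"


if __name__ == "__main__":
    fdb = {"00:00:00:00:00:02": "eth1", "00:00:00:00:00:04": "eth0"}
    bridge = "00:00:00:00:00:01"
    print(bridging_decision("00:00:00:00:00:02", "eth0", bridge, fdb))
```
</PRE>
<P>
Since the result depends only on the destination MAC address, any DNAT has to rewrite that address
before this decision is taken.</P>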
<P>
So, this ip DNAT has to happen very early in the bridge code, namely before the bridge code
actually does anything. This is at the same place as where the <EM>ebtables</EM> PREROUTING chain will
be traversed (for the same reason).</P>
<P class=section>
4. Chain traversal for bridged ip packets:</P>
<P>
A bridged packet never enters any network code above layer 2. So a bridged ip packet will never
enter the ip code. Therefore all <EM>iptables</EM> chains will be traversed while the ip packet is in the
bridge code. The chain traversal will look like this:</P>
<PRE>
ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING</PRE>
<P>
Once again note that there is a certain form of asymmetry here that cannot be helped.</P>
<P class=section>
5. Using a bridge port in <EM>iptables</EM> rules:</P>
<P>
The wish to be able to use physical devices belonging to a bridge (bridge ports) in <EM>iptables</EM> rules
is valid. It's necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If <EM>iptables</EM>
rules can only use br0 there's no way of knowing when a box on the eth0 side changes its source ip
address to that of a box on the eth1 side, except by looking at the MAC source address (and then
still...). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your
<EM>iptables</EM> rules and therefore catch these attempts.</P>
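<P>
To illustrate why the bridge port matters, here is a small sketch of the check such rules make possible.
It is not <EM>iptables</EM> code; the addresses, the table of expected sources and the function name are
hypothetical.</P>
<PRE>
```python
# Hypothetical table: which source ip addresses are expected behind each port.
EXPECTED_SOURCES = {
    "eth0": {"10.0.0.4"},
    "eth1": {"10.0.0.2"},
}


def is_spoofed(in_port, src_ip, expected=EXPECTED_SOURCES):
    """True if src_ip belongs to a box known to sit behind a different port."""
    for port, addresses in expected.items():
        if src_ip in addresses and port != in_port:
            return True
    return False


if __name__ == "__main__":
    # A box on the eth0 side pretends to be 10.0.0.2, which lives behind eth1.
    print(is_spoofed("eth0", "10.0.0.2"))
    # Seen only as "br0", both packets would look the same.
    print(is_spoofed("eth1", "10.0.0.2"))
```
</PRE>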
<P class=case>
1. <EM>iptables</EM> wants to use bridge ports:</P>
<P>
To make this possible the <EM>iptables</EM> chains have to be traversed after the bridge code has decided where
the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented
in section 2 (so, we are looking at routed traffic here). It actually looks like this:</P>
<PRE>
ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet</PRE>
<P>
Note that this is the work of the br-nf patch. If one does not compile the br-nf code into the kernel,
the chains will be traversed as shown below. However, then one can only use br0, not eth0/eth1 to
filter.</P>
<PRE>ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
<P>
Notice that ipPREROUTING is now in the natural position in the chain list and too late to be able to change
the bridging decision. More precisely: ipPREROUTING is now traversed while the packet is in the ip code.</P>
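<P>
The two traversal orders for routed traffic can be compared mechanically. The sketch below just encodes
the two lists given in this section; the names are made up, and the position of ebINPUT is used as a
stand-in for "after the bridging decision".</P>
<PRE>
```python
# Routed packet whose out device is a bridge, keyed by "br-nf compiled in?".
ROUTED_TO_BRIDGE = {
    True: ["ebPREROUTING", "ipPREROUTING", "ebINPUT", "ipFORWARD",
           "ebOUTPUT(1)", "ebOUTPUT(2)", "ipPOSTROUTING", "ebPOSTROUTING"],
    False: ["ebPREROUTING", "ebINPUT", "ipPREROUTING", "ipFORWARD",
            "ipPOSTROUTING", "ebOUTPUT(1)", "ebOUTPUT(2)", "ebPOSTROUTING"],
}


def ip_prerouting_can_affect_bridging(br_nf):
    """True if ipPREROUTING is traversed before the ebtables INPUT chain."""
    path = ROUTED_TO_BRIDGE[br_nf]
    return path.index("ebINPUT") > path.index("ipPREROUTING")


if __name__ == "__main__":
    for br_nf in (True, False):
        print(br_nf, ip_prerouting_can_affect_bridging(br_nf))
```
</PRE>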
<P class=case>
2. IP DNAT for locally generated packets (so in the <EM>iptables</EM> nat OUTPUT chain):</P>
<P>
The 'normal' way locally generated packets would go through the chains looks like this:</P>
<PRE>
ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING</PRE>
<P>
From section 5.1 we know that this actually looks like this:</P>
<PRE>
ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING</PRE>
<P>
Here we denote by ipOUTPUT(1) (resp. ipOUTPUT(2)) the <EM>iptables</EM> nat (resp. filter) OUTPUT chain. Note that
the ipOUTPUT(1) chain is traversed while the packet is in the ip code, while the ipOUTPUT(2) chain is traversed when
the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets
one use the bridge ports in the ipOUTPUT(2) chain.</P>
<P class=section>
6. Two possible ways for frames/packets to pass through the <EM>iptables</EM> PREROUTING, FORWARD and POSTROUTING
chains:</P>
<P>
With the br-nf patch there are 2 ways a frame/packet can pass through the 3 given <EM>iptables</EM>
chains. The first way is when the frame is bridged, so the <EM>iptables</EM> chains are called by the bridge code.
The second way is when the packet is routed. So special care has to be taken to distinguish between those
two, especially in the <EM>iptables</EM> FORWARD chain. Here's an example of strange things to look out for:</P>
<P>
Consider the following situation (my personal setup):</P>
<PRE>
         +-----------------+
         |   cable modem   |
         +-------+---------+
                 |
                 |
             eth0|IP via DHCP from ISP
         +-------+---------+
         |bridge/router/fw |
         +--+-----------+--+
        eth1| 172.16.1.1|eth2
            |   (br0)   |
            |           |
  172.16.1.4|           |172.16.1.2
 +----------+---+    +--+------------+
 |test computer/|    |    desktop    |
 |backup server |    +---------------+
 +--------------+</PRE>
<P>
With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all
three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the bridge
interface br0 with ports eth1 and eth2.</P>
<P class=case>More details:</P>
<P>
The idea is that traffic between 172.16.1.4 and 172.16.1.2 is bridged, while the rest is routed, using
masquerading. Here's the "script" I use at bootup for the bridge/router:</P>
<PRE>
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
insmod ebtables
insmod ebtable_filter
insmod ebtable_nat
insmod ebt_nat
insmod ebt_log
insmod ebt_arp
insmod ebt_ip
insmod br_db
brctl addbr br0
brctl stp br0 off
brctl addif br0 eth1
brctl addif br0 eth2
ifconfig eth1 0 0.0.0.0
ifconfig eth2 0 0.0.0.0
ifconfig br0 172.16.1.1 netmask 255.255.255.0 up
echo '1' > /proc/sys/net/ipv4/ip_forward</PRE>
<P>
The catch is in the first line. Because the <EM>iptables</EM> code gets executed for both bridged packets and routed
packets we need to make a distinction between the two. We don't really want the bridged packets to be
masqueraded. If we omit the first line then everything will work too, but things will happen differently.
Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and will transmit it through its eth1
port after first masquerading the ip address. So the packet's source ip address will now be 172.16.1.1 and
172.16.1.4 will respond to the bridge. Masquerading will change the ip destination of this response from
172.16.1.1 to 172.16.1.2. Everything works fine. But it's better not to have this behaviour. Thus, we use the
first line of the script to avoid this. Note that if I wanted to filter the connections to and from the
internet, I would certainly need the first line so I don't filter the local connections as well.</P>
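<P>
The effect of the first line can be mimicked with a small first-match model of the two nat POSTROUTING
rules from the script. This is only an illustration of rule ordering, not how <EM>iptables</EM> is
implemented; the function name, the rule tuples and the assumed default of ACCEPT are mine.</P>
<PRE>
```python
import ipaddress

LAN = ipaddress.ip_network("172.16.1.0/24")

# (source network, destination network or None for "any", target),
# in the same order as the two iptables lines of the script.
RULES = [
    (LAN, LAN, "ACCEPT"),
    (LAN, None, "MASQUERADE"),
]


def nat_postrouting(src, dst, rules=RULES):
    """Return the target of the first rule that matches the packet."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for src_net, dst_net, target in rules:
        if src_ip in src_net and (dst_net is None or dst_ip in dst_net):
            return target
    return "ACCEPT"  # assumed chain policy when nothing matches


if __name__ == "__main__":
    # Bridged traffic between the two local boxes is left alone.
    print(nat_postrouting("172.16.1.2", "172.16.1.4"))
    # Without the first rule the same packet would be masqueraded.
    print(nat_postrouting("172.16.1.2", "172.16.1.4", RULES[1:]))
```
</PRE>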
<P class=section>
7. ip DNAT in the <EM>iptables</EM> PREROUTING chain on frames/packets entering on a bridge port:</P>
<P>Some special handling (see /net/bridge/br_netfilter.c) ensures that DNAT'ed packets that, after DNAT'ing,
have the same output device as the input device they came in on (the logical bridge device, which we like to call br0)
will be bridged, not routed. So they will go through the <EM>ebtables</EM> FORWARD chain. All other DNAT'ed packets will be
routed, so they won't go through the <EM>ebtables</EM> FORWARD chain, will go through the <EM>ebtables</EM> INPUT chain and might go
through the <EM>ebtables</EM> OUTPUT chain.</P>
<P class=section>
8. Using the mac module extension for <EM>iptables</EM>:</P>
<P>The side effect explained here occurs when the br-nf code is compiled into the kernel, the ip packet is routed and the out device
for that packet is a logical bridge. The side effect is encountered when filtering on the MAC source address in the
<EM>iptables</EM> FORWARD chains. As should be clear from earlier sections, the traversal of the <EM>iptables</EM> FORWARD chains
is postponed until the packet is in the bridge code. This is done so one can filter on the bridge port out device. This has a
side effect on the MAC source address, because the ip code will have changed the MAC source address to the MAC address of the bridge.
It is therefore impossible, in the <EM>iptables</EM> FORWARD chains, to filter on the MAC source address of the computer sending
the packet in question to the bridge/router. If you really need to filter on this MAC source address, you should do it in the nat
PREROUTING chain. Agreed, very ugly, but making it possible to filter on the real MAC source address in the FORWARD chains would
involve a very dirty hack and is probably not worth it.</P>
<P>
Released under the GPL.</P>
<P>
Bart De Schuymer.</P>
<P>
Last updated June 2nd, 2002.</P>
</BODY></HTML>