describes the br-nf/ebtables/iptables interaction

author: Bart De Schuymer <bdschuym@pandora.be> 2002-06-02 14:02:18 +0000
committer: Bart De Schuymer <bdschuym@pandora.be> 2002-06-02 14:02:18 +0000
commit: 08934e3091865e6a4165702401f032e8327e380e (patch)
tree: 09d072080007dc6ec5ded07133c65bde65728c45 /docs
parent: 8ecde2955fcc739657e95f87e53e74092873c892 (diff)
1 files changed, 255 insertions, 0 deletions
diff --git a/docs/how_it_works.html b/docs/how_it_works.html
new file mode 100755
index 0000000..9ec8b27
--- /dev/null
+++ b/docs/how_it_works.html
@@ -0,0 +1,255 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
+<HTML><HEAD><TITLE>How bridge/ebtables/iptables interaction works</TITLE>
+<META http-equiv=Content-Type content="text/html; charset=iso-8859-15">
+<STYLE type=text/css>H1 {
+	FONT: bold 25pt Times, serif; TEXT-ALIGN: center; TEXT-DECORATION: underline
+}
+P {
+	FONT: 20pt Times, serif
+}
+LI {
+	MARGIN-BOTTOM: 2em; FONT: 22pt 'Times New Roman', serif
+}
+PRE {
+	FONT: 18pt Courier, monospace
+}
+.statement {
+	TEXT-DECORATION: underline
+}
+.section {
+	FONT: bold 22pt Times
+}
+.case {
+	FONT-STYLE: italic
+}
+</STYLE>
+
+<META content="MSHTML 6.00.2505.0" name=GENERATOR></HEAD>
+<BODY>
+<H1>How bridge/ebtables/iptables interaction works</H1>
+
+<P class=section>1. How frames traverse the <EM>ebtables</EM> chains:</P>
+<P>This section only considers <EM>ebtables</EM>, _not_ <EM>iptables</EM>.</P>
+<PRE>
+     Route
+       ^
+       |
+I  +--------+ Bridge  +----------+                     +-------+      +-----------+   O
+N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD| ---->|POSTROUTING|-->U
+   +--------+         +----------+      [DECISION]     +-------+      +-----------+   T
+                                             |                              ^ 
+                                             v                              |
+                                          +-----+                      +----------+
+                                          |INPUT|                      |OUTPUT (2)|
+                                          +-----+                      +----------+
+                                             |                              ^
+                                             |                              |
+                                             |                         +----------+
+                                             |                         |OUTPUT (1)|
+                                             |                         +----------+
+                                             |                              ^
+                                             +------->Local Process---------+
+</PRE>
+<P>
+The first thing to keep in mind is that we are talking about the ethernet layer here,
+i.e. OSI layer 2. A packet destined for the local computer according to the bridge
+(which works on the ethernet layer) isn't necessarily destined for the local computer
+according to the ip layer. That's how routing works (the MAC destination is the router, the ip
+destination is the actual box you want to communicate with).</P>
+<P>
+<EM>Ebtables</EM> currently has three tables: filter, nat and broute. The filter table has a
+FORWARD, INPUT and OUTPUT chain. The nat table has a PREROUTING, OUTPUT and POSTROUTING chain.
+The broute table has the BROUTING chain. In the figure the filter OUTPUT chain has (2)
+appended and the nat OUTPUT chain has (1) appended. So these two OUTPUT chains are not
+the same (and have a different intended use).</P>
+<P>
+When a nic enslaved to a bridge receives a frame, the frame will first go through the BROUTING
+chain. In this special chain one can choose whether to route or bridge frames. The default
+is bridging and we will assume the decision in this chain is 'bridge'. Next, the frame
+passes through the PREROUTING chain. This chain is intended for altering the destination MAC
+address of frames (DNAT). If the frame passes this chain, the bridging code will decide where the
+frame should be sent. The bridge does this by looking at the destination MAC address; it
+doesn't care about the OSI layer 3 addresses (e.g. the ip address). Note that frames coming in
+on non-forwarding ports of a bridge will not be seen by <EM>ebtables</EM>, not even by the BROUTING
+chain.</P>
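+<P>
+As a rough sketch of what rules in these two chains can look like (the interface name and MAC
+addresses below are made-up placeholders, and the syntax is that of the <EM>ebtables</EM> userspace
+tool as commonly documented, so check it against your version):</P>
+<PRE>
+# broute table: in the BROUTING chain the DROP target means "route this frame
+# instead of bridging it" (here: all IPv4 frames arriving on eth0)
+ebtables -t broute -A BROUTING -p IPv4 -i eth0 -j DROP
+
+# nat table, PREROUTING chain: rewrite the destination MAC address before
+# the bridging decision is made
+ebtables -t nat -A PREROUTING -i eth0 -d 00:11:22:33:44:55 -j dnat --to-destination 54:44:33:22:11:00
+</PRE>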
+<P>
+If the bridge decides the frame is for the bridging computer, the frame will go through the
+INPUT chain. In this chain you can filter frames destined for the bridge box. After passing
+the INPUT chain, the frame will be given to the code on layer 3 (i.e. it will be passed up),
+e.g. to the ip code. So, a routed ip packet will go through the <EM>ebtables</EM> INPUT chain, not
+through the <EM>ebtables</EM> FORWARD chain. This is logical.</P>
+<P>
+Otherwise the frame may have to be sent out on another side of the bridge. If so, the
+frame will go through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain one
+can filter frames that will be bridged; the POSTROUTING chain is intended for
+changing the MAC source address (SNAT).</P>
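+<P>
+A minimal sketch of rules for these two chains follows. The addresses and the interface name are
+placeholders chosen for illustration, and the exact option syntax may differ between
+<EM>ebtables</EM> versions:</P>
+<PRE>
+# filter table, FORWARD chain: drop bridged IPv4 frames that claim source ip
+# 172.16.1.4 but do not carry the expected source MAC address
+ebtables -A FORWARD -p IPv4 --ip-src 172.16.1.4 -s ! 00:11:22:33:44:55 -j DROP
+
+# nat table, POSTROUTING chain: rewrite the source MAC address of frames
+# leaving through eth1
+ebtables -t nat -A POSTROUTING -o eth1 -s 00:11:22:33:44:55 -j snat --to-source 54:44:33:22:11:00
+</PRE>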
+<P>
+Frames that originate from the bridge box itself will go, after the bridging decision, through the
+nat OUTPUT chain, the filter OUTPUT chain and the POSTROUTING chain. The
+nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain
+allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is
+traversed after the bridging decision, which is actually too late. We should change this. The POSTROUTING
+chain is the same one as described above. Note that routed frames can also go
+through these chains; this happens when the destination device is a logical bridge device.</P>
+<P class=section>
+2. A machine used as a bridge and a router (not a brouter):</P>
+<P>
+It's possible to see a single ip packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT
+and POSTROUTING <EM>ebtables</EM> chains.</P>
+<P>
+This can happen when the bridge is also used as a router. The ethernet frame(s) containing that
+ip packet will have the bridge's MAC address as destination MAC address, while the destination ip address is not
+that of the bridge. Including the <EM>iptables</EM> chains, this is how the ip packet runs through the
+bridge/router (eb=ebtables, ip=iptables):</P>
+<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
+<P>
+This assumes that the routing decision sends the packet to a bridge interface. If the routing
+decision sends the packet to a physical network card, this is what happens:</P>
+<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet</PRE>
+<P>
+What is obviously "asymmetric" here is that the <EM>iptables</EM> PREROUTING chain is traversed before
+the <EM>ebtables</EM> INPUT chain. However, this cannot be helped; see the next section.</P>
+<P class=section>
+3. DNATing bridged packets:</P>
+<P>
+Take an ip packet received by the bridge; it enters the bridge code. Let's assume we want to do
+some ip DNAT on it. Changing the destination address of the packet (ip address and MAC address)
+has to happen before the bridge code decides what to do with the packet. The bridge code can decide
+to bridge it (if the destination MAC address is on another side of the bridge), flood it over all
+the forwarding bridge ports (if the position of the box with the destination MAC is unknown to the bridge),
+give it to the higher protocol code (here, the ip code) if the destination MAC address is that of the
+bridge, or ignore it (if the destination MAC address is located on the same side of the bridge).</P>
+<P>
+So, this ip DNAT has to happen very early in the bridge code, namely before the bridge code
+actually does anything. This is the same place where the <EM>ebtables</EM> PREROUTING chain is
+traversed (for the same reason).</P>
+<P class=section>
+4. Chain traversal for bridged ip packets:</P>
+<P>
+A bridged packet never enters any network code above layer 2. So a bridged ip packet will never
+enter the ip code. Therefore all <EM>iptables</EM> chains will be traversed while the ip packet is in the
+bridge code. The chain traversal will look like this:</P>
+<PRE>
+ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING</PRE>
+<P>
+Once again note that there is a certain form of asymmetry here that cannot be helped.</P>
+<P class=section>
+5. Using a bridge port in <EM>iptables</EM> rules:</P>
+<P>
+The wish to be able to use physical devices belonging to a bridge (bridge ports) in <EM>iptables</EM> rules
+is valid. It's necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If <EM>iptables</EM>
+rules can only use br0, there's no way of knowing when a box on the eth0 side changes its source ip
+address to that of a box on the eth1 side, except by looking at the MAC source address (and even then
+you can't be sure). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your
+<EM>iptables</EM> rules and therefore catch these attempts.</P>
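+<P>
+For instance, a rule along these lines would drop forwarded packets that arrive on the eth0 side
+while claiming a source address from the eth1 side. This is only a sketch: 192.168.0.2 is a
+hypothetical address of a box behind eth1, and later kernels match bridge ports through the
+physdev match (-m physdev --physdev-in eth0) rather than through -i.</P>
+<PRE>
+# 192.168.0.2 is assumed to live on the eth1 side of br0, so it must never
+# show up as a source address on packets received on the eth0 port
+iptables -A FORWARD -i eth0 -s 192.168.0.2 -j DROP
+</PRE>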
+<P class=case>
+1. <EM>iptables</EM> wants to use bridge ports:</P>
+<P>
+To make this possible the <EM>iptables</EM> chains have to be traversed after the bridge code decided where
+the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented
+in section 2 (so, we are looking at routed traffic here). It actually looks like this:</P>
+<PRE>
+ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet</PRE>
+<P>
+Note that this is the work of the br-nf patch. If one does not compile the br-nf code into the kernel,
+the chains will be traversed as shown below. However, then one can only use br0, not eth0/eth1 to
+filter.</P>
+<PRE>ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
+<P>
+Notice that ipPREROUTING is now in its natural position in the chain list, but too late to be able to change
+the bridging decision. More precisely: ipPREROUTING is now traversed while the packet is in the ip code.</P>
+<P class=case>
+2. IP DNAT for locally generated packets (so in the <EM>iptables</EM> nat OUTPUT chain):</P>
+<P>
+The 'normal' way locally generated packets would go through the chains looks like this:</P>
+<PRE>
+ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING</PRE>
+<P>
+From section 5.1 we know that this actually looks like this:</P>
+<PRE>
+ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING</PRE>
+<P>
+Here we denote by ipOUTPUT(1) (resp. ipOUTPUT(2)) the <EM>iptables</EM> nat (resp. filter) OUTPUT chain. Note that
+the ipOUTPUT(1) chain is traversed while the packet is in the ip code, while the ipOUTPUT(2) chain is traversed when
+the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets
+one use the bridge ports in the ipOUTPUT(2) chain.</P>
+<P class=section>
+6. Two possible ways for frames/packets to pass through the <EM>iptables</EM> PREROUTING, FORWARD and POSTROUTING
+chains:</P>
+<P>
+With the br-nf patch there are 2 ways a frame/packet can pass through the 3 given <EM>iptables</EM>
+chains. The first way is when the frame is bridged, so the <EM>iptables</EM> chains are called by the bridge code.
+The second way is when the packet is routed. So special care has to be taken to distinguish between those
+two, especially in the <EM>iptables</EM> FORWARD chain. Here's an example of strange things to look out for:</P>
+<P>
+Consider the following situation (my personal setup):</P>
+<PRE>
+         +-----------------+
+         |   cable modem   |
+         +-------+---------+
+                 |
+                 |
+             eth0|IP via DHCP from ISP
+         +-------+---------+
+         |bridge/router/fw |
+         +--+-----------+--+
+        eth1| 172.16.1.1|eth2
+            |   (br0)   |
+            |           |
+  172.16.1.4|           |172.16.1.2
+ +----------+---+    +--+------------+
+ |test computer/|    |    desktop    |
+ |backup server |    +---------------+
+ +--------------+</PRE>
+<P>
+With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all
+three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the bridge
+interface br0 with ports eth1 and eth2.</P>
+<P class=case>More details:</P>
+<P>
+The idea is that traffic between 172.16.1.4 and 172.16.1.2 is bridged, while the rest is routed, using
+masquerading. Here's the "script" I use at bootup for the bridge/router:</P>
+<PRE>
+iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
+iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
+insmod ebtables
+insmod ebtable_filter
+insmod ebtable_nat
+insmod ebt_nat
+insmod ebt_log
+insmod ebt_arp
+insmod ebt_ip
+insmod br_db
+brctl addbr br0
+brctl stp br0 off
+brctl addif br0 eth1
+brctl addif br0 eth2
+ifconfig eth1 0 0.0.0.0
+ifconfig eth2 0 0.0.0.0
+ifconfig br0 172.16.1.1 netmask 255.255.255.0 up
+echo '1' > /proc/sys/net/ipv4/ip_forward</PRE>
+<P>
+The catch is in the first line. Because the <EM>iptables</EM> code gets executed for both bridged packets and routed
+packets, we need to make a distinction between the two. We don't really want the bridged packets to be
+masqueraded. If we omit the first line everything will still work, but things will happen differently.
+Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and will transmit it through its eth1
+port after first masquerading the source ip address. So the packet's source ip address will now be 172.16.1.1 and
+172.16.1.4 will respond to the bridge. Masquerading will change the ip destination of this response from
+172.16.1.1 to 172.16.1.2. Everything works fine, but it's better not to have this behaviour. Thus, we use the
+first line of the script to avoid this. Note that if I wanted to filter the connections to and from the
+internet, I would certainly need the first line so I don't filter the local connections as well.</P>
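+<P>
+When filtering, one way to express the same distinction in the <EM>iptables</EM> filter table might look like
+the sketch below. These rules are an assumption for illustration and are not part of the boot script above:
+local traffic is accepted first, so that only later rules apply to traffic to and from the internet.</P>
+<PRE>
+# accept traffic that stays inside the local subnet before any other FORWARD
+# rule gets a chance to see it
+iptables -A FORWARD -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
+# rules appended after this point only see traffic to and from the internet
+</PRE>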
+<P class=section>
+7. ip DNAT in the <EM>iptables</EM> PREROUTING chain on frames/packets entering on a bridge port:</P>
+<P>The br-nf code (see /net/bridge/br_netfilter.c) makes sure that DNAT'ed packets whose output device after DNAT'ing
+is the same as the input device they came in on (the logical bridge device, which we like to call br0)
+will be bridged, not routed. So they will go through the <EM>ebtables</EM> FORWARD chain. All other DNAT'ed packets will be
+routed, so they won't go through the <EM>ebtables</EM> FORWARD chain; they will go through the <EM>ebtables</EM> INPUT chain and might go
+through the <EM>ebtables</EM> OUTPUT chain.</P>
+<P>
+Released under the GPL.</P>
+<P>
+Bart De Schuymer.</P>
+<P>
+Last updated the 19th May 2002.</P>
+</BODY></HTML>
+\ No newline at end of file