How bridge/ebtables/iptables interaction works

1. How frames traverse the ebtables chains:

This section only considers ebtables, _not_ iptables.

     Route
       ^
       |
I  +--------+ Bridge  +----------+                     +-------+      +-----------+   O
N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD| ---->|POSTROUTING|-->U
   +--------+         +----------+      [DECISION]     +-------+      +-----------+   T
                                             |                              ^ 
                                             v                              |
                                          +-----+                      +----------+
                                          |INPUT|                      |OUTPUT (2)|
                                          +-----+                      +----------+
                                             |                              ^
                                             |                              |
                                             |                         +----------+
                                             |                         |OUTPUT (1)|
                                             |                         +----------+
                                             |                              ^
                                             +------->Local Process---------+

The first thing to keep in mind is that we are talking about the ethernet layer here, i.e. OSI layer 2. A packet destined for the local computer according to the bridge (which works on the ethernet layer) isn't necessarily destined for the local computer according to the ip layer. That's how routing works (the MAC destination is the router, the ip destination is the actual box you want to communicate with).

Ebtables currently has three tables: filter, nat and broute. The filter table has a FORWARD, INPUT and OUTPUT chain. The nat table has a PREROUTING, OUTPUT and POSTROUTING chain. The broute table has the BROUTING chain. In the figure the filter OUTPUT chain has (2) appended and the nat OUTPUT chain has (1) appended. So these two OUTPUT chains are not the same (and have a different intended use).

When a nic enslaved to a bridge receives a frame, the frame will first go through the BROUTING chain. In this special chain one can choose whether to route or bridge frames. The default is bridging and we will assume the decision in this chain is 'bridge'. So, next the frame passes through the PREROUTING chain. This chain is intended for you to be able to alter the destination MAC address of frames (DNAT). If the frame passes this chain, the bridging code will decide where the frame should be sent. The bridge does this by looking at the destination MAC address; it doesn't care about the OSI layer 3 addresses (e.g. the ip address). Note that frames coming in on non-forwarding ports of a bridge will not be seen by ebtables, not even by the BROUTING chain.
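
The two chains described above could be exercised with rules along the following lines. This is only a sketch and not part of the original setup: the port name eth0 and both MAC addresses are made-up placeholders, and the run helper is an invention of this example that prints each command instead of executing it (set APPLY=1 as root to really run them).

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# BROUTING: in the broute table the DROP target means "route this frame",
# while ACCEPT (the default policy) means "bridge it".
# Assumption: we want all IPv4 frames arriving on port eth0 to be routed.
run ebtables -t broute -A BROUTING -i eth0 -p IPv4 -j DROP

# nat PREROUTING: rewrite the destination MAC before the bridging decision.
# Both MAC addresses are placeholders.
run ebtables -t nat -A PREROUTING -i eth0 -d 00:11:22:33:44:55 \
    -j dnat --to-destination 00:aa:bb:cc:dd:ee
```

Because the dnat rule sits in PREROUTING, the rewritten address is what the bridging decision gets to see.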

If the bridge decides the frame is for the bridging computer, the frame will go through the INPUT chain. In this chain you can filter frames destined for the bridge box. After passing the INPUT chain, the frame will be given to the code on layer 3 (i.e. it will be passed up), e.g. to the ip code. So, a routed ip packet will go through the ebtables INPUT chain, not through the ebtables FORWARD chain. This is logical.

Else the frame should possibly be sent onto another side of the bridge. If it should, the frame will go through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain one can filter frames that will be bridged; the POSTROUTING chain is intended to let you change the MAC source address (SNAT).

Frames that originate from the bridge box itself will go, after the bridging decision, through the nat OUTPUT chain, through the filter OUTPUT chain and the POSTROUTING chain. The nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is traversed after the bridging decision, which is actually too late for a changed destination MAC address to influence that decision. We should change this. The POSTROUTING chain is the same one as described above. Note that routed frames can also go through these chains; this happens when the destination device is a logical bridge device.
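
As a compact summary of the figure, the three paths through the ebtables chains can be written down as a small lookup function. This is a toy model only: the case labels (bridged, local-in, local-out) and the function name eb_path are invented for this sketch, and it assumes the BROUTING decision was 'bridge'.

```shell
#!/bin/sh
# Toy model of the figure: prints the ebtables chains a frame crosses.
# $1 is one of: bridged | local-in | local-out (labels invented here).
eb_path() {
    case "$1" in
        bridged)   echo "BROUTING PREROUTING FORWARD POSTROUTING" ;;
        local-in)  echo "BROUTING PREROUTING INPUT" ;;
        local-out) echo "nat-OUTPUT filter-OUTPUT POSTROUTING" ;;
        *)         echo "unknown case: $1" >&2; return 1 ;;
    esac
}

# Print one line per case.
for c in bridged local-in local-out; do
    printf '%-10s %s\n' "$c" "$(eb_path "$c")"
done
```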

2. A machine used as a bridge and a router (not a brouter):

It's possible to see a single ip packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT and POSTROUTING ebtables chains.

This can happen when the bridge is also used as a router. The ethernet frame(s) containing that ip packet will have the bridge's MAC address as destination MAC address, while the destination ip address is not that of the bridge. Including the iptables chains, this is how the ip packet runs through the bridge/router (eb=ebtables, ip=iptables):

ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet

This assumes that the routing decision sends the packet to a bridge interface. If the routing decision sends the packet to a physical network card, this is what happens:

ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet

What is obviously "asymmetric" here is that the iptables PREROUTING chain is traversed before the ebtables INPUT chain; however, this cannot be helped. See the next section.

3. DNATing bridged packets:

Take an ip packet received by the bridge and entering the bridge code. Let's assume we want to do some ip DNAT on it. Changing the destination address of the packet (ip address and MAC address) has to happen before the bridge code decides what to do with the packet. The bridge code can decide to bridge it (if the destination MAC address is on another side of the bridge), flood it over all the forwarding bridge ports (the position of the box with the destination MAC is unknown to the bridge), give it to the higher protocol code (here, the ip code) if the destination MAC address is that of the bridge, or ignore it (the destination MAC address is located on the same side of the bridge).

So, this ip DNAT has to happen very early in the bridge code, namely before the bridge code actually does anything. This is the same place where the ebtables PREROUTING chain is traversed (for the same reason).

4. Chain traversal for bridged ip packets:

A bridged packet never enters any network code above layer 2. So a bridged ip packet will never enter the ip code. Therefore all iptables chains will be traversed while the ip packet is in the bridge code. The chain traversal will look like this:

ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING

Once again note that there is a certain form of asymmetry here that cannot be helped.
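
One way to watch this traversal order on a live bridge is to put a logging rule in each of the six chains and then read the kernel log. The following is a sketch under assumptions, not something the setup above requires: it uses the iptables mangle table (assumed here because it offers all three chains and sees every packet), the log prefixes are arbitrary, and the run helper is invented for this example and only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# ebtables side: a rule with only the log watcher and no target lets the
# frame continue to the next rule.
for chain in PREROUTING POSTROUTING; do
    run ebtables -t nat -A "$chain" -p IPv4 --log-level info \
        --log-prefix "eb$chain"
done
run ebtables -A FORWARD -p IPv4 --log-level info --log-prefix "ebFORWARD"

# iptables side: LOG is a non-terminating target.
for chain in PREROUTING FORWARD POSTROUTING; do
    run iptables -t mangle -A "$chain" -j LOG --log-prefix "ip$chain "
done
```

The order in which the prefixes show up in the kernel log for a single bridged packet can then be compared with the list above.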

5. Using a bridge port in iptables rules:

The wish to be able to use physical devices belonging to a bridge (bridge ports) in iptables rules is valid. It's necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If iptables rules can only use br0 there's no way of knowing when a box on the eth0 side changes its source ip address to that of a box on the eth1 side, except by looking at the MAC source address (and then still...). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your iptables rules and therefore catch these attempts.
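
A minimal sketch of such an anti-spoofing rule, assuming the patched behaviour described above where a bridge port can be named with -i. The function name antispoof_rules and the address 192.0.2.20 are placeholders invented for this example; the function only prints the rules so they can be reviewed first.

```shell
#!/bin/sh
# Print anti-spoofing rules for one bridge port.
# $1 = bridge port the packet arrives on,
# $2... = source addresses that belong to boxes behind OTHER ports.
antispoof_rules() {
    port=$1
    shift
    for addr in "$@"; do
        echo "iptables -A FORWARD -i $port -s $addr -j DROP"
    done
}

# Hypothetical: 192.0.2.20 lives on the eth1 side, so a packet carrying
# that source address must never enter through eth0.
antispoof_rules eth0 192.0.2.20
```

The printed rules can be piped to sh as root to apply them. On kernels where this patch is not what provides the bridge/iptables glue, the physdev match (-m physdev --physdev-in eth0) is probably the equivalent way to name the port; that is an assumption outside what this text describes.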

1. iptables wants to use bridge ports:

To make this possible the iptables chains have to be traversed after the bridge code decided where the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented in section 2 (so, we are looking at routed traffic here). It actually looks like this:

ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet

Note that this is the work of the br-nf patch. If one does not compile the br-nf code into the kernel, the chains will be traversed as shown below. However, then one can only use br0, not eth0/eth1 to filter.

ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet

Notice that ipPREROUTING is now in its natural position in the chain list, which is too late to be able to change the bridging decision. More precisely: ipPREROUTING is now traversed while the packet is in the ip code.

2. IP DNAT for locally generated packets (so in the iptables nat OUTPUT chain):

The 'normal' way locally generated packets would go through the chains looks like this:

ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING

From section 5.1 we know that this actually looks like this:

ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING

Here we denote by ipOUTPUT(1) (resp. ipOUTPUT(2)) the iptables nat (resp. filter) OUTPUT chain. Note that the ipOUTPUT(1) chain is traversed while the packet is in the ip code, while the ipOUTPUT(2) chain is traversed when the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets one use the bridge ports in the ipOUTPUT(2) chain.
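
The combination could look roughly like the sketch below: a DNAT rule in the nat OUTPUT chain and a filter OUTPUT rule that names a bridge port. The function name local_out_rules and the address 192.0.2.1 are invented for illustration; 172.16.1.4 and eth2 are borrowed from the example setup in section 6. The function only prints the rules.

```shell
#!/bin/sh
# Print rules for locally generated packets.
# $1 = original destination, $2 = new destination (DNAT target),
# $3 = bridge port through which the new destination must NOT be reached.
local_out_rules() {
    # ipOUTPUT(1): nat OUTPUT, traversed while the packet is in the ip code.
    echo "iptables -t nat -A OUTPUT -d $1 -j DNAT --to-destination $2"
    # ipOUTPUT(2): filter OUTPUT, traversed in the bridge code, so the
    # bridge port is known and can be matched with -o.
    echo "iptables -A OUTPUT -o $3 -d $2 -j DROP"
}

# Hypothetical: send local traffic for 192.0.2.1 to 172.16.1.4 instead,
# and drop it if it would leave through eth2 (the wrong side).
local_out_rules 192.0.2.1 172.16.1.4 eth2
```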

6. Two possible ways for frames/packets to pass through the iptables PREROUTING, FORWARD and POSTROUTING chains:

With the br-nf patch there are 2 ways a frame/packet can pass through the 3 given iptables chains. The first way is when the frame is bridged, so the iptables chains are called by the bridge code. The second way is when the packet is routed. So special care has to be taken to distinguish between those two, especially in the iptables FORWARD chain. Here's an example of strange things to look out for:

Consider the following situation (my personal setup)

         +-----------------+
         |   cable modem   |
         +-------+---------+
                 |
                 |
             eth0|IP via DHCP from ISP
         +-------+---------+
         |bridge/router/fw |
         +--+-----------+--+
        eth1| 172.16.1.1|eth2
            |   (br0)   |
            |           |
  172.16.1.4|           |172.16.1.2
 +----------+---+    +--+------------+
 |test computer/|    |    desktop    |
 |backup server |    +---------------+
 +--------------+

With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the bridge interface br0 with ports eth1 and eth2.

More details:

The idea is that traffic between 172.16.1.4 and 172.16.1.2 is bridged, while the rest is routed, using masquerading. Here's the "script" I use at bootup for the bridge/router:

iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
insmod ebtables
insmod ebtable_filter
insmod ebtable_nat
insmod ebt_nat
insmod ebt_log
insmod ebt_arp
insmod ebt_ip
insmod br_db
brctl addbr br0
brctl stp br0 off
brctl addif br0 eth1
brctl addif br0 eth2
ifconfig eth1 0.0.0.0
ifconfig eth2 0.0.0.0
ifconfig br0 172.16.1.1 netmask 255.255.255.0 up
echo '1' > /proc/sys/net/ipv4/ip_forward

The catch is in the first line. Because the iptables code gets executed for both bridged packets and routed packets, we need to make a distinction between the two. We don't really want the bridged packets to be masqueraded. If we omit the first line then everything will work too, but things will happen differently. Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and will transmit it through its eth1 port after first masquerading the source ip address. So the packet's source ip address will now be 172.16.1.1 and 172.16.1.4 will respond to the bridge. Masquerading will change the ip destination of this response from 172.16.1.1 to 172.16.1.2. Everything works fine. But it's better not to have this behaviour. Thus, we use the first line of the script to avoid this. Note that if I wanted to filter the connections to and from the internet, I would certainly need the first line so I don't filter the local connections as well.
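
The same trick carries over to filtering. The sketch below shows what an analogous first rule in the iptables FORWARD chain might look like; it is not part of the script above. The function name forward_rules and the policy itself (allow outgoing connections, allow replies, drop the rest) are assumptions made for this example, and the function only prints the rules.

```shell
#!/bin/sh
# Print a FORWARD rule set that leaves bridged traffic alone.
# $1 = subnet on the bridge, $2 = interface towards the internet.
forward_rules() {
    lan=$1
    wan=$2
    cat <<EOF
iptables -A FORWARD -s $lan -d $lan -j ACCEPT
iptables -A FORWARD -i $wan -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -o $wan -s $lan -j ACCEPT
iptables -A FORWARD -j DROP
EOF
}

# Values taken from the setup in the figure.
forward_rules 172.16.1.0/24 eth0
```

The first printed rule accepts packets whose source and destination are both on the bridged subnet, so only routed traffic reaches the rules after it.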

7. ip DNAT in the iptables PREROUTING chain on frames/packets entering on a bridge port:

Through some groovy play (see /net/bridge/br_netfilter.c) it is assured that DNAT'ed packets whose output device after DNAT'ing is the same as the input device they came in on (the logical bridge device, which we like to call br0) will be bridged, not routed. So they will go through the ebtables FORWARD chain. All other DNAT'ed packets will be routed: they won't go through the ebtables FORWARD chain, will go through the ebtables INPUT chain and might go through the ebtables OUTPUT chain.
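
The rule can be restated as a tiny classifier, shown here next to one hypothetical DNAT rule. This is a toy model and not kernel code: the function name dnat_path, the port numbers and the choice of rule are assumptions of this sketch, while the addresses come from the setup in section 6. The rule is printed, not executed.

```shell
#!/bin/sh
# Toy classifier for the behaviour described above.
# $1 = device the packet came in on, $2 = output device after DNAT.
dnat_path() {
    if [ "$1" = "$2" ]; then
        echo "bridged: ebFORWARD"
    else
        echo "routed: ebINPUT, maybe ebOUTPUT"
    fi
}

# Hypothetical rule: redirect port 8080 on the bridge address to
# 172.16.1.4, which also sits behind br0.
echo "iptables -t nat -A PREROUTING -d 172.16.1.1 -p tcp --dport 8080 -j DNAT --to-destination 172.16.1.4:80"

dnat_path br0 br0    # in and out device are both br0
dnat_path br0 eth0   # the DNAT target is reached through eth0
```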

8. Using the mac module extension for iptables:

The side effect explained here occurs when the br-nf code is compiled in the kernel, the ip packet is routed and the out device for that packet is a logical bridge. The side effect is encountered when filtering on the mac source in the iptables FORWARD chains. As should be clear from earlier sections, the traversal of the iptables FORWARD chains is postponed until the packet is in the bridge code. This is done so one can filter on the bridge port out device. This has a side effect on the MAC source address, because the ip code will have changed the MAC source address to the MAC address of the bridge. It is therefore impossible, in the iptables FORWARD chains, to filter on the MAC source address of the computer sending the packet in question to the bridge/router. If you really need to filter on this MAC source address, you should do it in the nat PREROUTING chain. Agreed, very ugly, but making it possible to filter on the real MAC source address in the FORWARD chains would involve a very dirty hack and is probably not worth it.
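
A sketch of the workaround, matching the real MAC source address in the nat PREROUTING chain. The function name mac_block_rule and the MAC address are placeholders invented for this example, and the function only prints the rule.

```shell
#!/bin/sh
# Print a rule that drops packets from one source MAC in nat PREROUTING,
# which is traversed before the ip code rewrites the MAC source address.
# $1 = MAC address of the sending computer.
mac_block_rule() {
    echo "iptables -t nat -A PREROUTING -m mac --mac-source $1 -j DROP"
}

# Placeholder MAC address.
mac_block_rule 00:11:22:33:44:55
```

Keep in mind that the nat table is only consulted for the first packet of a connection, so this blocks new connections rather than every packet. Some newer iptables releases may also refuse the DROP target in the nat table; whether another PREROUTING table is a suitable substitute there is not covered by this text.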

Released under the GPL.

Bart De Schuymer.

Last updated June 2nd, 2002.