Diffstat (limited to 'doc')
-rw-r--r-- | doc/helper/conntrackd.conf | 22
-rw-r--r-- | doc/manual/conntrack-tools.tmpl | 214
-rw-r--r-- | doc/misc/README | 187
-rw-r--r-- | doc/misc/clusterip.sh | 254
4 files changed, 569 insertions, 108 deletions
diff --git a/doc/helper/conntrackd.conf b/doc/helper/conntrackd.conf index 6ffe008..efa318a 100644 --- a/doc/helper/conntrackd.conf +++ b/doc/helper/conntrackd.conf @@ -3,11 +3,21 @@ # Helper { - # Before this, you have to make sure you have registered the `ftp' - # user-space helper stub via: + # + # Set up the userspace helpers when the daemon is started. If unset, + # you have manually set up the user-space helper stub, e.g. # # nfct add helper ftp inet tcp # + # This new setting simplifies new deployment, so it is recommended to + # turn it on. On existing deployments, make sure to remove the nfct + # command invocation since it is not required anymore. + # + # Default: no (for backward compatibility reasons) + # Recommended: yes + # + Setup yes + Type ftp inet tcp { # # Set NFQUEUE number you want to use to receive traffic from @@ -73,7 +83,7 @@ Helper { } } Type mdns inet udp { - QueueNum 6 + QueueNum 5 QueueLen 10240 Policy mdns { ExpectMax 8 @@ -81,7 +91,7 @@ Helper { } } Type ssdp inet udp { - QueueNum 5 + QueueNum 6 QueueLen 10240 Policy ssdp { ExpectMax 8 @@ -89,7 +99,7 @@ Helper { } } Type ssdp inet tcp { - QueueNum 5 + QueueNum 7 QueueLen 10240 Policy ssdp { ExpectMax 8 @@ -97,7 +107,7 @@ Helper { } } Type slp inet udp { - QueueNum 7 + QueueNum 8 QueueLen 10240 Policy slp { ExpectMax 8 diff --git a/doc/manual/conntrack-tools.tmpl b/doc/manual/conntrack-tools.tmpl index 739b7f1..822dd49 100644 --- a/doc/manual/conntrack-tools.tmpl +++ b/doc/manual/conntrack-tools.tmpl @@ -19,7 +19,7 @@ </authorgroup> <copyright> - <year>2008-2012</year> + <year>2008-2020</year> <holder>Pablo Neira Ayuso</holder> </copyright> @@ -35,10 +35,8 @@ </legalnotice> <releaseinfo> - This document details how to install and configure the - <ulink url="http://conntrack-tools.netfilter.org">conntrack-tools</ulink> - >= 1.4.0. 
This document will evolve in the future to cover new features - and changes.</releaseinfo> + This document details how to install and to configure the <ulink url="http://conntrack-tools.netfilter.org">conntrack-tools</ulink>. + </releaseinfo> </bookinfo> @@ -46,21 +44,13 @@ <chapter id="introduction"><title>Introduction</title> - <para>This document should be a kick-off point to install and configure the - <ulink url="http://conntrack-tools.netfilter.org">conntrack-tools</ulink>. - If you find any error or imprecision in this document, please send an email - to the author, it will be appreciated.</para> +<para>This documentation provides a description on how to install and to configure the <ulink url="http://conntrack-tools.netfilter.org">conntrack-tools</ulink>.</para> - <para>In this document, the author assumes that the reader is familiar with firewalling concepts and iptables in general. If this is not your case, I suggest you to read the iptables documentation before going ahead. Moreover, the reader must also understand the difference between <emphasis>stateful</emphasis> and <emphasis>stateless</emphasis> firewalls. If this is not your case, I strongly suggest you to read the article <ulink url="http://people.netfilter.org/pablo/docs/login.pdf">Netfilter's Connection Tracking System</ulink> published in <emphasis>:login; the USENIX magazine</emphasis>. That document contains a general description that should help to clarify the concepts.</para> - -<para>If you do not fulfill the previous requirements, this documentation is likely to be a source of frustration. Probably, you wonder why I'm insisting on these prerequisites too much, the fact is that if your iptables rule-set is <emphasis>stateless</emphasis>, it is very likely that the <emphasis>conntrack-tools</emphasis> will not be of any help for you. You have been warned!</para> +<para>This documentation assumes that the reader is familiar with basic firewalling and Netfilter concepts. 
You also must understand the difference between <emphasis>stateless</emphasis> and <emphasis>stateful</emphasis> firewalls. Otherwise, please read <ulink url="http://people.netfilter.org/pablo/docs/login.pdf">Netfilter's Connection Tracking System</ulink> published in <emphasis>:login; the USENIX magazine</emphasis> for a quick reference.</para> </chapter> <chapter id="what"><title>What are the conntrack-tools?</title> - <para>The conntrack-tools are a set of free software tools for GNU/Linux that allow system administrators interact, from user-space, with the in-kernel <ulink url="http://people.netfilter.org/pablo/docs/login.pdf">Connection Tracking System</ulink>, which is the module that enables stateful packet inspection for iptables. Probably, you did not hear about this module so far. However, if any of the rules of your rule-set use the <emphasis>state</emphasis> or <emphasis>ctstate</emphasis> iptables matches, you are indeed using it. - </para> - <para>The <ulink url="http://conntrack-tools.netfilter.org">conntrack-tools</ulink> package contains two programs:</para> <itemizedlist> @@ -72,17 +62,18 @@ </listitem> </itemizedlist> - <para>Although the name of both tools is very similar - and you can blame me for that, I'm not a marketing guy - they are used for very different tasks.</para> +<para>Mind the trailing <emphasis>d</emphasis> that refers to either the command line utility or the daemon.</para> </chapter> <chapter id="requirements"><title>Requirements</title> - <para>You have to install the following software in order to get the <emphasis>conntrack-tools</emphasis> working. 
Make sure that you have installed them correctly before going ahead:</para> +<para>If you are using the Linux kernel that your distribution provides, then you most likely can skip this.</para> + +<para>If you compile your own Linux kernel, then please make sure the following options are enabled.</para> + +<para>You require a <ulink url="http://www.kernel.org">Linux kernel</ulink> version >= 2.6.18.</para> - <itemizedlist> - <listitem> - <para><ulink url="http://www.kernel.org">Linux kernel</ulink> version >= 2.6.18 that, at least, has support for:</para> <itemizedlist> <listitem> <para>Connection Tracking System.</para> @@ -123,19 +114,47 @@ </itemizedlist> </listitem> </itemizedlist> - <note><title>Verifying kernel support</title> - <para> - Make sure you have loaded <emphasis>nf_conntrack</emphasis>, <emphasis>nf_conntrack_ipv4</emphasis> (if your setup also supports IPv6, <emphasis>nf_conntrack_ipv6</emphasis>) and <emphasis>nf_conntrack_netlink</emphasis>. - </para> - </note> - </listitem> + +<note><title>Validating Linux kernel support</title> +<para>You can validate that your Linux kernel support for the <emphasis>conntrack-tools</emphasis> through <emphasis>modinfo</emphasis>.</para> + + <programlisting> + # modinfo nf_conntrack +filename: /lib/modules/5.2.0/kernel/net/netfilter/nf_conntrack.ko +license: GPL +alias: nf_conntrack-10 +alias: nf_conntrack-2 +alias: ip_conntrack +depends: nf_defrag_ipv6,libcrc32c,nf_defrag_ipv4 +retpoline: Y +intree: Y +name: nf_conntrack +vermagic: 5.7.0+ SMP preempt mod_unload modversions +parm: tstamp:Enable connection tracking flow timestamping. (bool) +parm: acct:Enable connection tracking flow accounting. 
(bool) +parm: nf_conntrack_helper:Enable automatic conntrack helper assignment (default 0) (bool) +parm: expect_hashsize:uint +parm: enable_hooks:Always enable conntrack hooks (bool) +</programlisting> + +<para>Make sure <emphasis>nf_conntrack_netlink</emphasis> is also available.</para> +</note> + +<para>You also need to install the following library dependencies:</para> + + <itemizedlist> <listitem> - <para>libnfnetlink: the netfilter netlink library use the official release available in <ulink url="http://www.netfilter.org">netfilter.org</ulink></para> + <para>libnfnetlink: the netfilter netlink library use the official release available in <ulink url="http://www.netfilter.org/projects/libnfnetlink">netfilter.org</ulink></para> </listitem> <listitem> - <para>libnetfilter_conntrack: the netfilter netlink library use the official release available in <ulink url="http://www.netfilter.org">netfilter.org</ulink></para> + <para>libnetfilter_conntrack: the netfilter netlink library use the official release available in <ulink url="http://www.netfilter.org/projects/libnetfilter_conntrack">netfilter.org</ulink></para> </listitem> </itemizedlist> + +<note><title>Installing library dependencies</title> +<para>Your distribution most likely also provides packages for this software, so you do not have to compile it yourself.</para> +</note> + </chapter> <chapter id="Installation"><title>Installation</title> @@ -148,18 +167,8 @@ (non-root)$ make (root) # make install</programlisting> -<note><title>Fedora Users</title> - <para>If you are installing the libraries in /usr/local/, do not forget to do the following things:</para> - <itemizedlist> - <listitem><para>PKG_CONFIG_PATH=/usr/local/lib/pkgconfig; export PKG_CONFIG_PATH</para></listitem> - <listitem><para>Add `/usr/local/lib' to your /etc/ld.so.conf file and run `ldconfig'</para></listitem> - </itemizedlist> - <para>Check `ldd' for trouble-shooting, read <ulink 
url="http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html">this</ulink> for more information on how libraries work.</para> -</note> - -<note><title>Verifying kernel support</title> - <para>To check that the modules are enabled in the kernel, run <emphasis>`conntrack -E'</emphasis> and generate traffic, you should see flow events reporting new connections and updates. - </para> +<note><title>Installing conntrack and conntrackd</title> +<para>Your distribution most likely also provides packages for this software, so you do not have to compile it yourself.</para> </note> </chapter> @@ -174,7 +183,7 @@ tcp 6 431698 ESTABLISHED src=192.168.2.100 dst=123.59.27.117 sport=34849 dport=993 packets=244 bytes=18723 src=123.59.27.117 dst=192.168.2.100 sport=993 dport=34849 packets=203 bytes=144731 [ASSURED] mark=0 use=1 </programlisting> -<para>The command line tool <emphasis>conntrack</emphasis> can be used to display the same information:</para> +<para>You can list the existing flows using the <emphasis>conntrack</emphasis> utility via <emphasis>-L</emphasis> command:</para> <programlisting> # conntrack -L tcp 6 431982 ESTABLISHED src=192.168.2.100 dst=123.59.27.117 sport=34846 dport=993 packets=169 bytes=14322 src=123.59.27.117 dst=192.168.2.100 sport=993 dport=34846 packets=113 bytes=34787 [ASSURED] mark=0 use=1 @@ -182,25 +191,23 @@ conntrack v1.4.6 (conntrack-tools): 2 flow entries have been shown. 
</programlisting> -<para>You can natively filter the output without using <emphasis>grep</emphasis>:</para> + <para>The <emphasis>conntrack</emphasis> syntax is similar to <emphasis>iptables</emphasis>.</para> + +<para>You can filter out the listing without using <emphasis>grep</emphasis>:</para> <programlisting> # conntrack -L -p tcp --dport 993 tcp 6 431982 ESTABLISHED src=192.168.2.100 dst=123.59.27.117 sport=34846 dport=993 packets=169 bytes=14322 src=123.59.27.117 dst=192.168.2.100 sport=993 dport=34846 packets=113 bytes=34787 [ASSURED] mark=0 use=1 conntrack v1.4.6 (conntrack-tools): 1 flow entries have been shown. </programlisting> -<para>Update the mark based on a selection, this allows you to change the mark of an entry without using the CONNMARK target:</para> +<para>You can update the ct mark, extending the previous example:</para> <programlisting> # conntrack -U -p tcp --dport 993 --mark 10 tcp 6 431982 ESTABLISHED src=192.168.2.100 dst=123.59.27.117 sport=34846 dport=993 packets=169 bytes=14322 src=123.59.27.117 dst=192.168.2.100 sport=993 dport=34846 packets=113 bytes=34787 [ASSURED] mark=10 use=1 conntrack v1.4.6 (conntrack-tools): 1 flow entries have been updated. </programlisting> -<para>Delete one entry, this can be used to block traffic if:</para> -<itemizedlist> - <listitem><para>You have a stateful rule-set that blocks traffic in INVALID state.</para></listitem> - <listitem><para>You set <emphasis>/proc/sys/net/netfilter/nf_conntrack_tcp_loose</emphasis> to zero.</para></listitem> -</itemizedlist> +<para>You can also delete entries</para> <programlisting> # conntrack -D -p tcp --dport 993 @@ -208,7 +215,14 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been updated. conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. 
</programlisting> -<para>Display the connection tracking events:</para> +<para> +This allows you to block TCP traffic if:</para> +<itemizedlist> + <listitem><para>You have a stateful rule-set that drops traffic in INVALID state.</para></listitem> + <listitem><para>You set <emphasis>/proc/sys/net/netfilter/nf_conntrack_tcp_loose</emphasis> to zero.</para></listitem> +</itemizedlist> + +<para>You can also listen to the connection tracking events:</para> <programlisting> # conntrack -E [NEW] udp 17 30 src=192.168.2.100 dst=192.168.2.1 sport=57767 dport=53 [UNREPLIED] src=192.168.2.1 dst=192.168.2.100 sport=53 dport=57767 @@ -218,20 +232,23 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.2.100 dst=66.102.9.104 sport=33379 dport=80 src=66.102.9.104 dst=192.168.2.100 sport=80 dport=33379 [ASSURED] </programlisting> -<para>You can also display the existing flows in XML format, filter the output based on the NAT handling applied, etc.</para> +<para>There are many options, including support for XML output, more advanced filters, and so on. Please check the manpage for more information.</para> </chapter> <chapter id="settingup"><title>Setting up conntrackd: the daemon</title> - <para>The daemon <emphasis>conntrackd</emphasis> supports two working modes:</para> + <para>The <emphasis>conntrackd</emphasis> daemon supports three modes:</para> - <itemizedlist> + <itemizedlist> <listitem> - <para><emphasis>State table synchronization</emphasis>: the daemon can be used to synchronize the connection tracking state table between several firewall replicas. This can be used to deploy fault-tolerant stateful firewalls. 
This is the main feature of the daemon.</para> + <para><emphasis>State table synchronization</emphasis>, to synchronize the connection tracking state table between several firewalls in High Availability (HA) scenarios.</para> </listitem> <listitem> - <para><emphasis>Flow-based statistics collection</emphasis>: the daemon can be used to collect flow-based statistics. This feature is similar to what <ulink url="http://www.netfilter.org/projects/ulogd/">ulogd-2.x</ulink> provides.</para> + <para><emphasis>Userspace connection tracking helpers</emphasis>, for layer 7 Application Layer Gateway (ALG) such as DHCPv6, MDNS, RPC, SLP and Oracle TNS. As an alternative to the in-kernel connection tracking helpers that are available in the Linux kernel.</para> + </listitem> + <listitem> + <para><emphasis>Flow-based statistics collection</emphasis>, to collect flow-based statistics as an alternative to <ulink url="http://www.netfilter.org/projects/ulogd/">ulogd2</ulink>, although <emphasis>ulogd2</emphasis> allows for more flexible statistics collection.</para> </listitem> </itemizedlist> @@ -239,15 +256,12 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. <sect2 id="sync-requirements"><title>Requirements</title> - <para>In order to get <emphasis>conntrackd</emphasis> working in synchronization mode, you have to fulfill the following requirements:</para> + <para>If you would like to configure <emphasis>conntrackd</emphasis> to work in state synchronization mode, then you require:</para> <orderedlist> <listitem> - <para>A <emphasis>high availability manager</emphasis> like <ulink url="http://www.keepalived.org">keepalived</ulink> that manages the virtual IPs of the - firewall cluster, detects errors, and decide when to migrate the virtual IPs - from one firewall replica to another. 
Without it, <emphasis>conntrackd</emphasis> will not work appropriately.</para> - <para>The state synchronization setup requires a working installation of <ulink url="http://www.keepalived.org">keepalived</ulink>, preferibly a recent version. Check if your distribution comes with a recent packaged version. Otherwise, you may compile it from the sources. + <para>A working installation of <ulink url="http://www.keepalived.org">keepalived</ulink>, preferibly a recent version. Check if your distribution comes with a recent packaged version. Otherwise, you may compile it from the sources. </para> <para> @@ -342,7 +356,7 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. </sect2> -<sect2 id="sync-pb"><title>Active-Backup setup</title> +<sect2 id="sync-pb"><title>Active-Backup setups</title> <note><title>Stateful firewall architectures</title> <para>A good reading to extend the information about firewall architectures is <ulink url="http://1984.lsi.us.es/~pablo/docs/intcomp09.pdf">Demystifying cluster-based fault-tolerant firewalls</ulink> published in IEEE Internet Computing magazine. @@ -380,19 +394,19 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. </sect2> -<sect2 id="sync-aa"><title>Active-Active setup</title> +<sect2 id="sync-aa"><title>Active-Active setups</title> <para>The Active-Active setup consists of having more than one stateful - firewall replicas actively filtering traffic. Thus, we reduce the resource - waste that implies to have a backup firewall which does nothing.</para> + firewall actively filtering traffic. Thus, we reduce the resource + waste that implies to have a backup firewall which is spare.</para> <para>We can classify the type of Active-Active setups in several families:</para> <itemizedlist> <listitem> - <para><emphasis>Symmetric path routing</emphasis>: The stateful firewall - replicas share the workload in terms of flows, ie. 
the packets that are + <para><emphasis>Symmetric path routing</emphasis>: The stateful firewalls + share the workload in terms of flows, ie. the packets that are part of a flow are always filtered by the same firewall.</para> </listitem> <listitem> @@ -406,24 +420,20 @@ conntrack v1.4.6 (conntrack-tools): 1 flow entries have been deleted. </listitem> </itemizedlist> - <para>As for 0.9.8, the design of <emphasis>conntrackd</emphasis> allows you - to deploy an symmetric Active-Active setup based on a static approach. - For example, assume that you have two virtual IPs, vIP1 and vIP2, and two - firewall replicas, FW1 and FW2. You can give the virtual vIP1 to the - firewall FW1 and the vIP2 to the FW2. + <para><emphasis>conntrackd</emphasis> allows you to deploy an symmetric +Active-Active setup based on a static approach. For example, assume that you +have two virtual IPs, vIP1 and vIP2, and two firewall replicas, FW1 and FW2. +You can give the virtual vIP1 to the firewall FW1 and the vIP2 to the FW2. </para> - <para>Unfortunately, you will have to wait for the support for the - Active-Active setup based on dynamic approach, ie. a workload sharing setup - without directors that allow the stateful firewall share the filtering.</para> - - <para>On the other hand, the asymmetric scenario may work if your setup - fulfills several strong assumptions. However, in the opinion of the author - of this work, the asymmetric setup goes against the design of stateful - firewalls and <emphasis>conntrackd</emphasis>. Therefore, you have two - choices here: you can deploy an Active-Backup setup or go back to your - old stateless rule-set (in that case, the conntrack-tools will not be - of any help anymore, of course).</para> + <para>The asymmetric path scenario is hard: races might occurs between state + synchronization and packet forwarding. 
If you would like to deploy an + Active-Active setup with an assymmetic multi-path routing configuration, + then, make sure the same firewall <emphasis>forwards</emphasis> packets + coming in the original and the reply directions. If you cannot guarantee + this and you still would like to deply an Active-Active setup, then you + might have to consider downgrading your firewall ruleset policy to stateless +filtering.</para> </sect2> @@ -895,32 +905,13 @@ maintainance.</para></listitem> <para>The following steps describe how to enable the RPC portmapper helper for NFSv3 (this is similar for other helpers):</para> <orderedlist> -<listitem><para>Register user-space helper: - -<programlisting> -nfct add helper rpc inet udp -nfct add helper rpc inet tcp -</programlisting> - -This registers the portmapper helper for both UDP and TCP (NFSv3 traffic goes both over TCP and UDP). -</para></listitem> - -<listitem><para>Add iptables rule using the CT target: - -<programlisting> -# iptables -I OUTPUT -t raw -p udp --dport 111 -j CT --helper rpc -# iptables -I OUTPUT -t raw -p tcp --dport 111 -j CT --helper rpc -</programlisting> - -With this, packets matching port TCP/UDP/111 are passed to user-space for -inspection. 
If there is no instance of conntrackd configured to support -user-space helpers, no inspection happens and packets are not sent to -user-space.</para></listitem> <listitem><para>Add configuration to conntrackd.conf: <programlisting> Helper { + Setup yes + Type rpc inet udp { QueueNum 1 QueueLen 10240 @@ -952,6 +943,25 @@ for inspection to user-space</para> </listitem> +<listitem><para>Run conntrackd: +<programlisting> +# conntrackd -d -C /path/to/conntrackd.conf +</programlisting> +</para> +</listitem> + +<listitem><para>Add iptables rule using the CT target: + +<programlisting> +# iptables -I OUTPUT -t raw -p udp --dport 111 -j CT --helper rpc +# iptables -I OUTPUT -t raw -p tcp --dport 111 -j CT --helper rpc +</programlisting> + +With this, packets matching port TCP/UDP/111 are passed to user-space for +inspection. If there is no instance of conntrackd configured to support +user-space helpers, no inspection happens and packets are not sent to +user-space.</para></listitem> + </orderedlist> <para>Now you can test this (assuming you have some working NFSv3 setup) with: diff --git a/doc/misc/README b/doc/misc/README new file mode 100644 index 0000000..7d0a1ae --- /dev/null +++ b/doc/misc/README @@ -0,0 +1,187 @@ += Setting up active-active load-sharing hash-based stateful firewall = + by Pablo Neira Ayuso <pablo@netfilter.org> in 2010 + +If you want to know more about this configuration and other firewall +architectures, please read: + +* Demystifying cluster-based fault-tolerant firewalls. + IEEE Internet Computing, 13(6):31-38, December 2009. + Available at: https://perso.ens-lyon.fr/laurent.lefevre/pdf/IC2009_Neira_Gasca_Lefevre.pdf + +== 0x0 intro == + +Under this directory you can find a script that allows you to setup a simple +active-active hash-based load-sharing firewall cluster based on the iptables' +cluster match. 
+ +== 0x1 testbed == + +My testbed looks like the following: + + ---------- eth1 eth2 ---------- + client A ------| |--- firewall 1 ----| | + (192.168.0.2) | switch | (.0.5) (.1.5) | switch |--- server + | | | | (192.168.1.2) + client B ------| |--- firewall 2 ----| | + (192.168.0.11) ---------- (.0.5) (.1.5) ---------- + eth1 eth2 + +The firewalls perform SNAT to masquerade clients. Note that both cluster +firewall have the same IP addresses. For administrative purposes, it is +a good idea that each firewall has its one IP address to SSH them, make +sure you add the appropriate rule to skip the cluster match rule-set! +More comments: although the picture shows two switches, I'm actually +using one and I separated the clients and the server in two different +VLANs. + +The script also sets a multicast MAC address that is the same for both +firewalls so that the switch floods the same packets to both firewalls. +Using a multicast MAC address is a RFC violation [1], since network node +must not include multicast MAC address in ARP replies, but: + + a) it is the only way I found so far to obtain the behaviour from my + HP procurve switches. + + b) the VRRP MAC address range is not supported appropritely by switch + vendors, at least by my HP procurve switches. If switch vendors + support this MAC address range appropriately, they will handle them + as multicast MAC address. As of 2011 I did not find any switch handling + VRRP MAC address range as multicast ports (they still handle them as + normal unicast MAC addresses, therefore my solution does not work with + two nodes with the same VRRP MAC address). + +The cluster match relies upon the Connection Tracking System (conntrack). +Thus, traffic coming in the reply direction which does not belong this node +is labeled as INVALID for TCP and ICMP protocols. The scripts add a rule to +drop this traffic to avoid possible packet duplication. 
For UDP traffic, +you will have to add a rule to drop NEW traffic in the reply direction +because conntrack considers it valid. If you don't do this, both nodes +may accept reply traffic, thus, sending duplicated packets to the client, +which is not what you want. + +During my last experiments, I was using the Linux kernel 2.6.37 in the +firewalls and the server. Everything you need to setup this configuration +is available in stock Linux kernels. No external patches with new features +are required. + +== 0x2 running scripts == + +Copy the script to each node, then adjust the script variables to your +configuration. + +On firewall 1: +firewall1# ./clusterip-node1.sh start + +On firewall 2: +firewall2# ./clusterip-node2.sh start + +== 0x3 trouble-shooting == + +Some troubleshooting may help to understand how this setup works. Check +the following if you experience problems: + +1) Check that Multicast MAC address are assigned to the NICs: + +firewall1$ ip maddr +[...] +2: eth1 +[...] + link 01:00:5e:00:01:01 static +3: eth2 +[...] + link 01:00:5e:00:01:02 static + +The scripts add the multicast MAC addresses to the NICs, if this +is not done the traffic will be discarded by the firewalls' +networking stack. + +2) ICMP ping the server from one the clients: + +client$ ping -c 1 192.168.1.2 +PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data. +64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.220 ms + +--- 192.168.1.2 ping statistics --- +1 packets transmitted, 1 received, 0% packet loss, time 0ms +rtt min/avg/max/mdev = 0.220/0.220/0.220/0.000 ms + +If this does not work, make sure the firewalls are including the +multicast MAC address in their ARP replies, you can check this +by looking at the neigbour cache: + +client$ ip neighbour +[...] +192.168.0.5 dev eth1 lladdr 01:00:5e:00:01:01 REACHABLE + +server$ ip neighbour +[...] +192.168.1.5 dev eth1 lladdr 01:00:5e:00:01:02 REACHABLE + +firewall$ ip neighbour +[...] 
+192.168.0.5 dev eth1 lladdr 01:00:5e:00:01:01 REACHABLE +192.168.1.5 dev eth2 lladdr 01:00:5e:00:01:02 REACHABLE + +3) Test TCP connections: you can use netcat to start simple connections +between the client and the server. + +You can also use intensive HTTP traffic generation to test performance +like injectX.c and httpterm from Willy Tarreau: + +http://1wt.eu/tools/inject/ +http://1wt.eu/tools/httpterm/ + +clientA:~/http-client-benchmark# ./client -t 60 -u 200 -G 192.168.1.2:8000 +# hits hits/s ^h/s ^bytes kB/s errs rst tout mhtime + 266926 26692 26766 3881270 3779 0 0 0 0.237 + 294067 26733 27141 3935621 3785 0 0 0 0.176 + +clientB~/http-client-benchmark# ./client -t 30 -u 40 -G 192.168.1.2:8020 +# hits hits/s ^h/s ^bytes kB/s errs rst tout mhtime + 53250 17750 17368 2518448 2513 0 0 0 0.240 + 70766 17691 17516 2539907 2505 0 0 0 0.297 + +^h/s is the current number of HTTP petitions per second. This means +that you get ~45000 HTTP petitions per second. In my setup, with only +one firewall active I get ~27000 HTTP petitions per second. We obtain +extra performance of ~66%, not that bad 8-). + +I have configured httpterm to send object of 0 bytes over HTTP +to obtain the maximum number of HTTP flows. This is the worst case +scenario in firewall load. + +I forgot to mention that I set CPU affinity for NICs IRQs. I've got +two cores, one for each firewall NIC. + +== 0x4 report sucessful setups == + +My testbed is composed of low-cost basic five years old HP proliant +systems, you can see that the numbers are not great. I like knowing +about numbers, I'd appreciate if you drop me a line to tell me the +numbers that you get and your experience. + +== 0x5 conclusions and future works == + +The cluster match allows to setup load-sharing hash-based stateful +firewalls that is a way to avoid having a spare backup firewall as +it happens in classical Primary-Backup setups. 
+ +Still, there is some pending work to fully integrate conntrackd and HA +managers with it (in case that you want high availability, of course). + +-o- + +[1] More specifically, it's a RFC 1812 (section 3.3.2) violation. +It's been reported that this is a problem for CISCO routers: +http://marc.info/?l=netfilter&m=128810399113170&w=2 + +Michele Codutti: "The problem is the multicast MAC address that these +routers doesn't "like". They discard any incoming packet with MAC +multicast address to be compliant with RFC1812. The only documented +(by Cisco) workaround is to put a fixed arp entry with the multicast +address that maps the clustered IP in the router." + +If you keep reading the mailing thread, the reported problem affected +Cisco 7200 VXR. + +--02/02/2010 diff --git a/doc/misc/clusterip.sh b/doc/misc/clusterip.sh new file mode 100644 index 0000000..911f676 --- /dev/null +++ b/doc/misc/clusterip.sh @@ -0,0 +1,254 @@ +#!/bin/sh + +# +# (C) 2009-2011 by Pablo Neira Ayuso <pneira@us.es> +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# + +# +# Here, you can find the variables that you have to change. 
+# + +# enable this for debugging +LOG_DEBUG=0 + +# number of cluster node (must be unique, from 1 to N cluster nodes) +NODE=1 + +# this is the real MAC address of eth1 +REAL_HWADDR1=00:18:71:68:f2:34 + +# this is the real MAC address of eth2 +REAL_HWADDR2=00:11:0a:60:e7:32 + +# +# These variables MUST have the same values in both cluster nodes +# + +# number of nodes that belong this cluster +TOTAL_NODES=2 + +# this is the cluster multicast MAC address of eth1 +MC_HWADDR1=01:00:5e:00:01:01 + +# this is the cluster multicast MAC address of eth2 +MC_HWADDR2=01:00:5e:00:01:02 + +# cluster IP address of eth1 +ADDR1=192.168.0.5/24 + +# cluster IP address of eth2 +ADDR2=192.168.1.5/24 + +# random seed for hashing +SEED=0xdeadbeef + +start_cluster_address() +{ + # set cluster IP addresses + ip a a $ADDR1 dev eth1 + ip a a $ADDR2 dev eth2 + # set cluster multicast MAC addresses + ip maddr add $MC_HWADDR1 dev eth1 + ip maddr add $MC_HWADDR2 dev eth2 + # mangle ARP replies to include the cluster multicast MAC addresses + arptables -I OUTPUT -o eth1 --h-length 6 \ + -j mangle --mangle-mac-s $MC_HWADDR1 + # mangle ARP request to use the original MAC address (otherwise the + # stack drops this packet). 
+ arptables -I INPUT -i eth1 --h-length 6 --destination-mac \ + $MC_HWADDR1 -j mangle --mangle-mac-d $REAL_HWADDR1 + arptables -I OUTPUT -o eth2 --h-length 6 \ + -j mangle --mangle-mac-s $MC_HWADDR2 + arptables -I INPUT -i eth2 --h-length 6 --destination-mac \ + $MC_HWADDR2 -j mangle --mangle-mac-d $REAL_HWADDR2 +} + +stop_cluster_address() +{ + # delete cluster IP addresses + ip a d $ADDR1 dev eth1 + ip a d $ADDR2 dev eth2 + # delete cluster multicast MAC addresses + ip maddr del $MC_HWADDR1 dev eth1 + ip maddr del $MC_HWADDR2 dev eth2 + # delete ARP replies mangling + arptables -D OUTPUT -o eth1 --h-length 6 \ + -j mangle --mangle-mac-s $MC_HWADDR1 + # delete ARP requests mangling + arptables -D INPUT -i eth1 --h-length 6 --destination-mac \ + $MC_HWADDR1 -j mangle --mangle-mac-d $REAL_HWADDR1 + arptables -D OUTPUT -o eth2 --h-length 6 \ + -j mangle --mangle-mac-s $MC_HWADDR2 + arptables -D INPUT -i eth2 --h-length 6 --destination-mac \ + $MC_HWADDR2 -j mangle --mangle-mac-d $REAL_HWADDR2 +} + +start_nat() +{ + iptables -A POSTROUTING -t nat -s 192.168.0.11 \ + -j SNAT --to-source 192.168.1.5 + iptables -A POSTROUTING -t nat -s 192.168.0.2 \ + -j SNAT --to-source 192.168.1.5 +} + +stop_nat() +{ + iptables -D POSTROUTING -t nat -s 192.168.0.11 \ + -j SNAT --to-source 192.168.1.5 + iptables -D POSTROUTING -t nat -s 192.168.0.2 \ + -j SNAT --to-source 192.168.1.5 +} + +iptables_start_cluster_rules() +{ + # mark packets that belong to this node (go direction) + iptables -A CLUSTER-RULES -t mangle -i eth1 -m cluster \ + --cluster-total-nodes $TOTAL_NODES --cluster-local-node $1 \ + --cluster-hash-seed $SEED -j MARK --set-mark 0xffff + + # mark packet that belong to this node (reply direction) + # note: we *do* need this to change the packet type to PACKET_HOST, + # otherwise the stack silently drops the packet. 
+	iptables -A CLUSTER-RULES -t mangle -i eth2 -m cluster \
+		--cluster-total-nodes $TOTAL_NODES --cluster-local-node $1 \
+		--cluster-hash-seed $SEED -j MARK --set-mark 0xffff
+}
+
+iptables_stop_cluster_rules()
+{
+	iptables -D CLUSTER-RULES -t mangle -i eth1 -m cluster \
+		--cluster-total-nodes $TOTAL_NODES --cluster-local-node $1 \
+		--cluster-hash-seed $SEED -j MARK --set-mark 0xffff
+
+	iptables -D CLUSTER-RULES -t mangle -i eth2 -m cluster \
+		--cluster-total-nodes $TOTAL_NODES --cluster-local-node $1 \
+		--cluster-hash-seed $SEED -j MARK --set-mark 0xffff
+}
+
+start_cluster_ruleset() {
+	iptables -N CLUSTER-RULES -t mangle
+
+	iptables_start_cluster_rules $NODE
+
+	iptables -A PREROUTING -t mangle -j CLUSTER-RULES
+
+	if [ $LOG_DEBUG -eq 1 ]
+	then
+		iptables -A PREROUTING -t mangle -i eth1 -m mark \
+			--mark 0xffff -j LOG --log-prefix "cluster-accept: "
+		iptables -A PREROUTING -t mangle -i eth1 -m mark \
+			! --mark 0xffff -j LOG --log-prefix "cluster-drop: "
+		iptables -A PREROUTING -t mangle -i eth2 -m mark \
+			--mark 0xffff \
+			-j LOG --log-prefix "cluster-reply-accept: "
+		iptables -A PREROUTING -t mangle -i eth2 -m mark \
+			! --mark 0xffff \
+			-j LOG --log-prefix "cluster-reply-drop: "
+	fi
+
+	# drop packets that don't belong to us (go direction)
+	iptables -A PREROUTING -t mangle -i eth1 -m mark \
+		! --mark 0xffff -j DROP
+
+	# drop packets that don't belong to us (reply direction)
+	iptables -A PREROUTING -t mangle -i eth2 -m mark \
+		! --mark 0xffff -j DROP
+}
+
+stop_cluster_ruleset() {
+	iptables -D PREROUTING -t mangle -j CLUSTER-RULES
+
+	if [ $LOG_DEBUG -eq 1 ]
+	then
+		iptables -D PREROUTING -t mangle -i eth1 -m mark \
+			--mark 0xffff -j LOG --log-prefix "cluster-accept: "
+		iptables -D PREROUTING -t mangle -i eth1 -m mark \
+			! --mark 0xffff -j LOG --log-prefix "cluster-drop: "
+		iptables -D PREROUTING -t mangle -i eth2 -m mark \
+			--mark 0xffff \
+			-j LOG --log-prefix "cluster-reply-accept: "
+		iptables -D PREROUTING -t mangle -i eth2 -m mark \
+			! --mark 0xffff \
+			-j LOG --log-prefix "cluster-reply-drop: "
+	fi
+
+	iptables -D PREROUTING -t mangle -i eth1 -m mark \
+		! --mark 0xffff -j DROP
+
+	iptables -D PREROUTING -t mangle -i eth2 -m mark \
+		! --mark 0xffff -j DROP
+
+	iptables_stop_cluster_rules $NODE
+
+	iptables -F CLUSTER-RULES -t mangle
+	iptables -X CLUSTER-RULES -t mangle
+}
+
+case "$1" in
+start)
+	echo "starting cluster configuration for node $NODE."
+
+	# enable IP forwarding, in case you forgot to
+	echo 1 > /proc/sys/net/ipv4/ip_forward
+
+	# disable TCP pickup
+	echo 0 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
+	echo 0 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_loose
+
+	start_cluster_address
+	start_nat
+
+	# drop invalid flows from eth2 (not allowed). This is mandatory
+	# because traffic which does not belong to this node is always
+	# labeled as INVALID by TCP and ICMP state tracking. For protocols
+	# like UDP, you will have to drop NEW traffic from eth2, otherwise
+	# reply traffic may be accepted by both nodes, thus duplicating the
+	# traffic.
+	iptables -A PREROUTING -t mangle -i eth2 \
+		-m state --state INVALID -j DROP
+
+	start_cluster_ruleset
+	;;
+stop)
+	echo "stopping cluster configuration for node $NODE."
+
+	stop_cluster_address
+	stop_nat
+
+	iptables -D PREROUTING -t mangle -i eth2 \
+		-m state --state INVALID -j DROP
+
+	stop_cluster_ruleset
+	;;
+primary)
+	logger "cluster-match-script: entering MASTER state for node $2"
+	if [ -x "$CONNTRACKD_SCRIPT" ]
+	then
+		sh "$CONNTRACKD_SCRIPT" primary $NODE $2
+	fi
+	iptables_start_cluster_rules $2
+	;;
+backup)
+	logger "cluster-match-script: entering BACKUP state for node $2"
+	if [ -x "$CONNTRACKD_SCRIPT" ]
+	then
+		sh "$CONNTRACKD_SCRIPT" backup $NODE $2
+	fi
+	iptables_stop_cluster_rules $2
+	;;
+fault)
+	logger "cluster-match-script: entering FAULT state for node $2"
+	if [ -x "$CONNTRACKD_SCRIPT" ]
+	then
+		sh "$CONNTRACKD_SCRIPT" fault $NODE $2
+	fi
+	iptables_stop_cluster_rules $2
+	;;
+*)
+	echo "$0 start|stop|primary|backup|fault [nodeid]"
+	;;
+esac
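
Note on the state handlers: the primary, backup and fault branches take a node id as their second argument and add or remove the CLUSTER-RULES marks for that id, so a surviving node can also accept the flows hashed to a failed peer. The following is only a dry-run sketch of that mapping, not part of the patch: it prints the iptables command lines instead of running them, and the helper names `cluster_rule` and `handle_state` are hypothetical. The values of TOTAL_NODES and SEED are copied from the script.

```shell
#!/bin/sh
# Dry-run sketch: echo the rule changes instead of calling iptables.
# cluster_rule and handle_state are hypothetical names for illustration.
TOTAL_NODES=2
SEED=0xdeadbeef

cluster_rule()
{
	# $1 = iptables action (-A or -D), $2 = interface, $3 = node id
	echo "iptables $1 CLUSTER-RULES -t mangle -i $2 -m cluster" \
		"--cluster-total-nodes $TOTAL_NODES --cluster-local-node $3" \
		"--cluster-hash-seed $SEED -j MARK --set-mark 0xffff"
}

handle_state()
{
	# $1 = state (primary, backup or fault), $2 = node id
	case "$1" in
	primary)
		action=-A	# start accepting flows hashed to node $2
		;;
	backup|fault)
		action=-D	# stop accepting flows hashed to node $2
		;;
	*)
		echo "unknown state: $1" >&2
		return 1
		;;
	esac
	# one rule per direction: eth1 (go) and eth2 (reply)
	cluster_rule "$action" eth1 "$2"
	cluster_rule "$action" eth2 "$2"
}

# node 2 failed: take over its share, then release it once it recovers
handle_state primary 2
handle_state backup 2
```

Printing the commands makes it possible to review what a failover would change before the script touches the live ruleset.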