Scenario / Questions
This issue is driving me crazy. I run a fresh install of Ubuntu 18.04, with:
- ufw to manage the firewall
- a br0 bridge
- lxd and libvirt (KVM)
I tried the stock docker.io package and the packages from Docker’s own deb repository.
I want to be able to deploy Docker containers, choosing the IP their ports bind to (e.g. -p 10.58.26.6:98800:98800), and then open the port with UFW.
But Docker seems to create iptables rules that disturb the br0 bridge (e.g. the host cannot ping libvirt guests).
I have looked all around and cannot find a good, security-aware solution.
iptables -I FORWARD -i br0 -o br0 -j ACCEPT seems to make everything work.
Setting "iptables": false for the Docker daemon lets the bridge behave normally, but breaks the egress network of Docker’s containers.
I found this solution that seemed simple, editing a single UFW file (https://stackoverflow.com/a/51741599/1091772), but it doesn’t work at all.
What would be the best-practice, secure way of solving this permanently, in a way that survives reboots?
I ended up adding
-A ufw-before-forward -i br0 -o br0 -j ACCEPT
at the end of /etc/ufw/before.rules, before the COMMIT line. Can I consider this a fix, or does it raise some issues?
The problem, actually a feature: br_netfilter
From the description, I believe the only logical explanation is that the bridge netfilter code is enabled. Among other uses, it is intended for stateful bridge firewalling, or for leveraging iptables’ matches and targets from the bridge path without having to (or being able to) duplicate them all in ebtables. Quite disregarding network layering, the Ethernet bridge code, at network layer 2, now makes upcalls to iptables, which works at the IP level, i.e. network layer 3. For now it can only be enabled globally: either for the host and every container, or for none. Once you understand what’s going on and know what to look for, you can make informed choices.
The netfilter project describes the various iptables interactions when br_netfilter is enabled. Of particular interest is section 7, which explains why some rules with no apparent effect are sometimes needed to avoid unintended effects from the bridge path, such as:
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
to prevent two systems on the same LAN from being NATed by… the bridge (see the example below).
You have a few choices to avoid your problem, but the one you took is probably the best if you don’t want to learn all the details, nor verify whether some iptables rules (sometimes hidden in other namespaces) would be disrupted:
Permanently prevent the br_netfilter module from being loaded. Usually the modprobe install directive must be used. This choice is prone to issues for applications relying on br_netfilter: obviously Docker, Kubernetes, …
echo install br_netfilter /bin/true > /etc/modprobe.d/disable-br-netfilter.conf
Have the module loaded, but disable its effects. For iptables’ effects, that is:
sysctl -w net.bridge.bridge-nf-call-iptables=0
If you set this at startup, the module must be loaded first, or this toggle won’t exist yet.
Both of these choices will certainly disrupt the iptables match -m physdev: the xt_physdev module, when itself loaded, auto-loads the br_netfilter module (this would happen even if the loading was triggered by a rule added from a container). Now that br_netfilter won’t be loaded, -m physdev will probably never match.
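Either of the first two choices only survives reboots once it is written to configuration files. A sketch, assuming a systemd-based system such as Ubuntu 18.04, where systemd-sysctl applies /etc/sysctl.d after systemd-modules-load has loaded the modules listed in /etc/modules-load.d. The function name and the 99-bridge-nf-call.conf file name are hypothetical; disable-br-netfilter.conf is the name used above.

```shell
#!/bin/sh
# Hypothetical helper writing persistent config for choice 1 ("block") or
# choice 2 ("disable"). The optional 2nd argument is a root directory, so
# the files can be generated in a scratch directory first.
write_br_netfilter_conf() {
    mode=$1
    root=${2:-}
    case $mode in
        block)
            # Choice 1: modprobe runs /bin/true instead of loading the module.
            mkdir -p "$root/etc/modprobe.d" || return 1
            echo 'install br_netfilter /bin/true' \
                > "$root/etc/modprobe.d/disable-br-netfilter.conf"
            ;;
        disable)
            # Choice 2: load the module at boot so the toggle exists, then
            # switch its iptables upcall off. Assumes systemd-modules-load
            # runs before systemd-sysctl.
            mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d" || return 1
            echo 'br_netfilter' > "$root/etc/modules-load.d/br_netfilter.conf"
            echo 'net.bridge.bridge-nf-call-iptables = 0' \
                > "$root/etc/sysctl.d/99-bridge-nf-call.conf"
            ;;
        *)
            echo "usage: write_br_netfilter_conf block|disable [ROOT]" >&2
            return 1
            ;;
    esac
}
```

Running write_br_netfilter_conf disable as root writes the files under /etc; they take effect at the next boot.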
Work around br_netfilter’s effects when needed, as the OP did: add those apparent no-op rules in the various chains (PREROUTING, FORWARD, POSTROUTING) as described in section 7. For example:
iptables -t nat -A POSTROUTING -s 172.18.0.0/16 -d 172.18.0.0/16 -j ACCEPT
iptables -A FORWARD -i br0 -o br0 -j ACCEPT
Those rules should never match because traffic in the same IP LAN is not routed, except for some rare DNAT setups. But thanks to br_netfilter they do match, because they are first called for switched frames (“upgraded” to IP packets) traversing the bridge. Then they are called again for routed packets traversing the router to an unrelated interface (but won’t match then).
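Adding such rules from a boot script duplicates them whenever the script runs twice. A sketch of an idempotent helper (the name ensure_rule is hypothetical) that uses iptables -C to check for a rule before inserting it with -I. Inserting at the head of the chain means a nat exception lands ahead of rules such as MASQUERADE. It persists nothing by itself.

```shell
#!/bin/sh
# Insert an iptables rule at the head of a chain unless an identical rule
# already exists ("iptables -C" checks, "iptables -I" inserts).
# Usage: ensure_rule TABLE CHAIN RULE-SPEC...
# IPTABLES may be overridden, e.g. IPTABLES="ip netns exec router iptables".
ensure_rule() {
    table=$1
    shift
    ipt=${IPTABLES:-iptables}
    # $ipt is deliberately unquoted so a multi-word command works.
    if ! $ipt -t "$table" -C "$@" 2>/dev/null; then
        $ipt -t "$table" -I "$@"
    fi
}
```

With the rules above:
ensure_rule nat POSTROUTING -s 172.18.0.0/16 -d 172.18.0.0/16 -j ACCEPT
ensure_rule filter FORWARD -i br0 -o br0 -j ACCEPT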
Don’t put an IP on the bridge: put that IP on one end of a veth interface whose other end is on the bridge. This should ensure that the bridge won’t interact with routing, but that’s not what most common container/VM products do.
You can even hide the bridge in its own isolated network namespace (this would only help if you also want to isolate it from other ebtables rules).
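A sketch of the veth layout, assuming iproute2; the interface names vhost0 and vbr0 and the function name are hypothetical placeholders. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Keep the host's IP off the bridge: create a veth pair, make one end an
# ordinary bridge port and put the address on the other end.
# vhost0/vbr0 are hypothetical names. RUN=echo gives a dry run.
attach_ip_via_veth() {
    bridge=$1
    addr=$2
    if [ -z "$bridge" ] || [ -z "$addr" ]; then
        echo "usage: attach_ip_via_veth BRIDGE ADDR/PREFIX" >&2
        return 1
    fi
    $RUN ip link add vhost0 type veth peer name vbr0
    # The bridge side of the pair becomes a bridge port.
    $RUN ip link set vbr0 master "$bridge"
    $RUN ip link set vbr0 up
    # The host side carries the address, so the bridge itself has none.
    $RUN ip address add "$addr" dev vhost0
    $RUN ip link set vhost0 up
}
```

Any address and routes previously on the bridge have to be moved to vhost0 as well; doing so from a remote session that goes through the bridge will interrupt that session.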
Switch everything to nftables, which, among its stated goals, will avoid these bridge interaction issues. For now, bridge firewalling has no stateful support available: it’s still a work in progress, but it promises to be cleaner once available, because there won’t be any “upcall”.
You should find out what triggers the loading of br_netfilter (e.g. -m physdev) and whether you can avoid it, in order to choose how to proceed.
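A starting point for that search, as a sketch with hypothetical function names. It assumes br_netfilter is built as a module (so /sys/module/br_netfilter appears only once it is loaded) and only inspects the rules you pipe into it.

```shell
#!/bin/sh
# Report whether br_netfilter is loaded and, if so, the iptables toggle.
br_netfilter_state() {
    if [ -d /sys/module/br_netfilter ]; then
        printf 'br_netfilter: loaded, bridge-nf-call-iptables=%s\n' \
            "$(cat /proc/sys/net/bridge/bridge-nf-call-iptables 2>/dev/null)"
    else
        echo "br_netfilter: not loaded"
    fi
}

# List "-m physdev" rules (which auto-load br_netfilter) found in an
# iptables-save dump read from stdin, prefixed with their line numbers.
find_physdev_rules() {
    grep -n -e '-m physdev' || echo "no physdev rules found"
}
```

For example:
br_netfilter_state
iptables-save | find_physdev_rules
iptables-save only sees the current network namespace. Named namespaces can be checked with ip netns exec NAME iptables-save | find_physdev_rules, but namespaces created by container managers are often not listed by ip netns.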
Example with network namespaces
Let’s reproduce some effects using network namespaces. Note that no ebtables rule is used anywhere. Also note that this example relies on the usual legacy iptables, not iptables over nftables as enabled by default on Debian buster.
Let’s reproduce a simple case similar to many container setups: a router 192.168.0.1/192.0.2.100 doing NAT with two hosts behind it, 192.168.0.101 and 192.168.0.102, linked with a bridge on the router. The two hosts can communicate directly on the same LAN, through the bridge.
#!/bin/sh

for ns in host1 host2 router; do
    ip netns del $ns 2>/dev/null || :
    ip netns add $ns
    ip -n $ns link set lo up
done

ip netns exec router sysctl -q -w net.ipv4.conf.default.forwarding=1

ip -n router link add bridge0 type bridge
ip -n router link set bridge0 up
ip -n router address add 192.168.0.1/24 dev bridge0

for i in 1 2; do
    ip -n host$i link add eth0 type veth peer netns router port$i
    ip -n host$i link set eth0 up
    ip -n host$i address add 192.168.0.10$i/24 dev eth0
    ip -n host$i route add default via 192.168.0.1
    ip -n router link set port$i up master bridge0
done

# to mimic a standard NAT router; the iptables rule is deliberately written this way to show the last "effect"
ip -n router link add name eth0 type dummy
ip -n router link set eth0 up
ip -n router address add 192.0.2.100/24 dev eth0
ip -n router route add default via 192.0.2.1
ip netns exec router iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j MASQUERADE
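To reset the lab between experiments, a teardown sketch (the function name is hypothetical). Deleting a namespace destroys the virtual interfaces inside it and its iptables rules; it does not undo the global bridge-nf-call-iptables toggle, which lives in the initial namespace. RUN=echo prints the commands instead.

```shell
#!/bin/sh
# Remove the three lab namespaces. Deleting a namespace also destroys the
# virtual interfaces inside it (veth ends, bridge0, the dummy eth0) and
# its iptables rules. RUN=echo gives a dry run.
teardown_lab() {
    for ns in host1 host2 router; do
        $RUN ip netns del "$ns" 2>/dev/null || :
    done
}
```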
Let’s load the br_netfilter kernel module (to be sure it won’t be loaded later) and disable its effects with the bridge-nf-call-iptables toggle, which is not per-namespace and is available only in the initial namespace:
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=0
Warning: again, this can disrupt iptables rules like -m physdev anywhere on the host or in containers that rely on br_netfilter being loaded and enabled.
Let’s add some ICMP ping traffic counters (rules without a target, which only count matching packets):
ip netns exec router iptables -A FORWARD -p icmp --icmp-type echo-request
ip netns exec router iptables -A FORWARD -p icmp --icmp-type echo-reply
# ip netns exec host1 ping -n -c2 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
64 bytes from 192.168.0.102: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 192.168.0.102: icmp_seq=2 ttl=64 time=0.058 ms
--- 192.168.0.102 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1017ms
rtt min/avg/max/mdev = 0.047/0.052/0.058/0.009 ms
The counters won’t match:
# ip netns exec router iptables -v -S FORWARD
-P FORWARD ACCEPT -c 0 0
-A FORWARD -p icmp -m icmp --icmp-type 8 -c 0 0
-A FORWARD -p icmp -m icmp --icmp-type 0 -c 0 0
Let’s enable bridge-nf-call-iptables and ping again:
# sysctl -w net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-iptables = 1
# ip netns exec host1 ping -n -c2 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
64 bytes from 192.168.0.102: icmp_seq=1 ttl=64 time=0.094 ms
64 bytes from 192.168.0.102: icmp_seq=2 ttl=64 time=0.163 ms
--- 192.168.0.102 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.094/0.128/0.163/0.036 ms
This time switched packets got a match in iptables’ filter/FORWARD chain:
# ip netns exec router iptables -v -S FORWARD
-P FORWARD ACCEPT -c 4 336
-A FORWARD -p icmp -m icmp --icmp-type 8 -c 2 168
-A FORWARD -p icmp -m icmp --icmp-type 0 -c 2 168
Let’s put a DROP policy (which zeroes the default counters) and try again:
# ip netns exec router iptables -P FORWARD DROP
# ip netns exec host1 ping -n -c2 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
--- 192.168.0.102 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1008ms
# ip netns exec router iptables -v -S FORWARD
-P FORWARD DROP -c 2 168
-A FORWARD -p icmp -m icmp --icmp-type 8 -c 4 336
-A FORWARD -p icmp -m icmp --icmp-type 0 -c 2 168
The bridge code filtered the switched frames/packets via iptables. Let’s add the bypass rule as in the OP (which again zeroes the default counters) and try again:
# ip netns exec router iptables -A FORWARD -i bridge0 -o bridge0 -j ACCEPT
# ip netns exec host1 ping -n -c2 192.168.0.102
PING 192.168.0.102 (192.168.0.102) 56(84) bytes of data.
64 bytes from 192.168.0.102: icmp_seq=1 ttl=64 time=0.132 ms
64 bytes from 192.168.0.102: icmp_seq=2 ttl=64 time=0.123 ms
--- 192.168.0.102 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.123/0.127/0.132/0.012 ms
# ip netns exec router iptables -v -S FORWARD
-P FORWARD DROP -c 0 0
-A FORWARD -p icmp -m icmp --icmp-type 8 -c 6 504
-A FORWARD -p icmp -m icmp --icmp-type 0 -c 4 336
-A FORWARD -i bridge0 -o bridge0 -c 4 336 -j ACCEPT
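Reading these counters by eye is error-prone. A small filter, as a sketch (the name counters is hypothetical), moves the -c PACKETS BYTES pair that iptables -v -S prints to the front of each line, assuming the layout shown in the transcripts above.

```shell
#!/bin/sh
# Reformat "iptables -v -S" output (stdin) as "<packets> <bytes> <rule>",
# moving the "-c PACKETS BYTES" counters to the front of each line.
# Lines without counters are dropped.
counters() {
    sed -n 's/^\(.*\) -c \([0-9]*\) \([0-9]*\)\(.*\)$/\2 \3 \1\4/p'
}
```

For example: ip netns exec router iptables -v -S FORWARD | counters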
Let’s see what is now actually received on host2 during a ping from host1:
# ip netns exec host2 tcpdump -l -n -s0 -i eth0 -p icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
02:16:11.068795 IP 192.168.0.1 > 192.168.0.102: ICMP echo request, id 9496, seq 1, length 64
02:16:11.068817 IP 192.168.0.102 > 192.168.0.1: ICMP echo reply, id 9496, seq 1, length 64
02:16:12.088002 IP 192.168.0.1 > 192.168.0.102: ICMP echo request, id 9496, seq 2, length 64
02:16:12.088063 IP 192.168.0.102 > 192.168.0.1: ICMP echo reply, id 9496, seq 2, length 64
… instead of source 192.168.0.101. The MASQUERADE rule was also called from the bridge path. To avoid this, either add an exception rule before it (as explained in section 7’s example), or specify a non-bridge outgoing interface if at all possible (now that it’s available, you can even use -m physdev if it has to be a bridge…).
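Both fixes for this lab, as a sketch with hypothetical function names. The first inserts a section-7-style exception at position 1 of nat/POSTROUTING; the second replaces the MASQUERADE rule with one limited to the router’s outgoing eth0, which bridged traffic does not match because iptables sees bridge0 as its output interface (as the -i bridge0 -o bridge0 counter above shows). RUN=echo prints the commands instead.

```shell
#!/bin/sh
# Option 1: exception placed before MASQUERADE, so traffic staying inside
# 192.168.0.0/24 is accepted in nat/POSTROUTING and never masqueraded.
masq_exception() {
    $RUN ip netns exec router iptables -t nat -I POSTROUTING 1 \
        -s 192.168.0.0/24 -d 192.168.0.0/24 -j ACCEPT
}

# Option 2: restrict MASQUERADE to the router's real uplink (eth0).
masq_out_iface() {
    $RUN ip netns exec router iptables -t nat -D POSTROUTING \
        -s 192.168.0.0/24 -j MASQUERADE
    $RUN ip netns exec router iptables -t nat -A POSTROUTING \
        -s 192.168.0.0/24 -o eth0 -j MASQUERADE
}
```

After either one, the tcpdump on host2 should show 192.168.0.101 as the source again.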
LKML/netfilter-dev: br_netfilter: enable in non-initial netns: it would help to enable this feature per namespace rather than globally, thus limiting interactions between hosts and containers.
netfilter-dev: netfilter: physdev: relax br_netfilter dependency: merely attempting to delete a non-existing physdev rule could create problems.
netfilter-dev: connection tracking support for bridge: WIP bridge netfilter code preparing stateful bridge firewalling with nftables, this time more elegantly. I think it is one of the last steps toward getting rid of iptables (its kernel-side API).
If the above doesn’t solve your problem, here’s how I resolved it on my Debian Stretch machine.
First, save your current iptables rules:
iptables-save > your-current-iptables.rules
Second, delete ALL the Docker-created rules, listing each chain with its rule numbers and deleting by number:
iptables -L <DOCKER-CHAIN> -n --line-numbers
iptables -D <DOCKER-CHAIN> <rule-number>
Third, add iptables rules accepting all traffic in INPUT, FORWARD and OUTPUT:
iptables -I INPUT -j ACCEPT
iptables -I FORWARD -j ACCEPT
iptables -I OUTPUT -j ACCEPT
Fourth, restart Docker:
service docker restart
Once step 3 is completed, you can ping your blocked libvirt KVM host from another PC and you will see ICMP responses.
Restarting Docker will add its required iptables rules back to your machine, but it will no longer block your bridged KVM hosts.
If this solution doesn’t work for you, you can restore the saved iptables rules with:
iptables-restore < your-current-iptables.rules
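The same steps as a sketch with hypothetical function names, so that the backup path used in step 1 is reused for the rollback. Step 2 stays manual, as in the answer. RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Step 1: save the current rules to FILE.
save_rules() {
    file=$1
    if [ -n "$RUN" ]; then
        echo "iptables-save > $file"
    else
        iptables-save > "$file"
    fi
}

# Steps 3 and 4. Note that ACCEPT rules at the head of each chain
# bypass every filter rule that follows them.
open_all_and_restart_docker() {
    for chain in INPUT FORWARD OUTPUT; do
        $RUN iptables -I "$chain" -j ACCEPT
    done
    $RUN service docker restart
}

# Rollback: restore the rules written by save_rules.
rollback_rules() {
    file=$1
    if [ -n "$RUN" ]; then
        echo "iptables-restore < $file"
    else
        iptables-restore < "$file"
    fi
}
```

For example: save_rules your-current-iptables.rules, delete Docker’s rules by hand, run open_all_and_restart_docker, and if things go wrong, rollback_rules your-current-iptables.rules.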
Disclaimer: This has been sourced from a third-party syndicated feed through the internet. We take no responsibility or liability for its dependability, trustworthiness, reliability or data. We reserve the sole right to alter, delete or remove (without notice) the content at our absolute discretion for any reason whatsoever.