Unicast VXLAN: overlay network for multiple servers with dozens of containers

Introduction

Virtualization is a great technology, as it lets you run multiple virtual systems on a single server.

It's easy to create a "LAN" for containers on a single server - just attach them to the same bridge, use the same subnet (i.e. 10.10.10.0/24) - done. Containers can communicate with each other using their private IP address.

However, with more then one server *not* in the same LAN (i.e. two LXD servers in different datacentres, or even in the same datacentre, but with hosting which doesn't allow LANs), the things get tricky. There are numerous examples of such hostings - i.e. Amazon AWS or Hetzner.

Virtual Extensible LAN (VXLAN) can help us solve this issue. No more hassle with port redirections, adjusting IPs after containers were migrated etc.

While this HOWTO is primarily with LXD in mind, it can be used with any networking technologies. Apart from LXD - LXC, KVM, Xen, docker should be fine to use it. Make sure to read MTU notes.

Network diagram

We will build a Virtual Extensible LAN as on example diagrams below. Each server is using unicast VXLAN connected to every other server. In the examples, the containers are using 10.10.10.0/24 subnet, but of course you can use any other subnet.

Example 1:


Example 2:

Prerequisites

You will need a fairly modern kernel and userspace. Ubuntu 14.04 LTS is too old; Ubuntu 16.04 LTS is fine. CentOS 7 should also be fine.

Each container needs to be attached to the bridge created by the script below. We're using vxbr0 as the bridge for VXLAN devices.

Performance

VXLAN offers near wire speed. I.e. if iperf between your containers with public IPs will show 90 MB/s traffic, VXLAN traffic should be showing around 85 MB/s.

Drawbacks

  • VXLAN does not encrypt the traffic
  • VXLAN does not compress the traffic

While it's typically fine to run the traffic unencrypted between the servers in the same VPC / security group in AWS or in general, a single datacentre, you may want to think about extra traffic encryption with server in different geographical locations.

MTU issue

VXLAN interface will lower your MTU to 1450. It means, any container attached to such a bridge using VXLAN needs its networking to have MTU of 1450 as well. Otherwise, any traffic with larger packets will hang.

If you're using LXD, you can run "lxc config edit your-container" and set it as below:

preventing bridge looping

If you use more than two servers, you will have more than one VXLAN device attached to the bridge. This normally creates packet loops. To prevent it, we use ebtables to block the traffic between different VXLAN devices on the same server.

The script sets this up automatically.

Please note that the script assumes it's the only thing on the server which manipulates ebtables.

vxlansetup.sh script

The script creates VXLAN devices between every server.

  • usage - available actions are start, stop, restart and status
  • copy the script to every server running the containers (do not copy or use the script in the containers)
  • modify LOCALIP, REMOTEIPS, LOCAL_DEV, BRIDGE_DEV variables on every server
  • DRYRUN - set it to 1 to see which commands would be run
  • VXLAN_DEV, PORT - should be OK for most systems, but adjust if needed
  • you have to run the script on every server (remember to adjust the IPs accordingly)

Example output

Here is an example output in "DRYRUN" mode: