Unicast VXLAN: overlay network for multiple servers with dozens of containers

From lxadm | Linux administration tips, tutorials, HOWTOs and articles

Introduction

Virtualization is a great technology, as it lets you run multiple virtual systems on a single server.

It's easy to create a "LAN" for containers on a single server: attach them to the same bridge, use the same subnet (e.g. 10.10.10.0/24), and you're done. The containers can then communicate with each other using their private IP addresses.

However, with more than one server *not* in the same LAN (e.g. two LXD servers in different datacentres, or in the same datacentre but with a hosting provider that doesn't offer LANs), things get tricky. There are numerous examples of such providers, e.g. Amazon AWS or Hetzner.

Virtual Extensible LAN (VXLAN) can help us solve this issue. No more hassle with port redirections or adjusting IPs after containers are migrated.

While this HOWTO was written primarily with LXD in mind, it is not tied to it. Apart from LXD, it should work fine with LXC, KVM, Xen and Docker. Make sure to read the MTU notes.


Network diagram

We will build a Virtual Extensible LAN as shown in the example diagrams below. Each server has a unicast VXLAN connection to every other server. In the examples, the containers use the 10.10.10.0/24 subnet, but of course you can use any other subnet.

Example 1:

LXD1: IP 1.1.1.1, Europe          LXD2: IP 2.2.2.2, Asia
container01, 10.10.10.10          container04, 10.10.10.20
container02, 10.10.10.11          container05, 10.10.10.21
container03, 10.10.10.12          container06, 10.10.10.22

LXD3: IP 3.3.3.3, US              LXD4: IP 4.4.4.4, Africa
container07, 10.10.10.30          container10, 10.10.10.40
container08, 10.10.10.31          container11, 10.10.10.41
container09, 10.10.10.32          container12, 10.10.10.42


Example 2:

LXD1: IP 1.1.1.1, Hetzner DC18    LXD2: IP 2.2.2.2, Hetzner DC19
container01, 10.10.10.10          container04, 10.10.10.20
container02, 10.10.10.11          container05, 10.10.10.21
container03, 10.10.10.12          container06, 10.10.10.22

LXD3: IP 3.3.3.3, Hetzner DC20    LXD4: IP 4.4.4.4, Hetzner DC20
container07, 10.10.10.30          container10, 10.10.10.40
container08, 10.10.10.31          container11, 10.10.10.41
container09, 10.10.10.32          container12, 10.10.10.42


Prerequisites

You will need a fairly modern kernel and userspace. Ubuntu 14.04 LTS is too old; Ubuntu 16.04 LTS is fine. CentOS 7 should also be fine.

Each container needs to be attached to the bridge created by the script below. We're using vxbr0 as the bridge for VXLAN devices.
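
Before going further, it may be worth checking that the commands the script calls (ip, brctl and ebtables) are installed on each server. The snippet below is a minimal sketch; missing_tools is a hypothetical helper name, not part of vxlansetup.sh. On Ubuntu these commands are normally provided by the iproute2, bridge-utils and ebtables packages.

```shell
#!/bin/bash
# Hypothetical helper: print every command from the argument list that is
# not available in $PATH. Prints nothing if all of them are found.
missing_tools() {
    local tool
    for tool in "$@" ; do
        if ! command -v "$tool" >/dev/null 2>&1 ; then
            echo "$tool"
        fi
    done
}

# Commands called by vxlansetup.sh
MISSING=$(missing_tools ip brctl ebtables)
if [ -n "$MISSING" ] ; then
    echo "Missing commands:" $MISSING
else
    echo "All required commands found"
fi
```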


Performance

VXLAN offers near wire speed. For example, if iperf between your containers using public IPs shows 90 MB/s, the same test over VXLAN should show around 85 MB/s.


Drawbacks

  • VXLAN does not encrypt the traffic
  • VXLAN does not compress the traffic

While it's typically fine to run unencrypted traffic between servers in the same VPC / security group in AWS or, more generally, within a single datacentre, you may want to add encryption for traffic between servers in different geographical locations.


MTU issue

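A VXLAN interface lowers the usable MTU to 1450, because the encapsulation adds 50 bytes of headers to every packet sent over a standard 1500-byte link. This means that any container attached to a bridge using VXLAN needs its network interface set to an MTU of 1450 as well. Otherwise, connections that send larger packets will hang.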
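
The 1450 figure can be derived from the header sizes. The sketch below assumes an IPv4 underlay without IP options; vxlan_mtu is a hypothetical helper used only for illustration.

```shell
#!/bin/bash
# VXLAN encapsulation overhead with an IPv4 underlay, in bytes
INNER_ETH=14   # inner Ethernet header carried inside the tunnel
VXLAN_HDR=8    # VXLAN header
UDP_HDR=8      # outer UDP header
IPV4_HDR=20    # outer IPv4 header (no options)

# Print the MTU left for the overlay, given the MTU of the underlying link
vxlan_mtu() {
    local underlay_mtu=$1
    echo $(( underlay_mtu - INNER_ETH - VXLAN_HDR - UDP_HDR - IPV4_HDR ))
}

vxlan_mtu 1500    # prints 1450
```

If the link between your servers has a different MTU, pass that value instead of 1500 and use the result for the containers.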

If you're using LXD, you can run "lxc config edit your-container" and set it as below:

(...)
devices:
  eth0:
    mtu: "1450"      # <----- lowers container's NIC MTU to 1450
    nictype: bridged
    parent: vxbr0    # <----- bridge where the container has to be attached
    type: nic
  root:
    path: /
    type: disk
(...)
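
One way to check that the MTU is right end to end is to ping another container with the "don't fragment" bit set and a payload that exactly fills the MTU. The sketch below only prints the command rather than running it; ping_payload and df_ping_cmd are hypothetical helper names, and the -M do option assumes the Linux iputils version of ping.

```shell
#!/bin/bash
# ICMP echo payload that exactly fills a given MTU over IPv4:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
ping_payload() {
    local mtu=$1
    echo $(( mtu - 20 - 8 ))
}

# Print (do not run) a ping command with the "don't fragment" bit set
df_ping_cmd() {
    local target=$1 mtu=$2
    echo "ping -c 3 -M do -s $(ping_payload "$mtu") $target"
}

df_ping_cmd 10.10.10.20 1450    # prints: ping -c 3 -M do -s 1422 10.10.10.20
```

Run the printed command inside a container, pointing it at a container on another server. If a payload of 1422 bytes gets replies but everyday traffic with large packets still hangs, some interface along the path probably still has a different MTU.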


Preventing bridge looping

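If you use more than two servers, you will have more than one VXLAN device attached to the bridge. Because every server is connected to every other server, a broadcast frame received on one VXLAN device would normally be forwarded out of the others, creating packet loops. To prevent this, we use ebtables to block traffic between different VXLAN devices on the same server.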

The script sets this up automatically.

Please note that the script assumes it's the only thing on the server which manipulates ebtables.
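
To make the rule set concrete: with N VXLAN devices on a server, every ordered pair of distinct devices gets one DROP rule, so N * (N - 1) rules in total. The sketch below only prints the rules; isolation_rules and its two parameters are hypothetical names used for illustration, not part of the script.

```shell
#!/bin/bash
# Print one DROP rule for every ordered pair of distinct vxlan devices.
# $1 = device name prefix, $2 = number of devices (numbered from 0)
isolation_rules() {
    local prefix=$1 count=$2 i j
    for (( i = 0; i < count; i++ )) ; do
        for (( j = 0; j < count; j++ )) ; do
            if [ "$i" -ne "$j" ] ; then
                echo "ebtables -A FORWARD -i ${prefix}${i} -o ${prefix}${j} -j DROP"
            fi
        done
    done
}

# Three remote servers -> three devices -> six rules
isolation_rules vxlan 3
```

Traffic between a VXLAN device and the containers' own interfaces on the bridge is not affected, since the rules only match frames that both enter and leave through a vxlan* device.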


vxlansetup.sh script

The script creates one VXLAN device on the local server for every remote server. Running it on all servers gives you a full mesh.

  • usage - available actions are start, stop, restart and status
# ./vxlansetup.sh
 * Usage: vxlansetup.sh {start|stop|restart|status}
#


  • copy the script to every server running the containers (do not copy or use the script in the containers)
  • modify LOCALIP, REMOTEIPS, LOCAL_DEV, BRIDGE_DEV variables on every server
  • DRYRUN - set it to 1 to only print the commands that would be run; the script below ships with DRYRUN=1, so set it to 0 to actually apply the changes
  • VXLAN_DEV, PORT - should be OK for most systems, but adjust if needed
  • you have to run the script on every server (remember to adjust the IPs accordingly)
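
Since LOCALIP and REMOTEIPS differ on every server, one option is to keep a single list of all servers and derive REMOTEIPS from it. This is only a sketch: ALLIPS and remote_ips_for are hypothetical names that do not exist in the script, and the IPs are the ones from the example diagram.

```shell
#!/bin/bash
# Hypothetical: one list of all servers, identical on every host
ALLIPS=(1.1.1.1 2.2.2.2 3.3.3.3 4.4.4.4)

# Print every IP from ALLIPS except the given local one, one per line
remote_ips_for() {
    local localip=$1 ip
    for ip in "${ALLIPS[@]}" ; do
        if [ "$ip" != "$localip" ] ; then
            echo "$ip"
        fi
    done
}

# Example for LXD2
LOCALIP=2.2.2.2
REMOTEIPS=( $(remote_ips_for "$LOCALIP") )
echo "LOCALIP=$LOCALIP"
echo "REMOTEIPS=(${REMOTEIPS[*]})"    # prints: REMOTEIPS=(1.1.1.1 3.3.3.3 4.4.4.4)
```

With this approach only LOCALIP (and possibly LOCAL_DEV) has to be edited per server.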


#!/bin/bash

LOCALIP=1.1.1.1
REMOTEIPS=(2.2.2.2 3.3.3.3 4.4.4.4)

# existing local device where $LOCALIP is attached
LOCAL_DEV=vmbr0

# New bridge device where $VXLAN_DEV devices will be attached
BRIDGE_DEV=vxbr0

# Will set up one vxlan* device per remote IP.
# For example, with two remote IPs (three servers in total), we will set up vxlan0 and vxlan1 devices,
# attached to BRIDGE_DEV (vxbr0)
VXLAN_DEV=vxlan

# Port used for vxlan
PORT=4789

# If set to 1 - only print commands we would run
DRYRUN=1

# No need to change anything below
function vxrun() {
    # Print the command in dry-run mode, run it otherwise.
    # "$@" keeps every argument as a separate word.
    if [ "$DRYRUN" -eq 1 ] ; then
        echo "$@"
    else
        "$@"
    fi
}

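function vxlan_check() {
    # Refuse to start if the bridge already exists.
    # "ip link show" fails when the device is missing, so we do not
    # depend on the exact wording of brctl error messages.
    if ip link show "$BRIDGE_DEV" >/dev/null 2>&1 ; then
        echo "bridge $BRIDGE_DEV already exists - vxlan already set up?"
        exit 1
    fi
    # Refuse to start if any interface name begins with $VXLAN_DEV
    if ip addr | grep -q " $VXLAN_DEV" ; then
        echo "$VXLAN_DEV* device already exists - vxlan already set up?"
        exit 1
    fi
}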

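function vxlan_start() {
    # brctl is needed below, so make sure it is installed first
    if ! command -v brctl >/dev/null 2>&1 ; then
        echo "brctl command not found"
        exit 1
    fi

    # check if we already have any vxlan devices or interfaces
    vxlan_check

    # index of the last vxlan device (devices are numbered from 0)
    VXLAN_DEVICES=$((${#REMOTEIPS[@]} - 1))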

    # If there is more than one remote IP, we have to add ebtables rules to prevent looping
    if [ "${#REMOTEIPS[@]}" -gt 1 ] ; then
        # Check if ebtables is installed
        if ! command -v ebtables >/dev/null 2>&1 ; then
            echo "ebtables command not found"
            exit 1
        fi
        # Our vxlan* devices must not pass traffic to each other
        for i in $(seq 0 $VXLAN_DEVICES) ; do
            for j in $(seq 0 $VXLAN_DEVICES) ; do
                if [ $i -ne $j ] ; then
                    vxrun ebtables -A FORWARD -i ${VXLAN_DEV}${i} -o ${VXLAN_DEV}${j} -j DROP
                fi
            done
        done
    fi

    # Add bridge
    vxrun brctl addbr $BRIDGE_DEV
    #vxrun brctl stp $BRIDGE_DEV on
    vxrun ip link set up $BRIDGE_DEV

    # Add vxlan* devices
    for VXLAN_DEVICE in $(seq 0 $VXLAN_DEVICES) ; do
        vxrun ip link add ${VXLAN_DEV}${VXLAN_DEVICE} type vxlan id ${VXLAN_DEVICE} remote ${REMOTEIPS[$VXLAN_DEVICE]} local $LOCALIP dev $LOCAL_DEV dstport $PORT
        vxrun ip link set up dev ${VXLAN_DEV}${VXLAN_DEVICE}
        vxrun brctl addif $BRIDGE_DEV ${VXLAN_DEV}${VXLAN_DEVICE}
    done
}

function vxlan_stop() {
    vxrun ip link set down $BRIDGE_DEV
    vxrun brctl delbr $BRIDGE_DEV
    # List all vxlan devices
    VXLAN_SETS=$(ip addr | awk -F: "/$VXLAN_DEV/ {print \$2}")
    for VXLAN_SET in $VXLAN_SETS ; do
        vxrun ip link set down $VXLAN_SET
        vxrun ip link del $VXLAN_SET
    done
    # We assume ebtables are only used for our vxlan setup script.
    # Run through vxrun so that dry-run mode does not flush real rules.
    vxrun ebtables -F
}

function vxlan_status() {
    echo vxlan bridge:
    brctl show $BRIDGE_DEV
    echo
    echo vxlan interfaces:
    ip link show | grep -A 1 $VXLAN_DEV
    echo
    ebtables -L
}

MODE=$1
set -u
# start, stop, restart
if [ "$MODE" == "start" ] ; then
    vxlan_start
elif [ "$MODE" == "stop" ] ; then
    vxlan_stop
elif [ "$MODE" == "restart" ] ; then
    vxlan_stop
    vxlan_start
elif [ "$MODE" == "status" ] ; then
    vxlan_status
else
    echo " * Usage: $0 {start|stop|restart|status}"
    exit 1
fi


Example output

Here is example output in dry-run mode (DRYRUN=1), using the settings for LXD1 (1.1.1.1) shown in the script above:

# vxlansetup.sh start
ebtables -A FORWARD -i vxlan0 -o vxlan1 -j DROP
ebtables -A FORWARD -i vxlan0 -o vxlan2 -j DROP
ebtables -A FORWARD -i vxlan1 -o vxlan0 -j DROP
ebtables -A FORWARD -i vxlan1 -o vxlan2 -j DROP
ebtables -A FORWARD -i vxlan2 -o vxlan0 -j DROP
ebtables -A FORWARD -i vxlan2 -o vxlan1 -j DROP
brctl addbr vxbr0
ip link set up vxbr0
ip link add vxlan0 type vxlan id 0 remote 2.2.2.2 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan0
brctl addif vxbr0 vxlan0
ip link add vxlan1 type vxlan id 1 remote 3.3.3.3 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan1
brctl addif vxbr0 vxlan1
ip link add vxlan2 type vxlan id 2 remote 4.4.4.4 local 1.1.1.1 dev vmbr0 dstport 4789
ip link set up dev vxlan2
brctl addif vxbr0 vxlan2