Working with several networks#

When one single network isn’t enough.



Prerequisites#

Make sure you’ve run the one-time setup for your environment.

Setup#

[ ]:
import enoslib as en

# Enable rich logging
_ = en.init_logging()

We reserve two nodes, each with at least two network interfaces. The first network interface of each node uses the production network of Grid’5000 (which is not isolated), while the second network interface is configured to use a VLAN.

  • To find out which machines have at least two network cards, you can refer to the hardware page of Grid’5000

  • To know more about VLANs on Grid’5000, you can refer to this page

Beware: the number of VLANs is limited. Here we want a routed VLAN, and there are only 6 routed VLANs per site (3 are monosite and 3 are multisite).

[ ]:
SITE = "rennes"

network = en.G5kNetworkConf(type="prod", roles=["public"], site=SITE)
private = en.G5kNetworkConf(type="kavlan", roles=["private"], site=SITE)

conf = (
    en.G5kConf.from_settings(job_name="enoslib_several_networks")
        .add_network_conf(network)
        .add_network_conf(private)
        .add_machine(
            roles=["server", "xp"],
            cluster="paravance",
            nodes=1,
            primary_network=network,
            secondary_networks=[private],
        )
        .add_machine(
            roles=["client", "xp"],
            cluster="paravance",
            nodes=1,
            primary_network=network,
            secondary_networks=[private],
        )
        .finalize()
)
conf
[ ]:
provider = en.G5k(conf)
roles, networks = provider.init()
roles

Get the network information of your nodes#

First we retrieve the network information by syncing the Host descriptions with the remote machines. Syncing populates every Host data structure with actual information (e.g. number of cores, network information). This relies on Ansible fact gathering and is provider agnostic. Note that Grid’5000 also exposes a lot of node information in its REST API, but only static information.

[ ]:
roles = en.sync_info(roles, networks)
roles

We can now filter the network addresses of a node by network:

[ ]:
server = roles["server"][0]
server.filter_addresses(networks=networks["private"])
[ ]:
ip_address = server.filter_addresses(networks=networks["private"])[0]
str(ip_address.ip.ip)
[ ]:
server.filter_addresses(networks=networks["public"])
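To make the idea of filtering addresses by network concrete, here is a minimal standalone sketch based on the standard `ipaddress` module. It is not how EnOSlib implements `filter_addresses`; the helper name `addresses_in_network` and the sample addresses and CIDR are assumptions made up for the illustration.

```python
import ipaddress


def addresses_in_network(addresses, network):
    """Return the addresses (as strings) that belong to the given network."""
    net = ipaddress.ip_network(network)
    return [a for a in addresses if ipaddress.ip_address(a) in net]


# Hypothetical addresses of one node: production, VLAN and loopback
node_addresses = ["172.16.96.10", "10.24.4.10", "127.0.0.1"]
# Hypothetical CIDR of the private (VLAN) network
private_cidr = "10.24.0.0/18"

print(addresses_in_network(node_addresses, private_cidr))
```

Only the address that falls inside the private CIDR is kept, which is the same kind of selection the cells above perform with the `networks["private"]` and `networks["public"]` objects.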

A simple load generation tool#

We use flent, a convenient client for netperf that can run different network benchmarks.

Roughly speaking, flent connects to a netperf server, starts a benchmark and collects metrics in various formats (CSV, images, …). That makes it a good candidate when you need quick insight into the performance of the network between your nodes.

The goal of this part is to initiate a benchmark of TCP traffic on the private network. So we need to instruct flent to connect to the netperf server on the relevant address.

[ ]:
with en.actions(roles=roles) as a:
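    a.apt_repository(
        # apt_repository does not run a shell, so the release codename is taken
        # from the gathered Ansible facts instead of $(lsb_release -c -s)
        repo="deb http://deb.debian.org/debian {{ ansible_distribution_release }} main contrib non-free",
        state="present",
    )
    a.apt(
        name=["flent", "netperf", "python3-setuptools", "python3-matplotlib"],
        state="present",
        update_cache=True,
    )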

Check the routes on the nodes and make sure that traffic to the private network goes through the private interface.

[ ]:
routes = en.run_command("ip route list", roles=roles)
print("\n-Routes-\n")
print("\n\n".join([f"{r.host} => {r.stdout}" for r in routes]))
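Reading the routes by eye works for two nodes, but the check can also be automated. The following sketch is a standalone illustration (not part of EnOSlib): the helper `device_for` and the sample routing table are assumptions, and the parsing only handles the simple `<prefix> ... dev <name>` lines typically printed by `ip route list`.

```python
import ipaddress


def device_for(route_output, destination):
    """Return the device of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None  # (prefix length, device name)
    for line in route_output.splitlines():
        fields = line.split()
        if "dev" not in fields or fields.index("dev") + 1 >= len(fields):
            continue
        target = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            net = ipaddress.ip_network(target, strict=False)
        except ValueError:
            continue  # e.g. "unreachable ..." entries
        device = fields[fields.index("dev") + 1]
        # Longest prefix match: keep the most specific matching route
        if dest in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, device)
    return best[1] if best else None


# Made-up routing table, similar in shape to `ip route list` output
sample = """\
default via 172.16.111.254 dev eno1
10.24.0.0/18 dev eno2 proto kernel scope link src 10.24.4.10
172.16.96.0/20 dev eno1 proto kernel scope link src 172.16.96.10
"""

print(device_for(sample, "10.24.4.11"))
```

In the notebook, the `stdout` of each result in `routes` could be passed as `route_output`, and the peer's private address as `destination`.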
[ ]:
server_address = str(server.filter_addresses(networks=networks["private"])[0].ip.ip)

with en.actions(pattern_hosts="server", roles=roles) as a:
    a.shell("netperf", background=True)  # roughly idempotent: fails silently if netperf is already started
    a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")


with en.actions(pattern_hosts="client", roles=roles) as a:
    a.shell(
        " flent tcp_upload -p totals "
        " -l 60 "
        f" -H { server_address } "
        " -t 'tcp_upload test' "
        " -o result.png"
    )
    a.fetch(src="result.png", dest="result")
[ ]:
with en.actions(pattern_hosts="client", roles=roles) as a:
    a.fetch(src="result.png", dest="/tmp/result")
    r = a.results
r
[ ]:
from IPython.display import Image
Image(f"/tmp/result/{roles['client'][0].alias}/result.png")

Next we force the flent client to bind to the right network (not really necessary if the routes are set correctly). It’s an opportunity to use host variables, so let’s do it ;)

flent has an option for this: --local-bind <ip>

[ ]:
for h in roles["client"]:
    # store a plain string so the value serializes cleanly as a host variable
    h.extra.update({"local_bind": str(h.filter_addresses(networks=networks["private"])[0].ip.ip)})
roles["client"][0]
[ ]:
server_address = str(server.filter_addresses(networks=networks["private"])[0].ip.ip)

with en.actions(pattern_hosts="server", roles=roles) as a:
    a.shell("netperf", background=True)  # roughly idempotent: fails silently if netperf is already started
    a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")


with en.actions(pattern_hosts="client", roles=roles) as a:
    a.shell(
        " flent tcp_upload -p totals "
        " -l 60 "
        f" -H { server_address } "
        "--local-bind {{ local_bind }} "
        " -t 'tcp_upload test' "
        " -o result_bind.png"
    )
    a.fetch(src="result_bind.png", dest="/tmp/result")
[ ]:
from IPython.display import Image
Image(f"/tmp/result/{roles['client'][0].alias}/result_bind.png")
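The flent command line is assembled three times in this notebook from concatenated string fragments, where a missing space is easy to overlook. As a possible alternative, the sketch below builds the same command from a list of arguments and lets `shlex.join` handle the quoting. The helper `flent_command` is hypothetical; it only uses the flent options already shown above.

```python
import shlex


def flent_command(server, output, local_bind=None, length=60, title="tcp_upload test"):
    """Build the flent tcp_upload command line used in this notebook."""
    args = ["flent", "tcp_upload", "-p", "totals", "-l", str(length), "-H", server]
    if local_bind is not None:
        # Bind the client to a given local address
        args += ["--local-bind", local_bind]
    args += ["-t", title, "-o", output]
    # shlex.join quotes every argument that needs it (e.g. the title)
    return shlex.join(args)


# Hypothetical addresses on the private network
print(flent_command("10.24.4.10", "result_bind.png", local_bind="10.24.4.11"))
```

The resulting string could then be passed to `a.shell(...)`. Passing `"{{ local_bind }}"` as `local_bind` should keep the host variable templating, since the quoting added by `shlex.join` only wraps the placeholder.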

Checking that the network traffic flows through the right interface :)#

[ ]:
# we enable the statistics on all known interfaces
# note that this seems incompatible with --epoch :( :(
with en.Dstat(nodes=roles["xp"], options="--full") as d:
    backup_dir = d.backup_dir
    with en.actions(pattern_hosts="server", roles=roles) as a:
        a.shell("netperf", background=True)  # roughly idempotent: fails silently if netperf is already started
        a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")


    with en.actions(pattern_hosts="client", roles=roles) as a:
        a.shell(
            " flent tcp_upload -p totals "
            " -l 60 "
            f" -H { server_address } "
            "--local-bind {{ local_bind }} "
            " -t 'tcp_upload test' "
            " -o result_bind.png"
        )
        a.fetch(src="result_bind.png", dest="result")
[ ]:
import pandas as pd
import seaborn as sns

print(backup_dir)

# create a dictionary: host -> pd.DataFrame
results = dict()
for host in roles["xp"]:
    result = pd.DataFrame()
    host_dir = backup_dir / host.alias
    csvs = host_dir.rglob("*.csv")
    for csv in csvs:
        print(csv)
        df = pd.read_csv(csv, skiprows=5, index_col=False)
        df["host"] = host.alias
        df["csv"] = csv
        result = pd.concat([result, df], axis=0)
    results[host] = result
[ ]:
results[roles["xp"][0]]
[ ]:
from itertools import product
import matplotlib.pyplot as plt

for host, result in results.items():
    interfaces = host.filter_interfaces()
    # e.g. interfaces = ['eno1', 'eno2']
    keys_in_csv = [fmt % interface for interface, fmt in product(interfaces, ["net/%s:recv", "net/%s:send"])]
    # e.g. keys_in_csv = ['net/eno1:recv', 'net/eno1:send', 'net/eno2:recv', 'net/eno2:send']
    print(keys_in_csv)
    plt.figure()
    # melt makes the data tidy
    # 0, {recv, send}, value_0
    # 1, {recv, send}, value_1
    sns.lineplot(data=result.melt(value_vars = keys_in_csv, ignore_index=False).reset_index(), x="index", y="value", hue="variable")
    plt.title(f"{host.alias} \n ~ traffic should be on {host.filter_interfaces(networks=networks['private'])} ~")
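Besides the plot, a numeric check can confirm which interface carried the benchmark traffic. The sketch below is a standalone illustration that uses plain dictionaries instead of the DataFrames above; the helper `traffic_per_interface` and the sample values are made up, and only the column naming (`net/<interface>:recv`, `net/<interface>:send`) is taken from the cell above.

```python
from itertools import product


def traffic_per_interface(rows, interfaces):
    """Sum the recv and send counters of each interface over all samples."""
    totals = {iface: 0.0 for iface in interfaces}
    for row in rows:
        for iface, direction in product(interfaces, ["recv", "send"]):
            totals[iface] += float(row.get(f"net/{iface}:{direction}", 0))
    return totals


# Made-up samples: eno2 carries the benchmark, eno1 only background traffic
rows = [
    {"net/eno1:recv": 1200, "net/eno1:send": 900, "net/eno2:recv": 5.0e8, "net/eno2:send": 1.0e6},
    {"net/eno1:recv": 1500, "net/eno1:send": 800, "net/eno2:recv": 6.0e8, "net/eno2:send": 2.0e6},
]

totals = traffic_per_interface(rows, ["eno1", "eno2"])
busiest = max(totals, key=totals.get)
print(busiest, totals)
```

With the real data, `result.to_dict("records")` would give rows of this shape, and the busiest interface is expected to match the one returned by `host.filter_interfaces(networks=networks['private'])`.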

Emulating the network conditions#

We’ll illustrate how network constraints can be set on specific network interfaces on the nodes of the experiment. To do so EnOSlib provides two services:

  • the Netem service, which is a wrapper around netem

  • the NetemHTB service, which provides a high level interface to finer grained HTB network based emulation

More information can be found in the EnOSlib documentation: https://discovery.gitlabpages.inria.fr/enoslib/apidoc/netem.

EnOSlib lets you set the constraint easily on a dedicated network by only specifying it with its logical name.

[ ]:
netem = en.Netem()
# symmetric constraints:
# node1|10ms ---> 10ms|node2|10ms --> 10ms|node1
netem.add_constraints("delay 10ms", roles["xp"], symetric=True, networks=networks["private"])
[ ]:
netem.deploy()

There’s a convenient method that lets you quickly check the network condition (at least the RTT latency).

[ ]:
netem.validate()
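To know what to look for in the measured latencies, it helps to estimate the round-trip time the constraint should produce. The sketch below assumes the reading suggested by the diagram in the constraint cell: with symmetric constraints each node delays packets both when sending and when receiving, so a round trip crosses four delayed stages, against two otherwise. The helper names and the tolerance are assumptions, not EnOSlib API.

```python
def added_rtt_ms(delay_ms, symmetric):
    """Estimate the extra round-trip time introduced by a netem delay."""
    # Assumption: the delay applies on egress of both nodes (2 stages),
    # and also on ingress of both nodes when symmetric (4 stages).
    stages = 4 if symmetric else 2
    return stages * delay_ms


def rtt_is_plausible(measured_ms, baseline_ms, delay_ms, symmetric, tolerance_ms=2.0):
    """Check a measured RTT against the baseline plus the expected delay."""
    expected = baseline_ms + added_rtt_ms(delay_ms, symmetric)
    return abs(measured_ms - expected) <= tolerance_ms


# Hypothetical figures: 0.2 ms baseline RTT, 10 ms symmetric delay
print(added_rtt_ms(10, symmetric=True))
print(rtt_is_plausible(40.3, baseline_ms=0.2, delay_ms=10, symmetric=True))
```

If the measured values are far from this estimate, the constraint may have been applied on a different interface than the one carrying the traffic.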
[ ]:
from pathlib import Path
server_alias = roles['server'][0].alias
print(server_alias)
print(Path(f"_tmp_enos_/{server_alias[:-3]}.fpingout").read_text())

print("...8<"*20)
client_alias = roles['client'][0].alias
print(client_alias)

print(Path(f"_tmp_enos_/{client_alias[:-3]}.fpingout").read_text())

Clean#

[ ]:
provider.destroy()