Working with several networks#
When one single network isn’t enough.
Website: https://discovery.gitlabpages.inria.fr/enoslib/index.html
Instant chat: https://framateam.org/enoslib
Source code: https://gitlab.inria.fr/discovery/enoslib
Prerequisites#
⚠️ Make sure you’ve run the one-time setup for your environment
⚠️ Make sure you’re running this notebook under the right kernel
[ ]:
import enoslib as en
en.check()
Setup#
[ ]:
import enoslib as en
# Enable rich logging
_ = en.init_logging()
We reserve two nodes, each with at least two network interfaces. The first network interface of each node uses the Grid’5000 production network (which is not isolated), while the second one is configured to use a VLAN.
To find out which machines have at least two network cards, you can refer to the hardware page of Grid’5000.
To know more about VLANs on Grid’5000, you can refer to this page.
Beware: the number of VLANs is limited. Here we want a routed VLAN, and there are only 6 routed VLANs per site (3 are monosite and 3 are multisite).
[ ]:
SITE = "rennes"
network = en.G5kNetworkConf(type="prod", roles=["public"], site=SITE)
private = en.G5kNetworkConf(type="kavlan", roles=["private"], site=SITE)
conf = (
    en.G5kConf.from_settings(job_name="enoslib_several_networks")
    .add_network_conf(network)
    .add_network_conf(private)
    .add_machine(
        roles=["server", "xp"],
        cluster="paravance",
        nodes=1,
        primary_network=network,
        secondary_networks=[private],
    )
    .add_machine(
        roles=["client", "xp"],
        cluster="paravance",
        nodes=1,
        primary_network=network,
        secondary_networks=[private],
    )
    .finalize()
)
conf
[ ]:
provider = en.G5k(conf)
roles, networks = provider.init()
roles
Get the network information of your nodes#
First, we retrieve the network information by syncing the Host descriptions with the remote machines. Syncing populates every Host data structure with actual information (e.g. number of cores, network information). This relies on Ansible fact gathering and is provider agnostic. Note that Grid’5000 also exposes a lot of node information in its REST API, but only static information.
[ ]:
roles = en.sync_info(roles, networks)
roles
We can now filter the network addresses of the nodes given a network
[ ]:
server = roles["server"][0]
server.filter_addresses(networks=networks["private"])
[ ]:
ip_address = server.filter_addresses(networks=networks["private"])[0]
str(ip_address.ip.ip)
[ ]:
server.filter_addresses(networks=networks["public"])
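The `.ip.ip` chain used above suggests that each address wraps an interface object from Python's standard `ipaddress` module; that is an assumption here, not something this notebook states. Under that assumption, a minimal sketch of filtering addresses by network could look like the following. All addresses and the `addresses_in` helper are made up for illustration.

```python
import ipaddress

# Hypothetical addresses of one node, one per interface (made-up values).
addresses = [
    ipaddress.ip_interface("172.16.96.10/20"),  # assumed "public" address
    ipaddress.ip_interface("10.24.0.10/18"),    # assumed "private" address
]

# Hypothetical private network (made-up value).
private_net = ipaddress.ip_network("10.24.0.0/18")


def addresses_in(network, interfaces):
    """Return the interfaces whose address belongs to the given network."""
    return [iface for iface in interfaces if iface.ip in network]


selected = addresses_in(private_net, addresses)
# An interface object carries address and prefix length; .ip keeps the address only.
plain_ips = [str(iface.ip) for iface in selected]
print(plain_ips)
```

The plain string form is what a command-line tool expects, which is why the notebook converts the address with `str(...)` before using it.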
A simple load generation tool#
We are using flent, a convenient client for netperf that can run different network benchmarks.
Roughly speaking, flent connects to a netperf server, starts a benchmark and collects metrics in various formats (CSV, images, …). That makes it a good candidate when you need quick insight into the performance of the network between your nodes.
The goal of this part is to run a TCP traffic benchmark on the private network, so we need to instruct flent to connect to the netperf server on the relevant address.
[ ]:
with en.actions(roles=roles) as a:
    # Note flent is on the non-free repo (activated by default nowadays on g5k)
    a.apt(
        name=["flent", "netperf", "python3-setuptools", "python3-matplotlib"],
        state="present",
        update_cache="yes",
    )
Check the routes on the nodes and make sure that traffic to the private network goes through the private interface.
[ ]:
routes = en.run_command("ip route list", roles=roles)
print("\n-Routes-\n")
print("\n\n".join([f"{r.host} => {r.stdout}" for r in routes]))
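Reading the routes by eye works for two nodes; for more, the check can be automated. The sketch below picks the most specific route matching a destination and returns its device. The sample text only imitates the style of `ip route list` output, and every address, interface name and the `device_for` helper are made up for illustration.

```python
import ipaddress

# Made-up sample in the style of `ip route list`; real output will differ.
SAMPLE_ROUTES = """\
default via 172.16.111.254 dev eno1
10.24.0.0/18 dev eno2 proto kernel scope link src 10.24.0.10
172.16.96.0/20 dev eno1 proto kernel scope link src 172.16.96.10
"""


def device_for(destination, routes_text):
    """Return the device of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None  # (prefix length, device) of the best match so far
    for line in routes_text.splitlines():
        fields = line.split()
        if not fields or "dev" not in fields:
            continue
        # "default" stands for the catch-all route
        prefix = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        network = ipaddress.ip_network(prefix, strict=False)
        device = fields[fields.index("dev") + 1]
        if dest in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, device)
    return best[1] if best else None


print(device_for("10.24.0.11", SAMPLE_ROUTES))  # private destination
print(device_for("8.8.8.8", SAMPLE_ROUTES))     # falls back to the default route
```

In a real run, the text would come from the `stdout` of each result returned by `en.run_command`, and the destination would be the peer's private address.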
[ ]:
server_address = str(server.filter_addresses(networks=networks["private"])[0].ip.ip)
with en.actions(pattern_hosts="server", roles=roles) as a:
    # this is somehow idempotent: it will fail silently if netperf is already started
    a.shell("netperf", background=True)
    a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")
with en.actions(pattern_hosts="client", roles=roles) as a:
    a.shell(
        " flent tcp_upload -p totals "
        " -l 60 "
        f" -H { server_address } "
        " -t 'tcp_upload test' "
        " -o result.png"
    )
    a.fetch(src="result.png", dest="result")
[ ]:
with en.actions(pattern_hosts="client", roles=roles) as a:
    a.fetch(src="result.png", dest="/tmp/result")
# the results are collected once the actions have run, i.e. after the block exits
r = a.results
r
[ ]:
from IPython.display import Image
Image(f"/tmp/result/{roles['client'][0].alias}/result.png")
Forcing the flent client to bind to the right network is not really necessary if the routes are set correctly, but it’s an opportunity to use host variables, so let’s do it ;)
flent has an option for this: --local-bind <ip>
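The flent command line in this notebook is assembled from adjacent string literals, which makes a missing space easy to overlook. As an alternative sketch, the same invocation can be built from a list of arguments and joined with `shlex.join`, which takes care of spacing and quoting. The `flent_command` helper and the addresses are hypothetical; the flags are the ones used in this notebook.

```python
import shlex


def flent_command(server, output, local_bind=None, length=60):
    """Assemble the flent invocation used in this notebook as one shell string."""
    args = ["flent", "tcp_upload", "-p", "totals", "-l", str(length), "-H", server]
    if local_bind is not None:
        # bind the client to a given local address
        args += ["--local-bind", local_bind]
    args += ["-t", "tcp_upload test", "-o", output]
    # shlex.join quotes the arguments that contain spaces or shell metacharacters
    return shlex.join(args)


# Made-up addresses for illustration
print(flent_command("10.24.0.10", "result.png"))
print(flent_command("10.24.0.10", "result_bind.png", local_bind="10.24.0.11"))
```

This variant fixes the bind address in Python. The notebook instead leaves a `{{ local_bind }}` placeholder in the command, so that each client host gets its own value from its host variables.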
[ ]:
for h in roles["client"]:
    # store the address as a plain string so that it can be templated in commands
    h.extra.update({"local_bind": str(h.filter_addresses(networks=networks["private"])[0].ip.ip)})
roles["client"][0]
[ ]:
server_address = str(server.filter_addresses(networks=networks["private"])[0].ip.ip)
with en.actions(pattern_hosts="server", roles=roles) as a:
    # this is somehow idempotent: it will fail silently if netperf is already started
    a.shell("netperf", background=True)
    a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")
with en.actions(pattern_hosts="client", roles=roles) as a:
    a.shell(
        " flent tcp_upload -p totals "
        " -l 60 "
        f" -H { server_address } "
        "--local-bind {{ local_bind }} "
        " -t 'tcp_upload test' "
        " -o result_bind.png"
    )
    a.fetch(src="result_bind.png", dest="/tmp/result")
[ ]:
from IPython.display import Image
Image(f"/tmp/result/{roles['client'][0].alias}/result_bind.png")
Checking that the network traffic flows through the right interface :)#
[ ]:
# we enable the statistics on all known interfaces
# note that this seems incompatible with --epoch :( :(
with en.Dstat(nodes=roles["xp"], options="--full") as d:
    backup_dir = d.backup_dir
    with en.actions(pattern_hosts="server", roles=roles) as a:
        # this is somehow idempotent: it will fail silently if netperf is already started
        a.shell("netperf", background=True)
        a.wait_for(port=12865, state="started", task_name="Waiting for netperf to be ready")
    with en.actions(pattern_hosts="client", roles=roles) as a:
        a.shell(
            " flent tcp_upload -p totals "
            " -l 60 "
            f" -H { server_address } "
            "--local-bind {{ local_bind }} "
            " -t 'tcp_upload test' "
            " -o result_bind.png"
        )
        a.fetch(src="result_bind.png", dest="result")
[ ]:
import pandas as pd
import seaborn as sns
print(backup_dir)
# create a dictionary: host -> pd.DataFrame
results = dict()
for host in roles["xp"]:
    result = pd.DataFrame()
    host_dir = backup_dir / host.alias
    csvs = host_dir.rglob("*.csv")
    for csv in csvs:
        print(csv)
        # skip the header lines written by dstat before the column names
        df = pd.read_csv(csv, skiprows=5, index_col=False)
        df["host"] = host.alias
        df["csv"] = csv
        result = pd.concat([result, df], axis=0)
    results[host] = result
[ ]:
results[roles["xp"][0]]
[ ]:
from itertools import product
import matplotlib.pyplot as plt
for host, result in results.items():
    interfaces = host.filter_interfaces()
    # e.g. interfaces = ['eno1', 'eno2']
    keys_in_csv = [fmt % interface for interface, fmt in product(interfaces, ["net/%s:recv", "net/%s:send"])]
    # e.g. keys_in_csv = ['net/eno1:recv', 'net/eno1:send', 'net/eno2:recv', 'net/eno2:send']
    print(keys_in_csv)
    plt.figure()
    # melt makes the data tidy
    # 0, {recv, send}, value_0
    # 1, {recv, send}, value_1
    sns.lineplot(data=result.melt(value_vars=keys_in_csv, ignore_index=False).reset_index(), x="index", y="value", hue="variable")
    plt.title(f"{host.alias} \n ~ traffic should be on {host.filter_interfaces(networks=networks['private'])} ~")
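The plotting cell above does two things in one line each: it derives the CSV column names from the interface names, and it reshapes the table from wide to long ("tidy") form. Here is a small sketch of both steps with the standard library only, so that the intermediate values are easy to inspect. The interface names and the counter values are made up.

```python
from itertools import product

# Assumed interface names; the real ones come from host.filter_interfaces().
interfaces = ["eno1", "eno2"]
formats = ["net/%s:recv", "net/%s:send"]

# product varies its last argument fastest: both formats for eno1, then for eno2.
keys_in_csv = [fmt % interface for interface, fmt in product(interfaces, formats)]
print(keys_in_csv)

# Made-up wide rows: one row per sample, one column per counter.
wide = [
    {"net/eno1:recv": 10, "net/eno1:send": 12, "net/eno2:recv": 900, "net/eno2:send": 5},
    {"net/eno1:recv": 11, "net/eno1:send": 9, "net/eno2:recv": 950, "net/eno2:send": 6},
]

# Long form: one (index, variable, value) record per measurement,
# grouped by variable, which is the shape the line plot consumes.
tidy = [
    {"index": i, "variable": key, "value": row[key]}
    for key in keys_in_csv
    for i, row in enumerate(wide)
]
print(len(tidy), tidy[0])
```

With one record per measurement, a plotting library can map `variable` to the line colour directly, which is what `hue="variable"` does in the cell above.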
Emulating the network conditions#
We’ll illustrate how network constraints can be set on specific network interfaces of the nodes of the experiment. To do so, EnOSlib provides two services:
- the Netem service, which is a wrapper around netem;
- the NetemHTB service, which provides a high-level interface to finer-grained, HTB-based network emulation.
More information can be found in the EnOSlib documentation: https://discovery.gitlabpages.inria.fr/enoslib/apidoc/netem.
EnOSlib lets you set the constraint easily on a dedicated network by only specifying its logical name.
[ ]:
netem = en.Netem()
# symmetric constraints:
# node1|10ms ---> 10ms|node2|10ms --> 10ms|node1
netem.add_constraints("delay 10ms", roles["xp"], symetric=True, networks=networks["private"])
[ ]:
netem.deploy()
There’s a convenient method that lets you quickly check the network conditions (at least the RTT latency).
[ ]:
netem.validate()
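Before reading the measured latencies, it helps to know what value to expect. Going by the diagram in the comment of the constraint cell, a packet is delayed once when it leaves a node and once more when it enters the other one, in each direction. The sketch below turns that reading into arithmetic. The non-symmetric case (delay on outgoing packets only) is an assumption, and `expected_rtt_ms` is a hypothetical helper.

```python
def expected_rtt_ms(delay_ms, symmetric):
    """Round-trip time added by a per-node delay between two nodes.

    Assumption: without symmetry the delay applies to outgoing packets only;
    with symmetry it applies to incoming packets as well.
    """
    delays_per_direction = 2 if symmetric else 1  # egress, plus ingress if symmetric
    directions = 2  # request and reply
    return delay_ms * delays_per_direction * directions


print(expected_rtt_ms(10, symmetric=True))
print(expected_rtt_ms(10, symmetric=False))
```

Measured values will sit slightly above the computed figure, since the base latency of the network adds to the emulated delay.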
[ ]:
from pathlib import Path
server_alias = roles['server'][0].alias
print(server_alias)
print(Path(f"_tmp_enos_/{server_alias[:-3]}.fpingout").read_text())
print("...8<"*20)
client_alias = roles['client'][0].alias
print(client_alias)
print(Path(f"_tmp_enos_/{client_alias[:-3]}.fpingout").read_text())
Clean#
[ ]:
provider.destroy()