Grid’5000 tutorials#

This tutorial illustrates the use of EnOSlib to interact with Grid’5000. For a full description of the API and the available options, please refer to the API documentation of the Grid’5000 provider.

Hint

For a complete schema reference, see G5k Schema.

Installation#

To use Grid’5000 with EnOSlib, install it in a virtual environment:

$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install -U pip

$ pip install enoslib

Configuration#

Since python-grid5000 is used behind the scenes, the configuration is read from a configuration file located in your home directory. It can be created with the following:

echo '
username: MYLOGIN
password: MYPASSWORD
' > ~/.python-grid5000.yaml

chmod 600 ~/.python-grid5000.yaml

The above configuration works both from a Grid’5000 frontend machine and from your local machine.
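As a quick local sanity check, the sketch below verifies that the credentials file is valid YAML and contains the expected keys. This is not part of EnOSlib or python-grid5000; it is an illustrative helper using PyYAML, and it does not contact the Grid’5000 API (so it cannot verify that the password is correct).

```python
"""Local sanity check for ~/.python-grid5000.yaml (illustrative only)."""
from pathlib import Path

import yaml  # PyYAML, already a dependency of python-grid5000


def missing_keys(text: str) -> set:
    """Return the credential keys absent from a YAML configuration string."""
    conf = yaml.safe_load(text) or {}
    return {"username", "password"} - set(conf)


if __name__ == "__main__":
    conf_path = Path.home() / ".python-grid5000.yaml"
    if not conf_path.exists():
        raise SystemExit(f"{conf_path} not found")
    missing = missing_keys(conf_path.read_text())
    if missing:
        raise SystemExit(f"{conf_path} is missing keys: {missing}")
    print("Credentials file looks well-formed")
```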

External access (from your laptop)#

If you are running your experiment from outside Grid’5000 (e.g. from your local machine), an SSH jump host is required.

EnOSlib (version 8.1.0 and above) automatically sets up such an SSH jump host connection through access.grid5000.fr when it detects that you are working outside the Grid’5000 network. See Global configuration if you need to adjust this behaviour.

Hint

Using an SSH jump host does not provide the best performance when controlling a large number of nodes, because the number of simultaneous SSH connections is limited on the jump host. See Performance tuning for many tips and tricks to improve performance.
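Outside of EnOSlib (e.g. for ad-hoc ssh or scp sessions), a similar jump-host setup can be declared in your local ~/.ssh/config. The stanza below is one possible configuration, with MYLOGIN standing for your Grid’5000 username:

```
# ~/.ssh/config on your workstation -- MYLOGIN is your Grid'5000 username
Host access.grid5000.fr
  User MYLOGIN
Host *.grid5000.fr !access.grid5000.fr
  User MYLOGIN
  ProxyJump access.grid5000.fr
```

With this in place, `ssh rennes.grid5000.fr` reaches the Rennes frontend transparently through the access machine.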

First reservation example#

The following shows how to deal with a basic reservation. It uses nodes running the standard Grid’5000 software environment (Debian stable with performance tuning and many pre-installed tools) and connects to them over SSH as root. For this purpose, you must have a ~/.ssh/id_rsa.pub file available. The standard Grid’5000 environment is good for prototyping, but not for scientific experiments that care about reproducibility: we’ll see later how to deploy a specific operating system.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["groupA"], cluster="paravance", nodes=1)
    .add_machine(roles=["groupB"], cluster="parasilo", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_basic.py

To run this experiment, you just have to launch the script:

$ python tuto_grid5000_basic.py

The script will output the different steps needed to reserve and provision the physical nodes. However, we don’t actually do anything with the nodes yet, so the script will finish rather quickly.

Using roles to run commands#

After Grid’5000 machines are provisioned, they are assigned to their roles, which can be used to run commands in parallel:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["compute", "control"], cluster="paravance", nodes=1)
    .add_machine(roles=["compute"], cluster="paravance", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Run a command on all hosts belonging to a given role
results = en.run_command("nproc", roles=roles["compute"])
for result in results:
    print(f"{result.host} has {result.payload['stdout']} logical CPU cores")

# Run a command on all hosts, whatever their roles
results = en.run_command("uname -a", roles=roles)
for result in results:
    print(result.payload["stdout"])


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_commands.py

See Ansible Integration for more details about running commands and configuring your experimental machines.

Deploying operating systems#

Grid’5000 provides several operating systems that can be β€œdeployed” (i.e. installed automatically) on all of your nodes. To specify the operating system, use env_name as well as the deploy job type:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="ubuntu2204-min",
        walltime="0:20:00",
    )
    .add_machine(roles=["groupA"], cluster="paravance", nodes=1)
    .add_machine(roles=["groupB"], cluster="parasilo", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

results = en.run_command("lsb_release -a", roles=roles)
for result in results:
    print(result.payload["stdout"])


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_deploy.py

Finished 1 tasks (lsb_release -a)
──────────────────────────────────────
Distributor ID:     Ubuntu
Description:        Ubuntu 22.04.1 LTS
Release:    22.04
Codename:   jammy

Deployment takes a few minutes, with some variation depending on cluster hardware.

The full list of available operating systems is in the Grid’5000 documentation.

To obtain a minimal environment that reflects the default settings of the operating system, use a -min environment. You will likely have to install additional packages and tools for your experiments.

If you need to share data on a network filesystem (available under /home/YOURLOGIN/), use a -nfs or -big environment.

Using reservable disks on nodes#

Grid’5000 has a disk reservation feature: on several clusters, reserving secondary disks is mandatory if you want to use them in your experiments.

The following example shows how to reserve the disks with EnOSlib, and then how they can be used as raw devices. Here, the goal is to build a software RAID array with mdadm and then benchmark it using fio:

import json
import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="grimoire",
    nodes=2,
    reservable_disks=True,
)

provider = en.G5k(conf)

# Get actual resources
roles, _ = provider.init()

with en.actions(roles=roles) as p:
    # Check that the expected disks are present.
    # https://www.grid5000.fr/w/Nancy:Hardware#grimoire
    # Notice that we use the "diskN" aliases because they are more
    # stable than "sdX".
    disks = ["disk1", "disk2", "disk3", "disk4"]
    for disk in disks:
        p.command(f"test -e /dev/{disk}", task_name=f"Check availability of {disk}")

    # Partition disks
    for disk in disks:
        p.shell(
            f"echo -e 'label: gpt\n,,raid' | sfdisk --no-reread /dev/{disk}",
            task_name=f"Create partition on {disk}",
        )

    # Create a software RAID-5 array
    nb_disks = len(disks)
    raid_parts = " ".join(f"/dev/{disk}p1" for disk in disks)
    p.shell(
        f"grep -q md0 /proc/mdstat || "
        f"mdadm --create /dev/md0 --run --level 5 "
        f"--raid-devices {nb_disks} {raid_parts}",
        task_name="Create RAID array",
    )

    # Run FIO to benchmark the array (at the block device level)
    p.apt(name="fio", state="present", task_name="Install fio")
    p.command(
        "fio --output-format=json --name=enoslib --ioengine=libaio "
        "--direct=1 --gtod_reduce=1 --readwrite=randread "
        "--bs=4K --iodepth=8 --numjobs=8 --runtime 30s "
        "--filename=/dev/md0",
        task_name="Run fio",
    )

    # Destroy everything
    p.command("mdadm --stop /dev/md0", task_name="Stop RAID array")
    p.command(f"wipefs -a {raid_parts}", task_name="Wipe RAID signatures")

results = p.results

# Get output of FIO and print result
res_per_node = {res.host: res.stdout for res in results.filter(task="Run fio")}
for host, output in res_per_node.items():
    data = json.loads(output)
    # Sum performance of all parallel FIO "jobs"
    read_perf_iops = sum(job["read"]["iops"] for job in data["jobs"])
    print(
        f"{data['fio version']} running on {host}: "
        f"average /dev/md0 read performance = {read_perf_iops:.2f} IOPS"
    )


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_reservable_disks.py

Finished 1 tasks (Granting root access on the nodes (sudo-g5k))
─────────────────────────────────────────────────────────────────────────────────────────────────────────
Finished 13 tasks (Check availability of disk1,Check availability of disk2,Check availability of
disk3,Check availability of disk4,Create partition on disk1,Create partition on disk2,Create partition on
disk3,Create partition on disk4,Create RAID array,Install fio,Run fio,Stop RAID array,Wipe RAID
signatures)
─────────────────────────────────────────────────────────────────────────────────────────────────────────
fio-3.25 running on grimoire-8.nancy.grid5000.fr: average /dev/md0 read performance = 550.67 IOPS
fio-3.25 running on grimoire-6.nancy.grid5000.fr: average /dev/md0 read performance = 519.71 IOPS

Specific nodes reservation#

On Grid’5000, machines belonging to a given cluster are normally homogeneous, but absolute guarantees are impossible: for instance, physical disks may have different performance characteristics across nodes of a cluster even though they share the same vendor and model. For this reason, experimenters may need to reproduce an experiment several times on the exact same hardware.

This is possible by specifying nodes by their exact names. By default, all the servers specified this way are reserved, unless you request a target number of nodes with the nodes parameter.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(
        roles=["compute"],
        servers=["paravance-19.rennes.grid5000.fr", "paravance-20.rennes.grid5000.fr"],
    )
    .add_machine(
        roles=["compute"],
        servers=[f"parasilo-{i}.rennes.grid5000.fr" for i in range(10, 20)],
        nodes=3,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_specific_servers.py

This is an advanced feature: if the required nodes are not available, the experiment will either wait for the resources to become available (e.g. if another user is currently using them) or fail (e.g. if a machine is down due to maintenance or a hardware issue).

Multi-sites experiments#

To run an experiment involving multiple Grid’5000 sites, you simply have to request clusters from each site in the same configuration. For instance, to request nodes from Lille and Rennes (with convenient roles) and check connectivity:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# fmt: off
conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    # For convenience, we use the site name as role
    .add_machine(roles=["rennes", "intel"], cluster="paravance", nodes=1)
    .add_machine(roles=["lille", "amd"], cluster="chiclet", nodes=1)
)
# fmt: on

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Check connectivity from Rennes to Lille
target = roles["lille"][0]
results = en.run_command(f"ping -c3 {target.address}", roles=roles["rennes"])
for result in results:
    print(f"Ping from {result.host} to {target.address}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_multisites.py

Network-wise, traffic between sites is routed (layer 3) over the Grid’5000 network backbone. If you need nodes from different sites to share the same layer-2 network, you need a global kavlan, see Dedicated networks (kavlan).

Note

There is no global scheduler on Grid’5000. Multi-sites reservation involves finding a common slot to start the jobs on each requested site at the same time. EnOSlib will do that for you. The logic behind it is part of a more generic logic that can synchronize resources between distinct providers.

Dedicated networks (kavlan)#

Kavlan allows you to create dedicated networks that are isolated at layer 2, and to reconfigure the physical network interfaces of nodes to place them in these dedicated networks.

Kavlan on secondary interfaces#

We explicitly put the second network interface of each node in a dedicated VLAN; the primary interface implicitly stays in the default network. Note that using Kavlan currently requires an OS deployment.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["server"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[private_net],
    )
    .add_machine(
        roles=["client"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[private_net],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)

# Get server's IP address on the private network
server = roles["server"][0]
ip_address_obj = server.filter_addresses(networks=networks["private"])[0]
# This may seem weird: ip_address_obj.ip is a `netaddr.IPv4Interface`
# which itself has an `ip` attribute.
server_private_ip = ip_address_obj.ip.ip

# Run ping from client to server on the private network
results = en.run_command(f"ping -c3 {server_private_ip}", roles=roles["client"])
for result in results:
    print(f"Ping from {result.host} to {server_private_ip}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_secondary.py

Hint

You have to make sure that the cluster you select has at least two physical network interfaces. Check the List of Hardware to choose a suitable cluster.

Kavlan on primary interface#

The primary network interface of the nodes is a special case, because Enoslib uses it to manage the nodes through SSH. The primary interface can still be configured in a Kavlan network, but be aware that you should not break connectivity on this interface.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["roleA"], cluster="paravance", nodes=2, primary_network=private_net
    )
    .finalize()
)

provider = en.G5k(conf)
# Get actual resources
roles, networks = provider.init()

# Show kavlan subnet
print("Kavlan subnet:", networks["private"][0].network)

# The nodes use this kavlan network for all traffic
# (the network is interconnected at layer-3 with the rest of Grid'5000)
results = en.run_command("ip route get 9.9.9.9", roles=roles["roleA"])
for result in results:
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_primary.py

Multi-sites layer-2 connectivity with global Kavlan#

Each global kavlan network is a layer-2 network that spans all Grid’5000 sites. This is very useful when you want to experiment with software routers in different locations and you need direct layer-2 connectivity between them.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# A global kavlan can be reserved on any site
kavlan_global = en.G5kNetworkConf(type="kavlan-global", roles=["global"], site="lille")

# Request nodes from Rennes and Lille
conf = (
    en.G5kConf.from_settings(
        job_type=["deploy"],
        env_name="debian11-nfs",
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_network_conf(kavlan_global)
    .add_machine(
        roles=["rennes", "client"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[kavlan_global],
    )
    .add_machine(
        roles=["lille", "server"],
        cluster="chiclet",
        nodes=1,
        secondary_networks=[kavlan_global],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)

for host in roles["client"] + roles["server"]:
    # Find out which physical interface is connected to the Kavlan network
    interfaces = host.filter_interfaces(networks=networks["global"])
    assert len(interfaces) == 1
    interface_name = interfaces[0]
    # Set MTU to 9000
    cmd = f"ip link set {interface_name} mtu 9000"
    en.run_command(cmd, task_name=cmd, roles=host, gather_facts=False)

server = roles["server"][0]
ip_address_obj = server.filter_addresses(networks=networks["global"])[0]
# This may seem weird: ip_address_obj.ip is a `netaddr.IPv4Interface`
# which itself has an `ip` attribute.
server_private_ip = ip_address_obj.ip.ip

# Run ping from client to server on the private network.
# Ensure they are in the same L2 network (TTL=1) and that MTU is 9000.
results = en.run_command(
    f"ping -t 1 -c3 -M do -s 8972 {server_private_ip}", roles=roles["client"]
)
for result in results:
    print(f"Ping from {result.host} to {server_private_ip}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_global.py

Hint

Although global kavlan networks are assigned to a site, they can be used from any other site. In addition, there is only a single global kavlan network available per site. Consequently, if you need several global kavlan networks for a single experiment, you have to pick them from different sites.

Using many Kavlan networks together#

For this much more complex example, we use the grisou cluster on which every node has 4 physical network interfaces. In addition, this example includes many advanced features:

  • how to setup a complex network topology involving several Grid’5000 sites

  • how to target specific network interfaces (here, the Intel X520 NIC of grisou nodes)

  • how to efficiently iterate on groups of hosts to setup routes

  • how to install python packages, copy a script and run it on target nodes

  • how to process results

import json
import logging
from pathlib import Path

import enoslib as en
from enoslib.infra.enos_g5k.objects import G5kEnosVlan6Network

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Topology goal:
# (Nantes nodes) --- (node1.nancy) --- (node2.nancy) --- (Rennes nodes)

# The site doesn't really matter, but let's be consistent with nodes
kavlan_global1 = en.G5kNetworkConf(
    type="kavlan-global",
    roles=["global1"],
    site="rennes",
)
kavlan_global2 = en.G5kNetworkConf(
    type="kavlan-global",
    roles=["global2"],
    site="nantes",
)
# Internal VLAN in Nancy
nancy_kavlan = en.G5kNetworkConf(type="kavlan-local", roles=["nancy"], site="nancy")
# Default network for nancy (see below)
nancy_prod = en.G5kNetworkConf(type="prod", roles=["prod"], site="nancy")

# Request nodes from Rennes, Nantes and Nancy
conf = (
    en.G5kConf.from_settings(
        job_type=["deploy"],
        env_name="debian11-nfs",
        job_name=job_name,
        walltime="00:30:00",
    )
    .add_network_conf(kavlan_global1)
    .add_network_conf(kavlan_global2)
    .add_network_conf(nancy_kavlan)
    .add_network_conf(nancy_prod)
    .add_machine(
        roles=["rennes"],
        cluster="paravance",
        nodes=2,
        secondary_networks=[kavlan_global1],
    )
    .add_machine(
        roles=["nantes"],
        cluster="ecotype",
        nodes=2,
        secondary_networks=[kavlan_global2],
    )
    # These two nodes in Nancy will act as routers: one as a gateway for
    # Rennes nodes, one as a gateway for Nantes nodes.
    .add_machine(
        roles=["nancy", "router", "gw-rennes"],
        cluster="grisou",
        nodes=1,
        # Demonstrates how to choose the correct physical network
        # interfaces.  Here, we assume we specifically want to use the
        # Intel X520 NIC on grisou:
        #
        # https://www.grid5000.fr/w/Nancy:Hardware#grisou
        #
        # To do this, we specify that "eth1" should simply use the regular
        # network, while "eth2" and "eth3" are configured with our kavlan
        # networks.
        secondary_networks=[nancy_prod, kavlan_global1, nancy_kavlan],
    )
    .add_machine(
        roles=["nancy", "router", "gw-nantes"],
        cluster="grisou",
        nodes=1,
        secondary_networks=[nancy_prod, kavlan_global2, nancy_kavlan],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)


# Helper functions
def get_ip(node, nets):
    """Returns the IPv4 address of the given node on the given network"""
    addresses = node.filter_addresses(networks=nets)
    if len(addresses) == 0:
        raise ValueError(f"Cannot determine IP address of node in nets: {node.address}")
    ip_address_obj = addresses[0]
    return ip_address_obj.ip.ip


def display_results(results):
    for result in results:
        print(f"# {result.host}")
        print(f"{result.stdout}")


gw_rennes = roles["gw-rennes"][0]
gw_nantes = roles["gw-nantes"][0]
# For each group, define which routes need to be added, and which nexthop
# will be used for these routes.
routes = {
    "rennes": networks["nancy"] + networks["global2"],
    "nantes": networks["nancy"] + networks["global1"],
    "gw-rennes": networks["global2"],
    "gw-nantes": networks["global1"],
}
nexthops = {
    "rennes": get_ip(gw_rennes, networks["global1"]),
    "nantes": get_ip(gw_nantes, networks["global2"]),
    "gw-rennes": get_ip(gw_nantes, networks["nancy"]),
    "gw-nantes": get_ip(gw_rennes, networks["nancy"]),
}

# Setup actual routes
for group in routes.keys():
    with en.actions(roles=roles[group]) as p:
        nexthop = nexthops[group]
        for net in routes[group]:
            # No automatic IPv6 for now
            if isinstance(net, G5kEnosVlan6Network):
                continue
            subnet = net.network
            # Use "replace" instead of "add" to ensure idempotency
            cmd = f"ip route replace {subnet} via {nexthop}"
            p.command(cmd, task_name=f"route {subnet} via {nexthop}")

# Enable IP forwarding on routers
en.run_command("sysctl net.ipv4.ip_forward=1", roles=roles["router"])

# Test connectivity from Rennes to Nancy
target = get_ip(gw_nantes, networks["nancy"])
cmd = f"ping -c 3 {target}"
results = en.run_command(cmd, task_name=cmd, roles=roles["rennes"])
display_results(results)

# Test connectivity from Nantes to Nancy
target = get_ip(gw_rennes, networks["nancy"])
cmd = f"ping -c 3 {target}"
results = en.run_command(cmd, task_name=cmd, roles=roles["nantes"])
display_results(results)

# Test connectivity from Nantes to Rennes, check latency and TTL.
# We install pythonping and use it in a small python script to avoid
# parsing ping output.
pingscript = """
import json
import sys
import pythonping
res = pythonping.ping(sys.argv[1], interval=1, count=3)
answer = list(res)[0]
# pythonping does not expose the TTL, but we can access the raw IP header
ttl = answer.message.packet.raw[8]
display = dict(ttl=ttl, rtt_min_ms=res.rtt_min_ms)
print(json.dumps(display))
"""
target_nodes = roles["rennes"]
targets = [get_ip(node, networks["global1"]) for node in target_nodes]
with en.actions(roles=roles["nantes"]) as p:
    p.apt(name="python3-pip")
    p.pip(name="pythonping>=1.1.4,<1.2")
    p.copy(dest="/tmp/ping.py", content=pingscript)
    for target in targets:
        p.command(f"python3 /tmp/ping.py {target}", task_name=f"ping {target}")

results = p.results

# Print all pairs of pings and check validity
for target_node, target in zip(target_nodes, targets):
    for res in results.filter(task=f"ping {target}"):
        print(f"# {res.host} -> {target_node.address} via Nancy")
        data = json.loads(res.stdout)
        print(f"TTL = {data['ttl']}")
        print(f"Min RTT = {data['rtt_min_ms']} ms")
        print()
        assert data["ttl"] == 62
        assert data["rtt_min_ms"] >= 20


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_global.py

Reconfigurable Firewall: Open ports to the external world#

The reconfigurable firewall on Grid’5000 allows you to open specific ports on some of your nodes. One use case is to allow connections from the FIT platform to Grid’5000. To learn more about this, you can visit the dedicated documentation page.

import logging
import time
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
    .add_machine(
        roles=["compute"],
        cluster="paravance",
        nodes=1,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Open port 80 for hosts in the control group
# Add a firewall rule (just during the time of the context)
# Alternatively you can use provider.fw_create/fw_delete
with provider.firewall(hosts=roles["control"], port=80):
    en.run("dhclient -6 br0", roles=roles["control"])
    en.run("apt update && apt install -y nginx", roles=roles["control"])
    result = en.run("ip -6 addr show dev br0", roles=roles["control"])

    print("-" * 80)
    print(f"Nginx available on IPv6: {result[0].stdout}")
    time.sleep(3600)


# Clean the firewall rules (not mandatory since this will be removed when
# the job finishes)
# provider.fw_delete()
# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_reconfigurable_firewall.py

Setting up Docker#

A Docker registry cache is available on Grid’5000; it can speed up your Docker-based deployments and helps you avoid Docker Hub pull rate limits. Also, the /var partition is rather small, so you may want to bind Docker’s state directory /var/lib/docker to /tmp/docker to gain more space.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

CLUSTER = "paravance"

conf = en.G5kConf.from_settings(
    job_type=[], job_name=job_name, walltime="0:30:00"
).add_machine(roles=["control"], cluster=CLUSTER, nodes=2)

provider = en.G5k(conf)
roles, networks = provider.init()


registry_opts = dict(type="external", ip="docker-cache.grid5000.fr", port=80)

d = en.Docker(
    agent=roles["control"],
    docker_version="25.0",
    bind_var_docker="/tmp/docker",
    registry_opts=registry_opts,
    # Optional credentials for docker hub
    # credentials=dict(login="mylogin", password="mytoken"),
)
d.deploy()


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_docker.py

Resources inspection#

The G5k provider object exposes the actual hosts and networks, which allows inspecting the acquired resources.

# Get all the reserved (and deployed) hosts
provider.hosts

# Get all the networks
provider.networks

# Example: inspect one host
>>> provider.hosts[0]
<G5kHost(roles=['control'], fqdn=grisou-8.nancy.grid5000.fr, ssh_address=grisou-8-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>

# Another example: which hosts are in the same network as me?
>>> provider.hosts[0].primary_network.hosts
[<G5kHost(roles=['control'], fqdn=grisou-8.nancy.grid5000.fr, ssh_address=grisou-8-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>,
 <G5kHost(roles=['control', 'compute'], fqdn=grisou-9.nancy.grid5000.fr, ssh_address=grisou-9-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>]

Accessing internal services from the outside#

Sometimes, your experiment involves services that you deploy on Grid’5000 nodes, and you would like to access these services from outside Grid’5000 (e.g. from your laptop or from a server independent from Grid’5000).

There are several solutions depending on your requirements:

  • Native IPv6 connectivity: the reconfigurable firewall allows IPv6 connectivity to your Grid’5000 nodes from the Internet. This is the recommended method if your experiment is sensitive to network performance, because it uses native IP connectivity. See Reconfigurable Firewall: Open ports to the external world.

  • Grid’5000 VPN: this allows IPv4 connectivity to the Grid’5000 network. However, the VPN is a shared service with no performance guarantee. This method is useful to quickly check the state of a web service from your laptop, but you should not connect external machines to the VPN to perform actual network-intensive experiments (e.g. network benchmarks, stress tests, or latency measurements).

  • SOCKS proxy tunnel for HTTP traffic:

    # on one shell
    ssh -ND 2100 access.grid5000.fr
    
    # on another shell
    export https_proxy="socks5h://localhost:2100"
    export http_proxy="socks5h://localhost:2100"
    
    # Note that browsers can also use a SOCKS proxy
    chromium-browser --proxy-server="socks5://127.0.0.1:2100" &
    
  • Grid’5000 HTTP reverse proxy. This method has several limitations: it only works for HTTP services listening on ports 80, 443, 8080 or 8443; it requires authenticating with your Grid’5000 credentials.

  • Manual SSH port forwarding:

    # on one shell
    ssh -NL 3000:paravance-42.rennes.grid5000.fr:3000 access.grid5000.fr
    
    # Now all traffic that goes on localhost:3000 is forwarded to paravance-42.rennes.grid5000.fr:3000
    
  • Programmatic SSH port forwarding: the same method, but programmatically with G5kTunnel. See also Create a tunnel to a service.
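The SOCKS proxy approach listed above also works for HTTP clients in Python code. A minimal sketch, assuming the `ssh -ND 2100 access.grid5000.fr` tunnel from the shell example is running and that `requests` is installed with SOCKS support (`pip install "requests[socks]"`) — both are assumptions, and the node name below is hypothetical:

```python
# Proxy settings matching the `ssh -ND 2100 access.grid5000.fr` tunnel
# shown above. The "socks5h" scheme resolves hostnames through the proxy,
# which is required for Grid'5000-internal names.
proxies = {
    "http": "socks5h://localhost:2100",
    "https": "socks5h://localhost:2100",
}

# With requests installed with SOCKS support, Grid'5000-internal services
# become reachable (hypothetical node name, for illustration only):
#
#   import requests
#   r = requests.get("http://parasilo-1.rennes.grid5000.fr", proxies=proxies)

print(proxies["http"])
```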

Using a custom operating system environment#

First, the description file of your environment should use resolvable URIs for the kadeploy3 server. An example of such a description is the following:

# myimage.desc and myimage.tgz are both located in
# the public subdirectory of the rennes site of the user {{ YOURLOGIN }}
---
name: ubuntu1804-x64-min
version: 2019052116
description: ubuntu 18.04 (bionic) - min
author: support-staff@list.grid5000.fr
visibility: public
destructive: false
os: linux
image:
  file: https://api.grid5000.fr/sid/sites/rennes/public/{{ YOURLOGIN }}/myimage.tgz
  kind: tar
  compression: gzip
postinstalls:
- archive: server:///grid5000/postinstalls/g5k-postinstall.tgz
  compression: gzip
  script: g5k-postinstall --net netplan
boot:
  kernel: "/vmlinuz"
  initrd: "/initrd.img"
filesystem: ext4
partition_type: 131
multipart: false

Then in the configuration of the Grid’5000 provider you need to specify the URL where your description file can be found:

conf = en.G5kConf.from_settings(
  job_name="test_myimage",
  job_type=["deploy"],
  env_name="https://api.grid5000.fr/sid/sites/rennes/public/{{ YOURLOGIN }}/myimage.desc",
)

Subnet reservation#

This shows how to make a subnet reservation, which is useful if you want to manually run containers or virtual machines on your Grid’5000 nodes.

Build the configuration from a dictionary#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

provider_conf = {
    "job_name": job_name,
    "walltime": "0:10:00",
    "resources": {
        "machines": [
            {
                "roles": ["control"],
                "cluster": "paravance",
                "nodes": 1,
            }
        ],
        "networks": [
            {
                "id": "not_linked_to_any_machine",
                "type": "slash_22",
                "roles": ["my_subnet"],
                "site": "rennes",
            },
        ],
    },
}

# claim the resources
conf = en.G5kConf.from_dictionary(provider_conf)
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Retrieving subnet
subnet = networks["my_subnet"][0]
logging.info(subnet.__dict__)
# This returns the subnet information:
# subnet.network -> IPv4Network('10.158.0.0/22')
# subnet.gateway -> IPv4Address('10.159.255.254')

# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_subnet.py
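The subnet object exposes standard `ipaddress` values, so the addresses usable for your containers or virtual machines can be enumerated with the standard library alone. A minimal sketch, using the network value shown in the comments of the example above (the actual subnet depends on what your reservation returns, so this value is an assumption for illustration):

```python
from ipaddress import IPv4Network

# Network value taken from the example output above; the actual subnet
# depends on the reservation (assumption for illustration).
network = IPv4Network("10.158.0.0/22")

# Enumerate the addresses usable for containers or VMs
candidates = list(network.hosts())
print(len(candidates))   # 1022 usable host addresses in a /22
print(candidates[0])     # 10.158.0.1
```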

Build the configuration programmatically#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, job_type=[], walltime="0:10:00")
    .add_network(
        id="not_linked_to_any_machine",
        type="slash_16",
        roles=["my_subnet"],
        site="rennes",
    )
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources

roles, networks = provider.init()


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_subnet_p.py

Create a tunnel to a service#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = en.G5kConf.from_settings(
    job_type=[], job_name=job_name, walltime="0:20:00"
).add_machine(roles=["control"], cluster="parasilo", nodes=1)

provider = en.G5k(conf)
roles, networks = provider.init()

with en.play_on(roles=roles) as p:
    p.apt(name="nginx", state="present")
    p.wait_for(host="localhost", port=80, state="started")

with en.G5kTunnel(roles["control"][0].address, 80) as (local_address, local_port, _):
    import requests

    response = requests.get(f"http://{local_address}:{local_port}")
    print(response.text)


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_tunnel.py

Disabling the cache#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Disabling the cache
en.set_config(g5k_cache=False)

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
    .add_machine(
        roles=["control", "network"],
        cluster="paravance",
        nodes=1,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_disable_cache.py