Grid’5000 tutorials#

This tutorial illustrates the use of EnOSlib to interact with Grid’5000. For a full description of the API and the available options, please refer to the API documentation of the Grid’5000 provider.

Hint

For a complete schema reference, see G5k Schema.

Installation#

To use Grid’5000 with EnOSlib, install it in a virtual environment:

$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install -U pip

$ pip install enoslib

Configuration#

Since python-grid5000 is used behind the scenes, the configuration is read from a configuration file located in your home directory. It can be created with the following:

echo '
username: MYLOGIN
password: MYPASSWORD
' > ~/.python-grid5000.yaml

chmod 600 ~/.python-grid5000.yaml

The above configuration works both from a Grid’5000 frontend machine and from your local machine.
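As a quick local sanity check, the sketch below verifies that the credentials file is valid YAML and contains the expected keys. This is not part of EnOSlib or python-grid5000; it is an illustrative helper using PyYAML, and it does not contact the Grid’5000 API (so it cannot verify that the password is correct).

```python
"""Local sanity check for ~/.python-grid5000.yaml (illustrative only)."""
from pathlib import Path

import yaml  # PyYAML, already a dependency of python-grid5000


def missing_keys(text: str) -> set:
    """Return the credential keys absent from a YAML configuration string."""
    conf = yaml.safe_load(text) or {}
    return {"username", "password"} - set(conf)


if __name__ == "__main__":
    conf_path = Path.home() / ".python-grid5000.yaml"
    if not conf_path.exists():
        raise SystemExit(f"{conf_path} not found")
    missing = missing_keys(conf_path.read_text())
    if missing:
        raise SystemExit(f"{conf_path} is missing keys: {missing}")
    print("Credentials file looks well-formed")
```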

External access (from your laptop)#

If you are running your experiment from outside Grid’5000 (e.g. from your local machine), an SSH jump host is required.

EnOSlib (version 8.1.0 and above) automatically sets up such an SSH jump host connection through access.grid5000.fr when it detects that you are working outside the Grid’5000 network. See Global configuration if you need to adjust this behaviour.

Hint

Using an SSH jump host does not provide the best performance when controlling a large number of nodes, because the number of simultaneous SSH connections is limited on the jump host. See Performance tuning for many tips and tricks to improve performance.
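Outside of EnOSlib (e.g. for ad-hoc ssh or scp sessions), a similar jump-host setup can be declared in your local ~/.ssh/config. The stanza below is one possible configuration, with MYLOGIN standing for your Grid’5000 username:

```
# ~/.ssh/config on your workstation -- MYLOGIN is your Grid'5000 username
Host access.grid5000.fr
  User MYLOGIN
Host *.grid5000.fr !access.grid5000.fr
  User MYLOGIN
  ProxyJump access.grid5000.fr
```

With this in place, `ssh rennes.grid5000.fr` reaches the Rennes frontend transparently through the access machine.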

First reservation example#

The following shows how to deal with a basic reservation. It uses nodes running the standard Grid’5000 software environment (Debian stable with performance tuning and many pre-installed tools) and connects to them over SSH as root. For this purpose, you must have a ~/.ssh/id_rsa.pub file available. The standard Grid’5000 environment is good for prototyping, but not for scientific experiments that care about reproducibility: we’ll see later how to deploy a specific operating system.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["groupA"], cluster="paravance", nodes=1)
    .add_machine(roles=["groupB"], cluster="parasilo", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_basic.py

To run this experiment, you just have to launch the script:

$ python tuto_grid5000_basic.py

The script will output the different steps needed to reserve and provision the physical nodes. However, we don’t actually do anything with the nodes yet, so the script will finish rather quickly.

Using roles to run commands#

After Grid’5000 machines are provisioned, they are assigned to their roles, which can be used to run commands in parallel:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["compute", "control"], cluster="paravance", nodes=1)
    .add_machine(roles=["compute"], cluster="paravance", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Run a command on all hosts belonging to a given role
results = en.run_command("nproc", roles=roles["compute"])
for result in results:
    print(f"{result.host} has {result.payload['stdout']} logical CPU cores")

# Run a command on all hosts, whatever their roles
results = en.run_command("uname -a", roles=roles)
for result in results:
    print(result.payload["stdout"])


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_commands.py

See Ansible Integration for more details about running commands and configuring your experimental machines.

Deploying operating systems#

Grid’5000 provides several operating systems that can be β€œdeployed” (i.e. installed automatically) on all of your nodes. To specify the operating system, use env_name as well as the deploy job type:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="ubuntu2204-min",
        walltime="0:20:00",
    )
    .add_machine(roles=["groupA"], cluster="paravance", nodes=1)
    .add_machine(roles=["groupB"], cluster="parasilo", nodes=1)
)

# This will validate the configuration, but not reserve resources yet
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

results = en.run_command("lsb_release -a", roles=roles)
for result in results:
    print(result.payload["stdout"])


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_deploy.py

Finished 1 tasks (lsb_release -a)
──────────────────────────────────────
Distributor ID:     Ubuntu
Description:        Ubuntu 22.04.1 LTS
Release:    22.04
Codename:   jammy

Deployment takes a few minutes, with some variation depending on cluster hardware.

The full list of available operating systems is in the Grid’5000 documentation.

To obtain a minimal environment that reflects the default settings of the operating system, use a -min environment. You will likely have to install additional packages and tools for your experiments.

If you need to share data on a network filesystem (available under /home/YOURLOGIN/), use a -nfs or -big environment.

Using reservable disks on nodes#

Grid’5000 has a disk reservation feature: on several clusters, reserving secondary disks is mandatory if you want to use them in your experiments.

The following example shows how to reserve the disks with EnOSlib, and then how they can be used as raw devices. Here, the goal is to build a software RAID array with mdadm and then benchmark it using fio:

import json
import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="grimoire",
    nodes=2,
    reservable_disks=True,
)

provider = en.G5k(conf)

# Get actual resources
roles, _ = provider.init()

with en.actions(roles=roles) as p:
    # Check that the expected disks are present.
    # https://www.grid5000.fr/w/Nancy:Hardware#grimoire
    # Notice that we use the "diskN" aliases because they are more
    # stable than "sdX".
    disks = ["disk1", "disk2", "disk3", "disk4"]
    for disk in disks:
        p.command(f"test -e /dev/{disk}", task_name=f"Check availability of {disk}")

    # Partition disks
    for disk in disks:
        p.shell(
            f"echo -e 'label: gpt\n,,raid' | sfdisk --no-reread /dev/{disk}",
            task_name=f"Create partition on {disk}",
        )

    # Create a software RAID-5 array
    nb_disks = len(disks)
    raid_parts = " ".join(f"/dev/{disk}p1" for disk in disks)
    p.shell(
        f"grep -q md0 /proc/mdstat || "
        f"mdadm --create /dev/md0 --run --level 5 "
        f"--raid-devices {nb_disks} {raid_parts}",
        task_name="Create RAID array",
    )

    # Run FIO to benchmark the array (at the block device level)
    p.apt(name="fio", state="present", task_name="Install fio")
    p.command(
        "fio --output-format=json --name=enoslib --ioengine=libaio "
        "--direct=1 --gtod_reduce=1 --readwrite=randread "
        "--bs=4K --iodepth=8 --numjobs=8 --runtime 30s "
        "--filename=/dev/md0",
        task_name="Run fio",
    )

    # Destroy everything
    p.command("mdadm --stop /dev/md0", task_name="Stop RAID array")
    p.command(f"wipefs -a {raid_parts}", task_name="Wipe RAID signatures")

results = p.results

# Get output of FIO and print result
res_per_node = {res.host: res.stdout for res in results.filter(task="Run fio")}
for host, output in res_per_node.items():
    data = json.loads(output)
    # Sum performance of all parallel FIO "jobs"
    read_perf_iops = sum(job["read"]["iops"] for job in data["jobs"])
    print(
        f"{data['fio version']} running on {host}: "
        f"average /dev/md0 read performance = {read_perf_iops:.2f} IOPS"
    )


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_reservable_disks.py

Finished 1 tasks (Granting root access on the nodes (sudo-g5k))
─────────────────────────────────────────────────────────────────────────────────────────────────────────
Finished 13 tasks (Check availability of disk1,Check availability of disk2,Check availability of
disk3,Check availability of disk4,Create partition on disk1,Create partition on disk2,Create partition on
disk3,Create partition on disk4,Create RAID array,Install fio,Run fio,Stop RAID array,Wipe RAID
signatures)
─────────────────────────────────────────────────────────────────────────────────────────────────────────
fio-3.25 running on grimoire-8.nancy.grid5000.fr: average /dev/md0 read performance = 550.67 IOPS
fio-3.25 running on grimoire-6.nancy.grid5000.fr: average /dev/md0 read performance = 519.71 IOPS

Specific nodes reservation#

On Grid’5000, machines belonging to a given cluster are normally homogeneous, but absolute guarantees are impossible: for instance, physical disks may have different performance characteristics across nodes of a cluster even though they share the same vendor and model. For this reason, experimenters may need to reproduce an experiment several times on the exact same hardware.

This is possible by specifying nodes by their exact names. By default, all the servers specified this way are reserved, unless you request a target number of nodes with the nodes parameter.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(
        roles=["compute"],
        servers=["paravance-19.rennes.grid5000.fr", "paravance-20.rennes.grid5000.fr"],
    )
    .add_machine(
        roles=["compute"],
        servers=[f"parasilo-{i}.rennes.grid5000.fr" for i in range(10, 20)],
        nodes=3,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_specific_servers.py

This is an advanced feature: if the required nodes are not available, the experiment will either wait for the resources to become available (e.g. if another user is currently using them) or fail (e.g. if a machine is down due to maintenance or a hardware issue).

Multi-sites experiments#

To run an experiment involving multiple Grid’5000 sites, you simply have to request clusters from each site in the same configuration. For instance, to request nodes from Lille and Rennes (with convenient roles) and check connectivity:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# fmt: off
conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    # For convenience, we use the site name as role
    .add_machine(roles=["rennes", "intel"], cluster="paravance", nodes=1)
    .add_machine(roles=["lille", "amd"], cluster="chiclet", nodes=1)
)
# fmt: on

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Check connectivity from Rennes to Lille
target = roles["lille"][0]
results = en.run_command(f"ping -c3 {target.address}", roles=roles["rennes"])
for result in results:
    print(f"Ping from {result.host} to {target.address}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_multisites.py

Network-wise, traffic between sites is routed (layer 3) over the Grid’5000 network backbone. If you need nodes from different sites to share the same layer-2 network, you need a global kavlan, see Dedicated networks (kavlan).

Note

There is no global scheduler on Grid’5000. Multi-sites reservation involves finding a common slot to start the jobs on each requested site at the same time. EnOSlib will do that for you. The logic behind it is part of a more generic logic that can synchronize resources between distinct providers.

Dedicated networks (kavlan)#

Kavlan allows you to create dedicated networks that are isolated at layer 2, and to reconfigure the physical network interfaces of nodes to place them in these dedicated networks.

Kavlan on secondary interfaces#

We explicitly put the second network interface of each node in a dedicated VLAN; the primary interface implicitly stays in the default network. Note that using Kavlan currently requires an OS deployment.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["server"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[private_net],
    )
    .add_machine(
        roles=["client"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[private_net],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)

# Get server's IP address on the private network
server = roles["server"][0]
ip_address_obj = server.filter_addresses(networks=networks["private"])[0]
# This may seem weird: ip_address_obj.ip is a `netaddr.IPv4Interface`
# which itself has an `ip` attribute.
server_private_ip = ip_address_obj.ip.ip

# Run ping from client to server on the private network
results = en.run_command(f"ping -c3 {server_private_ip}", roles=roles["client"])
for result in results:
    print(f"Ping from {result.host} to {server_private_ip}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_secondary.py

Hint

You have to make sure that the cluster you select has at least two physical network interfaces. Check the List of Hardware to choose a suitable cluster.

Kavlan on primary interface#

The primary network interface of the nodes is a special case, because Enoslib uses it to manage the nodes through SSH. The primary interface can still be configured in a Kavlan network, but be aware that you should not break connectivity on this interface.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")

conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["roleA"], cluster="paravance", nodes=2, primary_network=private_net
    )
    .finalize()
)

provider = en.G5k(conf)
# Get actual resources
roles, networks = provider.init()

# Show kavlan subnet
print("Kavlan subnet:", networks["private"][0].network)

# The nodes use this kavlan network for all traffic
# (the network is interconnected at layer-3 with the rest of Grid'5000)
results = en.run_command("ip route get 9.9.9.9", roles=roles["roleA"])
for result in results:
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_primary.py

Multi-sites layer-2 connectivity with global Kavlan#

Each global kavlan network is a layer-2 network that spans all Grid’5000 sites. This is very useful when you want to experiment with software routers in different locations and you need direct layer-2 connectivity between them.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# A global kavlan can be reserved on any site
kavlan_global = en.G5kNetworkConf(type="kavlan-global", roles=["global"], site="lille")

# Request nodes from Rennes and Lille
conf = (
    en.G5kConf.from_settings(
        job_type=["deploy"],
        env_name="debian11-nfs",
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_network_conf(kavlan_global)
    .add_machine(
        roles=["rennes", "client"],
        cluster="paravance",
        nodes=1,
        secondary_networks=[kavlan_global],
    )
    .add_machine(
        roles=["lille", "server"],
        cluster="chiclet",
        nodes=1,
        secondary_networks=[kavlan_global],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)

for host in roles["client"] + roles["server"]:
    # Find out which physical interface is connected to the Kavlan network
    interfaces = host.filter_interfaces(networks=networks["global"])
    assert len(interfaces) == 1
    interface_name = interfaces[0]
    # Set MTU to 9000
    cmd = f"ip link set {interface_name} mtu 9000"
    en.run_command(cmd, task_name=cmd, roles=host, gather_facts=False)

server = roles["server"][0]
ip_address_obj = server.filter_addresses(networks=networks["global"])[0]
# This may seem weird: ip_address_obj.ip is a `netaddr.IPv4Interface`
# which itself has an `ip` attribute.
server_private_ip = ip_address_obj.ip.ip

# Run ping from client to server on the private network.
# Ensure they are in the same L2 network (TTL=1) and that MTU is 9000.
results = en.run_command(
    f"ping -t 1 -c3 -M do -s 8972 {server_private_ip}", roles=roles["client"]
)
for result in results:
    print(f"Ping from {result.host} to {server_private_ip}:")
    print(f"{result.stdout}")


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_global.py

Hint

Although global kavlan networks are assigned to a site, they can be used from any other site. In addition, there is only a single global kavlan network available per site. Consequently, if you need several global kavlan networks for a single experiment, you have to pick them from different sites.

Using many Kavlan networks together#

For this much more complex example, we use the grisou cluster on which every node has 4 physical network interfaces. In addition, this example includes many advanced features:

  • how to setup a complex network topology involving several Grid’5000 sites

  • how to target specific network interfaces (here, the Intel X520 NIC of grisou nodes)

  • how to efficiently iterate on groups of hosts to setup routes

  • how to install python packages, copy a script and run it on target nodes

  • how to process results

import json
import logging
from pathlib import Path

import enoslib as en
from enoslib.infra.enos_g5k.objects import G5kEnosVlan6Network

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Topology goal:
# (Nantes nodes) --- (node1.nancy) --- (node2.nancy) --- (Rennes nodes)

# The site doesn't really matter, but let's be consistent with nodes
kavlan_global1 = en.G5kNetworkConf(
    type="kavlan-global",
    roles=["global1"],
    site="rennes",
)
kavlan_global2 = en.G5kNetworkConf(
    type="kavlan-global",
    roles=["global2"],
    site="nantes",
)
# Internal VLAN in Nancy
nancy_kavlan = en.G5kNetworkConf(type="kavlan-local", roles=["nancy"], site="nancy")
# Default network for nancy (see below)
nancy_prod = en.G5kNetworkConf(type="prod", roles=["prod"], site="nancy")

# Request nodes from Rennes, Nantes and Nancy
conf = (
    en.G5kConf.from_settings(
        job_type=["deploy"],
        env_name="debian11-nfs",
        job_name=job_name,
        walltime="00:30:00",
    )
    .add_network_conf(kavlan_global1)
    .add_network_conf(kavlan_global2)
    .add_network_conf(nancy_kavlan)
    .add_network_conf(nancy_prod)
    .add_machine(
        roles=["rennes"],
        cluster="paravance",
        nodes=2,
        secondary_networks=[kavlan_global1],
    )
    .add_machine(
        roles=["nantes"],
        cluster="ecotype",
        nodes=2,
        secondary_networks=[kavlan_global2],
    )
    # These two nodes in Nancy will act as routers: one as a gateway for
    # Rennes nodes, one as a gateway for Nantes nodes.
    .add_machine(
        roles=["nancy", "router", "gw-rennes"],
        cluster="grisou",
        nodes=1,
        # Demonstrates how to choose the correct physical network
        # interfaces.  Here, we assume we specifically want to use the
        # Intel X520 NIC on grisou:
        #
        # https://www.grid5000.fr/w/Nancy:Hardware#grisou
        #
        # To do this, we specify that "eth1" should simply use the regular
        # network, while "eth2" and "eth3" are configured with our kavlan
        # networks.
        secondary_networks=[nancy_prod, kavlan_global1, nancy_kavlan],
    )
    .add_machine(
        roles=["nancy", "router", "gw-nantes"],
        cluster="grisou",
        nodes=1,
        secondary_networks=[nancy_prod, kavlan_global2, nancy_kavlan],
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Fill in network information from nodes
roles = en.sync_info(roles, networks)


# Helper functions
def get_ip(node, nets):
    """Returns the IPv4 address of the given node on the given network"""
    addresses = node.filter_addresses(networks=nets)
    if len(addresses) == 0:
        raise ValueError(f"Cannot determine IP address of node in nets: {node.address}")
    ip_address_obj = addresses[0]
    return ip_address_obj.ip.ip


def display_results(results):
    for result in results:
        print(f"# {result.host}")
        print(f"{result.stdout}")


gw_rennes = roles["gw-rennes"][0]
gw_nantes = roles["gw-nantes"][0]
# For each group, define which routes need to be added, and which nexthop
# will be used for these routes.
routes = {
    "rennes": networks["nancy"] + networks["global2"],
    "nantes": networks["nancy"] + networks["global1"],
    "gw-rennes": networks["global2"],
    "gw-nantes": networks["global1"],
}
nexthops = {
    "rennes": get_ip(gw_rennes, networks["global1"]),
    "nantes": get_ip(gw_nantes, networks["global2"]),
    "gw-rennes": get_ip(gw_nantes, networks["nancy"]),
    "gw-nantes": get_ip(gw_rennes, networks["nancy"]),
}

# Setup actual routes
for group in routes.keys():
    with en.actions(roles=roles[group]) as p:
        nexthop = nexthops[group]
        for net in routes[group]:
            # No automatic IPv6 for now
            if isinstance(net, G5kEnosVlan6Network):
                continue
            subnet = net.network
            # Use "replace" instead of "add" to ensure idempotency
            cmd = f"ip route replace {subnet} via {nexthop}"
            p.command(cmd, task_name=f"route {subnet} via {nexthop}")

# Enable IP forwarding on routers
en.run_command("sysctl net.ipv4.ip_forward=1", roles=roles["router"])

# Test connectivity from Rennes to Nancy
target = get_ip(gw_nantes, networks["nancy"])
cmd = f"ping -c 3 {target}"
results = en.run_command(cmd, task_name=cmd, roles=roles["rennes"])
display_results(results)

# Test connectivity from Nantes to Nancy
target = get_ip(gw_rennes, networks["nancy"])
cmd = f"ping -c 3 {target}"
results = en.run_command(cmd, task_name=cmd, roles=roles["nantes"])
display_results(results)

# Test connectivity from Nantes to Rennes, check latency and TTL.
# We install pythonping and use it in a small python script to avoid
# parsing ping output.
pingscript = """
import json
import sys
import pythonping
res = pythonping.ping(sys.argv[1], interval=1, count=3)
answer = list(res)[0]
# pythonping does not expose the TTL, but we can access the raw IP header
ttl = answer.message.packet.raw[8]
display = dict(ttl=ttl, rtt_min_ms=res.rtt_min_ms)
print(json.dumps(display))
"""
target_nodes = roles["rennes"]
targets = [get_ip(node, networks["global1"]) for node in target_nodes]
with en.actions(roles=roles["nantes"]) as p:
    p.apt(name="python3-pip")
    p.pip(name="pythonping>=1.1.4,<1.2")
    p.copy(dest="/tmp/ping.py", content=pingscript)
    for target in targets:
        p.command(f"python3 /tmp/ping.py {target}", task_name=f"ping {target}")

results = p.results

# Print all pairs of pings and check validity
for target_node, target in zip(target_nodes, targets):
    for res in results.filter(task=f"ping {target}"):
        print(f"# {res.host} -> {target_node.address} via Nancy")
        data = json.loads(res.stdout)
        print(f"TTL = {data['ttl']}")
        print(f"Min RTT = {data['rtt_min_ms']} ms")
        print()
        assert data["ttl"] == 62
        assert data["rtt_min_ms"] >= 20


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_kavlan_global.py

Reconfigurable Firewall: Open ports to the external world#

The reconfigurable firewall on Grid’5000 allows you to open specific ports on some of your nodes. One use case is to allow connections from the FIT platform to Grid’5000. To learn more about this, you can visit the dedicated documentation page.

import logging
import time
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
    .add_machine(
        roles=["compute"],
        cluster="paravance",
        nodes=1,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Open port 80 for hosts in the control group
# Add a firewall rule (just during the time of the context)
# Alternatively you can use provider.fw_create/fw_delete
with provider.firewall(hosts=roles["control"], port=80):
    en.run("dhclient -6 br0", roles=roles["control"])
    en.run("apt update && apt install -y nginx", roles=roles["control"])
    result = en.run("ip -6 addr show dev br0", roles=roles["control"])

    print("-" * 80)
    print(f"Nginx available on IPv6: {result[0].stdout}")
    time.sleep(3600)


# Clean the firewall rules (not mandatory since this will be removed when
# the job finishes)
# provider.fw_delete()
# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_reconfigurable_firewall.py

Setting up Docker#

A Docker registry cache is available on Grid’5000; it can speed up your Docker-based deployments and helps you avoid Docker Hub pull rate limits. Also, the /var partition is rather small, so you may want to bind Docker’s state directory /var/lib/docker to /tmp/docker to gain more space.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

CLUSTER = "paravance"

conf = en.G5kConf.from_settings(
    job_type=[], job_name=job_name, walltime="0:30:00"
).add_machine(roles=["control"], cluster=CLUSTER, nodes=2)

provider = en.G5k(conf)
roles, networks = provider.init()


registry_opts = dict(type="external", ip="docker-cache.grid5000.fr", port=80)

d = en.Docker(
    agent=roles["control"],
    docker_version="25.0",
    bind_var_docker="/tmp/docker",
    registry_opts=registry_opts,
    # Optional credentials for docker hub
    # credentials=dict(login="mylogin", password="mytoken"),
)
d.deploy()


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_docker.py

Resources inspection#

The G5k provider object exposes the actual hosts and networks, which allows inspecting the acquired resources.

# Get all the reserved (and deployed) hosts
provider.hosts

# Get all the networks
provider.networks

# Example: inspect one host
>>> provider.hosts[0]
<G5kHost(roles=['control'], fqdn=grisou-8.nancy.grid5000.fr, ssh_address=grisou-8-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>

# Another example: which hosts are in the same network as me?
>>> provider.hosts[0].primary_network.hosts
[<G5kHost(roles=['control'], fqdn=grisou-8.nancy.grid5000.fr, ssh_address=grisou-8-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>,
 <G5kHost(roles=['control', 'compute'], fqdn=grisou-9.nancy.grid5000.fr, ssh_address=grisou-9-kavlan-4.nancy.grid5000.fr, primary_network=<G5kVlanNetwork(roles=['my_network'], site=nancy, vlan_id=4)>, secondary_networks=[<G5kVlanNetwork(roles=['my_second_network'], site=nancy, vlan_id=5)>])>]

Accessing internal services from the outside#

Sometimes, your experiment involves services that you deploy on Grid’5000 nodes, and you would like to access these services from outside Grid’5000 (e.g. from your laptop or from a server independent from Grid’5000).

There are several solutions depending on your requirements:

  • Native IPv6 connectivity: the reconfigurable firewall allows IPv6 connectivity to your Grid’5000 nodes from the Internet. This is the recommended method if your experiment is sensitive to network performance, because it uses native IP connectivity. See Reconfigurable Firewall: Open ports to the external world.

  • Grid’5000 VPN: this allows IPv4 connectivity to the Grid’5000 network. However, the VPN is a shared service with no performance guarantee. This method is useful to quickly check the state of a web service from your laptop, but you should not connect external machines to the VPN to perform actual network-intensive experiments (e.g. network benchmarks, stress tests, or latency measurements).

  • SOCKS proxy tunnel for HTTP traffic:

    # on one shell
    ssh -ND 2100 access.grid5000.fr
    
    # on another shell
    export https_proxy="socks5h://localhost:2100"
    export http_proxy="socks5h://localhost:2100"
    
    # Note that browsers can also use a SOCKS proxy
    chromium-browser --proxy-server="socks5://127.0.0.1:2100" &
    
  • Grid’5000 HTTP reverse proxy. This method has several limitations: it only works for HTTP services listening on ports 80, 443, 8080 or 8443; it requires authenticating with your Grid’5000 credentials.

  • Manual SSH port forwarding:

    # on one shell
    ssh -NL 3000:paravance-42.rennes.grid5000.fr:3000 access.grid5000.fr
    
    # Now all traffic that goes on localhost:3000 is forwarded to paravance-42.rennes.grid5000.fr:3000
    
  • Programmatic SSH port forwarding: the same method, but programmatically with G5kTunnel. See also Create a tunnel to a service.
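The SOCKS proxy approach listed above also works for HTTP clients in Python code. A minimal sketch, assuming the `ssh -ND 2100 access.grid5000.fr` tunnel from the shell example is running and that `requests` is installed with SOCKS support (`pip install "requests[socks]"`) — both are assumptions, and the node name below is hypothetical:

```python
# Proxy settings matching the `ssh -ND 2100 access.grid5000.fr` tunnel
# shown above. The "socks5h" scheme resolves hostnames through the proxy,
# which is required for Grid'5000-internal names.
proxies = {
    "http": "socks5h://localhost:2100",
    "https": "socks5h://localhost:2100",
}

# With requests installed with SOCKS support, Grid'5000-internal services
# become reachable (hypothetical node name, for illustration only):
#
#   import requests
#   r = requests.get("http://parasilo-1.rennes.grid5000.fr", proxies=proxies)

print(proxies["http"])
```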

Using a custom operating system environment#

First, the description file of your environment should use resolvable URIs for the kadeploy3 server. An example of such a description is the following:

# myimage.desc and myimage.tgz are both located in
# the public subdirectory of the rennes site of the user {{ YOURLOGIN }}
---
name: ubuntu1804-x64-min
version: 2019052116
description: ubuntu 18.04 (bionic) - min
author: support-staff@list.grid5000.fr
visibility: public
destructive: false
os: linux
image:
  file: https://api.grid5000.fr/sid/sites/rennes/public/{{ YOURLOGIN }}/myimage.tgz
  kind: tar
  compression: gzip
postinstalls:
- archive: server:///grid5000/postinstalls/g5k-postinstall.tgz
  compression: gzip
  script: g5k-postinstall --net netplan
boot:
  kernel: "/vmlinuz"
  initrd: "/initrd.img"
filesystem: ext4
partition_type: 131
multipart: false

Then in the configuration of the Grid’5000 provider you need to specify the URL where your description file can be found:

conf = en.G5kConf.from_settings(
  job_name="test_myimage",
  job_type=["deploy"],
  env_name="https://api.grid5000.fr/sid/sites/rennes/public/{{ YOURLOGIN }}/myimage.desc",
)

Subnet reservation#

This shows how to make a subnet reservation, which is useful if you want to manually run containers or virtual machines on your Grid’5000 nodes.

Build the configuration from a dictionary#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

provider_conf = {
    "job_name": job_name,
    "walltime": "0:10:00",
    "resources": {
        "machines": [
            {
                "roles": ["control"],
                "cluster": "paravance",
                "nodes": 1,
            }
        ],
        "networks": [
            {
                "id": "not_linked_to_any_machine",
                "type": "slash_22",
                "roles": ["my_subnet"],
                "site": "rennes",
            },
        ],
    },
}

# claim the resources
conf = en.G5kConf.from_dictionary(provider_conf)
provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Retrieving subnet
subnet = networks["my_subnet"][0]
logging.info(subnet.__dict__)
# This returns the subnet information:
# subnet.network -> IPv4Network('10.158.0.0/22')
# subnet.gateway -> IPv4Address('10.159.255.254')

# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_subnet.py
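The subnet object exposes standard `ipaddress` values, so the addresses usable for your containers or virtual machines can be enumerated with the standard library alone. A minimal sketch, using the network value shown in the comments of the example above (the actual subnet depends on what your reservation returns, so this value is an assumption for illustration):

```python
from ipaddress import IPv4Network

# Network value taken from the example output above; the actual subnet
# depends on the reservation (assumption for illustration).
network = IPv4Network("10.158.0.0/22")

# Enumerate the addresses usable for containers or VMs
candidates = list(network.hosts())
print(len(candidates))   # 1022 usable host addresses in a /22
print(candidates[0])     # 10.158.0.1
```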

Build the configuration programmatically#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_name=job_name, job_type=[], walltime="0:10:00")
    .add_network(
        id="not_linked_to_any_machine",
        type="slash_16",
        roles=["my_subnet"],
        site="rennes",
    )
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources

roles, networks = provider.init()


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_subnet_p.py

Create a tunnel to a service#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = en.G5kConf.from_settings(
    job_type=[], job_name=job_name, walltime="0:20:00"
).add_machine(roles=["control"], cluster="parasilo", nodes=1)

provider = en.G5k(conf)
roles, networks = provider.init()

with en.play_on(roles=roles) as p:
    p.apt(name="nginx", state="present")
    p.wait_for(host="localhost", port=80, state="started")

with en.G5kTunnel(roles["control"][0].address, 80) as (local_address, local_port, _):
    import requests

    response = requests.get(f"http://{local_address}:{local_port}")
    print(response.text)


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_tunnel.py

Disabling the cache#

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Disabling the cache
en.set_config(g5k_cache=False)

job_name = Path(__file__).name

conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["control"], cluster="paravance", nodes=1)
    .add_machine(
        roles=["control", "network"],
        cluster="paravance",
        nodes=1,
    )
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()
# Do your stuff here
# ...


# Release all Grid'5000 resources
provider.destroy()

tuto_grid5000_disable_cache.py