Resources selection and environment control#
Get the resources that fit your needs in terms of server characteristics, network, disks and operating systems. Controlling what you get is a first step towards reproducible experiments.
Website: https://discovery.gitlabpages.inria.fr/enoslib/index.html
Instant chat: https://framateam.org/enoslib
Source code: https://gitlab.inria.fr/discovery/enoslib
Prerequisites#
⚠️ Make sure you’ve run the one time setup for your environment
⚠️ Make sure you’re running this notebook under the right kernel
[ ]:
import enoslib as en
# Display some general information about the library
en.check()
# Enable rich logging
_ = en.init_logging()
General considerations#
Grid’5000 uses the OAR scheduler behind the scenes. The scheduler has powerful resource selection capabilities; you can refer to the Grid’5000 tutorials to explore them.
EnOSlib exposes a higher-level interface for selecting resources, based on the Grid’5000 REST API (which wraps OAR). With EnOSlib you can reserve compute resources, networks (provided by Grid’5000) and disks, with the following assumptions:
Nodes are reserved as a whole (unlike OAR, which supports reserving part of a node), but multisite reservations are transparent with EnOSlib
Networks are those offered by Grid’5000 (layer 3 subnets and layer 2 VLANs, possibly spanning multiple sites)
Local disks are reserved with their associated machines
Nodes selection#
By cluster name#
In EnOSlib you can reserve nodes by specifying a cluster name. All available clusters are summarized in the Hardware page of the Grid’5000 documentation. EnOSlib also makes so-called multisite experiments (experiments spanning different sites) easy. To illustrate this, let’s reserve nodes from two different sites. A multisite experiment requires synchronizing jobs on the different sites; EnOSlib handles this for you.
💡 You might check the availability page (non production nodes)
[ ]:
job_name="multisite"
conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    # For convenience, we use the site name as role, but that's only informative
    # the paravance cluster has many nodes (rennes site)
    .add_machine(roles=["rennes", "intel"], cluster="paravance", nodes=1)
    # the chiclet cluster has only 8 nodes (lille site)
    .add_machine(roles=["lille", "amd"], cluster="chiclet", nodes=1)
)
provider = en.G5k(conf)
[ ]:
roles, networks = provider.init()
[ ]:
en.run_command("cat /proc/cpuinfo", roles=roles)
[ ]:
provider.destroy()
By server names#
On Grid’5000, machines belonging to a given cluster are normally homogeneous. But it is impossible to provide absolute guarantees about it: for instance, physical disks may have different performance characteristics across nodes of a cluster even though they share the same vendor and model. For this reason, experimenters may need to reproduce an experiment several times using the exact same hardware.
This is possible by specifying nodes with their exact names. By default, all the servers specified this way get reserved, unless you specify a target number of nodes using the nodes parameter.
In the following make sure to change the servers list, otherwise your reservation will conflict with others.
[ ]:
job_name = "specific-server"
conf = (
    en.G5kConf.from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(
        roles=["compute"],
        servers=["paravance-19.rennes.grid5000.fr", "paravance-20.rennes.grid5000.fr"],
    )
)
provider = en.G5k(conf)
[ ]:
roles, networks = provider.init()
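If you only need a subset of the candidate servers, the nodes parameter mentioned above limits how many of them get reserved. A minimal configuration sketch (server names are examples; adapt them to your reservation, and release resources with provider.destroy() when done):

```python
import enoslib as en

# Sketch: reserve any ONE node among the candidate servers below.
# Server names are examples; adjust them to avoid conflicts.
conf = (
    en.G5kConf.from_settings(job_name="specific-server-subset", walltime="0:10:00")
    .add_machine(
        roles=["compute"],
        servers=[
            "paravance-19.rennes.grid5000.fr",
            "paravance-20.rennes.grid5000.fr",
            "paravance-21.rennes.grid5000.fr",
        ],
        # target number of nodes among the servers listed above
        nodes=1,
    )
)
```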
Non-default network selection#
In all of the above, we got the default network resource (the “production network”), which is shared with other users. There are two other types of networks:
- subnets, which can be used if you need to assign extra addresses to your “nodes” (e.g. virtual machines)
- kavlans, which are isolated layer 2 networks. Using this network type currently requires an extra step after getting the resources: a reconfiguration/deployment of a full OS on the nodes.
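For illustration, reserving a subnet can be sketched as follows (a sketch, assuming the slash_22 network type; the roles and cluster name are examples):

```python
import enoslib as en

# Sketch: reserve a /22 subnet together with one node; the extra
# addresses can later be assigned to e.g. virtual machines on the node.
extra_net = en.G5kNetworkConf(type="slash_22", roles=["extra_ips"], site="rennes")
conf = (
    en.G5kConf.from_settings(job_name="subnet-demo", walltime="0:10:00")
    .add_network_conf(extra_net)
    .add_machine(roles=["vm_host"], cluster="paravance", nodes=1)
)
```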
💡 The number of kavlans is limited:
kavlan-local: 3 per site (non-routed network)
kavlan: 6 per site (routed network)
kavlan-global: 1 per site (allows multi-site, isolated experiments)
💡 To check the OS images available for deployment, run kaenv3 -l on a frontend (open a terminal), or build your own :)
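Putting the kavlan types above together, a multi-site isolated experiment on a global kavlan could be sketched like this (a sketch, assuming the kavlan-global network type; cluster and site names are examples):

```python
import enoslib as en

# Sketch: one isolated layer 2 network spanning two sites.
# Only one kavlan-global exists per site; pick the site hosting it.
global_net = en.G5kNetworkConf(type="kavlan-global", roles=["isolated"], site="rennes")
conf = (
    en.G5kConf.from_settings(
        job_name="vlan-global",
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:30:00",
    )
    .add_network_conf(global_net)
    .add_machine(roles=["rennes"], cluster="paravance", nodes=1, primary_network=global_net)
    .add_machine(roles=["lille"], cluster="chiclet", nodes=1, primary_network=global_net)
)
```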
[ ]:
job_name = "vlan"
private_net = en.G5kNetworkConf(type="kavlan", roles=["private"], site="rennes")
conf = (
    en.G5kConf.from_settings(
        job_name=job_name,
        job_type=["deploy"],
        env_name="debian11-nfs",
        walltime="0:20:00",
    )
    .add_network_conf(private_net)
    .add_machine(
        roles=["roleA"], cluster="parasilo", nodes=1, primary_network=private_net
    )
    .finalize()
)
provider = en.G5k(conf)
roles, networks = provider.init()
[ ]:
# checking the networks we got
networks
[ ]:
# Checking the ips of the nodes
roles = en.sync_info(roles, networks)
roles
[ ]:
# Show kavlan subnet
print("Kavlan subnet:", networks["private"][0].network)
# The nodes use this kavlan network for all traffic
# (the network is interconnected at layer-3 with the rest of Grid'5000)
results = en.run_command("ip route get 9.9.9.9", roles=roles["roleA"])
for result in results:
    print(result.stdout)
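The network attribute used above is a standard Python ipaddress network object, so the usual address arithmetic applies. A self-contained sketch (the subnet value here is hypothetical; in practice use the object from networks["private"][0].network):

```python
import ipaddress

# Hypothetical kavlan subnet; in a real run, take it from the
# networks dictionary returned by provider.init()
net = ipaddress.ip_network("10.16.0.0/22")

# Enumerate the first three usable host addresses of the subnet
hosts = [str(h) for h in list(net.hosts())[:3]]
print(hosts)  # ['10.16.0.1', '10.16.0.2', '10.16.0.3']
```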
[ ]:
# release resources
provider.destroy()
Disk reservation primer#
Grid’5000 has a disk reservation feature: on several clusters, reserving secondary disks is mandatory if you want to use them in your experiments.
The disk reservation feature addresses different use cases:
- benchmarking of storage
- long-term storage of data local to the node computing it
Let’s have a look below.
Make sure to specify a cluster that supports this feature – refer to the documentation and the status page.
[ ]:
job_name = "without-disks"
conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="gros",
    nodes=1,
)
with en.G5k(conf) as (roles, _):
    results = en.run_command("lsblk", roles=roles)
[ ]:
# no extra disk available
print(results[0].stdout)
[ ]:
job_name = "with-disks"
conf = en.G5kConf.from_settings(
    job_name=job_name, job_type=[], walltime="0:30:00"
).add_machine(
    roles=["storage"],
    cluster="gros",
    nodes=1,
    reservable_disks=True,
)
with en.G5k(conf) as (roles, _):
    results = en.run_command("lsblk", roles=roles)
results
[ ]:
# another disk is available
print(results[0].stdout)