VMonG5K tutorials #

This tutorial leverages the VMonG5k provider: a provider that provisions virtual machines for you on Grid’5000.

Hint

For a complete schema reference see VMonG5k Schema

Installation #

To use Grid’5000 with Enoslib, you can go with a virtualenv :

$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install -U pip

$ pip install enoslib

Configuration #

Since python-grid5000 is used behind the scene, the configuration is read from a configuration file located in the home directory. It can be created with the following:

echo '
username: MYLOGIN
password: MYPASSWORD
' > ~/.python-grid5000.yaml

chmod 600 ~/.python-grid5000.yaml

The above configuration should work both from a Grid’5000 frontend machine and from your local machine as well.

External access (from your laptop)#

If you are running your experiment from outside Grid’5000 (e.g. from your local machine), using a SSH jump host is required.

Enoslib (version 8.1.0 and above) will automatically setup such a SSH jump host connection through access.grid5000.fr when it detects you are working outside of the Grid’5000 network. See Global configuration if you need to configure this behaviour.

Hint

Using a SSH jump host does not provide the best performance when controlling a large number of nodes. This is because the number of simultaneous SSH connection is limited on the jump host. See Performance tuning for many tips and tricks to improve performance.

Basic example #

We’ll imagine a system that requires 5 compute machines and 1 controller machine. We express this using the VMonG5k provider:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# claim the resources
conf = (
    en.VMonG5kConf.from_settings(job_name=job_name)
    .add_machine(
        roles=["docker", "compute"],
        cluster="paravance",
        number=5,
        flavour_desc={"core": 1, "mem": 1024},
    )
    .add_machine(
        roles=["docker", "control"], cluster="paravance", number=1, flavour="large"
    )
)

provider = en.VMonG5k(conf)

roles, networks = provider.init()
print(roles)
print(networks)

en.wait_for(roles)

# install docker on the nodes
# bind /var/lib/docker to /tmp/docker to gain some places
docker = en.Docker(agent=roles["docker"], bind_var_docker="/tmp/docker")
docker.deploy()

# start containers.
# Here on all nodes
with en.actions(roles=roles) as a:
    a.docker_container(
        name="mycontainer",
        image="nginx",
        ports=["80:80"],
        state="started",
    )


# Release all Grid'5000 resources
provider.destroy()

tuto_vmong5k.py

You can launch the script using :

$ python tuto_vmg5k.py

The raw data structures of EnOSlib will be displayed and you should be able to connect to any machine using SSH and the root account.

How it works #

The VMonG5k provider internally uses the G5k provider. In particular it uses the default job type to obtain physical nodes with a standard Grid’5000 environment, and it also claims an extra slash_22 subnet.
SSH access will be granted to the VMs using the ~/.ssh/id_rsa | ~/.ssh/id_rsa.pub keypair. So these files must be present in your home directory.
The working_dir setting controls where the temporary files and virtual images disks will be stored. The default is to store everything in the temp folder of the physical nodes.
You might be interested in adding wait_for() just after init() to make sure SSH is up and running on all VMs. Otherwise you might get an unreachable error from SSH.
The provider will try to use as few physical hosts per group of machines as possible, using a very simple bin-packing allocation algorithm. Note however that each requested group of machines will always use its own physical hosts. For instance, if you create 2 groups with 1 VM each, it will use 2 physical hosts.
By default, the provider will allocate VMs on physical hosts based on the number of hardware CPU threads. It is possible to use the number of hardware CPU cores instead: this will allocate fewer VMs per physical host, but you will likely obtain better CPU performance in the VMs. This is controlled by the vcore_type parameter.

Warning

The working_dir and all its content is deleted by the provider when calling destroy().

Changing resource size of virtual machines #

As for the CPU and memory resources, you can simply change the name of the flavour (available flavours are listed here), or create your own flavour with flavour_desc.

[...]
.add_machine(
    [...],
    flavour_desc={"core": 1, "mem": "512"}
)

Customizing the disks of Virtual Machines #

Adding a new disk: Using the disk attribute of flavour_desc will create a new disk and make it available to the VM. For instance to get an extra disk of 10GB you can use this python configuration parameter:
```
[...]
.add_machine(
    [...],
    flavour_desc={"core": 1, "mem": 512, "disk": 10}
)
```

Note that with the above configuration an extra disk of 10GB will be provisioned and available to the Virtual Machine. In the current implementation, the disk is neither formatted nor mounted in the Virtual Machine OS.

Make an external (not managed by EnOSlib) disk available to the Virtual Machine A typical use case is to use an hardware disk from the host machine. In this situation, use the extra_devices parameter of the configuration. It corresponds to the XML string of Libvirt.

[...]
.add_machine(
    [...],
    extra_devices = """
    <disk type='block' device='disk'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/disk/by-path/pci-0000:82:00.0-sas-phy1-lun-0'/>
    <target dev='vde' bus='virtio'/>
    </disk>
    """

Resize the root filesystem To do so, you will need to get the qcow2 file, put it in your public folder, and resize it. Location of the file is shown here.

cp /grid5000/virt-images/debian10-x64-nfs.qcow2 $HOME/public/original.qcow2
cd $HOME/public
qemu-img info original.qcow2  # check the size (should be 10GB)
qemu-img resize original.qcow2 +30G
# at this stage, image is resized at 40GB but not the partition
virt-resize --expand /dev/sda1 original.qcow2 my-image.qcow2
rm original.qcow2
# now you can check the size of each partition (/dev/sda1 should be almost 40GB)
virt-filesystems –long -h –all -a my-image.qcow2

Finally, you need to tell EnosLib to use this file with:

Configuration.from_settings(...
                            image="/home/<username>/public/my-image.qcow2",
                            ...
                            )

Large-scale VM deployments #

When deploying a large number of VMs, you should follow the guide at Performance tuning to make it work and ensure it is reasonably efficient. Enoslib has been reported to be able to deploy 3000 VMs.

# Make sure your run this example from a dedicated Grid'5000 control node.
# See https://discovery.gitlabpages.inria.fr/enoslib/tutorials/performance_tuning.html

import logging
import os
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Set very high parallelism to be able to handle a large number of VMs
en.set_config(ansible_forks=100)

# Enable Ansible pipelining
os.environ["ANSIBLE_PIPELINING"] = "True"

job_name = Path(__file__).name

conf = en.VMonG5kConf.from_settings(job_name=job_name, walltime="00:45:00").add_machine(
    roles=["fog"],
    cluster="dahu",
    number=200,
    vcore_type="thread",
    flavour_desc={"core": 1, "mem": 2048},
)

provider = en.VMonG5k(conf)

# Get actual resources
roles, networks = provider.init()

# Wait for VMs to finish booting
en.wait_for(roles)

# Run same command on all VMs
results = en.run_command("uname -a", roles=roles)
for result in results:
    print(result.payload["stdout"])


# Release all Grid'5000 resources
provider.destroy()

vmong5k_forks.py

Controlling the virtual machines placement #

There are a few ways to control VM placement:

each group is considered independently. It means that you can separate VMs on different physical hosts by just declaring them in different groups. They can still share the same roles.
you can use vcore_type to control whether VMs will be allocated based on the physical number of cores or physical number of hyper-threads.

# Make sure your run this example from a dedicated Grid'5000 control node.
# See https://discovery.gitlabpages.inria.fr/enoslib/tutorials/performance_tuning.html

import logging
import os
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Set very high parallelism to be able to handle a large number of VMs
en.set_config(ansible_forks=100)

# Enable Ansible pipelining
os.environ["ANSIBLE_PIPELINING"] = "True"

job_name = Path(__file__).name

CLIENT_FLAVOUR = {"core": 1, "mem": 1024}
COMPUTE_FLAVOUR = {"core": 8, "mem": 16384}
DATABASE_FLAVOUR = {"core": 4, "mem": 8192}

# claim the resources
conf = (
    en.VMonG5kConf.from_settings(job_name=job_name, walltime="00:45:00")
    # Put as many client VMs as possible on the same physical nodes. They
    # should fit on 2 paravance nodes (each node has 32 hyper-threads).
    .add_machine(
        roles=["clients"],
        cluster="paravance",
        number=50,
        vcore_type="thread",
        flavour_desc=CLIENT_FLAVOUR,
    )
    # CPU-intensive VMs: don't allocate VMs on hyper-threads.  They should
    # fit on 4 dahu nodes (each node has 32 physical cores).
    .add_machine(
        roles=["cpu_intensive"],
        cluster="dahu",
        number=16,
        vcore_type="core",
        flavour_desc=COMPUTE_FLAVOUR,
    )
    # Database cluster: make sure each VM is on a separate physical host.
    # We ensure this by using multiple groups with the same role.  This
    # could be done in a loop if necessary.
    .add_machine(
        roles=["database"], cluster="econome", number=1, flavour_desc=DATABASE_FLAVOUR
    )
    .add_machine(
        roles=["database"], cluster="econome", number=1, flavour_desc=DATABASE_FLAVOUR
    )
    .add_machine(
        roles=["database"], cluster="econome", number=1, flavour_desc=DATABASE_FLAVOUR
    )
)

provider = en.VMonG5k(conf)

roles, networks = provider.init()

en.wait_for(roles)

# Display the mapping from VM to physical nodes
for role, vms in roles.items():
    print(f"\n=== {role} ===")
    for vm in vms:
        print(f"{vm.alias:20} {vm.address:15} {vm.pm.alias}")

tuto_placement_basic.py

Controlling the virtual machines placement (advanced)#

If you need more control, you will have to reserve the physical resources yourself and then pass them to the VMonG5K provider:

import logging
from itertools import islice
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

CLUSTER = "parasilo"
SITE = "rennes"


conf = (
    en.G5kConf.from_settings(job_type=[], job_name=job_name)
    .add_network(
        id="not_linked_to_any_machine", type="slash_22", roles=["my_subnet"], site=SITE
    )
    .add_machine(roles=["role1"], cluster=CLUSTER, nodes=1)
    .add_machine(roles=["role2"], cluster=CLUSTER, nodes=1)
)

provider = en.G5k(conf)
roles, networks = provider.init()
roles = en.sync_info(roles, networks)

# Retrieving subnets
subnet = networks["my_subnet"]
logging.info(subnet)

# We describe the VMs types and placement in the following
# We build a VMonG5KConf with some extra fields:
# - undercloud: where the VMs should be placed (round-robin)
# - macs: list of macs to take: on G5k the dhcp is configured to assign specific
#   ip based on the configured mac

n_vms = 16
virt_conf = (
    en.VMonG5kConf.from_settings(image="/grid5000/virt-images/debian11-x64-base.qcow2")
    # Starts some vms on a single role
    # Here that means start the VMs on a single machine
    .add_machine(
        roles=["vms"],
        number=n_vms,
        undercloud=roles["role1"],
        macs=list(islice(subnet[0].free_macs, n_vms))
        # alternative
        # macs=list(islice(en.mac_range(subnet), n_vms))
    )
)

# Start them
vmroles = en.start_virtualmachines(virt_conf)
print(vmroles)
print(networks)

tuto_placement.py

Multisite Support #

You can specify clusters from different sites in the configuration. The provider will take care of reserving nodes and subnet on the different sites and configure the VMs’ network card accordingly.

Mounting your home directory (or a group storage)#

Mounting your home directory within the VMs is a two steps process. It first relies on a white list of IPS allowed to mount the NFS exported home: so you need to add your VM’s IPS to this list. This is done using an REST API call. Second, you need to mount the home inside your VMs.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# claim the resources
conf = en.VMonG5kConf.from_settings(job_name=job_name).add_machine(
    roles=["vms"],
    cluster="econome",
    number=2,
    flavour_desc={"core": 1, "mem": 1024},
)

provider = en.VMonG5k(conf)

roles, networks = provider.init()
print(roles)
print(networks)

en.wait_for(roles)

# get the job
job = provider.g5k_provider.jobs[0]

# get the ips to white list
ips = [vm.address for vm in roles["vms"]]

# add ips to the white list for the job duration
en.g5k_api_utils.enable_home_for_job(job, ips)

# mount the home dir and try writing a file
username = en.g5k_api_utils.get_api_username()
with en.actions(roles=roles) as a:
    a.mount(
        src=f"nfs:/export/home/{username}",
        path=f"/home/{username}",
        fstype="nfs",
        state="mounted",
    )
    a.file(
        path=f"/home/{username}/enoslib-was-here.txt",
        state="touch",
    )


# Release all Grid'5000 resources
provider.destroy()

tuto_vmong5k_home.py

Note that you can allow any group storage using enable_group_storage()