VMonG5K tutorials#

This tutorial leverages the VMonG5k provider: a provider that provisions virtual machines for you on Grid’5000.

Hint

For a complete schema reference see VMonG5k Schema

Installation#

To use Grid’5000 with Enoslib, you can go with a virtualenv :

$ virtualenv -p python3 venv
$ source venv/bin/activate
$ pip install -U pip

$ pip install enoslib

Configuration#

Since python-grid5000 is used behind the scene, the configuration is read from a configuration file located in the home directory. It can be created with the following:

echo '
username: MYLOGIN
password: MYPASSWORD
' > ~/.python-grid5000.yaml

chmod 600 ~/.python-grid5000.yaml

The above configuration should work both from a Grid’5000 frontend machine and from your local machine as well.

External access (from your laptop)#

If you are running your experiment from outside Grid’5000 (e.g. from your local machine), using a SSH jump host is required.

Enoslib (version 8.1.0 and above) will automatically setup such a SSH jump host connection through access.grid5000.fr when it detects you are working outside of the Grid’5000 network. See Global configuration if you need to configure this behaviour.

Hint

Using a SSH jump host does not provide the best performance when controlling a large number of nodes. This is because the number of simultaneous SSH connection is limited on the jump host. See Performance tuning for many tips and tricks to improve performance.

Basic example#

We’ll imagine a system that requires 5 compute machines and 1 controller machine. We express this using the VMonG5k provider:

 1import logging
 2from pathlib import Path
 3
 4import enoslib as en
 5
 6en.init_logging(level=logging.INFO)
 7en.check()
 8
 9job_name = Path(__file__).name
10
11# claim the resources
12conf = (
13    en.VMonG5kConf.from_settings(job_name=job_name)
14    .add_machine(
15        roles=["docker", "compute"],
16        cluster="paravance",
17        number=5,
18        flavour_desc={"core": 1, "mem": 1024},
19    )
20    .add_machine(
21        roles=["docker", "control"], cluster="paravance", number=1, flavour="large"
22    )
23)
24
25provider = en.VMonG5k(conf)
26
27roles, networks = provider.init()
28print(roles)
29print(networks)
30
31en.wait_for(roles)
32
33# install docker on the nodes
34# bind /var/lib/docker to /tmp/docker to gain some places
35docker = en.Docker(agent=roles["docker"], bind_var_docker="/tmp/docker")
36docker.deploy()
37
38# start containers.
39# Here on all nodes
40with en.actions(roles=roles) as a:
41    a.docker_container(
42        name="mycontainer",
43        image="nginx",
44        ports=["80:80"],
45        state="started",
46    )
47
48
49# Release all Grid'5000 resources
50provider.destroy()

tuto_vmong5k.py

  • You can launch the script using :

$ python tuto_vmg5k.py
  • The raw data structures of EnOSlib will be displayed and you should be able to connect to any machine using SSH and the root account.

How it works#

  • The VMonG5k provider internally uses the G5k provider. In particular it uses the default job type to obtain physical nodes with a standard Grid’5000 environment, and it also claims an extra slash_22 subnet.

  • SSH access will be granted to the VMs using the ~/.ssh/id_rsa | ~/.ssh/id_rsa.pub keypair. So these files must be present in your home directory.

  • The working_dir setting controls where the temporary files and virtual images disks will be stored. The default is to store everything in the temp folder of the physical nodes.

  • You might be interested in adding wait_for() just after init() to make sure SSH is up and running on all VMs. Otherwise you might get an unreachable error from SSH.

  • The provider will try to use as few physical hosts per group of machines as possible, using a very simple bin-packing allocation algorithm. Note however that each requested group of machines will always use its own physical hosts. For instance, if you create 2 groups with 1 VM each, it will use 2 physical hosts.

  • By default, the provider will allocate VMs on physical hosts based on the number of hardware CPU threads. It is possible to use the number of hardware CPU cores instead: this will allocate fewer VMs per physical host, but you will likely obtain better CPU performance in the VMs. This is controlled by the vcore_type parameter.

Warning

The working_dir and all its content is deleted by the provider when calling destroy().

Changing resource size of virtual machines#

As for the CPU and memory resources, you can simply change the name of the flavour (available flavours are listed here), or create your own flavour with flavour_desc.

[...]
.add_machine(
    [...],
    flavour_desc={"core": 1, "mem": "512"}
)

Customizing the disks of Virtual Machines#

  • Adding a new disk: Using the disk attribute of flavour_desc will create a new disk and make it available to the VM. For instance to get an extra disk of 10GB you can use this python configuration parameter:

    [...]
    .add_machine(
        [...],
        flavour_desc={"core": 1, "mem": 512, "disk": 10}
    )
    

Note that with the above configuration an extra disk of 10GB will be provisioned and available to the Virtual Machine. In the current implementation, the disk is neither formatted nor mounted in the Virtual Machine OS.

  • Make an external (not managed by EnOSlib) disk available to the Virtual Machine A typical use case is to use an hardware disk from the host machine. In this situation, use the extra_devices parameter of the configuration. It corresponds to the XML string of Libvirt.

    [...]
    .add_machine(
        [...],
        extra_devices = """
        <disk type='block' device='disk'>
        <driver name='qemu' type='raw'/>
        <source dev='/dev/disk/by-path/pci-0000:82:00.0-sas-phy1-lun-0'/>
        <target dev='vde' bus='virtio'/>
        </disk>
        """
    
  • Resize the root filesystem To do so, you will need to get the qcow2 file, put it in your public folder, and resize it. Location of the file is shown here.

    cp /grid5000/virt-images/debian10-x64-nfs.qcow2 $HOME/public/original.qcow2
    cd $HOME/public
    qemu-img info original.qcow2  # check the size (should be 10GB)
    qemu-img resize original.qcow2 +30G
    # at this stage, image is resized at 40GB but not the partition
    virt-resize --expand /dev/sda1 original.qcow2 my-image.qcow2
    rm original.qcow2
    # now you can check the size of each partition (/dev/sda1 should be almost 40GB)
    virt-filesystems –long -h –all -a my-image.qcow2
    

    Finally, you need to tell EnosLib to use this file with:

    Configuration.from_settings(...
                                image="/home/<username>/public/my-image.qcow2",
                                ...
                                )
    

Large-scale VM deployments#

When deploying a large number of VMs, you should follow the guide at Performance tuning to make it work and ensure it is reasonably efficient. Enoslib has been reported to be able to deploy 3000 VMs.

 1# Make sure your run this example from a dedicated Grid'5000 control node.
 2# See https://discovery.gitlabpages.inria.fr/enoslib/tutorials/performance_tuning.html
 3
 4import logging
 5import os
 6from pathlib import Path
 7
 8import enoslib as en
 9
10en.init_logging(level=logging.INFO)
11en.check()
12
13# Set very high parallelism to be able to handle a large number of VMs
14en.set_config(ansible_forks=100)
15
16# Enable Ansible pipelining
17os.environ["ANSIBLE_PIPELINING"] = "True"
18
19job_name = Path(__file__).name
20
21conf = en.VMonG5kConf.from_settings(job_name=job_name, walltime="00:45:00").add_machine(
22    roles=["fog"],
23    cluster="dahu",
24    number=200,
25    vcore_type="thread",
26    flavour_desc={"core": 1, "mem": 2048},
27)
28
29provider = en.VMonG5k(conf)
30
31# Get actual resources
32roles, networks = provider.init()
33
34# Wait for VMs to finish booting
35en.wait_for(roles)
36
37# Run same command on all VMs
38results = en.run_command("uname -a", roles=roles)
39for result in results:
40    print(result.payload["stdout"])
41
42
43# Release all Grid'5000 resources
44provider.destroy()

vmong5k_forks.py

Controlling the virtual machines placement#

There are a few ways to control VM placement:

  • each group is considered independently. It means that you can separate VMs on different physical hosts by just declaring them in different groups. They can still share the same roles.

  • you can use vcore_type to control whether VMs will be allocated based on the physical number of cores or physical number of hyper-threads.

 1# Make sure your run this example from a dedicated Grid'5000 control node.
 2# See https://discovery.gitlabpages.inria.fr/enoslib/tutorials/performance_tuning.html
 3
 4import logging
 5import os
 6from pathlib import Path
 7
 8import enoslib as en
 9
10en.init_logging(level=logging.INFO)
11en.check()
12
13# Set very high parallelism to be able to handle a large number of VMs
14en.set_config(ansible_forks=100)
15
16# Enable Ansible pipelining
17os.environ["ANSIBLE_PIPELINING"] = "True"
18
19job_name = Path(__file__).name
20
21CLIENT_FLAVOUR = {"core": 1, "mem": 1024}
22COMPUTE_FLAVOUR = {"core": 8, "mem": 16384}
23DATABASE_FLAVOUR = {"core": 4, "mem": 8192}
24
25# claim the resources
26conf = (
27    en.VMonG5kConf.from_settings(job_name=job_name, walltime="00:45:00")
28    # Put as many client VMs as possible on the same physical nodes. They
29    # should fit on 2 ecotype nodes (each node has 40 hyper-threads).
30    .add_machine(
31        roles=["clients"],
32        cluster="ecotype",
33        number=50,
34        vcore_type="thread",
35        flavour_desc=CLIENT_FLAVOUR,
36    )
37    # CPU-intensive VMs: don't allocate VMs on hyper-threads.  They should
38    # fit on 4 dahu nodes (each node has 32 physical cores).
39    .add_machine(
40        roles=["cpu_intensive"],
41        cluster="dahu",
42        number=16,
43        vcore_type="core",
44        flavour_desc=COMPUTE_FLAVOUR,
45    )
46    # Database cluster: make sure each VM is on a separate physical host.
47    # We ensure this by using multiple groups with the same role.  This
48    # could be done in a loop if necessary.
49    .add_machine(
50        roles=["database"], cluster="ecotype", number=1, flavour_desc=DATABASE_FLAVOUR
51    )
52    .add_machine(
53        roles=["database"], cluster="ecotype", number=1, flavour_desc=DATABASE_FLAVOUR
54    )
55    .add_machine(
56        roles=["database"], cluster="ecotype", number=1, flavour_desc=DATABASE_FLAVOUR
57    )
58)
59
60provider = en.VMonG5k(conf)
61
62roles, networks = provider.init()
63
64en.wait_for(roles)
65
66# Display the mapping from VM to physical nodes
67for role, vms in roles.items():
68    print(f"\n=== {role} ===")
69    for vm in vms:
70        print(f"{vm.alias:20} {vm.address:15} {vm.pm.alias}")

tuto_placement_basic.py

Controlling the virtual machines placement (advanced)#

If you need more control, you will have to reserve the physical resources yourself and then pass them to the VMonG5K provider:

 1import logging
 2from itertools import islice
 3from pathlib import Path
 4
 5import enoslib as en
 6
 7en.init_logging(level=logging.INFO)
 8en.check()
 9
10job_name = Path(__file__).name
11
12CLUSTER = "parasilo"
13SITE = "rennes"
14
15
16conf = (
17    en.G5kConf.from_settings(job_type=[], job_name=job_name)
18    .add_network(
19        id="not_linked_to_any_machine", type="slash_22", roles=["my_subnet"], site=SITE
20    )
21    .add_machine(roles=["role1"], cluster=CLUSTER, nodes=1)
22    .add_machine(roles=["role2"], cluster=CLUSTER, nodes=1)
23)
24
25provider = en.G5k(conf)
26roles, networks = provider.init()
27roles = en.sync_info(roles, networks)
28
29# Retrieving subnets
30subnet = networks["my_subnet"]
31logging.info(subnet)
32
33# We describe the VMs types and placement in the following
34# We build a VMonG5KConf with some extra fields:
35# - undercloud: where the VMs should be placed (round-robin)
36# - macs: list of macs to take: on G5k the dhcp is configured to assign specific
37#   ip based on the configured mac
38
39n_vms = 16
40virt_conf = (
41    en.VMonG5kConf.from_settings(image="/grid5000/virt-images/debian11-x64-base.qcow2")
42    # Starts some vms on a single role
43    # Here that means start the VMs on a single machine
44    .add_machine(
45        roles=["vms"],
46        number=n_vms,
47        undercloud=roles["role1"],
48        macs=list(islice(subnet[0].free_macs, n_vms))
49        # alternative
50        # macs=list(islice(en.mac_range(subnet), n_vms))
51    )
52)
53
54# Start them
55vmroles = en.start_virtualmachines(virt_conf)
56print(vmroles)
57print(networks)

tuto_placement.py

Multisite Support#

You can specify clusters from different sites in the configuration. The provider will take care of reserving nodes and subnet on the different sites and configure the VMs’ network card accordingly.

Mounting your home directory (or a group storage)#

Mounting your home directory within the VMs is a two steps process. It first relies on a white list of IPS allowed to mount the NFS exported home: so you need to add your VM’s IPS to this list. This is done using an REST API call. Second, you need to mount the home inside your VMs.

 1import logging
 2from pathlib import Path
 3
 4import enoslib as en
 5
 6en.init_logging(level=logging.INFO)
 7en.check()
 8
 9job_name = Path(__file__).name
10
11# claim the resources
12conf = en.VMonG5kConf.from_settings(job_name=job_name).add_machine(
13    roles=["vms"],
14    cluster="ecotype",
15    number=2,
16    flavour_desc={"core": 1, "mem": 1024},
17)
18
19provider = en.VMonG5k(conf)
20
21roles, networks = provider.init()
22print(roles)
23print(networks)
24
25en.wait_for(roles)
26
27# get the job
28job = provider.g5k_provider.jobs[0]
29
30# get the ips to white list
31ips = [vm.address for vm in roles["vms"]]
32
33# add ips to the white list for the job duration
34en.g5k_api_utils.enable_home_for_job(job, ips)
35
36# mount the home dir and try writing a file
37username = en.g5k_api_utils.get_api_username()
38with en.actions(roles=roles) as a:
39    a.mount(
40        src=f"nfs:/export/home/{username}",
41        path=f"/home/{username}",
42        fstype="nfs",
43        state="mounted",
44    )
45    a.file(
46        path=f"/home/{username}/enoslib-was-here.txt",
47        state="touch",
48    )
49
50
51# Release all Grid'5000 resources
52provider.destroy()

tuto_vmong5k_home.py

Note that you can allow any group storage using enable_group_storage()