Reproducible experiments#

Reproducibility is increasingly important in computer science. Several techniques can help ensure that your experiments done with Enoslib stay reproducible in the long term.

Setting up a reproducible environment#

Use a specific version of Enoslib#

Enoslib evolves over time, adding functionality but sometimes also breaking backwards compatibility. At a minimum, you should constrain the version of Enoslib used for your experiment, for instance:

pip install "enoslib>=8.0.0,<9.0.0"

Use a virtualenv and freeze dependencies#

Going further, you might want to use a fixed version of Enoslib and its dependencies (in particular, Ansible). When setting up your experiment:

python3 -m venv ./myvenv
. ./myvenv/bin/activate
python3 -m pip install enoslib
# python3 -m pip install any-other-dependency
python3 -m pip freeze > requirements.txt

Commit this requirements.txt file in the repository of your experiment. Then, whenever you or somebody else tries to reproduce the experiment:

python3 -m venv ./myvenv
. ./myvenv/bin/activate
python3 -m pip install -r requirements.txt

Don’t forget to document all these steps. You should also document which version of Python you used to run your experiment.
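To complement requirements.txt, the interpreter version can also be recorded programmatically at the start of a run. A minimal sketch (the file name python-version.txt and the expected version are illustrative, not part of Enoslib):

```python
import platform
import sys

# Record the interpreter version alongside the frozen dependencies
with open("python-version.txt", "w") as f:
    f.write(platform.python_version() + "\n")

# Optionally warn when re-running under a different major.minor version
EXPECTED = (3, 9)  # hypothetical: the version used for the original runs
if sys.version_info[:2] != EXPECTED:
    print(
        f"warning: running under Python {platform.python_version()}, "
        f"original runs used {EXPECTED[0]}.{EXPECTED[1]}"
    )
```

Committing python-version.txt next to requirements.txt gives future readers the full picture of the software environment.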

Using Guix#

As an alternative to the above, you can use GNU Guix to manage your software environment. EnOSlib is known in Guix as python-enoslib.

# spawn a one-off shell to run myscript.py
guix shell python python-enoslib -- python3 myscript.py

Note

  • Refer to the Guix documentation to get started on your environment.

  • On Grid’5000 you can refer to the dedicated tutorial to get started.

Reproducible experiments with Enoslib#

Storing experiment parameters#

Many experiments have parameters: which software version to install, which mode of operation or algorithm to use, how many nodes, which base OS…

For reproducibility, it is recommended to write down these parameters in a separate configuration file for each experiment run, and commit each configuration file in your git repository. Example:

---
garage_version: 0.7.3
garage_metadata_dir: /dev/shm/meta
g5k_env: debian11-min
g5k_walltime: 0:50:00
g5k:
  - cluster: ecotype
    nodes: 2
    roles: [nantes]
  - cluster: paravance
    nodes: 2
    roles: [rennes]

reproducible_parameters.yaml

For convenience, the format of the “g5k” list is exactly the format that Enoslib expects to reserve machines, but you are free to use any format.
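Whatever format you pick, a quick sanity check on the parsed parameters catches typos before any resources are reserved. A minimal sketch (the set of required keys follows the example file above; adapt it to your experiment):

```python
REQUIRED_KEYS = {
    "garage_version",
    "garage_metadata_dir",
    "g5k_env",
    "g5k_walltime",
    "g5k",
}


def validate_parameters(parameters: dict) -> dict:
    """Check that all required keys are present in the parameters."""
    missing = REQUIRED_KEYS - parameters.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return parameters


# Example: a parameters dict as loaded by yaml.safe_load()
params = validate_parameters({
    "garage_version": "0.7.3",
    "garage_metadata_dir": "/dev/shm/meta",
    "g5k_env": "debian11-min",
    "g5k_walltime": "0:50:00",
    "g5k": [{"cluster": "ecotype", "nodes": 2, "roles": ["nantes"]}],
})
print(params["garage_version"])  # 0.7.3
```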

Then you would parse this parameters file in your experiment script:

import logging
import sys
from pathlib import Path

import yaml

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Parse parameters
params_file = sys.argv[1]
with open(params_file) as f:
    parameters = yaml.safe_load(f)

conf = en.G5kConf().from_settings(
    job_type=["deploy"],
    job_name=job_name,
    env_name=parameters["g5k_env"],
    walltime=parameters["g5k_walltime"],
)

# Add machines from parameters
for machine in parameters["g5k"]:
    conf = conf.add_machine(**machine)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")

# Do something with the parameters
print("Garage version:", parameters["garage_version"])
print("Garage metadata directory:", parameters["garage_metadata_dir"])


# Release all Grid'5000 resources
provider.destroy()

reproducible_parameters.py

Storing logs and output#

Similarly, logs and outputs of your experiment should be stored for long-term archival. Here is an example showing how to configure logging to a file:

import logging
import os
import sys
from datetime import datetime
from pathlib import Path

import yaml

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Parse parameters
params_file = sys.argv[1]
with open(params_file) as f:
    parameters = yaml.safe_load(f)

# Determine output dir: name of parameter file + date
current_date = datetime.isoformat(datetime.utcnow(), timespec="seconds")
params_name = Path(params_file).stem
output_dir = Path(__file__).parent / f"{params_name}_{current_date}"
os.mkdir(output_dir)

# Configure logging to copy everything to a file
handler = logging.FileHandler(output_dir / "logs")
formatter = logging.Formatter(
    fmt="%(asctime)s  %(name)-24s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
handler.setFormatter(formatter)
logging.getLogger("").addHandler(handler)
# Also add a local logger
logger = logging.getLogger(job_name)


# Log parameters
logger.info("Experiment parameters: %s", parameters)

conf = (
    en.G5kConf()
    .from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["dummy"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Perform your experiment

# Store output
with open(output_dir / "output.txt", "w") as f:
    f.write("Example output\n")


# Release all Grid'5000 resources
provider.destroy()

reproducible_output.py

The results of your experiments should then be committed to git.
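For instance, a one-off demo of archiving a run's output directory (all names are illustrative and follow the output-directory scheme from the script above; the git identity flags are only needed if git is not configured yet):

```shell
mkdir -p myexperiment && cd myexperiment
git init -q .
# Pretend a run just finished and produced this directory
mkdir -p reproducible_parameters_2022-11-03T14:12:05
echo "Example output" > reproducible_parameters_2022-11-03T14:12:05/output.txt
git add reproducible_parameters_2022-11-03T14:12:05
git -c user.name=demo -c user.email=demo@example.org \
    commit -q -m "Add results for 2022-11-03 run"
git log --oneline -1
```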

Resources selection#

When using hardware resources on supported platforms, try to make sure that they will remain available in the future.

Example for Grid’5000:

  • You need 25 identical nodes for your experiment

  • You initially decide to use the uvb cluster in Sophia

  • However, there are several issues: the cluster has 30 nodes, but several of them are temporarily down because of hardware issues. In addition, the cluster was installed more than 10 years ago (2011), so it will likely experience more hardware failures in the coming years.

  • Overall, it is likely that fewer than 25 nodes of this cluster will remain available in a few years; the whole cluster might even be decommissioned at some point.

  • In the end, you should use a larger and more recent cluster!

Software environment on nodes#

Make sure to deploy a specific OS environment on your nodes. For instance on Grid’5000, to start with a very minimal Ubuntu 20.04 environment:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf()
    .from_settings(
        job_type=["deploy"],
        env_name="ubuntu2004-min",
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_machine(roles=["rennes"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")


# Release all Grid'5000 resources
provider.destroy()

reproducible_g5k_simple.py

Managing third-party software#

When installing third-party software, make sure to install a fixed version. You can also specify the version as a parameter of your experiment.

Third-party software distributed as source code#

If you download and build third-party software, you can fetch it from Software Heritage (https://www.softwareheritage.org/) to be certain of its future availability.

Third-party software distributed as binaries#

Example with software that is downloaded directly from a website:

GARAGE_VERSION = parameters["garage_version"]
logging.info("Installing Garage version %s", GARAGE_VERSION)
GARAGE_URL = (
    f"https://garagehq.deuxfleurs.fr/_releases/v{GARAGE_VERSION}/"
     "x86_64-unknown-linux-musl/garage"
)
with en.actions(roles=roles["garage"]) as p:
    p.get_url(
        url=GARAGE_URL,
        dest="/tmp/garage",
        mode="755",
        task_name="Download garage",
    )

If you are unsure that this specific version of the software will stay available in the future, you can download it locally, commit it in your experiment repository, and use this local version in your experiment:

mkdir -p artifacts
VERSION="0.7.3"
wget -O artifacts/garage-amd64-${VERSION} "https://garagehq.deuxfleurs.fr/_releases/v${VERSION}/x86_64-unknown-linux-musl/garage"
git add artifacts/*
git commit -m "Add artifacts"
Then, in your experiment script:

from pathlib import Path

GARAGE_VERSION = parameters["garage_version"]
GARAGE_FILENAME = f"garage-amd64-{GARAGE_VERSION}"
logging.info("Installing Garage version %s from local copy", GARAGE_VERSION)
GARAGE_LOCAL_FILE = str(Path(__file__).parent / "artifacts" / GARAGE_FILENAME)
with en.actions(roles=roles["garage"]) as p:
    p.copy(
        src=GARAGE_LOCAL_FILE,
        dest="/tmp/garage",
        mode="755",
        task_name="Copy garage binary",
    )

Note

If the artifact is large and/or you have many nodes, it may take a long time to copy it to all nodes. In that case, you could copy it first to a single node, and then distribute it to other nodes from there (using scp or a small web server).

Alternatively, you could deposit the artifact on a long-term storage platform such as Zenodo, after making sure that the license allows you to do so.

Debian packages snapshot#

Debian provides “snapshots” of its package repository at http://snapshot.debian.org/. This is useful if you want to make sure that your experiment always uses exactly the same Debian packages.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Check out http://snapshot.debian.org/
DEB_ARCHIVE_VERSION = "20221103"
ARCHIVE_URL = f"http://snapshot.debian.org/archive/debian/{DEB_ARCHIVE_VERSION}/"
SECURITY_URL = (
    f"http://snapshot.debian.org/archive/debian-security/{DEB_ARCHIVE_VERSION}/"
)
job_name = Path(__file__).name

conf = (
    en.G5kConf()
    .from_settings(
        job_type=["deploy"],
        env_name="debian10-min",
        env_version=2022090510,  # last env version before 2022-11-03
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_machine(roles=["rennes"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Configure Debian repository
APT_CONF = f"""
deb [check-valid-until=no] {ARCHIVE_URL} buster main contrib non-free
deb [check-valid-until=no] {ARCHIVE_URL} buster-updates main contrib non-free
deb [check-valid-until=no] {ARCHIVE_URL} buster-backports main contrib non-free
# For bullseye and later, this has changed to e.g. "bullseye-security"
# instead of "buster/updates".
deb [check-valid-until=no] {SECURITY_URL} buster/updates main contrib non-free
"""
with en.actions(roles=roles) as a:
    a.copy(
        task_name="Configure APT",
        content=APT_CONF,
        dest="/etc/apt/sources.list",
    )

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")


# Release all Grid'5000 resources
provider.destroy()

reproducible_g5k_full.py

Experiment’s shareability#

Sharing an experiment requires packaging it and distributing it somehow. A proof of concept using EnOSlib for a multi-platform edge-to-cloud experiment workflow is available as an artifact on the Trovi/Jupyter platform of Chameleon. This work is part of Daniel Rosendo's work on the reproducibility of edge-to-cloud experiments.

Going further#

Do you have more ideas to make experiments with Enoslib reproducible? Come tell us! https://framateam.org/enoslib