Reproducible experiments#

Reproducibility is increasingly important in computer science. Several techniques can help ensure that your experiments done with Enoslib stay reproducible in the long term.

Setting up a reproducible environment#

Use a specific version of Enoslib#

Enoslib evolves over time, adding functionality but sometimes also breaking backwards compatibility. At a minimum, you should constrain the version of Enoslib used for your experiment, for instance:

pip install "enoslib>=8.0.0,<9.0.0"

Use a virtualenv and freeze dependencies#

Going further, you might want to use a fixed version of Enoslib and its dependencies (in particular, Ansible). When setting up your experiment:

python3 -m venv ./myvenv
. ./myvenv/bin/activate
python3 -m pip install enoslib
# python3 -m pip install any-other-dependency
python3 -m pip freeze > requirements.txt

Commit this requirements.txt file in the repository of your experiment. Then, whenever you or somebody else tries to reproduce the experiment:

python3 -m venv ./myvenv
. ./myvenv/bin/activate
python3 -m pip install -r requirements.txt

Don’t forget to document all these steps. You should also document which version of Python you used to run your experiment.
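To complement requirements.txt, the interpreter version can also be recorded programmatically at the start of a run. A minimal sketch (the file name python-version.txt and the expected version are illustrative, not part of Enoslib):

```python
import platform
import sys

# Record the interpreter version alongside the frozen dependencies
with open("python-version.txt", "w") as f:
    f.write(platform.python_version() + "\n")

# Optionally warn when re-running under a different major.minor version
EXPECTED = (3, 9)  # hypothetical: the version used for the original runs
if sys.version_info[:2] != EXPECTED:
    print(
        f"warning: running under Python {platform.python_version()}, "
        f"original runs used {EXPECTED[0]}.{EXPECTED[1]}"
    )
```

Committing python-version.txt next to requirements.txt gives future readers the full picture of the software environment.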

Using Guix#

As an alternative to the above, you can use GNU Guix to manage your software environment. EnOSlib is known in Guix as python-enoslib.

# spawn a one-off shell to run myscript.py
guix shell python python-enoslib -- python3 myscript.py

Note

  • Refer to the Guix documentation to get started on your environment.

  • On Grid’5000 you can refer to the dedicated tutorial to get started.

Reproducible experiments with Enoslib#

Storing experiment parameters#

Many experiments have parameters: which software version to install, which mode of operation or algorithm to use, how many nodes, which base OS…

For reproducibility, it is recommended to write down these parameters in a separate configuration file for each experiment run, and commit each configuration file in your git repository. Example:

---
garage_version: 0.7.3
garage_metadata_dir: /dev/shm/meta
g5k_env: debian11-min
g5k_walltime: 0:50:00
g5k:
  - cluster: ecotype
    nodes: 2
    roles: [nantes]
  - cluster: paravance
    nodes: 2
    roles: [rennes]

reproducible_parameters.yaml

For convenience, the format of the “g5k” list is exactly the format that Enoslib expects to reserve machines, but you are free to use any format.
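Whatever format you pick, a quick sanity check on the parsed parameters catches typos before any resources are reserved. A minimal sketch (the set of required keys follows the example file above; adapt it to your experiment):

```python
REQUIRED_KEYS = {
    "garage_version",
    "garage_metadata_dir",
    "g5k_env",
    "g5k_walltime",
    "g5k",
}


def validate_parameters(parameters: dict) -> dict:
    """Check that all required keys are present in the parameters."""
    missing = REQUIRED_KEYS - parameters.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return parameters


# Example: a parameters dict as loaded by yaml.safe_load()
params = validate_parameters({
    "garage_version": "0.7.3",
    "garage_metadata_dir": "/dev/shm/meta",
    "g5k_env": "debian11-min",
    "g5k_walltime": "0:50:00",
    "g5k": [{"cluster": "ecotype", "nodes": 2, "roles": ["nantes"]}],
})
print(params["garage_version"])  # 0.7.3
```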

Then you would parse this parameters file in your experiment script:

import logging
import sys
from pathlib import Path

import yaml

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Parse parameters
params_file = sys.argv[1]
with open(params_file) as f:
    parameters = yaml.safe_load(f)

conf = en.G5kConf().from_settings(
    job_type=["deploy"],
    job_name=job_name,
    env_name=parameters["g5k_env"],
    walltime=parameters["g5k_walltime"],
)

# Add machines from parameters
for machine in parameters["g5k"]:
    conf = conf.add_machine(**machine)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")

# Do something with the parameters
print("Garage version:", parameters["garage_version"])
print("Garage metadata directory:", parameters["garage_metadata_dir"])


# Release all Grid'5000 resources
provider.destroy()

reproducible_parameters.py

Storing logs and output#

Similarly, logs and outputs of your experiment should be stored for long-term archival. Here is an example showing how to configure logging to a file:

import logging
import os
import sys
from datetime import datetime
from pathlib import Path

import yaml

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

# Parse parameters
params_file = sys.argv[1]
with open(params_file) as f:
    parameters = yaml.safe_load(f)

# Determine output dir: name of parameter file + date
current_date = datetime.isoformat(datetime.utcnow(), timespec="seconds")
params_name = Path(params_file).stem
output_dir = Path(__file__).parent / f"{params_name}_{current_date}"
os.mkdir(output_dir)

# Configure logging to copy everything to a file
handler = logging.FileHandler(output_dir / "logs")
formatter = logging.Formatter(
    fmt="%(asctime)s  %(name)-24s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
handler.setFormatter(formatter)
logging.getLogger("").addHandler(handler)
# Also add a local logger
logger = logging.getLogger(job_name)


# Log parameters
logger.info("Experiment parameters: %s", parameters)

conf = (
    en.G5kConf()
    .from_settings(job_name=job_name, walltime="0:10:00")
    .add_machine(roles=["dummy"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Perform your experiment

# Store output
with open(output_dir / "output.txt", "w") as f:
    f.write("Example output\n")


# Release all Grid'5000 resources
provider.destroy()

reproducible_output.py

The results of your experiments should then be committed to git.
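For instance, a one-off demo of archiving a run's output directory (all names are illustrative and follow the output-directory scheme from the script above; the git identity flags are only needed if git is not configured yet):

```shell
mkdir -p myexperiment && cd myexperiment
git init -q .
# Pretend a run just finished and produced this directory
mkdir -p reproducible_parameters_2022-11-03T14:12:05
echo "Example output" > reproducible_parameters_2022-11-03T14:12:05/output.txt
git add reproducible_parameters_2022-11-03T14:12:05
git -c user.name=demo -c user.email=demo@example.org \
    commit -q -m "Add results for 2022-11-03 run"
git log --oneline -1
```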

Resources selection#

When using hardware resources on supported platforms, try to make sure that they will remain available in the future.

Example for Grid’5000:

  • You need 25 identical nodes for your experiment

  • You initially decide to use the uvb cluster in Sophia

  • However, there are several issues: the cluster has 30 nodes, but several of them are temporarily down because of hardware issues. In addition, the cluster was installed more than 10 years ago (2011), so it will likely experience more hardware failures in the coming years.

  • Overall, it is likely that fewer than 25 nodes of this cluster will remain available in a few years; the whole cluster might even be decommissioned at some point.

  • In the end, you should use a larger and more recent cluster!

Software environment on nodes#

Make sure to deploy a specific OS environment on your nodes. For instance on Grid’5000, to start with a very minimal Ubuntu 20.04 environment:

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

job_name = Path(__file__).name

conf = (
    en.G5kConf()
    .from_settings(
        job_type=["deploy"],
        env_name="ubuntu2004-min",
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_machine(roles=["rennes"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")


# Release all Grid'5000 resources
provider.destroy()

reproducible_g5k_simple.py

Managing third-party software#

When installing third-party software, make sure to install a fixed version. You can also specify the version as a parameter of your experiment.

Third-party software distributed as source code#

If you download and build third-party software, you can fetch it from Software Heritage (https://www.softwareheritage.org/) to be certain of its future availability.

Third-party software distributed as binaries#

Example with software that is downloaded directly from a website:

GARAGE_VERSION = parameters["garage_version"]
logging.info("Installing Garage version %s", GARAGE_VERSION)
GARAGE_URL = (
    f"https://garagehq.deuxfleurs.fr/_releases/v{GARAGE_VERSION}/"
     "x86_64-unknown-linux-musl/garage"
)
with en.actions(roles=roles["garage"]) as p:
    p.get_url(
        url=GARAGE_URL,
        dest="/tmp/garage",
        mode="755",
        task_name="Download garage",
    )

If you are unsure that this specific version of the software will stay available in the future, you can download it locally, commit it in your experiment repository, and use this local version in your experiment:

mkdir -p artifacts
VERSION="0.7.3"
wget -O artifacts/garage-amd64-${VERSION} "https://garagehq.deuxfleurs.fr/_releases/v${VERSION}/x86_64-unknown-linux-musl/garage"
git add artifacts/*
git commit -m "Add artifacts"
Then, in your experiment script:

from pathlib import Path

GARAGE_VERSION = parameters["garage_version"]
GARAGE_FILENAME = f"garage-amd64-{GARAGE_VERSION}"
logging.info("Installing Garage version %s from local copy", GARAGE_VERSION)
GARAGE_LOCAL_FILE = str(Path(__file__).parent / "artifacts" / GARAGE_FILENAME)
with en.actions(roles=roles["garage"]) as p:
    p.copy(
        src=GARAGE_LOCAL_FILE,
        dest="/tmp/garage",
        mode="755",
        task_name="Copy garage binary",
    )

Note

If the artifact is large and/or you have many nodes, it may take a long time to copy it to all nodes. In that case, you could copy it first to a single node, and then distribute it to other nodes from there (using scp or a small web server).

Alternatively, you could deposit the artifact on a long-term storage platform such as Zenodo, after making sure that the license allows you to do so.

Debian packages snapshot#

Debian provides “snapshots” of its package repository at http://snapshot.debian.org/. This is useful if you want to make sure that your experiment always uses exactly the same Debian packages.

import logging
from pathlib import Path

import enoslib as en

en.init_logging(level=logging.INFO)
en.check()

# Check out http://snapshot.debian.org/
DEB_ARCHIVE_VERSION = "20221103"
ARCHIVE_URL = f"http://snapshot.debian.org/archive/debian/{DEB_ARCHIVE_VERSION}/"
SECURITY_URL = (
    f"http://snapshot.debian.org/archive/debian-security/{DEB_ARCHIVE_VERSION}/"
)
job_name = Path(__file__).name

conf = (
    en.G5kConf()
    .from_settings(
        job_type=["deploy"],
        env_name="debian10-min",
        env_version=2022090510,  # last env version before 2022-11-03
        job_name=job_name,
        walltime="00:50:00",
    )
    .add_machine(roles=["rennes"], cluster="paravance", nodes=1)
)

provider = en.G5k(conf)

# Get actual resources
roles, networks = provider.init()

# Configure Debian repository
APT_CONF = f"""
deb [check-valid-until=no] {ARCHIVE_URL} buster main contrib non-free
deb [check-valid-until=no] {ARCHIVE_URL} buster-updates main contrib non-free
deb [check-valid-until=no] {ARCHIVE_URL} buster-backports main contrib non-free
# For bullseye and later, this has changed to e.g. "bullseye-security"
# instead of "buster/updates".
deb [check-valid-until=no] {SECURITY_URL} buster/updates main contrib non-free
"""
with en.actions(roles=roles) as a:
    a.copy(
        task_name="Configure APT",
        content=APT_CONF,
        dest="/etc/apt/sources.list",
    )

# Install packages
with en.actions(roles=roles) as a:
    a.apt(name=["htop", "iotop"], state="present", update_cache="yes")


# Release all Grid'5000 resources
provider.destroy()

reproducible_g5k_full.py

Experiment’s shareability#

Sharing an experiment requires packaging it and distributing it somehow. A proof of concept using EnOSlib for a multi-platform edge-to-cloud experiment workflow is available as an artifact on the Trovi/Jupyter platform of Chameleon. This work is part of Daniel Rosendo's work on the reproducibility of edge-to-cloud experiments.

Going further#

Do you have more ideas to make experiments with Enoslib reproducible? Come tell us! https://framateam.org/enoslib