Remote actions and variables#
Changing the state of remote resources
Website: https://discovery.gitlabpages.inria.fr/enoslib/index.html
Instant chat: https://framateam.org/enoslib
Source code: https://gitlab.inria.fr/discovery/enoslib
Prerequisites#
⚠️ Make sure you’ve run the one-time setup for your environment
⚠️ Make sure you’re running this notebook under the right kernel
[ ]:
import enoslib as en
# Display some general information about the library
en.check()
# Enable rich logging
_ = en.init_logging()
Setup on Grid’5000#
EnOSlib uses Providers to … provide resources. They transform an abstract resource configuration into a concrete one by interacting with an infrastructure to get the resources from it. There are different providers in EnOSlib:
Vbox/KVM to work with locally hosted virtual machines
Openstack/Chameleon to work with bare-metal resources hosted in the Chameleon platform
FiT/IOT lab to work with sensors or low profile machines
VmonG5k to work with virtual machines on Grid’5000
Distem to work with lxc containers on Grid’5000
Grid’5000 to work with bare-metal resources hosted in the Grid5000 testbed
A provider eases the use of the platform by internalizing some of the configuration tasks (e.g. automatically managing the reservation on G5k, network configuration …). All providers follow the same workflow, sketched below.
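The common workflow is: build a Configuration, hand it to the provider, and call init() to get roles and networks. Here is a minimal sketch of that pattern with the VMonG5k provider; it is left commented out so it doesn’t interfere with this tutorial, and the exact parameters (e.g. number) are assumptions to check against the provider’s documentation.
[ ]:
# A sketch of the common provider workflow, here with VMonG5k.
# Commented out on purpose: this tutorial uses bare-metal nodes.
# conf = (
#     en.VMonG5kConf.from_settings(job_name="rsd-01")
#     .add_machine(roles=["compute"], cluster="parasilo", number=1)
#     .finalize()
# )
# provider = en.VMonG5k(conf)
# roles, networks = provider.init()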
Describing the resources#
For the purpose of the tutorial we’ll reserve 2 nodes in the production environment.
First we build a configuration object describing the wanted resources: machines and networks.
[ ]:
conf = (
en.G5kConf.from_settings(job_type=[], job_name="rsd-01")
.add_machine(
roles=["control"], cluster="parasilo", nodes=1
)
.add_machine(
roles=["compute"],
cluster="parasilo",
nodes=1
)
.finalize()
)
conf
💡 The wanted nodes might not be available; check the availability page (non-production nodes) and bookmark your favorite sites!
Reserving the resources#
We can pass the Configuration object to the G5k provider.
[ ]:
provider = en.G5k(conf)
roles, networks = provider.init()
💡 Check the status page after the reservation is made (find your job)
💡 Get textual information about your job from a terminal (oarstat, oarstat -u, oarstat -j <jobid>, oarstat -j <jobid> -f …)
Inspecting the resources we own for the experiment’s lifetime:
roles: essentially a dictionary whose keys are the role names and whose values are the corresponding lists of hosts
networks: similar to roles but for networks
[ ]:
roles
[ ]:
# the list of hosts having a given role
roles["control"]
[ ]:
# a single host
roles["control"][0]
[ ]:
networks
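Since roles behaves like a dictionary, you can also iterate over it; a small sketch:
[ ]:
# roles behaves like a dict: iterate over role names and their hosts
for role, hosts in roles.items():
    print(role, "->", [h.alias for h in hosts])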
provider.init is idempotent. In the Grid’5000 case, you can call it several times in a row: the same reservation will be reloaded, and the roles and networks will be the same.
[ ]:
roles, networks = provider.init()
roles
[ ]:
# sync some more information into the host data structure (for illustration purposes here)
roles = en.sync_info(roles, networks)
[ ]:
# the hosts have been populated with some new information
roles
Acting on remote nodes#
run a command, filter results#
[ ]:
results = en.run_command("whoami", roles=roles)
results
💡 The run_command function is polymorphic in its roles parameter. You can pass a Roles object as returned by provider.init, a list of Host, or a single Host. Check its documentation using SHIFT+TAB for instance.
[ ]:
# a list of hosts
results = en.run_command("whoami", roles=roles["control"])
results
💡 The results object is a list of results. Each result is an object with host, task, status and payload attributes. You can filter the results by host, task and/or status.
[ ]:
# filter by host
some_results = results.filter(host=roles["control"][0].alias)
some_results
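Filtering works the same way on the other attributes, for instance on the status (the "OK" status string used below is an assumption; check the status attribute of your own results):
[ ]:
# filter by status: keep only the results of successful commands
ok_results = results.filter(status="OK")
ok_results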
[ ]:
# take the first result
one_result = some_results[0]
one_result.payload["stdout"]
💡 There are some specific shortcuts when the remote action is a remote (shell) command: .stdout, .stderr, .rc
print(f"stdout = {one_result.stdout}\n", f"stderr={one_result.stderr}\n", f"return code = {one_result.rc}")
💡 By default the user is root (this is common to all EnOSlib providers). If you want to run a command as your regular Grid’5000 user, you can tell the command to sudo back to your regular user using run_as (the SSH login is still root though)
[ ]:
# get my username on g5k (this line is generic and useful if you share your code with someone else)
my_g5k_login = en.g5k_api_utils.get_api_username()
results = en.run_command("whoami", roles=roles, run_as=my_g5k_login)
results
Filtering hosts on which the command is run#
run_command acts on remote hosts. Those hosts can be given as a Roles object (output of provider.init), as a list of Host, or as a single Host.
[ ]:
# some roles
en.run_command("date", roles = roles)
[ ]:
# a list of hosts
en.run_command("date", roles = roles["control"])
[ ]:
# a single host
en.run_command("date", roles=roles["control"][0])
A pattern_hosts argument can also be supplied. The pattern can be a regexp, but other patterns are possible.
[ ]:
# co* matches all hosts
en.run_command("date", roles=roles, pattern_hosts="co*")
# com* only matches hosts with the `compute` tag
en.run_command("date", roles=roles, pattern_hosts="com*")
[ ]:
# you can forge a host yourself
# Here we run the command on the frontend: this should work if your SSH parameters are correct
en.run_command("date", roles=en.Host("rennes.grid5000.fr", user=en.g5k_api_utils.get_api_username()))
Dealing with failures#
By default, failures (command failure, host unreachable) raise an exception, which breaks your execution flow. Sometimes you just want to allow some failures to happen. For this purpose you can add on_error_continue=True
[ ]:
en.run_command("non existing command", roles=roles, on_error_continue=True)
print("This is printed, so the execution can continue")
Remote actions#
Tools like Ansible, Puppet, Chef, Terraform … are shipped with a set of predefined remote actions to ease the administrator’s life.
Actions like copying files, adding users, managing packages, making sure a line is absent from a configuration file, or managing docker containers … are first-class citizens and bring some nice guarantees of correctness and idempotency.
There are 1000+ modules available: https://docs.ansible.com/ansible/2.9/modules/list_of_all_modules.html
EnOSlib wraps Ansible modules and lets you use them from Python (without writing any YAML file). You can call any module by using the actions context manager:
In the following we install the nginx web server on the remote machines and wait for it to accept connections on port 80. This block of actions is idempotent.
[ ]:
with en.actions(roles=roles) as a:
# install nginx on the remote machines
a.apt(name="nginx", state="present")
# wait for the connection on the port 80 to be ready
a.wait_for(port=80, state="started")
# keep track of the result of each module
# not mandatory but nice :)
results = a.results
[ ]:
results.filter(task="apt")[0]
[ ]:
import requests
from IPython.display import HTML
# This is run from the frontend node (or a dedicated node) to a reserved node (same network)
address = roles["control"][0].address
print(f"Making an http request to {address}")
response = requests.get(f"http://{address}:80")
HTML(response.text)
Background actions#
Sometimes you need to fire a process on some remote machines that must survive the remote connection that started it. EnOSlib provides a background keyword argument for this purpose; it can be used when calling modules (when supported).
[ ]:
# synchronous execution, will wait until the end of the shell command
results = en.run_command("for i in $(seq 1 10); do sleep 1; echo toto; done", roles=roles)
results
[ ]:
# The remote command will be daemonized on the remote machines
results = en.run_command("for i in $(seq 1 10); do sleep 1; echo toto; done", roles=roles, background=True)
results
[ ]:
# you can get back the status of the daemonized process by reading the remote results_file
# but we need to wait for completion, so we force a sleep here (one could poll the status)
import time
time.sleep(15)
h = roles["control"][0]
result_file = results.filter(host=h.alias)[0].results_file
cat_result = en.run_command(f"cat {result_file}", roles=h)
cat_result
[ ]:
# the results_file content is JSON-encoded, so decode it
import json
print(json.loads(cat_result[0].stdout)["stdout"])
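Instead of a fixed sleep, one could poll the remote results_file until it is complete; a sketch that retries the cat until the JSON parses:
[ ]:
import json
import time

status = None
for _ in range(20):
    cat_result = en.run_command(f"cat {result_file}", roles=h, on_error_continue=True)
    try:
        status = json.loads(cat_result[0].stdout)
        break
    except (ValueError, IndexError):
        # file not there yet (or incomplete): wait and retry
        time.sleep(2)
print(status["stdout"] if status else "not finished yet")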
Using variables#
Same variable value for everyone#
Nothing surprising here: you can use regular Python interpolation (e.g. an f-string). Strings are interpolated by the interpreter before being manipulated.
[ ]:
host_to_ping = roles["control"][0].alias
host_to_ping
results = en.run_command(f"ping -c 5 {host_to_ping}", roles=roles)
results
[ ]:
[(r.host, r.stdout) for r in results]
Using templates / Ansible variables#
There’s an alternative way to pass a variable to a task: using extra_vars. The difference with the previous case (Python-interpolated variables) is that the variable is interpolated right before execution happens on the remote node. One can imagine that the value is broadcast to all nodes and substituted right before the execution.
To indicate that we want to use this kind of variable, we pass its value in the extra_vars dictionary and use a template ({{ ... }}) in the task description.
[ ]:
host_to_ping = roles["control"][0].alias
host_to_ping
results = en.run_command("ping -c 5 {{ my_template_variable }}", roles=roles, extra_vars=dict(my_template_variable=host_to_ping))
results
Host specific variables#
In the above, we’ve seen how a common value can be broadcast to all remote nodes. What if we want a host-specific value?
For instance, in our case we’d like host 1 to ping host 2 and host 2 to ping host 1. That makes the host_to_ping variable host-specific.
For this purpose you can use the extra attribute of the Host objects and use a template as before.
[ ]:
control_host = roles["control"][0]
compute_host = roles["compute"][0]
control_host.set_extra(host_to_ping=compute_host.address)
compute_host.set_extra(host_to_ping=control_host.address)
control_host
[ ]:
compute_host
Note that the extra variables can be reset to their initial state with Host.reset_extra()
[ ]:
results = en.run_command("ping -c 5 {{ host_to_ping }}", roles=roles)
results
[ ]:
[(r.host, r.stdout) for r in results]
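Once the cross-ping is done, the extras can be removed:
[ ]:
# restore the hosts to their initial state
control_host.reset_extra()
compute_host.reset_extra()
control_host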
Cleaning#
[ ]:
provider.destroy()
Syntactic sugar#
A provider can be used as a context manager:
when entering, the resources are reserved and returned
when exiting, the resources are released
[ ]:
with en.G5k(conf) as (roles, networks):
# let's do cross ping
control_host = roles["control"][0]
compute_host = roles["compute"][0]
control_host.set_extra(host_to_ping=compute_host.address)
compute_host.set_extra(host_to_ping=control_host.address)
results = en.run_command("ping -c 5 {{ host_to_ping }}", roles=roles)
[ ]:
results