GPU Access – How to Restrict Access to GPU

cgroup, systemd, ubuntu-18.04

I'm configuring a small GPU server under Ubuntu 18.04.
It should support both interactive and batch jobs.

It is dedicated to running machine learning tasks for a small team. We also have some CPU-only tasks for massively parallel data processing.

My goal is to prevent a single user from holding GPUs indefinitely, for example by launching an IPython console, running import tensorflow in it, and then leaving the office at the end of the working day.

On the other hand, users should still be able to do some prototyping interactively (on CPU).

With all of this in mind, I've installed the SLURM workload manager to provide job queues and scheduling.

Now I want to block access to the GPUs from interactive sessions and put them entirely under SLURM's control.

Restricting permissions on /dev/nvidia[0-3], as suggested here, does not work, because SLURM switches job processes from root to the actual task owner.

After some googling I've found two options: cgroups and udev.

This topic on the systemd-devel mailing list suggests that access can be restricted with cgroups directly, bypassing systemd, just by issuing commands like echo "c 195:* rwm" > /sys/fs/cgroup/devices/.../devices.deny
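A minimal sketch of that direct cgroup approach, assuming the cgroup v1 devices controller (the default layout on Ubuntu 18.04) and that the target cgroup directory already exists. The function name deny_char_major is hypothetical. Major 195 covers /dev/nvidia[0-3] and /dev/nvidiactl; /dev/nvidia-uvm uses a different, dynamically assigned major and would need its own rule.

```shell
#!/bin/sh
# Hypothetical helper: deny every minor of one character-device major
# inside a cgroup v1 "devices" cgroup.
deny_char_major() {
    cgroup_dir=$1   # directory of the target cgroup (assumption: it exists)
    major=$2        # device major number, e.g. 195 for /dev/nvidia*
    # Rule format: "<type> <major>:<minor> <access>"
    # c = character device, * = all minors, rwm = read, write, mknod.
    printf 'c %s:* rwm\n' "$major" > "$cgroup_dir/devices.deny"
}
```

As root this could be called as deny_char_major /sys/fs/cgroup/devices/user.slice 195, assuming interactive sessions live under user.slice. The setting is not persistent across reboots, and systemd may rewrite it when it reconfigures the slice.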

However, I've read in the manual that systemd sets up three slices by default.

I've also found that processes launched by the SLURM daemon run in system.slice, while processes launched by users connected to the server over ssh or remote desktop run in user.slice.

In principle, I could manually edit the file /lib/systemd/system/user.slice and add DeviceAllow="char-nvidia-frontend", as described here, but this file could be overwritten by the next package update.
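One way around the package-update problem is a drop-in file under /etc/systemd/system, which systemd merges over the unit shipped in /lib and which package updates leave alone. The sketch below is untested and rests on assumptions: the function name write_gpu_dropin and the file name 50-deny-gpu.conf are hypothetical, and DevicePolicy=closed is my choice rather than something from the linked posts. A closed policy permits only the standard pseudo devices (/dev/null, /dev/zero, /dev/urandom, ptys and the like) plus anything listed with DeviceAllow, so /dev/nvidia* is denied simply by not being listed. It also blocks other devices that a desktop session may need.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that applies a closed device policy
# to a slice. The drop-in directory is passed in, so nothing is written
# outside the directory the caller chooses.
write_gpu_dropin() {
    dropin_dir=$1   # e.g. /etc/systemd/system/user.slice.d
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/50-deny-gpu.conf" <<'EOF'
[Slice]
# Allow only standard pseudo devices; everything else, including
# /dev/nvidia*, is denied unless added with DeviceAllow=.
DevicePolicy=closed
EOF
}
```

As root, this would be run as write_gpu_dropin /etc/systemd/system/user.slice.d followed by systemctl daemon-reload. Running systemctl edit user.slice creates the same kind of drop-in interactively.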

So, what is the correct way to set devices.deny properties in Ubuntu 18.04?

Lennart Poettering, a systemd developer, says that udev is another option for restricting access.

I've started reading the manuals, but udev seems to be aimed mostly at hard drives and USB devices, not video cards.

Can anyone share an example config that sets permissions?

Best Answer

I think I've solved my problem with the environment variable CUDA_VISIBLE_DEVICES.

I've added the line CUDA_VISIBLE_DEVICES=-1 to /etc/environment and thus "kindly asked" interactive sessions not to use the GPUs.
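A small sketch of that change, written so that it can be re-run without duplicating the line. The function name hide_gpus_by_default is hypothetical, and the file to edit is passed as an argument so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helper: make CUDA_VISIBLE_DEVICES=-1 the login default.
hide_gpus_by_default() {
    env_file=$1   # e.g. /etc/environment
    # Append only if no CUDA_VISIBLE_DEVICES line is present yet,
    # so repeated runs do not add duplicate entries.
    grep -q '^CUDA_VISIBLE_DEVICES=' "$env_file" 2>/dev/null ||
        printf 'CUDA_VISIBLE_DEVICES=-1\n' >> "$env_file"
}
```

As root this would be called as hide_gpus_by_default /etc/environment, and it takes effect at the next login. It is advisory only: a user can still run export CUDA_VISIBLE_DEVICES=0 in an interactive shell. It also assumes that SLURM jobs get the variable set for them, which should be the case when the GPUs are configured as a gres resource and the job requests them.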
