I'm trying to monitor a process that uses CUDA and MPI. Is there any way to do this, something like the "top" command but one that monitors the GPU too?
A top-like utility for monitoring CUDA activity on a GPU
Tags: cuda, process-monitoring, resource-monitor
Related Solutions
The answer to this question consists of two parts:
- A program to detect the presence of a CUDA-capable GPU.
- CMake code to compile, run, and interpret the result of that program at configuration time.
For part 1, the GPU sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c:
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int deviceCount, device;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;
    cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
    if (cudaResultCode != cudaSuccess)
        deviceCount = 0;
    /* machines with no GPUs can still report one emulation device */
    for (device = 0; device < deviceCount; ++device) {
        /* skip devices whose properties cannot be read */
        if (cudaGetDeviceProperties(&properties, device) != cudaSuccess)
            continue;
        if (properties.major != 9999) /* 9999 means emulation only */
            ++gpuDeviceCount;
    }
    printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);
    /* don't just return the number of gpus, because other runtime cuda
       errors can also yield non-zero return values */
    if (gpuDeviceCount > 0)
        return 0; /* success */
    else
        return 1; /* failure */
}
Notice that the return code is zero in the case where a cuda-enabled GPU is found. This is because on one of my has-cuda-but-no-GPU machines, this program generates a runtime error with non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".
You might ask why I don't use cuda emulation mode on non-GPU machines. It is because emulation mode is buggy. I only want to debug my code, and work around bugs in cuda GPU code. I don't have time to debug the emulator.
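The exit-code convention described above can also be consumed from a script. The following is a minimal Python sketch rather than part of the original answer: the function name cuda_gpu_available and the path ./has_cuda_gpu are assumptions made for illustration. Any non-zero exit code, or a failure to start the program at all, is treated as "CUDA does not work on this machine".

```python
import subprocess


def cuda_gpu_available(command):
    """Return True only if the detection program exits with code 0.

    `command` is an argument list such as ["./has_cuda_gpu"] (hypothetical path).
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError:
        # The program is missing or not executable: treat as "no GPU".
        return False
    # A non-zero code covers both "no GPU found" and CUDA runtime failures.
    return result.returncode == 0
```

Calling cuda_gpu_available(["./has_cuda_gpu"]) would then give a single boolean, mirroring what the CMake logic later in this answer derives at configuration time.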
The second part of the problem is the CMake code to use this test program. After some struggle, I have figured it out. The following block is part of a larger CMakeLists.txt file:
find_package(CUDA)
if(CUDA_FOUND)
    try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
        ${CMAKE_BINARY_DIR}
        ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
        CMAKE_FLAGS
            -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
            -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
        COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
        RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
    message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
    # COMPILE_RESULT_VAR is TRUE when compile succeeds
    # RUN_RESULT_VAR is zero when a GPU is found
    if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
        set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
    else()
        set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
    endif()
endif(CUDA_FOUND)
This sets a CUDA_HAVE_GPU boolean variable in CMake that can subsequently be used to trigger conditional operations.
It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS stanza, and what the syntax should be. The try_run documentation is very light, but there is more information in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.
Another tricky but important detail is the third argument to try_run, the "bindir". You should probably always set this to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to naturally reside.
AFAIK, JavaCL / OpenCL4Java is the only OpenCL binding that is available on all platforms right now (including MacOS X, FreeBSD, Linux, Windows, Solaris, all in Intel 32, 64 bits and ppc variants, thanks to its use of JNA).
It has demos that actually run fine from Java Web Start at least on Mac and Windows, such as this Particles Demo (to avoid random crashes on Linux, please see this wiki page).
It also comes with a few utilities (GPGPU random number generation, basic parallel reduction, linear algebra) and a Scala DSL.
Finally, these are the oldest bindings available (since June 2009), and they have an active user community.
(Disclaimer: I'm JavaCL's author :-))
Best Answer
To get real-time insight into resource usage, run:
nvidia-smi -l 1
This loops and refreshes the view every second.
If you do not want to keep past traces of the looped call in the console history, you can also do:
watch -n0.1 nvidia-smi
Where 0.1 is the time interval, in seconds.
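For logging or scripting rather than watching a terminal, nvidia-smi also offers a CSV query mode (--query-gpu with --format=csv). The following is a minimal Python sketch, not part of the original answer: it assumes nvidia-smi is on the PATH, and the names QUERY_FIELDS, parse_gpu_csv and sample_gpus are invented for illustration.

```python
import subprocess

# Fields requested from nvidia-smi; the order here fixes the CSV column order.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts.

    Values are kept as strings, since nvidia-smi may print non-numeric
    placeholders for fields a device does not support.
    """
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU (hypothetical helper)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling sample_gpus() in a loop with a one-second sleep gives roughly the same cadence as nvidia-smi -l 1, with the values available for logging alongside the MPI process being monitored.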