Simplify GPU Application Development with HMM on SLES or SLE HPC 15 SP5
Recently, NVIDIA introduced Heterogeneous Memory Management (HMM) in its open source kernel drivers, which simplifies GPU application development with CUDA. HMM unifies system memory access across CPUs and GPUs and removes the need to copy memory contents between CPU and GPU memory. It extends Unified Memory to cover both system-allocated memory and memory allocated with cudaMallocManaged().
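To illustrate what this means in practice, here is a minimal sketch along the lines of NVIDIA's published HMM examples: a kernel dereferences a plain malloc() buffer directly, with no cudaMallocManaged() and no explicit copies (the file name and kernel name are just for illustration):

#include <cstdio>
#include <cstdlib>
#include <cstring>

// hello_hmm.cu: the kernel reads a string that lives in ordinary,
// system-allocated memory -- HMM makes it visible to the GPU.
__global__ void printme(const char *str) {
    printf("%s", str);
}

int main() {
    // Plain system allocation, no CUDA allocator involved.
    char *s = static_cast<char *>(malloc(100));
    strncpy(s, "Hello HMM\n", 99);
    printme<<<1, 1>>>(s);    // pass the malloc'ed pointer straight to the GPU
    cudaDeviceSynchronize();
    free(s);                 // freed like any other CPU memory
    return 0;
}

Once the driver and CUDA toolkit are installed as described below, this can be compiled with nvcc hello_hmm.cu -o hello_hmm.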
You may ask, “How do I make this work on my system?” If you are a SUSE Linux Enterprise Server (SLES) or SUSE Linux Enterprise High Performance Computing (SLE HPC) 15 SP5 user, the open driver is already available to you. Therefore, if you have an NVIDIA chipset with a GPU System Processor (GSP), i.e. NVIDIA Turing or later, we have you covered (openSUSE Leap users, check here). Here is how:
Installation on SLES/SLE HPC 15 SP5
Log into your system as root.
Due to the modular nature of SLES, you need to add two additional modules which are not enabled by default (on SLE HPC, they are enabled already):
SUSEConnect -p sle-module-desktop-applications/15.5/x86_64
SUSEConnect -p sle-module-development-tools/15.5/x86_64
Note that your SLE system needs to be registered for these commands to work. Now add the NVIDIA compute module and install the required packages by running:
SUSEConnect -p sle-module-NVIDIA-compute/15/x86_64
zypper --gpg-auto-import-keys refresh
zypper -n install -y --auto-agree-with-licenses --no-recommends nvidia-open-gfxG05-kmp-default cuda
If you require secure boot or deploy in a public cloud environment, you may want to take advantage of the G06 open kernel drivers, which are pre-built and signed by SUSE and shipped with SLE. To install these drivers, add an additional repository and run the following commands instead of the above:
zypper ar https://download.nvidia.com/suse/sle15sp5/ NVIDIA
SUSEConnect -p sle-module-NVIDIA-compute/15/x86_64 --gpg-auto-import-keys
zypper --gpg-auto-import-keys refresh
zypper -n in -y --auto-agree-with-licenses --no-recommends nvidia-open-driver-G06-signed-kmp-default nvidia-drivers-minimal-G06 cuda
This eliminates the need to enroll a separate Machine Owner Key (MOK) for secure boot as well as a separate build stage when the kernel drivers are installed or updated. It helps to reduce the size of cloud images, since no extra build tools are required. It also installs user-space driver packages which are not yet available in the CUDA repository.
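As an optional sanity check (exact output varies by system), you can confirm the secure boot state and the module signer with standard tools:

mokutil --sb-state
modinfo nvidia | grep -i signer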
Preparation
For chipsets with a display engine (i.e. which have display outputs), the open driver support is still considered alpha. Therefore, you may have to add or uncomment the following option in /etc/modprobe.d/50-nvidia-default.conf:
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
Once these steps have been performed, you may either reboot the system or run:
modprobe nvidia
as root to load all required kernel modules.
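To verify that the modules have been loaded, you can inspect the module list, for example:

lsmod | grep ^nvidia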
Testing the Installation
To check if HMM is available and enabled, query the ‘Addressing Mode’ property:
nvidia-smi -q | grep Addressing
Addressing Mode : HMM
If you see the output above, HMM is available on your system.
Compile HMM Sample Code
NVIDIA discusses some code examples for HMM in its blog post. The examples can be found here on GitHub. Some of these need a newer gcc than the stock version shipped with SLE 15, which you can install with:
zypper -n in -y gcc12-c++
In order to compile the examples, the PATH environment variable needs to be extended to point to the CUDA binaries:
export PATH=/usr/local/cuda/bin/:${PATH}
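You can verify that the CUDA compiler is now found in your PATH:

nvcc --version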
You may now compile the examples under the path src using the following commands:
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 atomic_flag.cpp -o atomic_flag
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 file_after.cpp -o file_after
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 file_before.cpp -o file_before
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 ticket_lock.cpp -o ticket_lock
‘weather_app’ Example
For this example application, the system gcc compiler is sufficient. Only $PATH has to be set to:
export PATH=/usr/local/cuda/bin/:${PATH}
Now, build the binary weather_app by running:
make
The blog post by NVIDIA describes how to obtain the data required to run the app. If you are unable to download the ~1.3 TB of data, you may also use the random data generator from this PR on GitHub. The random data app can be compiled with:
g++ create_random_data.cpp -o create_random_data -O2 -Wall
The application has no command line parameters; the start and end year for the random data have to be set in the source code itself.
NOTE: If your graphics card doesn't have sufficient VRAM to run the original sample code, you may scale down the data size by reducing the input_grid_height and input_grid_width parameters in both create_random_data.cpp and weather_app.cu; see the sketch below.
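For illustration only, such a reduction might look like the following; the parameter names come from the sources, but the declaration style and values shown here are hypothetical placeholders, so derive yours from the defaults you find in the files:

// In create_random_data.cpp and weather_app.cu -- hypothetical values;
// scale down the defaults found in the sources to fit your VRAM:
const size_t input_grid_height = 4096;  // placeholder, smaller than the default
const size_t input_grid_width  = 8192;  // placeholder, smaller than the default

Halving both dimensions quarters the memory footprint; make sure the same values are used in both files.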
To do a sample run:
mkdir binary_1hr_all
./weather_app 1981 1982 binary_1hr_all/
NOTE: The Makefile doesn't compile CUDA kernels for the NVIDIA Turing GPUs and also has faulty error message handling. You might want to check out https://github.com/NVIDIA/HMM_sample_code/pull/2, which fixes these issues.
Summary
- The NVIDIA open driver provides HMM (Heterogeneous Memory Management), which extends the simplicity of the CUDA Unified Memory programming model even further on supported chipsets by including system-allocated memory.
- HMM is available for SLES and SLE HPC 15 SP5.
- The open driver allows for pre-built kernel drivers signed by SUSE.
- This greatly simplifies the installation in a secure boot environment.
- It streamlines the installation in public cloud environments by eliminating an extra build stage and reducing the size of the final image.
- We have demonstrated how to install and test HMM on SLES and SLE HPC 15 SP5.