CUDA and HIP
rules_ll
fully automates the setups for heterogeneous toolchains. This lets
you build CUDA and HIP code with minimal adjustments to your build files.
At the moment rules_ll
supports Nvidia and AMD GPUs.
You can find examples at rules_ll/examples
.
Warning
This feature is still under heavy development. APIs will frequently change.
Prerequisites
You do not need to install the CUDA Toolkit or the ROCm stack to use the
heterogeneous toolchains - rules_ll
does that for you. You do need to
install an Nvidia or AMD driver on the system that runs the executables though.
Since rules_ll
bumps versions rather aggressively make sure to use the latest
drivers.
You don't need a GPU on the machine that builds the heterogeneous targets.
Example
Heterogeneous targets need to know about three things:
- The framework that you used to write your code. At the moment you can use CUDA and HIP. OpenMP and SYCL planned for future releases.
- The GPU target architecture. At the moment,
rules_ll
supportsnvptx
for Nvidia GPUs andamdgpu
for AMD GPUs.spirv
for Intel GPUs planned for future releases. - The offload architectures of the target GPU models. Also known as compute capability. You can find a list of Nvidia GPU models and corresponding compute capabilities here and a list of AMD GPU models and corresponding compute capabilities here.
The ll_library
and ll_binary
rules have a compilation_mode
attribute which
you can set according to the scheme <framework>_<target_arch>
:
Framework | Target Architecture | compilation_mode |
---|---|---|
CUDA | NVPTX | cuda_nvptx |
HIP | NVPTX | hip_nvptx |
HIP | AMDGPU | hip_amdgpu |
To offload to specific architectures, add the corresponding architecture to
compile_flags
with the --offload-arch
flag.
Target Architecture | Supported offload-arch |
Example compile_flags |
---|---|---|
NVPTX | 5.2 to 9.0 | --offload-arch=sm_52 |
AMDGPU | GFX8 to GFX11 | --offload-arch=gfx1103 |
For instance, to build HIP code for an Nvidia Titan V with compute capability 7.0 you could write a target like this:
load("@rules_ll//ll:defs.bzl", "ll_binary")
ll_binary(
name = "my_hip_nvptx_target",
srcs = ["main.cpp"],
compilation_mode = "hip_nvptx",
compile_flags = [
"--offload-arch=sm_70",
],
)
For an AMD RX 7900 XT with compute capability GFX11 you could write a target like this:
load("@rules_ll//ll:defs.bzl", "ll_binary")
ll_binary(
name = "my_hip_amdgpu_target",
srcs = ["main.cpp"],
compilation_mode = "hip_amdgpu",
compile_flags = [
"--offload-arch=gfx1100",
],
)
Targeting all available architectures
Use the OFFLOAD_ALL_NVPTX
shortcut to target all supported NVPTX offload
architectures:
load("@rules_ll//ll:defs.bzl", "OFFLOAD_ALL_NVPTX", "ll_binary")
ll_binary(
name = "my_hip_nvptx_target",
srcs = ["main.cpp"],
compilation_mode = "hip_nvptx",
compile_flags = OFFLOAD_ALL_NVPTX,
)
Use the OFFLOAD_ALL_AMDGPU
shortcut to target all supported AMDGPU offload
architectures:
load("@rules_ll//ll:defs.bzl", "OFFLOAD_ALL_AMDGPU", "ll_binary")
ll_binary(
name = "my_hip_amdgpu_target",
srcs = ["main.cpp"],
compilation_mode = "hip_amdgpu",
compile_flags = OFFLOAD_ALL_AMDGPU,
)
Relocatable device code
To build relocatable device code,
add -fgpu-rdc
to compile_flags
. This lets you split device code into
different files for a cleaner repository layout. Note that this comes at the
cost of an often negligible runtime performance penalty:
ll_library(
name = "my_device_code",
srcs = ["device_code.cpp"],
exposed_hdrs = ["device_code_declaration.hpp"],
compilation_mode = "hip_nvptx",
compile_flags = [
"--offload-arch=sm_70",
"-fgpu-rdc",
],
)
ll_binary(
name = "my_hip_nvidia_target",
srcs = ["main.cpp"],
compilation_mode = "hip_nvptx",
compile_flags = [
"--offload-arch=sm_70",
"-fgpu-rdc",
],
deps = [
":my_device_code",
],
)
Caveats
C++ modules don't work with heterogeneous code yet.
Targeting both NVPTX and AMDGPU in a single codebase requires separate targets,
making build files somewhat verbose. rules_ll
plans to change the API for
heterogeneous compilation to use platforms so that select
becomes viable for
such use cases.
Confusingly, the compilation_mode
flag in ll_*
targets has the name as the
unrelated --compilation_mode
flag for Bazel. Planned to change in the future.