CUDA and HIP
rules_ll fully automates the setup of heterogeneous toolchains. This lets
you build CUDA and HIP code with minimal adjustments to your build files.
At the moment rules_ll supports Nvidia and AMD GPUs.
You can find examples at rules_ll/examples.
Warning
This feature is still under heavy development. APIs will frequently change.
Prerequisites
You do not need to install the CUDA Toolkit or the ROCm stack to use the
heterogeneous toolchains - rules_ll does that for you. You do, however, need
to install an Nvidia or AMD driver on the system that runs the executables.
Since rules_ll bumps versions rather aggressively, make sure to use the latest
drivers.
You don't need a GPU on the machine that builds the heterogeneous targets.
Example
Heterogeneous targets need to know about three things:
- The framework that you used to write your code. At the moment you can use CUDA and HIP. OpenMP and SYCL are planned for future releases.
- The GPU target architecture. At the moment, rules_ll supports nvptx for Nvidia GPUs and amdgpu for AMD GPUs. spirv for Intel GPUs is planned for future releases.
- The offload architectures of the target GPU models, also known as compute capabilities. You can find a list of Nvidia GPU models and corresponding compute capabilities here and a list of AMD GPU models and corresponding compute capabilities here.
The ll_library and ll_binary rules have a compilation_mode attribute which
you can set according to the scheme <framework>_<target_arch>:
| Framework | Target Architecture | compilation_mode |
|---|---|---|
| CUDA | NVPTX | cuda_nvptx |
| HIP | NVPTX | hip_nvptx |
| HIP | AMDGPU | hip_amdgpu |
To offload to specific architectures, add the corresponding architecture to
compile_flags with the --offload-arch flag.
| Target Architecture | Supported offload-arch | Example compile_flags |
|---|---|---|
| NVPTX | 5.2 to 9.0 | --offload-arch=sm_52 |
| AMDGPU | GFX8 to GFX11 | --offload-arch=gfx1103 |
For instance, to build HIP code for an Nvidia Titan V with compute capability 7.0 you could write a target like this:
load("@rules_ll//ll:defs.bzl", "ll_binary")

ll_binary(
    name = "my_hip_nvptx_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = [
        "--offload-arch=sm_70",
    ],
)
For an AMD RX 7900 XT with compute capability GFX11 you could write a target like this:
load("@rules_ll//ll:defs.bzl", "ll_binary")

ll_binary(
    name = "my_hip_amdgpu_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_amdgpu",
    compile_flags = [
        "--offload-arch=gfx1100",
    ],
)
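The compilation_mode table also lists cuda_nvptx, which the two examples above don't cover. A minimal sketch of such a target follows. The target name is hypothetical, and it assumes the CUDA source can be listed in srcs as main.cpp in the same way as the HIP examples. Check rules_ll/examples for the layout that rules_ll actually uses.

```starlark
load("@rules_ll//ll:defs.bzl", "ll_binary")

# Hypothetical CUDA target for an Nvidia Titan V (compute capability 7.0).
ll_binary(
    name = "my_cuda_nvptx_target",
    # Assumption: CUDA sources use the same srcs attribute as HIP sources.
    srcs = ["main.cpp"],
    compilation_mode = "cuda_nvptx",
    compile_flags = [
        "--offload-arch=sm_70",
    ],
)
```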
Targeting all available architectures
Use the OFFLOAD_ALL_NVPTX shortcut to target all supported NVPTX offload
architectures:
load("@rules_ll//ll:defs.bzl", "OFFLOAD_ALL_NVPTX", "ll_binary")

ll_binary(
    name = "my_hip_nvptx_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = OFFLOAD_ALL_NVPTX,
)
Use the OFFLOAD_ALL_AMDGPU shortcut to target all supported AMDGPU offload
architectures:
load("@rules_ll//ll:defs.bzl", "OFFLOAD_ALL_AMDGPU", "ll_binary")

ll_binary(
    name = "my_hip_amdgpu_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_amdgpu",
    compile_flags = OFFLOAD_ALL_AMDGPU,
)
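If you only need a few architectures rather than all of them, you can likely list several --offload-arch flags explicitly. The sketch below assumes that rules_ll forwards compile_flags to Clang unchanged; Clang itself accepts repeated --offload-arch flags. The target name and the choice of sm_70 and sm_80 are illustrative only.

```starlark
load("@rules_ll//ll:defs.bzl", "ll_binary")

# Hypothetical target that embeds device code for two NVPTX architectures.
ll_binary(
    name = "my_hip_nvptx_subset_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = [
        # One flag per architecture that the binary should support.
        "--offload-arch=sm_70",
        "--offload-arch=sm_80",
    ],
)
```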
Relocatable device code
To build relocatable device code,
add -fgpu-rdc to compile_flags. This lets you split device code into
different files for a cleaner repository layout. Note that this comes at the
cost of an often negligible runtime performance penalty:
load("@rules_ll//ll:defs.bzl", "ll_binary", "ll_library")

# Device code compiled separately from the code that calls it.
ll_library(
    name = "my_device_code",
    srcs = ["device_code.cpp"],
    exposed_hdrs = ["device_code_declaration.hpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = [
        "--offload-arch=sm_70",
        "-fgpu-rdc",
    ],
)

# Binary that depends on the relocatable device code above.
ll_binary(
    name = "my_hip_nvidia_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = [
        "--offload-arch=sm_70",
        "-fgpu-rdc",
    ],
    deps = [
        ":my_device_code",
    ],
)
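The example above passes the same flags to the library and to the binary. One way to keep the two in sync is a shared list in the build file. This is a sketch only: HIP_NVPTX_RDC_FLAGS is a hypothetical name and not part of rules_ll.

```starlark
load("@rules_ll//ll:defs.bzl", "ll_binary", "ll_library")

# Hypothetical shared constant so that both targets use identical flags.
HIP_NVPTX_RDC_FLAGS = [
    "--offload-arch=sm_70",
    "-fgpu-rdc",
]

ll_library(
    name = "my_device_code",
    srcs = ["device_code.cpp"],
    exposed_hdrs = ["device_code_declaration.hpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = HIP_NVPTX_RDC_FLAGS,
)

ll_binary(
    name = "my_hip_nvidia_target",
    srcs = ["main.cpp"],
    compilation_mode = "hip_nvptx",
    compile_flags = HIP_NVPTX_RDC_FLAGS,
    deps = [":my_device_code"],
)
```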
Caveats
C++ modules don't work with heterogeneous code yet.
Targeting both NVPTX and AMDGPU in a single codebase requires separate targets,
making build files somewhat verbose. rules_ll plans to change the API for
heterogeneous compilation to use platforms so that select becomes viable for
such use cases.
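Until then, a list comprehension in the build file can reduce some of the repetition. The following is a sketch under assumptions: HIP_TARGETS is a hypothetical name, and the offload architectures are the ones used in the earlier examples.

```starlark
load("@rules_ll//ll:defs.bzl", "ll_binary")

# Hypothetical mapping from compilation_mode to the flags used for it.
HIP_TARGETS = {
    "hip_nvptx": ["--offload-arch=sm_70"],
    "hip_amdgpu": ["--offload-arch=gfx1100"],
}

# Declares my_hip_nvptx_target and my_hip_amdgpu_target from the same source.
[
    ll_binary(
        name = "my_{}_target".format(mode),
        srcs = ["main.cpp"],
        compilation_mode = mode,
        compile_flags = flags,
    )
    for mode, flags in HIP_TARGETS.items()
]
```

This still creates one target per architecture family. It only avoids writing each target out by hand.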
Confusingly, the compilation_mode attribute in ll_* targets has the same name as
the unrelated --compilation_mode flag for Bazel. This is planned to change in the future.