Emitters#
Common#
Common utilities for emitting CUTLASS kernels
PyTorch#
Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel.
If requested via the `jit` parameter, the extension can be JIT compiled via PyTorch's `cpp_extension.load`
method.
Example usage with JIT compilation:

```python
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = mod.run(A, B, C)
```
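For reference, the emitted GEMM computes the standard epilogue D = alpha * (A @ B) + beta * C. The pure-Python sketch below is only a reference implementation of that formula, not part of the CUTLASS API, and it assumes alpha = beta = 1 for illustration:

```python
# Reference implementation of the epilogue D = alpha * (A @ B) + beta * C
# computed by the emitted GEMM module. alpha = beta = 1 is assumed here
# purely for illustration; the emitted module's defaults may differ.
def gemm_reference(A, B, C, alpha=1.0, beta=1.0):
    m, k = len(A), len(B)
    n = len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]

# 4x4 all-ones inputs: each output entry is 4 (dot product of ones) + 1 (from C)
A = B = C = [[1.0] * 4 for _ in range(4)]
D = gemm_reference(A, B, C)
print(D[0][0])  # 5.0
```

Comparing the CUDA module's output against such a reference (e.g. with `torch.allclose`) is a quick sanity check after building.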
Example usage without JIT compilation:

```python
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')
```
After this call, the directory `output` contains `setup.py`, `cutlass_gemm.cpp`, and `cutlass_gemm_kernel.cu`. The module can be built from within `output` by running:

```shell
TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user
```
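To make the build step concrete, a `setup.py` for a PyTorch CUDA extension typically follows the pattern below. This is a generic sketch of PyTorch's `torch.utils.cpp_extension` build configuration, not the exact file the emitter writes:

```python
# Sketch of a typical setup.py for a PyTorch CUDA extension; the file
# actually generated by cutlass.emit.pytorch may differ in its details.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='cutlass_gemm',
    ext_modules=[
        CUDAExtension(
            name='cutlass_gemm',
            # C++ binding plus the CUDA kernel source
            sources=['cutlass_gemm.cpp', 'cutlass_gemm_kernel.cu'],
        )
    ],
    # BuildExtension handles mixed C++/CUDA compilation with nvcc
    cmdclass={'build_ext': BuildExtension},
)
```

The `TORCH_CUDA_ARCH_LIST` environment variable in the build command restricts compilation to the listed architectures (here `8.0`, matching the `cc=80` passed to the emitter).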
The module can later be used in Python via:

```python
import torch
import cutlass_gemm

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = cutlass_gemm.run(A, B, C)
```
- cutlass.emit.pytorch.pytorch(op, name, cc, jit=False, sourcedir='')[source]#

Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel specified by `op`. If the `jit` parameter is set to true, the module is just-in-time compiled, loaded, and returned. The result of this method is a set of files within `sourcedir` that can be used for building a PyTorch module.

- Parameters:
  - op – operation to emit in the module
  - name (str) – name of the module to generate
  - cc (int) – compute capability of the device the module should target
  - jit (bool) – whether the module should be just-in-time compiled
  - sourcedir (str) – directory to which generated source files should be written
- Returns:
  loaded PyTorch module (if `jit=True`) or `None`
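The `cc` argument is the device's compute capability folded into a single integer (e.g. 80 for SM80/Ampere, as in the examples above). The helper below is hypothetical, not part of the CUTLASS API; on a live device the major/minor pair would come from PyTorch's `torch.cuda.get_device_capability()`:

```python
def compute_capability(major: int, minor: int) -> int:
    """Fold a (major, minor) CUDA compute capability into the single
    integer form expected by the cc parameter, e.g. (8, 0) -> 80.

    Hypothetical helper for illustration; not part of the CUTLASS API.
    """
    return major * 10 + minor

# On a live device: major, minor = torch.cuda.get_device_capability()
# Hard-coded here so the sketch runs without a GPU.
print(compute_capability(8, 0))  # 80 targets SM80 (Ampere)
```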