Emitters

Common

Common utilities for emitting CUTLASS kernels.

PyTorch

Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel. If requested, the extension can be JIT compiled via PyTorch's cpp_extension.load method.

Example usage with JIT compilation:

import cutlass
import torch

plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = mod.run(A, B, C)
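As a sanity check on what the example computes: assuming the emitted module's run implements CUTLASS's default linear-combination epilogue D = alpha * (A @ B) + beta * C with alpha = beta = 1 (an assumption about the generated code, not stated above), every element of D for the all-ones 512×512 inputs would be 513. A minimal NumPy sketch of that reference computation:

```python
import numpy as np

# Reference computation for the GEMM example above, assuming the emitted
# module computes D = alpha * (A @ B) + beta * C with alpha = beta = 1.
# (The epilogue and its defaults are assumptions made for illustration.)
M = N = K = 512
A = np.ones((M, K), dtype=np.float32)
B = np.ones((K, N), dtype=np.float32)
C = np.ones((M, N), dtype=np.float32)

D_ref = A @ B + C  # each element is K * 1 + 1 = 513.0
```

Comparing D from the CUTLASS module against such a reference (e.g. with torch.allclose) is a common way to validate a freshly emitted kernel.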

Example usage without JIT compilation:

import cutlass
import torch

plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')

After this call, the output directory contains setup.py, cutlass_gemm.cpp, and cutlass_gemm_kernel.cu. The module can be built from within that directory by running: TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user.
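The "8.0" in TORCH_CUDA_ARCH_LIST corresponds to the integer compute capability 80 passed to cutlass.emit.pytorch. A hypothetical helper (not part of the CUTLASS API) sketching that mapping:

```python
def cc_to_arch_list(cc: int) -> str:
    """Format an integer compute capability (e.g. 80) as the
    'major.minor' string expected by TORCH_CUDA_ARCH_LIST (e.g. '8.0').

    Hypothetical helper for illustration; not part of cutlass.emit.
    """
    return f"{cc // 10}.{cc % 10}"
```

For example, cc_to_arch_list(80) yields '8.0' (Ampere A100) and cc_to_arch_list(90) yields '9.0' (Hopper H100).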

The module can later be used in Python via:

import torch
import cutlass_gemm

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = cutlass_gemm.run(A, B, C)
cutlass.emit.pytorch.pytorch(op, name, cc, jit=False, sourcedir='')

Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel specified by op. If jit is True, the module is just-in-time compiled, loaded, and returned.

This method writes source files to sourcedir that can be used to build a PyTorch module.

Parameters:
  • op – operation to emit in the module

  • name (str) – name of the module to generate

  • cc (int) – compute capability of the device the module should target

  • jit (bool) – whether the module should be just-in-time compiled

  • sourcedir (str) – directory to which generated source files should be written

Returns:

the loaded PyTorch module if jit=True, otherwise None