vllm.model_executor.kernels.linear.mxfp8.Mxfp8LinearKernel ¶
Mxfp8LinearKernel ¶
Bases: ABC
Base class for MXFP8 quantized linear kernels.
Each subclass implements a specific GEMM backend (FlashInfer CUTLASS, Marlin, or emulation).
Source code in vllm/model_executor/kernels/linear/mxfp8/Mxfp8LinearKernel.py
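The base class above can be sketched as an abstract interface plus a software fallback. This is a hedged illustration, not the actual vLLM code: the method name `apply`, the `EmulationKernel` subclass, and the use of NumPy (instead of the real CUDA/Torch kernels) are all assumptions made for clarity. It shows the one fact the page does state: each backend implements the same quantized-GEMM contract, with emulation dequantizing in software.

```python
from abc import ABC, abstractmethod

import numpy as np


class Mxfp8LinearKernel(ABC):
    """Sketch of the backend interface; the method name is an assumption."""

    @abstractmethod
    def apply(self, x: np.ndarray, weight: np.ndarray,
              scales: np.ndarray) -> np.ndarray:
        """Compute x @ dequantize(weight, scales).T for one linear layer."""


class EmulationKernel(Mxfp8LinearKernel):
    """Hypothetical fallback backend: dequantize in software, then matmul."""

    BLOCK = 32  # MXFP8 scaling-block size along the input dimension

    def apply(self, x, weight, scales):
        # E8M0 scales are biased power-of-two exponents stored as uint8.
        factors = np.exp2(scales.astype(np.int32) - 127)
        # Broadcast one scale over each 32-element block of the input dim,
        # then run a plain float32 matmul.
        w = weight.astype(np.float32) * np.repeat(factors, self.BLOCK, axis=-1)
        return x @ w.T
```

A hardware backend (FlashInfer CUTLASS, Marlin) would implement the same `apply` contract but dispatch to a fused quantized GEMM instead of dequantizing first.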
Mxfp8LinearLayerConfig dataclass ¶
Configuration for an MXFP8 linear layer.
All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32.
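Given that shared structure, the config can be sketched as a small dataclass. The field names (`in_features`, `out_features`, `block_size`) and the `num_scale_blocks` helper are illustrative assumptions, not the real vLLM fields; the sketch only encodes what the page states: one uint8 (E8M0) scale per 32-element block of FP8-E4M3 weights.

```python
from dataclasses import dataclass


@dataclass
class Mxfp8LinearLayerConfig:
    """Hedged sketch; field names are assumptions for illustration."""

    in_features: int
    out_features: int
    block_size: int = 32  # MXFP8 fixes the scaling-block size at 32

    @property
    def num_scale_blocks(self) -> int:
        # One uint8 (E8M0) scale per block along the input dimension;
        # ceiling division handles input sizes not divisible by 32.
        return -(-self.in_features // self.block_size)
```

For a 4096-wide input, each output row carries 4096 / 32 = 128 scale bytes alongside its 4096 FP8 weight bytes.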