vllm.model_executor.kernels.linear.mxfp8

Modules:

Name Description
Mxfp8LinearKernel
emulation
flashinfer
marlin
xpu

Mxfp8LinearLayerConfig dataclass

Configuration for an MXFP8 linear layer.

All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32.

Source code in vllm/model_executor/kernels/linear/mxfp8/Mxfp8LinearKernel.py
@dataclass
class Mxfp8LinearLayerConfig:
    """Configuration for an MXFP8 linear layer.

    All MXFP8 layers share the same structure: FP8-E4M3 weights with
    uint8 (E8M0) per-block scales at block size 32.
    """

    pass
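
The layout described in the docstring can be illustrated with a minimal pure-Python sketch. It is not part of vLLM: the helper names (`e8m0_to_float`, `scale_shape`, `dequantize_row`) are hypothetical, the E8M0 decoding (bias 127, `0xFF` reserved for NaN) follows the OCP Microscaling convention, and grouping blocks along the input dimension is an assumption. Element values are passed in as already-decoded floats, so the sketch does not decode FP8-E4M3 bit patterns.

```python
import math

BLOCK_SIZE = 32   # block size stated in the docstring
E8M0_BIAS = 127   # assumption: exponent bias from the OCP Microscaling convention


def e8m0_to_float(code: int) -> float:
    """Decode one uint8 E8M0 scale into its power-of-two value."""
    if not 0 <= code <= 255:
        raise ValueError("E8M0 scale must fit in a uint8")
    if code == 255:
        return math.nan  # 0xFF is reserved for NaN
    return 2.0 ** (code - E8M0_BIAS)


def scale_shape(out_features: int, in_features: int) -> tuple[int, int]:
    """Shape of the scale tensor for an [out_features, in_features] weight.

    Assumes one scale per 32 consecutive elements along the input dimension.
    """
    if in_features % BLOCK_SIZE != 0:
        raise ValueError("in_features must be a multiple of the block size")
    return (out_features, in_features // BLOCK_SIZE)


def dequantize_row(values: list[float], scale_codes: list[int]) -> list[float]:
    """Multiply each block of 32 decoded element values by its block scale."""
    if len(values) != len(scale_codes) * BLOCK_SIZE:
        raise ValueError("expected exactly one scale per block of 32 values")
    out = []
    for block_idx, code in enumerate(scale_codes):
        scale = e8m0_to_float(code)
        start = block_idx * BLOCK_SIZE
        out.extend(v * scale for v in values[start:start + BLOCK_SIZE])
    return out
```

Under these assumptions, a weight with 4096 output features and 1024 input features would carry a `(4096, 32)` array of uint8 scales, and a scale byte of 127 leaves its block unchanged because it decodes to 1.0.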