vllm.model_executor.kernels.linear.mxfp8
Modules:
| Name | Description |
|---|---|
| Mxfp8LinearKernel | |
| emulation | |
| flashinfer | |
| marlin | |
| xpu | |
Mxfp8LinearLayerConfig dataclass
Configuration for an MXFP8 linear layer.
All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32.
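The block-scaled layout described above can be illustrated with a small pure-Python sketch. It is not taken from vLLM: the helper names (`e8m0_decode`, `e8m0_scale_for_block`, `dequantize_row`) are hypothetical, the scale selection follows the power-of-two rule from the OCP Microscaling format as one possible choice, and the weight values are assumed to be already decoded from FP8-E4M3 to Python floats. It also assumes each scale covers 32 contiguous elements of a row.

```python
import math

BLOCK_SIZE = 32   # elements sharing one scale, as stated in the config description
E8M0_BIAS = 127   # E8M0 stores only a biased power-of-two exponent
E4M3_EMAX = 8     # largest E4M3 exponent: max normal value is 448 = 1.75 * 2**8


def e8m0_decode(byte: int) -> float:
    """Decode one uint8 E8M0 scale to a float (0xFF is reserved for NaN)."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - E8M0_BIAS)


def e8m0_scale_for_block(block: list[float]) -> int:
    """Pick an E8M0 byte for one block (hypothetical helper, OCP-style rule)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return E8M0_BIAS  # arbitrary choice: any scale represents an all-zero block
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    exponent = (e - 1) - E4M3_EMAX
    # Clamp to the valid E8M0 range, keeping 0xFF free for NaN.
    return min(max(exponent + E8M0_BIAS, 0), 254)


def dequantize_row(values: list[float], scale_bytes: list[int]) -> list[float]:
    """Multiply each decoded E4M3 value by the scale of its 32-element block."""
    assert len(values) == len(scale_bytes) * BLOCK_SIZE
    return [
        v * e8m0_decode(scale_bytes[i // BLOCK_SIZE])
        for i, v in enumerate(values)
    ]
```

Under these assumptions, a row of 64 weights carries two scale bytes, so `dequantize_row([1.0] * 64, [127, 128])` leaves the first block unchanged and doubles the second.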