vllm.v1.worker.encoder_cudagraph_defs
Data transfer objects for encoder CUDA graph management.
EncoderCudaGraphCaptureInputs dataclass
Everything needed for one CUDA graph capture.
Returned by prepare_encoder_cudagraph_capture_inputs().
Source code in vllm/v1/worker/encoder_cudagraph_defs.py
buffers instance-attribute
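Precomputed tensor buffers that are recorded into the CUDA graph. The manager keeps references to these exact tensor objects and, before each graph.replay() call, copies new data into them in place instead of allocating new tensors (the buffer identity invariant). This matters because a captured CUDA graph reads from the fixed memory addresses it was recorded with.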
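A minimal, framework-free sketch of why buffer identity matters. It assumes a toy `FakeGraph` class (hypothetical, not part of vLLM) that, like a real CUDA graph, holds on to the exact buffer objects it was captured with. Plain Python lists stand in for tensors, and slice assignment stands in for an in-place tensor copy.

```python
class FakeGraph:
    """Hypothetical stand-in for a captured CUDA graph (not a vLLM class)."""

    def __init__(self, buffers: dict[str, list[float]]) -> None:
        # Keep references to the exact buffer objects, the way a real graph
        # keeps the device addresses it was recorded with.
        self._captured = dict(buffers)

    def replay(self) -> float:
        # Stand-in for the recorded kernels: sum every captured buffer.
        return sum(sum(buf) for buf in self._captured.values())


def copy_into(dst: list[float], src: list[float]) -> None:
    """Update dst in place so its identity is preserved (cf. tensor.copy_())."""
    assert len(src) == len(dst), "sketch assumes equal sizes"
    dst[:] = src


buffers = {"pixel_values": [0.0, 0.0, 0.0]}
graph = FakeGraph(buffers)

copy_into(buffers["pixel_values"], [1.0, 2.0, 3.0])
first = graph.replay()   # 6.0: the graph sees the in-place update

buffers["pixel_values"] = [10.0, 10.0, 10.0]  # rebinding breaks the invariant
second = graph.replay()  # still 6.0: the graph still reads the old list
```

Rebinding the dictionary entry to a new object is silently ignored by the "graph", which is the failure mode the buffer identity invariant guards against.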
EncoderCudaGraphConfig dataclass
Configuration for encoder CUDA graph management.
Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.
Source code in vllm/v1/worker/encoder_cudagraph_defs.py
buffer_keys instance-attribute
Keys of the tensor buffers recorded into the CUDA graph. Before each replay, the manager zeroes these buffers and then slice-copies the new data into them.
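A sketch of the zero-then-slice-copy step, assuming each buffer was allocated at the largest captured size and the new data never exceeds it. `refresh_buffer` is a hypothetical helper and lists stand in for tensors; on a PyTorch tensor the analogous calls would be `buf.zero_()` followed by `buf[:n].copy_(src)`.

```python
def refresh_buffer(buf: list[float], new_data: list[float]) -> None:
    """Hypothetical helper: zero the whole buffer, then copy new data into its prefix."""
    if len(new_data) > len(buf):
        raise ValueError("new data does not fit in the captured buffer")
    # Zero first so entries left over from a previous, larger batch do not
    # leak into this replay.
    for i in range(len(buf)):
        buf[i] = 0.0
    # Slice-copy the new data into the leading entries, in place.
    buf[: len(new_data)] = new_data


buf = [0.0] * 6                      # allocated once at the largest size
buf_id = id(buf)                     # identity must survive every refresh
refresh_buffer(buf, [1.0, 2.0, 3.0, 4.0])
refresh_buffer(buf, [5.0, 6.0])      # smaller batch on the next replay
# buf == [5.0, 6.0, 0.0, 0.0, 0.0, 0.0]
```

Without the zeroing pass, the second refresh would leave `3.0, 4.0` from the earlier batch in the padded tail.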
input_key_by_modality instance-attribute
Per-modality input tensor key mapping, e.g. {"image": "pixel_values", "video": "pixel_values_videos"}.
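A minimal sketch of the configuration shape, assuming a frozen dataclass as one way to express "fixed for the lifetime of the manager". The class name `EncoderCudaGraphConfigSketch`, the field types, the `input_key` helper, and the buffer key names are all hypothetical; only the two attribute names and the modality mapping example come from this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EncoderCudaGraphConfigSketch:
    # Field names follow the documented attributes; the types are assumptions.
    buffer_keys: tuple[str, ...]
    input_key_by_modality: dict[str, str] = field(default_factory=dict)

    def input_key(self, modality: str) -> str:
        # Hypothetical helper: resolve which input tensor feeds the encoder.
        try:
            return self.input_key_by_modality[modality]
        except KeyError:
            raise ValueError(f"no CUDA graph input for {modality!r}") from None


config = EncoderCudaGraphConfigSketch(
    buffer_keys=("pos_embeds", "cu_seqlens"),  # hypothetical keys, for illustration only
    input_key_by_modality={"image": "pixel_values", "video": "pixel_values_videos"},
)
key = config.input_key("video")  # "pixel_values_videos"
```

Freezing the dataclass makes accidental reassignment of a field raise at runtime; it does not stop in-place mutation of the mapping itself.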
EncoderCudaGraphReplayBuffers dataclass
New buffer values for graph replay, computed by the model from actual batch inputs.
Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.
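Because the replay values must line up with `EncoderCudaGraphConfig.buffer_keys`, a caller could check the key sets before copying anything. The sketch below assumes the replay values live in a plain `buffers` dict; `ReplayBuffersSketch`, `key_mismatch`, and the key names are hypothetical illustrations, not vLLM APIs.

```python
from dataclasses import dataclass


@dataclass
class ReplayBuffersSketch:
    # Hypothetical shape: new buffer values keyed by the same names as
    # EncoderCudaGraphConfig.buffer_keys (plain lists stand in for tensors).
    buffers: dict[str, list[float]]


def key_mismatch(
    replay: ReplayBuffersSketch, buffer_keys: tuple[str, ...]
) -> tuple[list[str], list[str]]:
    """Return (missing, extra) keys relative to the configured buffer_keys."""
    expected, actual = set(buffer_keys), set(replay.buffers)
    return sorted(expected - actual), sorted(actual - expected)


keys = ("pos_embeds", "cu_seqlens")  # hypothetical keys, for illustration only
good = ReplayBuffersSketch({"pos_embeds": [0.1, 0.2], "cu_seqlens": [0.0, 2.0]})
bad = ReplayBuffersSketch({"pos_embeds": [0.1, 0.2]})
# key_mismatch(good, keys) == ([], [])
# key_mismatch(bad, keys) == (["cu_seqlens"], [])
```

Returning both lists, rather than raising immediately, lets the caller report every mismatch in one error message.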