vllm.utils.cpu_triton_utils
Contains replacement functions that serve as fallbacks for Triton kernels in the CPU backend.
_copy_and_expand_eagle_inputs_kernel_impl
_copy_and_expand_eagle_inputs_kernel_impl(
    target_token_ids_ptr,
    target_positions_ptr,
    next_token_ids_ptr,
    out_input_ids_ptr,
    out_positions_ptr,
    out_is_rejected_token_mask_ptr,
    out_is_masked_token_mask_ptr,
    out_new_token_indices_ptr,
    out_hidden_state_mapping_ptr,
    query_start_loc_ptr,
    query_end_loc_ptr,
    padding_token_id,
    parallel_drafting_token_id,
    total_input_tokens,
    num_padding_slots_per_request,
    shift_input_ids,
    BLOCK_SIZE_TOKENS=None,
    BLOCK_SIZE_REQS=None,
)
Adapter between the Triton kernel calling convention and the C++ implementation.
The Triton kernel uses parameter names with a '_ptr' suffix and takes compile-time constants (BLOCK_SIZE_TOKENS, BLOCK_SIZE_REQS) that the C++ implementation does not need. The C++ code reads token ID tensors as int64_t*, so any int32 output tensor needs a copy-back after the C++ code writes int64 values.
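The adapter pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not vLLM's code: `to_cpp_kwargs`, `fake_cpp_copy`, and `triton_to_cpp_adapter` are invented names, Python's `array` module stands in for tensors (typecode "q" for int64, "i" for int32), and the toy C++ stand-in only copies token IDs instead of performing the real EAGLE input expansion.

```python
from array import array

# Triton-only compile-time constants that the (hypothetical) C++ side does not accept.
_TRITON_ONLY_PARAMS = {"BLOCK_SIZE_TOKENS", "BLOCK_SIZE_REQS"}


def to_cpp_kwargs(triton_kwargs):
    """Drop Triton block-size constants and strip the '_ptr' suffix.

    Example: 'out_input_ids_ptr' -> 'out_input_ids'.
    """
    return {
        name.removesuffix("_ptr"): value
        for name, value in triton_kwargs.items()
        if name not in _TRITON_ONLY_PARAMS
    }


def fake_cpp_copy(target_token_ids, out_input_ids):
    """Hypothetical stand-in for the C++ op: it only accepts int64 ("q") buffers."""
    if target_token_ids.typecode != "q" or out_input_ids.typecode != "q":
        raise TypeError("C++ stand-in expects int64 buffers")
    for i, tok in enumerate(target_token_ids):
        out_input_ids[i] = tok


def triton_to_cpp_adapter(**triton_kwargs):
    """Accept Triton-style arguments and call the int64-only stand-in."""
    kwargs = to_cpp_kwargs(triton_kwargs)
    out32 = kwargs["out_input_ids"]
    # Widen the caller's int32 output into a temporary int64 buffer.
    tmp64 = array("q", out32.tolist())
    kwargs["out_input_ids"] = tmp64
    fake_cpp_copy(**kwargs)
    # Copy the int64 results back into the caller's int32 buffer.
    out32[:] = array("i", tmp64.tolist())


ids = array("q", [11, 12, 13])
out = array("i", [0, 0, 0])
triton_to_cpp_adapter(
    target_token_ids_ptr=ids,
    out_input_ids_ptr=out,
    BLOCK_SIZE_TOKENS=128,
    BLOCK_SIZE_REQS=None,
)
print(list(out))  # [11, 12, 13]
```

Keeping the renaming in one helper lets the caller keep Triton-style keyword arguments unchanged, while the widen-then-narrow step confines the int64 requirement to the call boundary.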