vllm.v1.worker.ubatch_splitting ¶
check_ubatch_thresholds ¶
check_ubatch_thresholds(
config: ParallelConfig,
num_tokens: int,
uniform_decode: bool,
) -> bool
Source code in vllm/v1/worker/ubatch_splitting.py
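The signature suggests a simple gate: microbatching is only attempted once the batch is large enough to be worth splitting. A minimal sketch of that gating logic, assuming hypothetical `ParallelConfig` fields `enable_dbo`, `dbo_decode_token_threshold`, and `dbo_prefill_token_threshold` (the real field names may differ):

```python
# Sketch only: the config field names are assumptions, not vLLM's actual API.
def check_ubatch_thresholds_sketch(config, num_tokens: int, uniform_decode: bool) -> bool:
    if not config.enable_dbo:  # assumed master switch for microbatching
        return False
    # Decode-only and mixed prefill batches presumably get separate
    # thresholds, since their per-token cost differs.
    threshold = (config.dbo_decode_token_threshold if uniform_decode
                 else config.dbo_prefill_token_threshold)
    return num_tokens >= threshold
```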
create_ubatch_slices ¶
create_ubatch_slices(
num_scheduled_tokens: ndarray, split_point: int
) -> UBatchSlices
Source code in vllm/v1/worker/ubatch_splitting.py
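Given per-request token counts and a token-level `split_point`, the function presumably maps the split point back onto request boundaries. A self-contained sketch of that mapping; the `UBatchSliceSketch` stand-in and its fields are assumptions, and a request straddling the boundary is assumed to be divided between the two microbatches:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class UBatchSliceSketch:  # stand-in for vLLM's UBatchSlice; fields assumed
    request_slice: slice  # requests belonging to this microbatch
    token_slice: slice    # flattened token range of this microbatch

def create_ubatch_slices_sketch(
    num_scheduled_tokens: np.ndarray, split_point: int
) -> list[UBatchSliceSketch]:
    cu = np.cumsum(num_scheduled_tokens)  # running token totals per request
    total = int(cu[-1])
    # First request whose token range reaches past the split point; it may
    # straddle the boundary, in which case it appears in both microbatches.
    boundary = int(np.searchsorted(cu, split_point, side="left"))
    return [
        UBatchSliceSketch(slice(0, boundary + 1), slice(0, split_point)),
        UBatchSliceSketch(slice(boundary, len(cu)), slice(split_point, total)),
    ]
```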
get_dp_padding_ubatch ¶
get_dp_padding_ubatch(
num_tokens_unpadded: int,
num_tokens_padded: int,
should_attempt_ubatching: bool,
vllm_config: VllmConfig,
) -> tuple[bool, Optional[Tensor]]
1. Decides whether each DP rank is going to microbatch. Either all ranks run with microbatching or none of them do. If this function decides not to run with microbatching, it "aborts": no padding information is returned to the caller, and it returns (False, None).
2. Determines the total number of tokens that each rank will run. All ranks are padded out so that they run with the same number of tokens.
Returns:

Name | Type | Description
---|---|---
should_ubatch | bool | Whether all DP ranks are going to microbatch.
num_tokens_after_padding | Optional[Tensor] | A tensor containing the total number of tokens per microbatch for each DP rank, including padding. Will be None if should_ubatch is False.
Source code in vllm/v1/worker/ubatch_splitting.py
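A hypothetical call site; the token counts are illustrative, and `vllm_config` is assumed to be in scope:

```python
# Hypothetical call site: decide whether this DP rank should microbatch
# and what size each microbatch must be padded to.
should_ubatch, num_tokens_after_padding = get_dp_padding_ubatch(
    num_tokens_unpadded=384,        # tokens actually scheduled on this rank
    num_tokens_padded=512,          # e.g. rounded up to a CUDA-graph size
    should_attempt_ubatching=True,  # e.g. result of check_ubatch_thresholds
    vllm_config=vllm_config,
)
if not should_ubatch:
    # All ranks aborted together: no padding information is returned.
    assert num_tokens_after_padding is None
```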
should_ubatch_with_num_tokens ¶
should_ubatch_with_num_tokens(
should_ubatch: bool,
orig_num_tokens_per_ubatch: int,
padded_num_tokens_per_ubatch: int,
vllm_config: VllmConfig,
) -> tuple[bool, Optional[Tensor]]
Source code in vllm/v1/worker/ubatch_splitting.py
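No docstring is attached; from the signature and its relationship to get_dp_padding_ubatch above, it appears to be the cross-rank coordination step, taking this rank's proposed original and padded per-microbatch token counts and returning the collective decision plus the per-rank token tensor. A hypothetical call using only the documented signature, with illustrative values and `vllm_config` assumed in scope:

```python
# Hypothetical: propose this rank's per-microbatch token counts and learn
# whether all DP ranks agree to microbatch (and at what padded sizes).
agreed, per_rank_tokens = should_ubatch_with_num_tokens(
    should_ubatch=True,              # this rank is willing to microbatch
    orig_num_tokens_per_ubatch=192,
    padded_num_tokens_per_ubatch=256,
    vllm_config=vllm_config,
)
```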
ubatch_split ¶
ubatch_split(
num_scheduled_tokens_per_request: ndarray,
num_tokens_unpadded: int,
num_tokens_padded: int,
uniform_decode: bool,
vllm_config: VllmConfig,
) -> tuple[Optional[UBatchSlices], Optional[Tensor]]
Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
Returns:

Name | Type | Description
---|---|---
ubatch_slices | Optional[UBatchSlices] | If set, all DP ranks have agreed to microbatch.
num_tokens_after_padding | Optional[Tensor] | A tensor containing the total number of tokens per microbatch for each DP rank, including padding. Will be None if ubatch_slices is None.
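A hypothetical end-to-end call from a model runner, for a uniform decode batch of 256 single-token requests; all values are illustrative and `vllm_config` is assumed to be in scope:

```python
import numpy as np

# Hypothetical call site: 256 decode requests, one scheduled token each.
num_scheduled = np.ones(256, dtype=np.int32)
ubatch_slices, num_tokens_after_padding = ubatch_split(
    num_scheduled_tokens_per_request=num_scheduled,
    num_tokens_unpadded=int(num_scheduled.sum()),
    num_tokens_padded=256,       # e.g. already a CUDA-graph-friendly size
    uniform_decode=True,
    vllm_config=vllm_config,
)
if ubatch_slices is None:
    # The DP ranks declined to microbatch; run the batch in one piece.
    pass
```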