vllm.v1.outputs ¶
EMPTY_MODEL_RUNNER_OUTPUT module-attribute ¶
EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput(
req_ids=[],
req_id_to_index={},
sampled_token_ids=[],
logprobs=None,
prompt_logprobs_dict={},
pooler_output=[],
num_nans_in_logits=None,
)
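A minimal usage sketch (not from the source): since every per-request field is empty, this sentinel can stand in for a step that produced no results.

from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT

output = EMPTY_MODEL_RUNNER_OUTPUT  # placeholder when nothing was executed
assert not output.req_ids and not output.sampled_token_ids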
AsyncModelRunnerOutput ¶
Bases: ABC
get_output abstractmethod ¶
get_output() -> ModelRunnerOutput
Get the ModelRunnerOutput for this async output.
This is a blocking call that waits until the results are ready, which might involve copying device tensors to the host. This method should only be called once per AsyncModelRunnerOutput.
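A hedged sketch of one possible subclass, assuming the sampled token ids live in a device tensor guarded by a CUDA event; the class and attribute names below are hypothetical, and only the get_output() contract (blocking, call at most once) comes from the description above.

import torch

from vllm.v1.outputs import AsyncModelRunnerOutput, ModelRunnerOutput

class EventBackedOutput(AsyncModelRunnerOutput):  # hypothetical subclass
    def __init__(self, partial: ModelRunnerOutput,
                 token_ids_gpu: torch.Tensor, done: torch.cuda.Event):
        self._partial = partial              # output built with sampled_token_ids=[]
        self._token_ids_gpu = token_ids_gpu  # shape [num_reqs, 1] on the GPU
        self._done = done                    # event recorded after sampling finished

    def get_output(self) -> ModelRunnerOutput:
        # Blocking: wait for the device work, then copy the tokens to the host.
        self._done.synchronize()
        self._partial.sampled_token_ids = self._token_ids_gpu.cpu().tolist()
        return self._partial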
DraftTokenIds dataclass ¶
KVConnectorOutput dataclass ¶
LogprobsLists ¶
Bases: NamedTuple
LogprobsTensors ¶
Bases: NamedTuple
empty_cpu staticmethod ¶
empty_cpu(
num_positions: int, num_tokens_per_position: int
) -> LogprobsTensors
Create empty LogprobsTensors on CPU.
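For example, a placeholder for four prompt positions with the top 8 logprobs each could be created as follows (a usage sketch based only on the signature above; the numbers are arbitrary).

from vllm.v1.outputs import LogprobsTensors

empty = LogprobsTensors.empty_cpu(num_positions=4, num_tokens_per_position=8)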
ModelRunnerOutput dataclass ¶
kv_connector_output class-attribute instance-attribute ¶
kv_connector_output: Optional[KVConnectorOutput] = None
num_nans_in_logits class-attribute instance-attribute ¶
num_nans_in_logits: Optional[dict[str, int]] = None
prompt_logprobs_dict instance-attribute ¶
prompt_logprobs_dict: dict[str, Optional[LogprobsTensors]]
__init__ ¶
__init__(
req_ids: list[str],
req_id_to_index: dict[str, int],
sampled_token_ids: list[list[int]],
logprobs: Optional[LogprobsLists],
prompt_logprobs_dict: dict[
str, Optional[LogprobsTensors]
],
pooler_output: list[Optional[Tensor]],
kv_connector_output: Optional[KVConnectorOutput] = None,
num_nans_in_logits: Optional[dict[str, int]] = None,
) -> None
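A construction sketch for a single request that sampled one token and requested no logprobs; the request id is made up, and the field names and types follow the signature above.

from vllm.v1.outputs import ModelRunnerOutput

output = ModelRunnerOutput(
    req_ids=["req-0"],
    req_id_to_index={"req-0": 0},
    sampled_token_ids=[[42]],              # one newly sampled token for "req-0"
    logprobs=None,                         # no sample logprobs requested
    prompt_logprobs_dict={"req-0": None},  # no prompt logprobs requested
    pooler_output=[None],                  # not a pooling model
)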
SamplerOutput dataclass ¶
__init__ ¶
__init__(
sampled_token_ids: Tensor,
logprobs_tensors: Optional[LogprobsTensors],
) -> None
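A small sketch assuming a decode step that sampled one token for each of two requests and skipped logprobs; only the parameter names come from the signature above, and the token ids are arbitrary.

import torch

from vllm.v1.outputs import SamplerOutput

sampled = torch.tensor([[101], [202]])  # shape [num_reqs, 1], arbitrary token ids
out = SamplerOutput(sampled_token_ids=sampled, logprobs_tensors=None)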