vllm.outputs

ClassificationOutput dataclass

The output data of one classification output of a request.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| probs | list[float] | The probability vector, which is a list of floats. Its length depends on the number of classes. | required |
Source code in vllm/outputs.py
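As an illustration of consuming `probs`, here is a minimal stand-in dataclass mirroring the documented field (this is not the real vLLM class, which lives in `vllm/outputs.py`):

```python
from dataclasses import dataclass


# Illustrative stand-in: same field as the documented ClassificationOutput.
@dataclass
class ClassificationOutput:
    probs: list[float]  # one probability per class, summing to ~1.0


output = ClassificationOutput(probs=[0.1, 0.7, 0.2])

# The predicted class is the index with the highest probability.
predicted = max(range(len(output.probs)), key=output.probs.__getitem__)
print(predicted)  # → 1
```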
ClassificationRequestOutput

Bases: PoolingRequestOutput[ClassificationOutput]

from_base staticmethod

from_base(request_output: PoolingRequestOutput)
CompletionOutput dataclass

The output data of one completion output of a request.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| index | int | The index of the output in the request. | required |
| text | str | The generated output text. | required |
| token_ids | Sequence[int] | The token IDs of the generated output text. | required |
| cumulative_logprob | Optional[float] | The cumulative log probability of the generated output text. | required |
| logprobs | Optional[SampleLogprobs] | The log probabilities of the top tokens at each position, if logprobs were requested. | required |
| finish_reason | Optional[str] | The reason why the sequence is finished. | None |
| stop_reason | Union[int, str, None] | The stop string or token ID that caused the completion to stop; None if the completion finished for some other reason, including encountering the EOS token. | None |
| lora_request | Optional[LoRARequest] | The LoRA request that was used to generate the output. | None |
__init__

__init__(
    index: int,
    text: str,
    token_ids: Sequence[int],
    cumulative_logprob: Optional[float],
    logprobs: Optional[SampleLogprobs],
    finish_reason: Optional[str] = None,
    stop_reason: Union[int, str, None] = None,
    lora_request: Optional[LoRARequest] = None,
) -> None
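A sketch of constructing and inspecting a completion, using a simplified stand-in with the same fields as the documented class (the real one is `vllm.outputs.CompletionOutput`):

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


# Illustrative stand-in with the documented fields; not the real vLLM class.
@dataclass
class CompletionOutput:
    index: int
    text: str
    token_ids: Sequence[int]
    cumulative_logprob: Optional[float]
    logprobs: Optional[list] = None
    finish_reason: Optional[str] = None
    stop_reason: Union[int, str, None] = None


out = CompletionOutput(
    index=0,
    text="Hello, world!",
    token_ids=[15496, 11, 995, 0],  # hypothetical token IDs
    cumulative_logprob=-1.23,
    finish_reason="stop",  # e.g. "stop" or "length"
    stop_reason=None,      # None: finished via EOS or another condition
)

# A finish_reason of "stop" with stop_reason None indicates an EOS-style stop.
print(out.finish_reason)  # → stop
```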
EmbeddingOutput dataclass

The output data of one embedding output of a request.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embedding | list[float] | The embedding vector, which is a list of floats. Its length depends on the hidden dimension of the model. | required |
from_base staticmethod
from_base(pooling_output: PoolingOutput)
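Embedding vectors are commonly compared with cosine similarity. A small sketch, using a stand-in dataclass for the documented `embedding` field:

```python
import math
from dataclasses import dataclass


# Illustrative stand-in for the documented EmbeddingOutput.
@dataclass
class EmbeddingOutput:
    embedding: list[float]  # length == model hidden dimension


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


e1 = EmbeddingOutput(embedding=[1.0, 0.0, 0.0])
e2 = EmbeddingOutput(embedding=[0.0, 1.0, 0.0])

print(cosine_similarity(e1.embedding, e1.embedding))  # → 1.0 (identical)
print(cosine_similarity(e1.embedding, e2.embedding))  # → 0.0 (orthogonal)
```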
EmbeddingRequestOutput

Bases: PoolingRequestOutput[EmbeddingOutput]

from_base staticmethod

from_base(request_output: PoolingRequestOutput)
PoolingOutput dataclass

The output data of one pooling output of a request.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | Tensor | The extracted hidden states. | required |
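The `from_base` conversions above turn a generic `PoolingOutput` into a task-specific output. A simplified sketch of that relationship, using a plain list where the real `data` field is a `torch.Tensor`:

```python
from dataclasses import dataclass


# Sketch only: the real PoolingOutput.data is a torch.Tensor of hidden states.
@dataclass
class PoolingOutput:
    data: list[float]  # here: already pooled to a single vector


@dataclass
class EmbeddingOutput:
    embedding: list[float]

    # Mirrors the spirit of the documented EmbeddingOutput.from_base, which
    # flattens the pooled data into a list of floats.
    @staticmethod
    def from_base(pooling_output: "PoolingOutput") -> "EmbeddingOutput":
        return EmbeddingOutput(embedding=list(pooling_output.data))


emb = EmbeddingOutput.from_base(PoolingOutput(data=[0.1, 0.2, 0.3]))
print(emb.embedding)  # → [0.1, 0.2, 0.3]
```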
__eq__

PoolingRequestOutput
The output data of a pooling request to the LLM.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| request_id | str | A unique identifier for the pooling request. | required |
| outputs | PoolingOutput | The pooling results for the given input. | required |
| prompt_token_ids | list[int] | A list of token IDs used in the prompt. | required |
| finished | bool | A flag indicating whether the pooling is completed. | required |

__init__
RequestOutput
The output data of a completion request to the LLM.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| request_id | str | The unique ID of the request. | required |
| prompt | Optional[str] | The prompt string of the request. For encoder/decoder models, this is the decoder input prompt. | required |
| prompt_token_ids | Optional[list[int]] | The token IDs of the prompt. For encoder/decoder models, these are the decoder input prompt token IDs. | required |
| prompt_logprobs | Optional[PromptLogprobs] | The log probabilities to return per prompt token. | required |
| outputs | list[CompletionOutput] | The output sequences of the request. | required |
| finished | bool | Whether the whole request is finished. | required |
| metrics | Optional[RequestMetrics] | Metrics associated with the request. | None |
| lora_request | Optional[LoRARequest] | The LoRA request that was used to generate the output. | None |
| encoder_prompt | Optional[str] | The encoder prompt string of the request. None if decoder-only. | None |
| encoder_prompt_token_ids | Optional[list[int]] | The token IDs of the encoder prompt. None if decoder-only. | None |
| num_cached_tokens | Optional[int] | The number of tokens with a prefix cache hit. | None |
| kv_transfer_params | Optional[dict[str, Any]] | The params for remote K/V transfer. | None |
multi_modal_placeholders instance-attribute

__init__
__init__(
    request_id: str,
    prompt: Optional[str],
    prompt_token_ids: Optional[list[int]],
    prompt_logprobs: Optional[PromptLogprobs],
    outputs: list[CompletionOutput],
    finished: bool,
    metrics: Optional[RequestMetrics] = None,
    lora_request: Optional[LoRARequest] = None,
    encoder_prompt: Optional[str] = None,
    encoder_prompt_token_ids: Optional[list[int]] = None,
    num_cached_tokens: Optional[int] = None,
    *,
    multi_modal_placeholders: Optional[MultiModalPlaceholderDict] = None,
    kv_transfer_params: Optional[dict[str, Any]] = None,
    **kwargs: Any,
) -> None
__repr__

__repr__() -> str

add

add(next_output: RequestOutput, aggregate: bool) -> None

Merge a subsequent RequestOutput into this one.
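During streaming, each new chunk can be folded into an accumulated result with `add`. A simplified sketch of that merge behavior, using stand-in classes with a subset of the documented fields (not the real vLLM implementation):

```python
from dataclasses import dataclass


@dataclass
class CompletionChunk:
    index: int
    text: str


# Illustrative stand-in: keeps only the fields needed to show the merge.
@dataclass
class RequestOutput:
    request_id: str
    outputs: list[CompletionChunk]
    finished: bool

    def add(self, next_output: "RequestOutput", aggregate: bool) -> None:
        # With aggregate=True, delta text is appended to the matching
        # completion; otherwise the newest chunk replaces the old one.
        self.finished |= next_output.finished
        for nxt in next_output.outputs:
            for cur in self.outputs:
                if cur.index == nxt.index:
                    cur.text = cur.text + nxt.text if aggregate else nxt.text
                    break
            else:
                self.outputs.append(nxt)


first = RequestOutput("req-1", [CompletionChunk(0, "Hello")], finished=False)
delta = RequestOutput("req-1", [CompletionChunk(0, ", world")], finished=True)
first.add(delta, aggregate=True)
print(first.outputs[0].text)  # → Hello, world
```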
ScoringOutput dataclass

The output data of one scoring output of a request.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| score | float | The similarity score, which is a scalar value. | required |

from_base staticmethod

from_base(pooling_output: PoolingOutput)
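A typical use of scoring outputs is ranking candidates by `score`. A small sketch with a stand-in dataclass mirroring the documented field:

```python
from dataclasses import dataclass


# Illustrative stand-in for the documented ScoringOutput.
@dataclass
class ScoringOutput:
    score: float  # scalar similarity score for one query/document pair


# e.g. rank candidate documents for a query, highest score first
results = [ScoringOutput(0.12), ScoringOutput(0.87), ScoringOutput(0.45)]
ranked = sorted(range(len(results)), key=lambda i: results[i].score, reverse=True)
print(ranked)  # → [1, 2, 0]
```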
ScoringRequestOutput

Bases: PoolingRequestOutput[ScoringOutput]

from_base staticmethod

from_base(request_output: PoolingRequestOutput)