vllm.config.pooler
PoolerConfig
Controls the behavior of output pooling in pooling models.
Source code in vllm/config/pooler.py
activation class-attribute instance-attribute
Whether to apply an activation function to the classification outputs. Defaults to True.
dimensions class-attribute instance-attribute
Reduce the dimensions of the embeddings if the model supports matryoshka representation. Defaults to None.
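The general idea behind matryoshka-style dimension reduction can be sketched as follows. This is a minimal illustration, not vLLM's implementation: the helper name `truncate_embedding` is hypothetical, and renormalizing after truncation is an assumption about how such embeddings are typically used.

```python
import math

def truncate_embedding(embedding, dimensions):
    """Keep the first `dimensions` components and L2-normalize the result.

    Hypothetical helper: illustrates the matryoshka idea only.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be in [1, len(embedding)]")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else list(head)

full = [3.0, 4.0, 12.0, 0.5]
reduced = truncate_embedding(full, 2)  # keeps [3.0, 4.0], then rescales to unit length
```

A model trained with matryoshka representation learning is expected to keep most of its useful signal in the leading components, which is why a plain prefix slice can be meaningful there and is not meaningful for arbitrary embedding models.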
enable_chunked_processing class-attribute instance-attribute
Whether to enable chunked processing for long inputs that exceed the model's maximum position embeddings. When enabled, long inputs are split into chunks, processed separately, and then aggregated using weighted averaging. This allows embedding models to handle arbitrarily long text without CUDA errors. Defaults to False.
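The split-then-aggregate flow described above might look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, `fake_embed` stands in for a real model call, and weighting each chunk by its token count is an assumption (the description only says "weighted averaging").

```python
def split_into_chunks(token_ids, max_len):
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

def aggregate_chunk_embeddings(chunk_embeddings, chunk_lengths):
    """Average chunk embeddings, weighting each chunk by its token count (assumed)."""
    total = sum(chunk_lengths)
    dim = len(chunk_embeddings[0])
    return [
        sum(emb[d] * n for emb, n in zip(chunk_embeddings, chunk_lengths)) / total
        for d in range(dim)
    ]

def fake_embed(chunk):
    # Stand-in for the model: a 2-d "embedding" of (mean token id, chunk length).
    return [sum(chunk) / len(chunk), float(len(chunk))]

tokens = list(range(10))               # pretend this exceeds the model limit
chunks = split_into_chunks(tokens, 4)  # chunk lengths: 4, 4, 2
embeddings = [fake_embed(c) for c in chunks]
pooled = aggregate_chunk_embeddings(embeddings, [len(c) for c in chunks])
```

Weighting by length keeps a short trailing chunk from contributing as much as a full one. With the stand-in embedding, the first pooled component equals the mean over all ten tokens.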
logit_bias class-attribute instance-attribute
If provided, apply classification logit biases. Defaults to None.
max_embed_len class-attribute instance-attribute
Maximum input length allowed for embedding generation. When set, embedding models accept inputs longer than max_model_len, up to max_embed_len. An input that exceeds max_embed_len is handled according to the original max_model_len validation logic. Defaults to None (i.e. set to max_model_len).
normalize class-attribute instance-attribute
Whether to normalize the embedding outputs. Defaults to True.
pooling_type class-attribute instance-attribute
The pooling method of the pooling model. This should be a key in vllm.model_executor.layers.pooler.PoolingType.
returned_token_ids class-attribute instance-attribute
A list of indices for the vocabulary dimensions to be extracted, such as the token IDs of good_token and bad_token in the math-shepherd-mistral-7b-prm model.
softmax class-attribute instance-attribute
Whether to apply softmax to the reward outputs. Defaults to True.
step_tag_id class-attribute instance-attribute
If set, only the scores corresponding to the step_tag_id in the generated sentence are returned. Otherwise, the scores for all tokens are returned.
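How returned_token_ids and step_tag_id might combine can be sketched with plain lists. This is an illustration only: `select_step_scores` is a hypothetical helper, the token IDs and scores are made up, and the order of the two filtering steps is an assumption.

```python
def select_step_scores(token_ids, logits, step_tag_id=None, returned_token_ids=None):
    """Filter per-token scores by vocabulary column and by step-tag position.

    token_ids: one token ID per position in the sequence.
    logits:    one row of vocabulary-sized scores per position.
    """
    rows = logits
    if returned_token_ids is not None:
        # Keep only the requested vocabulary dimensions (e.g. good/bad tokens).
        rows = [[row[i] for i in returned_token_ids] for row in rows]
    if step_tag_id is not None:
        # Keep only positions whose token is the step tag.
        rows = [row for tok, row in zip(token_ids, rows) if tok == step_tag_id]
    return rows

token_ids = [5, 9, 7, 9]  # made-up IDs; 9 plays the role of the step tag
logits = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.6, 0.7, 0.8],
    [9.0, 8.0, 7.0, 6.0],
]
scores = select_step_scores(token_ids, logits, step_tag_id=9, returned_token_ids=[1, 3])
all_scores = select_step_scores(token_ids, logits)  # no filtering: every position kept
```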
compute_hash
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
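The factors-list pattern mentioned in the warning can be sketched as below. `ExampleConfig` and its fields are hypothetical, and the choice of SHA-256 over the stringified list is an assumption made for illustration, not a statement about what PoolerConfig itself hashes.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ExampleConfig:
    """Hypothetical config used only to illustrate the factors-list pattern."""
    hidden_size: int = 768   # assumed to affect the computation graph
    log_level: str = "info"  # assumed NOT to affect the computation graph

    def compute_hash(self) -> str:
        # Only fields that change the computation graph belong in `factors`.
        factors = [self.hidden_size]
        return hashlib.sha256(str(factors).encode()).hexdigest()

base = ExampleConfig().compute_hash()
same_graph = ExampleConfig(log_level="debug").compute_hash()
different_graph = ExampleConfig(hidden_size=1024).compute_hash()
```

Two configs that differ only in fields outside `factors` hash identically, which is why a new field that does affect the graph has to be added to the list. Otherwise two different graphs would share one hash.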