vllm.v1.kv_offload.spec
OffloadingSpec
Bases: ABC
Spec for an offloading connector
Source code in vllm/v1/kv_offload/spec.py
offloaded_block_size instance-attribute
offloaded_block_size = int(get("block_size", gpu_block_size))
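The default above reads as a lookup with a fallback to the GPU block size. A minimal sketch of that pattern, assuming the `get` call is made on a dict-like configuration object (the `extra_config` name and the `resolve_offloaded_block_size` helper are hypothetical, not part of the documented API):

```python
def resolve_offloaded_block_size(extra_config: dict, gpu_block_size: int) -> int:
    """Return the configured "block_size", or the GPU block size if unset."""
    # extra_config is an assumed dict-like source for the "block_size" key.
    return int(extra_config.get("block_size", gpu_block_size))


# No override: the offloaded block size falls back to the GPU block size.
default_size = resolve_offloaded_block_size({}, gpu_block_size=16)

# Override supplied as a string (e.g. parsed from JSON); int() normalizes it.
override_size = resolve_offloaded_block_size({"block_size": "64"}, gpu_block_size=16)
```

Wrapping the lookup in `int(...)` means a value supplied as a string is accepted as long as it parses as an integer.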
__init__
__init__(vllm_config: VllmConfig)
get_handlers abstractmethod
get_handlers(
kv_caches: dict[str, Tensor],
) -> Iterator[
tuple[
type[LoadStoreSpec],
type[LoadStoreSpec],
OffloadingHandler,
]
]
Get offloading handlers along with their respective src and dst types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
kv_caches | dict[str, Tensor] | A dictionary of layer_name -> gpu_kv_cache tensor. | required |
Yields:
Type | Description |
---|---|
tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler] | Tuples of (src_type, dst_type, offloading_handler). |
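The yielded triples can be illustrated with a generator that emits one handler per transfer direction. This is a sketch under stated assumptions: `LoadStoreSpec` and `OffloadingHandler` below are local stand-ins for the vLLM classes of the same name, while `GPUSpec`, `CPUSpec`, and the handler `name` field are hypothetical and exist only for illustration.

```python
from collections.abc import Iterator


class LoadStoreSpec:
    """Stand-in for vLLM's LoadStoreSpec (assumption: used only as a type tag here)."""


class GPUSpec(LoadStoreSpec):
    """Hypothetical medium type: blocks resident in GPU memory."""


class CPUSpec(LoadStoreSpec):
    """Hypothetical medium type: blocks resident in CPU memory."""


class OffloadingHandler:
    """Stand-in for a handler that transfers blocks from src to dst."""

    def __init__(self, name: str) -> None:
        self.name = name


def get_handlers(
    kv_caches: dict[str, object],
) -> Iterator[tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler]]:
    # A real implementation would hand kv_caches to each handler;
    # this sketch only shows the shape of the yielded tuples.
    yield GPUSpec, CPUSpec, OffloadingHandler("gpu_to_cpu")
    yield CPUSpec, GPUSpec, OffloadingHandler("cpu_to_gpu")


# A consumer can index the handlers by their (src_type, dst_type) pair.
handlers = {(src, dst): h for src, dst, h in get_handlers({"layer.0": object()})}
```

Keying on the type pair lets a caller pick the handler for a given transfer direction without inspecting the handler itself.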
get_manager abstractmethod
get_manager() -> OffloadingManager
Get an OffloadingManager that will be used by the scheduler-side offloading connector to track offloaded blocks and manage evictions.
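To make the "track offloaded blocks and manage evictions" role concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `OffloadingManager` below is a local stand-in with hypothetical `store` and `contains` methods and an LRU policy, and `SpecSketch` only mirrors the abstract `get_manager()` contract. It does not reproduce vLLM's actual manager interface.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class OffloadingManager:
    """Toy stand-in: tracks offloaded block hashes with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._blocks: OrderedDict[int, None] = OrderedDict()

    def store(self, block_hash: int) -> list[int]:
        """Record a block as offloaded; return the hashes evicted to make room."""
        self._blocks[block_hash] = None
        self._blocks.move_to_end(block_hash)  # mark as most recently used
        evicted = []
        while len(self._blocks) > self.capacity:
            oldest, _ = self._blocks.popitem(last=False)  # drop least recently used
            evicted.append(oldest)
        return evicted

    def contains(self, block_hash: int) -> bool:
        return block_hash in self._blocks


class SpecSketch(ABC):
    """Hypothetical mirror of the abstract get_manager() contract."""

    @abstractmethod
    def get_manager(self) -> OffloadingManager: ...


class TinySpec(SpecSketch):
    def get_manager(self) -> OffloadingManager:
        # Scheduler side: a manager with room for two offloaded blocks.
        return OffloadingManager(capacity=2)


manager = TinySpec().get_manager()
evictions = [manager.store(h) for h in (101, 102, 103)]
```

Storing a third block exceeds the capacity of two, so the least recently used block (101) is evicted while 102 and 103 remain tracked.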