vllm.v1.kv_offload.backend
Backend
Bases: ABC
An abstract class for allocating and returning specs for writing KV blocks to some backend.
Source code in vllm/v1/kv_offload/backend.py
__init__
allocate_blocks abstractmethod
allocate_blocks(
block_hashes: list[BlockHash],
) -> list[BlockStatus]
Allocate space for writing blocks. This method assumes that enough space is available; calling it without first checking get_num_free_blocks is unsafe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | list[BlockHash] | The hashes identifying the blocks to be written. | required |
Returns:
Type | Description |
---|---|
list[BlockStatus] | A list of BlockStatus for the allocated blocks. The ref_cnt of each returned item will be -1, meaning the block is not yet ready to be read. |
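The check-then-allocate pattern described above can be sketched as follows. This is a minimal illustration and not the vLLM implementation: `ToyBackend`, `ToyBlockStatus`, and `try_allocate` are hypothetical names, block hashes are modelled as plain `bytes`, and `get_num_free_blocks` is assumed to take no arguments and return an `int`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBlockStatus:
    # Hypothetical stand-in for BlockStatus; -1 means "not yet ready to be read".
    ref_cnt: int = -1


class ToyBackend:
    """Hypothetical in-memory stand-in for a Backend subclass."""

    def __init__(self, capacity: int) -> None:
        self._free = capacity

    def get_num_free_blocks(self) -> int:
        # Assumed signature: no arguments, returns the number of free blocks.
        return self._free

    def allocate_blocks(self, block_hashes: list[bytes]) -> list[ToyBlockStatus]:
        # Mirrors the documented contract: no capacity check happens here.
        self._free -= len(block_hashes)
        return [ToyBlockStatus() for _ in block_hashes]


def try_allocate(
    backend: ToyBackend, block_hashes: list[bytes]
) -> Optional[list[ToyBlockStatus]]:
    """Allocate only when the backend reports enough free blocks."""
    if backend.get_num_free_blocks() < len(block_hashes):
        return None  # the caller must evict or skip offloading
    return backend.allocate_blocks(block_hashes)
```

Keeping the capacity check in the caller leaves `allocate_blocks` simple and lets the caller decide what to do when space runs out.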
free abstractmethod
free(block: BlockStatus)
Free a previously allocated block. Call this function only with blocks returned by allocate_blocks, and only once per block.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block | BlockStatus | The block to be freed. | required |
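To illustrate the "only once per block" rule, here is a hedged sketch of a toy backend that tracks live allocations and rejects a double free. The names `ToyBlock` and `ToyBackend` are hypothetical, and raising `ValueError` is an assumption made for the example; the documented interface does not say how a real backend reacts to misuse.

```python
class ToyBlock:
    """Hypothetical stand-in for BlockStatus."""


class ToyBackend:
    """Hypothetical backend that remembers which blocks it has handed out."""

    def __init__(self, capacity: int) -> None:
        self._free = capacity
        # Keep references to live blocks so their id() values stay unique.
        self._live: dict[int, ToyBlock] = {}

    def get_num_free_blocks(self) -> int:
        return self._free

    def allocate_blocks(self, block_hashes: list[bytes]) -> list[ToyBlock]:
        blocks = [ToyBlock() for _ in block_hashes]
        self._free -= len(blocks)
        for block in blocks:
            self._live[id(block)] = block
        return blocks

    def free(self, block: ToyBlock) -> None:
        # Reject blocks that were never allocated here or were already freed.
        if self._live.pop(id(block), None) is None:
            raise ValueError("block is not currently allocated by this backend")
        self._free += 1
```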
get_load_store_spec
get_load_store_spec(
block_hashes: Iterable[BlockHash],
blocks: Iterable[BlockStatus],
) -> LoadStoreSpec
Get backend-specific information on how to read/write blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | Iterable[BlockHash] | The block hashes identifying the blocks. | required |
blocks | Iterable[BlockStatus] | The blocks to be read or written. | required |
Returns:
Type | Description |
---|---|
LoadStoreSpec | A LoadStoreSpec that can be used by a worker to read/write the blocks. |
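As a rough illustration of what "backend-specific information" might look like, the sketch below pairs each hash with a per-block storage slot. Everything here is hypothetical: `ToyBlock`, `ToySpec`, `build_spec`, and the `slot` field are invented for the example, and a real LoadStoreSpec may carry entirely different data.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class ToyBlock:
    slot: int          # hypothetical location of the block in the backend
    ref_cnt: int = -1  # -1 means "not yet ready to be read"


@dataclass
class ToySpec:
    # Hypothetical spec: maps each block hash to the slot a worker should use.
    slots: dict[bytes, int]


def build_spec(block_hashes: Iterable[bytes], blocks: Iterable[ToyBlock]) -> ToySpec:
    """Pair hashes with blocks positionally and collect their slots."""
    hashes = list(block_hashes)
    block_list = list(blocks)
    if len(hashes) != len(block_list):
        raise ValueError("block_hashes and blocks must have the same length")
    return ToySpec(slots={h: b.slot for h, b in zip(hashes, block_list)})
```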
get_num_free_blocks abstractmethod
BlockStatus
Bases: Structure
Offloading status for a single block of KV data. Holds the following information:
ref_cnt - the current number of transfers using this block as a source. A value of -1 indicates the block is not yet ready to be read.
load_store_spec - backend-specific information on how to actually read/write the block.
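One way to read the ref_cnt convention is as a small state machine. The sketch below is an assumption-laden illustration rather than the vLLM code: `ToyBlockStatus` and the helper functions are hypothetical, and the transitions (-1 to 0 once a write completes, then +1 and -1 around each read) are inferred from the field description, not documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class ToyBlockStatus:
    # Hypothetical stand-in for BlockStatus (the real class derives from Structure).
    ref_cnt: int = -1

    @property
    def is_ready(self) -> bool:
        # Any non-negative count means the block can serve as a transfer source.
        return self.ref_cnt >= 0


def mark_ready(block: ToyBlockStatus) -> None:
    """Assumed transition: the write finished, so the block becomes readable."""
    if block.ref_cnt != -1:
        raise RuntimeError("block is already ready")
    block.ref_cnt = 0


def begin_read(block: ToyBlockStatus) -> None:
    """Assumed transition: a transfer starts using the block as its source."""
    if not block.is_ready:
        raise RuntimeError("block is not yet ready to be read")
    block.ref_cnt += 1


def end_read(block: ToyBlockStatus) -> None:
    """Assumed transition: a transfer using the block has finished."""
    if block.ref_cnt <= 0:
        raise RuntimeError("no transfer is using this block")
    block.ref_cnt -= 1
```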