vllm.v1.kv_offload.abstract ¶
OffloadingManager class for managing KV data offloading in vLLM v1
This class runs in the scheduler and tracks which blocks are offloaded, along with their addresses.
The class provides the following primitives:
lookup() - finds the length of the maximal series of blocks, starting from the first one, that are all offloaded.
prepare_load() - prepares the given blocks to be read. The given blocks will be protected from eviction until complete_load() is called. Returns a LoadStoreSpec which encapsulates the information required for performing the load.
touch() - marks the given blocks as recently used. Can be used to track block recency for LRU eviction. This function is separate from prepare_load() so that recency can be updated even for blocks that do not need to be read from the cache, such as blocks already cached by the GPU prefix cache.
complete_load() - marks blocks which were previously prepared to be loaded as done loading. This re-allows their eviction.
prepare_store() - prepares the given blocks to be written. Returns a PrepareStoreOutput that encapsulates the offloading information (a LoadStoreSpec) and lists the blocks that were evicted as a result, or None if the blocks cannot be stored.
complete_store() - marks a previous store as completed. Following this call, the given blocks become loadable.
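The lifecycle described by these primitives can be sketched with a toy in-memory manager. This is only an illustration under stated assumptions, not vLLM's implementation: `ToyOffloadingManager` is a hypothetical class, plain integers stand in for `BlockHash`, a list of hashes stands in for `LoadStoreSpec`, a `(to_store, evicted)` tuple stands in for `PrepareStoreOutput`, and the LRU policy and capacity limit are assumed.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Iterable, Optional


class ToyOffloadingManager:
    """Hypothetical in-memory stand-in; not vLLM's implementation."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # hash -> [is_ready, ref_count]; dict order doubles as LRU order
        self.blocks: OrderedDict[int, list] = OrderedDict()

    def lookup(self, block_hashes: Iterable[int]) -> int:
        hits = 0
        for h in block_hashes:
            entry = self.blocks.get(h)
            if entry is None or not entry[0]:
                break  # stop at the first block that is not loadable
            hits += 1
        return hits

    def touch(self, block_hashes: Iterable[int]) -> None:
        for h in block_hashes:
            if h in self.blocks:
                self.blocks.move_to_end(h)  # most recently used goes last

    def prepare_load(self, block_hashes: Iterable[int]) -> list[int]:
        hashes = list(block_hashes)
        for h in hashes:
            self.blocks[h][1] += 1  # pin: protected from eviction
        return hashes  # stand-in for a LoadStoreSpec

    def complete_load(self, block_hashes: Iterable[int]) -> None:
        for h in block_hashes:
            self.blocks[h][1] -= 1  # unpin: eviction allowed again

    def prepare_store(
        self, block_hashes: Iterable[int]
    ) -> Optional[tuple[list[int], list[int]]]:
        to_store = [h for h in block_hashes if h not in self.blocks]
        overflow = len(self.blocks) + len(to_store) - self.capacity
        evictable = [
            h for h, (ready, refs) in self.blocks.items() if ready and refs == 0
        ]
        if overflow > len(evictable):
            return None  # not enough room, even after evicting
        evicted = evictable[: max(overflow, 0)]  # least recently used first
        for h in evicted:
            del self.blocks[h]
        for h in to_store:
            self.blocks[h] = [False, 1]  # pending and pinned until complete_store
        return to_store, evicted

    def complete_store(self, block_hashes: Iterable[int], success: bool = True) -> None:
        for h in block_hashes:
            entry = self.blocks.get(h)
            if entry is None or entry[0]:
                continue  # unknown or already stored: nothing to do
            if success:
                entry[0] = True  # block becomes loadable
                entry[1] -= 1    # release the store-time pin
            else:
                del self.blocks[h]  # failed store: forget the block


mgr = ToyOffloadingManager(capacity=2)
hashes = [101, 102]
mgr.prepare_store(hashes)
before = mgr.lookup(hashes)          # blocks are not loadable yet
mgr.complete_store(hashes)
after = mgr.lookup(hashes + [103])   # prefix stops at the unknown block
mgr.prepare_load(hashes)
blocked = mgr.prepare_store([103])   # every block is pinned by the load
mgr.complete_load(hashes)
result = mgr.prepare_store([103])    # now the LRU block can be evicted
```

In this sketch a store is refused (`None`) while every resident block is pinned by a pending load, and succeeds once `complete_load` releases the pins.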
LoadStoreSpec ¶
OffloadingEvent dataclass ¶
Source code in vllm/v1/kv_offload/abstract.py
OffloadingManager ¶
Bases: ABC
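complete_load ¶
Marks blocks which were previously prepared to be loaded as done loading. Following this call, the blocks may be evicted again.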
complete_store ¶
Marks blocks which were previously prepared to be stored as stored. Following this call, the blocks become loadable. If success is False, blocks that were not marked as stored will be removed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | Iterable[BlockHash] | the hashes identifying the blocks. | required |
success | bool | whether the blocks were stored successfully. | True |
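The effect of the `success` flag can be illustrated with a small sketch. The `PendingStores` class and its `state` dictionary are hypothetical, integers stand in for `BlockHash`, and the bookkeeping is an assumption about how an implementation might track pending versus stored blocks.

```python
from __future__ import annotations

from typing import Iterable


class PendingStores:
    """Hypothetical bookkeeping for in-flight stores."""

    def __init__(self, state: dict[int, bool]) -> None:
        # hash -> True once stored (loadable), False while a store is pending
        self.state = state

    def complete_store(self, block_hashes: Iterable[int], success: bool = True) -> None:
        for h in block_hashes:
            if h not in self.state:
                continue
            if success:
                self.state[h] = True   # the block becomes loadable
            elif not self.state[h]:
                del self.state[h]      # failed and never stored: remove it


stores = PendingStores({1: True, 2: False, 3: False})
stores.complete_store([1, 2], success=False)  # 1 was already stored, 2 is dropped
stores.complete_store([3])                    # 3 becomes loadable
final_state = stores.state
```

A block that was already marked as stored survives a failed call; only blocks still pending are removed.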
lookup abstractmethod ¶
Finds the length of the maximal series of blocks, starting from the first one, that are all offloaded.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | Iterable[BlockHash] | the hashes identifying the blocks to lookup. | required |
Returns:
Type | Description |
---|---|
int | An integer representing the maximal number of blocks that are currently offloaded. |
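The prefix semantics of lookup can be sketched as a standalone function. Here `longest_offloaded_prefix` and the `offloaded` set are hypothetical names, and integers stand in for `BlockHash`; a real manager would consult its own block index.

```python
from __future__ import annotations

from typing import Iterable


def longest_offloaded_prefix(block_hashes: Iterable[int], offloaded: set[int]) -> int:
    """Count leading blocks that are offloaded, stopping at the first miss."""
    count = 0
    for h in block_hashes:
        if h not in offloaded:
            break
        count += 1
    return count


offloaded = {1, 2, 4}
# Block 4 is offloaded, but it does not count because block 3 breaks the series.
prefix_len = longest_offloaded_prefix([1, 2, 3, 4], offloaded)
```

The result is a prefix length, not a total hit count: an offloaded block that comes after a miss is not included.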
prepare_load abstractmethod ¶
prepare_load(
block_hashes: Iterable[BlockHash],
) -> LoadStoreSpec
Prepare the given blocks to be read. The given blocks will be protected from eviction until complete_load is called. It assumes all given blocks are offloaded.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | Iterable[BlockHash] | the hashes identifying the blocks. | required |
Returns:
Type | Description |
---|---|
LoadStoreSpec | A LoadStoreSpec that can be used by a worker to locate and load the actual offloaded KV data. |
prepare_store abstractmethod ¶
prepare_store(
block_hashes: Iterable[BlockHash],
) -> Optional[PrepareStoreOutput]
Prepare the given blocks to be offloaded. The given blocks will be protected from eviction until complete_store is called.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
block_hashes | Iterable[BlockHash] | the hashes identifying the blocks. | required |
Returns:
Type | Description |
---|---|
Optional[PrepareStoreOutput] | A PrepareStoreOutput indicating which blocks need storing, where to store them (LoadStoreSpec), and a list of blocks that were evicted as a result. None is returned if the blocks cannot be stored. |
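Because the return value is optional, a caller has to handle the "cannot store" case before using the output. The sketch below shows one way to do that. `ToyPrepareStoreOutput` and its field names, the `schedule_store` helper, the two fake backends, and the `offloaded_index` set are all assumptions made for illustration, with integers standing in for `BlockHash` and a string standing in for `LoadStoreSpec`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyPrepareStoreOutput:
    """Hypothetical stand-in for PrepareStoreOutput; field names are assumed."""
    to_store: list[int]
    store_spec: str
    evicted: list[int]


def schedule_store(
    prepare_store: Callable[[list[int]], Optional[ToyPrepareStoreOutput]],
    block_hashes: list[int],
    offloaded_index: set[int],
) -> list[int]:
    """Return the hashes that should be written, or [] if storing is refused."""
    output = prepare_store(block_hashes)
    if output is None:
        return []  # the blocks cannot be stored; skip offloading for now
    # Evicted blocks are no longer offloaded, so drop them from the index.
    offloaded_index.difference_update(output.evicted)
    return output.to_store


def full_backend(hashes: list[int]) -> Optional[ToyPrepareStoreOutput]:
    return None  # simulates a backend with no room


def roomy_backend(hashes: list[int]) -> Optional[ToyPrepareStoreOutput]:
    # Simulates a backend that already holds block 7 and evicts block 3.
    return ToyPrepareStoreOutput(
        to_store=[h for h in hashes if h != 7],
        store_spec="toy-spec",
        evicted=[3],
    )


index = {3, 7}
skipped = schedule_store(full_backend, [7, 8, 9], index)
scheduled = schedule_store(roomy_backend, [7, 8, 9], index)
```

Only the blocks listed in the output need writing, which may be a subset of the requested hashes.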
take_events ¶
take_events() -> Iterable[OffloadingEvent]
Take the offloading events from the manager.
Yields:
Type | Description |
---|---|
Iterable[OffloadingEvent] | New OffloadingEvents collected since the last call. |
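The "collected since the last call" behaviour can be sketched as a drain-on-read buffer. `ToyEvent`, its fields, and `EventBuffer` are hypothetical names used only for this illustration; the actual fields of `OffloadingEvent` are not shown in this section.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToyEvent:
    """Hypothetical event; the real OffloadingEvent fields may differ."""
    block_hashes: list[int]
    removed: bool


class EventBuffer:
    def __init__(self) -> None:
        self._events: list[ToyEvent] = []

    def record(self, event: ToyEvent) -> None:
        self._events.append(event)

    def take_events(self) -> Iterable[ToyEvent]:
        # Swap the buffer out when iteration starts, so each event is
        # yielded exactly once across successive calls.
        events, self._events = self._events, []
        yield from events


buf = EventBuffer()
buf.record(ToyEvent(block_hashes=[1, 2], removed=False))
buf.record(ToyEvent(block_hashes=[1], removed=True))
first = list(buf.take_events())
second = list(buf.take_events())  # nothing new was recorded in between
```

Because `take_events` is a generator here, the buffer is only drained once the caller starts iterating.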