vllm.config.kv_events
KVEventsConfig
Configuration for KV event publishing.
Source code in vllm/config/kv_events.py
buffer_steps class-attribute instance-attribute
buffer_steps: int = 10000
The number of steps to cache for the replay endpoint. Only events from the last N steps are retained for replay.
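The retention behavior described for `buffer_steps` can be pictured as a bounded buffer that discards the oldest step once N steps are held. The following is a minimal sketch using `collections.deque`; the `ReplayBuffer` class and its method names are hypothetical and are not taken from vLLM's implementation.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical stand-in: keeps only the most recent `buffer_steps` steps."""

    def __init__(self, buffer_steps: int = 10000) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._steps: deque = deque(maxlen=buffer_steps)

    def record(self, step: int, events: list) -> None:
        """Store the events produced during one step."""
        self._steps.append((step, events))

    def replay_from(self, start_step: int) -> list:
        """Return retained (step, events) pairs with step >= start_step."""
        return [(s, e) for s, e in self._steps if s >= start_step]


buf = ReplayBuffer(buffer_steps=3)
for step in range(5):
    buf.record(step, [f"event-{step}"])

# Steps 0 and 1 have been evicted; only the last three remain.
retained = [s for s, _ in buf.replay_from(0)]
print(retained)  # [2, 3, 4]
```

A consumer that asks for a step older than the retained window would receive only what is still buffered, which is why a larger `buffer_steps` trades memory for a longer replay horizon.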
enable_kv_cache_events class-attribute instance-attribute
enable_kv_cache_events: bool = False
If True, enable KV cache events for tracking block storage and removal. Events can be published externally over ZMQ using the event publisher config.
endpoint class-attribute instance-attribute
endpoint: str = 'tcp://*:5557'
The ZMQ endpoint to use for publishing KV events.
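The default endpoint uses the wildcard host `*`, which is valid for binding on all interfaces but not for connecting. A subscriber therefore needs a concrete host in its address. The sketch below shows one way to derive that address; `to_connect_address` is a hypothetical helper, not part of vLLM or pyzmq, and the host `127.0.0.1` is an assumption for illustration.

```python
def to_connect_address(bind_endpoint: str, host: str) -> str:
    """Swap the wildcard host in a ZMQ TCP bind endpoint for a concrete host."""
    scheme, sep, rest = bind_endpoint.partition("://")
    if not sep:
        raise ValueError(f"not a ZMQ endpoint: {bind_endpoint!r}")
    if scheme != "tcp":
        # Only TCP endpoints are rewritten in this sketch.
        return bind_endpoint
    bind_host, port_sep, port = rest.rpartition(":")
    if not port_sep:
        raise ValueError(f"TCP endpoint has no port: {bind_endpoint!r}")
    if bind_host == "*":
        bind_host = host
    return f"{scheme}://{bind_host}:{port}"


subscriber_address = to_connect_address("tcp://*:5557", "127.0.0.1")
print(subscriber_address)  # tcp://127.0.0.1:5557
```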
hwm class-attribute instance-attribute
hwm: int = 100000
The ZMQ high-water mark for the event publisher. Once N events are queued, further events are dropped if the consumer is not keeping up.
max_queue_size class-attribute instance-attribute
max_queue_size: int = 100000
The maximum number of events to queue while waiting for publishing.
publisher class-attribute instance-attribute
publisher: str = 'null'
The publisher to use for publishing KV events. Either "null" or "zmq".
replay_endpoint class-attribute instance-attribute
The ZMQ endpoint to use for replaying KV events.
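To see how the fields fit together, the sketch below assembles them into a plain dictionary and serializes it to JSON. It assumes the deployment accepts the configuration as a JSON object; how the JSON is handed to vLLM is not covered here. The `replay_endpoint` value is illustrative only, since no default is listed for it.

```python
import json

# Field names and defaults mirror the attributes documented above.
# The replay_endpoint value is a made-up example, not a documented default.
kv_events_config = {
    "enable_kv_cache_events": True,
    "publisher": "zmq",
    "endpoint": "tcp://*:5557",
    "replay_endpoint": "tcp://*:5558",
    "buffer_steps": 10000,
    "hwm": 100000,
    "max_queue_size": 100000,
}

# Guard against a publisher value outside the two documented options.
if kv_events_config["publisher"] not in ("null", "zmq"):
    raise ValueError("publisher must be 'null' or 'zmq'")

config_json = json.dumps(kv_events_config)
print(config_json)
```

Leaving `publisher` at `"null"` while setting `enable_kv_cache_events` to true would generate events without publishing them externally, so the two fields are usually changed together.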