vllm.config.lora ¶
LoRAConfig ¶
Configuration for LoRA.
Source code in vllm/config/lora.py
bias_enabled class-attribute instance-attribute ¶
bias_enabled: bool = False
[DEPRECATED] Enable bias for LoRA adapters. This option will be removed in v0.12.0.
default_mm_loras class-attribute instance-attribute ¶
Dictionary mapping specific modalities to LoRA model paths; this field is only applicable to multimodal models and should be used when a model always expects a LoRA to be active whenever a given modality is present. Note that currently, if a request provides multiple additional modalities, each of which has its own LoRA, we do NOT apply default_mm_loras, because we currently only support one LoRA adapter per prompt. When run in offline mode, the LoRA IDs for n modalities will be automatically assigned to 1-n, with the modality names in alphabetical order.
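A minimal offline sketch of this field, assuming default_mm_loras is passed through as an engine argument and using placeholder model and adapter paths:

```python
from vllm import LLM

# Placeholder model and adapter paths for illustration only.
llm = LLM(
    model="org/some-multimodal-model",   # hypothetical multimodal base model
    enable_lora=True,
    max_loras=1,
    # Whenever a request includes audio input, this adapter is applied by default.
    default_mm_loras={"audio": "/path/to/audio-lora"},
)
```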
fully_sharded_loras class-attribute instance-attribute ¶
fully_sharded_loras: bool = False
By default, only half of the LoRA computation is sharded with tensor parallelism. Enabling this uses the fully sharded layers. At high sequence lengths, maximum LoRA ranks, or tensor-parallel sizes, this is likely faster.
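A sketch of enabling this option offline, assuming the standard LLM engine arguments and a placeholder base model:

```python
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    enable_lora=True,
    max_lora_rank=64,
    tensor_parallel_size=4,
    # Shard both halves of the LoRA computation across the tensor-parallel group.
    fully_sharded_loras=True,
)
```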
lora_dtype class-attribute instance-attribute ¶
Data type for LoRA. If auto, will default to base model dtype.
lora_extra_vocab_size class-attribute instance-attribute ¶
lora_extra_vocab_size: int = 256
(Deprecated) Maximum size of extra vocabulary that can be present in a LoRA adapter. Will be removed in v0.12.0.
lora_vocab_padding_size class-attribute ¶
lora_vocab_padding_size: int = get_lora_vocab_padding_size()
max_cpu_loras class-attribute instance-attribute ¶
Maximum number of LoRAs to store in CPU memory. Must be >= max_loras.
max_loras class-attribute instance-attribute ¶
max_loras: int = 1
Max number of LoRAs in a single batch.
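A sketch showing how max_loras and max_cpu_loras interact when serving multiple adapters offline, assuming placeholder model and adapter paths:

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

# Keep at most 4 adapters active in a single batch, and cache up to 16 on CPU
# so adapters can be swapped in without reloading from disk.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder base model
    enable_lora=True,
    max_loras=4,
    max_cpu_loras=16,
)

outputs = llm.generate(
    "Hello",
    lora_request=LoRARequest("my-adapter", 1, "/path/to/adapter"),  # placeholder path
)
```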
__post_init__ ¶
Source code in vllm/config/lora.py
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
Source code in vllm/config/lora.py
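The factor-hashing pattern this docstring describes can be illustrated with a minimal, hypothetical sketch (not the vLLM implementation): only fields that change the compiled computation graph are collected into a factors list and hashed, while purely runtime knobs are excluded.

```python
import hashlib

def compute_hash_sketch(max_loras: int, max_lora_rank: int, lora_dtype: str) -> str:
    # Hypothetical example: graph-affecting fields go into `factors`;
    # runtime-only settings (e.g. CPU cache size) are deliberately left out.
    factors = [max_loras, max_lora_rank, lora_dtype]
    return hashlib.sha256(str(factors).encode()).hexdigest()
```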
verify_with_cache_config ¶
verify_with_cache_config(cache_config: CacheConfig)
verify_with_model_config ¶
verify_with_model_config(model_config: ModelConfig)