vllm.model_executor.model_loader.weight_utils ¶
Utilities for downloading and initializing model weights.
_BAR_FORMAT module-attribute
¶
_BAR_FORMAT = "{desc}: {percentage:3.0f}% Completed | {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}]\n"
runai_model_streamer module-attribute
¶
runai_model_streamer = PlaceholderModule(
"runai_model_streamer"
)
DisabledTqdm ¶
Bases: tqdm
Source code in vllm/model_executor/model_loader/weight_utils.py
_init_loader ¶
Source code in vllm/model_executor/model_loader/weight_utils.py
_shared_pointers ¶
Source code in vllm/model_executor/model_loader/weight_utils.py
atomic_writer ¶
atomic_writer(
filepath: Union[str, Path],
mode: str = "w",
encoding: Optional[str] = None,
) -> Generator[IO]
Context manager that provides an atomic file writing routine.
The context manager writes to a temporary file and, if successful, atomically replaces the original file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str or Path | The path to the file to write. | required |
mode | str | The file mode for the temporary file (e.g., 'w', 'wb'). | 'w' |
encoding | str | The encoding for text mode. | None |
Yields:
Type | Description |
---|---|
Generator[IO] | file object: A handle to the temporary file. |
Source code in vllm/model_executor/model_loader/weight_utils.py
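A minimal usage sketch (the file name and payload are illustrative):

```python
import json

from vllm.model_executor.model_loader.weight_utils import atomic_writer

config = {"dtype": "float16", "max_model_len": 4096}  # illustrative payload

# Writes go to a temporary file; config.json is replaced only if the
# with-block exits without an exception, so readers never see a partial file.
with atomic_writer("config.json", mode="w", encoding="utf-8") as f:
    json.dump(config, f)
```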
composed_weight_loader ¶
composed_weight_loader(
loader: LoaderFunction, fn: Callable[[Tensor], Tensor]
) -> LoaderFunction
Create a weight loader that post-processes the weights after loading
Source code in vllm/model_executor/model_loader/weight_utils.py
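An illustrative sketch, assuming the composed loader runs the base loader first and then applies fn to the parameter (the transform and shapes below are made up):

```python
import torch

from vllm.model_executor.model_loader.weight_utils import (
    composed_weight_loader, default_weight_loader)

# Post-process every loaded weight with -exp(w); any Tensor -> Tensor fn works.
loader = composed_weight_loader(default_weight_loader,
                                lambda w: -torch.exp(w.float()))

param = torch.nn.Parameter(torch.empty(8, 8))
loader(param, torch.randn(8, 8))  # copy the weight in, then apply fn
```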
convert_bin_to_safetensor_file ¶
Source code in vllm/model_executor/model_loader/weight_utils.py
convert_pyslice_to_tensor ¶
Convert a PySafeSlice object from safetensors to a torch.Tensor.
A PySafeSlice supports indexing, which is applied before the actual tensor is loaded and can therefore reduce the amount of data read into memory. However, it does not support more advanced functionality such as .view() or .t(), so if the loaded tensor needs to be modified with these operators, it must first be converted to a tensor.
Source code in vllm/model_executor/model_loader/weight_utils.py
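A sketch of the intended use; the file and tensor names are illustrative:

```python
from safetensors import safe_open

from vllm.model_executor.model_loader.weight_utils import (
    convert_pyslice_to_tensor)

with safe_open("model.safetensors", framework="pt") as f:
    pyslice = f.get_slice("lm_head.weight")      # lazy handle, not a Tensor
    weight = convert_pyslice_to_tensor(pyslice)  # materialize as torch.Tensor
    transposed = weight.t()                      # advanced ops now allowed
```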
default_weight_loader ¶
Default weight loader.
Source code in vllm/model_executor/model_loader/weight_utils.py
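A minimal sketch of the typical call, assuming the loader copies a same-shaped checkpoint tensor into the parameter in place:

```python
import torch

from vllm.model_executor.model_loader.weight_utils import default_weight_loader

param = torch.nn.Parameter(torch.empty(16, 32))
checkpoint_tensor = torch.randn(16, 32)  # stand-in for a loaded weight
default_weight_loader(param, checkpoint_tensor)
```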
download_safetensors_index_file_from_hf ¶
download_safetensors_index_file_from_hf(
model_name_or_path: str,
index_file: str,
cache_dir: Optional[str],
revision: Optional[str] = None,
) -> None
Download hf safetensors index file from Hugging Face Hub.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name_or_path | str | The model name or path. | required |
index_file | str | The safetensors index file name | required |
cache_dir | Optional[str] | The cache directory to store the model weights. If None, will use HF defaults. | required |
revision | Optional[str] | The revision of the model. | None |
Source code in vllm/model_executor/model_loader/weight_utils.py
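A usage sketch (the model id and index file name are illustrative):

```python
from vllm.model_executor.model_loader.weight_utils import (
    download_safetensors_index_file_from_hf)

# Fetch the shard index into the default HF cache so duplicate checkpoint
# formats can be filtered out before loading.
download_safetensors_index_file_from_hf(
    "facebook/opt-125m",
    index_file="model.safetensors.index.json",
    cache_dir=None,
)
```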
download_weights_from_hf ¶
download_weights_from_hf(
model_name_or_path: str,
cache_dir: Optional[str],
allow_patterns: list[str],
revision: Optional[str] = None,
ignore_patterns: Optional[Union[str, list[str]]] = None,
) -> str
Download model weights from Hugging Face Hub.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name_or_path | str | The model name or path. | required |
cache_dir | Optional[str] | The cache directory to store the model weights. If None, will use HF defaults. | required |
allow_patterns | list[str] | The allowed patterns for the weight files. Files matched by any of the patterns will be downloaded. | required |
revision | Optional[str] | The revision of the model. | None |
ignore_patterns | Optional[Union[str, list[str]]] | The patterns to filter out the weight files. Files matched by any of the patterns will be ignored. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | The path to the downloaded model weights. |
Source code in vllm/model_executor/model_loader/weight_utils.py
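A usage sketch with an illustrative model id:

```python
from vllm.model_executor.model_loader.weight_utils import (
    download_weights_from_hf)

local_dir = download_weights_from_hf(
    model_name_or_path="facebook/opt-125m",
    cache_dir=None,                               # use HF defaults
    allow_patterns=["*.safetensors", "*.json"],   # fetch weights and configs
    ignore_patterns=["original/**/*"],            # skip anything matched here
)
print(local_dir)  # local snapshot directory containing the downloaded files
```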
enable_hf_transfer ¶
Automatically activates hf_transfer.
Source code in vllm/model_executor/model_loader/weight_utils.py
fastsafetensors_weights_iterator ¶
fastsafetensors_weights_iterator(
hf_weights_files: list[str], use_tqdm_on_load: bool
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensors files using the fastsafetensors library.
Source code in vllm/model_executor/model_loader/weight_utils.py
filter_duplicate_safetensors_files ¶
filter_duplicate_safetensors_files(
hf_weights_files: list[str],
hf_folder: str,
index_file: str,
) -> list[str]
Source code in vllm/model_executor/model_loader/weight_utils.py
filter_files_not_needed_for_inference ¶
Exclude files that are not needed for inference.
See https://github.com/huggingface/transformers/blob/v4.34.0/src/transformers/trainer.py#L227-L233
Source code in vllm/model_executor/model_loader/weight_utils.py
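A sketch with made-up file names; the exact exclusion list lives in the source:

```python
from vllm.model_executor.model_loader.weight_utils import (
    filter_files_not_needed_for_inference)

files = [
    "model.safetensors",
    "training_args.bin",   # training-only artifacts like these are dropped
    "optimizer.pt",
    "scheduler.pt",
]
weight_files = filter_files_not_needed_for_inference(files)
```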
get_gguf_extra_tensor_names ¶
Source code in vllm/model_executor/model_loader/weight_utils.py
get_gguf_weight_type_map ¶
Return a mapping from each mapped GGUF weight name to its quantization type.
Source code in vllm/model_executor/model_loader/weight_utils.py
get_lock ¶
Source code in vllm/model_executor/model_loader/weight_utils.py
get_quant_config ¶
get_quant_config(
model_config: ModelConfig, load_config: LoadConfig
) -> QuantizationConfig
Source code in vllm/model_executor/model_loader/weight_utils.py
get_sparse_attention_config ¶
get_sparse_attention_config(
model_config: ModelConfig,
load_config: LoadConfig,
sparse_attention_config_filename: str = "sparse_attention_config.json",
) -> dict[str, Any]
Source code in vllm/model_executor/model_loader/weight_utils.py
gguf_quant_weights_iterator ¶
gguf_quant_weights_iterator(
gguf_file: str, gguf_to_hf_name_map: dict[str, str]
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the quantized weights in the model GGUF files and convert them to torch tensors.
Source code in vllm/model_executor/model_loader/weight_utils.py
initialize_dummy_weights ¶
initialize_dummy_weights(
model: Module,
low: float = -0.001,
high: float = 0.001,
seed: int = 1234,
) -> None
Initialize model weights with random values.
The model weights must be randomly initialized for accurate performance measurements. Additionally, the model weights should not cause NaNs in the forward pass. We empirically found that initializing the weights with values between -1e-3 and 1e-3 works well for most models.
We use a per-parameter random seed so that the dummy weights are consistent even when the model is partitioned across multiple devices. With a fixed seed, the random values generated by this function depend only on the parameter's number of elements and its data type.
Source code in vllm/model_executor/model_loader/weight_utils.py
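A minimal sketch; the module here is a stand-in for a real vLLM model:

```python
import torch.nn as nn

from vllm.model_executor.model_loader.weight_utils import (
    initialize_dummy_weights)

model = nn.Linear(4096, 4096)  # placeholder module for a profiling run
initialize_dummy_weights(model, low=-1e-3, high=1e-3, seed=1234)
```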
maybe_download_from_modelscope ¶
maybe_download_from_modelscope(
model: str,
revision: Optional[str] = None,
download_dir: Optional[str] = None,
ignore_patterns: Optional[Union[str, list[str]]] = None,
allow_patterns: Optional[Union[list[str], str]] = None,
) -> Optional[str]
Download model from ModelScope hub if VLLM_USE_MODELSCOPE is True.
Returns the path to the downloaded model, or None if the model is not downloaded from ModelScope.
Source code in vllm/model_executor/model_loader/weight_utils.py
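A usage sketch (the model id is illustrative); callers fall back to the Hugging Face path when None is returned:

```python
from vllm.model_executor.model_loader.weight_utils import (
    maybe_download_from_modelscope)

path = maybe_download_from_modelscope(
    "Qwen/Qwen2.5-0.5B-Instruct",
    allow_patterns=["*.safetensors", "*.json"],
)
if path is None:
    ...  # VLLM_USE_MODELSCOPE is not set; download from Hugging Face instead
```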
maybe_remap_kv_scale_name ¶
Remap the name of FP8 k/v_scale parameters.
This function handles the remapping of FP8 k/v_scale parameter names. It detects whether the given name ends with a known k/v_scale suffix and attempts to remap it to the expected name format in the model. If the remapped name is not found in the params_dict, a warning is printed and None is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The original loaded checkpoint parameter name. | required |
params_dict | dict | Dictionary containing the model's named parameters. | required |
Returns:
Name | Type | Description |
---|---|---|
str | Optional[str] | The remapped parameter name if successful, or the original name if no remapping is needed. |
None | Optional[str] | If the remapped name is not found in params_dict. |
Source code in vllm/model_executor/model_loader/weight_utils.py
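A sketch of the load-loop pattern this helper supports. The checkpoint name is illustrative, and the key in params_dict is an assumed in-model name, not the authoritative remapped format:

```python
import torch

from vllm.model_executor.model_loader.weight_utils import (
    maybe_remap_kv_scale_name)

# In a real load_weights loop this comes from dict(model.named_parameters()).
params_dict = {
    "model.layers.0.self_attn.attn.k_scale": torch.nn.Parameter(torch.ones(1)),
}

name = "model.layers.0.self_attn.k_scale"  # FP8 scale name from a checkpoint
remapped = maybe_remap_kv_scale_name(name, params_dict)
if remapped is None:
    pass  # scale not used by this model; skip loading it
else:
    param = params_dict[remapped]  # load the checkpoint scale into `param`
```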
multi_thread_pt_weights_iterator ¶
multi_thread_pt_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
pt_load_map_location: Union[
str, dict[str, str]
] = "cpu",
max_workers: int = 4,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model bin/pt files using multiple threads.
Source code in vllm/model_executor/model_loader/weight_utils.py
multi_thread_safetensors_weights_iterator ¶
multi_thread_safetensors_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
max_workers: int = 4,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensors files using multiple threads.
Source code in vllm/model_executor/model_loader/weight_utils.py
np_cache_weights_iterator ¶
np_cache_weights_iterator(
model_name_or_path: str,
cache_dir: Optional[str],
hf_folder: str,
hf_weights_files: list[str],
use_tqdm_on_load: bool,
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model np files.
Will dump the model weights to numpy files if they are not already dumped.
Source code in vllm/model_executor/model_loader/weight_utils.py
pt_weights_iterator ¶
pt_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
pt_load_map_location: Union[
str, dict[str, str]
] = "cpu",
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model bin/pt files.
Source code in vllm/model_executor/model_loader/weight_utils.py
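A usage sketch with illustrative shard names:

```python
from vllm.model_executor.model_loader.weight_utils import pt_weights_iterator

files = [
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
]
for name, tensor in pt_weights_iterator(files,
                                        use_tqdm_on_load=True,
                                        pt_load_map_location="cpu"):
    print(name, tuple(tensor.shape))
```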
row_parallel_weight_loader ¶
Load weights that are row-parallelized.
Source code in vllm/model_executor/model_loader/weight_utils.py
runai_safetensors_weights_iterator ¶
runai_safetensors_weights_iterator(
hf_weights_files: list[str], use_tqdm_on_load: bool
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files.
Source code in vllm/model_executor/model_loader/weight_utils.py
safetensors_weights_iterator ¶
safetensors_weights_iterator(
hf_weights_files: list[str],
use_tqdm_on_load: bool,
safetensors_load_strategy: str = "lazy",
) -> Generator[tuple[str, Tensor], None, None]
Iterate over the weights in the model safetensor files.
Source code in vllm/model_executor/model_loader/weight_utils.py
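A usage sketch with illustrative shard names:

```python
from vllm.model_executor.model_loader.weight_utils import (
    safetensors_weights_iterator)

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]
state_dict = {
    name: tensor
    for name, tensor in safetensors_weights_iterator(files,
                                                     use_tqdm_on_load=False)
}
```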
sharded_weight_loader ¶
sharded_weight_loader(shard_axis: int) -> LoaderFunction
Create a weight loader that shards the weights along the given axis
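An illustrative sketch of how a layer might attach the returned loader to one of its parameters; set_weight_attrs is the vLLM helper commonly used for this wiring, and the parameter shape is made up:

```python
import torch

from vllm.model_executor.model_loader.weight_utils import sharded_weight_loader
from vllm.model_executor.utils import set_weight_attrs

# Keep only this tensor-parallel rank's shard, split along dim 0.
loader = sharded_weight_loader(shard_axis=0)

param = torch.nn.Parameter(torch.empty(1024, 1024))  # illustrative parameter
set_weight_attrs(param, {"weight_loader": loader})
```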