vllm.model_executor ¶
Modules:
| Name | Description |
|---|---|
| custom_op | |
| layers | |
| model_loader | |
| models | |
| parameter | |
| utils | Utils for model executor. |
| warmup | |
__all__ module-attribute ¶
BasevLLMParameter ¶
Bases: Parameter
Base parameter for vLLM linear layers. Extends torch.nn.Parameter by taking in a linear weight loader. The loaded weight is copied into the parameter when the provided weight loader is called.
Source code in vllm/model_executor/parameter.py
__init__ ¶
Initialize the BasevLLMParameter.

:param data: torch tensor with the parameter data
:param weight_loader: weight loader callable

:returns: a torch.nn.Parameter
Source code in vllm/model_executor/parameter.py
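The weight-loader pattern described above can be sketched in plain Python (a minimal illustration under assumed names, not vLLM's actual implementation): the parameter carries a `weight_loader` callable that the model loader invokes to copy a checkpoint tensor into the parameter's storage in place.

```python
# Hypothetical sketch of the BasevLLMParameter weight-loader pattern.
# Plain lists stand in for torch tensors to keep the example self-contained.

class SketchParameter:
    def __init__(self, data, weight_loader):
        self.data = data                    # parameter storage
        self.weight_loader = weight_loader  # invoked during checkpoint load


def default_weight_loader(param, loaded_weight):
    # Copy the loaded checkpoint values into the parameter in place.
    assert len(param.data) == len(loaded_weight), "shape mismatch"
    param.data[:] = loaded_weight


param = SketchParameter(data=[0.0] * 4, weight_loader=default_weight_loader)
param.weight_loader(param, [1.0, 2.0, 3.0, 4.0])
print(param.data)  # [1.0, 2.0, 3.0, 4.0]
```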
__new__ ¶
__torch_function__ classmethod ¶
_shard_id_as_int ¶
Source code in vllm/model_executor/parameter.py
PackedvLLMParameter ¶
Bases: ModelWeightParameter
Parameter for model weights that are packed on disk. Example: GPTQ Marlin weights are int4 or int8 packed into int32. Extends ModelWeightParameter to take in the packed factor, the packed dimension, and optionally a Marlin tile size for Marlin kernels. Adjusts shard_size and shard_offset during fused linear layer weight loading to account for packing and, optionally, the Marlin tile size.
Source code in vllm/model_executor/parameter.py
__init__ ¶
__init__(
packed_factor: Union[int, Fraction],
packed_dim: int,
marlin_tile_size: Optional[int] = None,
bitblas_tile_size: Optional[int] = None,
**kwargs,
)
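The shard adjustment described above can be illustrated with a short sketch (function name and exact logic are illustrative, inferred from the description, not copied from vLLM): a packed element holds `packed_factor` logical values, so shard sizes and offsets shrink by that factor, and a Marlin tile size, when given, rescales them to the tiled layout.

```python
# Hypothetical sketch of the shard-index adjustment PackedvLLMParameter
# performs for fused linear layer loading.
def adjust_shard_indexes_for_packing(shard_size, shard_offset,
                                     packed_factor, marlin_tile_size=None):
    # Packed storage holds `packed_factor` logical elements per physical
    # element (e.g. 8 int4 values per int32), shrinking sizes and offsets.
    shard_size = shard_size // packed_factor
    shard_offset = shard_offset // packed_factor
    if marlin_tile_size is not None:
        # Marlin kernels lay weights out in tiles; rescale accordingly.
        shard_size *= marlin_tile_size
        shard_offset *= marlin_tile_size
    return shard_size, shard_offset


# int4 weights packed into int32: packed_factor = 32 // 4 = 8
print(adjust_shard_indexes_for_packing(4096, 8192, 8))  # (512, 1024)
```

The real signature also accepts `packed_factor` as a `Fraction`, which matters for formats where the logical-to-physical ratio is not a whole number.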