vllm.compilation.fusion
FUSED_OPS module-attribute
FUSED_OPS: dict[FusedRMSQuantKey, OpOverload] = {
FusedRMSQuantKey(kFp8StaticTensorSym, False): default,
FusedRMSQuantKey(kFp8StaticTensorSym, True): default,
FusedRMSQuantKey(kFp8DynamicTokenSym, False): default,
FusedRMSQuantKey(kFp8DynamicTokenSym, True): default,
}
QUANT_OPS module-attribute
QUANT_OPS: dict[QuantKey, OpOverload] = {
kFp8StaticTensorSym: default,
kFp8DynamicTensorSym: default,
kFp8DynamicTokenSym: default,
}
FusedAddRMSNormDynamicQuantPattern
Bases: RMSNormQuantPattern
Source code in vllm/compilation/fusion.py
__init__
__init__(
epsilon: float,
quant_dtype: dtype,
group_shape: GroupShape = PER_TOKEN,
symmetric=True,
)
register
FusedAddRMSNormStaticQuantPattern
Bases: RMSNormQuantPattern
__init__
register
FusedRMSQuantKey
Bases: NamedTuple
Named tuple for identifying the type of RMSNorm + quant fusion.
quant: type of quantization
fused_add: does the op also perform the residual add
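The lookup that a key like this enables can be sketched in plain Python. This is an illustration only, not the code in vllm/compilation/fusion.py: `QuantKeySketch`, `FusedKeySketch`, `FUSED_OPS_SKETCH`, `lookup_fused_op`, and the string op names are hypothetical stand-ins, and the real table maps keys to `OpOverload` objects rather than strings.

```python
from typing import NamedTuple


class QuantKeySketch(NamedTuple):
    """Hypothetical stand-in for a quantization descriptor."""
    dtype: str
    static: bool      # True: precomputed scale, False: scale computed at runtime
    per_token: bool   # True: one scale per token, False: one scale per tensor


class FusedKeySketch(NamedTuple):
    """Hypothetical stand-in mirroring the two documented fields."""
    quant: QuantKeySketch
    fused_add: bool   # does the op also perform the residual add


FP8_STATIC_TENSOR = QuantKeySketch("fp8", static=True, per_token=False)
FP8_DYNAMIC_TOKEN = QuantKeySketch("fp8", static=False, per_token=True)

# Placeholder op names; the real values are OpOverload objects.
FUSED_OPS_SKETCH: dict[FusedKeySketch, str] = {
    FusedKeySketch(FP8_STATIC_TENSOR, False): "static_quant_op",
    FusedKeySketch(FP8_STATIC_TENSOR, True): "add_static_quant_op",
    FusedKeySketch(FP8_DYNAMIC_TOKEN, False): "dynamic_quant_op",
    FusedKeySketch(FP8_DYNAMIC_TOKEN, True): "add_dynamic_quant_op",
}


def lookup_fused_op(quant: QuantKeySketch, fused_add: bool) -> str:
    """Named tuples hash and compare by value, so a freshly built key works."""
    return FUSED_OPS_SKETCH[FusedKeySketch(quant, fused_add)]
```

Because a named tuple is hashable and compares field by field, a pattern can build its key from constructor arguments and index the table directly, without any string formatting or nested dictionaries.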
RMSNormDynamicQuantPattern
Bases: RMSNormQuantPattern
__init__
__init__(
epsilon: float,
quant_dtype: dtype,
group_shape: GroupShape = PER_TOKEN,
symmetric=True,
)
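As a rough illustration of what dynamic quantization with a per-token group shape usually means, the sketch below computes one scale per row (token) from that row's largest absolute value. It is an assumption-laden toy, not vLLM's kernel: `per_token_dynamic_quant` and `min_scale` are hypothetical names, the FP8 format is assumed to be float8_e4m3fn (largest finite value 448), and rounding to actual FP8 values is omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value (assumed target format)


def per_token_dynamic_quant(rows, min_scale=1e-12):
    """Scale each row so its largest magnitude maps to FP8_E4M3_MAX.

    Returns the scaled rows and one scale per row. Real kernels would
    also round the scaled values to FP8; that step is left out here.
    """
    quantized, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        # min_scale (hypothetical) guards against division by zero on all-zero rows.
        scale = max(amax / FP8_E4M3_MAX, min_scale)
        quantized.append(
            [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
        )
        scales.append(scale)
    return quantized, scales
```

A static scheme would instead receive a precomputed scale as an input, which is why the dynamic patterns need to account for an extra scale output.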
register
RMSNormQuantFusionPass
Bases: VllmPatternMatcherPass
This pass fuses the rms_norm and quant custom ops into a single fused rms_norm_quant op. It also supports fused_add_rms_norm.
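The equivalence that makes such a fusion valid can be sketched numerically in plain Python. This is a toy under stated assumptions, not the pass or its kernels: all function names are hypothetical, quantization is static with a single per-tensor scale, the FP8 format is assumed to be float8_e4m3fn (largest finite value 448), and rounding to FP8 is omitted. The fused-add variant assumes the residual is added to the input before normalization and returned as the updated residual.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value (assumed target format)


def _clamp(v):
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v))


def rms_norm(x, weight, eps):
    """Unfused step 1: normalize by the root mean square, then apply weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def static_quant(x, scale):
    """Unfused step 2: divide by a precomputed scale and clamp to the FP8 range."""
    return [_clamp(v / scale) for v in x]


def fused_rms_norm_static_quant(x, weight, eps, scale):
    """Fused form: same arithmetic in one pass, no intermediate list."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [_clamp(v * inv_rms * w / scale) for v, w in zip(x, weight)]


def fused_add_rms_norm_static_quant(x, residual, weight, eps, scale):
    """Fused-add form: add the residual first, then normalize and quantize."""
    new_residual = [a + b for a, b in zip(x, residual)]
    quantized = fused_rms_norm_static_quant(new_residual, weight, eps, scale)
    return quantized, new_residual
```

The fused functions avoid materializing the normalized activations between the two steps, which is the memory-traffic saving a fused kernel aims for.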
patterns instance-attribute
__init__
__init__(config: VllmConfig)
RMSNormQuantPattern
__init__
__init__(epsilon: float, key: FusedRMSQuantKey)
RMSNormStaticQuantPattern
Bases: RMSNormQuantPattern