vllm.transformers_utils.configs.midashenglm ¶
DashengConfig ¶
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/midashenglm.py
__init__ ¶
__init__(
    embed_dim: int = 768,
    outputdim: int = 527,
    patch_size: Union[int, tuple[int, int]] = 16,
    patch_stride: Union[int, tuple[int, int]] = 16,
    input_channels: int = 1,
    target_length: int = 1012,
    depth: int = 12,
    num_heads: int = 12,
    mlp_ratio: float = 4.0,
    qkv_bias: bool = True,
    init_values: Optional[float] = None,
    drop_rate: float = 0.0,
    attn_drop_rate: float = 0.0,
    f_min: float = 0.0,
    f_max: float = 8000.0,
    center: bool = True,
    win_length: int = 512,
    hop_length: int = 160,
    sample_rate: int = 16000,
    n_fft: int = 512,
    n_mels: int = 64,
    **kwargs,
)
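A minimal usage sketch (not from the documented source): constructing a DashengConfig with the defaults above and with a few overrides. This assumes the constructor stores each argument as a same-named attribute, the usual convention for PretrainedConfig subclasses.

from vllm.transformers_utils.configs.midashenglm import DashengConfig

# Defaults match the signature above: 16 kHz input, 64 mel bins, and a
# ViT-Base-sized encoder (embed_dim=768, depth=12, num_heads=12).
config = DashengConfig()

# patch_size and patch_stride accept either an int or an (int, int) tuple,
# per the Union annotations above.
custom = DashengConfig(embed_dim=1024, patch_size=(16, 4), n_mels=128)
print(custom.embed_dim)  # 1024, assuming conventional attribute storage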
MiDashengLMConfig ¶
Bases: PretrainedConfig
audio_encoder_config instance-attribute ¶
audio_encoder_config = DashengConfig(
    **(audio_encoder_config or {})
)
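Note that, per the definition above, the constructor accepts audio_encoder_config as a plain dict (or None) and coerces it to a DashengConfig, so any keys left unspecified fall back to the DashengConfig defaults.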
text_config instance-attribute ¶
__init__ ¶
__init__(
    audio_encoder_config: Optional[dict] = None,
    subsample_factor: int = 5,
    text_config: Optional[dict] = None,
    audio_token_id: Optional[int] = None,
    **kwargs,
)
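A minimal usage sketch (not from the documented source): building a MiDashengLMConfig from nested dicts. The audio_token_id value here is purely illustrative, and the handling of text_config is assumed to mirror audio_encoder_config (its initializer is not shown above).

from vllm.transformers_utils.configs.midashenglm import MiDashengLMConfig

config = MiDashengLMConfig(
    audio_encoder_config={"embed_dim": 768, "n_mels": 64},
    subsample_factor=5,
    audio_token_id=151646,  # hypothetical id, for illustration only
)

# The dict is coerced to a DashengConfig per the instance-attribute
# definition above, so nested fields are reachable as attributes.
assert config.audio_encoder_config.n_mels == 64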