vllm.transformers_utils.configs.radio ¶
Radio vision model configuration
VIT_TIMM_DIM_BY_NAME module-attribute ¶
```python
VIT_TIMM_DIM_BY_NAME: dict[str, tuple[int, int, int, int]] = {
    "vit_small_patch16_224": (384, 12, 6, 1536),
    "vit_base_patch16_224": (768, 12, 12, 3072),
    "vit_large_patch16_224": (1024, 24, 16, 4096),
    "vit_huge_patch16_224": (1280, 32, 16, 5120),
}
```
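The tuple ordering is not spelled out on this page; judging by the standard timm ViT variants (e.g., ViT-B/16 with hidden size 768, 12 layers, 12 heads, MLP size 3072), it is most likely (embed_dim, depth, num_heads, mlp_hidden_dim). A minimal lookup sketch under that assumption:

```python
from vllm.transformers_utils.configs.radio import VIT_TIMM_DIM_BY_NAME

# Assumed ordering: (embed_dim, depth, num_heads, mlp_hidden_dim).
# This matches the standard timm ViT-B/16 dimensions but is an
# inference from the values above, not documented behavior.
embed_dim, depth, num_heads, mlp_dim = VIT_TIMM_DIM_BY_NAME["vit_base_patch16_224"]
print(embed_dim, depth, num_heads, mlp_dim)  # 768 12 12 3072
```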
RadioConfig ¶
Bases: PretrainedConfig
This is the configuration class to store the configuration of a Radio vision model. It is used to instantiate a Radio model according to the specified arguments, defining the model architecture.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name of the vision transformer model (e.g., "vit_base_patch16_224"). Used to determine architecture dimensions from VIT_TIMM_DIM_BY_NAME. | required |
| image_size | int | The size (resolution) of each image. | 224 |
| patch_size | int | The size (resolution) of each patch. | 16 |
| qkv_bias | bool | Whether to add a bias to the queries, keys and values. | True |
| qk_normalization | bool | Whether to apply normalization to queries and keys. | False |
| norm_type | str | The normalization type to use. | 'layer_norm' |
| layer_norm_eps | float | The epsilon used by the layer normalization layers. | 1e-06 |
| initializer_factor | float | A factor for initializing all weight matrices. | 1.0 |
| hidden_act | str | The non-linear activation function in the encoder. | 'gelu' |
| max_img_size | int | Maximum image size for position embeddings. | 2048 |
| norm_mean | Union[tuple[float, float, float], list] | Mean values for image normalization (RGB channels). Defaults to (0.48145466, 0.4578275, 0.40821073). | OPENAI_CLIP_MEAN |
| norm_std | Union[tuple[float, float, float], list] | Standard deviation values for image normalization (RGB channels). Defaults to (0.26862954, 0.26130258, 0.27577711). | OPENAI_CLIP_STD |
| reg_tokens | Optional[int] | Number of register tokens to use. | None |
Source code in vllm/transformers_utils/configs/radio.py
norm_mean instance-attribute ¶

```python
norm_mean = (
    list(norm_mean)
    if isinstance(norm_mean, (tuple, list))
    else norm_mean
)
```
norm_std instance-attribute ¶

```python
norm_std = (
    list(norm_std)
    if isinstance(norm_std, (tuple, list))
    else norm_std
)
```
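Converting tuple defaults to lists keeps the stored values JSON-friendly, which matters because PretrainedConfig serializes to and from JSON. A quick check, assuming RadioConfig inherits the standard PretrainedConfig serialization methods unchanged:

```python
from vllm.transformers_utils.configs.radio import RadioConfig

config = RadioConfig(model_name="vit_base_patch16_224")
# Tuple defaults (OPENAI_CLIP_MEAN / OPENAI_CLIP_STD) are stored as lists.
assert isinstance(config.norm_mean, list)
assert isinstance(config.norm_std, list)
print(config.to_json_string())  # inherited from PretrainedConfig
```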
__init__ ¶
```python
__init__(
    model_name: str,
    image_size: int = 224,
    patch_size: int = 16,
    qkv_bias: bool = True,
    qk_normalization: bool = False,
    norm_type: str = "layer_norm",
    layer_norm_eps: float = 1e-06,
    initializer_factor: float = 1.0,
    hidden_act: str = "gelu",
    max_img_size: int = 2048,
    norm_mean: Union[tuple[float, float, float], list] = OPENAI_CLIP_MEAN,
    norm_std: Union[tuple[float, float, float], list] = OPENAI_CLIP_STD,
    reg_tokens: Optional[int] = None,
    **kwargs,
)
```
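A usage sketch: only model_name is required, and it must be one of the keys of VIT_TIMM_DIM_BY_NAME. The overrides below are illustrative values, not recommended settings:

```python
from vllm.transformers_utils.configs.radio import RadioConfig

config = RadioConfig(
    model_name="vit_huge_patch16_224",  # must be a key of VIT_TIMM_DIM_BY_NAME
    image_size=432,         # illustrative override of the 224 default
    qk_normalization=True,  # illustrative; default is False
    reg_tokens=4,           # illustrative; default is None
)
```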