vllm.model_executor.layers.rotary_embedding.common ¶
_flashinfer_rotary_embedding ¶
_flashinfer_rotary_embedding(
positions: Tensor,
query: Tensor,
key: Tensor,
head_size: int,
cos_sin_cache: Tensor,
is_neox: bool,
) -> None
Custom op wrapper for FlashInfer's rotary embedding kernel.
This is an in-place operation: the query and key tensors are modified directly and nothing is returned.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
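A minimal usage sketch follows. It assumes a CUDA device with FlashInfer installed, that query/key are flattened over heads to [num_tokens, num_heads * head_size], and that the cache stores cos values in the first half of the last dimension and sin values in the second half; check these layout assumptions against the linked source before relying on them. Calling the underscore-prefixed wrapper directly is for illustration only.

```python
import torch

from vllm.model_executor.layers.rotary_embedding.common import (
    _flashinfer_rotary_embedding,
)

num_tokens, num_heads, head_size = 4, 8, 64
max_position = 8192

positions = torch.arange(num_tokens, device="cuda")
# Assumed layout: heads flattened into the last dimension.
query = torch.randn(num_tokens, num_heads * head_size, device="cuda")
key = torch.randn(num_tokens, num_heads * head_size, device="cuda")

# Assumed cache layout: [max_position, head_size], cos then sin.
inv_freq = 1.0 / (
    10000.0 ** (torch.arange(0, head_size, 2, device="cuda").float() / head_size)
)
freqs = torch.outer(torch.arange(max_position, device="cuda").float(), inv_freq)
cos_sin_cache = torch.cat((freqs.cos(), freqs.sin()), dim=-1)

# In-place: query and key are rotated directly; the call returns None.
_flashinfer_rotary_embedding(positions, query, key, head_size, cos_sin_cache, True)
```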
_flashinfer_rotary_embedding_fake ¶
_flashinfer_rotary_embedding_fake(
positions: Tensor,
query: Tensor,
key: Tensor,
head_size: int,
cos_sin_cache: Tensor,
is_neox: bool,
) -> None
No-op fake implementation registered alongside the custom op for tracing and compilation; its signature mirrors _flashinfer_rotary_embedding.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
apply_rotary_emb_dispatch ¶
apply_rotary_emb_dispatch(
x: Tensor,
cos: Tensor,
sin: Tensor,
is_neox_style: bool,
) -> Tensor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [num_tokens, num_heads, head_size] | required |
cos | Tensor | [num_tokens, head_size // 2] | required |
sin | Tensor | [num_tokens, head_size // 2] | required |
is_neox_style | bool | Whether to use the Neox-style or GPT-J-style rotary positional embeddings. | required |
Source code in vllm/model_executor/layers/rotary_embedding/common.py
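A short usage sketch, assuming the pure-torch fallback path is taken (no GPU kernel required) and using the shapes from the parameter table above:

```python
import torch

from vllm.model_executor.layers.rotary_embedding.common import (
    apply_rotary_emb_dispatch,
)

num_tokens, num_heads, head_size = 3, 2, 8
x = torch.randn(num_tokens, num_heads, head_size)

# One angle per rotated pair of channels, hence head_size // 2.
angles = torch.randn(num_tokens, head_size // 2)
out = apply_rotary_emb_dispatch(x, angles.cos(), angles.sin(), True)
assert out.shape == x.shape
```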
apply_rotary_emb_torch ¶
apply_rotary_emb_torch(
x: Tensor,
cos: Tensor,
sin: Tensor,
is_neox_style: bool,
) -> Tensor
Pure-PyTorch implementation of the rotary transform; x, cos, and sin follow the same shapes as apply_rotary_emb_dispatch.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
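The body below is a sketch of the standard pure-PyTorch rotation under those shape assumptions. It follows the conventional Neox (half-split) and GPT-J (interleaved) formulations rather than quoting the source verbatim, so the name is marked as a sketch:

```python
import torch


def apply_rotary_emb_torch_sketch(
    x: torch.Tensor,
    cos: torch.Tensor,
    sin: torch.Tensor,
    is_neox_style: bool,
) -> torch.Tensor:
    # Broadcast [num_tokens, head_size // 2] over the heads dimension.
    cos = cos.unsqueeze(-2).to(x.dtype)
    sin = sin.unsqueeze(-2).to(x.dtype)
    if is_neox_style:
        # Neox style: rotate the first half of channels against the second.
        x1, x2 = torch.chunk(x, 2, dim=-1)
    else:
        # GPT-J style: rotate even-indexed channels against odd-indexed ones.
        x1, x2 = x[..., ::2], x[..., 1::2]
    o1 = x1 * cos - x2 * sin
    o2 = x2 * cos + x1 * sin
    if is_neox_style:
        return torch.cat((o1, o2), dim=-1)
    return torch.stack((o1, o2), dim=-1).flatten(-2)
```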
rotate_gptj ¶
rotate_gptj(x: Tensor) -> Tensor
Rotate the input in GPT-J (interleaved-pair) style.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
rotate_neox ¶
rotate_neox(x: Tensor) -> Tensor
Rotate the input in GPT-NeoX (half-split) style.
Source code in vllm/model_executor/layers/rotary_embedding/common.py
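Both helpers implement the same pairwise rotation (x1, x2) -> (-x2, x1); they differ only in how channels are paired. A sketch of the conventional definitions (the actual bodies live in the linked source):

```python
import torch


def rotate_neox_sketch(x: torch.Tensor) -> torch.Tensor:
    # Pair channel i with channel i + head_size // 2.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def rotate_gptj_sketch(x: torch.Tensor) -> torch.Tensor:
    # Pair adjacent channels (2i, 2i + 1).
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)
```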
yarn_find_correction_dim ¶
yarn_find_correction_dim(
num_rotations: int,
dim: int,
base: float = 10000,
max_position_embeddings: int = 2048,
) -> float
Source code in vllm/model_executor/layers/rotary_embedding/common.py
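This is YaRN's inverse wavelength formula: rotary pair i has wavelength 2·pi·base^(2i/dim), and the function solves for the (fractional) pair index whose wavelength completes num_rotations full turns over max_position_embeddings positions. A sketch matching the YaRN reference formulation:

```python
import math


def yarn_find_correction_dim_sketch(
    num_rotations: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> float:
    # Solve max_position_embeddings / (2 * pi * base^(2i / dim)) == num_rotations
    # for the rotary pair index i (an index into inv_freq, 0 <= i < dim // 2).
    return (
        dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))
    ) / (2 * math.log(base))
```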
yarn_find_correction_range ¶
yarn_find_correction_range(
low_rot: int,
high_rot: int,
dim: int,
base: float = 10000,
max_position_embeddings: int = 2048,
) -> tuple[int, int]
Source code in vllm/model_executor/layers/rotary_embedding/common.py
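As in the YaRN reference, the range function brackets the fractional dimension returned by yarn_find_correction_dim with floor/ceil for the low and high rotation counts, then clamps to valid indices. A sketch building on the helper defined above:

```python
import math


def yarn_find_correction_range_sketch(
    low_rot: int,
    high_rot: int,
    dim: int,
    base: float = 10000,
    max_position_embeddings: int = 2048,
) -> tuple[int, int]:
    low = math.floor(
        yarn_find_correction_dim_sketch(low_rot, dim, base, max_position_embeddings)
    )
    high = math.ceil(
        yarn_find_correction_dim_sketch(high_rot, dim, base, max_position_embeddings)
    )
    # Clamp to valid rotary-dimension indices.
    return max(low, 0), min(high, dim - 1)
```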