vllm.entrypoints.cli.serve ¶
DESCRIPTION module-attribute ¶
DESCRIPTION = "Launch a local OpenAI-compatible API server to serve LLM\ncompletions via HTTP. Defaults to Qwen/Qwen3-0.6B if no model is specified.\n\nSearch by using: `--help=<ConfigGroup>` to explore options by section (e.g.,\n--help=ModelConfig, --help=Frontend)\n Use `--help=all` to show all available flags at once.\n"
ServeSubcommand ¶
Bases: CLISubcommand
The `serve` subcommand for the vLLM CLI.
Source code in vllm/entrypoints/cli/serve.py
cmd staticmethod ¶
cmd(args: Namespace) -> None
Source code in vllm/entrypoints/cli/serve.py
subparser_init ¶
subparser_init(subparsers: _SubParsersAction) -> FlexibleArgumentParser
Source code in vllm/entrypoints/cli/serve.py
cmd_init ¶
cmd_init() -> list[CLISubcommand]
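`cmd_init` is the hook through which the top-level vLLM CLI discovers this module's subcommands. A rough, self-contained sketch of how the returned `CLISubcommand` objects could be wired into a parser; the `FlexibleArgumentParser` import path and the manual dispatch are assumptions, and the real CLI entrypoint performs the equivalent registration itself:

```python
# Sketch only: register the serve subcommand on a parser by hand.
# The FlexibleArgumentParser import path is an assumption; the real vLLM CLI
# entrypoint performs this registration internally.
from vllm.entrypoints.cli.serve import cmd_init
from vllm.utils import FlexibleArgumentParser  # assumed import path

parser = FlexibleArgumentParser(prog="vllm")
subparsers = parser.add_subparsers(dest="subcommand", required=True)

for subcommand in cmd_init():              # cmd_init() -> [ServeSubcommand()]
    subcommand.subparser_init(subparsers)  # adds the "serve" subparser

args = parser.parse_args(["serve", "Qwen/Qwen3-0.6B"])
# ServeSubcommand.cmd(args) would then launch the API server (blocking).
```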
run_api_server_worker_proc ¶
run_api_server_worker_proc(listen_address, sock, args, client_config=None, **uvicorn_kwargs) -> None
Entrypoint for individual API server worker processes.
Source code in vllm/entrypoints/cli/serve.py
run_headless ¶
run_headless(args: Namespace)
Source code in vllm/entrypoints/cli/serve.py
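`run_headless` starts engine processes without a local HTTP front end, for deployments where the API server runs elsewhere. A hedged sketch of the CLI invocation that presumably dispatches here; the `--headless` flag is an assumption based on recent vLLM releases and is not defined in this module:

```python
# Hedged sketch: start engine processes without the local HTTP front end.
# The --headless flag is an assumption based on recent vLLM releases;
# data-parallel placement flags would usually accompany it.
import subprocess

subprocess.run(
    ["vllm", "serve", "Qwen/Qwen3-0.6B", "--headless"],
    check=True,
)
```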
run_multi_api_server ¶
run_multi_api_server(args: Namespace)
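`run_multi_api_server` scales the HTTP front end out across several server processes, presumably with each worker entering `run_api_server_worker_proc` above. A hedged sketch of the corresponding CLI invocation; the `--api-server-count` flag is an assumption based on recent vLLM releases and is not defined in this module:

```python
# Hedged sketch: run several API server processes in front of one engine.
# The --api-server-count flag is an assumption based on recent vLLM releases.
import subprocess

subprocess.run(
    ["vllm", "serve", "Qwen/Qwen3-0.6B", "--api-server-count", "4"],
    check=True,
)
```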