Llama Stack
vLLM is also available via Llama Stack.
To install Llama Stack, run the command shown below.
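A minimal sketch, assuming Llama Stack is published on PyPI as `llama-stack`:

```console
# Install the Llama Stack package (PyPI package name assumed)
pip install llama-stack
```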
Inference using OpenAI-Compatible API
Then start the Llama Stack server and configure it to point to your vLLM server with the following settings:
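A minimal sketch of such a configuration, assuming the remote provider type is registered as `remote::vllm` and that vLLM is serving its OpenAI-compatible API locally on port 8000; the provider ID and URL are placeholders:

```yaml
inference:
  - provider_id: vllm0                # arbitrary identifier for this provider
    provider_type: remote::vllm       # remote vLLM provider type (assumed name)
    config:
      url: http://127.0.0.1:8000      # URL of your running vLLM OpenAI-compatible server
```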
Please refer to this guide for more details on the remote vLLM provider.
Inference using Embedded vLLM
An inline vLLM provider is also available. Below is a sample configuration using that method:
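A sketch of what such a configuration might look like, assuming the inline provider type is registered as `inline::vllm`; the model name and tensor-parallel size are placeholders:

```yaml
inference:
  - provider_id: vllm-inline          # arbitrary identifier for this provider
    provider_type: inline::vllm       # inline (embedded) vLLM provider type (assumed name)
    config:
      model: Llama3.1-8B-Instruct     # placeholder model identifier
      tensor_parallel_size: 4         # example: shard the model across 4 GPUs
```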