Language Models
The first step in any DSPy code is to set up your language model. For example, you can configure OpenAI's GPT-4o-mini as your default LM as follows.
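```python
import dspy

# A minimal sketch: configure OpenAI's GPT-4o-mini as the default LM for all DSPy modules.
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
```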
A few different LMs
You can authenticate by setting the OPENAI_API_KEY env variable or passing api_key below.
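```python
import dspy

# Sketch: the key can come from the OPENAI_API_KEY env variable, or be passed explicitly
# (the placeholder below is not a real key).
lm = dspy.LM("openai/gpt-4o-mini", api_key="YOUR_OPENAI_API_KEY")
dspy.configure(lm=lm)
```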
You can authenticate by setting the ANTHROPIC_API_KEY env variable or passing api_key below.
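```python
import dspy

# Sketch: the model string follows the "anthropic/<model>" convention; the exact model name
# is illustrative, and the key may come from ANTHROPIC_API_KEY instead of being passed here.
lm = dspy.LM("anthropic/claude-3-opus-20240229", api_key="YOUR_ANTHROPIC_API_KEY")
dspy.configure(lm=lm)
```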
If you're on the Databricks platform, authentication is automatic via their SDK. If not, you can set the env variables DATABRICKS_API_KEY and DATABRICKS_API_BASE, or pass api_key and api_base below.
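```python
import dspy

# Sketch: on Databricks itself no explicit credentials are needed; elsewhere, set
# DATABRICKS_API_KEY / DATABRICKS_API_BASE or pass api_key / api_base.
# The serving endpoint name below is illustrative.
lm = dspy.LM("databricks/databricks-meta-llama-3-1-70b-instruct")
dspy.configure(lm=lm)
```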
First, install SGLang and launch its server with your LM.
> pip install "sglang[all]"
> pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
> CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --port 7501 --model-path meta-llama/Meta-Llama-3-8B-Instruct
Then, connect to it from your DSPy code as an OpenAI-compatible endpoint.
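```python
import dspy

# Sketch: the SGLang server launched above exposes an OpenAI-compatible API on port 7501,
# so we use the openai/ prefix; api_key is a dummy value for a local server.
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:7501/v1",
    api_key="local",
    model_type="chat",
)
dspy.configure(lm=lm)
```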
First, install Ollama and launch its server with your LM.
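For reference, installation and launch typically look something like this (the commands and model tag may vary by platform and Ollama version):
> curl -fsSL https://ollama.com/install.sh | sh
> ollama run llama3.2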
Then, connect to it from your DSPy code.
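```python
import dspy

# Sketch: Ollama's server listens on port 11434 by default; the model tag must match
# the one you ran above. No real API key is needed for a local server.
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)
```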
In DSPy, you can use any of the dozens of LLM providers supported by LiteLLM. Simply follow their instructions for which {PROVIDER}_API_KEY to set and how to pass the {provider_name}/{model_name} to the constructor.
Some examples:
- anyscale/mistralai/Mistral-7B-Instruct-v0.1, with ANYSCALE_API_KEY
- together_ai/togethercomputer/llama-2-70b-chat, with TOGETHERAI_API_KEY
- sagemaker/<your-endpoint-name>, with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME
- azure/<your_deployment_name>, with AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION, and the optional AZURE_AD_TOKEN and AZURE_API_TYPE
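For instance, a Together AI model from the list above could be configured like this (a sketch; it assumes TOGETHERAI_API_KEY is set in your environment):

```python
import dspy

# Sketch: any LiteLLM-supported provider works the same way once its API key env variable is set.
lm = dspy.LM("together_ai/togethercomputer/llama-2-70b-chat")
dspy.configure(lm=lm)
```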
If your provider offers an OpenAI-compatible endpoint, just add an openai/ prefix to your full model name.
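For example (a sketch; the endpoint URL, model name, and key below are placeholders):

```python
import dspy

# Sketch: prefix the full model name with openai/ and point api_base at the provider's endpoint.
lm = dspy.LM(
    "openai/your-model-name",
    api_base="https://your-provider.example.com/v1",
    api_key="YOUR_PROVIDER_API_KEY",
)
dspy.configure(lm=lm)
```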
Calling the LM directly.
It's easy to call the lm you configured above directly. This gives you a unified API and lets you benefit from utilities like automatic caching.
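```python
# Sketch: the configured LM is callable with a prompt string or a list of chat messages,
# and returns a list of completions (assuming `lm` was created as above).
lm("Say this is a test!", temperature=0.7)  # => ['This is a test!']
lm(messages=[{"role": "user", "content": "Say this is a test!"}])  # => ['This is a test!']
```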
Using the LM with DSPy modules.
Idiomatic DSPy involves using modules, which we discuss in the next guide.
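As a quick preview (a sketch; the signature and question are illustrative):

```python
import dspy

# Sketch: define a module with a natural-language signature, then call it with keyword arguments.
qa = dspy.ChainOfThought("question -> answer")
response = qa(question="How many floors are in the castle David Gregory inherited?")
print(response.answer)
```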
Using multiple LMs.
You can change the default LM globally with dspy.configure or change it inside a block of code with dspy.context.
Tip
Using dspy.configure and dspy.context is thread-safe!
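For example (a sketch reusing the QA module from above; the model names are illustrative):

```python
import dspy

# Sketch: change the default LM globally, then override it locally inside a context block.
qa = dspy.ChainOfThought("question -> answer")

dspy.configure(lm=dspy.LM("openai/gpt-4o"))
print("GPT-4o:", qa(question="How many floors are in the castle David Gregory inherited?").answer)

with dspy.context(lm=dspy.LM("openai/gpt-3.5-turbo")):
    print("GPT-3.5-turbo:", qa(question="How many floors are in the castle David Gregory inherited?").answer)
```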
GPT-4o: The number of floors in the castle David Gregory inherited cannot be determined with the information provided.
GPT-3.5-turbo: The castle David Gregory inherited has 7 floors.
Configuring LM generation.
For any LM, you can configure any of the following attributes at initialization or in each subsequent call.
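For example (a sketch; the values below are illustrative, not recommendations):

```python
import dspy

# Sketch: generation settings can be fixed at construction time or overridden per call.
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.9, max_tokens=3000, stop=None, cache=False)
```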
By default, LMs in DSPy are cached. If you repeat the same call, you will get the same outputs. But you can turn off caching by setting cache=False.
Inspecting output and usage metadata.
Every LM object maintains the history of its interactions, including inputs, outputs, token usage (and $$$ cost), and metadata.
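```python
# Sketch: inspect the call history of the LM configured above; each entry is a dict
# whose exact keys may vary by DSPy version (prompt, messages, response, usage, cost, ...).
print(len(lm.history))        # number of calls made so far
print(lm.history[-1].keys())  # metadata for the most recent call
```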
Advanced: Building custom LMs and writing your own Adapters.
Though rarely needed, you can write custom LMs by inheriting from dspy.BaseLM. Another advanced layer in the DSPy ecosystem is that of adapters, which sit between DSPy signatures and LMs. A future version of this guide will discuss these advanced features, though you likely don't need them.