Language Models
The first step in any DSPy code is to set up your language model. For example, you can configure OpenAI's GPT-4o-mini as your default LM as follows.
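```python
import dspy

# A minimal sketch: configure OpenAI's GPT-4o-mini as the default LM for all DSPy modules.
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
```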
A few different LMs
You can authenticate by setting the OPENAI_API_KEY env variable or passing api_key below.
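```python
import dspy

# Sketch: the key can come from the OPENAI_API_KEY env variable, or be passed explicitly
# (the placeholder below is not a real key).
lm = dspy.LM("openai/gpt-4o-mini", api_key="YOUR_OPENAI_API_KEY")
dspy.configure(lm=lm)
```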
You can authenticate by setting the ANTHROPIC_API_KEY env variable or passing api_key below.
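```python
import dspy

# Sketch: the model string follows the "anthropic/<model>" convention; the exact model name
# is illustrative, and the key may come from ANTHROPIC_API_KEY instead of being passed here.
lm = dspy.LM("anthropic/claude-3-opus-20240229", api_key="YOUR_ANTHROPIC_API_KEY")
dspy.configure(lm=lm)
```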
If you're on the Databricks platform, authentication is automatic via their SDK. If not, you can set the env variables DATABRICKS_API_KEY and DATABRICKS_API_BASE, or pass api_key and api_base below.
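```python
import dspy

# Sketch: on Databricks itself no explicit credentials are needed; elsewhere, set
# DATABRICKS_API_KEY / DATABRICKS_API_BASE or pass api_key / api_base.
# The serving endpoint name below is illustrative.
lm = dspy.LM("databricks/databricks-meta-llama-3-1-70b-instruct")
dspy.configure(lm=lm)
```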
First, install SGLang and launch its server with your LM.
> pip install "sglang[all]"
> pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
> CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --port 7501 --model-path meta-llama/Meta-Llama-3-8B-Instruct
Then, connect to it from your DSPy code as an OpenAI-compatible endpoint.
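```python
import dspy

# Sketch: the SGLang server launched above exposes an OpenAI-compatible API on port 7501,
# so we use the openai/ prefix; api_key is a dummy value for a local server.
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:7501/v1",
    api_key="local",
    model_type="chat",
)
dspy.configure(lm=lm)
```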
First, install Ollama and launch its server with your LM.
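For reference, installation and launch typically look something like this (the commands and model tag may vary by platform and Ollama version):
> curl -fsSL https://ollama.com/install.sh | sh
> ollama run llama3.2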
Then, connect to it from your DSPy code.
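```python
import dspy

# Sketch: Ollama's server listens on port 11434 by default; the model tag must match
# the one you ran above. No real API key is needed for a local server.
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)
```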
In DSPy, you can use any of the dozens of LLM providers supported by LiteLLM. Simply follow their instructions for which {PROVIDER}_API_KEY to set and how to pass the {provider_name}/{model_name} to the constructor.
Some examples:
- anyscale/mistralai/Mistral-7B-Instruct-v0.1, with ANYSCALE_API_KEY
- together_ai/togethercomputer/llama-2-70b-chat, with TOGETHERAI_API_KEY
- sagemaker/<your-endpoint-name>, with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME
- azure/<your_deployment_name>, with AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION, and the optional AZURE_AD_TOKEN and AZURE_API_TYPE
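For instance, a Together AI model from the list above could be configured like this (a sketch; it assumes TOGETHERAI_API_KEY is set in your environment):

```python
import dspy

# Sketch: any LiteLLM-supported provider works the same way once its API key env variable is set.
lm = dspy.LM("together_ai/togethercomputer/llama-2-70b-chat")
dspy.configure(lm=lm)
```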
If your provider offers an OpenAI-compatible endpoint, just add an openai/ prefix to your full model name.
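For example (a sketch; the endpoint URL, model name, and key below are placeholders):

```python
import dspy

# Sketch: prefix the full model name with openai/ and point api_base at the provider's endpoint.
lm = dspy.LM(
    "openai/your-model-name",
    api_base="https://your-provider.example.com/v1",
    api_key="YOUR_PROVIDER_API_KEY",
)
dspy.configure(lm=lm)
```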
Calling the LM directly.
It's easy to call the lm you configured above directly. This gives you a unified API and lets you benefit from utilities like automatic caching.
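```python
# Sketch: the configured LM is callable with a prompt string or a list of chat messages,
# and returns a list of completions (assuming `lm` was created as above).
lm("Say this is a test!", temperature=0.7)  # => ['This is a test!']
lm(messages=[{"role": "user", "content": "Say this is a test!"}])  # => ['This is a test!']
```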
Using the LM with DSPy modules.
Idiomatic DSPy involves using modules, which we discuss in the next guide.
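As a quick preview (a sketch; the signature and question are illustrative):

```python
import dspy

# Sketch: define a module with a natural-language signature, then call it with keyword arguments.
qa = dspy.ChainOfThought("question -> answer")
response = qa(question="How many floors are in the castle David Gregory inherited?")
print(response.answer)
```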
Using multiple LMs.
You can change the default LM globally with dspy.configure or change it inside a block of code with dspy.context.
Tip
Using dspy.configure and dspy.context is thread-safe!
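For example (a sketch reusing the QA module from above; the model names are illustrative):

```python
import dspy

# Sketch: change the default LM globally, then override it locally inside a context block.
qa = dspy.ChainOfThought("question -> answer")

dspy.configure(lm=dspy.LM("openai/gpt-4o"))
print("GPT-4o:", qa(question="How many floors are in the castle David Gregory inherited?").answer)

with dspy.context(lm=dspy.LM("openai/gpt-3.5-turbo")):
    print("GPT-3.5-turbo:", qa(question="How many floors are in the castle David Gregory inherited?").answer)
```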
GPT-4o: The number of floors in the castle David Gregory inherited cannot be determined with the information provided.
GPT-3.5-turbo: The castle David Gregory inherited has 7 floors.
Configuring LM generation.
For any LM, you can configure any of the following attributes at initialization or in each subsequent call.
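For example (a sketch; the values below are illustrative, not recommendations):

```python
import dspy

# Sketch: generation settings can be fixed at construction time or overridden per call.
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.9, max_tokens=3000, stop=None, cache=False)
```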
By default, LMs in DSPy are cached. If you repeat the same call, you will get the same outputs. But you can turn off caching by setting cache=False.
Inspecting output and usage metadata.
Every LM object maintains the history of its interactions, including inputs, outputs, token usage (and $$$ cost), and metadata.
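```python
# Sketch: inspect the call history of the LM configured above; each entry is a dict
# whose exact keys may vary by DSPy version (prompt, messages, response, usage, cost, ...).
print(len(lm.history))        # number of calls made so far
print(lm.history[-1].keys())  # metadata for the most recent call
```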
Advanced: Building custom LMs and writing your own Adapters.
Though rarely needed, you can write custom LMs by inheriting from dspy.BaseLM. Another advanced layer in the DSPy ecosystem is that of adapters, which sit between DSPy signatures and LMs. A future version of this guide will discuss these advanced features, though you likely don't need them.