dspy.BetterTogether¶
BetterTogether is a meta-optimizer proposed in the paper Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together by Dilara Soylu, Christopher Potts, and Omar Khattab. It combines prompt optimization and weight optimization (fine-tuning) by applying them in a configurable sequence, allowing a student program to iteratively improve both its prompts and model parameters. The core insight is that prompt and weight optimization can complement each other: prompt optimization can potentially discover effective task decompositions and reasoning strategies, while weight optimization can specialize the model to execute these patterns more efficiently. Using these approaches together in sequences (e.g., prompt optimization then weight optimization) may allow each to build on the improvements made by the other.
dspy.BetterTogether(metric: Callable, **optimizers: Teleprompter)
¶
Bases: Teleprompter
A meta-optimizer that combines prompt and weight optimization in configurable sequences.
BetterTogether is a meta-optimizer proposed in the paper Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together. It combines prompt optimization and weight optimization (fine-tuning) by applying them in a configurable sequence, allowing a student program to iteratively improve both its prompts and model parameters.
The core insight is that prompt and weight optimization can complement each other: prompt optimization can potentially discover effective task decompositions and reasoning strategies, while weight optimization can specialize the model to execute these patterns more efficiently. Using these approaches together in sequences (e.g., prompt optimization then weight optimization) may allow each to build on the improvements made by the other. Empirically, this approach often outperforms either strategy alone, even with state-of-the-art optimizers. For example, a Databricks case study shows that combining BetterTogether with GEPA and fine-tuning outperforms either approach alone.
The optimizer is initialized with a metric and custom optimizers. For example, you can combine
GEPA for prompt optimization with BootstrapFinetune for weight optimization:
BetterTogether(metric=metric, p=GEPA(...), w=BootstrapFinetune(...)). The compile()
method takes a student program, trainset, and strategy string where strategy keys correspond
to the optimizer names from initialization. It executes each optimizer in the specified sequence.
When a validation set is provided, the best performing program is returned; otherwise, the
latest program is returned. Note: Weight optimizers like BootstrapFinetune require student
programs to have LMs explicitly set (not relying on global dspy.settings.lm), and BetterTogether
mirrors this requirement for simplicity. Therefore we call set_lm before compiling.
>>> from dspy.teleprompt import GEPA, BootstrapFinetune
>>>
>>> # Combine GEPA for prompt optimization with BootstrapFinetune for weight optimization
>>> optimizer = BetterTogether(
... metric=metric,
... p=GEPA(metric=metric, auto="medium"),
... w=BootstrapFinetune(metric=metric)
... )
>>>
>>> student.set_lm(lm)
>>> compiled = optimizer.compile(
... student,
... trainset=trainset,
... valset=valset,
... strategy="p -> w"
... )
You can pass optimizer-specific arguments to each optimizer's compile() method using
optimizer_compile_args. This allows you to customize each optimizer's behavior:
>>> from dspy.teleprompt import MIPROv2
>>>
>>> # Use MIPROv2 for prompt optimization with custom parameters
>>> optimizer = BetterTogether(
... metric=metric,
... p=MIPROv2(metric=metric),
... w=BootstrapFinetune(metric=metric)
... )
>>>
>>> student.set_lm(lm)
>>> compiled = optimizer.compile(
... student,
... trainset=trainset,
... valset=valset,
... strategy="p -> w",
... optimizer_compile_args={
... "p": {"num_trials": 10, "max_bootstrapped_demos": 8}, # Configure MIPROv2's compile arguments
... }
... )
Since BetterTogether is a meta-optimizer that can run arbitrary optimizers in sequence, any sequence of optimizers can be combined together. The optimizer names used in the strategy string correspond to the keyword arguments specified in the constructor. For example, different prompt optimizers can be alternated multiple times (though note this is just an illustration of BetterTogether's flexibility, not a recommended configuration):
>>> from dspy.teleprompt import MIPROv2, GEPA
>>>
>>> # Chain two optimizers three times: MIPROv2 -> GEPA -> MIPROv2
>>> optimizer = BetterTogether(
... metric=metric,
... mipro=MIPROv2(metric=metric, auto="light"),
... gepa=GEPA(metric=metric, auto="light")
... )
>>>
>>> student.set_lm(lm)
>>> compiled = optimizer.compile(
... student,
... trainset=trainset,
... valset=valset,
... strategy="mipro -> gepa -> mipro"
... )
Note
Output Attributes: The returned program includes two additional attributes:
candidate_programs and flag_compilation_error_occurred. The candidate_programs
attribute is a list of dicts, each containing 'program', 'score', and 'strategy' (e.g.,
'', 'p', 'p -> w', 'p -> w -> p'), sorted by descending score (similar to
dspy.MIPROv2.candidate_programs). If any optimizer step fails,
flag_compilation_error_occurred is set to True and the best program found so far is
returned.
Model Lifecycle Management: BetterTogether automatically manages language model lifecycle (launching, killing, and relaunching after fine-tuning), which are no-ops for most API-based LMs. This is particularly important when using weight optimizers like BootstrapFinetune with local providers (e.g., dspy.LocalProvider), as it handles model initialization and cleanup between optimization steps.
Initialize BetterTogether with a metric and custom optimizers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `Callable` | Evaluation metric function for scoring programs. | *required* |
| `**optimizers` | `Teleprompter` | Custom optimizers as keyword arguments, where keys become the optimizer names used in the strategy string. For example, `p=GEPA(...)` makes `'p'` usable in strategy strings. | `{}` |
Examples:
>>> # Use custom optimizers
>>> from dspy.teleprompt import GEPA, BootstrapFinetune
>>> optimizer = BetterTogether(
... metric=metric,
... p=GEPA(metric=metric, auto="medium"),
... w=BootstrapFinetune(metric=metric)
... )
>>>
>>> # Use default optimizers
>>> optimizer = BetterTogether(metric=metric)
Source code in dspy/teleprompt/bettertogether.py
Functions¶
compile(student: Module, *, trainset: list[Example], teacher: Module | list[Module] | None = None, valset: list[Example] | None = None, num_threads: int | None = None, max_errors: int | None = None, provide_traceback: bool | None = None, seed: int | None = None, valset_ratio: float = 0.1, shuffle_trainset_between_steps: bool = True, strategy: str = 'p -> w -> p', optimizer_compile_args: dict[str, dict[str, Any]] | None = None) -> Module
¶
Compile and optimize a student program using a sequence of optimization strategies.
Executes the optimizers specified in the strategy string sequentially, evaluating each intermediate result and returning the best performing program.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `student` | `Module` | DSPy program to optimize. All predictors must have language models assigned; `program.set_lm(lm)` can be used to assign a language model to all modules of a program. | *required* |
| `trainset` | `list[Example]` | Training examples for optimization. Each optimizer receives the full trainset (or a shuffled version if `shuffle_trainset_between_steps=True`). | *required* |
| `teacher` | `Module \| list[Module] \| None` | Optional teacher module(s) for bootstrapping. Can be a single module or a list. Passed to optimizers. | `None` |
| `valset` | `list[Example] \| None` | Validation set for evaluating optimization steps. If not provided, a portion of trainset is held out (controlled by `valset_ratio`). | `None` |
| `num_threads` | `int \| None` | Number of parallel evaluation threads. Default is `None`, which means sequential evaluation. | `None` |
| `max_errors` | `int \| None` | Maximum errors to tolerate during evaluation. | `None` |
| `provide_traceback` | `bool \| None` | Whether to show detailed tracebacks for evaluation errors. | `None` |
| `seed` | `int \| None` | Random seed for reproducibility. Controls trainset shuffling and evaluation sampling. | `None` |
| `valset_ratio` | `float` | Fraction of trainset to hold out as validation (range [0, 1)). For example, 0.1 holds out 10%. Set to 0 to skip validation. | `0.1` |
| `shuffle_trainset_between_steps` | `bool` | Whether to shuffle trainset before each optimization step. Helps prevent overfitting to example ordering. | `True` |
| `strategy` | `str` | Sequence of optimizer keys to apply, separated by `->` (e.g. `"p -> w -> p"`). | `'p -> w -> p'` |
| `optimizer_compile_args` | `dict[str, dict[str, Any]] \| None` | Optional dict mapping optimizer keys to keyword arguments for that optimizer's `compile()` call. | `None` |
Returns:

| Type | Description |
|---|---|
| `Module` | Optimized student program with two additional attributes: `candidate_programs` (list of evaluated programs with their scores and strategies, sorted by descending score) and `flag_compilation_error_occurred` (whether any optimization step failed). |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If trainset is empty, `valset_ratio` is not in [0, 1), strategy is empty or contains invalid optimizer keys, or `optimizer_compile_args` contains invalid arguments. |
| `TypeError` | If `optimizer_compile_args` contains a `'student'` key (not allowed). |
Examples:
>>> optimizer = BetterTogether(
... metric=metric,
... p=GEPA(metric=metric),
... w=BootstrapFinetune(metric=metric)
... )
>>> student.set_lm(lm)
>>> compiled = optimizer.compile(
... student,
... trainset=trainset,
... valset=valset,
... strategy="p -> w"
... )
>>> print(f"Best score: {compiled.candidate_programs[0]['score']}")
Source code in dspy/teleprompt/bettertogether.py
get_params() -> dict[str, Any]
¶
Get the parameters of the teleprompter.
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | The parameters of the teleprompter. |
Usage Examples¶
See BetterTogether usage tutorials in BetterTogether Tutorials.
How BetterTogether Works¶
BetterTogether executes optimizers in a configurable sequence, evaluating each intermediate result and returning the best performing program. Here's how it works:
1. Initialization with Custom Optimizers¶
When initialized, BetterTogether accepts any DSPy optimizers (Teleprompters) as keyword arguments. The keys become the optimizer names used in the strategy string:
optimizer = BetterTogether(
metric=metric,
p=GEPA(...), # 'p' can be used in strategy
w=BootstrapFinetune(...) # 'w' can be used in strategy
)
If no optimizers are provided, BetterTogether defaults to BootstrapFewShotWithRandomSearch (key: 'p') and BootstrapFinetune (key: 'w').
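The mapping from strategy keys to optimizers can be illustrated with a small parsing sketch (this is illustrative only, not dspy's actual parsing code; `parse_strategy` is a hypothetical helper):

```python
# Simplified sketch of how a strategy string maps to optimizer keys.
# Illustrative only; not dspy's actual implementation.
def parse_strategy(strategy: str, optimizer_keys: set[str]) -> list[str]:
    steps = [step.strip() for step in strategy.split("->")]
    for step in steps:
        if step not in optimizer_keys:
            raise ValueError(f"Unknown optimizer key: {step!r}")
    return steps

# With optimizers p=GEPA(...) and w=BootstrapFinetune(...):
print(parse_strategy("p -> w -> p", {"p", "w"}))  # ['p', 'w', 'p']
```

Each parsed key selects the corresponding optimizer from the constructor's keyword arguments, which is why the keys in the strategy string must match the keyword names exactly.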
2. Strategy Execution¶
The strategy string defines the sequence of optimizers to apply. For example, the strategy `"p -> w -> p"` means:
- Run prompt optimizer ('p')
- Run weight optimizer ('w') on the result
- Run prompt optimizer ('p') again on the result
At each step:
- The trainset is shuffled (if shuffle_trainset_between_steps=True)
- The optimizer is run on the current student program
- The result is evaluated on the validation set
- The candidate program and score are recorded
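The per-step loop above can be sketched in pure Python (a simplified illustration of the control flow, not the real code; `run_sequence`, the optimizer stand-ins, and `evaluate` are hypothetical):

```python
import random

def run_sequence(program, steps, optimizers, trainset, evaluate, shuffle=True, seed=0):
    """Illustrative sketch of BetterTogether's per-step loop (not the real code).

    steps: list of optimizer keys, e.g. ["p", "w", "p"].
    optimizers: dict mapping a key to a stand-in optimize(program, trainset) function.
    evaluate: stand-in for scoring a program on the validation set.
    """
    rng = random.Random(seed)
    strategy_so_far = []
    candidates = []
    for key in steps:
        if shuffle:
            trainset = list(trainset)
            rng.shuffle(trainset)  # shuffle trainset between steps
        program = optimizers[key](program, trainset)  # run the optimizer
        strategy_so_far.append(key)
        candidates.append({  # record the candidate and its score
            "program": program,
            "score": evaluate(program),
            "strategy": " -> ".join(strategy_so_far),
        })
    return candidates

# Toy demo: "programs" are just ints, and each optimizer bumps the score.
opts = {"p": lambda prog, _: prog + 2, "w": lambda prog, _: prog + 1}
cands = run_sequence(0, ["p", "w", "p"], opts, trainset=[1, 2, 3], evaluate=lambda p: p)
print([(c["strategy"], c["score"]) for c in cands])
# [('p', 2), ('p -> w', 3), ('p -> w -> p', 5)]
```

The key point is that each step starts from the program produced by the previous step, so improvements accumulate along the sequence.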
Since BetterTogether is a meta-optimizer, any sequence of optimizers can be combined. The optimizer names in the strategy string correspond to the keyword arguments from initialization. For example, you can sequence different prompt optimizers (note: this illustrates BetterTogether's flexibility, not necessarily a recommended configuration):
optimizer = BetterTogether(
metric=metric,
mipro=MIPROv2(metric=metric, auto="light"),
gepa=GEPA(metric=metric, auto="light")
)
compiled = optimizer.compile(
student,
trainset=trainset,
valset=valset,
strategy="mipro -> gepa -> mipro"
)
3. Validation and Program Selection¶
BetterTogether can use a validation set in three ways:
- Explicit valset: if `valset` is provided, it's used for evaluation
- Auto-split: if `valset_ratio > 0`, a portion of trainset is held out for validation
- No validation: if `valset` is not provided and `valset_ratio` is 0, no validation occurs
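The auto-split behavior can be approximated like this (a simplified sketch; the library's actual hold-out logic, e.g. any shuffling before the split, may differ):

```python
def split_valset(trainset, valset_ratio=0.1):
    # Simplified sketch of holding out a validation slice from trainset.
    # The real implementation's exact split logic may differ.
    n_val = int(len(trainset) * valset_ratio)
    return trainset[n_val:], trainset[:n_val]  # (remaining trainset, valset)

train, val = split_valset(list(range(100)), valset_ratio=0.1)
print(len(train), len(val))  # 90 10
```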
After all optimization steps complete, the best program is selected based on validation set availability:
- With validation: The program with the best score is returned (with earlier programs winning ties)
- Without validation: The latest program is returned
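Selection with ties resolved in favor of earlier programs can be sketched as follows (illustrative only; the strict `>` comparison is what keeps the first best candidate seen):

```python
def select_best(candidate_programs):
    # Strict '>' means later candidates only replace the best on a
    # strictly higher score, so earlier programs win ties.
    best = candidate_programs[0]
    for cand in candidate_programs[1:]:
        if cand["score"] > best["score"]:
            best = cand
    return best

cands = [
    {"strategy": "p", "score": 0.72},
    {"strategy": "p -> w", "score": 0.80},
    {"strategy": "p -> w -> p", "score": 0.80},  # tie: earlier 'p -> w' wins
]
print(select_best(cands)["strategy"])  # p -> w
```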
If an optimization step fails:
- The error is logged with full traceback
- Optimization stops early
- The best program found so far is returned
- flag_compilation_error_occurred is set to True
The returned program includes two additional attributes:
- `candidate_programs`: list of all evaluated programs with their scores and strategies, sorted by score (best first). When an error occurs, this contains all successfully evaluated programs up to the point of failure.
- `flag_compilation_error_occurred`: boolean indicating if any optimization step failed during compilation.
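For instance, inspecting these attributes after compilation might look like this (sample data shown in place of a real compiled program; a real `compiled` object exposes the same structure):

```python
# Sample of what the returned attributes might contain; illustrative values only.
candidate_programs = [
    {"program": "<program>", "score": 0.84, "strategy": "p -> w"},
    {"program": "<program>", "score": 0.79, "strategy": "p -> w -> p"},
    {"program": "<program>", "score": 0.70, "strategy": ""},  # unoptimized student
]
flag_compilation_error_occurred = False

# Sorted by descending score, so the first entry is the best program found.
best = candidate_programs[0]
print(f"Best strategy: {best['strategy']!r} (score {best['score']})")
```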
4. Further Details¶
Model Lifecycle Management: For local models (like LocalLM), BetterTogether automatically launches models before first use, kills them after optimization completes, and relaunches them after fine-tuning when model names change. These operations are no-ops for API-based LMs but needed for local model serving.
Custom Compile Arguments: You can pass custom compile arguments to specific optimizers using the optimizer_compile_args parameter:
- Override default arguments: pass custom trainset/valset/teacher to specific optimizers
- Customize per optimizer: each optimizer can have different compile arguments (e.g., `num_trials`, `max_bootstrapped_demos`)
Note: The student argument cannot be included in optimizer_compile_args - BetterTogether manages the student program for all optimizers. See the compile() method docstring for detailed argument documentation.
Best Practices¶
When to Use BetterTogether¶
BetterTogether is the right optimizer when:
- You want to squeeze every bit of performance: Prompt optimization is often the best bang for buck, quickly discovering high-level strategies. When opportunity allows, adding weight optimization on top compounds these gains, yielding benefits that exceed either approach alone.
- You have fine-tuning capabilities: Weight optimizers like `BootstrapFinetune` require LMs with a fine-tuning interface. Currently supported: `LocalProvider`, `DatabricksProvider`, and `OpenAIProvider`. You can extend the `Provider` class for custom use cases, or use BetterTogether to combine prompt optimizers only.
The Databricks case study demonstrates this effectiveness. They evaluated on IE Bench—a comprehensive suite spanning enterprise domains (finance, legal, commerce, healthcare) with complex challenges: 100+ page documents, 70+ extraction fields, and hierarchical schemas. Using GPT-4.1:
- SFT alone: +1.9 points over baseline
- GEPA alone: +2.1 points over baseline (slightly exceeding SFT)
- GEPA + SFT (BetterTogether): +4.8 points over baseline
This demonstrates that prompt optimization can match or surpass supervised fine-tuning, and combining these techniques yields strong compounding benefits.
Common Strategies and Optimizers¶
Common strategies:
"p -> w": Optimize prompts first, then fine-tune (simple and often effective)"p -> w -> p": Optimize prompts, fine-tune, then optimize prompts again (can build on fine-tuning improvements)"w -> p": Fine-tune first, then optimize prompts
Example optimizer combinations:
- GEPA + BootstrapFinetune: Prompt optimization with fine-tuning
- MIPROv2 + BootstrapFinetune: Prompt optimization with fine-tuning
- Multiple prompt optimizers: Alternate between different prompt optimization approaches (experimental)
Further Reading¶
- BetterTogether Paper: arxiv:2407.10930
- Databricks Case Study - Real-world application combining BetterTogether with GEPA
- DSPy Optimizers Overview