# Modules: composing your own

## Intent

A `dspy.Module` is the unit of composition: subclass it, define some sub-modules in `__init__`, and write a `forward()` that pipes inputs through them. Modules are how custom logic — control flow, branching, multiple LM calls, post-processing — fits into the DSPy ecosystem in a way that optimizers, save/load, settings overrides, and parallel execution all keep working.

Read this when you’ve outgrown `Predict` / `ChainOfThought` / `ReAct` on their own and want to compose them, or when you want to understand what `__call__` does to your `forward()` behind the scenes.

## Design decisions

### 1. Subclass and override `forward()`

That’s the contract: define sub-modules in `__init__`, do the work in `forward`, let `__call__` wrap it. The split exists because the framework needs predictable places to attach infrastructure. `__init__` is where sub-modules are visible for tree-walking and serialization; `forward` is where the runtime concerns (settings stack, usage tracking, callbacks) wrap your code. Mixing the two muddies what the optimizer sees and what `dump_state` saves.

### 2. Sub-modules are discovered by walking `self.__dict__`, not registered

Assignment is the registration. There’s no `register_module()` call and no inheritance trick. The tree-walk methods (`named_predictors`, `named_parameters`, `named_sub_modules`) inspect `self.__dict__` at call time and recurse into lists and dicts as well as direct attributes. Discovery being lazy means you don’t have to remember to register, but it also means a sub-module hidden inside a closure or a non-walked container is invisible to optimizers and to `dump_state`.

### 3. `Parameter` is the marker that flags an attribute as optimizer-visible

When you assign `self.max_iters = 5`, the tree walk skips it. When you assign `self.predict = dspy.Predict(...)`, it’s a `Parameter` and gets included. This keeps `dump_state` small and the optimizer’s search space focused — only the things that should learn are exposed. `Predict` is a `Parameter`; `Retrieve` is too; plain Python attributes are not.

### 4. `__call__` wraps `forward()` with infrastructure

Going through `__call__` is what pushes `self` onto `settings.caller_modules`, opens a usage-tracking context when `settings.track_usage` is true, and runs the callback decorator chain. Calling `forward()` directly bypasses all of that — usage doesn’t get tallied, callbacks don’t fire, the caller stack doesn’t update. The deprecation warning on direct `forward` calls is there for that reason. Always invoke a module as `module(...)`.

### 5. Async sits alongside sync via `acall()` / `aforward()`

`acall` mirrors `__call__`‘s infrastructure for async; `aforward` is the async counterpart to `forward`. A module that doesn’t define `aforward` falls back to running the sync `forward` in a worker thread via `asyncify`. That gives you a path from “sync only” to “fully async” without rewriting the world — async one sub-module at a time, leave the rest as-is.

### 6. Settings propagate via context, not constructor args

Sub-modules read `dspy.settings.lm` (and `.adapter`, `.callbacks`) at call time. The consequence: `with dspy.context(lm=other_lm): result = my_module(...)` swaps the LM for every sub-module inside, no rewiring needed. The price is implicit state — a sub-module’s behavior depends on whoever calls it, not only on what was assigned in `__init__`. The composition story is worth the tradeoff.

### 7. `_compiled=True` freezes a sub-module’s parameters from optimizers

`named_parameters` checks the `_compiled` flag and skips compiled children. This lets you optimize a sub-module on its own, save the result, drop it into a larger program, and run a fresh optimizer on the outer program without disturbing the inner one. You rarely set this by hand; teleprompts set it when they finish compiling.

### 8. The metaclass `ProgramMeta` enforces `super().__init__()`

`ProgramMeta` installs a fallback `_base_init` that runs before your `__init__`, so the bookkeeping (history, callbacks, the `_compiled` flag) is in place even if your subclass forgets `super().__init__()`. You won’t subclass or instantiate the metaclass; it’s a guardrail. When an error trace lands inside it, the cause is usually a missing `super().__init__()` call.

### 9. `deepcopy()` is hand-written

Modules can hold closures and callbacks that confuse Python’s default `copy.deepcopy`. `Module.deepcopy()` tries `copy.deepcopy` first and falls back to attribute-by-attribute copying on `TypeError`. Optimizers use this to fork candidate programs, and the fallback is what makes that reliable across the variety of things a `Module` subclass might hold.

## API walkthrough

Grouped by what you’re trying to do.

### Defining a module

**`dspy.Module(callbacks=None)`**\
Subclass it. The base stores `callbacks`, `history`, and the `_compiled` flag — nothing more. The interesting behavior is in `__call__` and the tree-walk methods.

**`Module.__call__(*args, **kwargs)`**\
Pushes `self` onto `settings.caller_modules`, opens a usage-tracking context if `settings.track_usage`, runs `forward` (decorated with `@with_callbacks`), and returns the `Prediction`. If `forward` returns something other than a `Prediction`, you’ll get a warning — usage tracking relies on the `Prediction` return type. Direct calls to `module.forward(...)` log a deprecation warning.

**`Module.forward(**kwargs)`**\
What you override. Accept signature inputs as keyword arguments, do your work, return a `Prediction`. Don’t put settings reads in `__init__` — read them inside `forward` so the current LM and adapter are picked up at call time, not at construction time.

**`Module.acall(*args, **kwargs)` / `Module.aforward(**kwargs)`**\
Async variants. `acall` is the public entry; `aforward` is what you override. If you don’t define `aforward`, `acall` falls back to running `forward` inside `asyncify`. Useful when one sub-module is async-only — you can `await` it from `aforward` without making the rest of the program async.

**`ProgramMeta`**\
The metaclass on `Module`. You won’t subclass or instantiate it. It installs `_base_init` as a fallback so `super().__init__()` isn’t strictly required. When an error trace lands inside it, the cause is usually a missing `super().__init__()`.

### Walking the module tree

All of these rely on the same primitive: walk `self.__dict__`, recurse into lists and dicts, yield anything that matches the target type.

**`Module.named_predictors()` / `Module.predictors()`**\
Discovers every `Predict` instance under this module. Used by optimizers to find what to optimize. Order is the order of discovery (depth-first through `__dict__`).

**`Module.named_parameters()` / `Module.parameters()`**\
Discovers every `Parameter`. Honors `_compiled`: a sub-module marked compiled is skipped, and the optimizer leaves its parameters alone.

**`Module.named_sub_modules(type_=BaseModule, skip_compiled=False)`**\
The generator the others build on. Use it directly when you want a custom walk — every `ReAct` instance, every sub-module of a specific user class, anything you can filter by `type_`.

**`Module.map_named_predictors(func)`**\
Applies `func` to each predictor and replaces it in place. The hook teleprompts use when swapping in optimized predictors. You’ll rarely need it directly.

### Setting and getting the LM

**`Module.set_lm(lm)`**\
Walks every predictor in the tree and assigns `lm`. Useful when you’ve built a program with one LM and want to re-evaluate it on another without changing settings globally.

**`Module.get_lm()` → `LM`**\
Returns the LM if every predictor agrees on one. Raises `ValueError` on a mix. A sanity check before fine-tuning or saving, when you want to confirm the whole program is on one model.

### State and copies

The save/load surface. The full lifecycle — JSON vs PKL, full-program vs state-only, `save_program=True` — lives in [Saving and loading](https://dspy.ai/diving-deeper/saving-and-loading/index.md); the methods here are what that page is about.

**`Module.dump_state(json_mode=True)` / `Module.load_state(state, allow_unsafe_lm_state=False)`**\
Round-trips the optimizer-visible state: per-predictor demos, traces, signature instructions, LM config. Not the Python class itself.

**`Module.save(path, save_program=False, modules_to_serialize=None)` / `Module.load(path, allow_pickle=False, allow_unsafe_lm_state=False)`**\
The user-facing pair. `save_program=True` cloudpickles the whole module to a directory; the default writes state-only JSON or PKL.

**`Module.deepcopy()`**\
Tries `copy.deepcopy`, falls back to attribute-by-attribute copy on `TypeError`. Optimizers use this to fork candidate programs.

**`Module.reset_copy()`**\
`deepcopy()` then `reset()` on every parameter. Use it for a fresh program — same structure, no optimizer state.

### Batch execution

**`Module.batch(examples, num_threads=None, max_errors=None, return_failed_examples=False, ...)`**\
Runs the module against many examples in parallel. Internally wraps `dspy.Parallel`, which snapshots `dspy.settings` and re-applies the snapshot inside each worker thread (so an outer `dspy.context(lm=...)` is honored). Returns predictions in input order. Use it for evaluation, demo collection, and any embarrassingly-parallel inference.

### Debugging

**`Module.history` / `Module.inspect_history(n=1, file=None)`**\
`history` is the list of LM call records attached to this module. `inspect_history` pretty-prints the last `n`. Both are excluded from pickled and JSON state, so a multi-megabyte history doesn’t ride along with your saved program.

**`Module.callbacks`**\
Registered callback handlers. Also excluded from saved state.

### The return type

**`dspy.Prediction(**fields)`**\
A field container. Access with `result.haiku` or `result["haiku"]`. Inherits `dspy.Example`‘s storage but drops the input/output split.

**`dspy.Prediction.from_completions(list_or_dict, signature=None)`**\
Builds a Prediction from multiple LM completions. `BestOfN`, `MultiChainComparison`, and any module that samples more than once uses this.

**`Prediction.completions`**\
A `Completions` object when the prediction came from `n > 1` samples; `pred.completions[i]` returns the i-th completion as its own `Prediction`. `None` for single-sample calls.

**`Prediction.get_lm_usage()` / `Prediction.set_lm_usage(tokens_dict)`**\
Token-usage accounting. Populated when the prediction ran under `dspy.settings(track_usage=True)`.

**Comparison to `dspy.Example`.** Same storage layer; different intent. `Example` is for training data — it carries `_demos` and `_input_keys`, and `.with_inputs` returns a filtered copy. `Prediction` is for module outputs — no input/output split, plus completions and usage. Use `Example` for what goes in, `Prediction` for what comes out.

## Cross-links

- Each built-in module subclass has its own DD page: see the predictor zoo, ReAct, and the optimizer-specific pages.
- [Saving and loading](https://dspy.ai/diving-deeper/saving-and-loading/index.md) covers the persistence story in detail.
- [Settings and context()](https://dspy.ai/diving-deeper/settings-and-context/index.md) covers how LM / adapter / callback overrides propagate into a module’s sub-modules.
