Signatures
When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature.
A signature is a declarative specification of the input/output behavior of a DSPy module. Signatures let you tell the LM what it needs to do, rather than specify how to ask the LM to do it.
You're probably familiar with function signatures, which specify the input and output arguments and their types. DSPy signatures are similar, but with a couple of differences. While typical function signatures just describe things, DSPy Signatures declare and initialize the behavior of modules. Moreover, the field names matter in DSPy Signatures. You express semantic roles in plain English: a question is different from an answer, and a sql_query is different from python_code.
Why should I use a DSPy Signature?
Signatures give you modular, clean code in which LM calls can be optimized into high-quality prompts (or automatic finetunes). Most people coerce LMs into doing tasks by hacking long, brittle prompts, or by collecting/generating data for finetuning. Writing signatures is far more modular, adaptive, and reproducible than hacking at prompts or finetunes. The DSPy compiler will figure out how to build a highly optimized prompt for your LM (or finetune your small LM) for your signature, on your data, and within your pipeline. In many cases, we found that compiling leads to better prompts than humans write. This is not because DSPy optimizers are more creative than humans, but simply because they can try more things and tune the metrics directly.
Inline DSPy Signatures
Signatures can be defined as a short string, with argument names and optional types that define semantic roles for inputs/outputs.
- Question Answering: "question -> answer", which is equivalent to "question: str -> answer: str" as the default type is always str
- Sentiment Classification: "sentence -> sentiment: bool", e.g. True if positive
- Summarization: "document -> summary"
Your signatures can also have multiple input/output fields with types:
- Retrieval-Augmented Question Answering: "context: list[str], question: str -> answer: str"
- Multiple-Choice Question Answering with Reasoning: "question, choices: list[str] -> reasoning: str, selection: int"
Tip: For fields, any valid variable names work! Field names should be semantically meaningful, but start simple and don't prematurely optimize keywords! Leave that kind of hacking to the DSPy compiler. For example, for summarization, it's probably fine to say "document -> summary", "text -> gist", or "long_context -> tldr".
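To make the string format and the tip above concrete, here is a minimal sketch of how a signature string could be split into named, typed input and output fields, rejecting names that are not valid Python identifiers. parse_signature and its helpers are hypothetical and not part of DSPy, whose own parser may behave differently; the sketch only shows that field names come from the string and that a missing type defaults to str.

```python
import keyword


def _split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside [...] type brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def _parse_fields(text: str) -> dict[str, str]:
    """Map each field name to its type annotation, defaulting to 'str'."""
    fields = {}
    for part in _split_top_level(text):
        name, _, annotation = part.partition(":")
        name = name.strip()
        # Assumption: "any valid variable name" means an identifier that is not a keyword.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"invalid field name: {name!r}")
        fields[name] = annotation.strip() or "str"
    return fields


def parse_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical parser: split 'inputs -> outputs' into two name->type dicts."""
    inputs, arrow, outputs = spec.partition("->")
    if not arrow:
        raise ValueError(f"missing '->' in signature: {spec!r}")
    return _parse_fields(inputs), _parse_fields(outputs)


print(parse_signature("question -> answer"))
# ({'question': 'str'}, {'answer': 'str'})
print(parse_signature("question, choices: list[str] -> reasoning: str, selection: int"))
# ({'question': 'str', 'choices': 'list[str]'}, {'reasoning': 'str', 'selection': 'int'})
```

Splitting only on top-level commas keeps annotations such as dict[str, list[str]] intact, and a name like "long-context" is rejected because it is not a Python identifier.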
Example A: Sentiment Classification
import dspy
# Assumes an LM has already been configured, e.g. with dspy.configure(lm=...).
sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.
classify = dspy.Predict('sentence -> sentiment: bool')  # we'll see an example with Literal[] later
classify(sentence=sentence).sentiment
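Because the signature declares sentiment: bool, part of the module's job is turning the LM's text reply into a Python bool. DSPy does this parsing for you; the sketch below is only a hypothetical illustration of that kind of coercion (coerce_bool is not a DSPy function, and the accepted spellings are an assumption).

```python
def coerce_bool(raw: str) -> bool:
    """Map an LM's free-text answer onto a bool (hypothetical helper)."""
    # Normalize whitespace, a trailing period, and capitalization.
    text = raw.strip().strip(".").lower()
    if text in {"true", "yes"}:
        return True
    if text in {"false", "no"}:
        return False
    # Anything else is treated as a parse failure rather than guessed.
    raise ValueError(f"cannot interpret {raw!r} as a bool")


print(coerce_bool("True"))      # True
print(coerce_bool(" false. "))  # False
```

Raising on unrecognized replies, instead of defaulting to False, keeps a malformed answer from silently looking like a negative prediction.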
Example B: Summarization
# Example from the XSum dataset.
document = """The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."""
summarize = dspy.ChainOfThought('document -> summary')
response = summarize(document=document)
print(response.summary)
Possible output:
The 21-year-old Lee made seven appearances and scored one goal for West Ham last season. He had loan spells in League One with Blackpool and Colchester United, scoring twice for the latter. He has now signed a contract with Barnsley, but the length of the contract has not been revealed.
Many DSPy modules (except dspy.Predict) return auxiliary information by expanding your signature under the hood.
For example, dspy.ChainOfThought also adds a reasoning field that includes the LM's reasoning before it generates the output summary.
print("Reasoning:", response.reasoning)
Possible output:
Reasoning: We need to highlight Lee's performance for West Ham, his loan spells in League One, and his new contract with Barnsley. We also need to mention that his contract length has not been disclosed.
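"Expanding your signature under the hood" can be pictured as rewriting the field list before the LM is called. A minimal sketch, assuming a signature is represented as plain name-to-type dicts; ToySignature and with_reasoning are hypothetical names, and DSPy's own ChainOfThought operates on real Signature classes rather than this toy structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySignature:
    """Hypothetical stand-in for a signature: ordered input and output fields."""
    inputs: dict[str, type]
    outputs: dict[str, type]
    instructions: str = ""


def with_reasoning(sig: ToySignature) -> ToySignature:
    # Put a 'reasoning' output *before* the original outputs, so the LM
    # writes its reasoning first and the requested fields afterwards.
    outputs = {"reasoning": str, **sig.outputs}
    return ToySignature(sig.inputs, outputs, sig.instructions)


summarize = ToySignature(inputs={"document": str}, outputs={"summary": str})
expanded = with_reasoning(summarize)
print(list(expanded.outputs))  # ['reasoning', 'summary']
```

The original signature is left untouched; the module works from an expanded copy, which is why you still read response.summary exactly as before and simply gain response.reasoning.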
Class-based DSPy Signatures
For some advanced tasks, you need more verbose signatures. This is typically to:
- Clarify something about the nature of the task (expressed below as a docstring).
- Supply hints on the nature of an input field, expressed as a desc keyword argument for dspy.InputField.
- Supply constraints on an output field, expressed as a desc keyword argument for dspy.OutputField.
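To see why a docstring and desc hints matter, it helps to picture them ending up in the text the LM reads. The sketch below uses a hypothetical FieldSpec class and render_prompt function with a made-up layout; DSPy builds prompts in its own format, which this does not reproduce. The instructions and desc strings are borrowed from the CheckCitationFaithfulness signature used in Example D.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    desc: str = ""  # optional hint or constraint, playing the role of desc=...


def render_prompt(instructions: str, inputs: dict[str, FieldSpec],
                  outputs: dict[str, FieldSpec], values: dict[str, str]) -> str:
    """Hypothetical renderer: docstring first, then inputs, then requested outputs."""
    lines = [instructions.strip(), ""]
    for name, spec in inputs.items():
        hint = f" ({spec.desc})" if spec.desc else ""
        lines.append(f"{name}{hint}: {values[name]}")
    lines += ["", "Respond with:"]
    for name, spec in outputs.items():
        hint = f" ({spec.desc})" if spec.desc else ""
        lines.append(f"- {name}{hint}")
    return "\n".join(lines)


prompt = render_prompt(
    "Verify that the text is based on the provided context.",
    inputs={"context": FieldSpec("facts here are assumed to be true"), "text": FieldSpec()},
    outputs={"faithfulness": FieldSpec()},
    values={"context": "Lee scored twice for the U's.", "text": "Lee scored 3 goals."},
)
print(prompt)
```

Whatever the real layout, the point carries over: the docstring frames the task, and each desc travels alongside the field it describes.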
Example C: Classification
import dspy
from typing import Literal

class Emotion(dspy.Signature):
    """Classify emotion."""
    sentence: str = dspy.InputField()
    sentiment: Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'] = dspy.OutputField()

sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"  # from dair-ai/emotion
classify = dspy.Predict(Emotion)
classify(sentence=sentence)
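Declaring the output as a Literal restricts the answer to a fixed label set. A minimal sketch of the kind of check this enables, using only typing.get_args; parse_literal and EmotionLabel are hypothetical names, not part of DSPy, and lowercasing the reply assumes every label is lowercase, as it is here.

```python
from typing import Literal, get_args

EmotionLabel = Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']


def parse_literal(raw: str, allowed) -> str:
    """Return the matching Literal value, or raise if the reply is off-list."""
    choices = get_args(allowed)  # ('sadness', 'joy', ...)
    cleaned = raw.strip().strip("'\"").lower()  # assumes lowercase labels
    if cleaned not in choices:
        raise ValueError(f"{raw!r} is not one of {choices}")
    return cleaned


print(parse_literal(" Fear ", EmotionLabel))  # fear
```

An off-list reply such as "happy" raises instead of slipping through, which is the practical benefit of a Literal over a plain str output.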
Tip: There's nothing wrong with specifying your requests to the LM more clearly. Class-based Signatures help you with that. However, don't prematurely tune the keywords of your signature by hand. The DSPy optimizers will likely do a better job (and will transfer better across LMs).
Example D: A metric that evaluates faithfulness to citations
import dspy

class CheckCitationFaithfulness(dspy.Signature):
    """Verify that the text is based on the provided context."""
    context: str = dspy.InputField(desc="facts here are assumed to be true")
    text: str = dspy.InputField()
    faithfulness: bool = dspy.OutputField()
    evidence: dict[str, list[str]] = dspy.OutputField(desc="Supporting evidence for claims")

context = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."
text = "Lee scored 3 goals for Colchester United."
faithfulness = dspy.ChainOfThought(CheckCitationFaithfulness)
faithfulness(context=context, text=text)
Possible output:
Prediction(
    reasoning="Let's check the claims against the context. The text states Lee scored 3 goals for Colchester United, but the context clearly states 'He scored twice for the U's'. This is a direct contradiction.",
    faithfulness=False,
    evidence={'goal_count': ["scored twice for the U's"]}
)
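Because evidence is typed as dict[str, list[str]], its contents can also be checked mechanically. The sketch below is an optional, hypothetical post-check (evidence_is_grounded is not part of DSPy) that confirms every quoted snippet occurs verbatim in the context; it assumes the LM quotes the context rather than paraphrasing it.

```python
def evidence_is_grounded(context: str, evidence: dict[str, list[str]]) -> bool:
    # Every supporting snippet must appear word-for-word in the context.
    return all(snippet in context
               for snippets in evidence.values()
               for snippet in snippets)


context = ("Lee had two loan spells in League One last term, with Blackpool and then "
           "Colchester United. He scored twice for the U's but was unable to save them "
           "from relegation.")
print(evidence_is_grounded(context, {"goal_count": ["scored twice for the U's"]}))  # True
print(evidence_is_grounded(context, {"goal_count": ["scored 3 goals"]}))            # False
```

A verbatim substring test is deliberately strict: it cannot judge paraphrased evidence, but it cheaply catches quotes that were never in the context.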
Using signatures to build modules & compiling them
While signatures are convenient for prototyping with structured inputs/outputs, that's not the only reason to use them!
You should compose multiple signatures into bigger DSPy modules and compile these modules into optimized prompts and finetunes.