Tutorial: Math Reasoning¶
Let's walk through a quick example of setting up a dspy.ChainOfThought
module and optimizing it for answering algebra questions.
Install the latest DSPy via pip install -U dspy
and follow along.
Let's tell DSPy that we will use OpenAI's gpt-4o-mini
in our modules. To authenticate, DSPy will look into your OPENAI_API_KEY
. You can easily swap this out for other providers or local models.
import dspy
gpt4o_mini = dspy.LM('openai/gpt-4o-mini', max_tokens=2000)
gpt4o = dspy.LM('openai/gpt-4o', max_tokens=2000)
dspy.configure(lm=gpt4o_mini) # we'll use gpt-4o-mini as the default LM, unless otherwise specified
Next, let's load some data examples from the MATH benchmark. We'll use a training split for optimization and evaluate it on a held-out dev set.
Please note that the following step will require:
%pip install git+https://github.com/hendrycks/math.git
from dspy.datasets import MATH
dataset = MATH(subset='algebra')
print(len(dataset.train), len(dataset.dev))
350 350
Let's inspect one example from the training set.
example = dataset.train[0]
print("Question:", example.question)
print("Answer:", example.answer)
Question: The doctor has told Cal O'Ree that during his ten weeks of working out at the gym, he can expect each week's weight loss to be $1\%$ of his weight at the end of the previous week. His weight at the beginning of the workouts is $244$ pounds. How many pounds does he expect to weigh at the end of the ten weeks? Express your answer to the nearest whole number. Answer: 221
Now let's define our module. It's extremely simple: just a chain-of-thought step that takes a question
and produces an answer
.
module = dspy.ChainOfThought("question -> answer")
module(question=example.question)
Prediction( reasoning="Cal O'Ree's weight loss each week is $1\\%$ of his weight at the end of the previous week. This means that at the end of each week, he retains $99\\%$ of his weight from the previous week. \n\nIf we denote his weight at the beginning as \\( W_0 = 244 \\) pounds, then his weight at the end of week \\( n \\) can be expressed as:\n\\[\nW_n = W_{n-1} \\times 0.99\n\\]\nThis can be simplified to:\n\\[\nW_n = W_0 \\times (0.99)^n\n\\]\nAfter 10 weeks, his weight will be:\n\\[\nW_{10} = 244 \\times (0.99)^{10}\n\\]\n\nNow, we calculate \\( (0.99)^{10} \\):\n\\[\n(0.99)^{10} \\approx 0.904382\n\\]\n\nNow, we can calculate his expected weight after 10 weeks:\n\\[\nW_{10} \\approx 244 \\times 0.904382 \\approx 220.5\n\\]\n\nRounding to the nearest whole number, Cal O'Ree can expect to weigh approximately \\( 221 \\) pounds at the end of the ten weeks.", answer='221' )
Next, let's set up an evaluator for the zero-shot module above, before prompt optimization.
THREADS = 24
kwargs = dict(num_threads=THREADS, display_progress=True, display_table=5)
evaluate = dspy.Evaluate(devset=dataset.dev, metric=dataset.metric, **kwargs)
evaluate(module)
Average Metric: 259.00 / 350 (74.0%): 100%|██████████| 350/350 [01:30<00:00, 3.85it/s]
2024/11/28 18:41:55 INFO dspy.evaluate.evaluate: Average Metric: 259 / 350 (74.0%)
question | example_reasoning | example_answer | pred_reasoning | pred_answer | method | |
---|---|---|---|---|---|---|
0 | What is the smallest integer value of $c$ such that the function $... | The given function has a domain of all real numbers if and only if... | 1 | To determine the smallest integer value of \( c \) such that the f... | 1 | ✔️ [True] |
1 | What is the least value of $x$ that is a solution of $|{-x+3}|=7$? | In order to have $|{-x+3}| = 7$, we must have $-x + 3 = 7$ or $-x ... | -4 | To solve the equation \( |{-x+3}|=7 \), we need to consider the de... | -4 | ✔️ [True] |
2 | Evaluate $\left\lceil -\frac{7}{4}\right\rceil$. | $-\frac{7}{4}$ is between $-1$ and $-2$, so $\left\lceil -\frac{7}... | -1 | To evaluate \(\left\lceil -\frac{7}{4}\right\rceil\), we first nee... | -1 | ✔️ [True] |
3 | A triangle has vertices at coordinates $(11,1)$, $(2,3)$ and $(3,7... | We must find the distance between each pair of points by using the... | 10 | To find the length of the longest side of the triangle with vertic... | 10 | ✔️ [True] |
4 | Let $f(x) = x + 2$ and $g(x) = 1/f(x)$. What is $g(f(-3))$? | First, we find that $f(-3) = (-3) + 2 = -1$. Then, $$g(f(-3)) = g(... | 1 | To find \( g(f(-3)) \), we first need to evaluate \( f(-3) \). The... | 1 | ✔️ [True] |
74.0
And lastly let's optimize our module. Since we want strong reasoning, we'll use the large GPT-4o as the teacher model (used to bootstrap reasoning for the small LM at optimization time) but not as the prompt model (used to craft instructions) or the task model (trained).
GPT-4o will be invoked only a small number of times. The model involved directly in optimization and in the resulting (optimized) program will be GPT-4o-mini.
We will also specify max_bootstrapped_demos=4
which means we want at most four bootstrapped examples in the prompt and max_labeled_demos=4
which means that, in total between bootstrapped and pre-labeled examples, we want at most four.
kwargs = dict(num_threads=THREADS, teacher_settings=dict(lm=gpt4o), prompt_model=gpt4o_mini)
optimizer = dspy.MIPROv2(metric=dataset.metric, auto="medium", **kwargs)
kwargs = dict(requires_permission_to_run=False, max_bootstrapped_demos=4, max_labeled_demos=4)
optimized_module = optimizer.compile(module, trainset=dataset.train, **kwargs)
evaluate(optimized_module)
Average Metric: 310.00 / 350 (88.6%): 100%|██████████| 350/350 [01:31<00:00, 3.84it/s]
2024/11/28 18:59:19 INFO dspy.evaluate.evaluate: Average Metric: 310 / 350 (88.6%)
question | example_reasoning | example_answer | pred_reasoning | pred_answer | method | |
---|---|---|---|---|---|---|
0 | What is the smallest integer value of $c$ such that the function $... | The given function has a domain of all real numbers if and only if... | 1 | The function \( f(x) = \frac{x^2 + 1}{x^2 - x + c} \) will have a ... | 1 | ✔️ [True] |
1 | What is the least value of $x$ that is a solution of $|{-x+3}|=7$? | In order to have $|{-x+3}| = 7$, we must have $-x + 3 = 7$ or $-x ... | -4 | The equation \( |{-x+3}|=7 \) implies two possible cases: 1. \(-x ... | -4 | ✔️ [True] |
2 | Evaluate $\left\lceil -\frac{7}{4}\right\rceil$. | $-\frac{7}{4}$ is between $-1$ and $-2$, so $\left\lceil -\frac{7}... | -1 | To evaluate \(\left\lceil -\frac{7}{4}\right\rceil\), we first nee... | -1 | ✔️ [True] |
3 | A triangle has vertices at coordinates $(11,1)$, $(2,3)$ and $(3,7... | We must find the distance between each pair of points by using the... | 10 | To find the length of the sides of the triangle formed by the vert... | 10 | ✔️ [True] |
4 | Let $f(x) = x + 2$ and $g(x) = 1/f(x)$. What is $g(f(-3))$? | First, we find that $f(-3) = (-3) + 2 = -1$. Then, $$g(f(-3)) = g(... | 1 | To find \( g(f(-3)) \), we first need to evaluate \( f(-3) \). Usi... | 1 | ✔️ [True] |
88.57
Neat. It was pretty straightforward to improve quality from 74% to over 88% on a held-out set here.
That said, for reasoning tasks like this, you will often want to consider more advanced strategies, like:
- A
dspy.ReAct
module with access to a calculator function ordspy.PythonInterpreter
- Ensembling multiple optimized prompts with a majority vote (or an Aggregator module) on top
Just to understand what changed, let's view the prompt after optimization.
dspy.inspect_history()
[2024-11-28T18:59:19.176586] System message: Your input fields are: 1. `question` (str) Your output fields are: 1. `reasoning` (str) 2. `answer` (str) All interactions will be structured in the following way, with the appropriate values filled in. [[ ## question ## ]] {question} [[ ## reasoning ## ]] {reasoning} [[ ## answer ## ]] {answer} [[ ## completed ## ]] In adhering to this structure, your objective is: Analyze the `question` provided, and systematically apply mathematical reasoning to derive the `answer`. Ensure to articulate each step of your thought process in the `reasoning` field, detailing how you identify relationships and formulate equations to arrive at the solution. User message: [[ ## question ## ]] If $|4x+2|=10$ and $x<0$, what is the value of $x$? Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## reasoning ## ]] The equation $|4x+2|=10$ implies two possible equations: $4x+2=10$ or $4x+2=-10$. We solve each equation separately. 1. For $4x+2=10$: \[ 4x + 2 = 10 \\ 4x = 8 \\ x = 2 \] However, $x=2$ does not satisfy the condition $x<0$. 2. For $4x+2=-10$: \[ 4x + 2 = -10 \\ 4x = -12 \\ x = -3 \] The solution $x = -3$ satisfies the condition $x<0$. Therefore, the value of $x$ is $\boxed{-3}$. [[ ## answer ## ]] -3 [[ ## completed ## ]] User message: [[ ## question ## ]] Given that $-4$ is a solution to $x^2 + bx -36 = 0$, what is the value of $b$? Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## reasoning ## ]] The product of the roots of this quadratic is $-36/1=-36$, so the other solution must be $-36/-4=9$. That means that the sum of the solutions is $-4+9=5$. The sum of the solutions is also $-b/1=-b$. Thus, $-b=5$ and $b=\boxed{-5}$. [[ ## answer ## ]] -5 [[ ## completed ## ]] User message: [[ ## question ## ]] Richard is building a rectangular playground from 200 feet of fencing. The fencing must entirely enclose the playground. What is the maximum area of this playground? Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## reasoning ## ]] Let the length of the playground be $l$ and the width be $w$. We have the equation $2l+2w=200 \Rightarrow l + w = 100$. We want to maximize the area of this rectangular playground, which is given by $lw$. From our equation, we know that $l=100-w$. Substituting this in to our expression for area, we have \[(100-w)(w)=100w-w^2\]We will now complete the square to find the maximum value of this expression. Factoring a $-1$ out, we have \[-(w^2-100w)\]In order for the expression inside the parenthesis to be a perfect square, we need to add and subtract $(100/2)^2=2500$ inside the parenthesis. Doing this, we get \[-(w^2-100w+2500-2500) \Rightarrow -(w-50)^2+2500\]Since the maximum value of $-(w-50)^2$ is 0 (perfect squares are always nonnegative), the maximum value of the entire expression is 2500, which is achieved when $w=50$ and $l=100-w=50$ (the playground is a square). Thus, the maximum area of the playground is $\boxed{2500}$ square feet. [[ ## answer ## ]] 2500 [[ ## completed ## ]] User message: [[ ## question ## ]] Alice and Bob are playing a game. Alice starts first. On Alice's turn, she flips a coin. If she gets a heads, she wins. If not, it becomes Bob's turn. On Bob's turn, he flips a coin. If he gets a tails, he wins. If not, it becomes Alice's turn. What is the probability that Alice wins the game? Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Assistant message: [[ ## reasoning ## ]] Alice has a $1/2$ chance of winning the game on her first turn. If she doesn't, then the probability that she wins the game on her second turn is $1/8,$ since she must not win on her first flip ($1/2$ chance), Bob must not win on his first flip ($1/2$ chance), and then Alice must win on her second flip ($1/2$ chance). The probability that she wins the game on her third turn is $1/32,$ and in general, the probability that she wins the game on her $k^\text{th}$ turn is $(1/2)^{2k-1}.$ Thus, the probability that Alice wins is an infinite geometric series with first term $1/2$ and common ratio $1/4.$ So, the probability that Alice wins the game is $$\frac{\frac12}{1-\frac14} = \boxed{\frac{2}{3}}.$$OR Note that the only difference between the odds of Alice or Bob winning is who goes first. Because Bob goes second, the odds of him winning on his $k^\text{th}$ flip is half of the odds that Alice wins on her $k^\text{th}$ flip, since Alice must first get a tails before Bob gets a chance to win. Thus, if $a$ is Alice's chance of winning, and $b$ is Bob's chance of winning, then $a = 2b.$ Also, since someone must win, $a + b = 1.$ It follows that $a = 2/3$ and $b = 1/3,$ so Alice has a $\boxed{\frac{2}{3}}$ chance of winning the game. [[ ## answer ## ]] \frac{2}{3} [[ ## completed ## ]] User message: [[ ## question ## ]] Find the smallest possible value of $$\frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)},$$ where $x,y,$ and $z$ are distinct real numbers. Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`. Response: [[ ## reasoning ## ]] Let \( S = \frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)} \). To find the minimum value of \( S \), we can use the method of symmetry and the Cauchy-Schwarz inequality. By applying the Cauchy-Schwarz inequality, we have: \[ \left( \frac{(y-x)^2}{(y-z)(z-x)} + \frac{(z-y)^2}{(z-x)(x-y)} + \frac{(x-z)^2}{(x-y)(y-z)} \right) \left( (y-z)(z-x) + (z-x)(x-y) + (x-y)(y-z) \right) \geq \left( (y-x) + (z-y) + (x-z) \right)^2. \] The right-hand side simplifies to: \[ (y-x) + (z-y) + (x-z) = 0, \] which means the right-hand side is \( 0 \). Therefore, we have: \[ S \cdot \left( (y-z)(z-x) + (z-x)(x-y) + (x-y)(y-z) \right) \geq 0. \] This implies that \( S \) is non-negative. Next, we can test specific values for \( x, y, z \) to find the minimum. Let's set \( x = 0, y = 1, z = 2 \): \[ S = \frac{(1-0)^2}{(1-2)(2-0)} + \frac{(2-1)^2}{(2-0)(0-1)} + \frac{(0-2)^2}{(0-1)(1-2)}. \] Calculating each term: 1. The first term: \[ \frac{1^2}{(-1)(2)} = \frac{1}{-2} = -\frac{1}{2}. \] 2. The second term: \[ \frac{1^2}{(2)(-1)} = \frac{1}{-2} = -\frac{1}{2}. \] 3. The third term: \[ \frac{(-2)^2}{(-1)(-1)} = \frac{4}{1} = 4. \] Thus, \[ S = -\frac{1}{2} - \frac{1}{2} + 4 = 3. \] To check if this is indeed the minimum, we can also consider the case when \( x, y, z \) are the roots of a quadratic polynomial. By symmetry and the nature of the expression, we can conclude that the minimum value of \( S \) is indeed \( 3 \). Therefore, the smallest possible value of \( S \) is \( \boxed{3} \). [[ ## answer ## ]] 3 [[ ## completed ## ]]