How can we make an AI that is really good at mathematics? Some AIs are decent at math right now; GPT-4 does respectably on the math SAT. But personally, intuitively, I feel like math is like chess. A math program should be able to completely dominate humans at math, the way chess programs dominate humans at chess.

Is scaling all we need?

Maybe Nvidia launches the H200, H300, H400, H500, all the numbers get bigger, GPT-8 just naturally answers every math question correctly, and there’s no use for anything else?

My gut suspicion is this will not happen. Math is like chess. GPT-4 can play chess, but it has a very broad and shallow understanding. I think this is inherent in the architecture. I don’t think the large language models are going to naturally end up with the deepest understanding of narrow topics, even if there is a better search layer on top of them.

I don’t want to ignore the rise of LLMs, though. ChatGPT and friends are useful right now for math questions, and they are going to keep getting better. They also have a great interface for many things - you just ask your question with English and math interspersed.

Architecture

I think the way to architect a math AI at a high level is with separate “translator” and “solver” components.

A human has a math question, expressed in English, like “If f(x+1) = 2 * f(x), what could f be?” (This seems to be too hard for the LLMs right now.) The translator speaks English and also speaks some sort of “math language”. It translates the problem into the math language. The solver only speaks the math language, like a chess program only understands chess positions.
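To make the division of labor concrete, here is a minimal sketch of the pipeline. Everything in it is a hypothetical stand-in, not a real API: `translate` and `verbalize` would be LLM calls, and `solve` would be the dedicated search program.

```python
# Sketch of the translator/solver split. All functions are hypothetical
# placeholders: in a real system, `translate` and `verbalize` would call
# an LLM, and `solve` would be a dedicated proof-search engine.

def translate(english_question: str) -> str:
    """LLM step: English -> a statement in the formal math language."""
    raise NotImplementedError

def solve(formal_goal: str) -> str:
    """Solver step: formal goal -> formal answer. No English involved."""
    raise NotImplementedError

def verbalize(formal_answer: str) -> str:
    """LLM step: formal answer -> English, for the human who asked."""
    raise NotImplementedError

def answer(english_question: str) -> str:
    return verbalize(solve(translate(english_question)))

# answer("If f(x+1) = 2 * f(x), what could f be?")
# One family of answers: f(x) = C * 2^x, since then
# f(x+1) = C * 2^(x+1) = 2 * (C * 2^x) = 2 * f(x).
```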

It’s important for the system to speak English, both to be useful to actual humans and to compete in contests like the first AIMO progress prize, where the questions will be phrased in plain English.

A translator should probably be based on an open-source LLM. Hopefully all of those keep improving as fast as possible: the better open-source LLMs understand math, the better they can do things like disambiguate unclear problem statements.

As a matter of strategy, there are already so many people working on LLMs that I don’t want to compete directly in that world. I want to work on the parts that are complementary to the ongoing LLM revolution. So I’ll try to think mostly about the solver.

Some interesting math solvers exist already. I’d like to take the good ideas from a few in particular:

HyperTree Proof Search (HTPS)

The process of writing a proof is like the process of playing chess. At each point you have a bunch of choices for what to do next with your proof (each choice is called a “tactic” here); the choices form a tree, and you search through the tree. (They call it a hypertree, but hypertrees are basically the same thing as trees. None of these are really trees anyway, because you cache.)
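To make the search shape concrete, here is a minimal best-first sketch. This is not the actual HTPS algorithm, just the tree-search-with-a-cache pattern described above; `propose_tactics` (the model suggesting moves, with probabilities) and `apply_tactic` (the proof assistant) are hypothetical stand-ins.

```python
import heapq
import math

def search(root, propose_tactics, apply_tactic, is_proved, budget=10_000):
    # Priority = accumulated -log(probability), so likelier lines of
    # attack get explored first. `seen` is the cache that makes this a
    # DAG rather than a true tree.
    frontier = [(0.0, 0, root)]
    seen = {root}
    tiebreak = 0
    while frontier and budget > 0:
        cost, _, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        for tactic, prob in propose_tactics(state):
            budget -= 1
            child = apply_tactic(state, tactic)
            if child is None or child in seen:
                continue  # tactic failed, or we reached this state before
            seen.add(child)
            tiebreak += 1
            heapq.heappush(frontier, (cost - math.log(prob), tiebreak, child))
    return None  # ran out of budget
```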

These trees need an underlying representation of mathematical statements. The best one is Lean, a “proof assistant / programming language”.

There are two core problems that make me think the right strategy is not just scaling up HTPS. One is that Lean tactics are slow, and wildly inconsistent in how much work a single step does. A single Lean “tactic”, one step in the tree, can run an arbitrary program. That could be a full solver for systems of linear equations, or it could just be swapping x = 2 into an existing formula.
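To illustrate the spread, here are two toy examples in Lean 4 with Mathlib (the tactics are real; the examples are mine). The first proof is a trivial substitution; in the second, a single tactic quietly runs a whole linear-arithmetic decision procedure.

```lean
import Mathlib.Tactic

-- One tactic can be a trivial substitution...
example (f : ℝ → ℝ) (h : f 2 = 5) : f 2 + 1 = 6 := by
  rw [h]      -- just swaps in f 2 = 5
  norm_num    -- closes the arithmetic goal 5 + 1 = 6

-- ...while another single tactic runs a full decision procedure.
example (x y : ℝ) (h1 : 2 * x + y = 7) (h2 : x - y = 2) : x = 3 := by
  linarith    -- solves the system of linear equations in one step
```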

The other core problem is that each tactic is being generated by a transformer. This seems inefficient to me. It’s like using a transformer to generate all the possible moves in chess. Given Lean, though, there isn’t really an alternative. There’s no straightforward way to generate all legal tactics from a single Lean proof state.
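In practice, expanding a node becomes a generate-and-check loop: sample candidate tactic strings from the model and let Lean filter out the illegal ones. A sketch, with `sample_tactics` (the transformer) and `run_tactic` (the Lean checker) as hypothetical stand-ins:

```python
# Sketch of transformer-generated tactics. Since legal tactics can't be
# enumerated, we sample candidates and let Lean reject the bad ones.
# Both helpers are hypothetical placeholders.

def expand(proof_state, sample_tactics, run_tactic, k=32):
    """Return the legal successor states of one search-tree node."""
    children = []
    for tactic in sample_tactics(proof_state, num_samples=k):
        next_state = run_tactic(proof_state, tactic)  # None if illegal
        if next_state is not None:
            children.append((tactic, next_state))
    # Contrast with chess, where move generation is a fast exact loop
    # over the pieces - no model calls, no rejected candidates.
    return children
```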

Both of these design decisions make total sense as a way to get something working. Lean exists; that’s one very nice quality it has.

But they make me think that a mathematical language optimized for the common operations of proof search would work a lot better than one optimized for arbitrary human programming, like Lean tactics.

There are two specific parts of proof search that I think are worth optimizing: in Lean terms, “library search” and “rewrite search”. This post is already getting long, so I won’t dig into them here.

AlphaGeometry

This one is world-class at geometry contest problems. A lot of the method is geometry-specific, but I think some parts will generalize.

My favorite idea is what they call “pulling a rabbit out of the hat”. This video is very good.

They have a low-level solver, which can solve one class of geometry problems. But the low-level solver only really works with known entities. So above it they have a higher-level component - not a large language model, but a “small language model”, still transformer-based - to add in the “rabbits”.

A rabbit could be something like: your geometry problem has three triangles in it, so you were naturally considering all of the triangles’ corners. But perhaps you should also consider the midpoint between the top corner of triangle 1 and the top corner of triangle 2?

It’s inserting some extra mathematical entity into the problem. The transformer suspects this entity might be relevant. But it might not be. It’s more like a mathematical intuition than a mathematical calculation.
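In code, the idea might look like the following - this is not AlphaGeometry’s actual construction mechanism, just the shape of it. We enumerate candidate auxiliary points (here, only midpoints) and let a hypothetical small model `score` rank them by how relevant they feel:

```python
from itertools import combinations

# Sketch of "rabbit" generation - not AlphaGeometry's actual method.
# Candidates are midpoints between existing points; `score(problem, pt)`
# is a hypothetical small model returning an intuition score.

def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def propose_rabbits(points, score, problem, top_k=3):
    candidates = [midpoint(p, q) for p, q in combinations(points, 2)]
    # A high score means "this point might matter", not "this point
    # provably matters" - intuition, not calculation.
    ranked = sorted(candidates, key=lambda c: score(problem, c), reverse=True)
    return ranked[:top_k]
```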

I think this rabbit idea is going to apply more generally. You build an efficient classical algorithm to solve problems that are “straightforward” in some way, but you still need transformers to generate some “intuitive” steps in your reasoning. So, some steps in the proof are generated by transformers. Just not most of them.
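A minimal sketch of that alternation, with a hypothetical classical `deduce` step and a transformer-backed `propose_rabbit`:

```python
# Sketch of "classical solver plus occasional transformer intuition".
# `deduce` exhaustively applies the cheap, straightforward steps;
# `propose_rabbit` asks the model for a new entity when we're stuck.
# Both are hypothetical placeholders, as is `problem.with_entity`.

def solve(problem, deduce, propose_rabbit, max_rabbits=10):
    for _ in range(max_rabbits + 1):
        done, facts = deduce(problem)      # most proof steps happen here
        if done:
            return facts
        rabbit = propose_rabbit(problem, facts)
        if rabbit is None:
            return None                    # the model is out of ideas
        problem = problem.with_entity(rabbit)  # add the rabbit and retry
    return None
```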

The E Theorem Prover

This is old - it’s not really modern AI at all. And it isn’t powerful enough to handle general mathematics. But the authors have done a good job explaining how they built their theorem prover: how they represent expressions, how they normalize them, and how they optimize their data structures for common operations, like checking whether a generalization of a statement has already been proved. They also essentially have their own methods for optimizing the “library search” and “rewrite search” problems. There are good ideas to borrow here.
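As a taste of the kind of operation E optimizes, here is a naive version of that generalization check. The term encoding is my own toy one (variables are strings starting with `?`, compound terms are tuples); E answers the same question far faster with indexed data structures instead of a linear scan.

```python
# Naive "has a generalization of this statement been proved?" check.
# Toy term encoding: a variable is a string like "?x"; a compound term
# is ("function_name", arg1, arg2, ...); anything else is a constant.

def match(pattern, term, subst):
    """Try to extend `subst` so that substituting into pattern gives term."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def generalization_proved(statement, library):
    """True if some already-proved statement subsumes `statement`."""
    return any(match(proved, statement, {}) is not None for proved in library)

# ("add", "?x", "0") generalizes ("add", ("succ", "0"), "0"):
assert generalization_proved(
    ("add", ("succ", "0"), "0"),
    [("add", "?x", "0")],
)
```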

Conclusion

The main goal of this blog post is to organize my own thoughts. I hope you’ve found it interesting to read along!

I’d like to write more about “library search” and “rewrite search”. I think they could play a role similar to AlphaGeometry’s low-level solver, and these two mechanisms can encompass a lot of mathematics. So perhaps I’ll write a blog post about them next month.