• Alien Clay

    I picked up Alien Clay because I like Hugo nominees and I like Adrian Tchaikovsky.

    Adrian’s great love is hideous nonhumanoid aliens, either buglike or tentacle-based or both. He relishes the biological details, from the properties of humanoid skin, to the lifecycle of terrestrial fish parasites.

    How could an ecosystem develop so that for example grasping limbs were actually symbiotic creatures? He’s just super enthusiastic about this sort of science fiction question. He wants to visualize these things, like a high schooler doodling alien monsters in his notebook.

    There also also humans around. It’s an authoritarian dystopia. The bad guys sound like Effective Altruists. No, sorry, it’s completely different. They’re “Scientific Philanthropists.”

    Their justification for doing everything they do is that they have a logical, rational piece of thinking, which means it’s the best way to do things for the greatest number of people. So they love science, because it gives them permission to do all the shit they do. Right up to the point someone puts together an inconvenient but cogent argument that gets in the way of how they want the universe to be. They want very specific answers from science. Black and white answers to complex questions. Everything sorted into predetermined boxes.

    At first I was surprised to see this critique, because Adrian Tchaikovsky seems very pro-shrimp. But his critique is less about shrimp welfare, and more about the problem of expecting too much from rational reasoning. Promising certainty when you don’t have certainty, and then becoming unable to admit you were wrong, corrupting what should have been your intellectual principles.

    They spend way too much time performatively writing memos.

    Would Tchaikovsky actually care about shrimp welfare? I think no. Shrimp yes, but welfare no. He seems to be keenly aware of the rules of the animal kingdom, and to respect those rules in a deontological way. If anything he wants to raise the status of shrimp. Not to rank species “according to how close to human they are.”

    I think the author fantasizes about being infested with alien symbiotes and turned into some sort of bug monster. If you have this fantasy too, you should read this book. Otherwise, I did like the book, but I think Tchaikovsky’s best is Children of Time followed by Children of Ruin, so I would start with that series, and read Alien Clay if you want even more.

    The “human dystopia” part of the story is okay, but it feels like he started with the aliens and worked backwards to the prison planet. The plot needs an excuse for why a cutting edge biologist would be doing lots of manual labor.

    If you’re interesting in reading a novel set in an authoritarian prison camp, you should read Life and Fate instead. Real prison camp stories seem like the offenses were often completely trivial or nonexistent, and that has its own sort of horror, the evil springing up out of nothing at all. In Alien Clay, most of the prisoners actually had been planning a violent rebellion, albeit in a young adult Disneyfied way. “On the day of the uprising I personally didn’t carry a gun because that isn’t my style but I was still super important and helpful.” Nobody who is pointlessly denounced, or never figures out why they were there.

    In the end, I feel like he doesn’t really answer the question he poses. Why would an alien ecosystem evolve so that tentacles were symbiotic creatures that could install themselves into anything that needed an extra tentacle? I still don’t know. But I think Adrian Tchaikovsky has some more novels in him, ready to grapple with this fundamental question.

  • Every Love Story Is A Ghost Story

    David Foster Wallace was one of my earliest “real literature” dives. From the days when, sometimes after a long day of work, I just didn’t want to do more work on the Google shuttle. I got back into reading after not reading so much in a personally-directed way in college. Recently, flipping through Brief Interviews with Hideous Men, I found a business card from Google that I had been using as a bookmark.

    Every Love Story Is A Ghost Story is a DFW biography. (DFW = David Foster Wallace.) Whenever I read DFW in particular it makes me want to put parentheses in my writing. (Not footnotes… parentheses seem more appropriate for someone who spends more time working with programming languages than human ones.)

    It’s interesting to learn more about his writing process. I hadn’t realized how much he blurred fiction and nonfiction. There is a lot of fictional stuff in his “nonfiction”. Many incidents that seem to not have happened, but they made for better reading. It reminds me of the Hemingway quote. “All good books are alike in that they are truer than if they had really happened.” Like Labatut’s style.

    Also the other way around. So much of his short fiction and Infinite Jest was based on his own life. From tennis camps to halfway houses to the “brutalist sun” of Arizona. And this relates to his difficulty in writing a second great novel. You pour your whole life into the first novel, and then what’s left? Like Eminem struggling for material once he is no longer an underappreciated rebel.

    The Pale King is really great. It’s a literary tragedy that he never completed it. I wish the rest of it existed. Wouldn’t it be amazing if one day AI could finish it? Your first instinct is probably to shudder, but I mean finish it in a good way. I used to think this was impossible, but I actually thought the Sanderson part of the Wheel of Time was better than the original. Let’s just consider this my personal benchmark for literary superintelligence.

    In some sense this post is a book review. But really, the book you should get if you are intrigued is The Pale King. Not this biography.

    The topic of The Pale King I never would have expected. Boredom. The IRS. Endurance in a mental sense, jobs that require grinding through analysis. I wonder if reflected his struggle with his own work, like Knausgard or Levrero. The Pale King is the hero we need in our modern era, achieving professional triumph through his superhuman ability to endure boredom.

    Infinite Jest was prescient. He wrote it based on the media environment of cable television, but he was either brilliant, or got lucky, or both, because the internet era only enhanced the danger and the power of the technologically-boosted-entertainment. Both the “web” and the “feed” models. Infinite Jest zoomed into the dystopia of this world. And maybe The Pale King has a vision of a way out. To focus on a goal and use your own individual, human powers to transform the boring work on it into something greater.

    These two poles, the temptation of entertainment and the value of boredom, they both reflect a sort of difficulty in two parts of the mind coming to terms with each other. The short term instincts and the long term goals. And DFW’s style seems perfect for this, as a lens or digging tool, the recursive thought patterns or branching textual structure combining or overlaying the multiple different sorts of thoughts and desires. A technique to reflect a mind torn between its different impulses.

    I like to daydream that today, in some perfect parallel universe, DFW, after a record-breaking, pathologically extended period of procrastination, in his 60s, has finally finished The Pale King.

  • Rewrite Search

    Sometimes you know things because of theorems whose form is, when you know one true thing, then you know another true thing. P implies Q. If x is even, then x^6 is divisible by 64. And 10 is even, therefore 10^6 must be divisible by 64. Figuring out when this applies is the “library search” problem I wrote about last month.

    Rewrite search is a bit different. A rewrite happens when you know that two different expressions are equal. So you can replace one of those expressions with the other, inside a longer statement. 10^6 equals a million. And 10^6 is divisible by 64, so a million is divisible by 64.

    The basic “rewrite search” problem is, given two different expressions, figure out how to make them equal through a series of rewrites.

    Why is rewrite search important?

    I think most “normal person mathematics” can be thought of as a series of rewrites.

    Consider a question like “what are the factors of 27?” (I found this by googling “math question” and taking the first one that looked like a pure math question.)

    The normal way to solve this is roughly, first you consider that 27 = 3^3. Then you perhaps know that for a prime p, the factors of p^n are p^k for k in [0, n]. You can write a series of expressions like

    factors(27)
    
    factors(3^3)
    
    3^k for k in [0, 3]
    
    3^0, 3^1, 3^2, 3^3
    
    1, 3, 9, 27
    

    where each expression is a rewrite of the expression in the previous row.

    There are a few more details here that make this not purely a rewrite question. For example, we didn’t have a theorem for the factors of 3^3 specifically. We had a theorem for the factors of p^n, where p was prime. So we need to be able to rewrite based on a formula, and to apply some conditions to the formula. We need to know that 3 is prime, and that isn’t a “rewrite search” type of problem, that’s a “library search” type of problem.

    Another detail is that we didn’t start off with a particular destination in mind. We didn’t have the problem of, rewrite factors(27) into [1, 3, 9, 27]. We just wanted to rewrite factors(27) in a simpler form.

    Another detail is simplifying the “for loop”, which maybe happens in multiple steps.

    That said, the essence of the problem is rewriting. Most questions of the form “solve for x” or “what does this expression evaluate to” are fundamentally about rewriting. As opposed to questions of the form “prove that X is true”, which are often more implicationy.

    What can we do about it?

    I worked on the Lean rewrite_search tactic for a while. It didn’t end up as useful as I had hoped. The main problem is that there are so many possible ways to rewrite a formula, you can’t use a plain breadth-first search and get very far. We need to be using AI heuristics. In Lean we were hoping to get mathematicians to tag theorems based on how good they were for rewriting, but this was just too much of a hurdle.

    Intuitively, I think this makes sense, that we should have a heuristic sense for how good expressions are to rewrite into other expressions. You see 27 in a math problem, you immediately think that’s 3^3. You see 6401 in a math problem, you immediately think that’s 80^2 + 1. These potential rewrites move forward in your mind, even if these numbers are just a small part of a larger expression.

    The other thing we need to do tactically is to be using rewrites starting at every expression in a problem. A bidirectional search. When you’re searching for a path of length 2n from A to B, with a branching factor of b choices at each step, and you start just at A, you have to search O(b^2n) nodes. If you start forwards from A and backwards at B, you only have to search O(2b^n) nodes, which looks similar but is far better.

    Conclusion

    I think these two forms of reasoning, library search and rewrite search, plus basic propositional logic, are capable of solving a large amount of mathematics problems.

    Now what? In a sense this has been a “bottom up” view, answering what tactics are useful for solving math problems. We also need a “top down” view. Can we build a math AI that starts with a narrow focus on a small set of math problems, nails that, and expands its domain of mastery over time? I’d like to write more on this topic next month.

    Thanks for reading!

  • Library Search

    Last month I wrote about building a math AI and mentioned that I think we need to optimize the “library search” problem. In this post I’d like to explain what library search is, why it’s important, and what we can do about it.

    Motivation

    A few years ago I got excited about the IMO Grand Challenge. I thought it would be interesting to look for a team of smart people working on it and see if I could help out. One group of people was focused on using the Lean theorem prover, and one place it seemed like I could help out was formalizing IMO problems in Lean, to use as training data. I formalized some of them, for example here and here, but it seemed like progress was so slow, the whole strategy was unsound. If it takes you more than a day to code review a single point of training data, it’s going to be hard to train an AI. So, I gave up on this approach. But, I did learn a lot about the annoying parts of formalizing proofs.

    Lean has many different tactics for solving proofs. One general operation that you do all the time is just “applying a theorem”.

    For example, let’s say you have two existing theorems. Foo implies bar, and bar implies baz. Now you want to prove that foo implies baz. In Lean this proof looks like:

    theorem foo_implies_baz : foo → baz :=
    begin
      intro _,
      apply bar_implies_baz,
      apply foo_implies_bar,
    end
    

    In particular, you need to know the names of the existing theorems you want to use, foo_implies_bar and bar_implies_baz. “library search” is the name of a Lean tactic that tries to figure this out for you, but is slow and fails a lot. (The tactic is now renamed exact?, but it’ll always be library_search! in my heart.)

    Why This Is Hard For Humans

    It’s hard for humans to remember all these theorem names.

    This might be unintuitive. For a programmer, it doesn’t seem unreasonable that you have to know the name of a function you’re calling. How could it be any other way?

    For a mathematician, a way of manipulating a formula doesn’t typically have a name. Imagine you’re doing long division. You can remember the whole process, but you don’t give a name to every step of the way. There are too many different steps, only slightly different from each other.

    It becomes harder as the size of the library increases. It’s like the category of jokes where one mathematician calls a problem “trivial” but another mathematician is stumped by it.

    Take a look at these theorems and it’ll make more sense why it’s hard. There is just an explosion of basic facts about numbers, all of which a reasonably competent mathematician would just say, “this is trivially true”.

    For example, really do click that link above. Then take a look at eq_mul_of_div_eq_left. In English, “if a is a multiple of b, and a divided by b rounded down is c, then c times b is a.”

    For enlightenment, go ahead and read the definitions of div_eq_iff_eq_mul_left, add_mod_eq_add_mod_left, and the dozens of others of similarly named theorems.

    In programming, a library with 400 rarely-used functions would be unusable. In mathematics, though, it’s fine, in a sense. As long as everything is true, and a human looking at it thinks it’s obviously true, then the only problem is when you have to know these names.

    Why This Is Hard For Computers

    It’s hard for humans to memorize a whole dictionary, too. For computers, it’s easy. So why isn’t this lookup easy for computers? Why isn’t library search nicely solved in existing proof assistants?

    I do think this is solvable. The basic problem is the inexact nature of the lookup. Theorems have variables in them. When you know that a < b implies a + c < b + c, this theorem is true for any values of a, b, and c. As you have more facts available, and more theorems in your library, this gets slower. In the most general case, you have to do an exponential matching process.

    That said, people have found some reasonable data structures for doing this sort of lookup. The discrimination tree and fingerprint indexing from the E theorem prover are good examples. And I think a lot of it is the sort of performance low-hanging fruit where if you specifically design data structures and profile for this, you’ll be able to do better than tacking a lookup structure on after the fact.

    Will LLMs just solve this automatically, with scale, by putting the entire library inside their context window? It’s possible. Maybe there’s a good argument against this. I’m honestly not sure how to predict the future path of LLM skills. At some point I have to fall back on my thesis of “scaling LLMs will not solve all outstanding problems in everything”, if only by a sort of Pascal’s reasoning.

    The Ideal Proof

    A formalized proof should not have to be constantly citing the names of theorems. For either a human or an AI, rather than writing and validating this proof:

    theorem foo_implies_baz : foo → baz :=
    begin
      intro _,
      apply bar_implies_baz,
      apply foo_implies_bar,
    end
    

    you should be able to write the proof the way a human writes a proof in prose, by just stating true things that follow from the premises without giving an explicit rationale for them.

    theorem foo_implies_baz: foo -> baz {
      bar
    }
    

    In 99% of cases, the compiler should just be automatically handling the library search for you. I think natural exceptions would be fancy tactics like induction. In most human-written proofs, when you’re using induction, you do say, “now I’m using induction”.

    You can think of this like a mini proof search. It’s just that you are searching just a few steps. One retrieval from the library, plus a bit of normalizing between logically equivalent expressions. It’s like a base case that the proof search needs to handle before it can expand to more complicated cases.

    Conclusion

    I think automatically handling library search will make proofs easier for both humans and AIs. You shouldn’t need to use the names of theorems.

    There’s a related, somewhat harder problem of “rewrite search”. Originally I was going to write about that, too, but this post is getting long, so I’ll cut off here. Thanks for reading! Next month I’ll write about the rewrite search problem.

  • How to Build Math AI

    How can we make an AI that is really good at mathematics? Some AIs are okay at math right now. GPT-4 can do okay on the math SATs. But personally, intuitively, I feel like math is like chess. A math program should be able to completely dominate humans at math, the way that chess programs dominate humans at chess.

    Is scaling all we need?

    Maybe Nvidia launches the H200, H300, H400, H500, all the numbers get bigger, GPT-8 just naturally answers every math question correctly, and there’s no use for anything else?

    My gut suspicion is this will not happen. Math is like chess. GPT-4 can play chess, but it has a very broad and shallow understanding. I think this is inherent in the architecture. I don’t think the large language models are going to naturally end up with the deepest understanding of narrow topics, even if there is a better search layer on top of them.

    I don’t want to ignore the rise of LLMs, though. ChatGPT and friends are useful right now for math questions, and they are going to keep getting better. They also have a great interface for many things - you just ask your question with English and math interspersed.

    Architecture

    I think the way to architect a math AI at a high level is with separate “translator” and “solver” components.

    A human has a math question, expressed in English, like “If f(x+1) = 2 * f(x), what could f be?” (This seems to be too hard for the LLMs right now.) The translator speaks English and also speaks some sort of “math language”. It translates the problem into the math language. The solver only speaks the math language, like a chess program only understands chess positions.

    It’s important for the system to speak English, both to be useful to actual humans, and possibly for contests like the first AIMO progress prize where the questions will be phrased in plain English.

    A translator should probably be based on an open-source LLM. Hopefully all of those improve as fast as possible. The more open-source LLMs understand math, the better, so that they can do things like disambiguate unclear problem statements.

    As a matter of strategy, there are already so many people working on LLMs, I don’t want to directly compete in that world. I want to be working on the parts that are complementary to the ongoing LLM revolution. So I’ll try to think most about the solver.

    Some interesting math solvers exist already. I’d like to take the good ideas from a few in particular:

    The process of writing a proof is like the process of playing chess. You have a bunch of decisions for what to do next with your proof (called a “tactic” here), that makes a tree, you search through the tree. (They call it a hypertree, but hypertrees are basically the same thing as trees. None of these are really trees because you cache, anyway.)

    These trees need an underlying representation of mathematical statements. The best one is Lean, a “proof assistant / programming language”.

    There are two core problems that make me think that the right strategy is not just scaling up HTPS. One is that Lean tactics are slow and in a sense inconsistent. A single Lean “tactic”, one step in the tree, can be running an arbitrary program. This includes, like, a full solver for systems of linear equations. Or it could just be swapping in x = 2 to an existing formula.

    The other core problem is that each tactic is being generated by a transformer. This seems inefficient to me. It’s like using a transformer to generate all the possible moves in chess. Given Lean, though, there isn’t really an alternative. There’s no straightforward way to generate all legal tactics from a single Lean proof state.

    Both of these design decisions make total sense in order to get something working. Lean exists, that’s one very nice quality it has.

    But, they make me think that if we had a mathematical language that was optimized for the common operations in a proof search process, it would work a lot better than a language that was optimized for arbitrary human programming, like Lean tactics.

    There are two specific parts of proof search that I think are worth optimizing. In Lean world, “library search” and “rewrite search”. This post is already getting long so I won’t dig into them here.

    AlphaGeometry

    This one is world-class at geometry contest problems. A lot of the method is geometry-specific, but I think some parts will generalize.

    My favorite idea is what they call “pulling a rabbit out of the hat”. This video is very good.

    They have a low-level solver, which can solve one class of geometry problems. But the low-level solver only really works with known entities. So they have a higher-level component above that, not a large language model, but a “small language model”, something with transformers, to add in the “rabbits”.

    A rabbit could be something like, your geometry problem has these three triangles in it. So you were naturally considering all the corners of the triangles. But perhaps you should also consider the midpoint of the top of triangle 1 and the top of triangle 2?

    It’s inserting some extra mathematical entity into the problem. The transformer suspects this entity might be relevant. But it might not be. It’s more like a mathematical intuition than a mathematical calculation.

    I think this rabbit idea is going to apply more generally. You build an efficient classical algorithm to solve problems that are “straightforward” in some way, but you still need transformers to generate some “intuitive” steps in your reasoning. So, some steps in the proof are generated by transformers. Just not most of them.

    The E Theorem Prover

    This is old, it’s not really modern AI at all. And it isn’t powerful enough to handle general mathematics. But, the authors have done a good job explaining how they build their theorem prover, how they represent expressions, how they normalize expressions, and how they optimize their data structures for common operations. Like checking if a generalization of a statement has already been proved. And they essentially have their own methods for optimizing the “library search” and “rewrite search” problems. There are good ideas to borrow here.

    Conclusion

    The main goal of this blog post is to organize my own thoughts. I hope you’ve found it interesting to read along!

    I’d like to write more about “library search” and “rewrite search”. I think they could play a similar role to the low-level solver in AlphaGeometry, and you can encompass a lot of mathematics with these two mechanisms. So perhaps I’ll write a blog post next month about them.


...