• Reason And Revelation

    Earlier I was reading this biography of Abraham Lincoln, and the section about Deism was interesting. Lincoln was attacked by his opponents at some points for being an atheist, because he briefly considered himself a “Deist”. I remembered reading sometime before that some founding father types were Deists - checking via Google, it sounds like Benjamin Franklin, Thomas Jefferson and George Washington were. So there is a bit of American presidential tradition there.

    So I was curious: why bother? Deism doesn’t seem like a big thing nowadays; people simply call themselves atheists, or just don’t really talk about it and answer surveys with “no religion” while not really calling themselves atheists either. I thought I’d read a pro-deism book to see.

    The Age Of Reason

    Man, this book is really something. This is the most anti-Bible book I have ever read. “New atheism” isn’t even talking about the Bible any more; Dawkins and Sam Harris aren’t really arguing against the Bible, they’re arguing against people who feel abstractly like “religion is good” while being only loosely affiliated with any particular religion.

    The Age Of Reason is much more like, hey we need to stop treating this book like it’s the word of God! The book of Mark disagrees with the book of Luke in these seven points! You can tell by an analysis of pronouns used that the gospel of Matthew was not actually written by Matthew! Moses advocates the rape and slaughter of thousands of innocent children in Numbers 31:13-47! Thomas Paine doesn’t literally use a lot of exclamation points but it feels like he wants to.

    This isn’t really a debate I can even engage with. It’s no longer in the Overton window, at least where I live in California and with the set of people I pay attention to on Twitter. It’s more like a portal into the past, giving me a glimpse of a world where the most intelligent American and European statesmen and scientists would frequently refer to the Bible as a source of factual knowledge.

    There is an interesting idea of “reason vs revelation”. Revelation is another concept I rarely think of nowadays. Thomas Paine is primarily making the point that a revelation to one character in history is not revelation for anyone else; for everyone else, it’s just trusting the word of another.

    Sometimes you read two seemingly unrelated books and a concept ties them together. Later on I read…


    Wilson’s thesis in this book is roughly that math, science, the social sciences, arts, ethics, religion, all of these things should fuse into one unified endeavour. It’s an interesting book and I recommend it to people interested in science but not only interested in science.

    The part that reminded me of the “reason vs revelation” concept was a discussion of why scientists liked doing science. Science is fundamentally about reason, about using reason to solve problems and figure out the natural world. But why you do science can’t be justified in the same way. In practice different scientists have very different motivations for wanting to do it in the first place. Some are religious, some aren’t. But the combination basically works.

    Perhaps some amount of revelation is necessary. With reason alone, can you ever get an “ought” from an “is”? Even if you rationally piece together why some course of action is correct, you find yourself working back to axioms where you get stuck. How much should we worry about animal welfare? You can boil it down to a question: one human life is equal to X chicken lives. What is X? How can that possibly be answered with reason alone?

    Aumann’s Agreement Theorem

    This “theorem” has always bugged me. I notice that I blogged about it seven years ago. It states, roughly, that rational people with common priors who share their beliefs should never disagree, and it tries to prove this with some Bayesian reasoning and mathematical arguments.
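    To see what the theorem’s machinery actually looks like, here is a toy version I put together - my own construction, not anything from Aumann or from my old post. Two agents share a uniform prior over four states of the world, each observes only part of the truth, and they take turns announcing their posterior for an event. Each announcement rules out states for everyone, and the announcements are forced to converge.

```python
from fractions import Fraction

# Toy model of Aumann's setup (my own construction): four equally likely
# states; each agent only observes which cell of their partition the true
# state falls in.
STATES = {1, 2, 3, 4}
EVENT = {2, 3}                      # the proposition they disagree about
PARTITIONS = {
    "alice": [{1, 2}, {3, 4}],
    "bob":   [{1}, {2, 3, 4}],
}
TRUE_STATE = 2

def cell(name, state):
    return next(c for c in PARTITIONS[name] if state in c)

def posterior(possible):
    # Uniform prior, so the posterior is just the fraction of the
    # still-possible states that lie inside the event.
    return Fraction(len(possible & EVENT), len(possible))

def converse():
    common = set(STATES)            # states not yet ruled out publicly
    history = []
    speaker = "alice"
    while True:
        p = posterior(cell(speaker, TRUE_STATE) & common)
        history.append(p)
        if len(history) >= 2 and history[-1] == history[-2]:
            return history          # same number twice in a row: agreement
        # Listeners rule out every state in which the speaker would have
        # announced a different number.
        common = {s for s in common
                  if posterior(cell(speaker, s) & common) == p}
        speaker = "bob" if speaker == "alice" else "alice"

print(converse())  # the announcements go 1/2, 2/3, 1, 1 and converge
```

    The agents start out disagreeing (1/2 vs 2/3), but the disagreement itself is information, and after a couple of rounds they agree. That is the part of the argument that feels airtight; the question is whether real people ever satisfy the assumptions.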

    In practice, groups of smart people very often disagree with each other. A group of 100 people where everyone agrees on everything isn’t a gathering of smart, rational people; those people are groupthinking in some way.

    Perhaps this too is related to the reason vs revelation breakdown. If a person cannot operate entirely on reason, if they need at least a little bit of revelation as well, then you cannot entirely agree with other people. This little bit you disagree with them on, you don’t know if it’s because they’re smart and rational and they know something you don’t, or if their revelation is at work, putting its finger on the scale. And they don’t really know either. And it’s okay. You just need to figure out some pluralist way to work together despite not entirely agreeing on everything.

    Startups need a little bit of revelation. You need to be a little too excited, to believe in your idea a little too much, and it doesn’t work to rationally think “Startups require you to believe in your idea a little too much, so rationally I’m going to take my belief level and kick it up a notch.” It’s like Pascal’s wager; real belief doesn’t work that way.

    So What?

    Do you need more revelation? Do you want more revelation? It’s hard to just sit down and get excited about something. But sometimes it’s contagious, sometimes it’s worth seeking out other people who are a little too excited about their weird ideas. Sometimes it will spread to you.

  • A Math AI Minisurvey

    I’m curious about the idea of using AI to discover math proofs, so I thought I’d write up my thoughts on some of the relevant papers. This is very much not a complete list, but I feel like these papers define my “mental frontier” in some way.

    If there are other papers you think are equally cool and relevant, I’m interested to hear about them.


    The general strategy here is to hook up a text-based transformer to a proof assistant. They chose Metamath for the proof assistant, and interestingly, rather than use an existing Metamath implementation, they basically wrote their own in Python. The search goes up to 128 steps into the graph, with each step guided by the transformer.

    The “graph” here is like the move tree in chess or Go. Each step is something like: prove statement X, given all this stuff we know to be true. So you don’t really change the state of the search as time goes on, you just add to it. The tree in chess is also not really a tree, once you consider that you can reach some boards via multiple move orders, but it’s logically more tree-like.
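    The search loop itself can be sketched as best-first search where the model scores candidate steps. This is my reconstruction, not the paper’s code; `propose_steps` and `is_proved` are hypothetical stand-ins for the transformer and the proof checker.

```python
import heapq

# Sketch of model-guided proof search, as I understand it -- my
# reconstruction, not the paper's code. `propose_steps` stands in for the
# transformer proposing scored next steps; `is_proved` stands in for the
# proof assistant checking whether a state closes the goal.
def guided_search(initial_state, propose_steps, is_proved, max_steps=128):
    # Best-first: always expand the highest-scoring open state.
    frontier = [(0.0, 0, initial_state)]  # (negated score, tiebreak, state)
    counter = 1
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_proved(state):
            return state
        for score, next_state in propose_steps(state):
            heapq.heappush(frontier, (-score, counter, next_state))
            counter += 1
    return None  # budget exhausted, like the 128-step limit

# Hypothetical toy domain: states are integers, "proving" means reaching 3.
found = guided_search(0, lambda n: [(1.0, n + 1)], lambda n: n == 3)
```

    The 128-step budget is the part that bites: every expansion burns one step, so the model has to be right nearly all the time.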

    Choosing Metamath is tough. Metamath is kind of like the assembly language of proof assistants. A single step in a language like Lean can take many steps in Metamath. The core problem here is that it’s annoying and buggy to hook up any external logic to a proof assistant. All the proof assistants are using their own weird languages, their own weird compilers, these really strange toolchains. So this does make a ton of sense as a practical decision - Metamath was simple enough they could just implement their own.

    Still, if you can only search for 128 steps, there are single Lean tactics that will expand to more Metamath steps than that.

    Polu 2022

    This one is a cooler successor. It’s using Lean instead of Metamath - presumably someone did a bunch of grunt work to hook it up. Props.

    There’s a transformer here trained on CommonCrawl and the “WebMath” corpus. I’m impressed that anything like this works at all. Maybe it’s just fine to train on a huge blob of data that is a superset of what you need, nonsense included; you use a bunch of extra space, but if you have the space, then logically I guess the capability is the same.

    They train the model to find the next tactic used in a proof, and to minimize the size of the resulting proof. It makes sense to minimize the size of a proof - proofs have a weird property where technically you can add all sorts of extraneous statements to a proof and it remains equally valid. If you aren’t sure what to do, you do have some incentive to just write down a slew of true statements. (I got 3/7 partial credit on an IMO geometry problem for doing this and accidentally doing something useful.) When learning, though, you really should prefer a shorter, more beautiful proof.

    Fine-tuning used a dataset which I believe is derived from Lean’s mathlib. The Lean mathlib community is quite impressive - 20 commits per day in March. I’m not quite sure how it compares to the other theorem-proving communities, but the other ones tend to be less “one open source project with a swarm of activity” and more “many different groups doing different things”, so I feel like Lean is the most likely to win out.

    It’s tough that they had to manually formalize some training problems. Getting reasonable training data is really hard for these math proofs.

    One aspect of this prover that I think is under-recognized is how much it uses complex Lean tactics. nlinarith, field_simp, norm_num, these are each really complicated manual theorem-proving tactics. There might be more code in nlinarith than in the AI-training components of this system. In one sense, this is good because as people make more tactics it will make this sort of approach stronger. In another sense, this approach seems like it will be restricted to the areas where we have really good tactics.

    Transformers overall

    I feel like transformers for proof generation are really hamstrung by a lack of training data. Transformers would love to have billions of proofs to learn patterns from. This seems achievable in some cases, like solving some systematic variety of problem by generating a lot of them. But many problems don’t work this way.

    I am curious whether a more reductionist approach would be useful, where you specifically try to generate “interesting problems”, in a sub-area like algebraic inequalities or Euclidean geometry. In practice, mathematicians do seem capable of differentiating interesting problems from boring problems. You generally want to study interesting problems to get better at math.

    The Gowers Manifesto

    Sort of a different angle on the problem entirely. Gowers poses a few questions in the blog post that I think would have a “standard machine learning” answer. I imagine a dialogue like this.

    Question. What is it about the proofs that mathematicians actually find that makes them findable in a reasonable amount of time?

    Answer. The state of a partial proof has many different features that each contribute a little bit to this. We can model these properties with deep learning.

    Question. Humans seem to be able to find proofs with a remarkably small amount of backtracking. How do they prune the search tree to such an extent?

    Answer. A mathematician’s intuition tells them what a good next step is, given the current state of the proof. We can approximate this intuition with a deep learning model.

    Problem. Come up with a precise definition of a “proof justification”.

    Answer. A proof justification is a deep learning model that takes a partial proof state and gives you possibilities for a proof tactic. It justifies any proof for which the combination of the scores for each of its tactics is high.

    To me, the problem with these answers is that deep learning alone struggles with the minimal amount of training data that human mathematicians get by on. We need to be able to bootstrap. We need a lot of good proofs to learn what good proofs look like - more than human mathematicians are going to provide, Lean community notwithstanding. So I think we really do need better answers to these interrelated questions that Gowers is posing.

    At the heart of it is a basic principle that shorter proofs are more elegant. But that can’t be everything - it isn’t elegant to prove that 234 + 678 = 912 even if the proof is very short.

    The Lottery Ticket Hypothesis

    I hope this hypothesis is true because it makes my mental model of how deep learning works a lot simpler. Why are transformers so good? Because the previous methods for handling sequences had no way to model a function that jumps around the input, like:

    # Let a be an array of smallish nonnegative integers
    # (so each value is a valid index back into a)
    def follow_jumps(a):
      i = 0
      for _ in range(100):
        i = a[i]
      return i

    If that’s your function, the ideal circuit to solve it just isn’t a subset of the deep learning networks that you build if you use something like an LSTM. With a transformer, you can in fact make a network that accesses different locations. Voila.

    How much can a transformer model? I feel like a transformer is still limited by its depth. Can you teach a transformer how to multiply two numbers? I think there is some sort of algorithmic limit - some functions cannot be embedded in a transformer network, and perhaps these mathematical heuristics are among them. I’m not sure, though.


    What is the simplest function of sequential data that transformers cannot learn?

    Is there any heuristic for whether a math problem is interesting or not?

    Let me know if you have a good answer…

  • The Cost Of Efficient Markets

    I love these Zvi blog posts where he analyzes prediction markets and states where he thinks they are overrated and underrated. At the same time, it seems like they take a ton of work.

    One way to look at a prediction market is that you have two forces in balance. One crowd thinks a proposition is likely to happen, another crowd thinks the proposition is unlikely to happen, and the price mediates between these two groups. In this view, the role of money is to stabilize the system, and you get useful information as a side effect.
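    The arithmetic behind that balance is simple enough to write down - this is my framing, not Zvi’s. A YES share pays out 1 if the proposition happens, so a trader’s expected edge is just the gap between their belief and the price.

```python
# Minimal sketch of the price-as-mediator idea (my framing, not from the
# post). A YES share pays out 1 if the proposition happens, 0 otherwise.
def expected_profit_per_share(belief, price, side):
    if side == "buy":
        return belief - price   # pay `price` now, expect `belief` back
    return price - belief       # collect `price` now, expect to pay `belief`

# At a price of 0.6, the optimist (belief 0.7) wants to buy and the
# pessimist (belief 0.4) wants to sell; trading pushes the price toward a
# point between their beliefs, which is the useful information.
optimist_edge = expected_profit_per_share(0.7, 0.6, "buy")
pessimist_edge = expected_profit_per_share(0.4, 0.6, "sell")
```

    Both sides have positive expected value by their own lights, but at most one of them is right - which is what the next section is about.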

    Another way to look at a prediction market is to ask: across all predictions, who is making a lot of money, and who is losing a lot of money? The ideal prediction market has very intelligent predictions, with people working hard on intelligent analysis. Are these people making money, getting paid for their work? Maybe they don’t have to be, like on Metaculus; maybe they will just play for points.

    For a predictor to make money on prediction markets, someone else has to be losing money. Who is willing to consistently lose money on prediction markets? You might get addicted to prediction markets, mistakenly overrate yourself, and keep doing it when you shouldn’t, like a gambling addiction. But it seems tough to get a large user base addicted to prediction markets when there are so many more optimized types of gambling out there. And friendly regulation for prediction markets is essentially predicated on it not “feeling like gambling”.

    Stock markets avoid this problem. The intelligent analysts do get paid, but they are essentially taking a cut from the huge flows of money that go into stocks without deep analysis. If every index fund investor earns 0.1% less than they theoretically could because they haven’t optimized their investments, that funds a whole lot of analysts to do research on optimal price levels.

    It seems hard for prediction markets to ever be worth it purely for the money. I think they will have to be driven by people who are simply in it for the “love of the game”. In fact, writing this blog post makes me want to sign up for Metaculus, so that I can bet against some of the AI wagers that seem overrated…

  • AlphaFold: A New Kind of Science

    Last night I was watching an interesting video on AlphaFold, which is basically using modern deep learning techniques to solve a longstanding problem in molecular biology, determining how a protein folds based on its sequence.

    One part that is interesting is that they develop an intermediate structure which looks very “vectory” - a matrix of how close each part of the molecule is to each other part. In some sense this is redundant; you are storing a distance for each pair of components instead of a single position per component. But it means that if you screw up one part of the output, you don’t automatically screw up all the other parts. It is more “local” in a sense.
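    A toy version of that pairwise representation, to make the redundancy concrete - my own illustration, not AlphaFold’s actual data structures:

```python
import math

# Toy version of the pairwise-distance representation (my own
# illustration, not AlphaFold's actual data structures).
def distance_matrix(positions):
    n = len(positions)
    return [[math.dist(positions[i], positions[j]) for j in range(n)]
            for i in range(n)]

# Three points in 3D: n positions expand into an n-by-n matrix.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
d = distance_matrix(coords)
# Redundant, but local: one bad entry corrupts a single pairwise
# relationship rather than shifting everything downstream of it.
```

    With raw coordinates, one bad position moves a component relative to everything; here, each matrix entry only describes one pair.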

    The other part that is interesting to me is that it is an approach towards solving the “middle size of problem” that I mentioned in my previous post on the “theory of everything”. How should we interpret this?

    The Lottery Ticket Hypothesis

    One way of understanding modern machine learning is the Lottery Ticket Hypothesis. Roughly, the lottery ticket hypothesis says that for problems that deep learning works well on, there exists an algorithm that solves the problem, and a large, randomly initialized deep network probably contains some subnetwork that happens to be close to that solution. The process of training a deep network can then be thought of as searching for this subnetwork that happens to contain the answer.

    The lottery ticket hypothesis is not necessarily true. For example, the training process might get closer and closer to a decent answer by assembling the algorithm bit by bit, rather than discovering an algorithm that already is mostly there at initialization time. It’s probably different for different problems. But it’s a useful approximate way of thinking about it. Deep learning is a search through the space of possible formulas, looking for one that solves the problem.

    A New Kind Of Science

    Here I’m talking about the generally-poorly-regarded book that Stephen Wolfram quasi-self-published 20 years ago. One way of interpreting the thesis here is that we could discover the laws of physics by searching through the space of all small computer programs.

    In a sense, this is exactly what AlphaFold is doing! It isn’t using Mathematica, and it isn’t using any of Wolfram’s preferred algorithms like a linear search over cellular automata rules, and it isn’t aiming at sub-quantum-mechanics laws of physics. Okay, that’s a lot of things that are different. But the fundamental idea of discovering laws of science by doing an exhaustive computer search, that part is what AlphaFold is doing.

    The “Real” Laws of Physics

    You might say that AlphaFold isn’t looking for the real laws of physics. It isn’t even pretending to model every electron and use those low-level laws of physics to calculate protein structure. It’s just looking for an approximation that works at the level we want to calculate at.

    But is any science actually looking for the real laws? Or just laws that are close enough for any practical use? Differential calculus is great for physics because it is a great approximation tool. Any function that is “continuous enough” can be approximated locally by a linear map - a matrix - and calculus lets you find that approximation, so you can get pretty good answers. That’s why we like using it to solve problems. We have never observed a true “real number” in nature (because real numbers, rigorously defined, are based on infinite sets).
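    The local-approximation point in one dimension, as an illustration of my own: near a point a, a smooth f(x) is close to f(a) + f'(a)(x - a).

```python
import math

# One-dimensional sketch of local linear approximation (my illustration):
# near a point a, a smooth f(x) is close to f(a) + f'(a) * (x - a).
def linear_approx(f, f_prime, a, x):
    return f(a) + f_prime(a) * (x - a)

a, x = 1.0, 1.01
exact = math.sin(x)
approx = linear_approx(math.sin, math.cos, a, x)  # cos is sin's derivative
# The error shrinks quadratically as x approaches a, which is why small
# steps give very good answers.
error = abs(exact - approx)
```

    Nothing here is "really" linear; the approximation is just good enough for any practical use, which is the point.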

    We have spent a long time lamenting that we cannot get quantum mechanics and gravity to line up. Well, that doesn’t really matter. What we should be lamenting is that neither quantum mechanics nor gravity provides a useful approximation for these intermediate-size problems like how to construct a useful machine out of 1000 atoms. Tools like AlphaFold seem like they could get us there.

  • Notes On Some Biographies

    It’s spring break and I’ve been holed up in a New England farmhouse, enjoying some “spring” afternoons, as it rains outside, the kids nap, and I get to curl up and read.

    My father-in-law is a history professor, and somehow I feel like the house has an “ambiance of history”. I feel like reading history books while I’m there.

    Recently I have been enjoying biographies. History books often feel like a hundred different things are happening and my mind gets shuffled up, forgetting precisely which ruler of which French city was angry at the peasants because archery was so effective, and then it’s hard to be sure I have the correct meaning of a subsequent anecdote instead of the backward meaning. With a biography, often the most interesting things you learn are facts about the general time and place, and the individual’s life is just a nice structure to hang a row of facts on.

    These three biographies were particularly good and I recommend them.


    The book: Leonhard Euler: Mathematical Genius in the Enlightenment. Euler lived from 1707 to 1783, so technically this was before the “industrial revolution”, although his life shows so many ways that science and engineering were making the world more effective that it makes you wonder whether the industrial revolution was a distinct period or more a phenomenon that slowly arose over time.

    When Euler was young, mathematicians were ashamed to call themselves “mathematicians”. The word had a connotation of magic, numerology, and astrology. Euler instead preferred to be called a “geometer”.

    To understand Euler you have to understand Newton. Newton figured out the basics of calculus and the basics of the modern laws of physics. But he still did a lot of his work using geometry. Euler basically destroyed geometry. He solved so many practical problems and proved so many mathematical theorems using calculus, there was hardly any role for geometry any more. Nowadays we learn geometry because of its historical role in mathematics, not because any practicing engineer is going to use even the most basic parts of geometry like inscribing a circle in a triangle.

    Mathematics was so low-status at the time. Euler’s father wanted him to do something useful, like become a priest. He won respect not by pure mathematics, but by solving practical engineering problems like deciding what shape of a sail would be the most effective, or determining the longitude from observations of the moon.

    Some people look around nowadays and wonder: why is progress in pure mathematics slowing down? Obviously Euler had a huge advantage, studying mathematics when the competition was just four or five main academies in Europe, each with a few mathematicians. But mathematics wasn’t a primary area of scholarly endeavor at the time - that would have been something like theology or the study of Greek literature.

    Perhaps a field that gets little respect today will be looked back on as the primary scientific achievement of our time. My bet is on open source software. We are living in the era when Python, NumPy, and Jupyter notebooks are first being invented! One difference is that modern software is often developed by teams, whereas the research papers of the 1700s were usually the work of individuals.


    The book: Bismarck: A Life.

    Bismarck is more like Steve Jobs than anyone else I’ve read about. Intensely charismatic, except so many people disliked him. Somehow when people talked to him they were charmed and convinced by his unusual ideas. He ended up running a huge organization, micromanaging everything in a way that infuriated many people and exhausted himself, but at the same time achieving successes that were thought impossible.

    Bismarck is so full of contradictions. What did he even want? He was a conservative, believing that a strong king should make all decisions. And yet he clearly didn’t want his king to actually make decisions. It’s a weird organizational setup in that the king could technically fire Bismarck at any time, and yet once Bismarck as chancellor had turned the King of Prussia into the Emperor of Germany, who was going to fire him?

    A monarchist who was the first in Germany to introduce universal male suffrage, he clearly didn’t like Jewish people personally but at the same time he pushed through pretty strong religious freedom laws. He somewhat randomly put together the first social-security-type plan. But really his politics were all over the map. The only thing that it really seemed he consistently believed in was that his employer should gain more and more power.

    In the end, it’s hard to read about Bismarck without thinking of what would happen a few decades later, in the system he built. The Kaiser who finally fired a 75-year-old Bismarck would later get sucked into World War I and lose the German Empire that Bismarck built for him.

    It does make me respect George Washington and Steve Jobs more. If you make yourself the first great leader of an organization, it can be impossible for anyone else to keep it together. It’s easy to take it for granted when John Adams or Tim Cook keeps things going but that doesn’t always happen.

    Speaking of American presidents…


    The book: Abe: Abraham Lincoln in His Times.

    Abraham Lincoln is like Jesus. So many people have written books about him that there is far more commentary written after the fact than there is direct evidence of any truth at the time. And the vast majority of these writers think that the subject is a great, great human being, so every anecdote, every little bit is slanted and has dozens of accolades written about it. It’s just impossible to read something calm and neutral the way you can about, say, Euler, about whom nobody has had any real strong opinions for a hundred years.

    So this book ends up asking questions like: was Abraham Lincoln entirely honest throughout his entire life? And it weirdly concludes “yes”, although in practice the book itself contains many examples of Lincoln doing things like taking on pro bono cases for clients he knew were guilty but who were political allies of his, or making statements that are clearly false. I mean, I would categorize them as “normal politician” things. They aren’t terrible. But it seems there is a faction which considers Lincoln to be more like an angel than like a good politician - enough of a faction that this book can both seem too rose-tinted to me and also declare itself to be clearly on the “more negative on Lincoln” end of the Lincoln-biography spectrum.

    One of the striking things about Lincoln is how terrible so many other politicians were. People were just drunk all the time. The governor of Illinois, the vice president, and all sorts of lesser characters make terrible missteps due to being plastered at important events.

    Politics in general seemed even dumber than today. Political cartoons were often important drivers of public opinion, and they were even more simplistic and “fake news” than modern campaign ads.

    Lincoln does seem like a great president. He was more or less a normal politician who did normal politician things, became president due to a mix of luck and corruption, and found himself running the country in a civil war that had already started, which gave him more power than almost any other president. He then used this power to be about as anti-slavery as he could possibly be - both the Emancipation Proclamation and creating black army units were essentially “executive orders” that he could do without Congressional approval.

    I suspect that Sherman is underrated. If Lincoln hadn’t won the Civil War, he would have been a bad president rather than arguably the best one. And if Lincoln hadn’t been reelected, he also wouldn’t have been able to get his full agenda through. Both of these seem in large part due to Sherman’s campaign being so successful ahead of the 1864 election. But this is only a sideline of the book - if I end up reading more about the Civil War, I’m curious to learn more about this aspect of it.


    Read one of these books! I feel like writing all this stuff about books is worth it if even one reader decides to read a book as a result.