• The Cost Of Efficient Markets

    I love these Zvi blog posts where he analyzes prediction markets and points out which ones he thinks are overrated and underrated. At the same time, it seems like they take a ton of work.

    One way to look at a prediction market is that you have two forces in balance. One crowd thinks a proposition is likely to happen, another crowd thinks the proposition is unlikely to happen, and the price mediates between these two groups. In this view, the role of money is to stabilize the system, and you get useful information as a side effect.

    Another way to look at a prediction market is to ask: across all predictions, who is making a lot of money, and who is losing a lot of money? The ideal prediction market has very intelligent predictions, with people working hard on intelligent analysis. Are these people making money, getting paid for their work? Maybe they don’t have to be, like on Metaculus; maybe they will just play for points.

    For a predictor to make money on prediction markets, someone else has to be losing money. Who is willing to consistently lose money on prediction markets? You might get addicted to prediction markets, mistakenly overrate yourself, and keep doing it when you shouldn’t, like a gambling addiction. But it seems tough to get a large user base addicted to prediction markets when there are so many more optimized types of gambling out there. And friendly regulation for prediction markets is essentially predicated on their not “feeling like gambling”.

    Stock markets avoid this problem. The intelligent analysts do get paid, but they are essentially taking a cut of the huge flows of money that go into stocks without deep analysis. If every index fund investor gives up an extra 0.1% in returns because they haven’t optimized their investments, that funds a whole lot of analysts doing research on optimal price levels.
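
    For a rough sense of the scale involved, here is the back-of-the-envelope arithmetic. Both numbers are made-up round figures, purely for illustration.

    ```python
    # Back-of-the-envelope: how much analysis a small drag on passive
    # investors could fund. Both inputs are illustrative assumptions.
    passive_assets = 10e12   # assume ~$10 trillion sitting in index funds
    drag = 0.001             # assume index investors give up ~0.1% per year

    annual_pool = passive_assets * drag
    print(f"${annual_pool / 1e9:.0f} billion per year available to pay analysts")
    ```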

    It seems hard for prediction markets to ever be worth it for the money. I think they will have to be driven by people who are simply in it for the “love of the game”. In fact, writing this blog post makes me want to sign up for Metaculus, so that I can bet against some of the AI wagers that seem overrated….

  • AlphaFold: A New Kind of Science

    Last night I was watching an interesting video on AlphaFold, which is basically using modern deep learning techniques to solve a longstanding problem in molecular biology, determining how a protein folds based on its sequence.

    One part that is interesting is that they develop an intermediate structure which looks very “vectory” - a matrix of how close each part of the molecule is to each other part. In some sense this is redundant; you are storing a distance for every pair of components instead of a single position per component. But it means that if you screw up one part of the output, you don’t automatically screw up all the other parts. It is more “local” in a sense.
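
    Here is a minimal sketch of that kind of pairwise representation, using made-up coordinates rather than anything from AlphaFold itself (the real model predicts a distribution over distance bins, not raw distances):

    ```python
    import numpy as np

    # Toy pairwise-distance representation. Coordinates are made up;
    # AlphaFold actually predicts a distribution over distance bins.
    coords = np.random.rand(5, 3)          # 5 components, 3D positions

    # N x N matrix: distance between every pair of components.
    diffs = coords[:, None, :] - coords[None, :, :]
    dist_matrix = np.linalg.norm(diffs, axis=-1)

    # A bad prediction for one component only corrupts its own row and
    # column, instead of shifting everything downstream of it.
    print(dist_matrix.shape)               # (5, 5)
    ```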

    The other part that is interesting to me is that it is an approach towards solving the “middle size of problem” that I mentioned in my previous post on the “theory of everything”. How should we interpret this?

    The Lottery Ticket Hypothesis

    One way of understanding modern machine learning is the Lottery Ticket Hypothesis. Roughly, the lottery ticket hypothesis says that for problems deep learning works well on, there exists an algorithm that solves the problem, and we know how to make a large deep network whose shape is similar to that solution, so probably some subnetwork of the randomly initialized deep network happens to already be close to the solution. The process of training a deep network can then be thought of as searching for this subnetwork that happens to contain the answer.

    The lottery ticket hypothesis is not necessarily true. For example, the training process might get closer and closer to a decent answer by assembling the algorithm bit by bit, rather than discovering an algorithm that already is mostly there at initialization time. It’s probably different for different problems. But it’s a useful approximate way of thinking about it. Deep learning is a search through the space of possible formulas, looking for one that solves the problem.
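
    For flavor, here is a sketch of the prune-and-rewind experiment that motivates the hypothesis. The train function and the network shape are hypothetical placeholders, not code from any particular paper.

    ```python
    import numpy as np

    def train(weights, mask, steps=1000):
        """Hypothetical training loop: updates only the unmasked weights."""
        # ...gradient descent on your task, applied to weights * mask...
        return weights

    # 1. Randomly initialize a big network and remember that initialization.
    init_weights = np.random.randn(512, 512) * 0.01
    mask = np.ones_like(init_weights)

    # 2. Train the dense network, then prune the smallest-magnitude weights.
    trained = train(init_weights.copy(), mask)
    threshold = np.quantile(np.abs(trained), 0.8)       # drop 80% of weights
    mask = (np.abs(trained) > threshold).astype(float)

    # 3. Rewind the surviving weights to their original random values and
    #    retrain. The hypothesis: this sparse "winning ticket" trains about
    #    as well as the full network, because the answer was roughly there
    #    at initialization time.
    ticket = train(init_weights * mask, mask)
    ```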

    A New Kind Of Science

    Here I’m talking about the generally-poorly-regarded book that Stephen Wolfram quasi-self-published 20 years ago. One way of interpreting the thesis here is that we could discover the laws of physics by searching through the space of all small computer programs.

    In a sense, this is exactly what AlphaFold is doing! It isn’t using Mathematica, and it isn’t using any of Wolfram’s preferred algorithms like a linear search over cellular automata rules, and it isn’t aiming at sub-quantum-mechanics laws of physics. Okay, that’s a lot of things that are different. But the fundamental idea of discovering laws of science by doing an exhaustive computer search, that part is what AlphaFold is doing.

    The “Real” Laws of Physics

    You might say that AlphaFold isn’t looking for the real laws of physics. It isn’t even pretending to model every electron and use those low-level laws of physics to calculate protein structure. It’s just looking for an approximation that works at the level we want to calculate at.

    But is any science actually looking for the real laws? Or just laws that are close enough for any practical use? Differential calculus is great for physics because it is a great approximation tool. Any function that is “smooth enough” can be approximated locally by a linear map - a matrix - and calculus is how you find that matrix, so you can get pretty good answers. That’s why we like using it to solve problems. We have never observed a true “real number” in nature (because real numbers, rigorously defined, are based on infinite sets).
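
    As a tiny numerical illustration of that “local matrix” idea, here is a finite-difference check on an arbitrary smooth function I made up:

    ```python
    import numpy as np

    def f(x):
        """An arbitrary smooth function from R^2 to R^2."""
        return np.array([np.sin(x[0]) * x[1], x[0] ** 2 + np.cos(x[1])])

    def jacobian(f, x, eps=1e-6):
        """Finite-difference Jacobian: the matrix that locally approximates f."""
        J = np.zeros((2, 2))
        for i in range(2):
            dx = np.zeros(2)
            dx[i] = eps
            J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
        return J

    x = np.array([1.0, 2.0])
    h = np.array([0.01, -0.02])
    linear_guess = f(x) + jacobian(f, x) @ h         # local linear approximation
    print(np.linalg.norm(f(x + h) - linear_guess))   # tiny error for small h
    ```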

    We have spent a long time lamenting that we cannot get quantum mechanics and gravity to line up. Well, that doesn’t really matter. What we should be lamenting is that neither quantum mechanics nor gravity provides a useful approximation for these intermediate-size problems like how to construct a useful machine out of 1000 atoms. Tools like AlphaFold seem like they could get us there.

  • Notes On Some Biographies

    It’s spring break and I’ve been holed up in a New England farmhouse, enjoying some “spring” afternoons, as it rains outside, the kids nap, and I get to curl up and read.

    My father-in-law is a history professor and somehow I feel like the house has an “ambiance of history”. I feel like reading history books while I’m there.

    Recently I have been enjoying biographies. History books often feel like a hundred different things are happening at once, and my mind gets shuffled up: I forget precisely which ruler of which French city was angry at the peasants because archery was so effective, and then it’s hard to be sure I’m taking the correct meaning from a subsequent anecdote rather than the backward one. With a biography, often the most interesting things you learn are facts about the general time and place, and the individual’s life is just a nice structure to hang a row of facts on.

    These three biographies were particularly good and I recommend them.

    Euler

    The book: Leonhard Euler: Mathematical Genius in the Enlightenment. Euler lived from 1707 to 1783, so technically this was before the “industrial revolution”, although his life shows so many ways that science and engineering were making the world more effective that it makes you wonder whether the industrial revolution was a distinct period or a phenomenon that slowly arose over time.

    When Euler was young, mathematicians were ashamed to call themselves “mathematicians”. The word had a connotation of magic, numerology, and astrology. Instead, Euler preferred to be called a “geometer”.

    To understand Euler you have to understand Newton. Newton figured out the basics of calculus and the basics of the modern laws of physics. But he still did a lot of his work using geometry. Euler basically destroyed geometry. He solved so many practical problems and proved so many mathematical theorems using calculus that there was hardly any role for geometry any more. Nowadays we learn geometry because of its historical role in mathematics, not because any practicing engineer is going to use even the most basic parts of geometry, like inscribing a circle in a triangle.

    Mathematics was so low-status at the time. Euler’s father wanted him to do something useful, like become a priest. He won respect not through pure mathematics, but by solving practical engineering problems, like deciding what shape of sail would be most effective, or determining longitude from observations of the moon.

    Some people look around nowadays and wonder, why is progress in pure mathematics slowing down? Obviously Euler had a huge advantage, studying mathematics when the competition was, like, four or five main academies in Europe, each with a few mathematicians. But mathematics wasn’t a primary area of scholarly endeavor at the time. That would have been something like theology or the study of Greek literature.

    Perhaps a field that gets little respect today will be looked back on as the primary scientific achievement of our time. My bet is on open source software. We are living in the era when Python, NumPy, and Jupyter notebooks are first being invented! One difference is that modern software is often developed by teams, as opposed to the research papers of the 1700s, which were mostly the work of individuals.

    Bismarck

    The book: Bismarck: A Life.

    Bismarck is more like Steve Jobs than anyone else I’ve read about. Intensely charismatic, except so many people disliked him. Somehow when people talked to him they were charmed and convinced by his unusual ideas. He ended up running a huge organization, micromanaging everything in a way that infuriated many people and exhausted himself, but at the same time achieving successes that were thought impossible.

    Bismarck is so full of contradictions. What did he even want? He was a conservative, believing that a strong king should make all decisions. And yet he clearly didn’t want his king to actually make decisions. It’s a weird organizational setup in that the king could technically fire Bismarck at any time, and yet once Bismarck as chancellor had turned the King of Prussia into the Emperor of Germany, who was going to fire him?

    A monarchist, he was nevertheless the first in Germany to introduce universal male suffrage. He clearly didn’t like Jewish people personally, but at the same time he pushed through pretty strong religious freedom laws. He somewhat randomly put together the first social-security-type plan. But really his politics were all over the map. The only thing it really seemed he consistently believed in was that his employer should gain more and more power.

    In the end, it’s hard to read about Bismarck without thinking of what would happen a few decades later, in the system he built. The Kaiser who finally fired a 75-year-old Bismarck would later get sucked into World War I and lose the German Empire that Bismarck built for him.

    It does make me respect George Washington and Steve Jobs more. If you make yourself the first great leader of an organization, it can be impossible for anyone else to keep it together. It’s easy to take it for granted when John Adams or Tim Cook keeps things going but that doesn’t always happen.

    Speaking of American presidents…

    Lincoln

    The book: Abe: Abraham Lincoln in His Times.

    Abraham Lincoln is like Jesus. So many people have written books about him that there is far more commentary written after the fact than there is direct evidence of what actually happened at the time. And the vast majority of these writers think that the subject is a great, great human being, so every anecdote, every little bit, is slanted and has dozens of accolades written about it. It’s just impossible to read something calm and neutral the way you can about, say, Euler, about whom nobody has had any real strong opinions for a hundred years.

    So this book ends up asking questions like: was Abraham Lincoln entirely honest throughout his entire life? And it weirdly concludes “yes”, even though the book itself contains many examples of Lincoln doing things like taking on pro bono cases for clients he knew were guilty but who were political allies of his, or making statements that were clearly false. I mean, I would categorize these as “normal politician” things. They aren’t terrible. But it seems like there is a faction which considers Lincoln to be more like an angel than like a good politician - enough of a faction that this book can both seem too rosy-tinted to me and also declare itself to be clearly on the “more negative on Lincoln” end of the Lincoln-biography spectrum.

    One of the striking things about Lincoln is how terrible so many other politicians were. People were just drunk all the time. The governor of Illinois, the vice president, and all sorts of lesser characters make terrible missteps due to being plastered at important events.

    Politics in general seemed even dumber than today. Political cartoons were often important drivers of public opinion, and they were even more simplistic and “fake news” than modern campaign ads.

    Lincoln does seem like a great president. He was more or less a normal politician who did normal politician things, became president due to a mix of luck and corruption, and found himself running the country in a civil war that had already started, which gave him more power than almost any other president. He then used this power to be about as anti-slavery as he could possibly be - both the Emancipation Proclamation and the creation of black army units were essentially “executive orders” that he could issue without Congressional approval.

    I suspect that Sherman is underrated. If Lincoln hadn’t won the Civil War, he would have been a bad president rather than arguably the best one. And if Lincoln hadn’t been reelected, he also wouldn’t have been able to get his full agenda through. Both of these seem in large part due to Sherman’s campaign being so successful ahead of the 1864 election. But this is only a sideline of the book - if I end up reading more about the Civil War, I’m curious to learn more about this aspect of it.

    Conclusion

    Read one of these books! I feel like writing all this stuff about books is worth it if even one reader decides to read a book as a result.

  • Metamagical Themas, 40 Years Later

    It is completely unfair to criticize writing about technology and artificial intelligence 40 years after it’s written. Metamagical Themas is a really interesting book, a really good, intelligent, and prescient book, and I want to encourage you to read it. So don’t take any of this as “criticism” of the author. Of course, after 40 years have passed, we have an unfair advantage, seeing which predictions panned out, and what developments were easy to overlook in an era before the internet.

    Instead of a “book”, perhaps I should call it a really interesting collection of essays written by Douglas Hofstadter for Scientific American in the early 1980s. Each essay is followed by commentary Hofstadter wrote a year or two later, so you get some after-the-fact summation, but the whole thing was written in the same era. This structure makes it really easy to read - you can pick it up, read a bit, and it’s self-contained. Which is yet another reason you should pick this book up and give it a read.

    There are all sorts of topics so I will just discuss them in random order.

    Self-Referential Sentences

    This is just really fun. I feel like I’m cheating by quoting these, because I really just want you, the reader, to enjoy these sentences.

    It is difficult to translate this sentence into French.

    In order to make sense of “this sentence”, you will have to ignore the quotes in “it”.

    Let us make a new convention: that anything enclosed in triple quotes - for example, '''No, I have decided to change my mind; when the triple quotes close, just skip directly to the period and ignore everything up to it''' - is not even to be read (much less paid attention to or obeyed).

    This inert sentence is my body, but my soul is alive, dancing in the sparks of your brain.

    Reading these is like eating sushi. Each one is just one bite, but it’s delicious and a unique flavor, so you want to pay attention to it. You want to pause, cleanse your mind, and have a bit of a palate refresher in between them.

    The reader of this sentence exists only while reading me.

    Thit sentence is not self-referential because “thit” is not a word.

    Fonts

    Hofstadter loves fonts. Fluid Concepts and Creative Analogies goes far deeper into his work on fonts (and oddly enough was the first book purchased on Amazon) but there are some neat shorter explorations in this essay collection.

    To me, reading this is fascinating because of how much deep learning has achieved in recent years. My instincts nowadays are often to be cynical of deep learning progress, thinking thoughts like “well, is this really great progress if it isn’t turning into successful products?” But comparing what we have now to what people were thinking in the past makes it clear how far we have come.

    The fundamental topic under discussion is how to have computers generate different fonts, and to understand the general concept that an “A” in one font looks different from an “A” in another font.

    Hofstadter is really prescient when he writes in these essays that he thinks the task of recognizing letters is a critical one for AI. Letter recognition was one of the first tasks that modern neural networks did well on, one of the results that proved they were the right way to go. And that was several generations of AI research away from when Hofstadter was writing!

    Roughly, in the 80’s there was a lot of “Lisp-type” AI research, where many researchers thought you could decompose intelligence into symbolic logic or something like it, and tried to attack various problems that way. The initial “AI winter” was when that approach stopped getting a lot of funding. Then in the 90’s and 2000’s, statistical approaches like support vector machines or logistic regression seemed to dominate AI. The modern era of deep learning started around 2012 when AlexNet had its breakthrough performance on image recognition. Hofstadter is writing during the early 80’s, in the first era of AI, before either of the two big AI paradigm shifts.

    That said, it’s interesting to see what Hofstadter misses in his predictions. Here he’s criticizing a program by Donald Knuth that took 28 parameters and outputted a font:

    The worst is yet to come! Knuth’s throwaway sentence unspokenly implies that we should be able to interpolate any fraction of the way between any two arbitrary typefaces. For this to be possible, any pair of typefaces would have to share a single, grand, universal all-inclusive, ultimate set of knobs. And since all pairs of typefaces have the same set of knobs, transitivity implies that all typefaces would have to share a single, grand, universal, all-inclusive, ultimate set of knobs…. Now how can you possibly incorporate all of the previously shown typefaces into one universal schema?

    Well, nowadays we know how to do it. There are plenty of neural network architectures that can both classify items into a category and generate items from that category. So you train a neural network on fonts, and to interpolate between two fonts, you take the latent representation the network learns for each font and interpolate between those. Essentially “style transfer”.
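
    Here is a rough sketch of what that interpolation could look like. The encode and decode functions stand in for a hypothetical trained font autoencoder; in this sketch they are just random linear maps so that the code runs.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for a *trained* font autoencoder. Here encode/decode are just
    # random linear maps so the sketch runs; a real model would be learned
    # from many glyph images.
    W = rng.standard_normal((64, 16 * 16))

    def encode(glyph):               # 16x16 glyph bitmap -> 64-dim latent vector
        return W @ glyph.ravel()

    def decode(latent):              # latent vector -> 16x16 "bitmap"
        return (W.T @ latent).reshape(16, 16)

    def interpolate_fonts(glyph_a, glyph_b, steps=5):
        """Walk the latent space between two fonts' renderings of one letter."""
        z_a, z_b = encode(glyph_a), encode(glyph_b)
        return [decode((1 - t) * z_a + t * z_b) for t in np.linspace(0, 1, steps)]

    frames = interpolate_fonts(rng.random((16, 16)), rng.random((16, 16)))
    ```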

    Of course, it would be impossible to do this with a human understanding what each knob meant. 28 knobs isn’t anywhere near enough. But that’s fine. If we have enough training data, we can fit millions or billions of parameters.

    It’s really hard to foresee what can change qualitatively when your quantitative ability goes from 30 to a billion.

    By the way, if you like Hofstadter’s discussions of fonts and you live in the San Francisco area, you would like the Tauba Auerbach exhibit at SFMOMA.

    Chaos Theory

    Hofstadter writes about chaos theory and fractals, and it’s interesting to me how chaos theory has largely faded out over the subsequent decades.

    The idea behind chaos theory was that many practical problems, like modeling turbulence or predicting the weather, don’t obey the same mathematics that linear systems do. So we should learn the mathematics behind chaotic systems, and then apply that mathematics to these physical cases.

    For example, strange attractors. They certainly look really cool.

    Chaos theory seemed popular through the 90s - I got a popular science book on it, and it was mentioned in Jurassic Park - but it doesn’t seem like it has led to many useful discoveries. I feel like the problem with chaos is that it fundamentally does not have laws that are useful for predicting the future.

    Meanwhile, we are actually far better nowadays at predicting the weather and modeling turbulent airflow! The solution was not really chaos theory, though. As far as I can tell the solution to these thorny modeling problems was to get a lot more data. Weather seems really chaotic when your data set is “what was the high temperature in Chicago each day last year”. If you have the temperature of every square mile measured every 15 minutes, a piecewise linear model is a much better fit.

    I think numerical linear algebra ended up being more useful. Yeah, when you predict the outcome of a system, you often get the “butterfly effect”, where a small error in your input can lead to a large error in your output. But you can measure these errors, and reduce them. Take the norm of the Jacobian of your model at a point, and try to find a model where that’s small. Use numerical techniques that don’t blow up computational error. And get it all running efficiently on fast computers using as much data as possible.
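
    As a toy illustration of that error-growth measurement, here is the logistic map standing in for a chaotic model; multiplying the derivative magnitudes along a trajectory tells you how much a tiny input error gets amplified.

    ```python
    # The logistic map as a stand-in chaotic system: x -> r * x * (1 - x).
    r = 3.9

    def step(x):
        return r * x * (1 - x)

    def step_derivative(x):
        return r * (1 - 2 * x)       # the 1x1 "Jacobian" of the map

    x, amplification = 0.2, 1.0
    for _ in range(50):
        amplification *= abs(step_derivative(x))   # growth of a tiny input error
        x = step(x)

    print(amplification)   # huge: a small measurement error blows up fast
    ```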

    There’s a similar thing happening here, where the qualitative nature of a field changed once the tools changed quantitatively by several orders of magnitude.

    AI in 1983

    These essays make me wonder. What should AI researchers have done in 1983? What would you do with a time machine?

    It’s hard to say they should have researched neural networks. With 6 MHz computers it seems like it would have been impossible to get anything to work.

    AI researchers did have some really great ideas back then. In particular, Lisp. Hofstadter writes briefly about why Lisp is so great, and the funny thing is, he hardly mentions macros at all, which nowadays I think of as the essence of Lisp. He talks about things like having an interpreter, being able to create your own data structures, having access to a “list” data structure built into the language, all things that nowadays we take for granted. But this was written before any of Java, Python, Perl, C#, JavaScript, really the bulk of the modern popular languages. There was a lot of great programming language design still to be done.

    But for AI, it just wasn’t the right time. I wonder if we will look back on the modern era in a similar way. It might be that modern deep learning takes us a certain distance and “hits a wall”. As long as GPUs keep getting better, I think we’ll keep making progress, but that’s just a guess. We’ll see!

  • A Theory Of Everything

    Physicists have a concept of a hypothetical “theory of everything”. There’s a Wikipedia page on it. Basically, general relativity describes how gravity works, and it is relevant for very large objects, the size of planets or stars. Quantum mechanics describes how nuclear and electromagnetic forces work, and it is relevant for smaller objects, the size of human beings or protons. And these two models don’t agree with each other. We know there is some error in our model of physics because we don’t have any way of smoothly transitioning from quantum mechanics to gravity. A “theory of everything” would be a theory that combines these two, one set of formulas that lets you derive quantum mechanics in the limit as size goes down, and lets you derive general relativity in the limit as size goes up.

    It would be cool to have such a theory. But personally, I feel like this is a really narrow interpretation of “everything”.

    Consider quantum mechanics. You can use our laws of quantum mechanics to get a pretty precise description of a single helium atom suspended in an empty void. In theory, the same laws apply to any system of a similar size. But when you try to analyze a slightly more complicated system - say, five carbon atoms - the formulas quickly become intractable to solve, either exactly or approximately.
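
    A back-of-the-envelope way to see the blow-up, where the number of basis states per particle is an arbitrary assumption chosen only to show the scaling:

    ```python
    # Why exact quantum calculations blow up with system size. The number of
    # basis states per particle is an arbitrary assumption, chosen only to
    # show the exponential scaling.
    basis_states_per_particle = 100

    for n_particles in (1, 2, 5, 30):
        dim = basis_states_per_particle ** n_particles
        print(f"{n_particles} particles -> state space of dimension ~1e{len(str(dim)) - 1}")
    # One particle is easy; thirty (roughly the electrons in five carbon
    # atoms) needs ~1e60 amplitudes, far beyond any computer.
    ```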

    The point of laws of physics is to be able to model real-world systems. Different forces are relevant to different systems, so it makes sense to think of our physical theories along the dimension of “what forces do they have a formula for”. And in that dimension there is basically one hole, the gap between quantum mechanics and general relativity.

    But there is a different dimension, of “how many objects are in the system”. In this dimension, we have an even larger gap. We have good laws of physics that let us analyze a small number of objects. Quantum mechanics lets us analyze a small number of basic particles, and classical mechanics lets us analyze a small number of rigid bodies. We also have pretty good laws of physics that let us analyze a very large number of identical objects. Fluid mechanics lets us analyze gases and liquids, and we can also analyze things like radio waves which are in some sense a large number of similar photons.

    But in the middle, there are a lot of systems that we don’t have great mathematical laws for - things between the size of a strand of DNA and a human finger. Maybe for these you essentially have to run large-scale computer simulations or other sorts of numerical methods, rather than finding a simple mathematical formula. But that’s okay; we have powerful computer systems, and we can happily use them. Perhaps, rather than expressing the most important laws of physics as brief mathematical equations, we could express them as complicated but well-tested simulation software.

    To me, a real “theory of everything” would be the code to a computer program where you could give it whatever data about the physical world you had. A video from an iPhone, a satellite photo, an MRI, the readings from a thermometer. The program creates a model, and answers any question you have about the physical system.

    Of course we aren’t anywhere near achieving that. But that seems appropriate for a “theory of everything”. “Everything” is just a lot of things.


...