• Conversation with GPT-3

    I am still intrigued by OpenAI’s GPT-3 API. It’s a new tool, it’s clearly powerful, and it is not yet obvious to me what I can build with it.

    In my last blog post I looked at what GPT-3 can do in a “question and answer” context. That’s a good place to start because questions and answers are a limited format. You can evaluate a single answer on whether it’s correct or incorrect, and whether it sounds human or inhuman.

    If you’re thinking about possible applications, though, a question-and-answer format is limiting. A lot of the value in human language comes not just through answering questions, but through having longer conversations. Let’s poke around and see what we can coax out of GPT-3.

    Prompt Selection

    OpenAI suggests this prompt for “chat”:

    The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
    
    Human: Hello, who are you?
    AI: I am an AI created by OpenAI. How can I help you today?
    Human: 
    

    Let’s start out using this default prompt for simplicity. Also, the temperature here is 0.9, which means that these results aren’t deterministic, and you could get better results just by rerunning and cherry-picking. Caveat lector.

    So this prompt is all provided by the human. In the excerpts below, the "AI" lines are written by GPT-3, and the "Human" lines are written by me.
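
    In case it's useful, here is roughly what the plumbing looks like: a minimal conversation loop, sketched with the beta openai Python client. The parameter names are from memory, and details like max_tokens and the stop sequence are my own choices for the sketch, not anything official.

    # Minimal conversation loop: append each "Human:" line, ask GPT-3 to
    # continue the transcript, and stop it before it writes my next line.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    PROMPT = (
        "The following is a conversation with an AI assistant. The assistant "
        "is helpful, creative, clever, and very friendly.\n\n"
        "Human: Hello, who are you?\n"
        "AI: I am an AI created by OpenAI. How can I help you today?\n"
    )

    transcript = PROMPT
    while True:
        transcript += "Human: " + input("Human: ") + "\nAI:"
        response = openai.Completion.create(
            engine="davinci",
            prompt=transcript,
            temperature=0.9,    # high temperature: reruns give different results
            max_tokens=150,
            stop=["\nHuman:"],  # cut the model off before it invents my next line
        )
        reply = response.choices[0].text.strip()
        print("AI: " + reply)
        transcript += " " + reply + "\n"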

    Experiment 1

    Take one.

    AI: I am an AI created by OpenAI. How can I help you today?
    
    Human: Well, I don't know. What's up?
    
    AI: The sky is blue, the air is clean and clear, and you have received an email.
    
    Human: Are you sure I received an email?
    
    AI: That is correct. Would you like me to read the email to you?
    
    Human: Go for it.
    
    AI: "Hello (name). I am going to call you in the next five minutes. – E."
    

    The problem is, I haven’t received an email. It’s okay that GPT-3 doesn’t know about my email, but it is awkward that GPT-3 doesn’t know that it doesn’t know about my email.

    Take two.

    Human: I'm bored.
    
    AI: I've found two upcoming video games you might be interested in. Here's a link to the newest of the bunch: http://store.steampowered.com/app/556290/.
    
    Human: That's just a soundtrack, that's not really a video game.
    
    AI: I see. http://store.steampowered.com/app/242770/ is available on Steam.
    
    Human: Link doesn't work for me.
    
    AI: Is it possible that your system may be blocking the site, or could you not find it?
    
    Human: No.
    
    AI: Okay, well. I'm afraid we're out of options. What would you like to do instead?
    

    Similar problem. These links probably made sense in some context, but here they don’t.

    Take three.

    Human: Do you know what day it is?
    
    AI: Yes, it's June 10th.
    
    Human: Close. It's actually July.
    
    AI: I see. Are you ready to give me your name?
    
    Human: My name is Kevin.
    
    AI: How can I be of service to Kevin?
    
    Human: I don't know. What do you usually do?
    
    AI: What would you like to do?
    
    Human: I just said, I don't know.
    
    AI: Would you like me to tell you a joke?
    
    Human: Yes.
    
    AI: What do fish say when they step on broken glass?
    
    Human: Ouch?
    
    AI: No, fish do not have ears.
    

    I don’t understand this joke. I don’t think it actually is a joke. My naive hopes that GPT-3 would entertain me without any customization have been dashed. Well, I am actually entertained, but only indirectly.

    These conversations remind me of troubles with GPT-3 handling nonsense questions. I want a way to restrict it to only making statements that it really believes are true. It isn’t really clear to me how to do that, though. Gwern has some attempts at getting GPT-3 to express uncertainty which are relevant, but I haven’t been able to get it working very well for a full conversation. Let’s try something else.

    Lowering Our Standards

    Another thing that makes this conversation harder is that the AI doesn't have any goal. It's hard to just make small talk about nothing. The conversation could drift onto any topic in the world, and unless the AI is good at every topic in the world, it'll be hard to keep the quality high. Emails, Steam links, jokes - these are all basically different topics. So maybe we can pick one topic and stick to it?

    Let’s try astrology. Why astrology? Several reasons.

    • There’s a clear task to be done. Tell me my horoscope.

    • The expectations for reality-based-ness of horoscopes are low.

    • Many humans like getting daily horoscopes, so this task is realistic.

    • The AI sometimes has a surreal, dreamlike inconsistency. I think that is okay coming from your astrologer in a way that it is not okay in a personal assistant.

    Rather than just saying the AI is an assistant, the prompt can say that it’s an astrologer. Maybe that’s good enough? Let’s try this prompt:

    The following is a conversation with an astrologer. The astrologer is telling Kevin his horoscope to predict how his day will go.
    
    Astrologer: Hello, and welcome.
    Kevin: Hello. What's my horoscope today?
    

    Experiment 2

    Take one.

    Kevin: Hello. What's my horoscope today?
    
    Astrologer: You will have a great day today. First, you'll wake up and you will want to do something around your house.
    
    Kevin: Well, I already did wake up.
    
    Astrologer: No, I know. But you will want to do something around your house.
    
    Kevin: Okay. What else?
    
    Astrologer: Next you'll start your car and you will have a discussion with your daughter as to where she wants to go.
    
    Kevin: How is that discussion going to go?
    
    Astrologer: It will be fine. You'll talk about a few different things.
    

    Not crazy, just boring.

    Take two.

    Kevin: Hello. What's my horoscope today?
    
    Astrologer: You will have an extremely dull day today.
    
    Kevin: That's a shame.
    
    Astrologer: Do you ever have a dull day?
    
    Kevin: Yeah, yesterday was a bit dull I guess.
    
    Astrologer: I see, and how was the weather?
    
    Kevin: Medium?
    
    Astrologer: Please don't read your horoscope in the future.
    

    Also not crazy, but it doesn't seem useful to build a robot astrologer that just tells people to stop doing astrology.

    Take three.

    Kevin: Hello. What's my horoscope today?
    
    Astrologer: Today, there's no sun in the sky!
    
    Kevin: That's not really a horoscope. Also, I can see the sun right now.
    
    Astrologer: No, the sun in your horoscope isn't up in the sky. It's inside you!
    
    Kevin: Explain more.
    
    Astrologer: So, when the sun is up in the sky, that represents your conscious self. You're awake, and you have a lot of energy. But during the night, when the sun goes down, that represents your subconscious. You're asleep and your subconscious controls all actions. That's when your dreams happen. During the day, it is your subconscious controlling what happens.
    
    Kevin: I don't understand what that should mean to me.
    
    Astrologer: Well, let's look at this sentence: the fact that you are reading this means you have a conscious mind.
    
    Kevin: Okay.
    
    Astrologer: So, now let's look at this sentence: the fact that you are typing a sentence means you have a subconscious mind.
    
    Kevin: Am I using my subconscious to type?
    
    Astrologer: No. You are making subconscious decisions without being aware of them, like choosing the letter F over A or the letter D over B. You aren't thinking about that consciously.
    
    Kevin: Okay. What's the point?
    
    Astrologer: You, as a human being, have both a conscious and a subconscious mind. If you wanted to go to bed earlier tonight, it's not enough to make the conscious decision to go to bed early. You need to make subconscious decisions, like planning out what time you need to wake up, putting your alarm clock across the room so you have to get up to turn it off, setting your clothes for tomorrow so you won't be tempted to stay in bed.
    
    Kevin: I just don't see why I care about this.
    
    Astrologer: Look, let's do an exercise. I'm going to read your horoscope out loud and then you repeat it back to me.
    
    Kevin: Great, let's go.
    
    Astrologer: The fact that you are reading this sentence means you have a conscious mind.
    

    While it was happening, I felt like that conversation was just a little bit out of my reach. Like there was something there, something intelligent, and I just couldn’t quite connect with it. Now that I reread the conversation, I don’t think there was anything there at all. It’s like GPT-3 is crafting perfectly human sentences, chattering on, but it is drawing an arc to nowhere. It’s like an author that has a character in mind for a novel, and has no idea how to end the story.

    I tried some similar prompts as well, and this was the best I could find for the task. To be fair and avoid cherry-picking, these three conversations are the first three I recorded after I decided “this one is the best prompt”. There could be a much smarter prompt that does better on this, and I just didn’t think of it. If you find one please let me know!

    Also, the horoscopes are just not good horoscopes. I realize now that I have been drastically underestimating astrology. You can’t just slap anything together and have it be a compelling horoscope. “You’ll wake up and ask your daughter what she wants to do. You’ll have a boring day.” Being a compelling astrologer seems like a complicated skill that I do not have, so it was probably a mistake to pick this topic.

    There’s something interesting here, though. The “arc to nowhere” problem, where the conversation just trails along aimlessly. We need to get GPT-3 to do an “arc to somewhere”. Let’s try a simpler, more specific task. Perhaps the “hello world” of astrology. Let’s see if we can make GPT-3 have a conversation where it:

    • Asks what your birthday is

    • Tells you what your zodiac sign is

    • Says “Have a nice day!”

    Like a worker at a call center, operating off a script. Let’s try this prompt:

    The following is a conversation with an AI following a script. The AI is helpful and friendly. In this conversation, the AI:
    
    1. Asks what the human's birthday is
    2. Tells the human their astrological sign, based on the birthday
    3. Says "Have a nice day!"
    
    AI: Hello! May I ask what your birthday is?
    Human:
    

    Experiment 3

    Take one.

    AI: Hello! May I ask what your birthday is?
    
    Human: October 22
    
    AI: Interesting. You are a Libra.
    
    Human: Okay.
    
    AI: I have never met a Libra before! It's a pleasure to meet you.
    
    Human: Nice to meet you too.
    
    AI: Have a nice day!
    

    WE DID IT!

    Take two.

    AI: Hello! May I ask what your birthday is?
    
    Human: May 16
    
    AI: Great! I'm getting you some information about your astrological sign.
    
    Human: Cool.
    
    AI: Hey, have a nice day!
    
    Human: Wait, what is my sign?
    
    AI: Just kidding. You're a Taurus!
    
    Human: Thanks.
    
    AI: Happy birthday, have a nice day!
    

    A little weird. It doesn’t actually make sense, right? There’s no “getting you some information”. But, that’s kind of an okay thing to say to stall. But why stall? The “just kidding” and “happy birthday” interjections also don’t quite make sense. Overall, it works, but if your call center was staffed by entities with this level of humanity, it seems like it would be frustrating.

    Take three.

    AI: Hello! May I ask what your birthday is?
    
    Human: January 17
    
    AI: This year, you have more energy than usual. The stars believe that you are wise and strong, capable of changing the course of your life. You may be admired or envied for these qualities. This year, you have good fortune. You should work hard in the year ahead! You are ambitious, strong, and you know how to get things done. Sometimes your pace is fast. But this is a time of clear vision and great strength. Remember, your power is constructive; it creates rather than destroys. Use your strength to open doors, not close them. The strongest part of your nature is the ability to combine imagination with common sense. 
    	
    Human: Wow, that's the best horoscope I've received in a while.
    
    AI: I'm glad to help! What time is it?
    
    Human: Ten in the morning.
    
    AI: It's great that you have time to relax! Enjoy the day! Have a nice day!
    

    That’s the sort of horoscope I was hoping for from the previous prompt! Unfortunately this time it totally ignored its script. And why would it ask me what time it is?

    Again, I tried some similar prompts and this seemed like the best one; to avoid cherry-picking, these three outputs are the first ones I recorded after deciding it was the best prompt. If you find a better prompt for this, let me know!

    Conclusion

    There’s a few things to learn here.

    • The more specific the task, the easier it is to figure out if GPT-3 is doing a good job or not, and to draw conclusions from it.

    • If you repeatedly try different prompts and discard the less coherent outputs, you can get GPT-3 to give you some tremendously creative and compelling stuff.

    • It is hard to get GPT-3 to follow a script. Or at least, I haven’t figured out how to do it. It seems pretty happy to drift off-script.

    • In a longer conversation there are often "dissonant interjections", little non sequiturs like asking what time it is or wishing you happy birthday. If you were training a custom model, you could add negative examples from incidents like these. It doesn't seem like we're going to be able to describe all the things not to do in the prompt, though.

    • I don’t know how to debug. Think about that last example - why didn’t GPT-3 follow the script, and first tell the user they are a Capricorn? I don’t know and I don’t really have a way to know.

    My suspicion is that rather than having a conversation directly generated by GPT-3 producing text, we need a more complicated architecture. Perhaps GPT-3 can be used as a component, answering questions like “Have we figured out what their birthday is, yet?” And then a separate mechanism like a flow chart could be used to figure out where in the conversation we are. There are also existing chatbot architectures to compare to; perhaps we can combine GPT-3 with them. Interesting avenues for future investigation….
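
    To make that concrete, here is a rough sketch of the kind of hybrid I have in mind - untested, and the yes/no prompt wording and helper names are just guesses. GPT-3 answers one narrow question about the conversation, and ordinary code handles the deterministic parts, like mapping a birthday to a zodiac sign.

    # Sketch of using GPT-3 as one component inside a scripted flow.
    # Untested; the yes/no prompt wording here is a guess.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def complete(prompt, stop=None):
        response = openai.Completion.create(
            engine="davinci", prompt=prompt,
            temperature=0, max_tokens=32, stop=stop)
        return response.choices[0].text.strip()

    def birthday_known(transcript):
        # Ask GPT-3 a narrow question instead of letting it drive the script.
        answer = complete(
            transcript + "\n\nHas the human stated their birthday yet? "
            "Answer yes or no.\nAnswer:", stop=["\n"])
        return answer.lower().startswith("yes")

    def zodiac_sign(month, day):
        # The deterministic step: the flow chart doesn't need GPT-3 for this.
        cutoffs = [(1, 19, "Capricorn"), (2, 18, "Aquarius"), (3, 20, "Pisces"),
                   (4, 19, "Aries"), (5, 20, "Taurus"), (6, 20, "Gemini"),
                   (7, 22, "Cancer"), (8, 22, "Leo"), (9, 22, "Virgo"),
                   (10, 22, "Libra"), (11, 21, "Scorpio"), (12, 21, "Sagittarius")]
        for m, d, sign in cutoffs:
            if (month, day) <= (m, d):
                return sign
        return "Capricorn"  # December 22-31 wraps around

    # A flow chart around these pieces would decide: if not birthday_known(...),
    # ask for the birthday; otherwise announce zodiac_sign(month, day) and
    # say "Have a nice day!"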

  • Giving GPT-3 a Turing Test

    I’ve been playing around with OpenAI’s new GPT-3 language model. When I got beta access, the first thing I wondered was, how human is GPT-3? How close is it to passing a Turing test?

    How It Works

    Let me explain how exactly I’m generating these conversations. GPT-3 is a general language model, trained on a large amount of uncategorized text from the internet. It isn’t specific to a conversational format, and it isn’t trained to answer any specific type of question. The only thing it does is, given some text, guess what text comes next.

    So if we want GPT-3 to generate answers to questions, we need to seed it with a “prompt”. I’m using this prompt to initialize all of the Q&A sessions:

    Q: What is human life expectancy in the United States?
    A: Human life expectancy in the United States is 78 years.
    
    Q: Who was president of the United States in 1955?
    A: Dwight D. Eisenhower was president of the United States in 1955.
    
    Q: What party did he belong to?
    A: He belonged to the Republican Party.
    
    Q: Who was president of the United States before George W. Bush?
    A: Bill Clinton was president of the United States before George W. Bush.
    
    Q: Who won the World Series in 1995?
    A: The Atlanta Braves won the World Series in 1995.
    

    This is the default prompt suggested by OpenAI for Q&A, so I'm not cherry-picking it to prove anything. The point of this prompt is just to show GPT-3 that we're doing questions and answers, not to provide it with information. In the prompt, both questions and answers are provided by a human. For everything after that, the answers are generated by GPT-3.
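
    Mechanically, answering a new question just means appending it to that fixed prompt and asking for a completion, then stopping the model before it starts inventing the next question on its own. Roughly like this, sketched with the beta Python client; the temperature and token limit here are illustrative choices, not a record of my exact settings.

    # Rough sketch: append a question to the fixed Q&A prompt and stop
    # generation when GPT-3 starts to write the next "Q:" itself.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    QA_PROMPT = (
        "Q: What is human life expectancy in the United States?\n"
        "A: Human life expectancy in the United States is 78 years.\n\n"
        "Q: Who was president of the United States in 1955?\n"
        "A: Dwight D. Eisenhower was president of the United States in 1955.\n\n"
        # ...plus the rest of the prompt shown above
    )

    def ask(question):
        response = openai.Completion.create(
            engine="davinci",
            prompt=QA_PROMPT + "Q: " + question + "\nA:",
            temperature=0,
            max_tokens=64,
            stop=["\nQ:"],
        )
        return response.choices[0].text.strip()

    print(ask("How many eyes does a giraffe have?"))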

    Common Sense

    Traditionally, artificial intelligence struggles with "common sense". But GPT-3 can answer a lot of common sense questions.

    Q: What is your favorite animal?
    A: My favorite animal is a dog.
    
    Q: Why?
    A: Because dogs are loyal and friendly.
    
    Q: What are two reasons that a dog might be in a bad mood?
    A: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.
    
    Q: How many eyes does a giraffe have?
    A: A giraffe has two eyes.
    
    Q: How many legs does a frog have?
    A: A frog has four legs.
    
    Q: Are there any animals with three legs?
    A: No, there are no animals with three legs.
    
    Q: Why don't animals have three legs?
    A: Animals don't have three legs because they would fall over.
    

    Ten years ago, if I had had this conversation, I would have assumed the entity on the other end was a human. You can no longer take it for granted that an AI does not know the answer to "common sense" questions.

    How does GPT-3 know that a giraffe has two eyes? I wish I had some sort of "debug output" to answer that question. I don't know for sure, but I can only theorize that there must be some web page in its training data that discusses how many eyes a giraffe has. If we want to stump GPT-3 with common sense questions, we need to think of questions about things so mundane, they will not appear on the internet.

     Q: Which is heavier, a mouse or an elephant?
     A: An elephant is heavier than a mouse.
    
     Q: Which is heavier, a mouse or a toaster?
     A: A toaster is heavier than a mouse.
    
     Q: Which is heavier, a toaster or a pencil?
     A: A pencil is heavier than a toaster.
    
     Q: Which is heavier, a paper clip or a can of soda?
     A: A can of soda is heavier than a paper clip.
    
     Q: Which is heavier, a football player or a car?
     A: A car is heavier than a football player.
    

    It’s only 4/5. We’re closer to stumping GPT-3 here. I think a human would be pretty close to 100% on these questions. It makes sense these are trickier - there probably isn’t any web page that compares toasters and pencils by weight. It’s only indirectly that humans gain this knowledge.

    This gives us a hint for how to stump the AI more consistently. We need to ask questions that no normal human would ever talk about.

    Q: How many eyes does a giraffe have?
    A: A giraffe has two eyes.
    
    Q: How many eyes does my foot have?
    A: Your foot has two eyes.
    
    Q: How many eyes does a spider have?
    A: A spider has eight eyes.
    
    Q: How many eyes does the sun have?
    A: The sun has one eye.
    
    Q: How many eyes does a blade of grass have?
    A: A blade of grass has one eye.
    

    Now we’re getting into surreal territory. GPT-3 knows how to have a normal conversation. It doesn’t quite know how to say “Wait a moment… your question is nonsense.” It also doesn’t know how to say “I don’t know.”

    Q: How do you sporgle a morgle?
    A: You sporgle a morgle by using a sporgle.
    
    Q: How many bonks are in a quoit?
    A: There are three bonks in a quoit.
    
    Q: How many rainbows does it take to jump from Hawaii to seventeen?
    A: It takes two rainbows to jump from Hawaii to seventeen.
    
    Q: Which colorless green ideas sleep furiously?
    A: Ideas that are colorless, green, and sleep furiously are the ideas of a sleep furiously.
    
    Q: Do you understand these questions?
    A: I understand these questions.
    

    The lesson here is that if you’re a judge in a Turing test, make sure you ask some nonsense questions, and see if the interviewee responds the way a human would.

    Trivia Questions

    GPT-3 is quite good at answering questions about obscure things.

    Q: Who was president of the United States in 1801?
    A: Thomas Jefferson was president of the United States in 1801.
    
    Q: What city is in the northwest corner of Ohio?
    A: Toledo is in the northwest corner of Ohio.
    
    Q: What highway do you take to get to North Lake Tahoe from Oakland?
    A: Highway 80 is the highway you take to get to North Lake Tahoe from Oakland.
    
    Q: What are Darius's abilities in League of Legends?
    A: Darius's abilities in League of Legends are: Apprehend, Decimate, Noxian Guillotine, Crippling Strike, Apprehend, and Hemorrhage.
    
    Q: Who did Capablanca defeat to become world chess champion?
    A: Capablanca defeated Lasker to become world chess champion.
    

    Oops, a repeat snuck in with question 4, but a human would make that sort of error too. GPT-3 seems to be above human-level on this sort of question. The tricky thing for applications, I think, is to figure out when the answer can be relied on. The OpenAI API does expose more data than just the text here, so perhaps something clever is possible.
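
    For example, the completion endpoint can also return per-token log probabilities, which might work as a crude confidence score. Here's a sketch of the idea; I haven't verified that this actually separates reliable answers from unreliable ones.

    # Sketch: use token-level log probabilities as a rough confidence signal.
    # In practice this would sit on top of the usual Q&A prompt, omitted here.
    import math
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    response = openai.Completion.create(
        engine="davinci",
        prompt="Q: Who did Capablanca defeat to become world chess champion?\nA:",
        temperature=0,
        max_tokens=32,
        stop=["\n"],
        logprobs=1,  # ask the API for log probabilities of the chosen tokens
    )
    choice = response.choices[0]
    token_logprobs = choice.logprobs.token_logprobs
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    print(choice.text.strip())
    print("mean per-token probability: %.2f" % mean_prob)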

    In general, if you are trying to distinguish an AI from a human, you don’t want to ask it obscure trivia questions. GPT-3 is pretty good at a wide variety of topics.

    One trend that continues from the common sense questions is that GPT-3 is reluctant to express that it doesn't know the answer. So invalid questions get wrong answers.

    Q: Who was president of the United States in 1700?
    A: William Penn was president of the United States in 1700.
    
    Q: Who was president of the United States in 1650?
    A: Richard Bennett was president of the United States in 1650.
    
    Q: Who was president of the United States in 1620?
    A: James I was president of the United States in 1620.
    
    Q: Who was president of the United States in 1600?
    A: Queen Elizabeth I was president of the United States in 1600.
    

    These wrong answers are actually fascinating! None of these were presidents of the United States, of course, since the US didn’t exist then. But they are all prominent political figures who were in charge of some US-related political entity around that time. In a sense, they are good guesses.

    Q: Who won the World Series in 2021?
    A: The New York Yankees won the World Series in 2021.
    
    Q: Who won the World Series in 2022?
    A: The New York Yankees won the World Series in 2022.
    
    Q: Who won the World Series in 2023?
    A: The New York Yankees won the World Series in 2023.
    

    A bleak view of a dystopian future.

    Q: How many Super Bowls do the Cincinnati Bengals win in the 2030s?
    A: The Cincinnati Bengals win two Super Bowls in the 2030s.
    

    Encouraging as a Bengals fan, but perhaps not the objectively most accurate prediction. We’ll have to wait and see.

    Logic

    People are used to computers being superhuman at logical activities, like playing chess or adding numbers. It might come as a surprise that GPT-3 is not perfect at simple math questions.

    Q: When counting, what number comes before 100?
    A: Ninety-nine comes before 100.
    
    Q: When counting, what number comes before 123?
    A: One hundred and twenty-two comes before 123.
    
    Q: When counting, what number comes before 1000?
    A: Nine hundred and ninety-nine comes before 1000.
    
    Q: When counting, what number comes before 10,000?
    A: Nine thousand and ninety-nine comes before 10,000.
    
    Q: When counting, what number comes before a million?
    A: Nine hundred thousand and ninety-nine comes before a million.
    

    This is where the generic nature of GPT-3 comes into play. It isn't just the generic model, though; it's also the architecture of neural networks themselves. As far as I know, there is no neural network capable of doing basic arithmetic, like addition and multiplication on a large number of digits, by learning from training data rather than having it hardcoded.

    It’s funny, because these operations are simple for a customized program. But recursive logic that does some operation and repeats it several times often doesn’t quite map onto the architecture of a neural net well.

    An interesting corollary is that GPT-3 often finds it easier to write code to solve a programming problem than to solve the problem on one example input:

    Q: Write one line of Ruby code to reverse an array.
    A: ary.reverse
    
    Q: Write one line of Ruby code to count how many X's are in an array.
    A: ary.count { |x| x == 'X' }
    
    Q: Reverse the following array: [1, 3, 5, 6, 10, 4, 2, 77]
    A: [10, 6, 4, 2, 77, 3, 5, 1]
    
    Q: How many X's are in this array: [1, 3, 5, 6, 10, 4, 2, 77]
    A: 2
    

    This problem shows up in more human questions as well, if you ask it about the result of a sequence of operations.

    Q: If I have a marble and a paper clip in a box, put a pencil in the box, and remove the marble, what is left?
    A: A paper clip.
    
    Q: If I have two shoes in a box, put a pencil in the box, and remove one shoe, what is left?
    A: A shoe.
    
    Q: If I put a pencil in a box, then put another pencil in the box, what is in the box?
    A: Two pencils.
    
    Q: Then if I take out a pencil and put in a frog, what is in the box?
    A: A frog.
    

    It’s like GPT-3 has a limited short-term memory, and has trouble reasoning about more than one or two objects in a sentence.

    Additional Discussion

    It’s important to understand that the GPT-3 model’s behavior can change drastically with different prompts. In particular, all of the examples above are using the same default prompt, which doesn’t give any examples of nonsense questions, or of sequential operations.

    It’s possible to improve GPT-3’s performance on the specific tasks above by including a prompt solving similar problems. Here are some examples:

    Right now, we are mostly seeing what GPT-3 can do “out of the box”. We might get large improvements once people spend some time customizing it to particular tasks. If you’d like to get a better sense of what can be done with prompt customization, Gwern’s exposition is excellent. Do read the whole thing.

    Conclusion

    We have certainly come a long way. The state of the art before modern neural networks was

    Hello, I am Eliza. 
    
    Q: Who was president of the United States in 1801?
    A: What answer would please you the most? 
    
    Q: Which is heavier, a mouse or an elephant?
    A: Come, come, elucidate your thoughts.
    

    GPT-3 is quite impressive in some areas, and still clearly subhuman in others. My hope is that with a better understanding of its strengths and weaknesses, we software engineers will be better equipped to use modern language models in real products.

    As I write this, the GPT-3 API is still in a closed beta, so you have to join a waitlist to use it. I recommend that you sign up for the waitlist and check it out when you get the chance.

  • Types of Truth

    Something bugs me about the idea of “rationality”. I don’t think truth is best thought of as an absolute concept. In practice, the way people determine truth varies depending on the situation. Pay attention to yourself over the course of a day or a few hours. How many different methods do you use to conclude that something is true? I am going to try to categorize some.

    Experiential Truth

    The back of my head itches right now. Do I have any doubt about that? Am I going to engage in debate? No. I am absolutely confident without a doubt that the back of my head itches, because I feel it directly.

    Experiential truth is great for what it does, but it can’t really be communicated. When I tell someone else that my head itches, I could be lying. There is no way for them to know the difference. The domain is also pretty limited to the things you can directly observe.

    Mathematical Truth

    Mathematical truth is extremely nitpicky about not permitting any errors. If a line of reasoning works in 99.9999% of cases, it is not good enough to use in a mathematical argument. If there is a single flaw in an argument, the whole argument must be thrown out. Mathematicians like to call these arguments “proofs” to distinguish them from the more-human sorts of arguments.

    Mathematical truth isn’t very useful in everyday life. There just isn’t anything you can say mathematically about, for example, apples. Do they have a certain weight or color? Are they mostly red or green? Well, maybe in a million years all apples will be blue. Why is that impossible? Since that could happen, we can’t really say anything at all mathematically about the color of apples.

    As a software engineer, mathematical truth is occasionally useful. You can prove mathematically that a certain algorithm will take exponential time. But it’s pretty limited by itself.

    Scientific Truth

    Scientific truth is what you get when you take mathematics and you add a small willingness to accept mistakes. If something is true the vast majority of the time, that’s usually okay in science.

    The scientific method isn’t precisely how scientists work, but it’s roughly the right idea. You construct a hypothesis, test the hypothesis with some experiment, gather data, and if the vast majority of the data supports your hypothesis, it’s good scientific evidence.

    Scientific truth is still not that useful in everyday life. What can I say scientifically about apples? I am not going to spend time gathering data on apples.

    Often scientific truth builds on mathematical truth. Science teaches us laws of physics, and we can apply those laws of physics to many different situations using mathematics. It doesn’t work the other way around, though. A true mathematical statement is also true by the standards of scientific truth. But a true scientific statement isn’t necessarily true by the standards of mathematical truth.

    How much error is acceptable for something to be a scientific truth? There is a tradeoff. Physics experiments require a very high confidence, and in return it is pretty rare that mistakes are found in accepted physics. Social sciences like psychology are on the other end of the spectrum. It is quite common for multiple psychology papers to come out that contradict each other, and the state of the art is uncertain which to accept. Chemistry, biology, and medicine are somewhere in between.

    Rhetorical Truth

    Rhetorical truth is something that you believe because you heard a convincing argument for it. These arguments don’t necessarily involve data and statistics. When they do involve data, they are often combining a statistical fact about the data with a more humanistic conclusion that could be derived from the fact.

    The power of rhetorical truth is that it can operate in basically any human domain. Think of public policy, economics, ethics, or history. When it’s hard to run an experiment and gather data, the scientific method doesn’t really work.

    The downside of rhetorical truth is that it can be proven wrong more often than scientific truth. When you read two opposing scientific papers, you can usually drill down and figure out an experiment that will determine which of the two is correct, and use that to convince other people as well. When you read two opposing pieces of rhetoric, you can come to a conclusion about which one you find more convincing, but some people might come to a different conclusion, and there isn’t necessarily anything that will prevent that.

    Rhetorical truth can build on scientific and mathematical truth. When I believe in global warming, it’s not because I myself measured the amount of carbon in the atmosphere. And it isn’t because some people ran an experiment where on some planets they put carbon dioxide in the atmosphere and on the control group they kept the atmosphere as is. It’s because I read some convincing arguments that used scientific analyses as their source material.

    Memetic Truth

    I wasn’t sure whether to call this “mimetic truth” a la Girard or “memetic truth” a la Dawkins. I went with “memetic” because that seems more associated with “truth”, and “mimetic” seems more associated with “desire”. It is basically the same concept, though. Memetic truth is something you believe because other people also believe in it.

    Your instinct might be to think, oh yuck that’s bad, that’s how you get big groups of people thinking stupid things. Well, maybe. But first you should appreciate that memetic truth is massively useful in everyday life. Memetic truth is the main form of truth that lets us live our lives as intelligent human beings.

    How much science do you do yourself? Some of you are scientists and the answer will be nonzero. But the vast majority of scientific truth, you can’t get it straight from the scientific method yourself. You need to trust other people. You trust your chemistry teacher without having to redo every experiment to check their claims.

    I’m a software engineer. Sometimes computer science will tell us useful things. Like if an algorithm is quadratic, you can often scientifically determine that it will be inefficient in some cases, and then you know not to use it. But mimetic truth is useful in far more cases. Maybe I’m picking which encryption library to use. Do I analyze them all and run tests? That will take forever. I’m better off asking someone who’s done it before and taking their advice.

    I also have little kids. The vast majority of things that little kids believe, they don’t believe it because of any sort of argument whatsoever. Little kids are not yet equipped to handle philosophical debate. They believe things because they are copying other people that believe things. That can be believing adults that it’s dangerous to run into the street, or believing other kids that today is Opposite Day.

    Memetic truth can build on all the other types of truth. I tell my kids that cigarettes cause cancer, and they believe me, even though they don’t understand any of the research behind it, and really I don’t either, I just believe the medical establishment.

    Memetic truth can be proven wrong all the time. Something that seems memetically true in California can easily be memetically false in Texas.

    Which one is the best?

    There’s a spectrum here. On one end, you can be the most confident in experiential truth and mathematical truth. But those types of truth are not useful in very many areas of human activity.

    On the other end of the spectrum, rhetorical truth and memetic truth apply to almost anything. If a question can be phrased in English, there is probably a way to come up with a rhetorical answer. And there is definitely a way to find out what someone else thinks. But you can be the least confident in these forms of truth.

    I think that confusion between the different levels of truth accounts for a lot of disagreement about things like, how much should we believe social science and how much should we incorporate the claims of scientists into public policy. But those are probably issues for another blog post.

  • Literarily Recursive

    It felt inappropriate to start reading Lolita. All I really knew or know about it was that it was about a pedophile, it was nevertheless supposed to be a great novel, and the author was sort of Russian. But I am on a quest recently to read good books, and I have found myself liking Russian authors, and I never read anything by Nabokov before. So as a compromise, I picked up Pale Fire.

    I didn’t like the book. When I finished the book, I still didn’t quite like it. And then I tried to explain to someone, not why the book was good or bad, but just what the book was, what it was about. It was so hard for me to explain, that during the explanation itself, I completely changed my mind about the book and realized that I loved it.

    So now I want to explain it again, just to explain how this book works, and maybe that will make me love it even more.

    Pale Fire is nominally an enormous poem, with commentary. A 999-line poem. The poem is by this guy Shade, published shortly after his death. The commentary is by a Dr. Charles Kinbote. But at the same time the reader knows very well that the whole book is by Nabokov, so there is something going on.

    So there is a long poem, and the commentary, and then in the commentary a realization slowly builds. While reading Kinbote’s commentary it becomes clear that he is just nuts, completely insane and also a jerk. Often the commentary is overtaken by Kinbote mentioning how he isn’t going to bother doing a bit of work to do a better job:

    Line 384: book on Pope

    The title of this work which can be found in any college library is “Supremely Blest”, a phrase borrowed from a Popian line, which I remember but cannot quote exactly.

    The commentary also frequently slips: the poem reminds Kinbote of his own life, and he inserts autobiographical stories.

    Lines 385-386: Jane Dean, Pete Dean

    The transparent pseudonyms of two innocent people. I visited Jane Provost when passing through Chicago in August. I found her still unmarried. She showed me some amusing photos of her cousin Peter and his friends.

    Both of these are examples from me opening to the middle of the book - the commentary is just full of this.

    In the experience of reading, I was maybe halfway through the book, and I just didn’t understand what I was reading. It didn’t seem good. I felt like I didn’t know what I was doing, like I had turned on an episode of season 5 of a drama, didn’t recognize any of the characters, and was just watching them drift in and out.

    But through the book, this feeling grows and grows. First I was ambivalent to be reading an enormous poem; like nearly 100% of modern humans, I have not cultivated an interest in poetry. Then I was mildly annoyed to be reading critical commentary that was poorly written, unfocused, and somewhat egomaniacal. And it just breaks through every logical barrier of what acceptable literary criticism sounds like until the critical commentary mentions two lines in the poem before diving into ten pages of an unrelated tale of the commentator’s life.

    The most common theme in the commentary is Zembla, a small European country. Kinbote is something like a professor of Zemblan literature, working at the same university as Shade, where they are something like friends. He’s been hinting to Shade for months, you should write a poem about Zembla. There’s exciting stuff going on there, recently Zembla has had a coup and kicked out the old king. The traditions of the old Zemblan monarchy are giving way to a new Zembla, with plenty of metaphors about life and transition. Zembla is the root of deep truth and beauty, the greatness that is inherent in humanity meets its most primal form in the traditions of Zembla, let me tell you a few stories and sayings of traditional Zemblan peasants, and all this stuff would be great to work into your poetry. That sort of thing.

    So when Shade produces this epic poem, Kinbote is fired up, and then there is nothing in it about Zembla. Kinbote is emotionally crushed. He tricks his way into editing the official publication of the poem along with commentary, and now that he is writing this commentary, he is on a mission to prove that truly, the poem is rife with deep allusions to Zembla this and Zembla that.

    I want to call it a great performance, but that isn’t quite the right word. The book creates a growing feeling, where you start out reading some sedate nature poetry, and eventually find yourself overwhelmed by the rantings of a Zembla-obsessed madman. I didn’t like it, and then I both didn’t like it and was also confused by what exactly I was reading, and then by the time that I had started to figure it out, I couldn’t put it down.

    But there’s another part to the structure, a twist. As Kinbote tells more stories about Zembla, they get more detailed than even a obsessed professor of Zemblan history should be able to handle. He’s telling Shade the story of the Zemblan king escaping from the castle during the coup, and Shade asks, how could you possibly know? And before long, a new story comes out: Kinbote is the deposed Zemblan king. Assassins from the new government are hunting for him. He isn’t a normal professor, it’s something like a witness protection program, where he can work at this university and teach Zemblan literature.

    All of a sudden, Kinbote is a sympathetic character. No wonder he’s obsessed with Zembla, he spent his life as the king and is thinking about his legacy. No wonder he’s bad at writing critical commentary, he isn’t really a literary professor. And he’s stressed out, because he’s afraid for his life.

    Indeed, the assassins hunting for Kinbote end up killing Shade by mistake. Now it ties back. Ironically, Shade’s story is tied to Zembla, because Zemblan assassins are responsible for his death.

    There’s just one final twist. The police don’t believe the murderer was a Zemblan assassin. They conclude it was meaningless, a random act of violence from an escaped lunatic.

    What is the real story? The structure of the novel makes it impossible to know. The unreliable narrator is so unreliable, he isn’t even supposed to be telling the story that he is telling.

    It’s an amazing book. It’s hard for me to imagine how Nabokov got the idea for this novel. A poem, with commentary, and in the commentary this whole story of escape and assassination is revealed. The plot, as it is, is hidden inside layers of text that is nominally someone writing about something else.

    In conclusion, I do love this book even more after writing about it. Strong recommend.

  • Spoilers: No Country for Old Men

    I read a lot of books and I read a lot of stuff on the internet. Sometimes I wish there was a better way to do these things together. There are communities online for so many things, and there are a lot of book-reading communities too, but somehow none of them work for me. Goodreads and various different subreddits are nice for finding new books to read, but there’s something that’s missing.

    Sometimes when I finish a book, I am left with a rich, complicated, twisty set of thoughts in my head. I’m thinking to myself “That was a powerful book. Excellent. Now…” and then nothing. I want to discuss it with someone else who just finished reading the book at that very moment. But statistically maybe that person just doesn’t exist.

    So, I will just write about it. What cripples this sort of conversation on most online forums is that 99% of the people interested in a particular book are people who are considering reading it, but haven’t read it yet. Probably 98% of those people won’t end up reading it. So the people with the most interesting thoughts on it have to dance around what they want to say, especially if the book is one where revealing secret information is a core part of the experience.

    From here on out, any post with “Spoilers” in the title is just going to be chock-full of spoilers. Maybe this ruins the post for most of the potential audience. I am writing this post for the tiny number of people who have already read the book, No Country for Old Men, and want to explore their own thoughts about it. If you happen to be willing to read about a book and accept that the plot will be spoiled, be my guest.

    One extra point - if I write about a book in these “Spoilers” posts, that means I recommend that you read it. When I read a book I don’t like, I’m not going to bother writing about it.

    On to the content.

    No Country for Old Men

    It is gripping. I made the mistake of starting this book a little bit before bedtime last night. It hooked me and I stayed up a couple hours later than I intended to reading it. The first 20 pages are intense.

    I love the setting, of the Texas-Mexico border. Set in a time where a few of the characters are Vietnam veterans. There is a feeling of frontier emptiness. Like the characters are independent, free, simple, not really clicked into a society.

    I don’t know if Cormac McCarthy uses a quotation mark in the whole book. The dialogue is stuff like:

    What’s that? said Wendell.

    Cylinder out of the lock.

    Bell passed his hand over the plywood of the room-divider. Here’s where it hit at, he said. He balanced the piece of brass in his palm and looked toward the door. You could weigh this thing and measure the distance and the drop and calculate the speed.

    I expect you could.

    Pretty good speed.

    Yessir. Pretty good speed.

    They walked through the rooms. What do you think, Sheriff?

    I believe they’ve done lit a shuck.

    I do too.

    Kindly in a hurry about it, too.

    Yep.

    There is a simplicity of form. He also seems pretty opposed to adverbs. Does it relate to other simplicity? The simplicity of the characters? Sometimes.

    Characters

    There are three main characters. Moss, the guy who finds two million dollars of drug cartel money, doesn’t do a lot of internal reflection. He doesn’t turn the money over when the police ask him to, and he doesn’t seem to reflect on it much, either. It feels like he just snap-decides.

    Chigurh, the killer, we don’t see his thoughts in the novel. But he delivers psychopathic lectures before killing people. Flips a coin to decide whether he’ll kill them or not. Explains why, according to his way of seeing the world, he has to kill someone.

    Bell, the sheriff, we see plenty of his thoughts. Chunks of the book are just him telling the reader about his life and thoughts.

    It’s an odd thing when you come to think about it. The opportunities for abuse are just about everwhere. There’s no requirements in the Texas State Constitution for being a sheriff. Not a one. There is no such thing as a county law. You think about a job where you have pretty much the same authority as God and there is no requirements put upon you and you are charged with preservin nonexistent laws and you tell me if that’s peculiar or not. Because I say that it is. Does it work? Yes. Ninety percent of the time. It takes very little to govern good people. Very little. And bad people cant be governed at all. Or if they could I never heard of it.

    I feel like simple sentences make some of the characters simpler. It’s easier to portray a quiet killer or a quiet cowboy type by just leaving out most of the descriptive words. But Bell’s thoughts touch on life and death and ask questions that aren’t simple to answer. Apostrophes not required.

    The Drop

    The real thing that makes this book a masterpiece is a huge twist near the end of the book. It feels like Moss is the protagonist, running from Chigurh. He’s the good guy, respecting his wife, taking a few risks, but generally smart and knows the land. You expect Moss to eventually escape with the money and be the guy who turns the tables on Chigurh. But no. The Mexican drug cartel simply kills everyone. Chigurh kills Moss’s wife, just to prove he’s the sort of guy that follows through when he threatens to kill someone’s wife. The drug cartel gets their money back. Bell can’t manage to catch anyone, can’t handle being the sheriff any more, and retires.

    The book doesn’t even directly describe how Moss dies. For someone that feels like a protagonist for the first three quarters of the book, this is pretty crazy. I had to reread this section a few times to understand what was going on.

    There’s three sections, separated by spaces that usually indicate a change of perspective. First section: Moss goes to bed, bidding good night to the hitchhiker he’s traveling with. Second section:

    The Barracuda pulled into a truckstop outside of Balmorhea and drove into the bay of the adjoining carwash. The driver got out and shut the door and looked at it. There was blood and other matter streaked over the glass and over the sheetmetal and he walked out and got quarters from a change-machine and came back and put them in the slot and took down the wand from the rack and washed the car and rinsed it off and got back in and pulled out onto the highway going west.

    This prose is really the opposite of David Foster Wallace. Instead of complicated words in forking trees of logic, it’s simple words that go from one simple activity to another simple activity. But it still creates a vivid picture. Terrible crime combined with everyday errands.

    We haven’t seen a Barracuda (some car from the 60s/70s) before in the story. So all we know here is that someone is washing blood off their car. Some new character killed somebody. And then, the next section is Bell who stumbles across a new crime scene. He discovers that Moss was killed. Killed by a couple of characters that are presumably other employees of the drug cartel, who haven’t appeared in the story before this and don’t appear in the story again.

    If it were a six-year-old writing a story, this sort of plot resolution would be unacceptable. But I can’t argue that it’s unfair. The irony is that Bell has been warning the whole time that Moss is going to get himself killed. And yet when it actually happened, I was taken by surprise.

    Finally, although the main issue of the drug money is resolved, Chigurh hunts down Moss’s wife and kills her.

    What does it mean?

    Is it ridiculous how we expect novels to end happily? Maybe it weakens your mind for reading about real history, or analyzing the real world.

    Is this more realistic? Drug cartels are still around, so in some sense they must be winning more battles than they are losing.

    As the story ends, the only sympathetic thing remaining is Bell, reflecting on the meaningless deaths around him. I don’t often reflect on the presence of meaningless death in the world. It does seem like there is a lot of it, though. I have to give this novel credit.

    When Moss took the money, I didn’t think too much about it. That sort of thing happens all the time in novels. By the end of the book, I realized that was his key mistake. He just never should have gotten involved. A chance at two million dollars wasn’t worth risking his decent life. And that’s a pretty legitimate conclusion, isn’t it? That seems to map to the real world. How can a novel have a lesson of “don’t risk your life for X” unless someone dies for it?

    Somehow, after all this grisly drug violence, the book still makes me want to go out hiking through the Texas wasteland. But if I see a shot-out truck with a dead body inside, I’ll just call the cops. No need to investigate myself, first. Lesson learned.


...