Algorithms to live by cover

Key takeaway

You can use solutions from computer science to solve problems in real life. These can be wide-ranging: from how to choose your soulmate, to when to choose a new restaurant vs your favourite.

How to choose your soulmate

The Optimal Stopping Problem

Be it looking for a soulmate, or a secretary, or a house to rent – the process is the same. The catch though, is that you can’t go back and choose another once you’ve made a decision (or it’s very expensive to do so).

This problem of deciding how long to scan - and who to choose is the Optimal Stopping Problem.

The optimal solution takes the form of Look-Then-Leap Rule: You set a predetermined amount of time for “looking” - that is, exploring your options, gathering data - in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best person you saw in the look phase.

This optimal number comes out to 37%.

The 37% Rule: look at the first 37% of the applicants, choosing none, then be ready to leap for anyone better than all those you’ve seen so far.

How do you determine 37%? This could be on the time or the total number of candidates.

For example, searching for a soulmate - with your preferred age for looking between 20 and 32, the optimal age to leap comes at 24.5.
In this case, when you find someone better than anyone you’ve met after you’re 24.5 years old - marry them.

The new Italian restaurant vs the all time favourite pizza place

Explore - Exploit Problem

Imagine walking into a casino full of different slot machines, each one with its own odds of a payoff. You don’t know the odds in advance. Until you start playing, you won’t have any idea which machines are the most lucrative and which ones are money sinks.

This scenario is the “multi-armed bandit problem.”

Choosing a restaurant or an album is all about deciding which arm to pull in life’s casino. You wouldn’t know how great they are till you experience them. You could rely on what other people tell you, but you can never know for sure.

A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle. Discovering an enchanting café on your last night in town doesn’t give you the opportunity to return.

The flip side is that the value of exploitation can only go up over time. The loveliest café that you know about today is at least as lovely as the loveliest café you knew about last month. So explore when you will have time to use the resulting knowledge, exploit when you’re ready to cash in. The interval makes the strategy.

A better than chance strategy is the Win-Stay, Lose-Shift Algorithm:

Choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to another.

But changing arms each time one fails is harsh. Would one disappointment at your favourite restaurant be enough to give up on it?

Gittins came up with a solution to this problem.

For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number - which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index - suggests an obvious strategy on the casino floor: always play the arm with the highest index.

More on Gittins index

When to put things in order


Err on the side of messiness.

Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.

The question becomes how to estimate ahead of time what your future usage will be.

“To lower costs per unit of output, people usually increase the size of their operations,” - J. C. Hosken. This is the economy of scale.

But with sorting, scaling up is expensive.

As a sort grows larger, the unit cost of sorting rises.

Cooking for two is typically no harder than cooking for one, and it’s certainly easier than cooking for one person twice. But sorting, say, a shelf of a hundred books will take you longer than sorting two bookshelves of fifty apiece: you have twice as many things to organize, and there are twice as many places each of them could go. The more you take on, the worse it gets. This is the first and most fundamental insight of sorting theory. Scale hurts.

This problem is also known as the Agony of Sorting.

What’s the solution? With pairwise comparisons, none. But, if we can objectively rank everything in the sort - we don’t need the comparisons.

This move from “ordinal” numbers (which only express rank) to “cardinal” ones (which directly assign a measure to something’s caliber) naturally orders a set without requiring pairwise comparisons.[..]. The Fortune 500 list is one of these. To find the most valuable company in the United States, analysts don’t need to perform due diligence comparing Microsoft to General Motors, then General Motors to Chevron, Chevron to Walmart, and so forth. These seemingly apples-to-oranges contests (how many enterprise software installations equal how many oil futures?) become apples-to-apples in the medium of dollars. Having a benchmark—any benchmark—solves the computational problem of scaling up a sort.

Choosing what books to read in the library


“It is a mistake to think that that little room [the brain] has elastic walls and can distend to any extent. Depend upon it there comes a time when for every addition of knowledge you forget something that you knew before. It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones.” - Sherlock Holmes

One way to think about forgetting is that our minds run out of space. The problem might not be not one of storage, but of organization. According to Andersons’ theory, the mind has infinite capacity for memories, but we have only a finite amount of time in which to search for them. Anderson made the analogy to a library with a single long shelf. You can fit as many items as you want on that shelf, but the closer something is to the front the faster it will be to find.

Something similar occurs in libraries. There’s a recently returned section - which makes the lives of librarians easier. They aren’t expected to immediately put the books back on the shelves. For us, this is an interesting opportunity to look at books that others are reading - instead of the showcase which tells us what to read. It’s the cache of recently read books.

Read more on Caching and organising

How to get things done


Do the difficult things while they are easy and do the great things while they are small. - LAO TZU

Every scheduling algorithm needs some metric you’re optimizing for.

Here are a few.

Doing everything as soon as possible

Assume that every week you get a lot of fresh fruits all at once. Each fruit is set to spoil on a different date — so eating them by Earliest Due Date sounds reasonable. You’ll get to every item as soon as you can.

However, it’s not the end of the story. Earliest Due Date is optimal for reducing maximum lateness, which means it will minimize the rottenness of the single most rotten fruit you’ll have to eat. This may not be the most appetizing metric to eat by.

How do you fix this? Change what you’re optimizing for.

Doing something(s) on time

Let’s say, we want to minimize the number of fruits that spoil and we won’t eat spoilt fruits.

Here a strategy called Moore’s Algorithm gives us our best plan. Moore’s Algorithm says that we start out just like with Earliest Due Date—by scheduling out our fruits in order of spoilage date, earliest first, one item at a time.

But as soon as it looks like we won’t get to eating the next item in time, we pause, look back over the meals we’ve already planned, and throw out the biggest item (that is, the one that would take the most days to consume). For instance, that might mean forgoing the watermelon that would take a half dozen servings to eat; not even attempting it will mean getting to everything that follows a lot sooner. We then repeat this pattern, laying out the foods by spoilage date and tossing the largest already scheduled item any time we fall behind. Once everything that remains can be eaten in order of spoilage date without anything spoiling, we’ve got our plan.

Doing as much as possible

Minimizing the sum of completion times leads to a very simple optimal algorithm called Shortest Processing Time: always do the quickest task you can.

Even if you don’t have impatient clients hanging on every job, Shortest Processing Time gets things done. (Perhaps it’s no surprise that it is compatible with the recommendation in Getting Things Done to immediately perform any task that takes less than two minutes.) Again, there’s no way to change the total amount of time your work will take you, but Shortest Processing Time may ease your mind by shrinking the number of outstanding tasks as quickly as possible. Its sum-of-completion-times metric can be expressed another way: it’s like focusing above all on reducing the length of your to-do list. If each piece of unfinished business is like a thorn in your side, then racing through the easiest items may bring some measure of relief.

When the future is foggy, you don’t need a calendar - just a to-do list.

People and computer operating systems alike face a curious challenge: the machine that is doing the scheduling and the machine being scheduled are one and the same. Which makes straightening out your to-do list an item on your to-do list—needing, itself, to get prioritized and scheduled.

Predicting the future - How long things will last

Baye’s Rule

Baye’s rule gives us a means to predict the future. The only requirement? Have a strong sense of priors.

What are priors? What you know about what you’re trying to predict.

Good predictions require good priors. This has a number of important implications. Our judgments betray our expectations, and our expectations betray our experience. What we project about the future reveals a lot—about the world we live in, and about our own past.

Let’s dive in.

The Copernican Principle:
If we want to predict how long something will last, and have no other knowledge about it whatsoever, the best guess we can make is that it will continue just as long as it’s gone on so far. This is the power law distribution and the multiplicative rule.

But there is a problem. If you meet a 90-year-old man, the Copernican Principle predicts he will live to 180. Every 6-year-old boy, meanwhile, is predicted to face an early death at the tender age of 12. To understand why the Copernican Principle works, and why it sometimes doesn’t, we need to return to Bayes rules and the priors.

When we apply Bayes’s Rule with a normal distribution as a prior, we get an Average Rule: use the distribution’s “natural” average as your guide.

For instance, if somebody is younger than the average life span, then predict the average; as their age gets close to and then exceeds the average, predict that they’ll live a few years more. Following this rule gives reasonable predictions for the 90-year-old and the 6-year-old: 94 and 77, respectively. (The 6-year-old gets a tiny edge over the population average of 76 by virtue of having made it through infancy: we know he’s not in the distribution’s left tail.)

Between those two extremes, there’s a third category of things in life: those that are neither more nor less likely to end just because they’ve gone on for a while. Sometimes things are invariant. The Danish mathematician Agner Krarup Erlang, who studied such phenomena, formalized the spread of intervals between independent events into the function that now carries his name: the Erlang distribution.

The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer.

  • Power law distribution
    • The Multiplicative Rule
  • Erlang distribution
    • The Additive Rule
  • Normal distribution
    • The Average Rule

News, social media, and other forms of story-telling can very easily skew our perceived distributions and likelihoods for a given phenomenon. Representation in the media does not equal frequency in the real world.

If you want to be a good intuitive Bayesian—if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate—you need to protect your priors. Counterintuitively, that might mean turning off the news.

When to think less


The lesson is this: it is indeed true that including more factors in a model will always, by definition, make it a better fit for the data we have already. But a better fit for the available data does not necessarily mean a better prediction.

One way to combat overfitting: Withold some available data points. This can be conscious or built into the circumstance.

Consciously, give yourself 5 minutes to make a decision.
Built in, start making the decision 5 minutes before the due-date.

Another way: Testing your model with data derived from some other form of evaluation. The use of proxy metrics—taste as a proxy for nutrition, number of cases solved as a proxy for investigator diligence—can also lead to overfitting.

In these cases, we’ll need to cross-validate the primary performance measure we’re using against other possible measures.

How to solve difficult problems


And this is where computer science figured out something invaluable, something we can all learn from: how to best approach problems whose optimal answers are out of reach. How to relax.

It’s probably not the relaxation you’re thinking about. It’s about relaxing a problem.

  • Constraint Relaxation removes some constraints altogether and makes progress on a looser form of the problem before coming back to reality.

  • Continuous Relaxation turns discrete or binary choices into continua: when deciding between iced tea and lemonade, first imagine a 50–50 “Arnold Palmer” blend and then round it up or down.

  • Lagrangian Relaxation turns impossibilities into mere penalties, teaching the art of bending the rules (or breaking them and accepting the consequences). A rock band deciding which songs to cram into a limited set, for instance, is up against what computer scientists call the “knapsack problem” - a puzzle that asks one to decide which of a set of items of different bulk and importance to pack into a confined volume. In its strict formulation the knapsack problem is famously intractable, but that needn’t discourage our relaxed rock stars

What would you do if you weren’t afraid? What would you do if you could not fail? What would you do if you won the lottery? What would you do if all jobs paid the same? The idea behind such thought exercises is exactly that of Constraint Relaxation: to make the intractable tractable, to make progress in an idealized world that can be ported back to the real one. If you can’t solve the problem in front of you, solve an easier version of it—and then see if that solution offers you a starting point, or a beacon, in the full-blown problem. Maybe it does.

What to do the next time someone cancels on you

In human society, we tend to adopt a policy of giving people some finite number of chances in a row, then giving up entirely. Three strikes, you’re out. This pattern prevails by default in almost any situation that requires forgiveness, lenience, or perseverance. Maybe we’re doing it wrong.

Solution: Exponential Backoff on the invitation rate. Try to reschedule in a week, then two, then four, then eight. The rate of “retransmission” goes toward zero - yet you never have to completely give up.

How to make sure your plans work out

Computational Kindness

Language like “Oh, I’m flexible” or “What do you want to do tonight?” has a dark computational underbelly that should make you think twice. It has the veneer of kindness about it, but it does two deeply alarming things.

  • It passes the cognitive buck: “Here’s a problem, you handle it.”
  • By not stating your preferences, it invites the others to simulate or imagine them.

The simulation of the minds of others is one of the biggest computational challenges a mind (or machine) can ever face. In such situations, computational kindness and conventional etiquette diverge. Politely withholding your preferences puts the computational problem of inferring them on the rest of the group.

In contrast, politely asserting your preferences (“Personally, I’m inclined toward x. What do you think?”) helps shoulder the cognitive load of moving the group toward resolution.

Read more on Computational Kindness

Enjoyed this summary? Get the book here! [Affiliate Link]