An interesting shift happens once you realise you’re writing code for humans to read, and not just for machines to execute.

One big change is that writing clearly takes priority over correct code.1

Reading code involves building up a mental model of what each thing should do, and how these things interact with each other. The best code, however, short-circuits reading everything. Instead of reading the implementation, you use variable and function names, comments, and other clues to figure things out.2

This is chunking. You combine things together, and remember them together. For example, a Chess Opening, like the Queen’s Gambit, is a chunk. So is a forEach loop: just by the name, you can figure out that it’s going to do something to every element of an array.

Once you’ve built a chunk for a forEach loop, you don’t have to read the implementation of forEach to understand what it will do. This is very powerful, since it allows you to come back to the code, without having to read the implementation of everything again.

Thus, a good name is a chunk that encodes important information about what the thing does.

We’re compressing information into the name. This is almost always a lossy compression. Further, the more lines to compress, the more losses you incur, since you want your names to be no longer than 80 characters.

The goal of good naming is to minimize this loss.

And naming things is hard because of this compression. English isn’t precise like C++, so compressing from a precise to an ambiguous language increases losses. Another big reason is that choosing the most important things to include is hard. You’ll notice that most disagreements about naming fall on these two axes: either the name isn’t just right for what you want to do (does it assign or setup or link or mangle?). Or the name doesn’t capture important parts of the functionality.

I think this is the generating function for most best practices I’ve heard of, and thus lots more valuable than one specific guidance.

Generating best practices

  1. Make functions do one thing

    When a function does multiple things, the name needs to encode lots more information, like link_user_to_trial_and_enable_downloads_and_set_charge_date(). That’s a long name, and a sign of losing too much information: what semantics should the name expose about the charge date, and enabling downloads?

    These side effects are hard to encode for in the name.

  2. Keep functions small

    Long functions, like functions doing lots of things, need to encode lots of information into the name. And since the name length is bounded, you lose a lot more information in compressing this long name, which leads to poorer names.

  3. Avoid meaningless names

    Since we’re writing code to be read by humans, names that don’t compress any information don’t help us build chunks. This makes it harder to read, since you have to go down one layer of abstraction. You need to parse the implementation to understand what that piece of code does.

  4. Avoid too abstract names

    do_things() is a suitable name for any function, but the compression is too lossy to be useful - you lose almost all information, and need to parse the body to figure out what it does. Not great. It’s almost a meaningless name.

  5. Be consistent

    When you build similar chunks throughout your code base, it becomes easier to go up another layer of abstraction, by chunking the chunks. For example, continuing the trial management system example, you’ll have 3 things in place:

    1. link_user_to_trial_agreement_given_email()
    2. enable_research_downloads_for_user()
    3. set_charge_date_for_user()


    Since they’re operating in the same domain, you can go up one level to setup_user_for_trial() that does all three.

Disrupting best practices

The generator of best practices allows you to reason about cases where best practices aren’t the best way forward.

For example, consider being consistent, and naming variables using camelCase. Your entire codebase uses camelCase. You’re writing a new function that’s very non-standard. It does arcane stuff that doesn’t fit into your existing model. You need it, because legacy integration/business requirements/etc. In effect, you don’t want anyone to gloss over it when they read it.

So, do you stay consistent, and keep it in camelCase? Or, do you switch to snake_case?

The goal is to keep things clear. I would switch to snake_case to show how it’s different from the rest, and have some comments explaining the arcane things it does.3

Precision with words

Spoken languages are very imprecise. There’s lots of ambiguity, double meanings, and sentences that don’t mean anything.

Code, however, needs to be precise. Like we saw above, this is a source of trouble, since we’re trying to compress a precise language into an imprecise one, which makes searching for the right English words harder.

These ideas apply to naming things outside of code, too! When you have a name for something, it suddenly becomes legible.

Consider: You see some people around you who test if they can do something once, and drop it if they fail. Some others keep trying until they learn how to do it. The first group prizes its intelligence, while the second group prizes its hard work. Then, when you hear about the Fixed vs Growth mindset, things immediately click.

Choosing the right name here is hard too. Every best practice that lost its nuance over time can be attributed to a name that didn’t encode the necessary information. People passed these ideas on without explaining the nuances.

Or, consider equating two names. “Abortion is murder” - equating two names that aren’t the same creates an interesting dynamic between emotions and the truth. Each name has different connotations, and on combining them together, you’re passing on the connotations of one onto the other. This isn’t necessarily a good thing. You have to be very careful of the equality operator in an imprecise language.

Conclusion

Optimise for clear, easy to understand code. The cost of reading it is higher than the cost of executing it.

To do this, leverage chunking: write names that are meaningful, and encode as much information as possible into the name.

However, writing meaningful names is a lossy compression. Every naming maxim follows from minimising this loss.

And finally, the same idea applies to naming things in the real world. Except, it’s even harder, since it’s encoding from an imprecise to another imprecise language. Code, atleast, was unambiguous in its execution.

Thanks to Soumya for reading drafts of this.

Have feedback? You can reply to this post on Twitter.

  1. You might contest this, that correctness comes first, but if it’s clear to read, PR reviews will catch correctness bugs quickly. Or, if no-one catches mistakes, and then things break, it’s easier for your colleagues to fix things if they can read your code clearly.
    Footnote to footnote: This is an instance of taking ideas seriously

  2. Sometimes, poor naming gets in the way, generating an incorrect model of how things work. Here, you have no choice but to read the implementation. 

  3. Ideally, you’d fix things properly so there isn’t a need for something like this. But, the default is to have things like these littered over a huge codebase.