I’ve been thinking about code clarity & maintainability recently, and I thought I’d write some of it up. The main goal is to make code easier to read, debug, understand, and collaborate on. This is especially important as a team grows.
Code Styles
For teams, choose a set code style guidelines and stick to them. Keep indentation, naming conventions, spacing, etc. consistent so all contributors do not need to learn a new style every time they move to a new section of the codebase. Keep a knowledge base of best practices and guidelines.
If your language has a standardized code style, adopt that if possible so you don’t need to teach new hires your custom flavor of code formatting.
It’s not particularly useful to argue about the best style, isIndexSignatureDeclaration, etc, - it’s more important to be consistent across the codebase.
For example, some common JS style guides are:
Python has it’s own PEP 8 style guide, Rust has it’s own recommmended style, and many other languages have similar guides or formatting tools.
Type Annotations and Comments
Obviously, this is geared towards scripting languages like Python and JavaScript, not typed languages like Rust, C, etc which require types for the code to run.
Some languages, like Python, support this out-of-the box in recent versions. Using type annotations makes it much more clear how to properly utilize a function, and much easier for static analysis to catch issues. It allows you to communicate expectations around inputs and outputs to the rest of your team.
Type definitions are nice since they’re part of the code. Unlike comments, they’re harder to ignore or forget when refactoring or adding new code. With good naming and type annotations, comments can be relatively minimal to if you need to describe type information, etc. in them.
These examples are in Python and are supported out of the box, but languages like JavaScript require using a tool like TypeScript to compile your code first. Many linting tools alert if types don’t match expected values.
Resources:
Anti-Pattern
Take this function definition as an example. Just reading this, you don’t have much to go on for how to use it. You could maybe guess that name/email might be strings, but the format of user
is unknown without further context. Is it an id? is it an object? Does this return the updates? We can’t be sure.
def perform_user_updates(user, name, email):
# <some implementation>
Best Practice
The following is a little more verbose, but much more maintainable in the long run. It’s explicitly clear what’s required to successfully call the function, without even needing to read its implementation. As a plus, modern editors also suggest/autocomplete fields when things are typed, and comment blocks are visible on hover in other files!
from typing import TypedDict
# this could be stored in a separate module for types and imported across your project (recommended)
class User(TypedDict):
user_id: int
name: str
email: str
def perform_user_updates(user: User, name: Optional[str] = None, email: Optional[str] = None) -> User:
"""
Update a user's information in the database, and return the current user.
If arguments are not specified / None, the existing values are kept.
- user: the user object to update in the database
- name: the user's new display name
- email: the new email
"""
# <some implementation>
Functions / Modules
Use pure functions for computations
When doing anything non-trivial algorithmically, prefer to use “pure” functions. Functions of this type:
- Return the same output given the same inputs
- Have no side-effects:
- do not modify global variables, arguments, etc.
- don’t perform database read/writes, make API calls, etc.
If you need to perform side-effects, prefer to do them in a separate function that is called outside the pure function. This allows testing of the pure function without needing to mock out the side-effects.
Avoid global variables
Global variables make it hard to reason about the current state of your program while debugging or editing. If you need to share state, prefer to create it once and pass as an argument to any functions you call. This way, it’s clear what is impacting the behavior of your functions.
There are some exceptions to this, where a global singleton is needed. Some examples of this might be a database connection, configuration, etc.
Limit function size
This is especially important as your team size grows. It’s very hard to maintain a single multi-hundred-line function or module. If you chunk it up into several functions that are then chained together in some controller function, you can reason about the behavior of a single portion more clearly.
If split up the code, you can work on pieces without needing to worry as much about potential side-effects of modifying one part of a single large procedure, and you’ll have far less merge conflicts if multiple people are contributing.
Prefer to split logic, database updates, and utilities whenever possible to ensure the code’s easy to follow. Prefer to reduce the amount of code needed to be understood in order to build, new features and fix bugs.
Split up large modules
If a module grows to a large size, it might be advantageous to split it into separate files. At the moment I don’t have a “guideline” for how to approach this, other than do something that makes sense for your team and codebase.
Notes
I’ve maintained a fair amount of code at this point in my career that breaks one or more of these rules - and this is where I’ve seen all sorts of issues arise. These modules have tended to be several thousand line files, often just one or two large functions, using global variables within for maintaining state throughout the whole thing. Where functions are used, they have side effects requiring edits to all sorts of other places. I’ve often seen this as part of some prototype that ended up getting sent into production, and never refactored.
Programs like this incredibly hard to maintain. It’s incredibly difficult to collaborate on these sorts of programs, since adding a new feature usually introduces edits across the file and all sorts of merge conflicts arise. Reuse of code from these sorts of programs is next-to-impossible.
Un-Nest your code
Especially for languages like Python where indents are meaningful, but this applies to all languages - limit your nesting as much as possible. There’s a couple ways to tackle this, depending on the situation.
- Add functions to break up distinct cases
- Refactor callback nesting into async/await or similar async code
- Refactor if statements to reduce nesting / bail out quickly for error cases
- Rethink your logic to avoid nested loops (with large
n
, execution time can quickly grow since this isO(n^2)
or worse)
As part of the linux kernel coding style - they explicitly call this out. Indentation for linux kernel code is set at 8 characters, and this makes it incredibly clear when indentation is becoming a problem.
Callback Hell
This is a common issue in JavaScript, where you have a series of nested callbacks.
JavaScript (and many other languages), have a solution to this problem, by way of async
/await
, and promises. You can easily wrap some classic callback function with a promise “promisifying” it, and then and await
the returned promise:
Anti-Pattern
/**
* Avoid this!
* This can get incredibly nested, and confusing, fast.
*/
function foo (callback) {
someLibfunction((res, err) => {
if (err) { return callback(err) }
someOtherLib(res, (res2, err) => {
if (err) { return callback(err) }
// do something (potentially *MORE* nested logic)
// this could go on
callback(undefined, res2)
})
})
}
// to use our function:
foo((err, res) => {
if (err) {
return console.error('error!', err)
}
dosSomething(res)
}))
Best Practice
I’ve found that utilizing tools such as Javascript’s promises are a good way to tackle this problem. In node, util has a nice promisify
function that turns any callback func into a promise that can be awaited. This also allows you to write less error handling boilerplate as a side-bonus!
import util from 'node:util'
const someLibfunctionAsync = util.promisify(someLibfunction);
const someOtherLibAsync = util.promisify(someLibfunction);
/**
* A much clearer version of foo()
* Logic is much more readable
*/
async function foo() {
// get the first Result
const firstResult = await someLibFunctionAsync()
// get 2nd result
const secondResult = await someOtherLibAsync(firstResult)
// do something else:
// logic
return secondResult
}
// then, use it:
foo()
.then(res => { doSomething(res) })
.catch(err => console.error('error!', err))
Refactor conditionals
Another trick is to limit the amount of if statements. Sometimes, it’s better to invert your statement and raise/return early, rather than keeping logic for that dangling in else {}
blocks.
I saw an example of this on twitter x.com. Of which, there are some extremely overcomplicated suggestions, ranging from doing math to calculate the correct ASCII char code, to using all sorts of dictionaries and loops. The best solution is still to just use a simple if
statement.
Anti-Pattern
function checkGrade(score: number) {
if (score >= 90) {
return "A"
} else {
if (score >= 80) {
return "B"
} else {
if (score >= 70) {
return "C"
} else {
if (score >= 60) {
return "D"
} else {
return "F"
}
}
}
}
}
Best Practice
function checkGrade(score: number) {
if (score >= 90) {
return "A"
}
if (score >= 80) {
return "B"
}
if (score >= 70) {
return "C"
}
if (score >= 60) {
return "D"
}
return "F"
}
Often, you can move these onto a single line to enhance clarity if the conditions are simple:
function checkGrade(score: number) {
if (score >= 90) return "A"
if (score >= 80) return "B"
if (score >= 70) return "C"
if (score >= 60) return "D"
return "F"
}
Anti-Pattern
Here’s another example, where nested conditionals and else statements are hard to flollow. THe preconditions for the function are specified throughout the body of the function, making it hard to follow what’s going on.
function getDashboardInfo (user: User) {
if (user) {
const out = {}
if (!user.suspended) {
// TODO: calculate dashboard metrics
return out
} else {
throw new Error('User is suspended')
}
} else {
throw new Error('User is not valid')
}
}
Best Practice
Do this, which is both easier to read, and debug. The preconditions for the function are specified at the top, so it’s easier to read:
function getDashboardInfo(user: User) {
if (!user) { throw new Error('User is not valid') }
if (user.suspended) { throw new Error('User is suspended') }
const out = {}
// TODO: calculate dashboard metrics
return out
}
Avoid Nested Loops
For small n
, this doesn’t cause too much of an issue. For larger n
, this is a huge bottleneck, as the time complexity here is O(n^2)
(quadratic time)
There’s some cases where you need to nest loops (i.e. computing a matrix of data, or something along those lines). But for many data processing cases, you don’t need to, and the nesting is a sign there’s a better way (and clearer/concise) to write it.
Anti-Pattern
This is an example of some theoretical update procedure. Suppose we have an array of items, and we need to produce a merged set with a new list of items/item updates:
const out = []
// merge updates to items with existing data
for (const item of items) {
for (const new_item of new_items) {
if (item.id === new_item.id) {
out.push({
...old_item,
...new_item
})
}
}
}
// add items we didn't find in our current dataset
for (const new_item of new_items) {
let found = false
for (const outItem of out) {
if (new_item.id === outItem.id) {
found = true
}
}
if (!found) {
out.push(new_item)
}
}
Best Practice
We can more efficiently do this if we first scan the initial list into a hash table. We can then execute our merge much more quickly as the size of items grows, as the time complexity here is O(n)
. This is also a ton clearer what we’re trying to do:
const out = []
const existingItems = {}
// create hash table of current items
for (const item of items) {
existingItems[item.id] = item
}
// scan the new items and perform updates to build a new set of items
for (const new_item of new_items) {
if (existingItems[new_item.id]) {
// merge updates to items with existing data
out.push({
...existingItems[new_item.id],
...new_item
})
} else {
// add items we didn't find in our current dataset
out.push(new_item)
}
}
Libraries and Abstraction
A healthy amount of abstraction is helpful in keeping code consise and understandable. If you find yourself copying hundreds of lines of code around your app - that’a a good sign you should abstract that portion.
I’ve worked on a fair amount of sites and apps that use the MVC architecture. In my personal opinion, you should try to keep your controllers as high level as possible - large amounts of business logic don’t belong there. If there is any non-trivial logic, it should go into a model or library function so that it can be re-used, without requiring copy-pasting across controllers.
In many apps, you’ll end up duplicating common controller operations at least once or twice, especially if you have a main app, an administration portal, or an api which have different controllers but all perform similar functions internally.
Final Words
Like most things in programming, some of this is subjective. But I’ve found that trying to follow these guidelines has made my code much easier to maintain, and understand and much easier to collaborate on.