About Adding Features (part 1)

July 25, 2021

Teaser

One of the most clever C tricks I’ve ever seen is the following:

#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

The BUILD_BUG_ON_ZERO(e) macro checks whether the compile-time value of e is non-zero (yes, the name is misleading). If it is, the result is a compiler error, which breaks the build. The trick leverages a few obscure properties of C: !!(e) normalizes e to 0 or 1, the unary minus turns that into 0 or -1, and a bit-field of negative width is rejected by the compiler (who would’ve thought?).

That’s how we got static assertions in C.

Whoever wrote this line for the first time must have felt a tremendous sense of accomplishment. It must have come at the cost of some sanity, though.

Were sizeof and bitfields designed to support this use case? Unlikely, but here we are!

This article will be published in a few parts (1 (this), 2, …)

Introduction

Note: Although I’m mostly talking about (programming) languages, most of the considerations here also apply (perhaps even better) to libraries, modules, frameworks, and other software building blocks.

Software engineers always need to choose a tool for the job, be it a programming language, a library, or an RPC¹ service they wish to delegate tasks to. These tools can be pliable and extensible or, in contrast, rigid and opinionated.

In his famous 1998 talk, “Growing a Language”, Guy L. Steele Jr. demonstrated (in a very entertaining way) how a language can grow if the means to do so are built into it. The protocol he described is simple, elegant, and capable. A language like that is very flexible.

When a tool we use doesn’t provide the flexibility we want (and often don’t really need), we find a way to do what we had set our minds on anyway. This often results in clever² and hacky solutions. A question that comes to mind then is: why didn’t the authors implement this feature in the first place?

How We Got Things Done

SFINAE (Substitution Failure Is Not An Error)

A well-known folk theorem states that C++ templates are Turing complete. People have used SFINAE for the craziest things, some of them making their way into the standard library. No seasoned C++ engineer squints their eyes when they see:

template<class T = typename std::enable_if<...::value>::type>
void foo() { ... }

a declaration that, to someone unfamiliar with this madness, would read as: “foo is a generic function with a type parameter T, which is equal to… WHAT IS HAPPENING HERE?!”.

[Image: “This is SFINAE”]

SFINAE gave C++ engineers great power. They could now optimize their function calls based on what the passed object offers, with zero runtime cost. All the work is done by the compiler and the poor soul who had to write the templates. Good luck with debugging, too.

Enforcing String Literal Arguments

There are good reasons to require that a given function argument be a string literal (not just any string), for example to keep dynamic, possibly user-controlled values out of it. One could probably write a linter to achieve such a thing (e.g. Error Prone in Java), but in Go, this can be done using this trick:

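package fooer

import "fmt"

// compileTimeString is unexported, so other packages cannot name it.
type compileTimeString string

// Foo can be called from other packages only with an untyped string constant.
func Foo(s compileTimeString) {
	fmt.Printf("%s", s)
}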

compileTimeString is an unexported type of package fooer, so code outside the package cannot name it: it can neither declare a variable of this type nor convert an existing string to it. But, in Go, a string literal is an untyped string constant. Therefore, in the expression Foo("bar"), the argument is an untyped constant that is implicitly converted to compileTimeString, whereas a variable of type string is rejected by the compiler.
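The rule above can be tried out in a single file. This is only a sketch under a few assumptions: everything lives in package main so that it runs as-is (in the setup described above the type sits in package fooer, which is what prevents callers from writing an explicit conversion), Foo returns the formatted string instead of printing it, and greeting and dynamic are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// compileTimeString mirrors the unexported type from the snippet above.
// Assumption: in real use it would live in its own package (fooer).
type compileTimeString string

// Foo accepts only values assignable to compileTimeString.
// Unlike the original, it returns the string so the result can be inspected.
func Foo(s compileTimeString) string {
	return fmt.Sprintf("%s", s)
}

// greeting is an untyped string constant (hypothetical name).
const greeting = "hello"

func main() {
	fmt.Println(Foo("bar"))     // untyped literal: converted implicitly
	fmt.Println(Foo(greeting))  // untyped constant: accepted as well
	fmt.Println(Foo("a" + "b")) // constant expression: accepted as well

	// dynamic has type string and is computed at run time.
	dynamic := fmt.Sprint("user", "input")
	_ = dynamic
	// Foo(dynamic) // does not compile: a string variable is not a compileTimeString
}
```

Note that the trick admits any untyped string constant, not just literals, so named constants and constant expressions pass too.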

In the case of Go, this feels like an obscure property of a language that otherwise embraces simplicity and clarity.

What Has Just Happened?

Let’s take another look at the problems we’ve tried to solve:

static assertion on a compile-time expression,
ensuring basic properties of a type we’re writing a function for (e.g. it has a given method),
expecting compile-time literal values as opposed to dynamic (possibly user-controlled) ones.

As you can see, none of the presented solutions are easy to read. If you haven’t seen any of these before, I expect you to scratch your head a lot and still run out of patience before getting it. The problem isn’t you.

A deceptively simple way to stop this madness would be to extend the language itself to support such use cases. But there are many reasons why languages don’t add first-class support for some features.

Why Can’t My Language Just Support X?

Firstly, we need to consider whether we want a rich feature set to begin with, regardless of the implementation cost.

Then, we should consider what it takes to have an implementation of a language (library) with many features.

Lastly, we should consider the cost of adopting new features in a language (library) that already exists (and has many, many users).

Do We Really Want Lots of Features?

The Ramp-Up Cost

The bigger³ the language is, the higher the ramp-up cost. In your project, this might be negligible if new contributors are rare. Even so, it is a high barrier to entry for folks who are trying to learn a new technology in their spare time.

Sometimes the high ramp-up cost is worth the productivity benefits one gains when familiar with the tool. For example, Rust’s borrow checking seems hard to learn at first but eliminates whole classes of bugs that C programmers have been struggling with for decades.

There’s More Than One Way

Quite often, an unintended consequence of adding a feature is that it opens up more than one way of doing the same thing. At first glance (and probably at second, too), this shouldn’t be an issue. What’s wrong with having options, right?

Subtle Differences

One of the problems lies in the fact that the differences between approaches are often subtle, but vital in some scenarios. Examples include:

pre-incrementation and post-incrementation (++x and x++),
the multitude of ways one can initialize an object in C++,
equality checks in JavaScript or PHP.

Because these solutions actually differ in some scenarios, their pros and cons need to be considered. These (opinionated) analyses make their way into coding style guides, tutorials, and blog posts. One now needs to learn which approach is preferred, which should be used in obscure cases, and which should be avoided at all costs, even though they all look equally good at solving the problem at hand.

Consistency

One more benefit of having exactly one way to solve a problem is consistency. Working with a unified codebase makes things easier for developers switching between projects, as their ramp-up doesn’t involve getting used to a new way of doing things. Static analysis tooling (more on that later) becomes simpler as there are fewer cases to support. Enough consistency can also enable Large-Scale Changes.

Context Heavy Code

Let’s take a look at the following snippet:

store(db, x)

What does this code do? Well, it depends on whether the language:

supports implicit conversions to mutable references (e.g. void store(DB &db, Val &x) in C++, as opposed to a store(&mut db, x) call in Rust) – this might allow both db and x to be modified by store(),
supports implicit user-defined conversions (e.g. what if x is not the right type, but could be implicitly converted to a type that matches the signature of store()?),
features borrowing (e.g. like Rust),
supports non-strict evaluation,
and many other things.

Each of the above features expands the ocean of possibilities of what this single line of code can do. This burdens the reader with keeping lots of context in mind, increasing the overall cognitive load needed to understand the program. This is especially taxing when the technology stack of your project isn’t uniform and you need to consciously keep track of which set of patterns you should recognize at a given time.
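Even a language with few of these features leaves some of the question open. As a rough sketch in Go (structDB, mapDB, storeStruct, and storeMap are hypothetical names, not taken from any real codebase), two calls that look identical at the call site differ in whether they mutate their first argument, and only the declared types tell them apart:

```go
package main

import "fmt"

// structDB and mapDB are hypothetical types used only for illustration.
type structDB struct{ last int }
type mapDB map[string]int

// storeStruct receives a copy of db, so the caller's value is untouched.
func storeStruct(db structDB, x int) { db.last = x }

// storeMap receives a map value that still refers to the caller's data.
func storeMap(db mapDB, x int) { db["last"] = x }

func main() {
	s := structDB{}
	m := mapDB{}

	storeStruct(s, 42) // looks the same at the call site...
	storeMap(m, 42)    // ...but only this call changes its argument

	fmt.Println(s.last)    // 0
	fmt.Println(m["last"]) // 42
}
```

The reader has to know how db is declared before they can say what the call does, which is exactly the kind of context the call site does not show.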

Idioms

Engineers strive to write idiomatic code. We tend to like patterns, as they let us understand things at a glance (if we’re familiar with them). For example, to create a list of integer squares, one would write in Python:

z = [ i*i for i in range(0, 10) ]

which uses a construct called list comprehensions.

In Go, which doesn’t feature list comprehensions, one writes simply:

var z []int
for i := 0; i < 10; i++ {
	z = append(z, i*i)
}

and both code snippets are idiomatic in their respective languages. It would not surprise me if you got a code review comment requesting a change after using a for loop in Python for this task, but in Go, there is simply no alternative.

Sometimes, adding a feature completely changes the feel of a language.

Making Things Error-Prone

In C (and other languages⁴), one can use assignment as an expression (e.g. in an if condition). Code like this is fairly common:

while ((c = getchar()) != EOF) {
	// ...
}

This, unfortunately, makes many other seemingly innocent programs error-prone. Consider this snippet:

while (x == y) {
	// ...
}

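Now, one can easily make a typo, or overlook a missing = sign during a code review. With a single =, the condition becomes while (x = y), which still compiles: it assigns y to x, and the loop runs for as long as the assigned value is non-zero. That would be a bug that silently mutates x (and the check would likely pass, too).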

Gotcha!

Some features are just plain weird and complicated. There are usually some reasons behind introducing them, but they are scary nevertheless. It seems to be a good idea to make them obvious (like Rust does when you use unsafe), but this often doesn’t happen. Consider the following:

// Definitions of the Bar interface and BarImpl struct are omitted.

func Foo() Bar {
	var bar *BarImpl
	if bar == nil {
		fmt.Println("Abra")
	}
	return bar
}

func main() {
	if Foo() == nil {
		fmt.Println("Kadabra")
	}
}

The code above compiles without any warnings or errors. Can you guess what it prints⁵?

Abra

If you are not familiar with Go, this might be a surprise to you. In Go, a nil pointer to a concrete type is not the same as a nil interface value⁶: once the nil *BarImpl is returned as a Bar, the interface value carries the dynamic type *BarImpl, so it no longer compares equal to nil. This is a feature that is sometimes convenient⁷, but utterly confusing to new Gophers.

Undefined Behavior

One infamous feature of languages like C or C++ is so-called undefined or unspecified behavior. For example, the order of evaluation of function call arguments is unspecified in C++. This can lead to surprising errors like the memory leak described here.

Not defining the precise behavior of a given language construct is not done out of laziness. It gives the compiler enough freedom to perform plenty of optimizations. These can even be platform-specific, making it possible to create efficient ports of software.

The question remains: how much flexibility is too much?

What Is the Cost of Implementing a Feature?

At some point, we are convinced that a new feature belongs in a language (library). It fits well and improves the ergonomics. For it to be useful, it needs to be implemented. In the cases where this is possible (it isn’t always), it might take a long time to do it right. Even then, we might have inadvertently changed how it feels to use the language (library) as a whole.

In the next post, we’ll talk more about the (lasting) costs of adding a feature to your language (library).

If you want to discuss this post or give me feedback (much appreciated), please use Twitter: https://twitter.com/kele_codes/status/1420289305176576002.

Notes

1. Remote Procedure Call.
2. https://mobile.twitter.com/TitusWinters/status/1418569362647232518
3. For some definition of bigger.
4. There was some relatively recent drama involving the introduction of such a feature in Python, playfully named the walrus operator (“:=”). It’s not exactly the same as in C, though.
5. https://play.golang.org/p/YQ1npl7H2CE
6. https://golang.org/ref/spec#Variables: “Variables of interface type also have a distinct dynamic type, which is the concrete type of the value assigned to the variable at run time (unless the value is the predeclared identifier nil, which has no type).”
7. You can access a protocol buffer field with a chain like x.GetFoo().GetBar().GetXyz() without any worries about panics. This is because the methods are well-defined on nil receivers (of the concrete type).