The importance of crashing early

Robin Boeckxstaens - June 3

Reading time: 3 min

Some fellow software developers argue that a software crash is a sign of poorly written or unreliable code. It’s a popular notion, but I think it’s a misconception: crashing can actually be very useful in software development.

Don’t get me wrong. Systematic error handling is essential in any piece of software, and it is vital in software running on high-uptime devices or systems. You’re not likely to agree to surgery if you know that the cardiac monitor used during the procedure tends to stop working now and then without warning. Similarly, the authorities aren’t going to allow driverless cars on the roads unless they’re quite sure that the software won’t end up in an undefined state. Few would dispute that, in the event of failure, such systems should do everything possible to at least signal that there’s a problem and, in the case of a car, bring the vehicle to a safe halt. People can then make their own decisions about its safety.

Catching ’em all can make your code very complex

But high-uptime systems are a special category of software, and not all code has to meet their stringent requirements. Yet many people, software engineers among them, believe that the need to ‘catch them all’ (sometimes aptly called the ‘Pokémon anti-pattern’) applies universally. Does it? There are two reasons why it doesn’t.

First, the need to catch every possible exception and error can make your code unnecessarily hard to read and maintain. Take a simple example: what about the cases we assume will never happen?

pointerType somePointer = doSomethingThatReturnsSomePointer();
if (!somePointer) // should not happen
    return; // thus, do nothing and ignore error

This is something I’ve seen many times: a developer assumes something should not happen, and so writes a form of handling that essentially ignores the facts and carries on. The motivation is the desire to handle every possible error. But what if the assumption was wrong and the event can happen, for example because the called function has since changed? Because this path was never properly thought through, it leaves as much room for undefined behavior as having no check at all: a return value left at 0 that causes a division by zero, an uninitialized pointer that causes a crash later on, an endless loop because something was not set correctly, or a user action that looks like it worked but actually did nothing.
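
To make the last failure mode concrete, here is a minimal C++ sketch of the ‘ignore and carry on’ pattern. The `documents` store, `findDocument()` and `renameDocument()` are hypothetical names invented for this illustration: the lookup returns `nullptr` for an unknown id, and the caller silently swallows that case, so a failed rename is indistinguishable from a successful one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory store, used only for illustration.
std::map<int, std::string> documents = {{1, "draft.txt"}};

// Hypothetical lookup: returns nullptr when the id is unknown.
std::string* findDocument(int id) {
    auto it = documents.find(id);
    return it == documents.end() ? nullptr : &it->second;
}

// 'Catch them all' style: the null check swallows the error.
// The caller gets no signal, so a failed rename looks like a success.
void renameDocument(int id, const std::string& newName) {
    std::string* doc = findDocument(id);
    if (!doc)      // should not happen
        return;    // thus, do nothing and ignore error
    *doc = newName;
}
```

Calling `renameDocument(2, "report.txt")` with an id that does not exist returns normally and leaves `documents` untouched. Nothing tells the caller, or the user, that the rename never happened.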

Complicated

What would you think, for example, of a programming error in Facebook that accidentally overwrote your relationship status? It’s complicated? Try to explain that to your life partner …

How about trying to cover this path properly, then? Well, this is the point where we could start to question whether we should. Let’s say that a piece of software is not part of a critical system or service, so you don’t need to catch every error and exception per se. Instead, you could, for example, use assert() statements to make your assumptions explicit and your code easier to understand.

#include <cassert>

pointerType somePointer = doSomethingThatReturnsSomePointer();
assert(somePointer);

As you can see, the code is shorter, yet it is more explicit than the earlier example. It tells other developers that we assume, or know, that this path should not happen, which makes the code a better fit for humans.
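
For readers who want to try this, below is a self-contained C++ sketch of the assert() version. `documents`, `findDocument()` and `renameDocument()` are hypothetical stand-ins for `doSomethingThatReturnsSomePointer()` and its caller; the only real API used is the standard `assert` macro from `<cassert>`.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory store, used only for illustration.
std::map<int, std::string> documents = {{1, "draft.txt"}};

// Hypothetical lookup: returns nullptr when the id is unknown.
std::string* findDocument(int id) {
    auto it = documents.find(id);
    return it == documents.end() ? nullptr : &it->second;
}

// The assumption is stated explicitly instead of being silently handled.
// If a caller ever passes an unknown id, a debug build stops right here
// and reports the failed expression along with its file and line.
void renameDocument(int id, const std::string& newName) {
    std::string* doc = findDocument(id);
    assert(doc != nullptr && "renameDocument: id should always exist");
    *doc = newName;
}
```

The string literal inside the assert is a common idiom: it is always truthy, so it does not change the condition, but it shows up in the failure message and explains the assumption to whoever hits it.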

The closer to the cause, the easier the debugging

If the “should not happen” assumption turned out to be wrong, the assert would make the program crash immediately in a debug build and, depending on the type of assert, possibly in production as well. Such a crash would also stop the program from causing trouble when the input data doesn’t meet expectations; without it, the program would run into exactly those undefined paths mentioned above.
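
The type of assert matters. The standard `assert()` from `<cassert>` expands to nothing when `NDEBUG` is defined, which is common for release builds, so it only crashes in debug builds. If you want an assumption checked in production too, one option is a small always-on macro. `ALWAYS_ASSERT`, `splitDebugChecked()` and `splitAlwaysChecked()` are hypothetical names for this sketch, not part of any standard library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on check: unlike assert(), it is not affected by
// NDEBUG, so it is evaluated in debug and release builds alike.
#define ALWAYS_ASSERT(cond)                                            \
    do {                                                               \
        if (!(cond)) {                                                 \
            std::fprintf(stderr, "%s:%d: assumption failed: %s\n",     \
                         __FILE__, __LINE__, #cond);                   \
            std::abort();                                              \
        }                                                              \
    } while (0)

// Debug-only check: the assert disappears when NDEBUG is defined,
// and a zero divisor then causes undefined behavior.
int splitDebugChecked(int total, int parts) {
    assert(parts > 0);
    return total / parts;
}

// Always-on check: aborts in every build type before dividing.
int splitAlwaysChecked(int total, int parts) {
    ALWAYS_ASSERT(parts > 0);
    return total / parts;
}
```

Calling `std::abort()` rather than throwing keeps the behavior close to a failed `assert()`. The program stops at the exact point where the assumption broke, and that is what makes the crash useful.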

The fact that a program can run into unexpected areas also illustrates the second reason why letting a program crash early can be useful: it makes debugging easier. In general, it’s better to fail fast than to fail slowly. The closer you are to the cause of the crash or problem, the easier it is to correct both the code and, if needed, the related data. Going even further, making an assumption explicit in the code can prevent that data from being corrupted in the first place.
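
Here is a minimal sketch of the data-corruption point. It assumes a hypothetical `RunStats` input record and a hypothetical `unitsPerHourLog` vector standing in for persisted data. The check runs before anything is written, so a bad record stops the program at its source rather than leaving a corrupted entry to be discovered much later.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <vector>

// Hypothetical input record.
struct RunStats {
    int unitsProduced;
    int machineHours;
};

// Hypothetical stand-in for persisted results.
std::vector<double> unitsPerHourLog;

void recordThroughput(const RunStats& run) {
    // Fail fast: validate the assumption before writing anything.
    // Without this check, machineHours == 0 would store a non-finite value
    // (with IEEE 754 doubles) that might only surface much later.
    if (run.machineHours <= 0) {
        std::cerr << "recordThroughput: machineHours must be positive, got "
                  << run.machineHours << '\n';
        std::abort();
    }
    unitsPerHourLog.push_back(static_cast<double>(run.unitsProduced) / run.machineHours);
}
```

The design choice is where the check lives. Placing it where data enters, rather than where the data is later read, keeps the crash next to its cause.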

Crashing — nothing to be afraid of

This is not to say that crashing should be high on the list of options. On the contrary, in most cases it is better to handle errors more gracefully. But that is not dogma. In certain circumstances it’s really good to make a program crash right away when something out of the ordinary happens. In the software business, you shouldn’t be afraid of crashing. Here, unlike in real life, crashing early could even save your life.

Care to share your thoughts?

Robin Boeckxstaens

Software Development Manager at OMP Belgium

Biography

Robin joined OMP in 2013 and currently specializes in developing software solutions and overseeing projects that help packaging companies optimize their supply chains. He is a specialist in everything embedded, which turned out to be useful for OMP’s Manufacturing Execution System and its machine links.