Everyday Problem Solvers
Here’s an interesting thought question from Mike Stall: what’s worse than crashing?
Mike provides the following list of crash scenarios, in order from best to worst:
Mike points out that there’s a natural tension between…
The philosophy behind “fail fast” is best explained in Jim Shore’s article (pdf).
Some people recommend making your software robust by working around problems automatically. This results in the software “failing slowly.” The program continues working right after an error but fails in strange ways later on. A system that fails fast does exactly the opposite: when a problem occurs, it fails immediately and visibly. Failing fast is a nonintuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production.
Fail fast is reasonable advice– if you’re a developer. What could possibly be easier than calling everything to a screeching halt the minute you get a byte of data you don’t like? Computers are spectacularly unforgiving, so it’s only natural for developers to reflect that masochism directly back on users.
But from the user’s perspective, failing fast isn’t helpful. To them, it’s just another meaningless error dialog preventing them from getting their work done. The best software never pesters users with meaningless, trivial errors– it’s more considerate than that. Unfortunately, attempting to help the user by fixing the error could make things worse by leading to subtle and catastrophic failures down the road. As you work your way down Mike’s list, the pain grows exponentially. For both developers and users. Troubleshooting #5 is a brutal death march, and by the time you get to #6– you’ve lost or corrupted user data– you’ll be lucky to have any users left to fix bugs for.
What’s interesting to me is that despite causing more than my share of software crashes and hardware bluescreens, I’ve never lost data, or had my data corrupted. You’d figure Murphy’s Law would force the worst possible outcome at least once a year, but it’s exceedingly rare in my experience. Maybe this is an encouraging sign for the current state of software engineering. Or maybe I’ve just been lucky.
So what can we, as software developers, do about this? If we adopt a “fail as often and as obnoxiously as possible” strategy, we’ve clearly failed our users. But if we corrupt or lose our users’ data through misguided attempts to prevent error messages– if we fail to treat our users’ data as sacrosanct– we’ve also failed our users. You have to do both at once:
The guiding principle here, as always, should be to respect your users. Do the right thing.