Bugs! No matter how many times I decree that my coworkers and I must stop writing bugs, we keep on doing it anyway. Even worse, sometimes those bugs make it into production, where users run into them!
The fact of the matter is, you are going to someday release a buggy app. Even with layers of defenses (like QA, automated tests, and CI) you'll eventually put out code that will bring shame to your name.
Therefore, you should have contingency plans in place for when you do release bugs. Here are some hard-earned lessons we've learned over the years about safely releasing apps.
First things first: you need to know when you’ve released a buggy app.
As comforting as it would be to push releases out to the wild and never think about them again, you absolutely need some way to monitor how your app is doing in the field.
Back when I was a wee baby app developer, I included zero remote logging. Imagine my surprise when I got a support email from a customer telling me the app they'd paid for was crashing, constantly! Not only was that news to me, but I also had nothing to go on for reproducing the issue.
There are plenty of great services these days for remote monitoring that not only tell you when the app crashes but also give you stack traces and logs so you can debug the issue. For Trello Android, we use Crashlytics. At an absolute minimum, you'll want remote crash reporting. Remote logging is also useful for catching problems that users hit but that don't crash the app.
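A service like Crashlytics handles all of this for you, but it helps to know roughly what "remote crash reporting" involves. On the JVM, crash reporters typically hook `Thread.setDefaultUncaughtExceptionHandler`, capture the stack trace, and hand it off for upload. Here's a minimal sketch of that idea; the `CrashSink` interface (standing in for whatever actually uploads the report) and the report format are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical destination for crash reports; a real service's SDK would upload these.
interface CrashSink {
    void send(String report);
}

final class CrashReporting {
    private CrashReporting() {}

    /** Formats the app version, thread name, and full stack trace into one report string. */
    static String buildReport(Thread thread, Throwable error, String appVersion) {
        StringWriter trace = new StringWriter();
        PrintWriter writer = new PrintWriter(trace);
        error.printStackTrace(writer);
        writer.flush();
        return "version=" + appVersion + "\nthread=" + thread.getName() + "\n" + trace;
    }

    /**
     * Installs a default handler that records the crash, then defers to whatever
     * handler was installed before, so the original crash behavior still happens.
     */
    static void install(CrashSink sink, String appVersion) {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            try {
                sink.send(buildReport(thread, error, appVersion));
            } finally {
                if (previous != null) {
                    previous.uncaughtException(thread, error);
                }
            }
        });
    }
}
```

Chaining to the previously installed handler matters: on Android, that is typically the platform's own handler, which shows the crash dialog and ends the process.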
Alphas, Betas, and Staged Rollouts
One easy way to limit the damage of your buggy app is to release it to fewer people.
Staged rollouts are a great tool for this. If you only push your crashy app to 1% of users, then you're screwing over fewer people. Not ideal, but it could have been all of them!
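The Play Store handles staged rollouts for you: you pick a percentage in the Play Console. If you ever need the same kind of gradual exposure for something you control yourself (say, a server-driven change), the usual trick is to hash a stable user ID into one of 100 buckets. Here's a minimal sketch of that idea; `StagedRollout` and the choice of CRC32 as the hash are illustrative assumptions, not how Google Play does it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class StagedRollout {
    private StagedRollout() {}

    /** Maps a stable user ID to a bucket in [0, 100); the same ID always lands in the same bucket. */
    static int bucketFor(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** True if this user falls inside the first {@code percent} buckets. */
    static boolean isInRollout(String userId, int percent) {
        if (percent <= 0) return false;
        if (percent >= 100) return true;
        return bucketFor(userId) < percent;
    }
}
```

Because a user's bucket never changes, raising the percentage from 1 to 10 only adds people; nobody who already has the release gets pulled back out of it.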
In addition, you should have an alpha and/or beta tester program. Your testers willingly opt into less-than-stable releases and make for a great early warning system. I wouldn't unleash knowingly crashy apps on them (lest you convert them from "testers" to "former testers"), but I would toss them new features that haven't been fully QA'd yet.
At Trello, we have a beta program (which you can sign up for on the Play Store) where we regularly release beta versions of the app. We're only a handful of developers, and users run into all sorts of situations we hadn't anticipated. We are very thankful for our great beta users, who uncover so many interesting bugs and crashes!
Remote Feature Flags
I am a huge fan of feature flags that allow you to enable or disable features in your application.
They're great for developing new features. If you've got a project that's going to take a few months, it's much better to keep merging code as you go but leave the feature flagged off for users. That lets your devs and testers dig into the new feature while it stays hidden in production.
You can boost their utility even further by making them remotely configurable. That way, if something goes wrong when you first release that big new feature, you can disable it without scrambling to put out another release.
For Trello Android, we use Firebase Remote Config to control our feature flags. Your solution need not be anything too complex; before Firebase, we just used a simple JSON file on Trello's servers.
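Whether the flag values come from Firebase Remote Config or a JSON file on your own server, the client side can stay simple: compiled-in defaults, overridden by whatever the remote source says. Below is a minimal sketch assuming the remote values have already been fetched and parsed into a `Map<String, Boolean>`; `FeatureFlags` and the flag names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class FeatureFlags {
    // Compiled-in defaults, used when the remote fetch fails or omits a flag.
    // These flag names are hypothetical examples.
    private static final Map<String, Boolean> DEFAULTS = Map.of(
            "new_board_editor", false,
            "offline_sync_v2", false);

    // Latest remote overrides; replaced wholesale on each successful fetch.
    private volatile Map<String, Boolean> remote = Collections.emptyMap();

    /** Replaces the remote overrides, keeping only flags this build knows about. */
    void applyRemote(Map<String, Boolean> fetched) {
        Map<String, Boolean> known = new HashMap<>();
        for (Map.Entry<String, Boolean> entry : fetched.entrySet()) {
            if (DEFAULTS.containsKey(entry.getKey()) && entry.getValue() != null) {
                known.put(entry.getKey(), entry.getValue());
            }
        }
        remote = Collections.unmodifiableMap(known);
    }

    /** Remote value if present, else the compiled-in default, else off. */
    boolean isEnabled(String flag) {
        Boolean override = remote.get(flag);
        if (override != null) return override;
        return DEFAULTS.getOrDefault(flag, false);
    }
}
```

Keeping the defaults in the app means a failed fetch falls back to known-safe behavior, and ignoring unknown keys means a typo on the server can't switch on something this build doesn't know about.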
Avoid releasing your app right before any sort of work break. Weekends, vacations, conferences, jury duty... don't release before any of them!
Why? If something goes horribly wrong and you need to roll out a fix, your break is now over. Worst-case scenario, you can't get hold of a key player and now you've got an awful bug floating around for a few days.
I have personally made this mistake countless times. "What's the worst that could happen," I ask myself. "It's so much more convenient to release this Friday than next week," I say.
The end result: I spent one Google I/O keynote banging out a hotfix for a release, hoping the network wouldn't go down (again) as I uploaded a new APK. I spent another Google I/O releasing multiple hotfixes for a buggy app; unfortunately, my sleep-deprived state also deprived me of my brains. Learn from my mistakes: do not release right before Google I/O!
Let him who has never released an app that accidentally DDoS'd his own servers cast the first stone...
Okay, so, I did that once. Whoops.
Luckily, we had a user agent that identifies the Trello Android app and the current version. The server team put up an emergency measure to block requests from that particular user agent (to avoid taking down all of Trello) while we worked on a hotfix.
I recommend putting something into your requests that identifies the source app and its version, just in case the server needs to insert some logic for that particular release to cover for your mistakes.
It's not a measure the server team should take lightly: writing code paths for one specific user agent is a last-ditch effort. But it can really save your hiney in select circumstances.
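A minimal sketch of such an identifier, in Java. The `<AppName>-Android/<versionName> (build <versionCode>; Android <release>)` format and the `AppUserAgent` class are assumptions for illustration. On Android you'd more likely attach the header in your HTTP client (for example, an OkHttp interceptor), but `URLConnection.setRequestProperty` shows the idea with only the standard library.

```java
import java.net.URLConnection;

final class AppUserAgent {
    private AppUserAgent() {}

    /**
     * Builds a User-Agent value naming the app and its exact build, e.g.
     * "ExampleApp-Android/4.2.0 (build 4200; Android 13)". The format is an assumption;
     * what matters is that the server can reliably pick out app + version.
     */
    static String build(String appName, String versionName, int versionCode, String osRelease) {
        return appName + "-Android/" + versionName
                + " (build " + versionCode + "; Android " + osRelease + ")";
    }

    /** Attaches the header to a connection that has not connected yet. */
    static void applyTo(URLConnection connection, String userAgent) {
        connection.setRequestProperty("User-Agent", userAgent);
    }
}
```

Including the numeric build code, not just the marketing version name, gives the server an unambiguous value to match on.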
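On the server side, that last-ditch code path can be as small as a check against a blocklist of builds. This is a hypothetical Java sketch that assumes the client sends a User-Agent like `ExampleApp-Android/4.2.0 (build 4200; Android 13)`; `ReleaseBlocklist` is an illustrative name, and what the server returns for rejected requests (an error status, a cached response) is up to you.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReleaseBlocklist {
    // Matches the assumed format "<App>-Android/<versionName> (build <versionCode>; ...".
    private static final Pattern UA =
            Pattern.compile("^(\\S+)-Android/(\\S+) \\(build (\\d{1,9});");

    private final String appName;
    private final Set<Integer> blockedBuilds;

    ReleaseBlocklist(String appName, Set<Integer> blockedBuilds) {
        this.appName = appName;
        this.blockedBuilds = Set.copyOf(blockedBuilds);
    }

    /** True only if the request comes from one specific build we chose to turn away. */
    boolean shouldReject(String userAgent) {
        if (userAgent == null) return false;
        Matcher m = UA.matcher(userAgent);
        if (!m.find()) return false; // Unrecognized clients are never blocked by this rule.
        return m.group(1).equals(appName)
                && blockedBuilds.contains(Integer.parseInt(m.group(3)));
    }
}
```

Anything that doesn't parse is let through, so the rule can only ever affect the bad release it names.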
In extreme cases, you may want to remotely disable the entire app and force users to upgrade. At Trello, we have a remote flag which can be used to render an old version of the app inoperable.
Do NOT take killswitches lightly: you're also killing a lot of goodwill with your users when you activate one. However, they're good to build into your code just in case, for peace of mind. Better to have one and not need it than to need one and not have it.
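One common way to build a killswitch (not necessarily Trello's exact mechanism) is a remotely configured minimum supported version code: any build below it shows a forced-upgrade screen instead of the app. A minimal sketch under that assumption; `KillSwitch` and its parameters are hypothetical.

```java
final class KillSwitch {
    private KillSwitch() {}

    /** Outcome of comparing this build against the remotely configured minimum. */
    enum Status { OK, FORCE_UPGRADE }

    /**
     * @param installedVersionCode the running build's version code
     * @param remoteMinVersionCode minimum supported build from remote config,
     *                             or null if the fetch failed (hypothetical parameter)
     */
    static Status check(int installedVersionCode, Integer remoteMinVersionCode) {
        if (remoteMinVersionCode == null) {
            return Status.OK; // Fail open: a flaky network must never lock users out.
        }
        return installedVersionCode < remoteMinVersionCode ? Status.FORCE_UPGRADE : Status.OK;
    }
}
```

Note the fail-open choice: if the remote value can't be fetched, the app keeps working. A killswitch that trips on a bad network connection would do more damage than most bugs.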
For the record, we have never had to use our killswitch, so I have no fun stories about it. I'm sure that whenever we do use it, the story will make for a good post-mortem.
Ignore the Little Things
Finally: chill out.
Chances are, your app is not controlling Hawaii's missile alert system. If the app crashes once, or something minor goes wrong, it is not the end of the world.
You would be surprised how forgiving users are of one-off crashes or issues with easy workarounds. As long as they like your app as a whole, they'll keep using it.
Your first instinct may be to rush another release out the door to fix the bug. Resist the urge! Rushed releases are much more error-prone. Unless the issue is critical, take your time to fix it correctly.
We all release buggy apps, but if you go about it in a reasonable manner, you can live to tell the tale. Follow the steps above and you'll sleep better at night.
This article was originally posted on the Trello engineering blog and has been reproduced here for posterity.