When Your Source of Truth is Reality

Let me tell you a story about why one should be cautious with databases that track the real world.

I play trombone in a local community orchestra. For the past two years, I’ve been the board secretary; part of my duties is keeping a roster of who is in the orchestra.

We’ve got a database (aka spreadsheet). I’m an organized person. Simple enough, right?

Wrong! Keeping an up-to-date, accurate roster was a nearly impossible goal. The reasons are numerous:

  • Every month, a few random people trial the orchestra. Some of them never check in with me to let me know they’re actually joining.
  • Likewise, some members drop out by ghosting. They would rather receive weekly emails for years than formally quit.
  • Some people only show up intermittently to practices, making it hard to tell if they’ve quietly dropped out or are just sick/busy.
  • Some people act on their own without telling anyone, like when the trumpet section invited three guest trumpet players to a concert with no advance warning.

And you know what? This is fine. It’s a casual community orchestra. People might be left off the weekly email list or concert program. We play on.


What if the database did need to be exact?

That’s tough because I’m increasingly convinced that there will always be a gap between a database and the real-world stuff it’s tracking.

Remember that time when JPMorgan thought it had $1.3MM of nickel but it was actually just bags of rocks? Or when forest-based carbon offsets are destroyed in a wildfire? Or the myth of “blue zones” (places where people live extra long) being debunked as simply bad record keeping and/or pension fraud? Or let’s make it more personal - your project is listed as due on Friday but you know will take another two weeks?

There’s a practice in database design known as single source of truth, wherein you normalize your schema such that no data is duplicated. Otherwise, you can end up in a situation where there’s two different values for a single piece of data - if one table says my name is “Dan” and the other says “Daniel”, which one is correct?

The problem with any database tracking real-world objects is that your single source of truth is reality. That means whatever is in the database is NOT a source of truth. As such, it doesn’t matter how well-designed your database or application is - you can get into trouble because the real world can change at any time.

Digital services rely on databases, and rightly so! It’s extremely useful to digitize the real world. They don’t call it the “computer revolution” for nothing. But every database has to account for when reality doesn’t agree with your data.

For example, my last job involved inventory tracking, and a constant concern of ours was “what if our database says there’s 5 chairs in the warehouse, but when they’re requested, there’s actually only 4?” Instead of simply assuming everything would work out, we needed to make sure our workflows could handle exceptions like “the data is wrong.”

(This problem is one of many reasons why I’m skeptical of blockchain technology - immutability can be a bad thing because there are often exceptions.)

My advice: be humble enough to recognize that your database plays second fiddle to reality. Have a plan to handle the schism between your data and reality. It’s not a question of “if” the data drifts, but “when”, and you need to account for that.