I have this terrible habit of subtweeting during meetings. What can I say? It’s a reasonably effective way for me to blow off the stress associated with a discussion that’s driving me a little bit crazy without inflicting said crazy on the room.
Except this particular subtweet went viral.
I was amazed, honestly. I didn’t think this statement was particularly profound, but for some reason it resonated with a lot of people. One thousand retweets and one thousand likes are both within striking distance.
Anyway, as the conversation has continued throughout the past week, I thought it might be fun to tell this tweet’s story in another medium (see what I did there?) and deal with some of the feedback in more than 140-character snippets.
Without naming names, I was watching a presentation centered around a proposed set of enhancements to a certain cloud application platform that would make the agile feedback loop a first-class citizen. An ancillary point was raised about the need to facilitate deploying a group of related microservices as a set and in a specific order. I immediately started to twitch just a little, and fired off the tweet. You see, I’ve heard this particular requirement several times before. And I’ve never understood it.
I believe the greatest power to be found in what we’re calling “microservice” or “cloud native” architectures is the notion of absolute decoupling of deployment lifecycle.
When I discuss these architectures with my customers, I often use analogies like “change out the engines on an aircraft while it’s in flight” and “replace the cells on a living organism.” In an always-on services world, I simply don’t see how you can create a 24/7 responsive user experience, and deploy new features, without the ability to replace any component of your system at will. And to further the point, I don’t see how you can make that system secure (see my colleague Justin Smith’s post below for context) without this ability either.
So, if you’re telling me that you’re willing to punt on the supreme value add of microservices, then why would you be willing to take on all of the additional pain that comes with a distributed system?
Read the blog above. Read the linked content. Keep going. If you don’t figure out immediately that distributed systems are HARD, then I’m not sure you’re paying attention. As a system architect, I would never agree to pay the distributed systems tax unless I was getting a clear and valuable benefit.
Going on to some feedback, my good friend Dan Woods brings up the following:
Well played, Dan. You’re absolutely right. There are in fact plenty of examples of distributed systems where order matters. But there are many more examples where order absolutely does not matter. You seldom get that order independence by luck, though. The overwhelming majority of the time you must design it into the system. It’s your responsibility to make each component not care if one of its dependencies is present. In fact, this sounds suspiciously like fault tolerance. Why do we care about fault tolerance? Because we know that the probability of at least one failure somewhere in a distributed system climbs rapidly toward certainty as the number of nodes grows! One of your lovely little microservices is going to get sick on a fairly regular basis, and what are the rest of your microservices going to do? If you can’t even orchestrate your initial deployment if something’s missing, how in the world are you going to survive production?
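To put a rough number on that failure argument, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every node is independently unavailable with the same probability at any given moment; real failures are often correlated, so treat this as intuition rather than a model of your system. The function name and the 1% figure are mine, not measurements from anywhere.

```python
def prob_any_node_down(p_node_down: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is down.

    Assumes each node fails independently with the same probability
    (an illustrative simplification, not a claim about real systems).
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # P(at least one down) = 1 - P(all up) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_node_down) ** nodes


if __name__ == "__main__":
    # Hypothetical 1% chance that any single node is down right now.
    for n in (1, 10, 50, 100):
        print(f"{n:>3} nodes: {prob_any_node_down(0.01, n):.1%}")
```

Under those assumptions, a 1% per-node chance of being down turns into roughly a 10% chance that something is down at 10 nodes, and better than even odds by 100 nodes.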
So if you’re going to embrace distributed systems, please start from the first principle that every node in your system can at least start and run in a gracefully degraded state regardless of what else is transpiring in the system. If (and only if) you come to a point where this is seemingly impossible (prove it if you can), introduce an ordering dependency.
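Here is one way that first principle might look in code. This is a minimal sketch, not a recipe: `StorefrontService`, the recommendations dependency, and the fallback list are all hypothetical names I made up for illustration. The two ideas it tries to show are that construction never touches the dependency (so startup order doesn't matter), and that a failed call degrades the response instead of failing it.

```python
from typing import Callable, Dict, List


class StorefrontService:
    """Hypothetical service that depends on a recommendations service."""

    def __init__(
        self,
        fetch_recommendations: Callable[[str], List[str]],
        fallback: List[str],
    ) -> None:
        # Nothing is called or connected here, so this service can start
        # whether or not the dependency has been deployed yet.
        self._fetch = fetch_recommendations
        self._fallback = list(fallback)

    def home_page(self, user_id: str) -> Dict[str, object]:
        try:
            items = self._fetch(user_id)
            degraded = False
        except (ConnectionError, TimeoutError):
            # Dependency is missing or sick: serve a generic answer
            # rather than refusing to serve at all.
            items = list(self._fallback)
            degraded = True
        return {"user": user_id, "recommendations": items, "degraded": degraded}


def recommendations_not_deployed(user_id: str) -> List[str]:
    """Stand-in for a dependency that is not reachable yet."""
    raise ConnectionError("recommendations service is unreachable")


if __name__ == "__main__":
    service = StorefrontService(recommendations_not_deployed, ["bestsellers"])
    print(service.home_page("user-42"))
```

A real implementation would likely add timeouts, retries, or a circuit breaker around the call, but the shape stays the same: the ordering requirement disappears because nothing at startup needs the other service to exist.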
It’s just like the notion of database transactions. They’re absolutely useful. And over the course of the last couple of software engineering decades, we’ve made it stupid easy to add transactional behavior to applications. In doing this, somewhere along the way we became convinced that everything should exist within a transaction. There is no business behavior that can exist without ACID!
The interesting thing that I see is that almost nothing seems to be ACID in the real world. Processes truly requiring ACID really do exist, but there really aren’t that many of them.
Distributed systems requiring complex orchestration really do exist, but there really shouldn’t be that many of them.
So what now?
Time for the standard consultant answer: it depends.
Should you recompose all of your microservices into a monolith? Or should you apply foundational distributed systems patterns to eliminate your orchestration issue?
I don’t know your context. I don’t know which of these paths is going to be successful for you. I’ve absolutely chosen both of them in different situations based on careful consideration of the data in front of me:
In that particular situation, I was smokejumping in to save an architectural dumpster fire. These were OSGi-based microservices, so they weren’t paying the distributed systems tax, but there was still a complete failure to define effective component boundaries and service contracts.
A monolith paying the OSGi tax is just as painful as a monolith paying the distributed systems tax.
I wasn’t convinced we had enough time or money to put out the fire any way other than to rip OSGi out and reify the monolith. Wouldn’t you know that within 6 months this team was doing continuous delivery of working software to pre-production environments?
But please don’t tell me you’re special…
You’re doing it wrong. I’ve spent the better part of the last three years helping Fortune 500 customers across every vertical imaginable build or migrate to microservices and distributed systems architectures. At the same time, I had the privilege of managing product for another distributed system you may have heard of:
If there’s one thing I’ve learned during this amazing journey, it’s this: you’re not special. There’s no special business problem that you’re trying to solve that puts you in a special class where multiple decades of distributed systems theory doesn’t apply to you.
You either need a distributed system that follows distributed systems principles, or you need a well-factored monolith that follows the design principles of OO or Functional or whatever programming paradigm you choose.
There really isn’t another option.