Matt Stine, a Principal Software Engineer at Pivotal, gave an excellent, fast-paced talk at Cloud Foundry Summit about why microservices and Pivotal Cloud Foundry have such a powerful affinity, and introduced Spring Cloud and NetflixOSS to the attendees in the process.
Matt leads off the talk with pragmatic advice: whenever possible, start with a monolith while things are small, but make sure it’s designed to be 12-Factor, particularly if it’s net-new development. Existing applications can be refactored, but that’s quite a bit harder. At what point is the codebase too big, requiring functional decomposition? Answering that question turns out to be hard, since it’s qualitative, not quantitative: there is no one definition of “too big” that fits all use cases.
What are microservices? Matt, like many of us, prefers Adrian Cockcroft’s definition of microservices: “Loosely coupled service oriented architecture with bounded contexts”. Short and sweet, but what does that really mean?
- No, it’s not SOA circa 2004, even though there are some similar concepts, as the implementation is quite different.
- Loose coupling, meaning that you can deploy code whenever you want to and not have to ask anyone first. This requires strong boundaries between teams/services.
- Domain Driven Design is key to separating things. You cannot have multiple representations of the same domain object. If you examine a given concept (say, an airline reservation) from one side of the application to the other, and come up with more than one definition, then you don’t have a bounded context.
Any single place where you put all your important stuff is a monolith, and resistant to change. If you can create good boundaries, then put an API in front of it so nothing is coupled to details, then you are on your way to deploying software continuously. In fact, there are a lot of prerequisites, or as Martin Fowler says, “You Must be This Tall To Use Microservices”. If you can’t do the following things, you’re not ready to use microservices:
- Rapid provisioning—seconds or minutes
- Basic monitoring—many more things to monitor
- Rapid Application Deployment—be able to deploy a new line of code quickly
- DevOps culture—developers own the operation of code they write, operations handles the platform
Microservices are an application architecture that runs extremely well with Pivotal Cloud Foundry. While one doesn’t technically require the other, they are ideally suited for each other. Pivotal Cloud Foundry deals with many of the underlying concerns for microservices at the platform level: issues like Environment Provisioning, On-Demand Scaling, Failover/Resilience, Routing/Load Balancing, and even Data Service Operations (via BOSH). But deployment platform level support for microservices isn’t enough. Since microservices introduce distributed systems at the application level, they introduce new issues. Some challenges of distributed systems include things like Configuration Management, Service Registration and Discovery, Routing and Load Balancing, Fault Tolerance, and Monitoring.
Spring Cloud addresses these boilerplate patterns for distributed computing in the Spring programming model, built atop Spring Boot and NetflixOSS, and very soon—Pivotal Cloud Foundry. Just annotate your Java Class, and you’ve instantly got:
- A Config Server / Config Event Bus
- A Service Registry (Eureka)
- A Client Side Load Balancer (Ribbon)
- A Circuit Breaker (Hystrix)
- A Reverse Proxy / Lightweight API Gateway for Intelligent Routing (Zuul)
In conclusion, Matt touches on where Spring Cloud will go in the future. Pivotal and Netflix are collaborating to align NetflixOSS and Spring Cloud OSS around our respective roadmaps. Matt wraps up with a glimpse of some future capabilities that are being discussed:
- Alternative Stacks (Consul, Zookeeper, etcd, JRugged)
- Distributed Request Tracing/Correlation (Dapper, Zipkin)
- Stateful patterns (leader election, locks, state machine)
- Developer workflow improvements
- Switches (Canaries, Feature toggles, Surgical Routing)
I know you’re all going to be in a food coma in a few minutes, so we have to go and we have to go fast. There’s no way we can possibly complete this talk in 30 minutes, but we’re going to try anyway.
Very quickly, me, Matt Stine. They call me a principal software engineer, whatever that means. I do a lot of stuff at Pivotal around Cloud Foundry and Spring. I spend a lot of time on airplanes, as it turns out. Managed to get here on time, which is good.
I wrote a book that you might have seen floating around. If you don’t have a print copy of this yet, and you want one, just find and attack the people at the Pivotal booth and they should have some. I think they have a few hundred of those, if you want to get hold of one. This is my cloud native application architecture propaganda that you should all consume and believe and go do.
Andrew Shafer gave a talk yesterday. Who went to Andrew’s talk? Andrew’s talk was without doing this on purpose a great setup for my presentation. He actually helped me craft the abstract for this presentation and these are his words. He said, “Now that you have Cloud Foundry, what are you going to do with it?” It’s an important question right?
We have this platform; how can we use it in a way that’s actually going to give us what we want? The first thing that I would say is, “If you’re smart, don’t try to do microservices on day one. You should actually start with a monolith.” Actually, if I could build everything I wanted to build as a monolith and do continuous delivery and be agile and innovate and get all that right, I would, because it would be a whole heck of a lot easier than doing microservices.
Microservices are hard stuff. Start with a monolith while you’re small, while you can. If you’re going to build something new, this is where you want to begin: small teams, small monolith, make it 12 factor. If you have something small, it doesn’t matter if it’s a microservice or a monolith or anything else; making something 12 factor, and having that contract between the application and the platform such that they get along well with one another, is relatively easy to do. It’s a lot harder to take something that’s big and has been around for a long time and turn that into a 12-factor app. I’m sure many folks in the room have felt that pain, but we all know that eventually things get bigger.
We don’t really have a good answer at what point is it too big. In fact, microservices is a terrible word. When we talk about microservices, we see this micro thing and we start to think, “Something that I can count. I can tell you how big it is and I can measure it, and I can put a number on it like lines of code or number of operations or things like that.” Don’t do that, that’s all wrong.
Don’t do that at all. It’s not quantitative micro, it’s qualitative micro. Role, responsibility, capability, focus, scope, those things are really hard to measure because every person in the room and every context that you come from, you’re going to have a slightly different answer to the question of, “How big is too big that it’s time to start decomposing this thing into something that we’re calling microservices?”
What are microservices? Microservices has a lot of definitions. They’re floating around, lots of hype. I like Adrian’s definition best. He uses the dreaded words “service oriented architecture” in his definition. I thought that microservices were different than SOA, we’re not doing SOA anymore. SOA is bad and microservices are good. That’s wrong.
Actually, if you read Wikipedia, the first five paragraphs or so of the service oriented architecture article, there’s really great stuff in there. You read about what the focus of what we were trying to achieve with that was, and it’s actually all very good stuff. Then we went and implemented it and we got it kind of wrong, because we missed the two things on either side of it that Adrian adds in his definition.
One, the idea of loose coupling, because what did we do when we did SOA? We went and found something really big to couple ourselves to, which was the ESB, and we put all the important stuff inside the ESB. Then anytime we needed to change something we had to go through the ESB, and all the enterprise architects cling to the ESB like barnacles and start to put up barriers to change. It doesn’t matter where the monolith is, if it’s an application or if it’s a bus. We were talking this morning about how an F5 load balancer can become a monolith as well. Anytime you start putting all of your stuff in one place, you can get into that situation.
Loose coupling means what? It means that I can deploy my service anytime I want to. I don’t have to ask you if it’s okay. Now, that sounds hard and it is hard. If I can get there, we would all agree that that would be very powerful. One of the ways that we can get there is this idea of a bounded context. What’s a bounded context? Eric Evans started writing about this in Domain Driven Design 12 years ago. Who has read the Domain Driven Design book? Okay.
You don’t have to read the whole thing. Start about chapter 13. He starts talking about this thing called strategic design. You read about three chapters of that and you realize, this is a textbook on how to do microservices, except he never uses the word microservices. Honestly, what we’re trying to do is just take principles and concepts that actually worked very well and arguably are part of the set of small things that we can say are true about software engineering. There are very few things that we can say are true versus false. There are a handful of principles out there, and these things start to feel like they’re in that category.
You know that if a domain gets too big you go from one side to the other taking the same term, trying to understand what does that term mean in all of the different contexts along the way from one side of business to the other. How many different definitions do you come up with? If you come up with more than one, you don’t have a bounded context because the domain is not actually internally consistent across itself.
Pick whatever your central concept is. I was working with an airline customer, and we were talking about a reservation. I said, “How many different definitions of reservation do you have?” Probably 17. If you’ve got 17 different definitions, can we all agree on one? Of course not, we can’t. We all have a slightly different connotation. You have a different context, whether it’s an order, or a movie, or something else. You have these central concepts that everyone treats a little bit differently.
What you do is you find a context where, if I keep the boundary here, all the words mean the same thing from one side to the other. That is the thing that either is big or small or somewhere in between, and you’re going to come up with a different answer for every context in your business and different answers across every business. If you can get that right, then you can bound that thing with an API and say, “Nobody gets to know what’s going on inside the box. You only get to know what’s going on at the API level,” and you can create these bounded contexts that are loosely coupled. You don’t know what’s going on inside, so you can’t couple to the details. You’ve got this wall, this barrier keeping you from becoming too bound to the things that you depend upon, and maybe you can start to deploy services whenever you want to.
Now, if that all sounded very complicated, it’s because it is. There’s a sense in which you can’t just start doing this. I think the way Martin described this, this idea of “you must be this tall to ride the microservices ride,” is probably one of the best ways that I’ve heard this explained. If you can’t do these four basic things, you aren’t ready. If you can’t provision new environments in seconds or minutes, if you can’t monitor things reasonably well, if you can’t deploy a new line of code very quickly, and if you don’t have something that feels like a DevOps culture (Andrew talked at length about that, and I’ll refer you to his talk if you want to know what I mean by that), then you probably need to come back next year. Fix these things first and then maybe you can go do that.
As it turns out, when you start to look at Cloud Foundry, and I talked about this last year, there is a nice relationship between microservices and Cloud Foundry. Not just anything will run well on Cloud Foundry; as it turns out, the things that we build that feel like microservices tend to run kind of well on Cloud Foundry. You have these issues that you have to deal with when you start deploying microservices: provisioning new environments, provisioning new code. That’s something we sort of know how to do in this world. There’s a sense in which you bring these two things together. One doesn’t require the other but one can definitely help the other.
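As an illustration of the boundary described above (a minimal Python sketch, not something shown in the talk), two hypothetical contexts each keep their own private notion of a reservation, and only a small API-level message crosses between them. Every class and field name here is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservationConfirmed:
    """The published contract: the only thing other contexts may see."""
    confirmation_code: str
    passenger_name: str


class BookingContext:
    """Owns its own definition of a reservation (fare, payment state)."""

    def __init__(self):
        # Internal representation; free to change without asking anyone.
        self._reservations = {}

    def book(self, code, passenger, fare_cents):
        self._reservations[code] = {
            "passenger": passenger,
            "fare_cents": fare_cents,
            "paid": True,
        }
        # Translate the internal model into the public contract.
        return ReservationConfirmed(code, passenger)


class CheckInContext:
    """Has a different definition of a reservation (seat, boarding state)."""

    def __init__(self):
        self._boarding = {}

    def on_reservation_confirmed(self, event):
        # Depends only on the contract, never on BookingContext internals.
        self._boarding[event.confirmation_code] = {
            "name": event.passenger_name,
            "seat": None,
            "checked_in": False,
        }

    def check_in(self, code, seat):
        record = self._boarding[code]
        record["seat"] = seat
        record["checked_in"] = True
        return record


booking = BookingContext()
check_in = CheckInContext()
event = booking.book("ABC123", "Ada", 42000)
check_in.on_reservation_confirmed(event)
boarding_record = check_in.check_in("ABC123", "12A")
```

Because the check-in side never sees the fare or payment fields, the booking side can change them and redeploy without coordinating, which is the loose coupling the definition asks for.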
We have all these great features in Cloud Foundry that help us to deal with a lot of the concerns that we run into with microservices. I can provision code quickly, repeatably, reliably, I can scale, I can let the health manager or DEA or whichever flavor we’re on right now, take care of making sure that when things die that they come back. I can deal with a lot of my routing and load balancing concerns and I can run all the data services that I want to run. All the services, BOSH all the things and things are going to be working pretty well in this microservices world.
That’s not enough. Dave Syer, who works on the Spring Cloud project that I work on, made the statement at SpringOne last year that “no microservice is an island.” Doesn’t matter if we can build small services, it’s good but it’s not enough. Being able to build small services and deploy them is good but it’s not enough. Being able to build them, deploy them, and run them, and keep them running is good but it’s not enough. As soon as we start to decompose a monolith and as soon as we start to put network boundaries between the things that we’re building, we start to create these nasty things called distributed systems. Distributed systems are hard.
We start to run into a lot of new challenges that maybe we didn’t have when we were writing code inside of a monolith. How do I get configuration information out to all of my microservices, and then all of my scaled out instances of my microservices, consistently and reliably? How do I discover where things are? Once I know where things are, how do I actually route traffic to them and do load balancing? Cloud Foundry does some of this, but maybe we want to do things even more sophisticated than Cloud Foundry can do today. Obviously, if I deploy more things, I have more things that can break. The more things you’re running, the more likely something in your system is going to fail. You have to actually think about that, because something does fail, and failure doesn’t just mean it died.
Failure might mean that the latency got to a point where I have enough load or enough stream service that I fill up all of my thread pools waiting on this thing to respond. Then that thing does die and then something dependent upon it dies and we have this cascade effect. You don’t run into that when you call a method that’s running in process when you get an exception. You do have failures but they’re very different types of failures, a little bit easier to figure out what’s going on.
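The talk later names circuit breakers as the protection against this kind of cascade. The following is a minimal Python sketch of the general circuit breaker state machine, not Hystrix's API or implementation. The class name, thresholds, and injectable clock are assumptions made for illustration, and it assumes slow calls surface as exceptions (for example via a timeout) rather than hanging forever.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the protected function."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of tying up a thread on a sick dependency.
            raise CircuitOpenError("failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result

    def _record_failure(self):
        self._failures += 1
        # A failed trial call re-opens at once; otherwise open at threshold.
        if self._opened_at is not None or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Demo with a fake clock so the behavior is deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                         clock=lambda: now[0])


def flaky():
    raise TimeoutError("dependency too slow")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except TimeoutError:
        outcomes.append("called")
    except CircuitOpenError:
        outcomes.append("rejected")

now[0] = 10.0  # the reset timeout has elapsed
state_after_timeout = breaker.state
recovered = breaker.call(lambda: "ok")
```

The third call never reaches the slow dependency, so callers stop queueing behind it; that is the property that interrupts the cascade.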
There’s a sense in which monitoring becomes even more of a concern. If I have one app, I can plug a monitoring tool into it and I can pretty well tell you what’s going on. If I have 10 applications, I can plug a monitoring tool into 10 applications. I can pretty much tell you what’s going on. Now I have 100 microservices deployed that are composing a distributed system, and it’s only the composition of those things that gives me the behavior that I’m looking for. Where do I put the plug into that system to find out what it’s doing, so that I can know how the system is behaving? There’s no physical thing that represents the system. We have a bunch of little things that are running around, and the behavior that emerges from that composition is the system. How do you monitor that? These are the types of questions that we have to answer.
We need some representation of the composite system. I started being involved in conversations about this several months ago. The first thing that we started working from was this idea of the big A app as opposed to the little A app. What’s the little A app? Those are the applications that we deploy to Cloud Foundry, but everybody’s got some set of applications that they deploy to Cloud Foundry that we put a user interface in front of, and the customer knows that as an app. There’s actually lots of little Cloud Foundry applications that are that thing. We say, “Okay, I need a representation of that, that I can manage and work with.” The first tool that we had to deal with that was a manifest.
A manifest can do a lot to solve this problem. I can name several applications and the code that produces those applications, and I can say: deploy this thing and bind this thing to these services, and I can get something that looks like what I want. The problem with this is that it’s very static. This is a point in time description of what the system should look like right now. If I need to change that, I need to go back to Go and start again. I probably end up having to deploy this whole unit again. That’s fine, but when I get into the world of microservices I want the ability to be a little bit more dynamic, and I’ll tell you why. Then I started thinking what I really want is something like BOSH but for applications, for microservices.
BOSH is very good at taking a cluster of things that happen to live on VMs, and I can describe that thing as a system and tell BOSH, “Go make this so,” and it will go make it so. As things change in that environment, it will keep it the way it ought to be, the way I described it, converging to some desired state eventually. Then I go into my manifest and say, “Make me a little bit more of this thing and apply it,” and it will do that. Cloud Foundry’s manifest will do that to an extent.
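The idea of converging to a desired state can be sketched in a few lines of Python. This is an illustration of the general reconcile pattern only, not how BOSH or a Cloud Foundry manifest is implemented; the function names and the instance-count model are assumptions.

```python
def plan(desired, actual):
    """Return the actions needed to move `actual` instance counts to `desired`."""
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    return actions


def converge(desired, actual):
    """Apply the plan to a copy of `actual` and return the resulting state."""
    state = dict(actual)
    for verb, name, count in plan(desired, actual):
        delta = count if verb == "start" else -count
        state[name] = state.get(name, 0) + delta
        if state[name] == 0:
            del state[name]
    return state


desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "legacy": 1}
actions = plan(desired, actual)
new_state = converge(desired, actual)
```

Running the same plan again against the converged state yields no actions, which is what lets a tool re-apply the description whenever the environment drifts.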
Even then BOSH wants to own the whole thing. Here’s a cluster, make this thing exist but there’s no concept of, “Okay now, I want to without redeploying anything, split it into two things and have this manage the left half and this manage the right half.” When you start to talk about microservices, you have this topic out there of, it’s not just about technology, it’s about organization and people. You have this very strong idea of decentralizing.
If you think of BOSH, BOSH is centralizing management of a cluster. What I want now is decentralized management of a cluster. I want all of the pieces that form my composite system to actually be able to act autonomously meaning I want all of the teams to be able to deploy when they want. I want to be able to deploy my service whenever I want. If I have a change, I can deploy it now, I don’t have to wait on you. I can’t really do that in that world.
Andrew brought this up yesterday, this idea that if you write it, you run it. If I write it and I run it, that means I’m responsible for deploying it. Again, I’m back to square one: I have tens or hundreds of services and I have tens or hundreds of teams that are managing those services and deploying them, and somehow I need to get a system out of that. All of the tools that I’ve worked with to this point in the Cloud Foundry ecosystem aren’t geared toward that. How do we create a composite system?
Netflix did this. Netflix started from exactly the principle that I just described. We’ll have multiple teams. Multiple teams will own their own services owning them from build to deploy, to run, to wear the pager, and everything in between. They still needed a way to compose that into a system and they wanted to compose it into a system in such a way that any of the components can fail at anytime but the system should keep working.
They did us the service of taking these components that they used to build a composite system, battle testing them in production, running two-thirds of the traffic in the evening on the internet for them, and then open-sourcing that stuff. Now, not only do we get to hear about how they do things, here is a lot of the code that they used to do it too, so go use that. You can go grab that and start using the Netflix code today, but you have to figure out how it works, and you have to figure out how to run it, deploy it, and manage it on your own.
In the Spring team, several months ago the idea was had: what if we take these components and we take the Spring programming model, which already has a huge number of Java developers who understand it, enjoy it, and are productive with it? Let’s apply that programming model to the Netflix components such that I don’t have to relearn how to write my app. I can just say, “Now I need these new distributed systems patterns that I’ve learned in my app,” and I just annotate things appropriately and configure things appropriately. Then I can go back to focusing on business code again and not worry about whether all the distributed system goodness is going to work the way it should.
We have all of these now, not just deployment level patterns but application and service composition patterns and this idea of deploying and running code. That’s what Cloud Foundry does but it doesn’t really have an opinion about what is the code that I’m building. We could run a composed fault tolerant distributed system on top of it but Cloud Foundry doesn’t really have an opinion about what that thing is.
What these patterns do, when we take Spring Cloud and layer it on top of Cloud Foundry, is allow us to do exactly that. We have several components, and I’m going to run through this quickly. I know that I’m running out of time already but we’ll see what happens. All of these also happen to work on Lattice, and all of my demos are going to be on Lattice. By the way, if I change slides really quickly, they’re already on the internet and I’ll tell you where they are. If you don’t get your picture, I’m sorry.
The config server is a way for us to put all of our configuration information in a central place. In this case we chose Git. Git’s really good at doing what? Versioning things and making an audit trail of things, and that’s something that you really like to have for your configuration. I want to know what changed. I want to know who changed it. When did it change, and I want to know maybe a reason for that. Git is very good at that, so why duplicate it? Let’s just put a service in front of it that can distribute that to applications.
We have the REST API in the config server and then we have a client binding inside of a Spring Application that knows how to take that information, reconcile it with any configuration that’s local, and create something that’s consistently configured across all of the system. Now, in my Git repository I have some description of what the configuration for the composite should be and then I can distribute that appropriately to all the individual pieces. Then I want to update that in real time, I don’t want to go through a deployment. What I’d really like to do is say this piece of configuration should change and that’s going to affect these small components in these applications. I’d like that to happen right now.
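The reconciliation step can be pictured as a layered lookup. The sketch below is plain Python, not the Spring Cloud Config client, and the precedence order (remote values win over local ones, which win over built-in defaults) is an assumption chosen for illustration; the property names are hypothetical apart from the greeting used in the demo.

```python
from collections import ChainMap


def effective_config(remote, local, defaults):
    """Merge configuration layers; earlier layers win on conflicting keys."""
    return dict(ChainMap(remote, local, defaults))


defaults = {"greeting": "Hello", "timeout_seconds": 5}
local = {"timeout_seconds": 10}   # e.g. a file shipped with the app
remote = {"greeting": "Howdy"}    # e.g. values served from the Git-backed server

config = effective_config(remote, local, defaults)
```

Every instance that applies the same layers in the same order ends up with the same effective values, which is the consistency the talk is after.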
We add a component called the Cloud Bus to make that happen. The Cloud Bus is just a management backbone that happens to be backed right now by RabbitMQ. When I send a refresh event to something that participates in that Bus, what it will do is send a message to the Bus so that all of the other participating applications receive it. Let’s see if we can make this work very quickly. You’ll notice that I have a bunch of stuff. Here we go, bunch of stuff running on my Lattice cluster here.
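The fan-out described here can be sketched with an in-process publish/subscribe bus. This is an illustrative Python model only: the real Cloud Bus runs over a message broker, and the class names, the `"refresh"` event string, and the log text are assumptions made for the example.

```python
class ConfigSource:
    """Stands in for the config server; holds the latest committed values."""

    def __init__(self, values):
        self.values = dict(values)


class Bus:
    """Delivers every published event to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in list(self._subscribers):
            handler(event)


class AppInstance:
    def __init__(self, index, source, bus):
        self.index = index
        self._source = source
        self._bus = bus
        self.greeting = source.values["greeting"]  # read once at startup
        self.log = []
        bus.subscribe(self._on_event)

    def refresh(self):
        """Hit on ONE instance; the bus fans the event out to all of them."""
        self._bus.publish("refresh")

    def _on_event(self, event):
        if event == "refresh":
            # Re-read configuration from the central source.
            self.greeting = self._source.values["greeting"]
            self.log.append("received remote refresh request")


source = ConfigSource({"greeting": "Howdy"})
bus = Bus()
instances = [AppInstance(i, source, bus) for i in range(5)]

source.values["greeting"] = "Hola"        # the committed change
stale = [app.greeting for app in instances]

instances[0].refresh()                    # one request, any instance
fresh = [app.greeting for app in instances]
```

Until the refresh is sent, every instance keeps serving the value it read at startup; after it, all five agree, regardless of which one received the request.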
Here’s my config server. The important thing that I want you to notice is that there’s a greeting property in here that happens to say right now, let me make sure that that’s up-to-date. Right now it says, “Howdy,” because I’m from the south and I need to do that. I have another application here that says, “Howdy world” right now. I want to do a couple of things. First of all, let’s make this big as well. You will see that I have only one instance of this app that we’re going to show off. Let’s go ahead and scale that up to five instances and then let’s grab the logs for that one since it’s done.
While that scale is happening, let’s go into demo.yaml and change that greeting. What do we want it to say? “Hola?” Okay, we’ll make it speak Spanish. Sounds good. Let’s commit that to our repository. That’s out there. If we go back to our configuration server, we will see that the greeting has in fact updated to say, “Hola,” but when we go to our application it still says, “Howdy.” We’ve got another step, and that is that we need to send the refresh event. If we say, “LTC logs disk config,” what I’m going to do is send a POST to this route, bus/refresh. Not Bush, I make that mistake every time. There we go.
What you just saw in all those log events was each app receiving the event. Here’s app number four; it says, “Received remote refresh request.” The result is that now it doesn’t matter which of these applications I get routed to. I now have that change distributed across the cluster. Let’s keep moving. Now, we want to find out where things are. Eureka is a service registry that allows us to do that. Very simple: the application registers, the consumer looks up what it wants to find, and is able to connect to it directly.
We have Eureka here and you’ll see that I have a bunch of stuff registered right now, and if I refresh that, you’ll see that I have the five instances of the producer going. Now, we had a problem here, and you can argue whether it’s a problem or not, but we have this thing called the router. The router wants all of the traffic to go through it. We don’t have these things talking to each other. This idea of consumer talking straight to producer doesn’t really happen, and what we end up doing is registering the route for the producer in Eureka, and then we can do this trip from Ribbon, which is fully capable of load balancing. It’s going to load balance me to HAProxy here, or whatever load balancer we have in front of this, and through the gorouter and then down to the producer.
What we added in CF release 195 was some environment variables that you automatically get in your app environment to tell you what your DEA IP and port are. There’s an equivalent for Diego for cell IP and port. Then also, in 204, the ability to allow host access on the DEA. If I’m in a container, I can actually talk to another container on the same DEA or cell that I’m living on. After 204, we’re able to do this, and Lattice has an equivalent setup that allows consumers to talk straight to producers. This may end up being the last demo that we’re able to get to but we will see.
Let’s do an LTC list again and let’s take our producer. One thing I want you to note about the producer: you see there’s no route assigned. I can’t even talk to this producer outside of the VPC in which this Lattice cluster is living. I do have a consumer service, and the consumer is basically just going to tell me exactly what that producer is doing, which in this case is producing an increasing counter sequence. I only have one, we keep going up a number every time. Let’s scale the producer out to say 10 instances. Watch those start up. We’re filling up our cluster nicely.
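The register-then-look-up flow can be modeled in a few lines of Python. This is a sketch of the service registry pattern in general, not Eureka's API; the class, method names, and addresses are hypothetical, and real registries also handle heartbeats and lease expiry, which are omitted here.

```python
class ServiceRegistry:
    def __init__(self):
        self._instances = {}  # service name -> list of (host, port)

    def register(self, service, host, port):
        """Called by each application instance when it starts."""
        entries = self._instances.setdefault(service, [])
        if (host, port) not in entries:
            entries.append((host, port))

    def deregister(self, service, host, port):
        """Called when an instance shuts down."""
        entries = self._instances.get(service, [])
        if (host, port) in entries:
            entries.remove((host, port))

    def lookup(self, service):
        """Called by consumers; returns a copy of the known instances."""
        return list(self._instances.get(service, []))


registry = ServiceRegistry()
for i in range(5):
    registry.register("producer", "10.0.0.%d" % (i + 1), 8080)
registry.register("consumer", "10.0.1.1", 8080)

producers = registry.lookup("producer")
```

The consumer gets back concrete addresses, so it can connect to a producer instance directly instead of asking a central router to find one.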
Once all these start up, I want you to pay attention to the logs. Right now, they’re not that interesting, still on the start up stuff. Very close to done here. Maybe I should have done five. I’m flirting with the clock. We should start to see some of what we want. Some of these are up and registered. You see I’ve got right now four instances of the producer app. As that’s happening, now what I should see as I’m hitting this, you’ll see that I’m actually load balancing across the instances of the producer but if you were to look in the logs—and actually now we’re in a good spot. You look at the logs, you see that we are in fact hitting the producer app but you don’t see any router logs. We’re not going to the router at all.
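What the demo shows, the consumer choosing a producer instance itself with no router in the path, can be sketched as follows. This is plain Python rather than Ribbon, and round robin is simply one easy policy picked for illustration; the addresses and the `fake_send` helper are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Client-side balancer: the caller picks the target instance itself."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))

    def choose(self):
        return next(self._cycle)


def call_producer(balancer, send):
    """Pick an instance and talk to it directly, with no router in between."""
    host, port = balancer.choose()
    return send(host, port)


hits = {}


def fake_send(host, port):
    # Stand-in for an HTTP call; records which instance was contacted.
    hits[(host, port)] = hits.get((host, port), 0) + 1
    return "%s:%d" % (host, port)


balancer = RoundRobinBalancer(
    [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]
)
targets = [call_producer(balancer, fake_send) for _ in range(6)]
```

In practice the instance list would come from the service registry and be refreshed as instances come and go; here it is fixed to keep the example small.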
Now, we’re able to do client side load balancing inside the app. I’m going to have to get out of the way of the next presenter. Very quickly, there are some other patterns I couldn’t show you. I wasn’t able to show you circuit breakers. A circuit breaker is a state machine that protects you from those cascading failures, and Zuul is a component that allows me to do intelligent routing. You can learn more about those on the Netflix and Spring Cloud websites. We have a lot of other things that are coming. We want to be able to support alternative stacks like Consul, Zookeeper, and etcd. JRugged is an interesting circuit breaker library that Comcast has produced. We want to do distributed request tracing in the vein of what Dapper and Zipkin are able to do, and leader election, locks, state machines: stateful patterns.
We really want to improve the developer workflow for microservices; it’s not great today. We also want to do a lot around switching patterns. If you want to learn more, I put up a bunch of links. The most important link is for getting to this talk on GitHub and downloading a PDF. It’s also on SlideShare, and then all of the demos that I was able to show are in that second repository. Like I said, right after the talk I will tweet the links to all of these things so that you can go get the slides. Thank you very much.