KubeAcademy by VMware
Network Policies

In this lesson, you will learn how to place controls on the network communications into and out of your pods with Network Policies.

Eric Smalling

Docker Captain and Senior Developer Advocate at Snyk

Eric is a Docker Captain and Senior Developer Advocate at Snyk.io where he helps developers secure the applications, containers and Kubernetes platforms they build and deploy to.


Hi, welcome back to the KubeAcademy course, Networking in Kubernetes. I'm Eric Smalling, a staff field engineer at VMware, and in this lesson we'll go over how Kubernetes network policies can be used to constrain traffic into, out of, and within the cluster. It's a pretty big topic, so I'm going to split this up into two halves. First, we'll discuss a standard application example and how Kubernetes network policies can be leveraged to secure it. Then I'll get hands-on and demonstrate how to implement that.

Now take note that not all CNI providers implement the network policy API. So if you want to leverage this feature, make sure you install one that does. Some of the more popular ones that provide implementations are Calico, Cilium, or Antrea. Also, while we won't be digging into how they implement the policy API under the covers, note that each CNI plugin may implement it in very different ways. We have another lesson in this course that compares a few CNI providers and dives into details, including their network policy implementations and extensions.

Let's say we have a simple multi-tier web application with a collection of front-end web servers. These front-ends make connections to a pair of back-end business services over HTTP. One of these back-end services sends traffic to a database hosted elsewhere in the company's network, and the other makes connections to an internet-based web service. We've got a few security concerns that we want to enforce, so let's segregate the networks each tier is deployed into: we have a front-end network and a back-end network, and we'll say that the database is on its own network. Obviously we only want users to be able to connect to that front-end website. Additionally, those back-end services have no need to initiate any connections to the front-ends either.

Now there are several possible designs for this, but we'll keep it simple here and say that border firewalls are set up to allow only the traffic we've described so far, and a load balancer sits between the firewall and the front-end application. This setup limits external traffic to just the front-end application and thus protects the back-ends from attacks such as unauthorized access, CVE exploits, denial of service, and so on. We're going to assume the database network is already secured and only accessible from inside the company.

Next, we want to make sure connections out of the applications, also known as egress traffic, are constrained as well. Only the database back-end service should be able to send traffic to that database, and only the other business service should have access to that external internet-based web service. So we end up with many concerns, all being controlled by various border firewall ACLs and load balancer configurations.

Because of the sensitivity of these systems, the process for requesting configuration changes to them usually involves multiple teams and manual, out-of-band configuration tasks, which can lead to misconfiguration. One of the benefits of Kubernetes network policy is that it's completely software-defined. Changes to its configuration follow standard software delivery life cycle steps, and deployment can be easily automated. With proper governance, this means that what's defined in source control will match what's configured in the live cluster, and changes can be more easily traced back to their origin. It also makes it easy to test the configurations before they're applied in a production environment.

By design, 100% of traffic inside of a Kubernetes cluster is open and allowed between pods and services. Let's take a look at a sample Kubernetes cluster and how we can add policies to meet the requirements of our use case. One of the benefits of using Kubernetes namespaces is that they allow us to segregate the rules about traffic flow for each tier inside the cluster, eliminating the need for physical network separation and external firewalls between the tiers. Kubernetes, or more correctly the installed CNI plugin, will deal with the implementation details of how to constrain the traffic, and it will automatically be updated as workload scheduling changes over time.

In this example, I've defined a pair of Kubernetes namespaces that map to the tiers. We have a front-end namespace for the web app and a backend namespace for those business services. Now you don't have to use namespaces in this way, but it's common for different tiers of applications like this to be owned by different teams within an organization. And having them segregated this way is a clear delineation of responsibilities between those teams and apps.
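For reference, namespaces like these might be created with manifests along the lines of the following sketch. The namespace names here are assumptions for illustration; the tier labels (tier: front and tier: back) are the ones the policies later in this lesson select on.

```yaml
# Sketch only: the exact namespace names from the demo aren't shown verbatim,
# but the tier labels are referenced by the namespaceSelector rules later on.
apiVersion: v1
kind: Namespace
metadata:
  name: frontend        # assumed name
  labels:
    tier: front
---
apiVersion: v1
kind: Namespace
metadata:
  name: backend         # assumed name
  labels:
    tier: back
```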

A practice that I find useful is the zero-trust pattern, where all traffic is initially blocked and then we add only the minimum access our applications need. Since Kubernetes network policies are allow-list only, to implement it we basically start by allowing nothing and then add rules on top of that. Also, note that these policies are namespace-scoped, so anything we apply has to be done in each namespace we care about.

When we start talking about allowing connections from one point to another, it's important to understand that Kubernetes policies have an implied allow-established rule applied to them. This means that if we declare traffic to be allowed from the front-tier pods to the back, the response on that established connection will automatically be permitted. We don't have to create a rule for that return traffic.

First, let's talk about ingress. By implementing a zero-trust ingress policy, you can see that everything will be pretty well broken, as traffic into the pods is blocked. As for the traffic to the database and the external web service, we'll deal with that in a minute when we talk about egress. The first thing we need to do is let customers get to that web app by opening ingress from the external networks to those pods, which we can refer to by a label selector. By moving this firewalling into Kubernetes, we're already getting benefits, because as pods are scheduled and moved around in the cluster, network policies will be enforced dynamically at runtime. No more hard-coding IP ranges into firewalls or load balancers every time the app is deployed somewhere new.

Next, we need to allow those back-end pods to receive traffic from the front-end tier. Notice a new constraint we've gained here: neither of the back-end services can talk to the other now. In our case, that's fine, but in a real-world scenario it might be overly restrictive. It's worth noting that we achieved this isolation without any external network separation, physical or logical. Okay, that handles the ingress rules. Now let's shift our attention to egress traffic. Just like before, we'll apply a zero-trust egress rule that will block all traffic from leaving all pods. All traffic trying to leave the pods is now blocked, so we've broken everything again. We have, however, fixed the issue of inappropriate access to the database and external web services.

The first thing we're going to fix is to allow the web pods to send traffic to the back-end namespace. Then we're going to go to the DB service and give it, and only it, access to the database.

Finally, we're going to give the EXT service access to that external web service. So our finished Kubernetes implementation could look something like this. The network segmentation is replaced with namespaces, and internal firewalling is now being performed by the CNI provider's network policy implementation. And of course the border firewalls outside of the Kubernetes cluster will still exist. The main difference now is that they, and any load balancers out there, only have to refer to the IPs of the cluster endpoints, as they no longer have to track where the application servers are deployed.

Now let's dive into a hands-on demonstration of implementing these policies in a cluster. I'll be setting up a zero-trust-based set of policies to block all ingress and egress traffic, and then open up just the connections that our application needs. You can see here that I've got three deployments, a web, a DB service, and an EXT service, split between a front-end and a back-end namespace. This pretty much maps to the slides I showed you before. Also, I have this test script I'm going to be running throughout the demonstration that goes out and checks connectivity between all the pods, so all three trying to connect to all three, as well as connecting to external things. This IP address is my database; the MySQL database is running on my laptop, outside of the Kubernetes cluster I'm using. And this is google.com, which we're just using as a stand-in for an external web service out on the internet. We also have three connections from the outside in to the cluster.

So can we get to the web, the DB service, and the EXT service? Right now the test results are showing that everything can connect to everything. Like I said, it's wide open, and every one of the services is visible to the outside world and can be connected to. That's not what we want, so let's get started on our policies.

First, let's implement our zero-trust policies to lock everything down. What we see here is a manifest that has two policies, one for the front-end namespace and one for the back-end, because remember, I told you they're scoped by namespace, so we have to have one for each. And it's a really simple policy: the pod selector is an empty dictionary, which means every pod in the namespace, and the ingress rule list is empty. Remember, this is an allow list, so that's saying allow nothing, which conversely means block everything, right?

The egress is a little more nuanced. We do need DNS in order for service discovery to work, so we are permitting that, but it's the only thing we're allowing: UDP traffic on port 53. And that's the same for the front and the back tier. So let's go ahead and apply this and run our test again. Okay. We can see that everything is pretty well broken, as all traffic into and out of the pods is blocked. Before we fix anything, though, this brings up an important topic about egress: if you're going to use it, you have to come up with every connectivity need your pods will have. And I mean everything. As you saw in this example, I opened up DNS because we need that for service discovery. It's a low-level need that all pods are going to have, but you're going to need to find out everything else they need: NTP, NFS, Active Directory, et cetera.
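Based on that description, the zero-trust manifest might look roughly like this sketch. The policy name and namespace name are assumptions for illustration, and the back-end namespace would get an essentially identical policy of its own.

```yaml
# Sketch of the zero-trust policy described above, shown for the front-end
# namespace; the back-end namespace gets its own copy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny            # assumed name
  namespace: frontend           # assumed namespace name
spec:
  podSelector: {}               # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress: []                   # an empty allow list: block all inbound traffic
  egress:
    - ports:                    # the one exception: DNS, so service discovery keeps working
        - protocol: UDP
          port: 53
```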

Every one of those needs to be in the allow list. For this reason, many teams choose not to use egress policies. Not because of any fault with the way they work, but because they're simply more restrictive than they really want. That being said, let's go ahead and fix our app now. The first thing we want to set up is our web app's policy, so let's go take a look at that. Here we see a network policy named web-app that is applied to the front-end namespace. The pod selector says match any pod with the label run: web. So it's going to look in the front-end namespace, and this policy will apply to any pods that have that label. There are two types of rules: ingress and egress.

The ingress section is made up of a single rule: a from with an IP block. Now, the CIDR range I'm allowing is really big; it covers the addresses the virtual machines on my laptop will come in from, so every time I run this demo I could have a different IP address, but it will be in that range. It's probably wider than I need, but it works for the demo. So this is saying: let traffic in from my laptop, or my virtual machine, wherever it happens to be running, but only on TCP port 9000, and don't let any other traffic in. And on the way out, egress traffic can go to any namespace matching the namespace selector with the label tier: back. That happens to be the back-end namespace that I've created. Now, Kubernetes doesn't give you a label on a namespace by default; when I created the namespace, I gave it that label, tier: back. So the policy will find that namespace and allow traffic out to it on TCP port 9000. You'll see that's the port I'm using for all my services.
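Pieced together from that walkthrough, the web-app policy might look something like the sketch below. The run: web label, the tier: back namespace label, and TCP port 9000 come straight from the lesson; the policy name, namespace name, and the CIDR are placeholders, since the exact values from the demo aren't reproduced here.

```yaml
# Rough reconstruction of the web-app policy as described; treat names,
# namespace, and the CIDR as placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app
  namespace: frontend
spec:
  podSelector:
    matchLabels:
      run: web                   # the web deployment's pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 192.168.0.0/16 # placeholder for the broad "laptop VM" range used in the demo
      ports:
        - protocol: TCP
          port: 9000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              tier: back         # the back-end namespace, selected by label
      ports:
        - protocol: TCP
          port: 9000
```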

So let's go ahead and apply this, take a look at our test, and run it again. Okay. With this in place, the outside world can indeed hit the web pods but nothing else; it can't connect to the database or EXT services. So that's working. We still don't have connectivity to the back-tier services from the web tier, in fact no connectivity at all, and that's because the ingress policies of those back-end services are still in a zero-trust state. They're not allowing anything yet. So even though we opened the egress to the back namespace, it doesn't matter, because they're not letting the traffic in once it gets over there. Now, before we fix that, notice we didn't have to set up any egress rules to allow the response from the web tier to the outside client. This is an example of the implicit allow-established rules in practice.

The Kubernetes cluster is setting it up so that return traffic is able to come back out to the client without me having to explicitly state it. That's really nice. So now, let's get the database service working. Here's the database policy. You can see that it's named db-svc, it applies to the back-end namespace, and it matches pods that have the DB service's run label, just like the web policy matched run: web. The policy types are ingress and egress. Ingress says allow traffic from the front-tier namespace, by label again; I had to apply that tier: front label to the front namespace when I created it. So anything up in that tier is allowed in.

On the egress side, we're letting traffic go out to the database. You'll see the same CIDR I showed before; again, that database is running on my virtual machine, not in the cluster, and it's in that range. I could have narrowed this down. In a real-world scenario, this would more likely be a much smaller range, and it would point to your database rack or wherever your database team hosts that database.
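Here's a rough reconstruction of what that db-svc policy might look like, based on the description; the label value, names, and CIDR are assumptions standing in for the values used in the demo.

```yaml
# Sketch of the db-svc policy as described; label values, names, and the
# CIDR are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-svc
  namespace: backend
spec:
  podSelector:
    matchLabels:
      run: db-svc                # assumed label for the DB service pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tier: front        # anything in the front-end namespace may connect
      # no ports listed on this rule -- the port field is optional
  egress:
    - to:
        - ipBlock:
            cidr: 192.168.0.0/16 # placeholder; in practice a much narrower database range
      ports:
        - protocol: TCP
          port: 3306             # MySQL
```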

As for the port it allows out: this is MySQL, so TCP 3306 makes sense. Now, the eagle-eyed among you may notice I didn't specify a port on the ingress from rule. That's just to show you that you don't have to; I probably would if this were a real-world scenario, but you can see that it's optional. Here I'm just letting any traffic in from that namespace. Let's go ahead and apply this and run our test. Great. Now the database is accessible to the database service pod, and only the database service pod. Also, the web tier can now make a connection to the DB service.

Now, even though we're restricting the database service from initiating connections to the web tier, which is what we want, the connection from the front-end does work because of that allow-established rule. The round-trip traffic from the web tier to the database service and back is set up automatically, without any extra work from us.

Finally, we need to do something similar for the external service, so let's take a look at that. Here, for the EXT service, we have a policy very similar to the DB service's. We've got our pod selector, this time for the EXT service's pods. We have an ingress allowing anything from the front tier to get in. For egress, you'll see that we're allowing two kinds of traffic, port 80 or 443, both TCP. Now, you'll also notice I do not have any kind of IP block for where I can send traffic. That's because we don't own the IP addresses that google.com resolves to in this case. You're going to run into this a lot with anything internet-based: unless you're guaranteed what those IP addresses are, they can change at any time, because you don't have any control over them. DNS can rotate different ones in, which is very common with cloud providers, where they're adding IP blocks all the time. That leaves a gap to close, because right now this could call out to anywhere on ports 80 and 443.

Most companies will have some kind of external proxy or something that will say, "Hey, I understand what google.com is, and I'm going to restrict traffic to that." There are also some third-party CNI capabilities around hostname lookups, but anything you implement, whether it be a hostname lookup through the CNI or a proxy, is going to add some latency and some complexity to the connection, and you just have to go into that with your eyes open. It's no different from the connections you make today out of an app that's not in Kubernetes. I just wanted to call that out.
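For completeness, the EXT service policy just described might look something like this sketch, with the names and label values assumed for illustration.

```yaml
# Sketch of the EXT service policy as described; names and labels are assumed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ext-svc
  namespace: backend
spec:
  podSelector:
    matchLabels:
      run: ext-svc               # assumed label for the EXT service pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tier: front
  egress:
    - ports:                     # no ipBlock: the external service's IPs aren't under our control
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
```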

Let's go ahead and apply this and run our tests. Okay. Everything looks great. The web tier is able to connect to both of the back-tier services, but nothing else. The database service is able to connect to the database and nothing else. And the EXT service is able to connect to google.com and none of the other things. From the outside world, only the web tier is available to us; the DB service and the EXT service still can't be connected to from the outside world.

So there you have it. We now have the same network constraint use cases we talked about before, implemented by four fairly small policy definitions. And because it's all implemented using label selectors, as pods move around the cluster, Kubernetes will take care of keeping the configuration up to date on the fly with no human interaction needed. For this demonstration, I've been using the Calico CNI, which, like other CNI plugins, provides some extra APIs over and above what stock Kubernetes has. Of note is the global network policy, which, as its name indicates, is a cluster-wide policy.

To use this, I could leverage the calicoctl tool to apply it, or just use their CRD definitions and include something like the manifest below, eliminating the need to have a policy in every namespace. This kind of thing can be used by your cluster admins to apply that zero trust to the entire cluster, complete with the UDP port 53 egress rule we talked about earlier, and then the application teams just provide their own policies to overlay it. Using this example, however, would couple my manifests to Calico, so if we ever wanted to deploy a different CNI, there'd be work needed to re-implement those policies for it.
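As a rough idea only, a Calico GlobalNetworkPolicy for that cluster-wide zero trust might look something like the following sketch; check Calico's documentation for the exact schema and semantics before using anything like this.

```yaml
# Hedged sketch of a Calico GlobalNetworkPolicy implementing cluster-wide
# zero trust with the DNS exception; verify against the Calico docs.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny             # assumed name
spec:
  selector: all()                # applies to every workload in the cluster
  types:
    - Ingress
    - Egress
  egress:
    - action: Allow              # the single exception: DNS lookups
      protocol: UDP
      destination:
        ports:
          - 53
```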

There are many other enhancements that CNI providers like Calico offer; refer to their documentation for details. As we wrap things up here, know that we've only touched on a small subset of what you can do with Kubernetes network policies. There's an excellent GitHub repository that Ahmet Balkan has put together that describes several patterns you can use, many of which include animated diagrams like this one, which illustrate network traffic flows. Also, Josh Rousseau did an entire episode of the TGI Kubernetes show where he deep-dives into these policies, and I highly recommend watching it. Finally, here are links to some of the pages I've discussed during this lesson; the rest will be in the show notes below this video. I want to thank you so much for watching. We'll see you in the next one.