Who are we?
We are the Greenplum for Kubernetes team. We’re working on a Kubernetes operator to run Greenplum, and connected components of Greenplum like PXF and GPText. We started with a controller for Greenplum, and recently used KubeBuilder to add controllers for PXF and GPText.
In our previous post, we reviewed some of the details we learned about what it takes to build a working controller with KubeBuilder. In this post, we take you through our journey of applying test-driven development within the KubeBuilder framework.
Testing
Our journey of testing our KubeBuilder operator has been a long, iterative process. At the start of that process, we thought it would be best to utilize and follow the patterns in the KubeBuilder scaffolding. However, as our project matured and we added features and accompanying test coverage, we encountered barriers to overcome. Initially, we struggled to know when our code under test was done executing. Then, after writing simple happy path tests, we wanted to test our handling of unexpected errors. Finally, faced with poor test performance, we decided to abandon the KubeBuilder test scaffolding altogether, and developed our own unit testing pattern.
Using the KubeBuilder Scaffolding
When we started our testing efforts, we thought it would be prudent to use the KubeBuilder test scaffolding. When generating a new controller, KubeBuilder generates scaffolding to start the test environment (testenv), create a manager, and add the reconciler to the manager so it will be able to receive requests. The testenv runs a real, ephemeral Kubernetes api-server and etcd to store state.
The controller manager watches for changes to the objects, caches retrieved objects, and batches reconciliations. Further the object must round-trip through the api-server before reaching the controller. Therefore it may take time for Reconcile() to be called and for its effects to be observed. Our tests were intermittently failing because of the unpredictability of the test system. We sought to find a solution for synchronizing our tests so we would know when Reconcile() finished running before we made assertions about the results.
We found a testing pattern from KubeBuilder v1 which we thought would help fix our flaky tests. In this pattern, you create a function that wraps the original Reconcile with a test Reconciler that writes to a channel when it finishes running. We created our own version of this method that we called ReconcilerSpy, provided here. Compared to the KubeBuilder version, it adds the Result and error from the reconciliation to the channel.
func ReconcilerSpy(inner reconcile.Reconciler) (reconcile.Reconciler, chan Reconciliation) {
requests := make(chan Reconciliation)
fn := reconcile.Func(func(req reconcile.Request) (reconcile.Result, error) {
result, err := inner.Reconcile(req)
requests <- Reconciliation{
request: req,
result: result,
err: err,
}
return result, err
})
return fn, requests
}
If you would like to utilize this pattern, provide the ReconcilerSpy to the Manager instead of your bare Reconciler when you set up the test environment. Then it will receive the reconciliations.
testReconciler, testReconciliations := ReconcilerSpy(&MyReconciler{})
builder.ControllerManagedBy(mgr).
For(&v1beta1.MyCustomResource{}).
Complete(testReconciler)
Your unit tests can then wait for the channel to receive, which will tell you when Reconcile has finished.
Expect(k8sClient.Create(ctx, myCR)).To(Succeed())
var r Reconciliation
Eventually(testReconciliations, 5*time.Second).Should(Receive(&r))
//make assertions
We noted there have been some issues
reported related to using the ReconcilerSpy pattern. In our estimation, the
common issue that users encountered stems from mixing up unit and integration
tests. Keep in mind the goal is to test the Reconcile()
function, not any
re-queueing behavior of the ControllerManager or other system component
functionality. For example, in order to test that a Reconcile()
will be
re-queued, just verify the returned Result value.
It seems that since the pattern proved difficult to use properly, KubeBuilder has removed the helper method, without (as of yet) providing an alternative. When used judiciously, we found the benefits of the ReconcilerSpy pattern outweigh the possible issues.
Using reactive.Client
The provided testenv and our custom ReconcilerSpy allowed us to write happy path test cases where we did not encounter errors, but it did not give us a good way to simulate errors. The recommendation we found from the KubeBuilder community was to set up the environment to actually produce the desired error. For example, Create() the object beforehand in order to cause an error stating that the object already exists when an attempt is made to Create() it again. However, we found attempting to generate errors in a real environment can be tricky and unreliable.
Testing that our code correctly handles unanticipated errors proved impossible when dealing with a real client and api-server. For example, in order to test that an error returned from Get() is handled, we must simulate the error condition. Specifically, consider Get()-ing the object to be reconciled. When using a real api-server, then we must Create() the object in the test case in order for Reconcile() to be called. The only way to force Get() to fail that we could think of was for the object to actually not exist. This creates a contradiction in that the object must both exist (for the code under test to run), and not exist (for the error condition). We were unable to use the provided framework to simulate this error case. Instead, we needed some way to inject errors.
Test-driven development benefits from having easy and reliable ways to inject errors into injected dependencies so we can verify our code can react to anything. Before using KubeBuilder, we had previously written a Controller using the Operator SDK framework. That framework uses the client-go api, while KubeBuilder uses controller-runtime. One feature that we had used extensively and missed from client-go’s fake client was its Reactors. Reactors intercept an action made on the Client and create different behavior from what would normally happen, so they are a great fit for injecting errors. For example:
When("getting the customResource fails", func() {
BeforeEach(func() {
reactiveClient.PrependReactor("get", "mycustomresource", func(action testing.Action) (bool, runtime.Object, error) {
return true, nil, errors.New("injected error")
})
})
It("returns and logs the error", func() {
Expect(reactiveClient.Create(ctx, myCR)).To(Succeed())
Eventually(testReconciliations, 5*time.Second).Should(Receive(&r))
Expect(r.err).To(MatchError("unable to fetch MyCustomResource: injected error"))
})
})
In order to inject errors for testing our KubeBuilder operator, we developed our
own reactive client that wraps the provided controller-runtime Client and adds
reactors. reactive.Client
embeds a testing.Fake
(the same one used in
client-go’s fake.Clientset
) which is used for managing and triggering the
reactors.
Like the client-go fake.Clientset
, we provided a default reactor. Our default
reactor, instead of using an object tracker to store and retrieve objects,
delegates storage to the wrapped Client. Delegating to another Client meant we
could start using reactors with a real client talking to the api-server, and
later change the delegate to a controller-runtime fake Client (see below).
The challenges creating reactive.Client
surrounded adapting the reactor
actions developed to the client-go Clientset to the controller-runtime
Client. The client-go fake client is generated, and can provide specific
concrete runtime.Object
implementations. The controller-runtime Client
eschews generated code, and so uses the more generic interfaces. Converting
between these two styles of API took some deep searching through parts of the
Kubernetes API we had not explored previously.
Taking our tests from 20+ seconds to <1 second
At this point, we felt we had figured out testing with “the KubeBuilder way”
augmented with reactive.Client
, but we weren’t satisfied with it. The tests
for our older controller that used the client-go fake package ran blazingly fast
(85 test cases in under ¼ second) while our KubeBuilder tests took multiple
seconds per test case. When doing TDD we run our tests very frequently (make a
small change, run tests), so when multiplied by the total number of tests, that
time difference caused us a lot of pain.
Our first attempt to run tests faster was to try to run the test cases in
parallel. We could not simply enable parallelism with the ginkgo test runner
since the tests share resources. We considered two techniques to isolate the
resources and enable parallelism: we could uniquely name our test objects on the
same testenv, or run multiple testenvs. If our tests all ran on the same
testenv/api-server, we would need unique names for every test object in order to
prevent conflicts. It would be easy to accidentally cause difficult-to-debug and
intermittently failing tests by naming an object the same as another test’s
object. To avoid this issue, we could instantiate a testenv for every test case.
In order to run multiple testenvs, we would need to instantiate and wrangle
multiple controller managers and associated ReconcilerSpy
functions and
reactive.Client
objects. The more we thought about the complexity involved in
such a test system, we missed the simplicity of our Operator SDK tests. Even
with parallelism, each test case still would take about 5 seconds to run, well
above our sub-second ideal. In addition, we started to question the value of
using a real api-server.
We wanted to get away from using the heavier testenv and instead strip things
down to use a controller-runtime fake client and call Reconcile()
directly.
Since reactive.Client
wraps an existing client, it was pretty easy for us to
substitute in the fake client from controller-runtime rather than the real
client connected to the testenv.
At this point we were able to call Reconcile directly and achieve sub-second test runs.
Before
Expect(k8sClient.Create(ctx, pxf)).To(Succeed())
Eventually(pxfReconciliations, 5*time.Second).Should(Receive(&r))
Expect(r.err).NotTo(HaveOccurred())
// make assertions
After
Expect(reactiveClient.Create(ctx, pxf)).To(Succeed())
_, err := pxfReconciler.Reconcile(pxfRequest)
Expect(err).NotTo(HaveOccurred())
//make assertions
Conclusion
At the end of our testing journey, we’re happy with the rapid iteration cycle
provided by direct unit tests using a fake Client with reactors, and
ReconcilerSpy became unnecessary. Sub-second unit tests provide immediate
feedback, meaning we can add features gradually. This leads to clearer tests and
implementation that reveals itself naturally. Our experience with the
KubeBuilder test scaffolding reinforced old lessons of unit testing: focus on
the unit of testing (your Reconcile()
method), and inject dependencies
(controller-runtime Client
) that have the behavior (reactors) needed to
implement the scenario.