I’ve written before about the value and the challenges of collaboration between data science (DS) and application development. One key lesson from experience was that data scientists and engineers need a level of independence while staying on the same page and not breaking each other's code. In this post I will dive deeper into how we achieved that independence and reliability through microservices with contract testing.
As a data scientist at Pivotal I worked on a fully balanced team (DS, engineers, product managers, and designers) to build a tool for planning and optimizing a manufacturing pipeline. For IT-organizational reasons, the core of the web application was being built in C#/.NET. The data science team, in contrast, wanted to use Python for its rich ecosystem of data science tools. We also anticipated that each sub-discipline would iterate on a different cycle. Independence of tools and of iteration cycles matches Nathaniel Schutta's reasons for choosing microservices. A microservice is a small application that provides a focused set of capabilities, runs in its own process, and communicates with other services through an application programming interface (API), in our case a RESTful HTTP API. This type of API might look familiar if you have used the Twitter API or Google’s AI APIs: you send a request to a URL (e.g., a POST request with a JSON body) and get back a JSON response. Google’s APIs are useful because you can interact with them in a dependable, repeatable way. You don't have to worry about how they implement the service, and they can improve the service’s quality on their own schedule without changing how you interact with it.
Luckily, for my project we could use Cloud Foundry, a platform that facilitates deploying these kinds of microservices. We chose to build an optimization application in Python that the C# user-facing application could call via the REST API.
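To make the request/response pattern concrete, here is a minimal sketch of what the Python side of such a service could look like, using only the standard library's WSGI interface rather than any particular framework. The `/optimize` route, the `jobs` payload, and the shortest-first "optimization" are all hypothetical placeholders, not the project's actual API or algorithm.

```python
import json
from io import BytesIO
from wsgiref.util import setup_testing_defaults


def optimize(payload):
    """Hypothetical stand-in for the data science logic.

    Orders jobs shortest-first; a real optimizer would be far richer.
    """
    jobs = payload.get("jobs", [])
    ordered = sorted(jobs, key=lambda job: job["duration"])
    return {"schedule": [job["id"] for job in ordered]}


def app(environ, start_response):
    """Minimal WSGI app: POST a JSON body to /optimize, get JSON back."""
    headers = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/optimize":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length))
    start_response("200 OK", headers)
    return [json.dumps(optimize(payload)).encode("utf-8")]


def call(path, payload, method="POST"):
    """Invoke the app in-process, the way a WSGI server would."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in required WSGI keys
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": BytesIO(raw),
    })
    status = []
    body = b"".join(app(environ, lambda s, h: status.append(s)))
    return status[0], json.loads(body)


status, result = call("/optimize", {"jobs": [{"id": "a", "duration": 5},
                                             {"id": "b", "duration": 2}]})
```

Here `status` is `"200 OK"` and `result` is `{"schedule": ["b", "a"]}`. For local experiments any WSGI server, such as `wsgiref.simple_server.make_server`, can host `app`; a deployed service would more likely use a web framework and a production-grade server. The consumer, whatever its language, only sees JSON in and JSON out.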
Contracts: Agreements On Input and Output
A contract is a guarantee that the API provider makes to the consumer: if the consumer provides input of a specific type and format, then the provider will return output of a specific type and format. A consistent interface is as helpful for libraries within a single language as it is for web applications. Scikit-learn is an example of a popular data science library with a consistent API. You can take any kind of model and call `.fit(X, y)` with features X and labels y to get a trained model. On that trained model you can then call `.predict(new_data)` with new data and get predicted values. As long as the `new_data` matrix has the same number of columns as the training feature matrix X, it works, and you get back an array of predictions with the same number of rows as `new_data`.
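The value of a shared interface is that models become interchangeable. The sketch below does not use scikit-learn itself; it defines two hypothetical toy models (`MeanRegressor`, `NearestNeighborRegressor`) that merely follow the same `fit(X, y)` / `predict(new_data)` shape, so the calling code never changes when the model does.

```python
from statistics import mean


class MeanRegressor:
    """Hypothetical toy model: always predicts the mean training label."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self  # returning self mirrors scikit-learn's convention

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborRegressor:
    """Hypothetical toy model: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(self.X_[i], row))]
            for row in X
        ]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [10.0, 20.0, 30.0]
new_data = [[2.9, 4.1], [5.0, 6.0]]

# The calling code is identical for both models: one prediction per row.
for model in (MeanRegressor(), NearestNeighborRegressor()):
    predictions = model.fit(X, y).predict(new_data)
    assert len(predictions) == len(new_data)
```

Note that nothing here checks what each column means; the models happily accept any matrix with the right width.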
Using matrices as the input and output formats can get confusing. It works when data scientists prototype models locally, but it can cause problems when a trained model is provided as a service. Which feature does each input column represent? A matrix doesn’t have column names, so a consumer of the service could accidentally send the wrong features, or the right features in the wrong order, and still get predictions back. Unfortunately, those predictions would be completely invalid. To avoid this ambiguity, a more self-describing format such as JSON is better. Each observation becomes an item in a list, where each item maps column (field) names to values. A given version of the API then guarantees that if you send records with the expected field names and value types, you’ll get back a list of predicted values; that guarantee is the contract.
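A small sketch of why named fields help: the service can map each JSON record onto its own fixed column order, so key order in the request no longer matters, and a missing field fails loudly instead of silently shifting columns. The feature names in `FEATURE_ORDER` are hypothetical.

```python
import json

# Hypothetical feature names, in the column order the model was trained on.
FEATURE_ORDER = ["temperature", "pressure", "line_speed"]


def records_to_rows(records):
    """Convert self-describing JSON records into rows in a fixed column order.

    A missing field raises KeyError naming that field, rather than
    producing a row whose columns are silently misaligned.
    """
    return [[record[name] for name in FEATURE_ORDER] for record in records]


# The two records list their keys in different orders; that is fine.
request_body = json.loads("""
[
  {"pressure": 1.2, "temperature": 300, "line_speed": 4.5},
  {"temperature": 310, "line_speed": 4.0, "pressure": 1.1}
]
""")
rows = records_to_rows(request_body)  # [[300, 1.2, 4.5], [310, 1.1, 4.0]]
```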
Contract Tests Build Clarity, Independence, and Confidence
Creating automated tests of the API contract provides clarity, enables independence, and builds confidence. A clear contract means that both the data science API and its consumers know what to expect, and the data science side isn’t tied to one particular implementation or model. Models and implementations can change as long as the interface stays the same. For example, the initial model could simply return random numbers so that other applications have something to develop against. Later, when the DS team swaps in a better model, the consuming applications immediately get better without any changes on their side.
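As a sketch of the "start with random numbers" idea: a placeholder model and a later model share one hypothetical `predict(records)` interface, and the service wraps either one in the same response shape. Both classes, the field names, and the weights are illustrative assumptions, not the project's models.

```python
import random


class RandomPlaceholderModel:
    """Hypothetical day-one model: random outputs, but the right shape."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def predict(self, records):
        return [self._rng.random() for _ in records]


class LinearModel:
    """Hypothetical later model: weighted sum of named input fields."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, records):
        return [sum(w * record[name] for name, w in self.weights.items())
                for record in records]


def predictions_response(model, records):
    """What the service returns; consumers rely only on this shape."""
    return {"predictions": [float(p) for p in model.predict(records)]}


records = [{"temperature": 300.0, "pressure": 1.0},
           {"temperature": 310.0, "pressure": 1.5}]
before = predictions_response(RandomPlaceholderModel(seed=42), records)
after = predictions_response(LinearModel({"temperature": 0.5, "pressure": 2.0}), records)
```

Swapping `RandomPlaceholderModel` for `LinearModel` changes the numbers in the response but not its structure, which is exactly what the consumer depends on.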
By having contract tests in place, the data scientists can make changes with the confidence of knowing immediately whether that change will break the consumer’s code.
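Here is a minimal sketch of such a contract test using the standard library's `unittest`. The function `predict_endpoint` is a hypothetical stand-in for the real service call; the tests assert only the shape and types of the response, so they keep passing when the model changes and fail when the interface breaks.

```python
import unittest


def predict_endpoint(records):
    """Hypothetical service function under test (stands in for the real API)."""
    return {"model_version": "0.1", "predictions": [0.5 for _ in records]}


class PredictContractTest(unittest.TestCase):
    """Checks the structure of the response, not specific model outputs."""

    def setUp(self):
        self.records = [{"temperature": 300.0, "pressure": 1.2},
                        {"temperature": 310.0, "pressure": 1.1}]

    def test_returns_one_prediction_per_record(self):
        response = predict_endpoint(self.records)
        self.assertEqual(len(response["predictions"]), len(self.records))

    def test_predictions_are_numbers(self):
        response = predict_endpoint(self.records)
        for value in response["predictions"]:
            # bool is an int subclass in Python, so exclude it explicitly.
            self.assertIsInstance(value, (int, float))
            self.assertNotIsInstance(value, bool)
```

Running `python -m unittest` against a module containing this class would execute both checks on every change to the data science code.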
An added benefit of contract tests is that they let the DS API return more specific error messages. Typically, if the consuming app passes in unexpected input, the DS function or API fails and returns a generic something-went-wrong error (e.g., HTTP status 500). But such a generic error could mean a problem with the input or a problem with the DS code or environment. It is better to give specific feedback, for example that the input was missing a required field or that a value had the wrong type, along with a client-error status such as HTTP 400 (Bad Request).
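A hand-rolled sketch of that idea follows, before any schema tooling: validate the input against the contract first and return a 400 with specific messages, so a 500 is reserved for genuine failures on the service side. The required fields and the placeholder predictions are hypothetical.

```python
import json

# Hypothetical contract: each record needs these numeric fields.
REQUIRED_NUMERIC_FIELDS = ["temperature", "pressure"]


def is_number(value):
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def record_problems(index, record):
    """List human-readable problems with one input record."""
    if not isinstance(record, dict):
        return [f"record {index}: expected an object"]
    problems = []
    for name in REQUIRED_NUMERIC_FIELDS:
        if name not in record:
            problems.append(f"record {index}: missing required field '{name}'")
        elif not is_number(record[name]):
            problems.append(f"record {index}: field '{name}' must be a number, "
                            f"got {type(record[name]).__name__}")
    return problems


def handle_request(raw_body):
    """Return (status code, JSON body): 400 with details for bad input."""
    try:
        records = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return 400, json.dumps({"errors": [f"body is not valid JSON: {exc.msg}"]})
    if not isinstance(records, list):
        return 400, json.dumps({"errors": ["body must be a JSON list of records"]})
    errors = [p for i, r in enumerate(records) for p in record_problems(i, r)]
    if errors:
        return 400, json.dumps({"errors": errors})
    predictions = [0.0 for _ in records]  # placeholder for the real model call
    return 200, json.dumps({"predictions": predictions})
```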
Avoid the Pitfall of Testing For Specific Values
Hopefully I’ve sold you on contract testing. So how do you do it? In my project I ran into some pitfalls before finding a good solution. The biggest pitfall I fell into was testing for specific values. The most obvious approach is to maintain canned input and output files and test whether your API accepts the input and returns exactly the same output string. This is very brittle for several reasons. First, recall that we were using the JSON format. Whitespace differences do not matter in JSON, and JSON objects are mappings (i.e., dictionaries) in which the order of the keys should not matter. An exact-match test would fail on differences that are not meaningful. Second, the data science component can (and should!) change and improve over time, and you might add or remove optional metadata. Constantly updating the contract test to match specific model results is cumbersome and ties your API tests to implementation details. Your API tests should need to change only when the interface itself changes, not when implementation details do.
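The sketch below illustrates both problems with hypothetical response strings. Comparing raw text fails on harmless whitespace and key-order differences; parsing first fixes that, but still breaks as soon as a retrained model returns different numbers. Checking structure and types (`has_contract_shape`, an illustrative helper) survives both.

```python
import json

expected_text = '{"model_version": "1.0", "predictions": [0.25, 0.75]}'
# Same data, different whitespace and key order: equivalent JSON.
actual_text = '{\n  "predictions": [0.25, 0.75],\n  "model_version": "1.0"\n}'

brittle_match = actual_text == expected_text                            # False
semantic_match = json.loads(actual_text) == json.loads(expected_text)  # True


def has_contract_shape(response, n_records):
    """Check structure and types, not the specific model outputs."""
    predictions = response.get("predictions")
    return (isinstance(predictions, list)
            and len(predictions) == n_records
            and all(isinstance(p, (int, float)) and not isinstance(p, bool)
                    for p in predictions))


# A retrained model returns different numbers but honours the same contract.
retrained = json.loads('{"model_version": "1.1", "predictions": [0.31, 0.64]}')
```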
JSON Schema Makes Contract Tests Robust
One of the software engineers on the project guided me to a more robust way of testing API contracts that use JSON input and output: JSON Schema, a standard for specifying and validating the shape and properties of a JSON document. It can be used directly in contract testing, and consumers of the API can use the same schema in their own tests. A JSON schema is itself written in JSON, and validators exist in a variety of languages. This was super useful for us, since we were using two different languages, Python and C#. Each team could test against the same schema specification using its own language of choice.
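To show that a schema is just a JSON document, here is a hypothetical request schema for a prediction endpoint. In practice the text would live in its own file (for example, a hypothetical `prediction_request.schema.json`) that both the Python service and the C# consumer load; Python only needs the standard `json` module to read it.

```python
import json

# Hypothetical contract for the prediction request, stored as plain JSON so
# that both the Python service and the C# consumer can load the same file.
REQUEST_SCHEMA_TEXT = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "PredictionRequest",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "temperature": {"type": "number"},
      "pressure": {"type": "number"}
    },
    "required": ["temperature", "pressure"]
  }
}
"""

request_schema = json.loads(REQUEST_SCHEMA_TEXT)  # a schema is ordinary JSON data
```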
JSON Schema can specify a rich variety of constraints; I will cover a few to give you a taste. You can specify named properties (dictionary keys) and the data type expected for each value. You can make certain keys required, optional, or prohibited. For numeric properties you can specify minimum and maximum allowed values. JSON Schema also allows references and recursion, so you could define a person object that has a “siblings” key, which in turn is a list of person objects. Below is a list of a few more constraints, but it is far from exhaustive (see the docs for the complete list).
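The person example from this paragraph can be sketched as a Draft 7 schema and checked with the `jsonschema` Python package, assuming it is installed. The specific fields, the 0 to 150 age range, and the prohibited `password` key are illustrative choices; `"$ref": "#"` points back at the root schema, which is what makes `siblings` a list of person objects.

```python
from jsonschema import Draft7Validator

# Illustrative schema: typed properties, a required key, a numeric range,
# a prohibited key, and a recursive "siblings" list of persons.
PERSON_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "siblings": {"type": "array", "items": {"$ref": "#"}},
    },
    "required": ["name"],
    # Prohibit a key: the document must NOT satisfy "has a password key".
    "not": {"required": ["password"]},
}

Draft7Validator.check_schema(PERSON_SCHEMA)  # raises if the schema itself is malformed
validator = Draft7Validator(PERSON_SCHEMA)

alice = {"name": "Alice", "age": 34, "siblings": [{"name": "Bob", "age": 31}]}
```

With this schema, `validator.is_valid(alice)` is true, while a record missing `name`, a nested sibling with a negative `age`, or a record containing `password` is rejected.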
Additional JSON Schema Constraints:
Conditional or Boolean logic
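As a taste of conditional and Boolean logic, here is a hypothetical request schema, again checked with the `jsonschema` package (assumed installed). `anyOf` requires at least one of two identifiers, and Draft 7's `if`/`then`/`else` requires either `records` or `record` depending on the value of `mode`. All field names are illustrative.

```python
from jsonschema import Draft7Validator

# Hypothetical request schema showing Boolean and conditional keywords.
REQUEST_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "mode": {"enum": ["single", "batch"]},
        "record": {"type": "object"},
        "records": {"type": "array", "items": {"type": "object"}},
        "machine_id": {"type": "string"},
        "line_id": {"type": "string"},
    },
    "required": ["mode"],
    # Boolean logic: at least one identifier must be present.
    "anyOf": [{"required": ["machine_id"]}, {"required": ["line_id"]}],
    # Conditional logic: batch mode needs "records", otherwise "record".
    "if": {"properties": {"mode": {"const": "batch"}}},
    "then": {"required": ["records"]},
    "else": {"required": ["record"]},
}

validator = Draft7Validator(REQUEST_SCHEMA)
batch_ok = {"mode": "batch", "records": [{}], "line_id": "L1"}
single_ok = {"mode": "single", "record": {}, "machine_id": "M7"}
```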
Quick Start For JSON Schema Newbies
Since JSON Schema was new to me on that project, I was a little anxious about how to get started quickly. Thankfully there are schema generators out there to help you jump-start the process. Instead of learning how to create a schema from scratch, you can paste an example JSON document into quicktype.io and generate a basic schema. You can customize it to add or change constraints, but even the basic generated schema is a good start. Quicktype also has a command-line version that runs offline on your own machine, in case you have security or privacy concerns about your example JSON document. Other tools for generating JSON schemas exist as well.
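To demystify what a generator does, here is a toy sketch that infers a basic schema from a single example document. This is not quicktype's algorithm: it marks every key it sees as required and takes array item types from the first element only, which is exactly the kind of output you would then customize by hand (for example, relaxing `required` for optional fields). The example document is hypothetical.

```python
import json


def infer_schema(value):
    """Toy sketch: derive a basic JSON Schema from one example value."""
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Assumes homogeneous arrays: item type comes from the first element.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": sorted(value),  # every key seen is treated as required
        }
    raise TypeError(f"not a JSON value: {value!r}")


example = json.loads('{"id": "M7", "temperature": 301.5, "runs": 3, "tags": ["a"]}')
schema = {"$schema": "http://json-schema.org/draft-07/schema#", **infer_schema(example)}
```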
Since I was using Python, I used the jsonschema Python package. Its validate() function checks an instance of a JSON document against a schema and raises a ValidationError if the instance is invalid. Its validator classes also have an iter_errors() method that lets you enumerate all the errors, which is useful for making your API return more specific error messages.
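A minimal sketch of both styles with the `jsonschema` package (assumed installed) against a hypothetical request schema: `validate()` raises a single `ValidationError`, while `Draft7Validator.iter_errors()` yields every error, each with a `path` locating the offending value. `all_errors` formats those into messages suitable for a detailed 400 response; the formatting is an illustrative choice.

```python
from jsonschema import Draft7Validator, ValidationError, validate

# Hypothetical request schema for illustration.
REQUEST_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "temperature": {"type": "number"},
            "pressure": {"type": "number"},
        },
        "required": ["temperature", "pressure"],
    },
}


def single_error(instance):
    """validate() raises one ValidationError for an invalid instance."""
    try:
        validate(instance, REQUEST_SCHEMA)
        return None
    except ValidationError as err:
        return err.message


def all_errors(instance):
    """iter_errors() yields every problem, handy for a detailed 400 response."""
    validator = Draft7Validator(REQUEST_SCHEMA)
    errors = sorted(validator.iter_errors(instance),
                    key=lambda e: [str(part) for part in e.path])
    messages = []
    for err in errors:
        location = "/".join(str(part) for part in err.path) or "(root)"
        messages.append(f"{location}: {err.message}")
    return messages


bad_request = [{"temperature": 300}, {"temperature": "hot", "pressure": 1.0}]
messages = all_errors(bad_request)
```

Here `messages` holds two entries: one for the missing `pressure` in record `0` and one for the non-numeric value at `1/temperature`.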
Different Tools Or Boundaries? Still Test Your Contracts!
The core concepts here are relevant beyond HTTP and JSON. The boundary between teams could have been drawn differently. The lesson still stands: define a contract between the teams and include tests of that contract in the test suite. Doing so will solve a lot of coordination headaches, providing independence and confidence to all involved.
Further Reading
Frontend Contract Tests without Magic Numbers (Pivotal Engineering blog)
API First for Data Science (Pivotal Engineering blog)
My PyData NYC 2018 talk: Contract testing in Python for a data science API