There are many ways to package and run workloads in the cloud. The newest, and arguably the most interesting, is functions, also known as serverless.
Industry analysts, researchers, and the tech press agree that the serverless wave is happening. The hyperscale cloud providers have all started to offer “functions as a service,” coupled with the rest of their cloud service offerings.
Pivotal’s contribution to the functions movement is riff, an open-source project, recently unveiled at SpringOne Platform.
We’ve seen a surge in interest around riff since its release. It is easy to learn, and easy to get up and running. In this post, we will quickly walk through what project riff is, its benefits, and a few use cases that may spark some ideas.
I was able to learn and understand all of riff in about 2 hours. riff is also great because I can use a standard container listening on port 8080 to back my functions. I also like the function side-car approach; makes it really easy to add support for new language runtimes.
— Kelsey Hightower (@kelseyhightower) February 12, 2018
Why Functions? And Why is it Called “Serverless”?
Serverless refers to the fact that neither developers nor operators need to provision or maintain the underlying infrastructure needed to run functions. The model is popular for the same reason any new piece of developer tech goes mainstream: it’s a higher level of abstraction that makes life easier. With functions, a developer can execute a small slice of code in response to an event, called a trigger. When the event occurs, the code runs, performing its tightly scoped function.
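The trigger-and-function idea can be sketched in a few lines of plain Python. This is an illustration of the model only, not riff's API: the `on` decorator, the `emit` dispatcher, and the `order.created` event name are all hypothetical. On a real function platform, the runtime (not your code) owns the dispatch and the infrastructure behind it; the developer writes only the small handler.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a trigger (event type) to the functions it invokes.
_triggers: Dict[str, List[Callable[[dict], object]]] = {}


def on(event_type: str):
    """Register the decorated function to run whenever event_type occurs."""
    def register(fn: Callable[[dict], object]) -> Callable[[dict], object]:
        _triggers.setdefault(event_type, []).append(fn)
        return fn
    return register


def emit(event_type: str, payload: dict) -> list:
    """Deliver an event: run every function bound to its trigger and collect results."""
    return [fn(payload) for fn in _triggers.get(event_type, [])]


@on("order.created")
def send_receipt(order: dict) -> str:
    # A tightly scoped job: one small slice of code per event.
    return f"receipt sent to {order['email']}"


print(emit("order.created", {"email": "ana@example.com"}))
# ['receipt sent to ana@example.com']
```

Events with no bound function simply produce no work, which is the property that lets a platform keep idle functions from running at all.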
How does this make life easier for your engineers? A few ways.
-
The function is narrowly defined. It’s intended to perform a specific, simple job. This eschews a more ambitious (and cumbersome) scope. (Value Stream Mapping and lean principles tell us that smaller iterations lead to big velocity improvements.)
-
Event integration is built-in. You don’t have to manage this separately.
-
Operational efficiencies. Applying functions to distributed computing automates event-based scheduling and self-scaling. That means less grunt work for ops teams.
As an added bonus, it can save you a boatload of money on infrastructure, either on-prem or in the public cloud. Compared to code running on virtual machines and containers, functions consume fewer resources because they don't run when idle, and they scale based on actual load.
Meet riff
riff is a service for executing functions triggered by events. Like Cloud Foundry, riff can run on-premises and in the public cloud. Here’s a look at the key features of riff:
Portability & Kubernetes-native support. riff extends Kubernetes (k8s) by defining custom resource definitions (CRDs). These are represented in YAML and posted to the k8s API server. You can run riff locally, and anywhere k8s runs. Here are a couple of examples of riff running atop Kubernetes services in the public cloud, starting with GKE:
Function-as-a-service @projectriff: Here is the sample 'Greeter' function running on @googlecloud Kubernetes Engine (GKE) #SpringOne #GKE @pivotal https://t.co/4RImVhhlCW pic.twitter.com/GNE0EuyJLW
— Guillermo Tantachuco (@gtantachuco) December 7, 2017
Or Azure Container Service:
Function-as-a-service @projectriff can run on any @kubernetesio cluster. I got to run a the sample 'Greeter' function on @azure container service (AKS) #SpringOne @pivotal https://t.co/4RImVhhlCW pic.twitter.com/q64iWbJudr
— Guillermo Tantachuco (@gtantachuco) December 7, 2017
Support for many languages. Since functions are packaged as containers, they can be written in a variety of languages; all you need is a function invoker for the language you are using. riff already has invokers for Java, Node.js, and Python. There is even a command invoker for running native executables and shell scripts.
First-class event streaming. riff provides functions with built-in event integration. Developers will love this feature: it frees them from the toil of wiring up connections to message brokers like Kafka and RabbitMQ.
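To make the invoker idea concrete, here is a minimal sketch of the core job a language invoker performs: load the developer's function by name, then call it once per incoming message. The `module:function` naming scheme and both helper names are hypothetical, and this is not riff's actual Python invoker; a real invoker also handles the transport between the function container and the event broker, which is omitted here.

```python
import importlib
from typing import Callable, Iterable, Iterator


def load_function(spec: str) -> Callable:
    """Resolve a 'module:function' spec (a hypothetical naming scheme) to a callable."""
    module_name, _, func_name = spec.partition(":")
    if not module_name or not func_name:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def invoke(fn: Callable[[str], object], messages: Iterable[str]) -> Iterator[str]:
    """Call the user's function once per incoming message and yield each result as text."""
    for message in messages:
        yield str(fn(message))


# Usage: any importable function can back the "function", e.g. string.capwords.
fn = load_function("string:capwords")
for result in invoke(fn, ["hello riff", "functions as a service"]):
    print(result)
```

Because the invoker only needs to know how to load and call a function, supporting a new language mostly means writing this small adapter for that language's runtime.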
Scalability. riff scales your functions automatically based on event volume. Functions can scale from 0 to 1, from 1 to N, and back down to 0 when there are no events.
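The 0 → 1 → N → 0 behavior can be expressed as a simple rule. The sketch below is not riff's actual autoscaling algorithm: using the count of pending events as the load signal, and the `events_per_replica` parameter (how much backlog one replica is expected to absorb), are assumptions made for illustration. `max_replicas` plays the role of N.

```python
import math


def desired_replicas(pending_events: int, events_per_replica: int, max_replicas: int) -> int:
    """Illustrative scaling rule: zero when idle, else enough replicas for the backlog, capped at N."""
    if events_per_replica <= 0 or max_replicas < 1:
        raise ValueError("events_per_replica must be positive and max_replicas at least 1")
    if pending_events <= 0:
        return 0  # no events: scale to zero so nothing runs while idle
    # A single pending event already yields 1 replica (0 -> 1); more backlog means more replicas.
    needed = math.ceil(pending_events / events_per_replica)
    return min(needed, max_replicas)  # never exceed N


for backlog in (0, 1, 250, 10_000):
    print(backlog, desired_replicas(backlog, events_per_replica=100, max_replicas=5))
```

Running the rule repeatedly as the backlog changes produces the lifecycle described above: the first event wakes one replica, a growing backlog adds replicas up to the cap, and a drained topic returns the function to zero.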
When to Use riff
Functions, as a programming model, are actually quite old. So even though function services are new, organizations across all industries can use riff today to explore how serverless can address real-world use cases.
What might some of these use cases be? Glad you asked!
-
Web events: process online forms; react to a given event from your SaaS providers (webhook).
-
Event-based integration: biometric authentication for mobile devices; scheduled tasks such as data cleansing or ETL (extract-transform-load).
-
Internet of Things (IoT): online fraud detection; ingest, transform, filter, and process constant streams of data from many devices, such as point-of-sale devices, medical devices, smart meters, and cars.
-
Machine learning: process images or videos for facial or object recognition; integrate chat bots or digital assistant services with natural-language processing and machine learning back-end services.
-
Security: log analysis to search for specific events or patterns.
Let’s take a closer look at one of these scenarios.
Real-Time ETL Behind the Firewall
In this data pipeline scenario, an organization receives data from multiple upstream sources via streaming and scheduled feeds. The pipeline performs real-time ETL (extract-transform-load) using functions that run on riff to keep the organization’s system of record up to date.
The platform operations team of this organization has installed and configured riff in their own data center. Figure 1 depicts the high-level architecture of the solution.
Figure 1. Real-time ETL with Project riff
In this example, ETL developers wrote the Extract and Transform functions in Python and the DB Load function in Java. The functions are packaged in Docker containers and deployed using Kubernetes resource definitions. Using similar resource definitions, they also declared four (4) event topics: Raw Data, Valid Data, Enriched Data, and Error. (riff’s current underlying event broker is Kafka; there will be other pluggable implementations in the future.)
Now, let’s examine the solution in more detail.
-
There are three (3) upstream sources: i) text files, which a partner sends every 2 hours; ii) a legacy database, which several users update throughout the day; and iii) JSON data sent in real time by another partner’s streaming API.
-
Each upstream source has a corresponding data collector service that posts the raw data to riff’s HTTP gateway. This approach turns incoming data into events on the Raw Data topic. (To make data collection even easier, future versions of riff will provide mechanisms to code data source functions for integration with external services. Such functions would be deployed and scaled like any other. Further, source functions would interact directly with the topics, and wouldn’t need to connect using an HTTP gateway. The input events for such "source" functions would be lifecycle triggers and/or metadata such as query parameters.)
-
riff’s Function Controller monitors event activity, so as soon as any activity occurs on the Raw Data topic, it scales the Extract function from 0 to 1. Depending on event volume, the Function Controller scales the function up to N replicas, where N is the maxReplicas property in the function’s YAML configuration file. (Alternatively, the default maxReplicas value could be derived from the number of partitions on the input topic.)
-
The Extract function listens to the Raw Data topic, validates the Raw Data event and sends it to the Valid Data topic.
-
Next, riff’s Function Controller scales the Transform function from 0 to 1. This function enriches the Valid Data event using custom logic and sends it to the Enriched Data topic.
-
Finally, the Function Controller scales the DB Load function from 0 to 1, which stores the Enriched Data event into the system of record.
-
If errors occur during event processing, the functions send the event, along with some re-processing metadata, to the Error topic.
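The data flow in the walkthrough above can be simulated in a single process to show how each function only reads from one topic and writes to the next. This is a sketch under heavy assumptions: the topics are in-memory lists rather than Kafka topics, the `drain` calls stand in for riff's Function Controller delivering events, the topic names, the `id` validation rule, and the `source` enrichment rule are hypothetical, and for brevity all three stages are in Python even though the scenario uses Java for DB Load.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List

# In-memory stand-ins for the four topics (Kafka topics managed by riff in the scenario).
topics: Dict[str, list] = defaultdict(list)
system_of_record: Dict[str, dict] = {}


def publish_error(stage: str, event: object, reason: str) -> None:
    """Route a failed event to the Error topic along with re-processing metadata."""
    topics["error"].append({"stage": stage, "reason": reason, "event": event})


def extract(raw: str) -> None:
    """Validate a Raw Data event and forward it to the Valid Data topic."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        publish_error("extract", raw, f"invalid JSON: {exc.msg}")
        return
    if not isinstance(record, dict) or "id" not in record:
        publish_error("extract", raw, "missing 'id' field")
        return
    topics["valid-data"].append(record)


def transform(record: dict) -> None:
    """Enrich a Valid Data event (hypothetical rule) and forward it to Enriched Data."""
    enriched = dict(record, source=record.get("source", "unknown"))
    topics["enriched-data"].append(enriched)


def db_load(record: dict) -> None:
    """Upsert the Enriched Data event into the system of record, keyed by id."""
    system_of_record[str(record["id"])] = record


def drain(topic: str, fn: Callable) -> None:
    """Deliver every pending event on a topic to its function, oldest first."""
    while topics[topic]:
        fn(topics[topic].pop(0))


# Raw events as a data collector might post them: one good, two bad.
for raw in ['{"id": 1, "amount": 9.5}', "not json", '{"amount": 3}']:
    topics["raw-data"].append(raw)

drain("raw-data", extract)
drain("valid-data", transform)
drain("enriched-data", db_load)
print(system_of_record)  # {'1': {'id': 1, 'amount': 9.5, 'source': 'unknown'}}
print(len(topics["error"]))  # 2
```

Because each stage touches only its input and output topics, any stage can be scaled, redeployed, or rewritten in another language without changing the others; the Error topic keeps bad records out of the system of record while preserving them for re-processing.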
It is important to point out that riff provides first-class support for event stream processing. For instance, you could use windowing operations to emit aggregated counts collected over fixed time intervals.
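Windowing over fixed intervals (often called a tumbling window) can be illustrated without any streaming library. The generator below is a sketch, not riff's stream-processing API: it assumes events arrive as `(timestamp, key)` pairs in timestamp order, and it emits each window's counts as soon as an event from a later window shows up. Windows that receive no events are simply skipped.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, Counter]]:
    """Count keys per fixed, non-overlapping window; assumes events are ordered by timestamp."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    current_start: Optional[float] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = (ts // window_seconds) * window_seconds  # start of the window containing ts
        if current_start is not None and start != current_start:
            yield current_start, counts  # previous window has closed: emit its aggregate
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window when the stream ends


readings = [(0.5, "meter-a"), (3.2, "meter-b"), (4.9, "meter-a"), (6.1, "meter-a")]
for window_start, window_counts in tumbling_counts(readings, window_seconds=5):
    print(window_start, dict(window_counts))
```

Emitting on window close, rather than collecting everything first, is what lets this kind of aggregation run over an unbounded stream of events.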
Getting Started with riff
By now, you’re probably itching to get started. The Project riff website is the best place to start. The team has created tutorials for running riff on Minikube and on GKE, both with and without RBAC. We’ve also built a collection of demo functions and helper scripts to get up and running with riff. That repo is here.
If you have an interest in taking @projectriff from @pivotal for a spin. This Repo https://t.co/gvCq3EGp0B will set EVERYTHING up for you (assuming you have Docker) and let you test it out on Minikube. Props to @BrianMMcClain and @moredeploys.
— Dan Baskette (@dbbaskette) January 2, 2018
Want to try event streaming? Take a look at this repo for a riff sample involving functions of finite streams that maintain stream-related state.
This @projectriff sample program contains a puzzle to test your logical thinking, but you'll need to know or learn a bit about reactive streams to find all the answers: https://t.co/2dE6p2bTaL @ProjectReactor
— Glyn Normington (@glynnormington) February 22, 2018
Please take riff for a spin and let us know what you think! You can find us on Github and Twitter. Enjoy!
Project riff is the basis for the Pivotal Function Service, a commercial product due out later this year. Contact us for early access!
Special thanks to Jurgen Leschner, Mark Fisher, and Jared Ruckle for their help on this post.