Batch microservices

What are Batch Microservices?

A Batch Microservice is a short-lived process that launches in response to a trigger (typically a clock) or is executed on demand.

The following Spring projects (along with Spring Boot of course) support building Batch Microservices:

Spring Batch

Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

Key Concepts

See the Batch Domain for more details.

  • Job - A batch job composed of a sequence of one or more Steps.
  • JobInstance - An instance of a Job configured with parameters (for example, requested for a specific date). It groups one or more JobExecutions.
  • Step - An independent, discrete unit of work in a Job. A simple Step might load data from a file into the database, but a Step may also be more complex.
  • JobExecution - A single attempt of a Job run (may be in progress, succeeded, or failed)
  • StepExecution - A single attempt of a Step, associated with a JobExecution (may be in progress, succeeded or failed)
  • JobRepository - Persistent storage for maintaining the Batch Domain model and associated state
  • JobExecutionListener - Extension point for customizing JobExecution events
  • StepExecutionListener - Extension point for customizing StepExecution events
  • Remote Chunking, Partitioning - Patterns for scalable distributed batch processing, see the docs for details.
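The relationships between these concepts can be sketched in plain Java. This is a minimal illustration, not the Spring Batch API: the class `BatchDomainSketch` and its simplified nested types are hypothetical, the job name, step names, and date parameter are made up, and a retry here reruns every step, whereas the real framework tracks restart state in the JobRepository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of the batch domain; these are NOT the Spring Batch classes.
public class BatchDomainSketch {

    enum Status { STARTED, COMPLETED, FAILED }

    // A Step is a named, discrete unit of work.
    static final class Step {
        final String name;
        final Runnable work;
        Step(String name, Runnable work) { this.name = name; this.work = work; }
    }

    // A Job is an ordered sequence of one or more Steps.
    static final class Job {
        final String name;
        final List<Step> steps;
        Job(String name, List<Step> steps) { this.name = name; this.steps = steps; }
    }

    // One attempt at running a Step, owned by a JobExecution.
    static final class StepExecution {
        final String stepName;
        Status status = Status.STARTED;
        StepExecution(String stepName) { this.stepName = stepName; }
    }

    // One attempt at running the whole Job.
    static final class JobExecution {
        final List<StepExecution> stepExecutions = new ArrayList<>();
        Status status = Status.STARTED;
    }

    // A Job plus its parameters; it collects every attempt (JobExecution).
    static final class JobInstance {
        final Job job;
        final Map<String, String> parameters;
        final List<JobExecution> executions = new ArrayList<>();

        JobInstance(Job job, Map<String, String> parameters) {
            this.job = job;
            this.parameters = parameters;
        }

        // Runs the steps in order and stops at the first failure.
        // Simplification: every attempt starts again from the first step.
        JobExecution run() {
            JobExecution execution = new JobExecution();
            executions.add(execution);
            for (Step step : job.steps) {
                StepExecution stepExecution = new StepExecution(step.name);
                execution.stepExecutions.add(stepExecution);
                try {
                    step.work.run();
                    stepExecution.status = Status.COMPLETED;
                } catch (RuntimeException e) {
                    stepExecution.status = Status.FAILED;
                    execution.status = Status.FAILED;
                    return execution;
                }
            }
            execution.status = Status.COMPLETED;
            return execution;
        }
    }

    public static void main(String[] args) {
        boolean[] fileAvailable = {false};
        Job job = new Job("importOrders", List.of(
                new Step("loadFile", () -> {
                    if (!fileAvailable[0]) throw new IllegalStateException("file missing");
                }),
                new Step("writeReport", () -> { })));

        JobInstance instance = new JobInstance(job, Map.of("date", "2020-01-01"));
        JobExecution first = instance.run();   // fails in loadFile
        fileAvailable[0] = true;
        JobExecution second = instance.run();  // second attempt of the same JobInstance

        System.out.println(first.status + " then " + second.status
                + ", attempts=" + instance.executions.size());
    }
}
```

In this sketch the same JobInstance ends up with two JobExecutions: the first fails in its first step and never reaches the second, while the retry completes both steps.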

Spring Cloud Task

Spring Cloud Task supports the development of short-lived Microservices. In general, these perform simple tasks on demand and then terminate. Batch applications are just one example of where short-lived processes can be helpful. Spring Cloud Task records the lifecycle events of a given task. The lifecycle consists of a single task execution, which is a physical execution of a Spring Boot application configured to be a task (annotated with the @EnableTask annotation).
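To make the idea of recording a task's lifecycle concrete, here is a minimal plain-Java sketch. It does not use Spring Cloud Task: `TaskLifecycleSketch`, `TaskRecord`, and `runTask` are hypothetical names, an in-memory list stands in for persistent storage, and the exit codes 0 and 1 are an assumed convention for success and failure.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of lifecycle recording; NOT the Spring Cloud Task API.
public class TaskLifecycleSketch {

    // What gets recorded about one task execution.
    static final class TaskRecord {
        final String taskName;
        final Instant startTime;
        Instant endTime;       // null while the task is still running
        Integer exitCode;      // null while the task is still running
        String errorMessage;   // set only when the task fails

        TaskRecord(String taskName, Instant startTime) {
            this.taskName = taskName;
            this.startTime = startTime;
        }
    }

    // Records the start event, runs the task body, then records the outcome and end time.
    static TaskRecord runTask(List<TaskRecord> repository, String taskName, Runnable body) {
        TaskRecord record = new TaskRecord(taskName, Instant.now());
        repository.add(record); // stored before any work begins
        try {
            body.run();
            record.exitCode = 0;
        } catch (RuntimeException e) {
            record.exitCode = 1;
            record.errorMessage = e.getMessage();
        } finally {
            record.endTime = Instant.now();
        }
        return record;
    }

    public static void main(String[] args) {
        List<TaskRecord> repository = new ArrayList<>(); // in-memory stand-in for a database

        runTask(repository, "cleanup", () -> { });
        runTask(repository, "import", () -> {
            throw new IllegalStateException("no input");
        });

        for (TaskRecord r : repository) {
            System.out.println(r.taskName + " exit=" + r.exitCode + " error=" + r.errorMessage);
        }
    }
}
```

Because the start is stored before the work begins, a task that dies partway through would still leave a record with no end time or exit code.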

Spring Cloud Task requires a SQL database, for a TaskRepository, similar to the Spring Batch JobRepository. The following databases are supported:

  • H2
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

Spring Cloud Task includes the following features:

  • @EnableTask annotation to auto-configure a TaskRepository and SimpleTaskConfiguration by default.
  • Integrates with Spring Batch, enabling you to launch one or more Spring Batch Jobs from a Task
  • Supports highly scalable, distributed batch architectures (Remote Chunking, Partitioning) running as cloud native Microservices
  • Integrates with Spring Cloud Stream, so stream events may trigger Tasks, and Task events may be consumed by streams

See the code samples below.

Spring Cloud Data Flow

Spring Cloud Data Flow (SCDF) is an orchestration layer on top of Spring Cloud Task (and Spring Cloud Stream) for batch processing. SCDF uses a Deployer abstraction to allow deployment of distributed Microservices on different runtime platforms, such as Pivotal Platform. It also provides a DSL for defining tasks using registered applications as named resources, and a management UI.

Code Samples