Why are engineering teams willing to invest in this effort? After a few decades in production, monolithic applications tend to show their age, with these attributes:
Too difficult to enhance
Changes introduce long testing cycles
Different teams with colliding or conflicting requirements, changes, and scheduling demands
Applications that are difficult to scale become a performance bottleneck
There is a proven playbook for dealing with these applications. But what about the data? Enterprise teams often make an effort to decompose monolithic applications into microservices, yet the way the data is managed often stays the same.
Thankfully, this is changing. Many teams now realize that the goal is for both the application and data to be handled as a single microservice.
Challenges with large data monoliths
Why does the data layer need to be transformed anyway? Three reasons stand out:
Data monoliths are brittle and resistant to change, which makes them a poor match for agile application development. A schema change made for one application can ripple into every other application that shares the database.
They are massive. When you consider all the database instances needed (dev, test/QA, prod, performance testing, training sandbox), the total infrastructure maintenance costs add up. Procuring resources adds wait time.
They become a drag on performance as they scale. Why? Because they support a large number of applications with their own schemas. The problem is compounded by the frequency of ETL jobs. Cloud native patterns demand horizontal scalability, and that’s not how these systems were designed.
What’s the ideal data store? You need on-demand provisioning via self-service, and you need horizontal scaling!
Cloud native applications need cloud native data
When you need to transform your data monoliths, the same cloud native principles that have evolved in support of applications can also be applied to data.
Think of cloud native data as data that’s scoped to match the domain of a single microservice. It’s an approach to refactoring the application data into smaller domains.
This reduces the scope of the monolith. During the initial phases of the transition plan, this data subset will be a replica of the monolith. The monolith can be retired once all the smaller datasets have been implemented.
The clients and downstream systems can use either the shared monolithic database or its cloud native data service for particular domains, with decreasing reliance on the monolith over time.
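One way to picture that gradual shift is a per-domain routing decision. The following is a minimal Java sketch, not the author's implementation: `DataSourceRouter`, `cutOver`, and `allMigrated` are hypothetical names, and plain values stand in for real connections to the monolith and to each data service. In practice, this decision might live in configuration or an API gateway rather than in application code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical router: each migration phase registers the domains it has moved
 * to a data service; every other domain is still served by the monolith.
 *
 * @param <T> whatever represents a backend (a DataSource, a client, ...)
 */
final class DataSourceRouter<T> {
    private final T monolith;
    private final Map<String, T> dataServices = new ConcurrentHashMap<>();

    DataSourceRouter(T monolith) {
        this.monolith = monolith;
    }

    /** Marks a domain as migrated: from now on it is served by its data service. */
    void cutOver(String domain, T dataService) {
        dataServices.put(domain, dataService);
    }

    /** Returns the backend that currently owns the given domain. */
    T backendFor(String domain) {
        return dataServices.getOrDefault(domain, monolith);
    }

    /** True once every listed domain has been cut over, i.e. the monolith can be retired. */
    boolean allMigrated(Iterable<String> domains) {
        for (String domain : domains) {
            if (!dataServices.containsKey(domain)) {
                return false;
            }
        }
        return true;
    }
}
```

Until a domain is cut over, its reads fall through to the monolith, so each phase only has to register the domains it completes.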
I will often refer to the cloud native data that is tied to microservices as a “data service.” The migration from monolith to microservice data service can be developed in phases. For example, in the diagram below, the monolithic database is migrated into cloud native data across three phases. It is common practice to convert the application logic first, and come back to the data “later.” However, the full benefits of microservices are not realized until the data is also addressed.
Here, each phase introduces new microservices that progressively replace the monolithic architecture, for the application logic as well as the relevant data.
The number of phases is a function of complexity. It's up to the business stakeholders to determine the number of phases needed to completely and safely migrate the data to a modernized cloud native data implementation.
The corresponding database schema tables are implemented as data services that are owned by (and bound to) each microservice as isolated resources. You can see how this addresses the challenges with large data monoliths mentioned earlier. This approach also yields the following benefits:
Releasing new versions of the microservice is easier because we’ve eliminated database dependencies across services.
Each microservice’s data service can be scaled independently, which is more predictable and cost effective than scaling an entire monolith.
Any lapse in the availability of a microservice’s data service should not impact other microservices.
Implementation: How it works
The most flexible architecture will look something like this:
The above table provides an overview of the architecture modules. This should help developers implement this design.
Service – This layer contains the data-related business logic. It acts as a gateway or facade (in the sense of the Facade pattern from the Gang of Four design patterns book), providing an abstraction layer for clients. The Spring @Service convention is a great example.
Repository – This layer is often referred to as the data access layer (DAL) or data access objects (DAO). The repository handles cloud native database connections and read/write operations. The connections are managed using dependency injection or inversion of control (IoC) frameworks. For example, see Spring Data with @Repository for Java implementations.
Domain – This is the domain model of data structures that represent business entities. The objects hold the value of attributes along with any needed relationships in order to implement business features. They are the transport mechanism for read/write operations between the application, database/data store, and remote clients. They may be shared across internal components or external dependent downstream systems. Note that it is very important to version the data domain objects that are shared across components or systems. Versioning these objects can reduce change management risks. Changes can be introduced into newer versions of the domain if older clients cannot be updated.
Cache – The cache can be implemented with an in-memory database (IMDB), and it plays a broad and critical role in this architecture. It provides fast read and write access to data and forms the basis of the elasticity and scalability of the data layer. The cache keeps the data highly available, even when the underlying persistent store is not. Asynchronous updates and event-driven architectures, supported by the caching layer, protect the autonomy between teams and allow for high-velocity software development. You can read more about the role of caching in microservices architectures in this white paper or this video on a caching approach to database modernization. The cache stores data domain objects in an intermediate storage area that is owned solely by the microservice; GemFire is one option for implementing it. Note that many in-memory databases can be configured to write data to persistent storage, so the data can be read back after a cache restart. Persistent storage also helps when not all of the data fits in memory. If the data domain objects can be referenced by a key, then a NoSQL key-value IMDB can be used. If queries are needed to access the data, then use an IMDB that supports them, such as GemFire with its SQL-like Object Query Language (OQL).
Write (Behind/Through) – The Write (Behind/Through) module is normally a feature of the cache. When a microservice client writes to the cached information, this module saves the data to the original database server. This allows existing components or downstream systems to continue reading directly from the database until the migration is complete. Write-behind/through can be disabled once the migration is completed. See GemFire Cache Writer.
Loader – The loader module keeps the cache data in sync with the database sources it wraps, and it is needed until all of the underlying legacy database sources/schemas are decommissioned. The loader should support an initial load of the microservice-owned data from the persisted datastore. When data changes in the external database source, the affected cache entries can be evicted (so the loader reloads them on the next read) or refreshed by a database trigger. See GemFire Cache Loader.
Persistence – Persistent storage is often needed in conjunction with the cache. This persistent data store survives restarts and potential application failures. The cache can be loaded from the persistent storage using lazy or eager loading techniques, and the persistent storage can be backed up for disaster recovery. See GemFire Disk Stores, or consider using a relational database for persistence.
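The Service, Repository, and Domain modules can be sketched in plain Java without any framework. This is a hypothetical example rather than the author's code: `CustomerV1`, `CustomerRepository`, and `CustomerService` are illustrative names, and a `Map` stands in for the cache. In a Spring application, these classes would typically carry `@Service` and `@Repository`, and the container would inject the constructor arguments.

```java
import java.util.Map;
import java.util.Optional;

/** Domain: a versioned business entity shared with clients (hypothetical fields). */
record CustomerV1(String customerId, String name, String accountId) {}

/** Repository: the only layer that touches the data store (here, any Map acting as the cache). */
final class CustomerRepository {
    private final Map<String, CustomerV1> cache;

    // The map is passed in by the caller, the way an IoC container would inject it.
    CustomerRepository(Map<String, CustomerV1> cache) {
        this.cache = cache;
    }

    Optional<CustomerV1> findById(String customerId) {
        return Optional.ofNullable(cache.get(customerId));
    }

    void save(CustomerV1 customer) {
        cache.put(customer.customerId(), customer);
    }
}

/** Service: facade holding data-related business rules; clients never see the repository. */
final class CustomerService {
    private final CustomerRepository repository;

    CustomerService(CustomerRepository repository) {
        this.repository = repository;
    }

    CustomerV1 register(String customerId, String name, String accountId) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is required");
        }
        CustomerV1 customer = new CustomerV1(customerId, name.strip(), accountId);
        repository.save(customer);
        return customer;
    }

    Optional<CustomerV1> lookup(String customerId) {
        return repository.findById(customerId);
    }
}
```

The `V1` suffix reflects the versioning advice for shared domain objects: a breaking change would go into a new `CustomerV2` type while older clients keep using `CustomerV1`.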
That’s a look at a modern model to carry forward. With this model, clients and downstream systems can use the monolithic database or its data service within a microservice interchangeably.
Let’s now go back to the status quo, and take a deeper look at the monolith we’re trying to modernize.
How a monolith can be transformed
The following is an example of a monolithic application/database. The application server represents a large monolithic system such as an enterprise application server (EAS), business process manager (BPM), or enterprise service bus (ESB).
What is the problem with this architecture? These systems are large programs with many components that are hard to decouple. Over time, these systems become the “one-stop shop” for all processing needs of a particular business unit. They also become complex, hard to change, and hard to manage.
Over time, the data in the monolith takes on the same characteristics. Very often, one component (such as the “Customer” component) needs data-related changes that will break other components (e.g., “Accounting” and “Origination”). What makes this particular architecture even worse is that administrators have granted external systems (like CRM or a marketing automation add-on) read/write access to the underlying database. These interactions may not be fully understood, or may not be managed at all.
Database admins can end up continuously monitoring processes and reacting to frequent issues. New functionality cannot be introduced because there is simply no more bandwidth. The system needs larger and more powerful hardware, which can take a long time to provision, and the larger databases may require significant resources, such as memory, processors, disk space, and network bandwidth.
Start the monolith to microservices migration
The example architecture diagram below shows the initial migration phase.
In this case, the customer, account, and relationship components are merged into a microservice data service named Client Mgmt (Management). These components use the knowledge and customer database schemas. It's important to note that the application is the system of record for the data domains related to the customer, accounting, and relationship components. If the application were not the system of record for these data domains, the anti-corruption layer pattern could be used. (The anti-corruption layer pattern encapsulates potential changes to external data domain objects behind a more stable version used by the microservice data service and its clients. When changes are made externally, only the anti-corruption layer objects are impacted.)
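A minimal sketch of such an anti-corruption layer in Java, assuming the external system hands over rows as column-name/value pairs. The legacy column names (`CUST_ID`, `FIRST_NM`, `LAST_NM`, `STATUS_CD`), the `"A"` status code, and the `LegacyCustomerTranslator` class are all hypothetical; the point is that only the translator knows about them.

```java
import java.util.Map;
import java.util.Objects;

/** Stable domain object used by the Client Mgmt data service and its clients. */
record Customer(String customerId, String fullName, boolean active) {}

/**
 * Anti-corruption layer: the only code that knows the external system's field
 * names and encodings. The column names used here are hypothetical.
 */
final class LegacyCustomerTranslator {
    Customer toDomain(Map<String, String> legacyRow) {
        String id = Objects.requireNonNull(legacyRow.get("CUST_ID"), "CUST_ID missing");
        String first = legacyRow.getOrDefault("FIRST_NM", "").strip();
        String last = legacyRow.getOrDefault("LAST_NM", "").strip();
        // Legacy status codes ("A" = active) become a plain boolean in the domain model.
        boolean active = "A".equals(legacyRow.get("STATUS_CD"));
        return new Customer(id.strip(), (first + " " + last).strip(), active);
    }
}
```

If the external system renames a column or adds a status code, only `toDomain` changes; the `Customer` record seen by the data service and its clients stays the same.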
How to think about “read” and “write” operations during this phase
The following steps describe how to read a data domain object that exists in the monolithic database but not yet in the microservice data service. This diagram illustrates how the data is retrieved from the monolith using an on-demand load pattern.
Step 1: A client initiates a read operation to a microservice. The client may pass some criteria details that identify the data domain object to be retrieved.
Step 2: A microservice delegates the read operation to the data service with the criteria.
Step 3: The repository uses the cache to get the data domain object by a key based on the input criteria.
Step 4: The cache returns the value associated with the key provided in the criteria. On a cache miss (the requested key doesn’t exist in the cache), the cache can use the configured loader to load the data domain object. Once the data domain object is in the cache, the loader is not called again for the same key (unless the entry is later evicted or expires).
Step 5: The loader acts as a data access layer that selects the raw records from the legacy monolith's database. Example: SELECT * FROM customer, account WHERE customer.customer_id = :key AND customer.acc_id = account.acc_id
Step 6: The loader will convert the result set from the select into a data domain object. This is usually done using an object relational mapping (ORM) framework like JPA.
Step 7: The cache stores the data domain object retrieved from the loader in the microservice's persistence storage. This persistence layer can become the system of record once the monolithic application is decommissioned. It is also used to refresh the cache's data domain entries during restarts.
Finally, the data domain object is returned to the calling client.
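The read steps can be sketched as a read-through cache. This is a simplified, single-process illustration and not GemFire's API: the `MonolithLoader` interface and `ReadThroughCache` class are hypothetical stand-ins for a cache loader (such as GemFire's `CacheLoader`) and the cache itself, and two `Map`s stand in for the in-memory cache and the microservice's persistence store. `ConcurrentHashMap.computeIfAbsent` provides the "load at most once per key" behavior from Step 4, and `evict` models the invalidation described for the Loader module.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical loader contract; the name is illustrative, not a library API. */
interface MonolithLoader<K, V> {
    /** Selects and maps the record from the monolith (Steps 5-6); empty if it does not exist. */
    Optional<V> load(K key);
}

/** Read-through cache owned by one microservice (Steps 3-7 of the read path). */
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> persistence; // stand-in for a disk store or Postgres table
    private final MonolithLoader<K, V> loader;

    ReadThroughCache(Map<K, V> persistence, MonolithLoader<K, V> loader) {
        this.persistence = persistence;
        this.loader = loader;
        cache.putAll(persistence); // eager warm-up, e.g. after a restart
    }

    Optional<V> get(K key) {
        // computeIfAbsent applies the loader at most once per absent key, even with
        // concurrent readers. Returning null (record not found) caches nothing.
        V value = cache.computeIfAbsent(key, k -> {
            V loaded = loader.load(k).orElse(null);
            if (loaded != null) {
                persistence.put(k, loaded); // Step 7: keep a durable copy
            }
            return loaded;
        });
        return Optional.ofNullable(value);
    }

    /** Invalidates a key after the monolith changes it; the next get() reloads it. */
    void evict(K key) {
        cache.remove(key);
        persistence.remove(key);
    }
}
```

Warming the cache from the persistence store in the constructor is the eager option; loading on first access, as `get` does for keys that are not yet persisted, is the lazy one.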
The following describes the sequence of steps for writing data domain objects to the data service when they must also be saved in the monolithic application data store. Writing back to the monolithic data store is typically needed while other systems still depend on the monolith's data. This Write (Behind/Through) can be disabled once those third-party dependencies are migrated and the monolithic data store is fully decommissioned.
Step 1: A client initiates a write operation to a microservice. The client will pass the data domain to be saved.
Step 2: The microservice delegates the write operation with the domain to the data service.
Step 3: The data service uses the cache to put the data domain object by its key.
Step 4: The cache should use a configured Writer (Behind/Through) strategy to write the data domain to an external monolith database.
Step 5: The Writer (Behind/Through) object will save the data domain data in the monolith data store, such as a relational database management system.
Example: INSERT INTO customer (...) VALUES (?, ..., ?)
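A failed database write must also prevent the cache entry from being written, so that the cache and the database stay consistent. With a synchronous write-through, if the write to the persistence store fails, the cache update is rolled back (or never applied), which avoids this consistency issue. With write-behind, the database write happens asynchronously after the cache update, so failures must be handled with retries instead.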
Step 6: The cache will store the data domain in the persistence layer.
Step 7: The cache is updated to save the domain once the write completes successfully.
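Similarly, the write steps can be sketched as a synchronous write-through. The `MonolithWriter` interface and `WriteThroughCache` class are hypothetical, and `Map`s again stand in for the cache and the persistence store. The monolith write happens first, and an exception aborts the whole operation, so a failed database write never leaves a new value in the cache or the persistence store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical writer contract: persists one entry to the monolith (Steps 4-5). */
interface MonolithWriter<K, V> {
    void write(K key, V value) throws Exception;
}

/** Write-through cache: the entry is updated only after the monolith write succeeds. */
final class WriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> persistence; // stand-in for the microservice's own store
    private final MonolithWriter<K, V> writer;
    private boolean writeThroughEnabled = true; // guarded by "this"

    WriteThroughCache(Map<K, V> persistence, MonolithWriter<K, V> writer) {
        this.persistence = persistence;
        this.writer = writer;
    }

    synchronized void put(K key, V value) {
        if (writeThroughEnabled) {
            try {
                writer.write(key, value); // Steps 4-5: save to the monolith first
            } catch (Exception e) {
                // Abort: neither the persistence store nor the cache sees the value.
                throw new IllegalStateException("monolith write failed for key " + key, e);
            }
        }
        persistence.put(key, value); // Step 6
        cache.put(key, value);       // Step 7
    }

    Optional<V> get(K key) {
        return Optional.ofNullable(cache.get(key));
    }

    /** Call once downstream systems no longer read from the monolith. */
    synchronized void disableWriteThrough() {
        writeThroughEnabled = false;
    }
}
```

This ordering only holds for write-through. With write-behind, the cache is updated first and the monolith write is queued and applied asynchronously; in GemFire, write-behind is usually implemented with an `AsyncEventQueue` and an `AsyncEventListener` rather than a `CacheWriter`.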
What a data architecture for cloud native microservices looks like
The following is an example of a technical architecture that can be used to implement a cloud native microservice data architecture for migrating monolithic databases.
Here's an overview of some key technologies:
Spring Boot – Spring Boot is a cloud native framework based on Spring that simplifies the deployment of complex applications and their required runtime dependencies. REST web services can run in an embedded web container and are configured through text-based configuration files and Java annotations. Spring Boot also provides runtime metrics, health checks, and externalized configuration for operational support of production applications.
Spring Data – Spring Data provides a data access layer for data stores, with wrapper implementations for many data technologies, such as JDBC, JPA, and GemFire. Spring Data can aid in developing the data access, loader, and Writer (Behind/Through) modules for Spring Boot–based applications.
GemFire – GemFire is an in-memory database. GemFire can be used as the cache within Spring Boot applications. GemFire supports plugin modules to add loader and write-back strategy implementations to synchronize data externally with the monolith until it is fully decommissioned. GemFire for Kubernetes is a cloud native 12-factor backing service implementation of GemFire that works on Kubernetes.
PostgreSQL Database – PostgreSQL (also known as Postgres) is a popular relational database management system (RDBMS). It can serve as the persistent system of record for cached data, accessed by the loader and write-back modules through Spring Data JPA. See VMware Postgres.
We’ve shown you how monolith legacy databases and apps can be transformed into smaller, more manageable microservices.
Editor's note: This blog post was originally published in May 2019 and has been updated throughout to reflect current technologies, products, and guidance.
About the Author: Gregory Green