Accessing Data with Cassandra

This guide walks you through the process of using Spring Data Cassandra to build an application that stores data in and retrieves it from Apache Cassandra, a high-performance distributed database.

What You Will Build

You will store and retrieve data from Apache Cassandra by using Spring Data Cassandra.

What You Need

How to complete this guide

Like most Spring Getting Started guides, you can start from scratch and complete each step or you can bypass basic setup steps that are already familiar to you. Either way, you end up with working code.

To start from scratch, move on to Starting with Spring Initializr.

To skip the basics, do the following:

When you finish, you can check your results against the code in gs-accessing-data-cassandra/complete.

Starting with Spring Initializr

You can use this pre-initialized project and click Generate to download a ZIP file. This project is configured to fit the examples in this tutorial.

To manually initialize the project:

  1. Navigate to https://start.spring.io. This service pulls in all the dependencies you need for an application and does most of the setup for you.

  2. Choose either Gradle or Maven and the language you want to use. This guide assumes that you chose Java.

  3. Click Dependencies and select Spring Data for Apache Cassandra.

  4. Click Generate.

  5. Download the resulting ZIP file, which is an archive of an application that is configured with your choices.

If your IDE has the Spring Initializr integration, you can complete this process from your IDE.
You can also fork the project from GitHub and open it in your IDE or other editor.

Setting up a Database

Before you can build the application, you need to set up a Cassandra database. Apache Cassandra is an open-source NoSQL data store optimized for fast reads and fast writes in large datasets. The next subsections describe two options: using DataStax Astra DB Cassandra-as-a-Service or running Cassandra locally in a Docker container. With the free tier of DataStax Astra DB, you can create a Cassandra database and store data in it in a matter of minutes.

Add the following properties in your application.properties (src/main/resources/application.properties) to configure Spring Data Cassandra:

spring.cassandra.schema-action=CREATE_IF_NOT_EXISTS
spring.cassandra.request.timeout=10s
spring.cassandra.connection.connect-timeout=10s
spring.cassandra.connection.init-query-timeout=10s

The spring.cassandra.schema-action property defines the schema action to take at startup and can be none, create, create-if-not-exists, recreate, or recreate-drop-unused. We use create-if-not-exists to create the required schema. See the documentation for details.

It is a good security practice to set this to none in production, to avoid the creation or re-creation of the database at startup.

We also increase the default timeouts, which might be needed when first creating the schema or with slow remote network connections.

Astra DB Setup

To use a managed database, you can use the robust free tier of DataStax Astra DB Cassandra-as-a-Service. It scales to zero when unused. Follow the instructions in the following link to create a database and a keyspace named spring_cassandra.

The Spring Boot Astra starter pulls in and autoconfigures all the required dependencies. To use DataStax Astra DB, you need to add it to your pom.xml:

<dependency>
	<groupId>com.datastax.astra</groupId>
	<artifactId>astra-spring-boot-starter</artifactId>
	<version>0.1.13</version>
</dependency>

For Gradle, add implementation 'com.datastax.astra:astra-spring-boot-starter:0.1.13' to your build.gradle file.

The Astra auto-configuration needs configuration information to connect to your cloud database. You need to:

  • Define the credentials: client ID, client secret, and application token.

  • Select your instance with the cloud region, database ID and keyspace (spring_cassandra).

Then you need to add these extra properties in your application.properties (src/main/resources/application.properties) to configure Astra:

# Credentials to Astra DB
astra.client-id=<CLIENT_ID>
astra.client-secret=<CLIENT_SECRET>
astra.application-token=<APP_TOKEN>

# Select an Astra instance
astra.cloud-region=<DB_REGION>
astra.database-id=<DB_ID>
astra.keyspace=spring_cassandra

Docker Setup

If you prefer to run Cassandra locally in a containerized environment, run the following docker run command:

docker run -p 9042:9042 --rm --name cassandra -d cassandra:4.0.7

After the container is created, access the Cassandra query language shell:

docker exec -it cassandra bash -c "cqlsh -u cassandra -p cassandra"

And create a keyspace for the application:

CREATE KEYSPACE spring_cassandra WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1};

Now that you have your database running, configure Spring Data Cassandra to access your database.

Add the following properties in your application.properties (src/main/resources/application.properties) to connect to your local database:

spring.cassandra.local-datacenter=datacenter1
spring.cassandra.keyspace-name=spring_cassandra

Alternatively, for a convenient bundle of Cassandra and related Kubernetes ecosystem projects, you can spin up a single node Cassandra cluster on K8ssandra in about 10 minutes.

Create the Cassandra Entity

In this example, you define a Vet (Veterinarian) entity. The following listing shows the Vet class (in src/main/java/com/example/accessingdatacassandra/Vet.java):

package com.example.accessingdatacassandra;

import java.util.Set;
import java.util.UUID;

import org.springframework.data.cassandra.core.mapping.PrimaryKey;
import org.springframework.data.cassandra.core.mapping.Table;

@Table
public class Vet {
  
  @PrimaryKey
  private UUID id;

  private String firstName;
  
  private String lastName;
  
  private Set<String> specialties;
  
  public Vet(UUID id, String firstName, String lastName, Set<String> specialties) {
    this.id = id;
    this.firstName = firstName;
    this.lastName = lastName;
    this.specialties = specialties;
  }

  public UUID getId() {
    return id;
  }

  public void setId(UUID id) {
    this.id = id;
  }

  public String getFirstName() {
    return firstName;
  }

  public void setFirstName(String firstName) {
    this.firstName = firstName;
  }

  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  public Set<String> getSpecialties() {
    return specialties;
  }

  public void setSpecialties(Set<String> specialties) {
    this.specialties = specialties;
  }
}

The Vet class is annotated with @Table, which maps it to a Cassandra table. Each property is mapped to a column.

The class uses a simple @PrimaryKey of type UUID. Choosing the right primary key is essential, because it determines our partition key and cannot be changed later.

Why is it so important? The primary key defines row uniqueness, and its partition key part also controls data locality. When you insert data, the partition key is hashed, and the resulting value determines which node stores the row. A read that supplies the same key can therefore be routed directly to that node.
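The idea that a hashed key always leads to the same node can be sketched in plain Java. This is a toy model under stated assumptions, not Cassandra's real partitioner: the node names are hypothetical, and UUID.hashCode() with a modulo stands in for Cassandra's own hashing and token ranges.

```java
import java.util.List;
import java.util.UUID;

/** Toy model of partition placement; not Cassandra's real partitioner. */
public class PartitionPlacementSketch {

  // Hypothetical node names, for illustration only.
  public static final List<String> NODES = List.of("node-a", "node-b", "node-c");

  /** Hashes the partition key and maps the hash onto one of the nodes. */
  public static String nodeFor(UUID partitionKey) {
    // Math.floorMod keeps the index non-negative even for negative hash codes.
    int index = Math.floorMod(partitionKey.hashCode(), NODES.size());
    return NODES.get(index);
  }

  public static void main(String[] args) {
    UUID vetId = UUID.randomUUID();
    String onWrite = nodeFor(vetId); // node chosen when the row is stored
    String onRead = nodeFor(vetId);  // node chosen when the row is read back
    System.out.println(vetId + " -> " + onWrite);
    System.out.println("Same node on read: " + onWrite.equals(onRead));
  }
}
```

Because the node is computed from the key alone, changing the key of an existing row would move it to a different place, which is one way to see why the primary key cannot be changed later.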

Cassandra favors denormalized data models and does not support table joins the way a relational database (SQL/RDBMS) does. Storing related data in the same row lets you read it with a single query, which is why we have modeled our specialties as a Set<String>.

Create Simple Queries

Spring Data Cassandra is focused on storing data in Apache Cassandra. However, it inherits functionality from the Spring Data Commons project, including the ability to derive queries. Essentially, you need not learn the query language of Cassandra. Instead, you can write a handful of methods and let the queries be written for you.

To see how this works, create a repository interface that queries Vet entities, as the following listing (in src/main/java/com/example/accessingdatacassandra/VetRepository.java) shows:

package com.example.accessingdatacassandra;

import java.util.UUID;

import org.springframework.data.repository.CrudRepository;

public interface VetRepository extends CrudRepository<Vet, UUID> {
  Vet findByFirstName(String firstName);
}

VetRepository extends the CrudRepository interface and specifies the generic type parameters for the value and the key that the repository works with: Vet and UUID, respectively. This interface comes with many operations, including basic CRUD (Create, Read, Update, Delete) and simple query (such as findById(..)) data access operations. Spring Data Cassandra also provides a store-specific CassandraRepository interface. It does not extend PagingAndSortingRepository, because classic paging patterns that use limit or offset are not applicable to Cassandra.
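To illustrate why paging without limit or offset is still workable, the following plain-Java sketch resumes each page from a cursor (the last row already returned) instead of skipping a number of rows. It is a simplified model only: the class and method names are hypothetical, and it does not use the Cassandra driver or Spring Data API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model of cursor-based paging; not the Cassandra driver API. */
public class CursorPagingSketch {

  /** One page of rows plus the cursor to resume from (null when exhausted). */
  public static final class Page {
    public final List<String> rows;
    public final String nextCursor;

    Page(List<String> rows, String nextCursor) {
      this.rows = rows;
      this.nextCursor = nextCursor;
    }
  }

  /** Returns up to pageSize rows that sort strictly after the cursor. */
  public static Page fetch(NavigableSet<String> rows, String cursor, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    // A null cursor means "start from the beginning".
    NavigableSet<String> remaining = (cursor == null) ? rows : rows.tailSet(cursor, false);
    List<String> page = new ArrayList<>();
    for (String row : remaining) {
      if (page.size() == pageSize) {
        break;
      }
      page.add(row);
    }
    // More rows remain only if this page did not reach the end of the set.
    boolean hasMore = page.size() < remaining.size();
    String next = hasMore ? page.get(page.size() - 1) : null;
    return new Page(page, next);
  }

  public static void main(String[] args) {
    NavigableSet<String> vets = new TreeSet<>(List.of("Ann", "Bob", "Cid", "Dee", "Eve"));
    String cursor = null;
    do {
      Page page = fetch(vets, cursor, 2);
      System.out.println(page.rows);
      cursor = page.nextCursor;
    } while (cursor != null);
  }
}
```

The caller never asks for "rows 40 to 60". It only asks for "the next rows after this cursor", so the store does not have to count and discard skipped rows.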

You can define other queries as needed by declaring their method signatures. However, you can perform only queries that include the primary key. The findByFirstName method is a valid Spring Data method, but Cassandra does not allow the resulting query, because firstName is not part of the primary key.

Some generated methods in the repository might require a full table scan. One example is the findAll method, which requires querying all nodes in the cluster. Such queries are not recommended with large datasets, because they can impact performance.
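The difference between a lookup by key and a query that must scan can be sketched with a toy in-memory "cluster". This is an illustration under simplifying assumptions, not how Cassandra stores data: each node is a HashMap, keys are plain int values instead of UUIDs, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of keyed lookup versus full scan; not how Cassandra stores data. */
public class LookupVersusScanSketch {

  // Each map stands in for the rows held by one node, keyed by primary key.
  private final List<Map<Integer, String>> nodes = new ArrayList<>();
  // Counts how many nodes the most recent query had to consult.
  private int nodesVisited;

  public LookupVersusScanSketch(int nodeCount) {
    for (int i = 0; i < nodeCount; i++) {
      nodes.add(new HashMap<>());
    }
  }

  /** Picks the node for a key from the key's hash. */
  private Map<Integer, String> nodeFor(int id) {
    return nodes.get(Math.floorMod(Integer.hashCode(id), nodes.size()));
  }

  public void save(int id, String firstName) {
    nodeFor(id).put(id, firstName);
  }

  /** Keyed lookup: the key identifies the single node to ask. */
  public String findById(int id) {
    nodesVisited = 1;
    return nodeFor(id).get(id);
  }

  /** Lookup by a non-key value: every node must be scanned. */
  public List<Integer> findIdsByFirstName(String firstName) {
    nodesVisited = 0;
    List<Integer> ids = new ArrayList<>();
    for (Map<Integer, String> node : nodes) {
      nodesVisited++;
      node.forEach((id, name) -> {
        if (name.equals(firstName)) {
          ids.add(id);
        }
      });
    }
    return ids;
  }

  public int getNodesVisited() {
    return nodesVisited;
  }

  public static void main(String[] args) {
    LookupVersusScanSketch cluster = new LookupVersusScanSketch(3);
    cluster.save(1, "John");
    cluster.save(2, "Jane");
    cluster.findById(1);
    System.out.println("findById visited " + cluster.getNodesVisited() + " node(s)");
    cluster.findIdsByFirstName("Jane");
    System.out.println("findIdsByFirstName visited " + cluster.getNodesVisited() + " node(s)");
  }
}
```

In this model the cost of the keyed lookup stays constant as nodes are added, while the cost of the scan grows with the cluster and the data, which mirrors the concern about findAll on large datasets.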

Adding a CommandLineRunner

Define a bean of type CommandLineRunner and inject the VetRepository to set up some data and use its methods.

Spring Boot automatically handles those repositories as long as they are included in the same package (or a sub-package) of your @SpringBootApplication class. For more control over the registration process, you can use the @EnableCassandraRepositories annotation.

By default, @EnableCassandraRepositories scans the current package for any interfaces that extend one of Spring Data’s repository interfaces. If your project is split across multiple modules or packages and your repositories are not found, you can set basePackageClasses=MyRepository.class to tell Spring Data Cassandra, in a type-safe way, to scan the package that contains that class instead.
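As a sketch of the explicit registration described above, the following configuration class points the repository scan at the package that contains VetRepository. The class name RepositoryScanConfig is hypothetical and is not part of this guide's code. With the single-package layout used in this guide, Spring Boot already finds the repository, so this class is only needed for other layouts.

```java
package com.example.accessingdatacassandra;

import org.springframework.context.annotation.Configuration;
import org.springframework.data.cassandra.repository.config.EnableCassandraRepositories;

// Hypothetical configuration class; the name is illustrative only.
@Configuration
// Scans the package that contains VetRepository, regardless of where this class lives.
@EnableCassandraRepositories(basePackageClasses = VetRepository.class)
public class RepositoryScanConfig {
}
```

Referring to a class rather than a package name as a string means that a later package rename is caught by the compiler instead of silently breaking the scan.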

Spring Data Cassandra uses the CassandraTemplate to execute the queries behind your find* methods. You can use the template yourself for more complex queries, but this guide does not cover that. See the Spring Data Cassandra Reference Guide (https://docs.spring.io/spring-data/cassandra/docs/current/reference/html/#reference).

The following listing shows the finished AccessingDataCassandraApplication class (in src/main/java/com/example/accessingdatacassandra/AccessingDataCassandraApplication.java):

package com.example.accessingdatacassandra;

import java.util.Arrays;
import java.util.HashSet;
import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class AccessingDataCassandraApplication {

  private static final Logger log = LoggerFactory.getLogger(AccessingDataCassandraApplication.class);
  
  public static void main(String[] args) {
    SpringApplication.run(AccessingDataCassandraApplication.class, args);
  }
  
  @Bean
  public CommandLineRunner clr(VetRepository vetRepository) {
    return args -> {
      vetRepository.deleteAll();
      
      Vet john = new Vet(UUID.randomUUID(), "John", "Doe", new HashSet<>(Arrays.asList("surgery")));
      Vet jane = new Vet(UUID.randomUUID(), "Jane", "Doe", new HashSet<>(Arrays.asList("radiology", "surgery")));
      
      Vet savedJohn = vetRepository.save(john);
      Vet savedJane = vetRepository.save(jane);

      vetRepository.findAll()
        .forEach(v -> log.info("Vet: {}", v.getFirstName()));
      
      vetRepository.findById(savedJohn.getId())
        .ifPresent(v -> log.info("Vet by id: {}", v.getFirstName()));
    };
  }
}

Build an executable JAR

You can run the application from the command line with Gradle or Maven. You can also build a single executable JAR file that contains all the necessary dependencies, classes, and resources and run that. Building an executable JAR makes it easy to ship, version, and deploy the service as an application throughout the development lifecycle, across different environments, and so forth.

If you use Gradle, you can run the application by using ./gradlew bootRun. Alternatively, you can build the JAR file by using ./gradlew build and then run the JAR file, as follows:

java -jar build/libs/gs-accessing-data-cassandra-0.1.0.jar

If you use Maven, you can run the application by using ./mvnw spring-boot:run. Alternatively, you can build the JAR file with ./mvnw clean package and then run the JAR file, as follows:

java -jar target/gs-accessing-data-cassandra-0.1.0.jar

The steps described here create a runnable JAR. You can also build a classic WAR file.

Summary

Congratulations! You have developed a Spring application that uses Spring Data Cassandra to access distributed data.

All guides are released with an ASLv2 license for the code, and an Attribution, NoDerivatives creative commons license for the writing.