Column-oriented data model: Often described as column-oriented; more precisely, Cassandra is a wide-column (partitioned row) store.
Elastic scalability: Highly scalable; it lets you add more hardware to accommodate more customers and more data as requirements grow.
Always-on architecture: No single point of failure; it is continuously available for business-critical applications that cannot afford downtime.
Linear-scale performance: Throughput increases as the number of nodes in the cluster increases.
Flexible data storage: Accommodates all possible data formats, including structured, semi-structured, and unstructured data, and can dynamically accommodate changes to your data structures as your needs evolve.
Easy data distribution: Provides the flexibility to distribute data where you need by replicating data across multiple data centers.
Transaction support: Offers ACID-like properties (atomic, isolated writes within a partition, durable commits, tunable consistency) rather than full multi-partition transactions.
Clients / Drivers — Application libraries that know the contact points and token metadata; they choose a coordinator for each request (e.g., token-aware or round-robin load balancing).
Coordinator Node — Any node that receives a client request; it coordinates reads/writes (contacts replicas, enforces consistency level).
Ring / Tokens / vnodes — Data partitioned by token derived from partition key; each node owns token ranges. vnodes (virtual nodes) make balancing easier.
Seed nodes — Used only at startup to bootstrap gossip (not a single point of failure).
Seed nodes are designated nodes in the Cassandra cluster that serve as contact points for the gossip protocol during bootstrap. When a new node joins the cluster or an existing node restarts, it contacts the seed nodes to learn about the cluster topology and the location of other nodes in the ring. Seed nodes aren't special in terms of functionality - they're regular nodes that are just configured as "seeds" in the cassandra.yaml file. You typically configure 2-3 seed nodes per datacenter for redundancy.
Gossip — Lightweight cluster membership and state propagation protocol.
Snitch — Determines network topology (rack/dc) so replica placement is topology-aware (e.g., GossipingPropertyFileSnitch).
Replication — Each keyspace has a replication strategy (SimpleStrategy, NetworkTopologyStrategy) and replication factor (RF).
Consistency levels — e.g., ONE / QUORUM / ALL — determine how many replicas must ack reads/writes.
Streaming — Efficient data transfer between nodes (bootstrap, repair, rebalancing).
Port 9042: It is the default port for Cassandra's native binary protocol (CQL native transport). This is the port that CQL clients and drivers use to communicate with Cassandra nodes. It's the modern, preferred way to interact with Cassandra, replacing the older Thrift protocol. When your application connects to Cassandra using a CQL driver, it connects to this port.
Commit log — Append-only write-ahead log for durability.
The commit log is a crash-recovery mechanism in Cassandra. Every write is appended to the commit log before it is applied to the memtable, so it can be replayed if the node crashes.
Memtable — In-memory write buffer (data structure); flushed to disk as SSTables.
After the commit log, the data is written to the memtable. At times there can be multiple memtables for a single table (column family), e.g., the active one plus older ones waiting to be flushed.
SSTables (Sorted String Tables) — Immutable, on-disk sorted-string tables with indexes + bloom filters.
It is a disk file to which the data is flushed from the memtable when its contents reach a threshold value.
Bloom filters are quick, probabilistic structures for testing whether an element might be a member of a set (false positives are possible, false negatives are not). Cassandra keeps one in memory per SSTable and consults it on reads to skip SSTables that cannot contain the requested partition.
Compaction — Periodic merging of SSTables to reclaim space and reduce read amplification.
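The compaction strategy is configurable per table. A minimal CQL sketch (keyspace and table names are illustrative; they match the accounts.users example used later in these notes):

-- Size-tiered is the default; leveled or time-window strategies suit other workloads
ALTER TABLE accounts.users
WITH compaction = {'class': 'SizeTieredCompactionStrategy'};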
Client sends write to any node (coordinator).
Coordinator consults partitioner/token metadata to find replica nodes for the partition key (based on keyspace RF + strategy).
Coordinator forwards the write to all replicas in parallel; the consistency level only determines how many acknowledgments it waits for.
On each replica:
Append to Commit Log (durability).
Apply to Memtable (in-memory).
Return an ack to coordinator.
Coordinator waits for sufficient acks per requested Consistency Level, then returns success to client.
Memtables are flushed to disk as SSTables when thresholds reached; compaction merges SSTables later.
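A minimal cqlsh sketch of this write path at QUORUM (assumes the accounts.users table defined later in these notes, in a keyspace with RF = 3):

CONSISTENCY QUORUM;   -- coordinator must receive 2 of 3 replica acks before success
INSERT INTO accounts.users (user_id, email, city)
VALUES (uuid(), 'alice@example.com', 'Boston');
-- each replica appends the mutation to its commit log, updates its memtable, then acks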
Client sends read to a coordinator.
Coordinator finds the replicas owning the data and queries one or more of them, depending on the consistency level (typically a full data request to one replica plus digest requests to the others).
If the responses disagree (e.g., a replica returns stale data or tombstones), the coordinator fetches full data from the mismatching replicas and merges the results.
Coordinator reconciles the data (most recent timestamp wins) and may initiate read repair if an inconsistency is found.
Result returned to client.
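To watch the coordinator at work, cqlsh can trace a read. A hedged sketch (the UUID is an illustrative placeholder; assumes the same users table):

TRACING ON;
SELECT * FROM accounts.users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- the trace output lists the coordinator, the replicas contacted,
-- digest requests, and any read repair that was triggered
TRACING OFF;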
Hinted handoff — Temporarily store hints for unavailable replicas; replay when they come back.
Hinted handoff is Cassandra's mechanism for handling temporary node failures. When a coordinator node tries to write data to a replica node that is temporarily down, instead of failing the write, it stores a "hint" locally. This hint contains the data that should have been written to the unavailable node. Once the failed node comes back online, the coordinator replays these hints to bring that node up to date. This improves write availability and reduces the need for repair operations. Hints are stored for a configurable time window (default 3 hours).
Repair / Anti-entropy — nodetool repair fixes divergent replicas; Merkle trees used in anti-entropy streaming.
If a replica is down, coordinator will (depending on config):
Store hint (hinted handoff) to replay later; or
Rely on other replicas and consistency level to succeed/fail.
Repairs + anti-entropy ensure long-term consistency; hinted handoff provides fast short-term resilience.
Tunable consistency: pick CL based on your latency vs durability needs (e.g., QUORUM read + QUORUM write yields strong consistency across RF nodes).
Use network topology-aware replication (NetworkTopologyStrategy) for multi-DC setups.
Monitor compaction, tombstones, and the GC grace period: a large tombstone volume causes latency spikes (a per-table tuning sketch follows this list).
Use vnodes (default) for easier scaling; but be aware of repair/streaming behavior.
Run nodetool repair regularly (full, incremental, or ad hoc repairs depending on your version) to avoid data divergence.
Tune commit log and memtable sizes according to workload and available memory/disk.
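The tombstone/GC-grace tuning mentioned above can be set per table. A minimal sketch (the value shown is the shipped default of 10 days, not a recommendation):

-- gc_grace_seconds controls how long tombstones are kept before compaction may purge them
ALTER TABLE accounts.users WITH gc_grace_seconds = 864000;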
Spread data evenly across all nodes so that Cassandra scales horizontally.
How it works:
Each row in Cassandra has a partition key (part of your table’s PRIMARY KEY)
The partition key is hashed using a consistent hashing algorithm (Murmur3 by default) to produce a token value.
This token determines which node(s) in the ring are responsible for storing that data.
The token ring is divided into ranges, and each node owns certain token ranges.
Virtual nodes (vnodes): Instead of each physical node owning one contiguous token range, it owns many smaller, non-contiguous ranges. This provides better load distribution and faster rebalancing when nodes are added/removed.
Partition key → Hash function → Token → Assigned node(s)
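You can ask Cassandra directly which token a partition key hashes to. A hedged sketch, assuming the accounts.users table defined later in these notes (the UUID is an illustrative placeholder):

SELECT token(user_id), user_id, email
FROM accounts.users
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
-- token() returns the Murmur3 hash (between -2^63 and 2^63 - 1)
-- that determines which node(s) own this partition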
Example: Let’s say you have 4 nodes (N1–N4) forming a ring, and you store user data.
Ring of tokens (simplified view):
+-------------+-------------+-------------+-------------+
|    Node1    |    Node2    |    Node3    |    Node4    |
|   (0–25)    |   (26–50)   |   (51–75)   |  (76–100)   |
+-------------+-------------+-------------+-------------+
Token range = which hash values each node owns
Now some user IDs:
| User ID | Hash(token) | Goes to node |
| ------- | ----------- | ------------ |
| "alice" | 12 | Node1 |
| "bob" | 47 | Node2 |
| "carol" | 63 | Node3 |
| "dave" | 88 | Node4 |
✅ Evenly distributes data (avoids hot spots)
✅ Enables linear scalability — add nodes, rebalance, done
✅ Determines where reads/writes are routed
Store multiple copies (replicas) of each partition for durability and availability.
How it works: Each keyspace defines a replication factor (RF).
For example: RF = 3 → each partition stored on 3 different nodes
Ring with RF=3 (each partition is stored on its primary node plus the next two nodes clockwise):

Ring order (clockwise): Node A → Node B → Node C → Node D → back to Node A

If a write for partition "alice" hashes to Node A (the primary), the replicas are placed as follows:
  primary replica → Node A
  replica 2       → Node B  (next node clockwise)
  replica 3       → Node C  (next node after that)
Replication Factor (RF): Determines how many copies of each partition exist. RF=3 means 3 copies across 3 different nodes
Replication Strategy: Determines which nodes get the replicas
SimpleStrategy: Places replicas on consecutive nodes in the ring (for single datacenter)
NetworkTopologyStrategy: Distributes replicas across different racks/datacenters (for production)
The partition key's token determines the primary replica node
Additional replicas are placed on the next RF-1 distinct nodes walking clockwise around the ring (for SimpleStrategy)
All replica nodes are equally capable of handling reads/writes for that partition
Coordinator behavior: When a client writes data:
The coordinator node finds which nodes own that partition.
It sends the write to all replica nodes.
It waits for enough acknowledgments depending on the consistency level (e.g., ONE, QUORUM, ALL).
The consistency level (CL) on reads/writes determines how many replicas must acknowledge before the operation succeeds:
| CL | Behavior | Availability | Consistency |
| ------ | ---------------------------- | ------------ | ----------------------------------|
| ONE | Fast, one replica ack needed | ✅ high | ⚠️ least consistent; may be stale |
| QUORUM | Majority (RF/2+1) | ⚖️ balanced | ✅ usually consistent |
| ALL | Slow, waits for all replicas | ⚠️ lower | ✅ strongest consistency |
This is how Cassandra achieves tunable consistency (eventual → strong, depending on CL): you trade off consistency, availability, and latency based on your application's needs.
Atomicity = all or nothing (i.e., transactions roll back fully)
✅ Partition-level atomicity: All writes within a single partition are atomic. If you insert/update multiple rows with the same partition key, either all succeed or all fail.
❌ No multi-partition transactions: Writes across different partitions are NOT atomic by default. Each partition write is independent.
⚠️ Lightweight Transactions (LWT): Cassandra offers conditional writes using IF clauses (e.g., INSERT ... IF NOT EXISTS), which provide atomicity but at a significant performance cost (uses Paxos consensus).
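A hedged sketch of lightweight transactions against the users table defined later in these notes (the UUID value is an illustrative placeholder):

-- Insert only if the row does not already exist (a Paxos round runs under the hood)
INSERT INTO accounts.users (user_id, email, city)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'alice@example.com', 'Boston')
IF NOT EXISTS;

-- Conditional update: applied only if the current value matches
UPDATE accounts.users SET city = 'Cambridge'
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
IF city = 'Boston';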
Consistency – all replicas see the same data immediately after a commit (strict consistency).
In Cassandra, consistency is tunable: you choose it per query via consistency levels (e.g., `ONE`, `QUORUM`, `ALL`), so reads can be anywhere from eventually to strongly consistent.
⚠️ Tunable consistency: Cassandra uses eventual consistency by default, but you can tune it per query:
Write at QUORUM + Read at QUORUM = Strong consistency
Write at ONE + Read at ONE = Eventual consistency
The consistency level determines how many replicas must respond before acknowledging success
Formula: Write_replicas + Read_replicas > Replication_Factor guarantees strong consistency. For example, with RF = 3, QUORUM writes (2) and QUORUM reads (2) give 2 + 2 = 4 > 3, so every read overlaps at least one replica that has the latest write.
Isolation – transactions don’t interfere (i.e., locks ensure serial execution)
In Cassandra writes are isolated within a partition (no dirty reads), but not globally serialized
Cassandra does not support traditional isolation levels (Read Committed, Serializable, etc.)
Concurrent writes to the same row use last-write-wins based on timestamp
No read locks or write locks - reads never block writes and vice versa
LWT provides linearizable isolation for conditional operations, but again with performance penalties
Durability – once committed, it’s saved (i.e., stored in WAL + disk)
✅ Strong durability: A write is acknowledged only after it has been:
Appended to the commit log (the append-only write-ahead log on disk, which provides the durability)
Applied to the memtable (in-memory structure)
Even if a node crashes immediately after acknowledging a write, the data can be recovered by replaying the commit log
With proper replication (RF=3) and consistency levels (QUORUM), data is safely replicated before acknowledgment
👉 Cassandra is not fully ACID — it’s “AID” per partition, and offers tunable consistency instead of strict ACID across the cluster.
It's an AP system (Available and Partition-tolerant) in the CAP theorem, favoring:
High availability over strict consistency
Horizontal scalability over complex transactions
Partition tolerance for distributed operations
For applications requiring strict ACID guarantees across multiple entities, a traditional RDBMS might be more appropriate. Cassandra excels at high-throughput, always-available systems where eventual consistency is acceptable.
Cassandra is a NoSQL, wide-column store.
Think of it as a distributed, partitioned, multi-dimensional map rather than relational tables.
Cluster
└── Keyspaces
    └── Tables (a.k.a. Column Families)
        └── Partitions (grouped by partition key)
            └── Rows (sorted by clustering columns)
                └── Columns (name–value pairs)
Example structure:
Cluster: user_data_cluster
└── Keyspace: accounts
    └── Table: users
        ├── Partition: user_id = 101
        │   └── Row (a single row, since there are no clustering columns)
        │       ├── Column: email = alice@example.com
        │       ├── Column: city = Boston
        │       └── Column: last_login = 2025-10-12
        └── Partition: user_id = 102
            └── Row
                ├── Column: email = bob@example.com
                └── Column: city = Chicago
Key concepts:
| Term | Description | Analogy |
| --------------------- | -------------------------------------------- | --------------- |
| Keyspace | Top-level namespace (like a database in SQL) | Database |
| Table (Column Family) | Collection of rows, defined by schema | SQL table |
| Partition Key | Determines which node stores the data | Hash key |
| Clustering Columns | Define sort order within a partition | ORDER BY |
| Primary Key | `(partition_key [, clustering_columns...])` | Unique row ID |
| Static columns | Columns shared by all rows in a partition | “Header” fields |
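Static columns from the table above deserve a quick sketch (table and column names are made up for illustration):

CREATE TABLE accounts.orders (
    customer_id   UUID,
    order_id      TIMEUUID,
    customer_name TEXT STATIC,   -- one value shared by every row in the partition
    total         DECIMAL,
    PRIMARY KEY ((customer_id), order_id)
);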
CQL looks like SQL but has different semantics — it’s designed around partitions, not joins or foreign keys.
Design tables based on your queries, not your data. Because you can’t freely query any column, you design each table to serve a specific query pattern efficiently.
CREATE KEYSPACE accounts
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
};
USE accounts;
For Single datacenter setups --> {'class': 'SimpleStrategy', 'replication_factor': 3}
For Multi-datacenter setups --> {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}
CREATE TABLE users (
user_id UUID,
email TEXT,
city TEXT,
signup_date TIMESTAMP,
last_login TIMESTAMP,
PRIMARY KEY (user_id)
);
INSERT INTO users (user_id, email, city, signup_date)
VALUES (uuid(), 'alice@example.com', 'Boston', toTimestamp(now()));
UPDATE users SET last_login = toTimestamp(now())
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;   -- user_id is a UUID, so a UUID literal is required
Writes are upserts (insert or overwrite).
SELECT * FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
✅ Efficient because user_id is the partition key.
⚠️ Inefficient or invalid if you query by non-key columns without indexes:
-- This will fail unless you create a secondary index (or add ALLOW FILTERING, which scans every node and is best avoided):
SELECT * FROM users WHERE city = 'Boston';
In Cassandra, each table’s PRIMARY KEY has two parts:
PRIMARY KEY ((partition_key), clustering_column1, clustering_column2, ...)
The partition key decides which node stores the data.
The clustering columns decide how data is ordered within that partition.
CREATE TABLE messages (
chat_id UUID,
sent_time TIMESTAMP,
sender TEXT,
message TEXT,
PRIMARY KEY ((chat_id), sent_time)
);
Here, Partition key = chat_id; Clustering column = sent_time
By default, clustering columns are sorted ascending. You can override this per table using: WITH CLUSTERING ORDER BY (column_name ASC|DESC).
CREATE TABLE messages (
chat_id UUID,
sent_time TIMESTAMP,
sender TEXT,
message TEXT,
PRIMARY KEY ((chat_id), sent_time)
) WITH CLUSTERING ORDER BY (sent_time DESC);
Cassandra does not allow arbitrary WHERE clauses — only those that match the partition key (and optionally clustering columns).
-- Fetch latest 10 messages for one chat
SELECT * FROM messages WHERE chat_id = X LIMIT 10;
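Clustering columns can also be restricted by range within a single partition. A sketch with an illustrative chat_id and timestamp:

-- Messages for one chat since a given time
SELECT * FROM messages
WHERE chat_id = 123e4567-e89b-12d3-a456-426614174000
  AND sent_time > '2025-10-01 00:00:00+0000'
LIMIT 10;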
CREATE INDEX ON users(city);
SELECT * FROM users WHERE city = 'Boston';
Use cautiously — they can cause performance issues on large datasets.
Pre-computed, automatically maintained "alternate query paths". Note that materialized views are flagged as experimental in recent Cassandra releases; many teams maintain a second, denormalized table instead.
CREATE MATERIALIZED VIEW users_by_city AS
SELECT city, user_id, email
FROM users
WHERE city IS NOT NULL AND user_id IS NOT NULL
PRIMARY KEY (city, user_id);
Now you can query efficiently by city.
CREATE TABLE profiles (
user_id UUID PRIMARY KEY,
phones SET<TEXT>,
preferences MAP<TEXT, TEXT>
);
UPDATE profiles SET phones = phones + {'555-1234'} WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;   -- user_id is a UUID, so a UUID literal is required
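Collections also support fine-grained updates. A sketch against the same profiles table (the UUID is an illustrative placeholder):

-- Set a single map key and remove one element from the set in a single statement
UPDATE profiles
SET preferences['theme'] = 'dark',
    phones = phones - {'555-1234'}
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;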
| Concept | Description |
| --------------- | ------------------------------------------------------------------ |
| Data Model | Wide-column, partitioned by key |
| Query Language | SQL-like syntax, but partition-based |
| Primary Key | Determines distribution (partition) and sort order (clustering) |
| Indexes & Views | Used for alternate query paths (with caution) |
| Schema Design | Query-driven, denormalized |
| Best Practices | Avoid joins; read/write within partitions; tune replication factor |
$ cqlsh    # default connection: Host 127.0.0.1, Port 9042
$ cqlsh <hostname> <port> -u <username> -p <password>
Batch commands (used for atomicity, not performance; a single-partition example follows the block below)
BEGIN BATCH
INSERT INTO users (id, name, age) VALUES (uuid(), 'Bob', 25);
INSERT INTO users (id, name, age) VALUES (uuid(), 'Carol', 28);
APPLY BATCH;
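The batch above spans two partitions; batches work best when every statement targets the same partition. A hedged sketch, assuming the messages table from earlier in these notes (UUID is an illustrative placeholder):

BEGIN BATCH
  INSERT INTO messages (chat_id, sent_time, sender, message)
  VALUES (123e4567-e89b-12d3-a456-426614174000, toTimestamp(now()), 'alice', 'hi');
  INSERT INTO messages (chat_id, sent_time, sender, message)
  VALUES (123e4567-e89b-12d3-a456-426614174000, toTimestamp(now()), 'alice', 'are you there?');
APPLY BATCH;
-- same partition key, so both rows are applied atomically on the replicas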
Map Java objects (POJOs) to CQL tables using annotations (the Mapper pattern)
Use CassandraRepository interfaces like JPA repositories
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-cassandra</artifactId>
</dependency>
In your src/main/resources/application.yml:
spring:
data:
cassandra:
keyspace-name: demo
contact-points: localhost
port: 9042
local-datacenter: datacenter1
schema-action: create-if-not-exists
📝 schema-action can be none, create_if_not_exists, recreate, etc.
If you’re using application.properties:
spring.data.cassandra.contact-points=localhost
spring.data.cassandra.port=9042
spring.data.cassandra.keyspace-name=demo
spring.data.cassandra.local-datacenter=datacenter1
spring.data.cassandra.schema-action=create-if-not-exists
Example: User.java
import org.springframework.data.annotation.Id;
import org.springframework.data.cassandra.core.mapping.PrimaryKey;
import org.springframework.data.cassandra.core.mapping.Table;
@Table("users")
public class User {
@PrimaryKey
private String id;
private String name;
private int age;
// Getters and setters
public String getId() { return id; }
public void setId(String id) { this.id = id; }
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public int getAge() { return age; }
public void setAge(int age) { this.age = age; }
}
Create a repository interface
import java.util.List;
import org.springframework.data.cassandra.repository.AllowFiltering;
import org.springframework.data.cassandra.repository.CassandraRepository;
import org.springframework.stereotype.Repository;
@Repository
public interface UserRepository extends CassandraRepository<User, String> {
    // You can also define custom queries.
    // name is not part of the primary key, so this derived query needs either a
    // secondary index on name or @AllowFiltering (full scan; fine for demos only).
    @AllowFiltering
    List<User> findByName(String name);
}
This acts as a Mapper — Spring Data automatically translates method calls to CQL queries.
Example service:
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.UUID;
@Service
public class UserService {
private final UserRepository userRepository;
public UserService(UserRepository userRepository) {
this.userRepository = userRepository;
}
public User createUser(String name, int age) {
User user = new User();
user.setId(UUID.randomUUID().toString());
user.setName(name);
user.setAge(age);
return userRepository.save(user);
}
public List<User> getAllUsers() {
return userRepository.findAll();
}
public List<User> findByName(String name) {
return userRepository.findByName(name);
}
}
import org.springframework.web.bind.annotation.*;
import java.util.List;
@RestController
@RequestMapping("/users")
public class UserController {
private final UserService userService;
public UserController(UserService userService) {
this.userService = userService;
}
@PostMapping
public User create(@RequestBody User user) {
return userService.createUser(user.getName(), user.getAge());
}
@GetMapping
public List<User> all() {
return userService.getAllUsers();
}
@GetMapping("/{name}")
public List<User> byName(@PathVariable String name) {
return userService.findByName(name);
}
}
Now you can:
POST /users
GET /users
GET /users/Alice
If you want more control, you can use the DataStax Java Driver's object mapper directly (annotations come from com.datastax.oss.driver.api.mapper.annotations; note that the entity must carry the driver's own @Entity/@PartitionKey annotations rather than Spring Data's):
@Dao
public interface UserDao {
    @Select
    User findById(String id);

    @Insert
    void save(User user);
}
Then declare a mapper interface; the driver's annotation processor generates a builder for it (the name AppMapper below is just an example, and session is a CqlSession):
@Mapper
public interface AppMapper {
    @DaoFactory
    UserDao userDao();
}

AppMapper mapper = new AppMapperBuilder(session).build();   // AppMapperBuilder is generated
UserDao dao = mapper.userDao();
dao.save(user);
But — for most Spring Boot apps, the built-in CassandraRepository is easier.