Friday, 24 April 2026

KvRocks as Your Main Database — Possible Use Cases


A technical deep-dive into when Apache KvRocks earns a primary role in your stack


Introduction

Most engineers first encounter Apache KvRocks as a Redis cost-reduction story: same wire protocol, same commands, less RAM, cheaper bill. That framing is accurate but undersells what KvRocks actually is. When you replace Redis's in-memory engine with RocksDB — one of the most battle-tested LSM-tree storage engines ever built — you get a system with fundamentally different durability, capacity, and operational characteristics. That changes which role it should play in your architecture.

This post is not about KvRocks as a Redis cache replacement. It is about the specific use cases where KvRocks is the right primary data store — not a cache in front of something else, not a temporary buffer, but the authoritative system of record for a class of data.

We will cover the storage engine mechanics that make this possible, the use cases where KvRocks earns a primary role, and the hard boundaries where it should not be used as one.


Understanding the Storage Engine Difference

Before use cases, the mechanics matter.

Redis / Valkey

Redis keeps all data in RAM. Persistence is optional and retrofitted — either RDB snapshots (point-in-time, lossy) or AOF (append-only log, configurable fsync). Neither was designed from the ground up for durability. An AOF rewrite under load causes measurable latency spikes. RDB fork under a large working set causes memory pressure. The persistence model is a concession, not a design centre.

KvRocks + RocksDB

KvRocks uses RocksDB as its storage backend. RocksDB is an LSM-tree (Log-Structured Merge-Tree) engine developed at Facebook/Meta and now widely deployed in production: it underpins systems such as CockroachDB, TiKV, and YugabyteDB, and runs at scale at Meta and LinkedIn. It was designed for:

  • Sequential write path — writes land in a memtable, flush to L0 SSTables, and compact downward through levels. Write throughput is high because the I/O is fundamentally sequential.
  • Durability by default — WAL (Write-Ahead Log) is on by default. Crash recovery is structural, not configured.
  • Compression — Snappy, LZ4, and Zstd compression at the SSTable level. Cold data compresses aggressively, reducing storage cost significantly.
  • Bloom filters — per-SSTable bloom filters accelerate point lookups by avoiding unnecessary disk reads.
  • Block cache — a configurable in-memory block cache sits above the storage layer, giving you warm-read performance close to Redis for frequently accessed keys.

The implication for KvRocks: your dataset size is bounded by NVMe/SSD capacity, not RAM. A dataset that would require a 256 GB RAM instance in Redis can run on a server with 32 GB RAM and 2 TB NVMe in KvRocks, with hot data served from block cache and cold data fetched from disk with bloom filter-accelerated lookups.

Latency Profile

This is the trade-off you accept:

Operation              Redis (RAM)    KvRocks (block cache hit)   KvRocks (disk read)
GET (hot key)          ~50–200µs      ~200–800µs                  ~1–5ms
SET                    ~50–200µs      ~200–500µs                  ~200–500µs (WAL)
HGETALL (large hash)   ~100µs–1ms     ~500µs–3ms                  ~2–10ms
SCAN                   ~1–10ms        ~5–30ms                     ~10–100ms

Write latency is competitive because RocksDB writes are sequential. Read latency for hot data with a well-tuned block cache is acceptable for most non-authorisation-path use cases. Cold reads are measurably slower, which is why dataset access patterns determine fitness.


Use Case 1: Idempotency Key Store

The Problem

Payment systems, API gateways, and distributed job schedulers must guarantee exactly-once processing. The standard mechanism is an idempotency key: the client sends a unique key with each request; the server checks whether it has seen this key before and returns a cached result if so, or processes and stores the result if not.

The requirements for an idempotency key store are:

  • Durability — losing an idempotency record means double-processing a payment, sending a duplicate notification, or running a job twice. This is a hard correctness requirement.
  • Long retention — regulatory frameworks and API contracts often require 24–90 days of idempotency key retention. PCI-DSS environments and banking integrations commonly mandate the longer end of this range.
  • High write throughput — every incoming request is a write.
  • Point-lookup reads — the read pattern is almost exclusively GET idempotency:{uuid}.
  • Automatic expiry — keys should expire after their retention window.

Why Redis/Valkey Struggles Here

The retention window is the killer. At 100,000 requests/day with a 90-day window, you accumulate approximately 9 million keys. If each record stores the request hash, response payload, and metadata (~2KB average), that is ~18 GB of RAM dedicated solely to idempotency keys — before any other workload. Scaling to 1M requests/day makes this untenable.
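The arithmetic above is worth making explicit. A quick back-of-envelope sketch, using the figures from this section:

```python
# Back-of-envelope RAM sizing for the idempotency workload described above.
REQUESTS_PER_DAY = 100_000
RETENTION_DAYS = 90
AVG_RECORD_BYTES = 2048  # request hash + response payload + metadata (~2 KB)

total_keys = REQUESTS_PER_DAY * RETENTION_DAYS   # 9 million live keys
ram_bytes = total_keys * AVG_RECORD_BYTES        # ~18 GB held purely in RAM
print(f"{total_keys:,} keys, ~{ram_bytes / 1e9:.1f} GB RAM")
```

Scale REQUESTS_PER_DAY to 1,000,000 and the same calculation yields roughly 184 GB of RAM for idempotency keys alone.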

AOF durability adds operational anxiety: rewrite storms under load, and the risk that appendfsync everysec loses up to one second of keys on a hard failure.

Why KvRocks is the Right Primary Store

KvRocks handles this use case cleanly:

SET idempotency:{uuid} {payload} EX 7776000
GET idempotency:{uuid}

Identical commands to Redis. No application change required. But now:

  • The same 18 GB dataset fits in roughly 4 GB of NVMe space after compression (LSM-tree compaction eliminates tombstones, and cold SSTables compress under LZ4/Zstd).
  • WAL ensures that a committed SET survives a crash without requiring appendfsync always.
  • RocksDB bloom filters mean GET on a non-existent key (the common path for new requests) avoids disk reads.
  • EX (TTL) is implemented natively; RocksDB compaction handles tombstone cleanup.
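The commands above support a check-and-claim pattern at the application layer. A minimal sketch follows, assuming a redis-py connection to KvRocks; the helper name and pending-marker convention are illustrative, and a production version would also handle the in-flight (pending) state and handler failures:

```python
import json

RETENTION_SECONDS = 90 * 24 * 3600  # 90-day window, matching EX 7776000 above

def process_once(client, idempotency_key: str, handler) -> dict:
    """Run handler at most once per idempotency key.

    `client` is any Redis-protocol client (e.g. redis-py connected to
    KvRocks). Replayed requests get the previously stored result."""
    storage_key = f"idempotency:{idempotency_key}"
    # SET NX EX atomically claims the key; a concurrent or replayed
    # request fails the NX claim and reads the stored result instead.
    pending = json.dumps({"status": "pending"})
    claimed = client.set(storage_key, pending, nx=True, ex=RETENTION_SECONDS)
    if not claimed:
        return json.loads(client.get(storage_key))
    result = handler()
    client.set(storage_key, json.dumps(result), ex=RETENTION_SECONDS)
    return result
```

Because KvRocks writes the claim through the WAL before acknowledging, the claim itself survives a crash — the property the in-memory variant cannot guarantee.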

Configuration notes:

# kvrocks.conf
rocksdb.compression_per_level     no:no:lz4:lz4:zstd:zstd:zstd
rocksdb.block_cache_size          4096      # 4GB block cache for hot idempotency keys
rocksdb.write_buffer_size         128       # MB, tune for write throughput
rocksdb.bloom_locality            1         # improve bloom filter cache locality

Sizing guidance: With LZ4/Zstd compression, budget approximately 200–400 bytes per idempotency record on disk. A 90-day window at 1M requests/day is roughly 90 million records, or about 20–40 GB of NVMe — trivial on modern hardware, and easily sharded across KvRocks cluster nodes.


Use Case 2: Fraud Feature Store

The Problem

Machine learning-based fraud detection requires a feature store: a low-latency lookup table of precomputed features associated with entities (cards, accounts, merchants, devices). Examples:

  • card:{pan_hash}:avg_txn_amount_30d → 42.17
  • device:{fingerprint}:countries_seen_7d → ["GH","GB","NG"]
  • merchant:{mid}:chargeback_rate_90d → 0.0031
  • account:{id}:velocity_1h → 7

These features are computed by a streaming pipeline (Flink, Kafka Streams, or similar) and written continuously. They are read at authorisation time by the fraud scoring model.

Requirements

  • Large dataset — feature stores for production fraud systems easily reach 50–500 GB of computed features.
  • High write throughput — continuous feature updates from the streaming pipeline.
  • Sub-5ms read latency — the fraud scoring model reads 20–100 features per transaction, in parallel. Total feature retrieval budget is typically 5–15ms.
  • Durability matters, but not perfectly — losing a few seconds of feature updates is survivable (the feature becomes slightly stale, not absent). Losing the entire feature store is catastrophic.
  • Rich data structures — features are often hashes, sorted sets (percentile distributions), or lists (recent transaction sequences).

Why KvRocks Works Here

A 200 GB feature store in Redis costs approximately $5,000–$15,000/month in managed cloud RAM (r6g.4xlarge class). In KvRocks on NVMe, the same dataset fits on commodity hardware costing a fraction of that.

The read latency profile is acceptable: 20–100 parallel HGET calls at 1–3ms each (block cache warm) satisfy the 5–15ms total budget. The pipeline uses PIPELINE or MGET to batch reads:

import redis  # KvRocks is wire-compatible

client = redis.Redis(host='kvrocks-host', port=6380)

def fetch_features(pan_hash: str, merchant_id: str, device_fp: str) -> dict:
    pipe = client.pipeline(transaction=False)
    pipe.hgetall(f"card:{pan_hash}:features")
    pipe.hgetall(f"merchant:{merchant_id}:features")
    pipe.hgetall(f"device:{device_fp}:features")
    results = pipe.execute()
    return {
        "card": results[0],
        "merchant": results[1],
        "device": results[2],
    }

Feature update writes from the streaming pipeline:

def update_card_features(pan_hash: str, features: dict, ttl_seconds: int = 7776000):
    key = f"card:{pan_hash}:features"
    pipe = client.pipeline()
    pipe.hset(key, mapping=features)
    pipe.expire(key, ttl_seconds)
    pipe.execute()

Block cache tuning is critical here. Hot features (recently active cards/merchants) must stay in block cache. Size the block cache to cover your hot working set — typically 10–20% of total feature store size:

rocksdb.block_cache_size    20480   # 20GB block cache for 200GB feature store

Use Case 3: Durable Event / Message Queue

The Problem

Redis Streams and LIST-based queues are widely used for lightweight message passing — webhook delivery queues, payment event buses, notification pipelines. The problem is durability. Under load:

  • AOF rewrite pauses can cause producer backpressure and consumer lag.
  • appendfsync everysec loses up to one second of messages on hard failure.
  • appendfsync always imposes per-command fsync overhead that collapses throughput.
  • RDB snapshots create fork-induced latency spikes.

For non-critical queues (cache warming, analytics events), this is acceptable. For payment event queues — settlement notifications, chargeback webhooks, compliance audit events — losing messages has regulatory and financial consequences.

Why KvRocks is the Right Primary Store

KvRocks implements Redis Streams (XADD, XREAD, XREADGROUP, XACK, XPENDING) with RocksDB persistence underneath. Every XADD is durably written to the WAL before acknowledgement. No fsync tuning required. No AOF rewrite storms.

# Producer — payment settlement event
client.xadd(
    "stream:settlement:events",
    {
        "transaction_id": "txn_abc123",
        "amount": "149.99",
        "currency": "GHS",
        "merchant_id": "mid_xyz",
        "status": "settled",
        "timestamp": "2025-04-25T14:32:00Z",
    },
    maxlen=5_000_000,  # cap stream length
)

# Consumer group — settlement reconciliation worker (create the group once)
try:
    client.xgroup_create("stream:settlement:events", "reconciliation-workers", id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass  # BUSYGROUP: the group already exists

messages = client.xreadgroup(
    groupname="reconciliation-workers",
    consumername="worker-1",
    streams={"stream:settlement:events": ">"},
    count=100,
    block=5000,
)

for stream, entries in messages:
    for msg_id, fields in entries:
        process_settlement(fields)
        client.xack("stream:settlement:events", "reconciliation-workers", msg_id)

KvRocks's XPENDING and dead-letter handling work identically to Redis Streams, giving you full consumer group semantics with structural durability.

Operational advantage: Stream data compresses well under Zstd. A stream of 5 million JSON payment events that would occupy ~5 GB in Redis RAM occupies ~500 MB–1 GB on KvRocks NVMe storage.


Use Case 4: Configuration and Reference Data Store

The Problem

Payment gateways, API platforms, and SaaS products maintain large volumes of configuration data that must be:

  • Always available — a missing merchant config causes failed transactions.
  • Consistent — all application instances must see the same config.
  • Large — thousands to millions of entities, each with deep configuration hashes.
  • Occasionally written — config changes are infrequent relative to reads.
  • Durable — losing config data requires manual reconstruction from source systems.

Typical examples in a payment gateway:

  • Per-merchant routing configuration (acquirer selection rules, fallback chains)
  • BIN table (Bank Identification Number → issuer metadata, ~500,000 entries)
  • Currency/FX rate tables with validity windows
  • 3DS ACS URL and version configuration per card range
  • Velocity rule sets per merchant category code

Why This is a Poor Fit for Redis

A complete BIN table with issuer metadata (~500,000 hashes, ~1 KB each) occupies ~500 MB in Redis RAM. Multiplied across a dozen such reference tables, you have several GB of RAM locked into static data that changes a few times per day. This is expensive RAM with a poor utilisation profile.

Why KvRocks Works as the Primary Store

Reference data is an ideal KvRocks workload because:

  • The access pattern is overwhelmingly read-heavy with predictable hot keys (top merchants, common BIN prefixes). Block cache handles this efficiently.
  • Write throughput requirements are minimal (config changes are infrequent).
  • Compression ratios on structured JSON/msgpack config hashes are excellent.
  • Durability is required but low-urgency — a config change written to KvRocks persists through any crash.

# Loading BIN table entry
def get_bin_metadata(bin_prefix: str) -> dict:
    raw = client.hgetall(f"bin:{bin_prefix}")
    return {k.decode(): v.decode() for k, v in raw.items()}

# Writing merchant routing config
def set_merchant_config(merchant_id: str, config: dict):
    client.hset(f"merchant:config:{merchant_id}", mapping=config)
    # No TTL — config is permanent until explicitly updated

# Atomic config update with version tracking
def update_merchant_config(merchant_id: str, updates: dict, version: int):
    pipe = client.pipeline()
    pipe.hset(f"merchant:config:{merchant_id}", mapping=updates)
    pipe.hset(f"merchant:config:{merchant_id}", "config_version", version)
    pipe.execute()

For application-layer caching on top of KvRocks, a short-lived local in-process cache (e.g. cachetools.TTLCache in Python) with a 30-second TTL gives you sub-millisecond reads for the hottest config keys with near-zero staleness risk.
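The local-cache layer can be sketched with a few lines of stdlib Python; cachetools.TTLCache provides the same behaviour off the shelf. The class and function names here are illustrative, and `client` is assumed to be the redis-py connection from the earlier examples:

```python
import time

class LocalTTLCache:
    """Minimal in-process TTL cache, mirroring cachetools.TTLCache."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh local copy: sub-millisecond read
        value = loader()           # miss or stale: fall through to KvRocks
        self._store[key] = (now + self.ttl, value)
        return value

config_cache = LocalTTLCache(ttl_seconds=30)

def get_merchant_config_cached(merchant_id: str) -> dict:
    return config_cache.get_or_load(
        f"merchant:config:{merchant_id}",
        lambda: {k.decode(): v.decode()
                 for k, v in client.hgetall(f"merchant:config:{merchant_id}").items()},
    )
```

With a 30-second TTL, the worst-case staleness after a config change is bounded and small, while the hottest keys never leave the process.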


Use Case 5: Session Store with Long Retention Windows

The Problem

Standard web session stores use Redis with short TTLs — 15 to 60 minutes. This is a well-understood use case. The problem arises in specific contexts:

  • Banking mobile apps — regulatory guidance in several jurisdictions (including UK FCA and Bank of Ghana) allows extended session validity for authenticated users in specific risk tiers, sometimes up to 24 hours with continuous activity.
  • Operator portals — back-office users in payment operations maintain long-running sessions.
  • API access tokens — OAuth refresh tokens may have 30–90 day validity windows.
  • MFA state — trusted device records may be retained for 30–90 days.

When session TTLs extend beyond a few hours and the session payload grows beyond a simple token (e.g. storing entitlements, preferences, risk scores), keeping sessions in Redis RAM becomes expensive.

Why KvRocks is the Right Primary Store

Session data has exactly the access pattern KvRocks handles well: sparse reads against a large key space (most sessions are idle at any moment), with bursts of activity for active users served from block cache.

import json
import uuid
from datetime import datetime, timedelta, timezone

SESSION_TTL = int(timedelta(hours=24).total_seconds())

def create_session(user_id: str, entitlements: list, risk_tier: str) -> str:
    session_id = str(uuid.uuid4())
    payload = {
        "user_id": user_id,
        "entitlements": entitlements,
        "risk_tier": risk_tier,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    client.set(f"session:{session_id}", json.dumps(payload), ex=SESSION_TTL)
    return session_id

def get_session(session_id: str) -> dict | None:
    raw = client.get(f"session:{session_id}")
    if raw is None:
        return None
    # Sliding expiry — refresh TTL on access
    client.expire(f"session:{session_id}", SESSION_TTL)
    return json.loads(raw)

def invalidate_session(session_id: str):
    client.delete(f"session:{session_id}")

No application change from a Redis-backed session store. The gain is dataset size: 1 million concurrent 24-hour sessions at 4 KB each is 4 GB in Redis RAM. In KvRocks with Snappy compression, the same dataset occupies ~800 MB–1.5 GB on NVMe.


Use Case 6: Time-Series Metrics and Telemetry Buffer

The Problem

Observability pipelines, IoT telemetry collectors, and financial transaction monitoring systems often need a fast write buffer for time-series data before it lands in a purpose-built time-series database (InfluxDB, TimescaleDB, Prometheus remote storage). Requirements:

  • Very high write throughput.
  • Moderate read throughput (dashboards, alerting queries over recent windows).
  • Retention of hours to days of raw data.
  • Durability — lost telemetry creates gaps in compliance audit trails.

KvRocks as a Telemetry Buffer

Sorted sets provide a natural time-series primitive, with Unix timestamps as scores:

import time

def record_metric(entity_id: str, metric_name: str, value: float):
    key = f"metrics:{entity_id}:{metric_name}"
    ts = time.time()
    client.zadd(key, {f"{ts}:{value}": ts})
    # Trim to retain only the last 24 hours
    cutoff = ts - 86400
    client.zremrangebyscore(key, "-inf", cutoff)

def query_metric_range(entity_id: str, metric_name: str,
                       start_ts: float, end_ts: float) -> list[tuple]:
    key = f"metrics:{entity_id}:{metric_name}"
    raw = client.zrangebyscore(key, start_ts, end_ts, withscores=True)
    return [(member.decode().split(":")[1], score) for member, score in raw]

For a payment gateway, this pattern suits real-time merchant transaction rate monitoring, acquirer response time tracking, and fraud score distribution tracking — all of which need durable, queryable recent history without the cost of RAM-bound storage.
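For dashboarding on top of the helpers above, the queried points reduce to a simple rate figure. A sketch, assuming the `query_metric_range` helper from this section (the merchant key prefix and metric name are illustrative):

```python
import time

def txn_rate_per_minute(points: list, window_seconds: int) -> float:
    """Turn the point list returned by query_metric_range into a
    transactions-per-minute rate over the queried window."""
    return len(points) / (window_seconds / 60)

def merchant_txn_rate(merchant_id: str, window_seconds: int = 300) -> float:
    # query_metric_range is the helper defined earlier in this section.
    now = time.time()
    points = query_metric_range(f"merchant:{merchant_id}", "txn_count",
                                now - window_seconds, now)
    return txn_rate_per_minute(points, window_seconds)
```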


Operational Considerations When Using KvRocks as a Primary Store

Replication

KvRocks supports primary-replica replication using a Redis-compatible replication protocol. For production primary store use:

# On replica
replicaof kvrocks-primary 6380
replica-read-only yes

Unlike Redis, replication does not require the primary to fork and produce an RDB snapshot. KvRocks streams SSTable files during initial sync, which is significantly more efficient for large datasets.

Cluster Mode

KvRocks supports cluster mode with hash-slot-based sharding, compatible with Redis cluster clients. For large idempotency stores, feature stores, or session stores, horizontal sharding is the natural scaling path.

from redis.cluster import RedisCluster

client = RedisCluster(
    startup_nodes=[
        {"host": "kvrocks-node-1", "port": 6380},
        {"host": "kvrocks-node-2", "port": 6380},
        {"host": "kvrocks-node-3", "port": 6380},
    ],
    decode_responses=True,
)

Backup Strategy

Because KvRocks data is on disk, backup is straightforward — snapshot the RocksDB data directory. KvRocks accepts the standard BGSAVE command, which triggers a RocksDB checkpoint:

# Trigger a RocksDB checkpoint
redis-cli -p 6380 BGSAVE

# Or directly snapshot the data directory with rsync
rsync -av --checksum /var/lib/kvrocks/data/ backup-host:/backups/kvrocks/$(date +%Y%m%d)/

Monitoring

KvRocks exposes INFO output compatible with Redis monitoring tools. Key metrics to watch for primary store use:

redis-cli -p 6380 INFO | grep -E 'used_memory|keyspace_hits|keyspace_misses'
redis-cli -p 6380 INFO keyspace

For RocksDB-specific metrics (compaction lag, write stall, block cache hit rate):

redis-cli -p 6380 INFO rocksdb

The block cache hit rate is the single most important metric for primary store workloads. A hit rate below 80% signals that the block cache is undersized for your working set.


Where KvRocks Should NOT Be Your Primary Store

Honesty about limitations is more useful than sales material.

Sub-millisecond hot path operations. If your use case has a strict < 500µs read SLA (card authorisation BIN lookups, real-time rate limiting counters on the critical path), Redis/Valkey remains the correct choice. KvRocks cannot guarantee block cache hit latency below ~200µs under contention.

ACID multi-key transactions. KvRocks inherits Redis's transaction model — MULTI/EXEC is a pipeline with no rollback on logic failure. If your workload requires atomic reserve-debit-credit sequences with rollback guarantees, use Tarantool or PostgreSQL.

Complex queries and aggregations. KvRocks has no SQL layer. Anything requiring range scans, aggregations, or joins across key spaces belongs in a relational or columnar store.

Extremely write-heavy workloads with large value sizes. RocksDB write amplification under heavy compaction pressure can cause write stalls. For workloads writing multi-megabyte values at sustained high throughput, benchmark carefully before committing.


Summary: When KvRocks Earns a Primary Role

Use Case                                      Key Driver                           Fits?
Idempotency key store (30–90 day retention)   Durability + large dataset           ✅ Strong fit
Fraud feature store (50–500 GB)               Cost reduction + durability          ✅ Strong fit
Durable payment event queue                   Structural durability                ✅ Strong fit
Merchant / reference config store             Durability + large hashes            ✅ Strong fit
Long-retention session store                  Dataset size + durability            ✅ Strong fit
Time-series telemetry buffer                  High write throughput + durability   ✅ Good fit
Sub-millisecond authorisation cache           Latency requirements                 ❌ Use Redis
ACID settlement sequences                     Transaction semantics                ❌ Use Tarantool
AML velocity SQL aggregations                 Distributed compute                  ❌ Use GridGain

The unifying theme across the strong fits is the same: you need Redis semantics, you need genuine durability, and your dataset is larger than you want to hold in RAM. KvRocks is the only system that satisfies all three simultaneously. The moment you find yourself sizing Redis instances around dataset volume rather than throughput, or engineering AOF flush strategies to avoid data loss, KvRocks deserves a serious evaluation as the primary store for that workload.



  A technical deep-dive into when Apache KvRocks earns a primary role in your stack Introduction Most engineers first encounter Apache Kv...