Breaking the Memory Barrier: Introducing EloqKV on EloqStore
Introduction
The hardware landscape is shifting under our feet. At CES 2026, NVIDIA CEO Jensen Huang delivered a stark warning: the industry is facing a critical shortage of DRAM. While the explosive growth of AI models is the primary driver, there is another massive consumer of memory that often flies under the radar: Caching Services.
Traditional services, from Redis and Valkey to Momento, DragonflyDB, and Garnet, are memory-bound. As data volumes reach petabyte scale, DRAM costs have become prohibitively expensive. Yet for latency-sensitive businesses, DRAM has remained the only choice, because SSD-based alternatives could never solve the P9999 (tail latency) problem. In mission-critical environments, a single latency spike at the 99.99th percentile can disrupt entire real-time workflows, making unpredictable SSD performance a dealbreaker.
Today, we’re proud to introduce EloqKV powered by EloqStore: a persistent key-value store that speaks the Redis API and finally breaks the DRAM monopoly.
EloqKV is the first Redis-compatible store designed to deliver DRAM-level P9999 latency while harnessing modern NVMe technology. By eliminating the performance "tail" that plagues traditional disk-based systems, it offers up to 20X cost reduction without sacrificing the sub-millisecond reliability your business demands.
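Because EloqKV speaks the Redis API, any standard Redis client should work against it unchanged. Below is a minimal sketch using the hiredis C client; the 127.0.0.1:6379 endpoint and the session key are illustrative assumptions, not a prescribed configuration.

```cpp
#include <hiredis/hiredis.h>
#include <cstdio>

int main() {
    // Hypothetical endpoint: point any standard Redis client at an EloqKV node.
    redisContext* c = redisConnect("127.0.0.1", 6379);
    if (c == nullptr || c->err) {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    // Plain Redis commands, no EloqKV-specific API required.
    auto* set = static_cast<redisReply*>(redisCommand(c, "SET session:42 hello"));
    freeReplyObject(set);

    auto* get = static_cast<redisReply*>(redisCommand(c, "GET session:42"));
    printf("GET session:42 -> %s\n", get->str);
    freeReplyObject(get);

    redisFree(c);
    return 0;
}
```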

As shown above, EloqKV’s P9999 latency on a 2TB dataset is indistinguishable from pure DRAM Redis, and far superior to existing SSD-based stores like Apache Kvrocks. For detailed benchmark configurations, see the Benchmark section below.
The Long Tail Problem: Why SSD Caching is Hard
If DRAM is too expensive, the obvious alternative is the SSD. However, for high-performance caching, standard SSD implementations have historically failed. The dealbreaker isn't average latency—it’s long tail latency, especially when dealing with millions or billions of requests per second (RPS).
At the upcoming Unlocked Conference 2026, engineers from Uber will present "Real-world cache lessons from 1B RPS in prod." This highlights the scale modern infrastructure must handle. While a standard PostgreSQL or MySQL setup works fine for low workloads and small hot datasets, it crumbles under the "Internet Scale" traffic processed by Google, Amazon, Uber, and Snap.
The challenge is compounding in the AI Age. We are moving toward a future of Agent-to-Agent communication, which will generate 100x more data and RPS than human interaction. We see this with our own customers: AI chat scenarios involve massive data exploration because prompts are long, complex, and carry context payloads 100x larger than typical human chat messages.
When you throw this level of concurrency at a standard SSD-based database, traditional architectures fail. Multi-threading with synchronous I/O simply cannot guarantee consistently low tail latency (e.g., P9999 < 4ms at 100K RPS) on commodity hardware (such as a 16 vCore machine). The threads block, the context switches pile up, and the tail latency spikes.
The Software-Hardware Gap
Our research found that the hardware is no longer the bottleneck. Modern NVMe SSDs are incredibly fast, capable of delivering millions of IOPS (often exceeding 2.5M random read IOPS per drive). The hardware is ready to match the P9999 requirements of a cache. The problem is the software.
To bridge this gap, we had to fundamentally rethink the database engine using two specific techniques:
- Coroutines: To handle massive concurrency without the overhead of OS threads.
- io_uring: The Linux kernel interface that enables high-performance asynchronous I/O.
By combining these, we bypass the limitations of synchronous I/O. You can read more about our engineering philosophy on this topic in our deep dive: Coroutines & Async Programming.
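As a concrete illustration of the second technique, the minimal liburing sketch below queues several random reads and submits them to the kernel with a single syscall, then harvests completions as the drive finishes them. The file name, block size, and read count are illustrative assumptions, not EloqStore's actual on-disk layout.

```cpp
#include <fcntl.h>
#include <liburing.h>
#include <cstdio>
#include <cstdlib>

int main() {
    io_uring ring;
    io_uring_queue_init(64, &ring, 0);              // 64-entry submission queue

    int fd = open("data.blk", O_RDONLY);            // illustrative data file
    if (fd < 0) return 1;

    char* bufs[8];
    for (int i = 0; i < 8; i++) {
        posix_memalign(reinterpret_cast<void**>(&bufs[i]), 4096, 4096);
        __u64 off = static_cast<__u64>(rand() % 1024) * 4096;  // a "random" block
        io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, bufs[i], 4096, off);
    }
    io_uring_submit(&ring);                         // one syscall, 8 reads in flight

    for (int i = 0; i < 8; i++) {                   // harvest completions as the
        io_uring_cqe* cqe;                          // NVMe finishes them
        io_uring_wait_cqe(&ring, &cqe);
        printf("read returned %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }
    io_uring_queue_exit(&ring);
    return 0;
}
```

Contrast this with eight blocking pread calls: there, the thread stalls on each read in turn, while here all eight are in flight simultaneously and the CPU stays free.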
Introducing EloqStore: Open Source & RocksDB-Free
To solve this, we built EloqStore, and we are thrilled to announce that it is now Open Source.
Previously, EloqKV used RocksDB and RocksCloud as its underlying storage engine. While robust, RocksDB relies on Log-Structured Merge-trees (LSM), which suffer from well-known issues that destabilize long tail latency:
- Write Amplification: Repeatedly rewriting data during compaction.
- Read Amplification: Checking multiple files to find a single key.
- Compaction Stalls: CPU and I/O spikes during background merge tasks.
EloqStore addresses these issues head-on by abandoning the LSM architecture in favor of a design optimized specifically for the high-parallelism nature of modern NVMe SSDs. While LSM-trees are designed to sequentialize writes for older spinning disks, EloqStore is built from the ground up to exploit the random-access strengths and massive IOPS of modern flash storage.
Deterministic Performance via B-tree Variant Indexing
To achieve deterministic performance, EloqStore utilizes a specialized B-tree variant indexing strategy. It maintains a compact, memory-resident mapping of all non-leaf nodes, ensuring that the internal path to any piece of data is always held in DRAM.
This architecture guarantees exactly one disk access per read. By bypassing the multi-level lookups found in RocksDB, EloqStore eliminates the read amplification that typically causes latency spikes during heavy lookups. Whether your dataset is 100GB or 10TB, a read request is always a direct, single-I/O operation, providing the predictable latency required for mission-critical caching.
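The sketch below illustrates the general idea, not EloqStore's real data structures: all non-leaf routing information fits in DRAM, so a point read resolves to exactly one on-disk leaf page before any I/O is issued.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Where a leaf page lives on disk.
struct LeafRef { uint64_t offset; uint32_t bytes; };

class MemResidentIndex {
    // Smallest key of each leaf page -> its location. The first entry uses ""
    // as a sentinel so every lookup is covered.
    std::map<std::string, LeafRef> inner_{{"", {0, 4096}}};

public:
    void add_leaf(std::string first_key, LeafRef ref) {
        inner_[std::move(first_key)] = ref;
    }

    // No disk I/O here: the walk happens entirely in memory. The caller then
    // reads the returned page with a single NVMe request -- one I/O per lookup.
    LeafRef locate(const std::string& key) const {
        auto it = inner_.upper_bound(key);   // first leaf starting after `key`
        --it;                                // covering leaf (safe: "" sentinel)
        return it->second;
    }
};
```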
High-Throughput Batch Write Optimization
The write path is equally optimized through Batch Write Optimization. Leveraging io_uring, EloqStore bypasses the overhead of traditional synchronous system calls, allowing the engine to group multiple incoming writes into large, aligned blocks (sketched after this list). This approach:
- Reduces the frequency of expensive disk commits.
- Significantly lowers write amplification.
- Preserves the lifespan of the SSD while maintaining massive throughput.
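Here is a hedged sketch of that coalescing pattern; the class, buffer size, and alignment are illustrative assumptions rather than EloqStore's actual API. A production engine would also rotate among several buffers until each write's completion arrives, instead of reusing one immediately.

```cpp
#include <liburing.h>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

class BatchWriter {
    io_uring* ring_;
    int fd_;
    char* buf_;                                   // one 4 KiB-aligned staging buffer
    size_t used_ = 0;
    uint64_t file_off_ = 0;
    static constexpr size_t kBlock = 1 << 20;     // 1 MiB batch (illustrative)

public:
    BatchWriter(io_uring* ring, int fd) : ring_(ring), fd_(fd) {
        posix_memalign(reinterpret_cast<void**>(&buf_), 4096, kBlock);
    }

    // Small records accumulate in memory; no syscall per record.
    void append(const std::string& record) {
        if (used_ + record.size() > kBlock) flush();
        memcpy(buf_ + used_, record.data(), record.size());
        used_ += record.size();
    }

    // One large, aligned disk commit for the whole batch.
    void flush() {
        if (used_ == 0) return;
        size_t aligned = (used_ + 4095) & ~size_t(4095);  // pad to block boundary
        io_uring_sqe* sqe = io_uring_get_sqe(ring_);
        io_uring_prep_write(sqe, fd_, buf_, aligned, file_off_);
        io_uring_submit(ring_);                   // async: the caller keeps going
        // NOTE: simplified -- a real engine must not reuse buf_ until the
        // corresponding completion (CQE) has been reaped.
        file_off_ += aligned;
        used_ = 0;
    }
};
```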
Coroutines and Async I/O
Under the hood, EloqStore is built with Coroutines. This allows the system to handle thousands of concurrent requests without the heavy memory footprint or context-switching penalties of OS threads. When an I/O operation is pending, the coroutine simply yields, allowing the CPU to process other requests until the NVMe signals completion via io_uring. This synergy between language-level concurrency and kernel-level asynchronous I/O is what allows EloqKV to maintain sub-millisecond responsiveness under heavy load.
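The C++20 sketch below shows the general shape of this synergy, not EloqStore's actual types: an awaitable submits a read to io_uring and suspends its coroutine, and a single-threaded completion loop resumes whichever coroutine's I/O has finished.

```cpp
#include <liburing.h>
#include <coroutine>
#include <exception>

// Awaitable read: submit to io_uring, suspend, resume on completion.
struct ReadOp {
    io_uring* ring; int fd; void* buf; unsigned len; __u64 off;
    std::coroutine_handle<> waiter;
    int result = 0;

    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) {
        waiter = h;
        io_uring_sqe* sqe = io_uring_get_sqe(ring);
        io_uring_prep_read(sqe, fd, buf, len, off);
        io_uring_sqe_set_data(sqe, this);   // completion loop finds us by pointer
        io_uring_submit(ring);              // coroutine yields; thread serves others
    }
    int await_resume() const noexcept { return result; }
};

// Minimal fire-and-forget coroutine type.
struct Task {
    struct promise_type {
        Task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
};

// One request handler = one cheap coroutine, not one OS thread.
Task handle_get(io_uring* ring, int fd, char* page, __u64 offset) {
    int n = co_await ReadOp{ring, fd, page, 4096, offset};
    // ... parse the leaf page and reply to the client with `n` bytes ...
    (void)n;
}

// Single-threaded reactor: resume whichever coroutine's I/O completed.
void completion_loop(io_uring* ring) {
    io_uring_cqe* cqe;
    while (io_uring_wait_cqe(ring, &cqe) == 0) {
        auto* op = static_cast<ReadOp*>(io_uring_cqe_get_data(cqe));
        op->result = cqe->res;
        io_uring_cqe_seen(ring, cqe);
        op->waiter.resume();
    }
}
```

The key property: thousands of `handle_get` coroutines can be in flight on a handful of threads, each costing a few hundred bytes of state instead of a full OS thread stack.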
Eliminating the "Compaction Stall"
Most importantly, this architecture eliminates the jitter caused by background maintenance. Traditional LSM-trees must periodically merge and rewrite files to maintain performance—a process that consumes massive CPU and I/O resources, leading to the dreaded "compaction stall." EloqStore’s append-only design manages data reclamation more gracefully, ensuring that background tasks never interfere with foreground traffic. The result is a P9999 latency profile that stays flat, delivering a "memory-like" experience at disk-based economics.
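A hypothetical sketch of that reclamation pattern follows; the segment structure, thresholds, and I/O budget are assumptions for illustration, not EloqStore's actual algorithm. The point is the strict per-tick budget: background cleanup can never monopolize the drive the way an LSM compaction can.

```cpp
#include <cstdint>
#include <deque>

// An append-only segment is never rewritten in place; its live-byte counter
// shrinks as records are overwritten or deleted elsewhere in the log.
struct Segment {
    uint64_t id;
    uint64_t live_bytes;
    uint64_t total_bytes;
};

class Reclaimer {
    std::deque<Segment> segments_;
    static constexpr double kDeadRatio = 0.8;            // reclaim at 80% garbage
    static constexpr uint64_t kBudgetPerTick = 4 << 20;  // 4 MiB of copy I/O

public:
    // Called periodically; foreground traffic is never blocked by this work.
    void tick() {
        uint64_t budget = kBudgetPerTick;   // hard cap keeps the P9999 flat
        for (auto& seg : segments_) {
            double dead = 1.0 - double(seg.live_bytes) / double(seg.total_bytes);
            if (dead < kDeadRatio || seg.live_bytes > budget) continue;
            budget -= seg.live_bytes;
            // copy_live_records_to_tail(seg);  // append survivors, then
            // free_segment(seg);               // release the old segment
        }
    }
};
```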
Benchmark
We evaluated the performance of EloqKV on EloqStore using memtier_benchmark, specifically focusing on how tail latency and throughput scale as data volumes transition from DRAM-only sizes to massive SSD-based datasets.
The Setup:
- Hardware: Single Node, GCP Z3-16 instance.
- Specs: 16 vCore, 128GB RAM, 2.9TB NVMe SSD × 2.
- Data Scaling: Ranging from 20GB (pure DRAM) up to 2TB (full NVMe utilization with 800 Million items and large payloads of 1-4KB per item).
Performance & Scaling Analysis: Traditional DRAM-based systems like Redis are strictly limited by physical memory. As shown in the benchmark below, while Redis performance is restricted to smaller datasets, EloqKV breaks this barrier by leveraging NVMe without the "tail-latency penalty" typically associated with disk.

- DRAM-Level Consistency: At 20GB and 100GB, EloqKV actually provides higher throughput than Redis while maintaining a nearly identical P99.99 latency profile.
- Seamless Scaling to 2TB: While Redis hits a hard ceiling at 100GB on this hardware, EloqKV continues to scale. Even at a 2TB dataset, EloqKV maintains a stable P99.99 latency of ~2-3ms—performance previously thought possible only in pure DRAM environments.
- 20× Cost Reduction: To manage a 2TB dataset with Redis, an organization would typically need a cluster of 20 nodes to provide enough RAM. EloqKV delivers the same "long-tail" reliability on a single NVMe-optimized node, slashing infrastructure costs by 20×.
By solving the P99.99 latency problem on SSDs, EloqKV allows businesses to scale their data footprint by orders of magnitude without sacrificing the sub-millisecond responsiveness their users demand.
Rethinking the Key-Value Store for the AI Era
Beyond its performance as a cache, EloqKV is a full-featured key-value store. It does not require you to hold all keys in memory just to maintain performance. This architectural shift enables a new class of advanced features:
- Cloud-Native “Scale to Zero”: Because data is persisted on NVMe and/or object storage rather than volatile DRAM, EloqKV can spin down to zero when idle.
- AI-Native “Quick Branching”: For complex AI workflows and agent simulations, EloqKV supports rapid branching. You can fork your state to explore different prompt chains or agentic outcomes without duplicating massive DRAM footprints.
Meet us at Unlocked 2026
We are excited to announce that EloqData is a sponsor of the Unlocked Conference 2026, hosted by Momento and AWS.
Unlocked is the premier event for developers discussing the future of backend infrastructure. We are eager to discuss how the shift from DRAM to NVMe can unlock new possibilities for AI and hyperscale applications.
Come visit our booth, chat with our engineers, and see EloqStore in action. We look forward to seeing you there!
