14 posts tagged with "Company"

Why Everyone is using PostgreSQL and DuckDB — And Why We May Need More

November 11, 2025 · 14 min read

EloqData

EloqData Core Team

PostgreSQL and DuckDB have become the go-to databases for developers everywhere. Postgres is the default choice for transactional workloads, while DuckDB has quietly taken over the analytics world. Both are simple, fast, and easy to use. You can spin them up in seconds, run them anywhere, and they "just work." For most use cases, that's more than enough. But it's worth noting that both are single-node systems at heart. They can scale up, but when you hit the limits of one machine, you have to look elsewhere to migrate your infrastructure.

Many people now argue that single-node databases are enough for almost everything. Hardware has become so powerful that even massive workloads can often fit on one beefy machine. OpenAI recently discussed how their main database runs on a single-writer PostgreSQL setup. Just one node handling all the writes, with many read replicas to scale out read traffic. That's a bold design, and it suggests that maybe we no longer need complicated distributed databases because modern CPUs, SSDs, and memory are making scale-out architectures look like overkill.

In this artcle, we discuss how we reached this state in the database landscape, and disucss the future for scalable databases. We draw many inspirations from history, and we believe that there is a very bright future for database community going forward as we entering the new era of growth and prosperity.

The Rise and Fall of the Scale-Out Mindset

Twenty years ago, distributed databases were the hot thing. As the internet exploded and apps started serving millions of users, the old single-node giants like Oracle and SQL Server began to crack under the load. That's when companies like Google and Amazon came up with Bigtable and Dynamo — systems designed to scale horizontally across hundreds or thousands of machines. Their success kicked off a wave of distributed data systems: Cassandra, MongoDB, Redis, HBase and many others. These databases powered the modern internet and became the backbone of web-scale engineering.

The first wave of distributed databases came with trade-offs. To scale across machines, they gave up the ACID guarantees people took for granted in traditional databases. These "NoSQL" systems and early data lakes were fast, but they pushed the pain of handling inconsistency back to the application developers. About a decade later, around 2015, the industry figured out how to bring transactions back — both for online and analytical workloads. That's when the "NewSQL" and "Lakehouse" movements took off, with systems like Google Spanner, CockroachDB, TiDB, Snowflake, and Databricks started to appear around that time. For a while, it looked like distributed databases were going to rule the world. Snowflake went public at a staggering valuation, and companies like Databricks, Clickhouse, Cockroach Labs, and Yugabyte attracted huge funding rounds each valued in the billions.

However, just as distributed databases were hitting their stride, a new idea started to spread — maybe massive scale-out isn't necessary anymore. The first hint came from SQLite, the tiny embedded database that quietly became the most deployed software in the world. Then PostgreSQL surged ahead of MySQL and became the default choice for serious applications, from startups to enterprises. And now, DuckDB — a single-node analytical engine that runs right on your laptop — is reshaping how people do analytics. Suddenly, the towering systems built around Bigtable, Spanner, TiDB, Clickhouse and Snowflake don't look as untouchable as they once did. So what happened?

To understand this shift in mindset, it helps to look back at what changed. Before the internet era, databases like Sybase and Oracle were built for a single organization, usually serving tens of thousands of users. When the first wave of internet applications arrived in the mid-90s, those same systems were pressed into service to run websites like eBay and AltaVista. It worked for a while, until it didn't. By the early 2000s, companies like Google discovered that traditional databases simply couldn't keep up with millions of online users, the new "netizens". That realization gave birth to Google's internal infrastructure, and with it, the entire era of Big Data and distributed databases.

Back then, a typical server had two sockets, one core per socket, running around 1 GHz, and maybe a gigabyte of RAM. Google's famous early papers described clusters of thousands of such machines — an impressive scale at the time. Fast forward to today: a single high-end server can pack terabytes of memory, hundreds of CPU cores running at several gigahertz with much improved instruction-per-cycle, and dozens of SSDs each delivering gigabytes per second of throughput. In other words, the capacity that once required an entire data center now fits in a single box. Meanwhile, the total number of users hasn't grown exponentially. Even with mobile devices and global internet access, we're still bounded by the size of the human population. For most large-scale applications, serving a few million users, a single modern database can easily handle the load.

Distributed OLTP companies such as CockroachDB, TiDB, and Yugabyte have all felt the impact of this shift. Founded between 2015 and 2017, they entered the market with high expectations for explosive growth. Those expectations never quite materialized. By around 2020, their products had become stable and feature-complete, and their valuations reached their peak. Yet, the demand was far smaller than investors had imagined. These systems now face competition from two sides: from powerful single-node databases that can handle most workloads, and from partially distributed but more efficient systems such as Amazon Aurora, Google AlloyDB, and starup offerings such as NeonDB.

The OLAP world has fared better. Analytical workloads depend not only on the number of users but also on how much data is collected, how long the history is kept, and how many users or systems consume the results. As companies collect more signals over longer periods, and as machine learning models begin to consume analytical outputs, the demand for performance and scalability remains high. Snowflake and Databricks had the advantage of starting a few years earlier than the OLTP vendors, which helped them secure a strong position in the market. Much of the growth in the analytics market today comes from a new kind of consumer: AI training and inference systems that rely on massive amounts of analytical data. Still, newer systems such as MotherDuck, which builds on DuckDB, have gained attention from both developers and investors.

The New Users: AI Powered Applications and Autonomous Agents

So what happens next? Hardware will keep getting faster, and there is no sign of that slowing down. Does that mean single-node databases like SQLite, PostgreSQL and DuckDB will take over completely, leaving little room for distributed systems? Probably not.

Due to the recent explosive AI innovation and the emergence of AI powered applications, the way we use software has changed. Every chat, prompt, or API call produces data. For instance, I use ChatGPT almost every day and generate several conversations daily. Each one is a few hundred kilobytes, stored as chat history that I can scroll through anytime. After less than a year, I already have more than a thousand conversations. Compare that with my Amazon account, where I might have made less than a thousand purchases in more than a decade, each only a few kilobytes of data. I have easily produced a hundred times more data with OpenAI than with Amazon, in a fraction of the time.

Programming tools such as Cursor go even further. They can generate tens of millions of tokens every day, and all that text needs to be stored for debugging, history tracking, or later analysis. This volume of data already puts serious pressure on backend systems.

More importantly, a new kind of user is emerging: autonomous agents powered by AI. This shift is unlike anything we have seen before. It is creating enormous demand for both data volume and query intensity. In the age of AI, we have to rethink our data infrastructure, just as the internet once forced us to do.

Soon, autonomous agents will become the dominant users of IT systems. Take something as simple as buying a T-shirt. A human might browse a few pages and make one request every half minute. An AI agent, given the same goal, could issue thousands of requests in seconds to find the perfect item. This kind of machine-driven activity will easily overwhelm infrastructure that was built for human speed and scale. Just like the netizens used to overwhelm infrastructures built for employees in a single company at the begining of the internet age.

And this is only the beginning. Other kinds of AI applications and physical robots will arrive soon. We are standing at the edge of a new wave of innovation. If you think back to 1999, it would have been almost impossible to imagine people posting daily videos on TikTok or calling a ride with a phone through Uber. The same thing will happen again. Many new AI-powered scenarios have not even been imagined yet. Any company that wants to stay ahead, whether a startup or a long-established enterprise, needs to prepare for the pressure that these new workloads will put on their data infrastructure.

The pain of non-scalable systems is not new. Engineers have shared many war stories about hitting a scaling wall and being forced to migrate their databases under fire. We believe it is time to think ahead. It is better to prepare for what AI applications will need, rather than wait until the pressure becomes unbearable. The pace of change may be much faster than people expect.

During the early internet era, Google gained a huge advantage from its internal data infrastructure. It gave their teams the freedom to run experiments, ship features, and index the web without worrying about system limits. I saw this gap first-hand while working on the early version of Bing. Competing with Google was extremely difficult because they could move faster, process more data, and try new ideas at a pace that others simply could not match. Their real-time indexing pipeline was built on top of Percolator, the large-scale distributed transactional database, which itself sat on top of Bigtable, the large-scale distributed NewSQL database. Reproducing that stack was almost impossible for anyone else at the time.

This kind of competitive advantage will matter even more as we enter the hyper-speed growth of the AI era. Companies that build strong data infrastructure early will have the freedom to innovate without being held back by system limits, while everyone else will be stuck fighting their own scalability problems. As mentioned before, OpenAI currently relies on PostgreSQL for its main database, and for now it works. But even at this stage, engineers have already talked about banning new tables creations in the main database and avoiding any write-heavy workload. They also may need hierachical replications as the replication workload of the many read-replicas are overwhelming the write replica. OpenAI is only two years into its hyper-growth phase. These kinds of infrastructure limits can quickly become a major liability as the company scales.

Looking Forward

Going forward, there are certainly spaces for simple, light databases such as SQLite and DuckDB. PostgreSQL is certainly still going to be a major force in the database ecosystem and will continue to power many important applications, just as Orcale DB continues to be super relevant in the internet age. However, we must not ignore the challenges for scalability and performance as we entering the AI era. One cannot dismiss the need for Caterpillar Excavactors just because a shovel is sufficient for my backyard work 99.9% of the time. This is especially true when we are on the cusp of taking on the endeavor of transforming the entire world.

So where does this leave us? If AI agents become the main users of software, the demands on storage and compute will rise far beyond what a single machine can handle. We will need systems that combine the simplicity and performance of single-node databases with the elasticity and fault tolerance of distributed ones. We will also need databases that speak multiple "dialects" at once, since AI applications mix transactional work, analytical work, vector search, caching, and document storage in the same request path. The future will not be served by a pile of disconnected systems glued together with application logic. It will be served by a unified, scalable, multi-model engine that treats data as a first-class substrate.

This is the direction we are building toward at EloqData. Our Data Substrate architecture is designed for the AI age. It brings the speed of in-memory systems, the reliability of transactional databases, and the flexibility of multi-model APIs together in one platform. Our database offerings build on top of this technology, i.e. the Redis API compatible KV store EloqKV, the MongoDB API compatible document DB EloqDoc and the MySQL API compatible relational database EloqSQL all inherit the architectural advantages of Data Substrate to provide scalable, high-performance, low cost, and transactional databases with AI native features such as low cost snapshot and branching. They all can scales out when you need, scales in when you don't, and keeps storage costs low by using object storage as the primary durable layer. They allow developers build AI-driven applications without worrying about whether the next spike in workload will tip the system over.

We believe the next decade of software will not be defined by yet another wave of siloed databases. It will be defined by systems that make data infrastructure disappear into the background, allowing AI systems to read, write, learn, and act at machine speed. Our goal is to help build that foundation. Join the discussion on our Discord Channel, leverage our free tier cloud service to try out, or visit our GitHub repo to contribute. We would be happy to hear your thoughts on the future of databases.

Don't Split My Data: I Will Use a Database (Not PostgreSQL) for My Data Needs

November 7, 2025 · 17 min read

EloqData

EloqData Core Team

The internet (or at least the IT community) had a field day when a couple of blog posts claimed you could replace Redis and Kafka with PostgreSQL. "Redis is fast, I'll cache in Postgres" and "Kafka is fast -- I'll use Postgres" have gotten much attention on HackerNews here and here, and on Reddit here and here. Obviously, some of the claims in the posts got roasted on HN and Reddit for suggesting you could replace Redis or Kafka with PostgreSQL. Many people (correctly) pointed out that the benchmarks were far from properly set up, and the workloads were non-typical. Some of the Kafka people also posted long articles to clarify what Kafka is designed for and why it is not hard to use. But, on the flip side, many of the posts also (correctly) preached a valid point: keeping fewer moving parts matters, and using the right tool for the job matters even more.

Those "Postgres can replace Redis/Kafka" posts usually benchmark light workloads where serious tooling simply isn't necessary, and then stretch that into a sweeping narrative that "you don't need Redis or Kafka at all". But whether a workload is small or not depends entirely on the application. For a local grocery store website, twenty concurrent checkouts might qualify as high traffic; for Instagram or Twitter, a million queries per second barely registers. We're not here to debate whether people may ever need ten thousand queries per second. Generalizing one's workload to everyone's infrastructure needs is non-productive. It's like writing an article about how much grocery a new convertable can carry and conclude that U-Haul is no longer needed. No one would post that on a trucking forum expecting a heated debate. So why does the same kind of reasoning explode into controversy every time someone applies it to databases?

Sure, part of the drama is emotional. Developers get attached to their tools. Kafka, Redis, and Postgres all have cult-like followings, and any praise or criticism feels personal. But there's more to it than fandom. These debates touch on real technical trade-offs: complexity, durability, scaling, and the cost of managing multiple systems. In this article, we'll look at why it actually makes sense to use a real database for workloads that used to require specialized systems, and then tackle the elephant in the room: why this shift hasn't happened before and what's needed to make it possible.

The Advantage of Combining Workloads on a Real Database

Back in the pre-internet days, "database" basically meant Oracle or Sybase: big relational systems that stored all of a company's data and powered every enterprise app. Then came the internet era, and data exploded in both volume and variety. The once-unified database stack shattered into a zoo of specialized systems: PostgreSQL and MySQL for transactional truth, Kafka for streams, Redis for caching, and Spark, Snowflake, or ClickHouse for analytics. We also have specialized tools for graphs, for vectors, for time-series, for tensors, and even just for account balances. Each tool solved a particular problem, but together, they created a monster of complexity. IT teams now spend countless hours wiring, tuning, and babysitting these silos just to keep data moving.

Lately, there's been a growing movement to consolidate around a single database (often PostgreSQL) as the backbone of data infrastructure. The obvious motivation is simplicity and cost: running one system is easier (and cheaper) than orchestrating a zoo of databases stitched together with duct tape and ETL jobs. But beyond operational convenience, there are deeper reasons this trend deserves attention. A converged data architecture unlocks advantages that go far beyond saving money and any systems architect can't afford to ignore.

1. Avoid Cascading Failures

Every engineer who's operated a large-scale system knows this pain: one small failure snowballs into a full-blown outage. This phenomenon is called cascading failure. When you glue together multiple systems, say Kafka feeding MySQL with Redis caching on top, each layer becomes a potential failure point. A hiccup in Kafka can stall writes, which can then cause cache invalidations to fail, which can then cause a thundering herd of retries.

Example: Twitter's Cache Incident (via Dan Luu) In 2013, Twitter experienced an outage where a minor latency issue in the caching layer caused by interrupt-affinity misconfiguration led to a GC spiral on the tweet service. As cache latency grew, more requests bypassed the cache, overwhelming downstream services and eventually collapsing success rates to near 0% in one datacenter.

The database itself wasn't the root cause, but the incident shows how fragile multi-tiered data stacks can be: when layers depend on each other for throughput, even small delays can cascade into total failure.

This is exactly the kind of chain reaction that converged database architectures can avoid by keeping durability, caching, and data access under one coordinated system instead of scattered layers of dependency. In a converged architecture, where all data flows through a single database, node failures degrade capacity proportionally rather than triggering a domino effect. The system scales down gracefully instead of collapsing catastrophically.

2. Avoid Development Complexity

Maintaining two code paths for the same data is a tax on every engineering team. A typical "read-through" cache setup means developers have to write one logic path for MySQL (cache miss) and another for Redis (cache hit). Over time, these two paths drift apart: schema changes, data formats, and business rules slowly go out of sync. Debugging that kind of inconsistency is pure misery. A single database eliminates that duality. Your application talks to one API, with one transaction model, and one source of truth. Simpler code, fewer bugs, faster iteration.

Here is an example code for reading user information. If Redis cache is involved, the code looks like this:

def get_user(user_id):
    # Try cache first
    try:
        user = redis.get(f"user:{user_id}")
        if user:
            return json.loads(user)
    except RedisTimeout:
        logger.warning("Redis timeout, falling back to DB")
    except RedisConnectionError:
        logger.error("Redis down, falling back to DB")
    
    # Cache miss or Redis failed, try database
    try:
        with postgres.cursor() as cur:
            cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
            user = cur.fetchone()
            
            # Try to update cache (but don't fail if Redis is down)
            try:
                redis.set(f"user:{user_id}", json.dumps(user), ex=3600)
            except:
                pass  # "It's fine, we'll cache it next time"
                
            return user
    except PostgresConnectionError:
        # Both Redis and Postgres down? Try the read replica
        with postgres_replica.cursor() as cur:
            # ... more error handling ...

47 lines of code to read one user. And this doesn't even handle cache invalidation, which was another 200 lines of scattered redis.delete() calls that everyone was afraid to touch.

With everything in one database, the code becomes trivial:

def get_user(user_id):
    with db.cursor() as cur:
        cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()

def update_user(user_id, data):
    with db.cursor() as cur:
        cur.execute("UPDATE users SET ... WHERE id = %s", (..., user_id))

3. Avoid Consistency and Durability Confusion

Distributed systems are already hard to reason about. Add multiple data tiers like Redis, Kafka, and a transactional database and you multiply the number of durability and consistency scenarios. Each component has its own guarantees: Kafka offers a choice of "at-least-once", "at-most-once", "exact-once" delivery, and whether you need flush every writes to achieve durability is sometimes debated. Redis is eventually consistent, meaning data can be stale, and provide replication, write-ahead-log and snapshot as persistency mechanisms. Your SQL database is usually transactionally consistent with full ACID guarantee, but only within its own boundary. Stitch them together, and you're left trying to reason about correctness in a house of cards.

Example: Stale Writes and Inconsistent Cache at Scale (via Dan Luu) Dan Luu's review of real-world cache incidents includes multiple cases where caches went out of sync with the source of truth. In one Twitter incident, stale data persisted in the cache after the underlying value had changed, leading to users seeing outdated timelines. In another, duplicate cache entries caused some users to appear in multiple shards simultaneously: an impossible state from the database's perspective. The fixes required strict cache invalidation ordering and retry logic to prevent stale data from overwriting newer updates.

None of these were database bugs; they were integration bugs, symptoms of managing consistency across disconnected systems.

A single, converged database architecture eliminates this entire class of problems. There's no cache to drift out of sync, no queue to replay twice, no data race between "source of truth" and "derived truth." Everything from reads to writes to streaming happens within one transaction boundary and one consistency model. You don't just simplify the system; you make correctness provable instead of probabilistic.

What About Scalability and Performance?

If converging everything into a single database simplifies architecture so dramatically, the obvious question is: why hasn't everyone already done it? The short answer: performance and scalability.

Specialized systems like Redis and Kafka were born because traditional databases couldn't keep up. Caches were faster, queues were more scalable, and analytics engines could crunch far more data than your average OLTP database. The trade-off was fragmentation and complexity, but at least things stayed fast. For decades, this was a necessary compromise. In early 2000 the godfather of database Mike Stonebreaker famously declared "One size does not Fit All" in database systems. Even as late as 2018, the CTO of Amazon Werner Vogels decalared that "A one-size-fits-all database doesn't fit anyone".

But is that still so? Clearly there is a demand for one size fit all database, as demonstrated by the blogs mentioned in the early part of this article. Indeed, only recently have database architectures evolved enough to make convergence a realistic alternative. In this section, we discuss the necessary capabilities of a database that must be satisfied to enable such convergence.

1. Scalable Transactions

For years, transactions were the deal-breaker for scaling databases. Everyone wanted the simplicity of ACID semantics, but once workloads outgrew a single machine, something had to give. The conventional wisdom was: you can have efficiency, transaction, or scale, but not all of them. That's how we ended up with a zoo of specialized systems. NoSQL databases like MongoDB and Cassandra threw away multi-record atomicity in exchange for horizontal scale. Developers got used to compensating in application code: implementing retries, deduplication, or manual rollback logic. It worked, but it was painful and brittle. For relational systems, manual sharding became the necessary evil: once you hit the single-node ceiling, you split your data and your sanity along with it.

As workloads grew, people learned that sharding relational databases by hand was a nightmare. Every cross-shard join, every transaction that touched more than one key, became a distributed systems problem. You could scale reads, but writes were a different story. And once data started to overlap across shards, consistency slipped through your fingers.

That's why the new generation of distributed transactional databases, often grouped under the "NewSQL" label, was such a breakthrough. Systems like Google Spanner, TiDB, and CockroachDB showed that you could have global scale and serializable transactions, thanks to better consensus protocols, hybrid logical clocks, and deterministic commit algorithms. Even previously non-ACID systems have evolved: MongoDB added multi-document transactions in 4.0, and Aerospike introduced distributed ACID transactions feature in 8.0.0. The takeaway is that distributed transactions are already standard features in many systems.

This changes the calculus completely. When your database can handle distributed transactions efficiently, you no longer need to glue together different engines just to scale. You get scalability and correctness in one system that can become a foundation for building truly unified data architectures.

Yet, most existing distributed transactional databases are still less efficient than their single node counter part. This is an issue we need to address (see below).

2. Independent Resource Scaling

Different workloads stress different parts of a system. Streaming ingestion (like Kafka) is IO-bound and needs massive sequential write throughput. Caching workloads (like Redis) are memory-bound, thriving on low-latency access to hot data. Analytical queries and vector search, on the other hand, are CPU-bound, demanding large compute bursts. Traditional shared-nothing databases where compute, storage, and memory scale together force you to over-provision one resource just to satisfy another. You end up paying for RAM you don't use or SSDs that sit mostly idle.

That's one of the reasons why people historically broke their pipelines into multiple specialized systems. You could put Kafka on write-optimized disks, Redis on high-RAM nodes, and MySQL on balanced hardware.

Now, cloud infrastructure is changing that equation. Modern cloud platforms allow elastic and independent scaling of compute, storage, and memory. Databases are starting to embrace this model directly. The separation of compute and storage, first popularized by cloud data warehouses like Snowflake and BigQuery, is now making its way into OLTP systems as well. This decoupling lets a single database scale ingest, query, and cache layers independently.

A truly converged database architecture must exploit this flexibility: scale write nodes when ingestion spikes, scale compute when CPU demands surge, and expand storage as data volumn grows.

3. Performance and the Zero-Overhead Principle

C++ developers have a mantra: the Zero-Overhead Principle

You don't pay for what you don't use.
What you do use is just as efficient as what you could reasonably write by hand.

That's exactly the mindset a converged database must adopt. Existing databases rarely meet this bar. Even if you just want to run an in-memory cache, most engines still insist on running full durability machinery: write-ahead logging, background checkpoints, transaction journals, and page eviction logic. The result? You're paying CPU and latency overhead for guarantees you may not need on a given workload.

This is why Redis can outperform MySQL on pure in-memory reads even when both are running on the same hardware and the dataset is identical. The overhead isn't only in the query parser; it's in the architectural layers designed for persistence, recovery, and buffer management that never get out of the way.

A truly unified database must treat these mechanisms as pluggable, not mandatory. Durable writes, replication, and recovery logging should be modular: enabled only when needed, bypassed when not. The same engine should be able to act as a blazing-fast cache, a strongly consistent OLTP store, or a long-term analytical system, without paying unnecessary tax in the fast path.

In other words, unification can not come at the cost of overhead. It has to come from rethinking the core execution and storage engine so that the same system can be both flexible and ruthlessly efficient, depending on how it's used.

PostgreSQL May Not Be the Answer

If "just use Postgres for everything" sounds too easy, that's because it is. PostgreSQL deserves its reputation: it's battle-tested, reliable, and absurdly extensible. But it was never designed to be the all-in-one database the modern world needs. Its architecture still reflects the assumptions of the 1990s: local disks, tightly coupled storage and compute, and a single-threaded execution model wrapped in process-based concurrency. The query pipeline, the storage system, the concurrency control mechanism are all designed with a single-node, row-based MVCC ACID relational database in mind.

You can retrofit Postgres with external layers: Citus for sharding, Neon for decoupled storage, TimeScale (now TigerData) and PGVector for handling time-series and vector data. But this is often not optimal, and often just recreates the complexity we were trying to eliminate (i.e. they do not compose). When serious modification is needed, such as making it fully multi-writer capable as in CockroachDB and YugabyteDB or streaming capable as in Materialize and RisingWave, a major rewrite is often needed and often only the wire-protocol can be reused. Extensions like logical replication, foreign data wrappers, and background workers are impressive engineering feats, but they're patches on a monolith, not blueprints for a unified data platform.

When people benchmark Postgres against Redis or Kafka, they're really comparing apples to power tools. Postgres can simulate a cache or a queue, but it isn't optimized for them. WAL writes still happen, visibility maps still update, vacuum still runs. The performance gaps show up not because Postgres is "slow," but because it's doing far more work than those specialized systems ever attempt. Its design trades raw throughput for correctness and compatibility, a perfectly valid choice, just not one that scales linearly to every use case.

Recently, there is another school of thought argues that as hardware keeps improving, scalability and efficiency becomes less relevant. If a single server can pack terabytes of RAM and hundreds of cores, why bother with distributed systems at all? Why not just keep everything in one big Postgres instance and call it a day? We fundamentally disagree with this line of thinking. Hardware growth helps, but data growth is exponential. User expectations for availability, latency, and global presence grow even faster. Scaling vertically might postpone the problem, it doesn't solve it. This is especially true in the AI age, when data is the lifeline of all applications. We'll dedicate another article to discuss this in detail.

PostgreSQL has earned its place as the default database of our time, but it's not the endgame. There is opportunity in databases that internalize the lessons of distributed systems, cloud elasticity, and modular design.

The Road Ahead: Toward a True Data Substrate

The industry's obsession with patching and stitching different data systems together is a legacy of the past twenty years of hardware and software limits. But those limits are fading. The next generation of data infrastructure won't be defined by whether it speaks SQL or supports transactions. It will be defined by whether it can unify the data lifecycle without compromise: streaming, caching, analytics, and transactions under a single, coherent architecture.

That's exactly what we're building with EloqData's Data Substrate. Instead of bolting more features onto yesterday's database engines, we started from a clean slate: modular storage and compute layers, fully ACID-compliant distributed transactions, object storage as a first-class persistence medium, and elastic scaling across workloads. The same engine that serves as a durable operational database can act as a high-throughput cache, a streaming log, or an analytical backend without duplicating data or wiring together half a dozen systems.

This is the promise of a converged data platform: simplicity without trade-offs, scalability without fragmentation, and performance without overhead. The future of data infrastructure belongs to systems that treat data as a continuous substrate rather than a pile of disconnected silos. That's where we're headed. Of course, we still have a very long way to go, but we are working hard on this goal. Join the discussion on our Discord Channel, or visit our GitHub repo to contribute. We would be happy to hear your thoughts on the future of databases.

How NVMe and S3 Reshape Decoupling of Compute and Storage for Online Databases

October 24, 2025 · 10 min read

EloqData

EloqData Core Team

Cloud native databases are designed from the ground up to embrace core cloud principles: distributed architecture, automatic scalability, high availability, and elasticity. A prominent example is Amazon Aurora, which established the prevailing paradigm for online databases by championing the decoupling of compute and storage. This architecture allows the compute layer (responsible for query and transaction processing) and the storage layer (handling data persistence) to scale independently. As a result, database users benefit from granular resource allocation, cost efficiency through pay-per-use pricing, flexibility in hardware choices, and improved resilience by isolating persistent data from ephemeral compute instances.

In this blog post, we re-examine this decoupled architecture through the lens of cloud storage mediums. We argue that this prevailing model is at a turning point, poised to be reshaped by the emerging synergy between instance-level, volatile NVMe and highly durable object storage.

Lessons from the AWS us-east-1 Outage: Why Local NVMe as Primary DB Storage Is Risky

October 20, 2025 · 5 min read

EloqData

EloqData Core Team

On October 20, 2025, AWS experienced a major disruption across multiple services in the us-east-1 region. According to AWS Health Status, various compute, storage, and networking services were impacted simultaneously. For many teams running OLTP databases on instances backed by local NVMe, this was not just a downtime problem-it was a data durability nightmare.

Cloud databases must constantly balance durability, performance, and cost. In modern cloud environments, there are three main types of storage available:

Storage Type	Durability	Latency	Cost	Persistence Across VM Crash
Block Storage (EBS)	High	Medium	High	Data persists
Local NVMe	None	Ultra-fast	Low per IOPS	Lost on restart/crash
Object Storage (S3)	Very High	Slow	Lowest	Persistent

Let’s break down the trade-offs and why recent events place a spotlight on risky architectural choices.

Option 1: Block-Level Storage (EBS) - Durable but Expensive and Slow

EBS is the default choice for reliability:

It survives instance failures.
It supports cross-AZ replication via multi-replica setups.
It enables quick reattachment to replacement nodes.

But the downside?

GP2/GP3 disks deliver modest IOPS and high latency.
High-performance variants like IO2 are extremely expensive when provisioned for hundreds of thousands of IOPS.
Scaling performance often means scaling cost linearly.

EBS gives you durability-but performance per dollar is disappointing.

Option 2: Local NVMe - Fast but Ephemeral (and Now Proven Risky)

Instance families like i4i provide 400K+ to 1M+ IOPS from local NVMe, making them a natural fit for databases chasing performance.

So many database vendors recommend:

Use local NVMe for primary storage
Add cross-AZ replicas for durability

But here’s the problem: Local NVMe is tied to the node lifecycle. If the node restarts, fails, gets terminated due to spot interruption, or is impacted by a region-level failure such as the recent us-east-1 outage-you lose ALL the data.

During routine failures, cross-AZ replicas often protect you. But during region-wide degradation or cascading incidents, with local NVMe, there is nothing to recover. The storage is simply gone. What you can do is to recovery from recent backups - often lagging days. Write loss is guaranteed between last backup and crash.

In contrast, EBS volumes can always be reattached to a new node.

The AWS us-east-1 outage just validated that “local NVMe + async replication” is a high-risk strategy for mission-critical databases.

Option 3: Object Storage (S3) - Durable & Cheap, But Latency Is a Challenge

Object storage is:

3x cheaper than block storage
Regionally and cross-region durable
Built to survive region-level failures
Practically infinite
A first-class citizen for modern cloud-native platforms

But the challenge remains: S3 latency is too high for OLTP if accessed synchronously.

This is why traditional OLTP engines avoid it.

So the question becomes: How do we get the cost & durability benefits of S3 without paying the latency penalty?

The Data Substrate Approach: Object Storage First, NVMe as Cache, EBS for Logs

EloqData treats object storage (e.g., S3) as the primary data store, and architect the system to avoid the usual latency pitfalls:

Layer	Role	Why
S3 (Object Storage)	Primary data store	Ultra-durable, Cheap
EBS (Block Storage)	Durable log storage	Small volume, low latency writes
Local NVMe	High-performance cache	Accelerates reads & async flushes

Through Data Substrate, we decouple storage from compute and split durability between:

Log: persists immediately to EBS
Data store: periodically checkpointed to S3 (async + batched)
NVMe: purely a cache layer, safe to lose at any time

This allows us to:

Withstand node crashes seamlessly
Recover fully even if local NVMe is wiped
Handle region-level disruption by replaying logs and checkpoints
Enjoy millions of IOPS from NVMe without durability risk
Cut storage cost by 3x+ compared to full EBS-based systems

Check out more on our products powered by Data Substrate:

The Larger Industry Trend

We are not alone in this shift. The broader ecosystem is moving object-storage-first:

System	Use of Object Storage
Snowflake	OLAP on S3
StreamNative Ursa	Streaming data on S3
Confluent Freight Clusters	Streaming data on S3
Turbopuffer	Vector & full-text search on S3

EloqData brings this model to OLTP with a transactional, low-latency engine powered by Data Substrate.

After the Outage: A Hard Question Every Architect Should Ask

If my database node died right now, would I lose all my data?

If you're running a primary database on local NVMe, and relying solely on async replicas, the answer might be yes.

It’s time to rethink durability assumptions in the cloud era.

Summary

Strategy	Performance	Durability	Region Outage Risk	Cost
EBS only	Medium	✅	✅	$$$
Local NVMe only	Fast	❌	❌	$$
NVMe + async replicas	Fast	Partial	High	$$
Object Storage + Log + NVMe Cache (EloqData)	Fast	✅✅	✅✅	$

AWS us-east-1 just reminded the industry: Performance is replaceable. Lost data is not.

With the right architecture, you don’t have to choose.

Build fast.
Stay durable.
Be outage-proof.

That’s the future we’re building at EloqData.

Check out more on our open source databases:

Coroutines and Async Programming: The Future of Online Databases

September 26, 2025 · 8 min read

EloqData

EloqData Core Team

Online databases are the backbone of interactive applications. Despite coming in many different types, online databases are all engineered for low-latency, high-throughput CRUD operations. At EloqData, we use the universal Data Substrate to build online databases for any model—from key-value and tables to JSON documents and vectors. In this post, we explore one of our core engineering practices for future online databases.

The Benefits of Data Substrate Architecture

July 16, 2025 · 14 min read

EloqData

EloqData Core Team

In the previous article, we discussed the details of some of the architecture design of Data Substrate. In this article, we continue the discussion and elaborate on why we made these design choices and how these choices affect the resulting database solutions we built.

A Deeper Dive Into Data Substrate Architecture

July 15, 2025 · 18 min read

EloqData

EloqData Core Team

In this article, we dive deeper into the technical foundations of Data Substrate—highlighting the key design decisions, abstractions, and architectural choices that set it apart from both classical and modern distributed databases.

Data Substrate Technology Explained

July 14, 2025 · 3 min read

EloqData

EloqData Core Team

At EloqData, we've developed Data Substrate—a database architecture designed to meet the unprecedented demands of modern applications in the AI age. Unlike traditional database systems that struggle with the scale and complexity of AI workloads, Data Substrate reimagines the database as a unified, distributed computer where memory, compute, logging, and storage are fully decoupled yet globally addressable.

Building a Data Foundation for Agentic AI Applications

This series of articles explore the motivations, technical foundations, and benefits of Data Substrate, providing a comprehensive understanding of how this architecture addresses the critical challenges facing modern data infrastructure in the AI age.

Some of the topics covered are rather heavy in technical jargons, and require a good understanding of database internal mechanisms to appreciate. We apologize in advance.

1. Data Substrate: Motivation and Philosophy

This article introduces the core philosophy behind Data Substrate. We explore why traditional database architectures fall short in the AI era and present our vision for a new approach that treats the entire distributed system as a single, unified computer.

2. A Deeper Dive Into Data Substrate Architecture

This technical deep-dive explores the architectural foundations of Data Substrate. We examine the key design decisions, abstractions, and technical choices that set Data Substrate apart from both classical and modern distributed databases.

3. The Benefits of Data Substrate Architecture

This article examines the practical benefits and real-world implications of Data Substrate. We discuss how our design choices translate into concrete advantages for modern applications, particularly in cloud environments.

Why Data Substrate Matters

Traditional database architectures were designed for a different era—one where data volumes were smaller, workloads were more predictable, and the demands of AI applications were unimaginable. Data Substrate represents a fundamental rethinking of database design, built from the ground up for the challenges and opportunities of the AI age.

By treating the distributed system as a single, unified computer, Data Substrate eliminates many of the complexities that have traditionally made distributed databases difficult to build, operate, and reason about. This approach enables:

Modular architecture enables community collaboration and avoid reinventing the (many) wheels
True scalability without sacrificing consistency
Independent resource scaling for compute, memory, logging, and storage
Better performance through optimized hardware utilization and innovative algorithm design
Cloud-native features like auto-scaling and scale-to-zero
Simplified development through familiar single-node programming models

Get Started with Data Substrate

Ready to explore Data Substrate in action? Our open-source implementations are available on GitHub:

EloqKV: A high-performance key-value store built on Data Substrate
EloqSQL: A MySQL-compatible distributed SQL database
EloqDoc: A document database for modern applications

Join our Discord community to connect with other developers and stay updated on the latest developments in Data Substrate technology.

The Rise of Object Storage in Cloud OLTP Architecture

June 11, 2025 · 5 min read

EloqData

EloqData Core Team

At the recent Data Stream Summit 2025, Hubert Zhang, CTO of EloqData, delivered a talk on building elastic, agentic AI data pipelines using Apache Pulsar and EloqDoc.

Exploring EloqKV Decoupled Architecture: A Tour with an Agentic AI

March 28, 2025 · 7 min read

EloqData

EloqData Core Team

In the previous blog, we discussed the future database foundation for Agentic AI Applications. In this blog we will simplify the agentic application and use EloqKV as data store to explore EloqKV's decoupled architecture.

Building a Data Foundation for Agentic AI Applications

March 19, 2025 · 7 min read

EloqData

EloqData Core Team

We have recently open sourced our three products: EloqKV, EloqSQL, and EloqDoc. These offerings reflect our commitment to addressing the evolving demands of modern data infrastructure, particularly as we enter an era dominated by powerful, autonomous AI systems.

LLM-powered Artificial Intelligence (AI) applications are driving transformative changes across industries, from healthcare to finance and beyond. We are rapidly entering the Agentic Application Age, an era where autonomous, AI-driven agents not only assist but actively make decisions, manage tasks, and optimize outcomes independently.

However, the backbone of these applications—the data infrastructure—faces immense challenges in scalability, consistency, and performance. In this post, we explore the critical limitations of current solutions and introduce EloqData’s innovative approach specifically designed to address these challenges. We also share our vision for an AI-native database, purpose-built to empower the Agentic Application Age, paving the way for smarter, more autonomous, and responsive AI applications in the future.

Why We Develop EloqDB Mainly in C++

October 26, 2024 · 8 min read

EloqData

EloqData Core Team

We have recently introduced EloqKV, our distributed database product built on a cutting-edge architecture known as Data Substrate. Over the past several years, the EloqData team has worked tirelessly to develop this software, ensuring it meets the highest standards of performance and scalability. One key detail we’d like to share is that the majority of EloqKV’s codebase was written in C++.

ACID in EloqKV : Atomic Operations

September 1, 2024 · 8 min read

EloqData

EloqData Core Team

In the previous blog, we discussed the durable feature of EloqKV and benchmarked the write performance of EloqKV with the Write-Ahead-Log enabled. In this blog, we will continue to explore the transaction capabilities of EloqKV and benchmark the performance of distributed atomic operations using the Redis MULTI EXEC commands.

Introduction to Data Substrate

August 11, 2024 · 12 min read

EloqData

EloqData Core Team

In this blog post, we introduce our transformative concept Data Substrate. Data Substrate abstracts core functionality in online transactional databases (OLTP) by providing a unified layer for CRUD operations. A database built on this unified layer is modular: a database module is optional, can be replaced and can scale up/out independently of other modules.

The Rise and Fall of the Scale-Out Mindset​

The New Users: AI Powered Applications and Autonomous Agents​

Looking Forward​

The Advantage of Combining Workloads on a Real Database​

1. Avoid Cascading Failures​

2. Avoid Development Complexity​

3. Avoid Consistency and Durability Confusion​

What About Scalability and Performance?​

1. Scalable Transactions​

2. Independent Resource Scaling​

3. Performance and the Zero-Overhead Principle​

PostgreSQL May Not Be the Answer​

The Road Ahead: Toward a True Data Substrate​

Option 1: Block-Level Storage (EBS) - Durable but Expensive and Slow​

Option 2: Local NVMe - Fast but Ephemeral (and Now Proven Risky)​

Option 3: Object Storage (S3) - Durable & Cheap, But Latency Is a Challenge​

The Data Substrate Approach: Object Storage First, NVMe as Cache, EBS for Logs​

The Larger Industry Trend​

After the Outage: A Hard Question Every Architect Should Ask​

Summary​

1. Data Substrate: Motivation and Philosophy​

2. A Deeper Dive Into Data Substrate Architecture​

3. The Benefits of Data Substrate Architecture​

Why Data Substrate Matters​

Get Started with Data Substrate​

The Rise and Fall of the Scale-Out Mindset

The New Users: AI Powered Applications and Autonomous Agents

Looking Forward

The Advantage of Combining Workloads on a Real Database

1. Avoid Cascading Failures

2. Avoid Development Complexity

3. Avoid Consistency and Durability Confusion

What About Scalability and Performance?

1. Scalable Transactions

2. Independent Resource Scaling

3. Performance and the Zero-Overhead Principle

PostgreSQL May Not Be the Answer

The Road Ahead: Toward a True Data Substrate

Option 1: Block-Level Storage (EBS) - Durable but Expensive and Slow

Option 2: Local NVMe - Fast but Ephemeral (and Now Proven Risky)

Option 3: Object Storage (S3) - Durable & Cheap, But Latency Is a Challenge

The Data Substrate Approach: Object Storage First, NVMe as Cache, EBS for Logs

The Larger Industry Trend

After the Outage: A Hard Question Every Architect Should Ask

Summary

1. Data Substrate: Motivation and Philosophy

2. A Deeper Dive Into Data Substrate Architecture

3. The Benefits of Data Substrate Architecture

Why Data Substrate Matters

Get Started with Data Substrate