Why Everyone Is Using PostgreSQL and DuckDB — And Why We May Need More
PostgreSQL and DuckDB have become the go-to databases for developers everywhere. Postgres is the default choice for transactional workloads, while DuckDB has quietly taken over the analytics world. Both are simple, fast, and easy to use. You can spin them up in seconds, run them anywhere, and they "just work." For most use cases, that's more than enough. But it's worth noting that both are single-node systems at heart. They can scale up, but when you hit the limits of one machine, you have no choice but to migrate your infrastructure elsewhere.
Many people now argue that single-node databases are enough for almost everything. Hardware has become so powerful that even massive workloads can often fit on one beefy machine. OpenAI recently discussed how their main database runs on a single-writer PostgreSQL setup. Just one node handling all the writes, with many read replicas to scale out read traffic. That's a bold design, and it suggests that maybe we no longer need complicated distributed databases because modern CPUs, SSDs, and memory are making scale-out architectures look like overkill.
In this article, we discuss how the database landscape reached this state and consider the future of scalable databases. We draw inspiration from history, and we believe there is a very bright future for the database community as we enter a new era of growth and prosperity.

The Rise and Fall of the Scale-Out Mindset
Twenty years ago, distributed databases were the hot thing. As the internet exploded and apps started serving millions of users, the old single-node giants like Oracle and SQL Server began to crack under the load. That's when companies like Google and Amazon came up with Bigtable and Dynamo — systems designed to scale horizontally across hundreds or thousands of machines. Their success kicked off a wave of distributed data systems: Cassandra, MongoDB, Redis, HBase and many others. These databases powered the modern internet and became the backbone of web-scale engineering.
The first wave of distributed databases came with trade-offs. To scale across machines, they gave up the ACID guarantees people took for granted in traditional databases. These "NoSQL" systems and early data lakes were fast, but they pushed the pain of handling inconsistency back to the application developers. About a decade later, around 2015, the industry figured out how to bring transactions back — both for online and analytical workloads. That's when the "NewSQL" and "Lakehouse" movements took off, with systems like Google Spanner, CockroachDB, TiDB, Snowflake, and Databricks appearing around that time. For a while, it looked like distributed databases were going to rule the world. Snowflake went public at a staggering valuation, and companies like Databricks, ClickHouse, Cockroach Labs, and Yugabyte attracted huge funding rounds, each valued in the billions.
However, just as distributed databases were hitting their stride, a new idea started to spread — maybe massive scale-out isn't necessary anymore. The first hint came from SQLite, the tiny embedded database that quietly became the most deployed software in the world. Then PostgreSQL surged ahead of MySQL and became the default choice for serious applications, from startups to enterprises. And now, DuckDB — a single-node analytical engine that runs right on your laptop — is reshaping how people do analytics. Suddenly, the towering systems built around Bigtable, Spanner, TiDB, ClickHouse, and Snowflake don't look as untouchable as they once did. So what happened?
To understand this shift in mindset, it helps to look back at what changed. Before the internet era, databases like Sybase and Oracle were built for a single organization, usually serving tens of thousands of users. When the first wave of internet applications arrived in the mid-90s, those same systems were pressed into service to run websites like eBay and AltaVista. It worked for a while, until it didn't. By the early 2000s, companies like Google discovered that traditional databases simply couldn't keep up with millions of online users, the new "netizens". That realization gave birth to Google's internal infrastructure, and with it, the entire era of Big Data and distributed databases.
Back then, a typical server had two sockets, one core per socket, running around 1 GHz, and maybe a gigabyte of RAM. Google's famous early papers described clusters of thousands of such machines — an impressive scale at the time. Fast forward to today: a single high-end server can pack terabytes of memory, hundreds of CPU cores running at several gigahertz with much-improved instructions per cycle, and dozens of SSDs each delivering gigabytes per second of throughput. In other words, the capacity that once required an entire data center now fits in a single box. Meanwhile, the total number of users hasn't grown exponentially. Even with mobile devices and global internet access, we're still bounded by the size of the human population. For most large-scale applications serving a few million users, a single modern database can easily handle the load.
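To make that comparison concrete, here is a rough back-of-envelope sketch. The cluster size, the modern server's specs, and the per-core speedup factor are illustrative assumptions consistent with the figures above, not measurements.

```python
# Early-2000s cluster (as described above) vs. one modern high-end server.
# All specific numbers below are assumptions chosen for illustration.

cluster_machines = 1_000           # "clusters of thousands of such machines"
old_cores = cluster_machines * 2   # two sockets, one core per socket
old_ram_gb = cluster_machines * 1  # about a gigabyte of RAM each

new_cores = 192                    # "hundreds of CPU cores" (assumed count)
new_ram_gb = 2_048                 # "terabytes of memory" (assumed 2 TB)
per_core_speedup = 15              # assumed clock + IPC gain over a ~1 GHz core

old_core_equivalents = new_cores * per_core_speedup

print(f"Early-2000s cluster: {old_cores} cores, {old_ram_gb} GB RAM in aggregate")
print(f"One modern server:   {new_cores} cores (~{old_core_equivalents} old-core equivalents), "
      f"{new_ram_gb} GB RAM")
print(f"Compute ratio: ~{old_core_equivalents / old_cores:.1f}x the whole cluster")
print(f"Memory ratio:  ~{new_ram_gb / old_ram_gb:.1f}x the whole cluster")
```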
Distributed OLTP companies such as CockroachDB, TiDB, and Yugabyte have all felt the impact of this shift. Founded between 2015 and 2017, they entered the market with high expectations of explosive growth. Those expectations never quite materialized. By around 2020, their products had become stable and feature-complete, and their valuations reached their peak. Yet the demand was far smaller than investors had imagined. These systems now face competition from two sides: from powerful single-node databases that can handle most workloads, and from partially distributed but more efficient systems such as Amazon Aurora, Google AlloyDB, and startup offerings such as NeonDB.
The OLAP world has fared better. Analytical workloads depend not only on the number of users but also on how much data is collected, how long the history is kept, and how many users or systems consume the results. As companies collect more signals over longer periods, and as machine learning models begin to consume analytical outputs, the demand for performance and scalability remains high. Snowflake and Databricks had the advantage of starting a few years earlier than the OLTP vendors, which helped them secure a strong position in the market. Much of the growth in the analytics market today comes from a new kind of consumer: AI training and inference systems that rely on massive amounts of analytical data. Still, newer systems such as MotherDuck, which builds on DuckDB, have gained attention from both developers and investors.
The New Users: AI-Powered Applications and Autonomous Agents
So what happens next? Hardware will keep getting faster, and there is no sign of that slowing down. Does that mean single-node databases like SQLite, PostgreSQL and DuckDB will take over completely, leaving little room for distributed systems? Probably not.
With the recent explosion of AI innovation and the emergence of AI-powered applications, the way we use software has changed. Every chat, prompt, or API call produces data. For instance, I use ChatGPT almost every day and generate several conversations daily. Each one is a few hundred kilobytes, stored as chat history that I can scroll through anytime. After less than a year, I already have more than a thousand conversations. Compare that with my Amazon account, where I might have made fewer than a thousand purchases in more than a decade, each only a few kilobytes of data. I have easily produced a hundred times more data with OpenAI than with Amazon, in a fraction of the time.
Programming tools such as Cursor go even further. They can generate tens of millions of tokens every day, and all that text needs to be stored for debugging, history tracking, or later analysis. This volume of data already puts serious pressure on backend systems.
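To put rough numbers on that, here is a back-of-envelope sketch using the figures above. The per-item sizes, the counts, and the bytes-per-token factor are illustrative assumptions, not measurements.

```python
# Back-of-envelope data volumes based on the figures mentioned above.
# All per-item sizes and counts are rough assumptions, not measurements.

KB = 1024

chatgpt_conversations = 1_000      # "more than a thousand conversations" in under a year
chatgpt_bytes_each = 300 * KB      # "a few hundred kilobytes" per conversation

amazon_purchases = 1_000           # "fewer than a thousand purchases" over a decade
amazon_bytes_each = 3 * KB         # "a few kilobytes" per purchase record

chatgpt_total = chatgpt_conversations * chatgpt_bytes_each
amazon_total = amazon_purchases * amazon_bytes_each

print(f"ChatGPT history: ~{chatgpt_total / KB**2:.0f} MB")        # ~293 MB
print(f"Amazon history:  ~{amazon_total / KB**2:.0f} MB")         # ~3 MB
print(f"Ratio:           ~{chatgpt_total / amazon_total:.0f}x")   # ~100x

# A coding assistant generating tens of millions of tokens per day, at an
# assumed ~4 bytes of text per token, adds tens of megabytes per developer per day.
cursor_tokens_per_day = 20_000_000
print(f"Coding agent:    ~{cursor_tokens_per_day * 4 / KB**2:.0f} MB of text per day")
```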
More importantly, a new kind of user is emerging: autonomous agents powered by AI. This shift is unlike anything we have seen before. It is creating enormous demand for both data volume and query intensity. In the age of AI, we have to rethink our data infrastructure, just as the internet once forced us to do.
Soon, autonomous agents will become the dominant users of IT systems. Take something as simple as buying a T-shirt. A human might browse a few pages and make one request every half minute. An AI agent, given the same goal, could issue thousands of requests in seconds to find the perfect item. This kind of machine-driven activity will easily overwhelm infrastructure that was built for human speed and scale, just as the first netizens overwhelmed infrastructure that had been built for the employees of a single company at the beginning of the internet age.
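For a sense of scale, here is a quick comparison of per-user request rates; the specific agent numbers are assumptions in the spirit of the example above, not benchmarks.

```python
# Per-user request rates, using illustrative numbers only.

human_rate = 1 / 30                 # one page view roughly every half minute
agent_requests = 2_000              # "thousands of requests" (assumed value)
agent_window_seconds = 10           # issued "in seconds" (assumed value)
agent_rate = agent_requests / agent_window_seconds

print(f"Human shopper: ~{human_rate:.2f} requests/second")    # ~0.03
print(f"AI agent:      ~{agent_rate:.0f} requests/second")    # ~200
print(f"One agent generates the load of ~{agent_rate / human_rate:,.0f} human shoppers")
```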
And this is only the beginning. Other kinds of AI applications and physical robots will arrive soon. We are standing at the edge of a new wave of innovation. If you think back to 1999, it would have been almost impossible to imagine people posting daily videos on TikTok or calling a ride with a phone through Uber. The same thing will happen again. Many new AI-powered scenarios have not even been imagined yet. Any company that wants to stay ahead, whether a startup or a long-established enterprise, needs to prepare for the pressure that these new workloads will put on their data infrastructure.
The pain of non-scalable systems is not new. Engineers have shared many war stories about hitting a scaling wall and being forced to migrate their databases under fire. We believe it is time to think ahead. It is better to prepare for what AI applications will need, rather than wait until the pressure becomes unbearable. The pace of change may be much faster than people expect.
During the early internet era, Google gained a huge advantage from its internal data infrastructure. It gave their teams the freedom to run experiments, ship features, and index the web without worrying about system limits. I saw this gap first-hand while working on the early version of Bing. Competing with Google was extremely difficult because they could move faster, process more data, and try new ideas at a pace that others simply could not match. Their real-time indexing pipeline was built on top of Percolator, the large-scale distributed transactional layer, which itself sat on top of Bigtable, the large-scale distributed NoSQL database. Reproducing that stack was almost impossible for anyone else at the time.
This kind of competitive advantage will matter even more as we enter the hyper-speed growth of the AI era. Companies that build strong data infrastructure early will have the freedom to innovate without being held back by system limits, while everyone else will be stuck fighting their own scalability problems. As mentioned before, OpenAI currently relies on PostgreSQL for its main database, and for now it works. But even at this stage, engineers have already talked about banning the creation of new tables in the main database and avoiding any write-heavy workload. They may also need hierarchical replication, because the replication traffic from so many read replicas is overwhelming the single writer. OpenAI is only two years into its hyper-growth phase. These kinds of infrastructure limits can quickly become a major liability as the company scales.
Looking Forward

Going forward, there is certainly space for simple, lightweight databases such as SQLite and DuckDB. PostgreSQL is certainly still going to be a major force in the database ecosystem and will continue to power many important applications, just as Oracle DB has remained highly relevant throughout the internet age. However, we must not ignore the challenges of scalability and performance as we enter the AI era. One cannot dismiss the need for Caterpillar excavators just because a shovel is sufficient for my backyard work 99.9% of the time. This is especially true when we are on the cusp of taking on the endeavor of transforming the entire world.
So where does this leave us? If AI agents become the main users of software, the demands on storage and compute will rise far beyond what a single machine can handle. We will need systems that combine the simplicity and performance of single-node databases with the elasticity and fault tolerance of distributed ones. We will also need databases that speak multiple "dialects" at once, since AI applications mix transactional work, analytical work, vector search, caching, and document storage in the same request path. The future will not be served by a pile of disconnected systems glued together with application logic. It will be served by a unified, scalable, multi-model engine that treats data as a first-class substrate.
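To make "multiple dialects in the same request path" concrete, here is a hypothetical sketch of what a single agent request can look like. Every class and function name below is invented for illustration; today, each numbered step typically lands on a separate system, and the glue between them is exactly the application logic described above.

```python
# Toy, in-memory stand-ins for the "dialects" a single agent request touches.
# All names are hypothetical; the point is the shape of the request path,
# not any particular product's API.

class KV:                 # cache / session state (key-value)
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)

class Vectors:            # vector search for retrieval-augmented context
    def search(self, query, top_k=4):
        return [f"snippet {i} related to {query!r}" for i in range(top_k)]

class Docs:               # document store (chat history, tool configs)
    def find(self, filt, limit=50):
        return [{"role": "user", "text": "previous message"}][:limit]

class SQL:                # transactional system of record
    def __init__(self):
        self.rows = []
    def execute(self, statement, params):
        self.rows.append((statement, params))

class OLAP:               # analytical aggregates over recent behavior
    def query(self, statement):
        return [("item-42", 17)]

def handle_agent_request(user_id, query, kv, vectors, docs, sql, olap):
    session = kv.get(f"session:{user_id}")                      # 1. key-value lookup
    context = vectors.search(query)                             # 2. vector search
    history = docs.find({"user_id": user_id})                   # 3. document reads
    sql.execute("INSERT INTO actions (user_id, query) VALUES (%s, %s)",
                (user_id, query))                               # 4. ACID write
    stats = olap.query("SELECT item_id, count(*) FROM actions GROUP BY item_id")
    return {"session": session, "context": context,             # 5. analytics
            "history": history, "stats": stats}

print(handle_agent_request("u1", "navy t-shirt, size M",
                           KV(), Vectors(), Docs(), SQL(), OLAP()))
```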
This is the direction we are building toward at EloqData. Our Data Substrate architecture is designed for the AI age. It brings the speed of in-memory systems, the reliability of transactional databases, and the flexibility of multi-model APIs together in one platform. Our database offerings are built on top of this technology: the Redis-API-compatible KV store EloqKV, the MongoDB-API-compatible document database EloqDoc, and the MySQL-API-compatible relational database EloqSQL all inherit the architectural advantages of Data Substrate to provide scalable, high-performance, low-cost, transactional databases with AI-native features such as low-cost snapshots and branching. They can all scale out when you need it, scale in when you don't, and keep storage costs low by using object storage as the primary durable layer. They let developers build AI-driven applications without worrying about whether the next spike in workload will tip the system over.
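Because the APIs are wire-compatible as described above, standard clients should work unchanged. Below is a minimal connectivity sketch assuming a reachable EloqKV and EloqDoc deployment; the hostnames and ports are placeholders, and only standard Redis and MongoDB operations are used.

```python
# Minimal connectivity sketch. The endpoints below are placeholders;
# substitute your own deployment's addresses.
import redis
from pymongo import MongoClient

# EloqKV via any standard Redis client
kv = redis.Redis(host="my-eloqkv.example.com", port=6379)  # placeholder endpoint
kv.set("agent:session:42", "browsing: t-shirts")
print(kv.get("agent:session:42"))

# EloqDoc via any standard MongoDB client
doc = MongoClient("mongodb://my-eloqdoc.example.com:27017")  # placeholder endpoint
doc.agents.history.insert_one({"user_id": 42, "role": "user", "text": "find a navy t-shirt"})
print(doc.agents.history.find_one({"user_id": 42}))
```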
We believe the next decade of software will not be defined by yet another wave of siloed databases. It will be defined by systems that make data infrastructure disappear into the background, allowing AI systems to read, write, learn, and act at machine speed. Our goal is to help build that foundation. Join the discussion on our Discord channel, try out our free-tier cloud service, or visit our GitHub repo to contribute. We would be happy to hear your thoughts on the future of databases.




