We have recently introduced EloqKV, our distributed database product built on a cutting-edge architecture known as Data Substrate. Over the past several years, the EloqData team has worked tirelessly to develop this software, ensuring it meets the highest standards of performance and scalability. One key detail we’d like to share is that the majority of EloqKV’s codebase was written in C++.
Had we launched our product a decade ago, using C++ would have been an obvious and unremarkable choice. However, it's 2024, and the landscape has changed. Today, languages such as Rust, Zig, and Go are considered the modern, fashionable options for systems programming. So when we chose C++, a language that some view as outdated or less "cool", or even bug-prone and "unsafe", it’s natural for people to wonder why.
In this article, we’d like to share the thought process behind our decision to choose C++ over some of the newer, more fashionable languages, the historical lessons we drew on, and the developments we expect going forward.
Choosing a Programming Language Is Important
Selecting the right programming language is crucial for any software project, but it becomes even more significant for complex systems software such as databases. The choice of language influences various aspects, including performance, ease of development, and maintainability. In a domain where efficiency and reliability are paramount, the programming language serves as the foundation upon which the entire system is built.
For databases, the implications of this choice are profound. A database must be capable of handling vast amounts of data while providing fast query responses and ensuring data integrity. These requirements necessitate a language that not only excels in performance but also allows for scalable and efficient development practices. Additionally, databases often undergo continuous development and enhancement over decades, making maintainability a critical factor. A well-chosen language can simplify the process of updating and expanding the software's features over time, ensuring that it remains relevant and effective in an ever-evolving technological landscape.
Consider the Hadoop big data stack, which is predominantly built on the Java Virtual Machine (JVM). Java and the wider JVM ecosystem have long been among the most popular programming platforms and were lauded for their portability and rich features, but in retrospect the choice has not been without controversy. The performance and memory overhead of the JVM, particularly issues related to garbage collection, have caused numerous challenges for developers. Indeed, Redpanda and ScyllaDB are notable examples of reimplementing mature, widely used JVM-based systems (Kafka and Cassandra, respectively) from scratch in C++ to avoid the JVM penalties.
Another important consideration is the popularity of the programming language and the availability of developers familiar with it. For instance, Spark and Kafka are developed largely in Scala, while RabbitMQ and parts of Couchbase are written in Erlang. Although these languages offer robust features and capabilities, they are not as widely adopted as mainstream languages. This relative lack of popularity makes it harder to engage a large developer community and to find experienced programmers, and toolchain support is generally not on par with that of more popular languages. Choosing a less common language can therefore make recruiting more difficult, slow down development, and limit community support for troubleshooting and innovation.
By the late 2010s, Rust had emerged as one of the leading programming languages for developing database software. Newer projects such as TiKV (the storage layer of TiDB), RisingWave, DataFusion, and Neon are prominent examples that leverage Rust's capabilities to build efficient and high-quality databases. Notably, RisingWave even published a blog post detailing their decision to discard ten months of work in C++ to rewrite their entire codebase in Rust. Given that EloqData began its journey around 2021, when Rust was already well-established as a robust programming language with excellent features for building safe and performant databases, one might wonder why we opted for C++ instead.
Building a Database from Scratch in C++ in 2024
When we began our project, we were keenly aware that Rust was a highly competitive language for building the foundations of our database. Our eventual decision to choose C++ was based on three main factors.
The first strength of C/C++ lies in its database ecosystem support. Most existing and popular databases are developed in C/C++, providing a wealth of resources and innovations we could leverage. Our Data Substrate technology aims to create a unified, modular architecture that can capitalize on these existing resources while avoiding the need to reinvent the wheel. Although Rust offers good interoperability with C/C++, its memory management model and certain safety restrictions can complicate integration with many established projects.
Another advantage of C++ is its extensive support for foundational libraries. Since most operating systems and lower-level drivers are written in C or C++, the C and C++ interfaces are often the native and best-supported APIs. Performance-focused libraries for I/O and networking, such as DPDK, RDMA libraries, and liburing, as well as memory allocators such as mimalloc, are developed in C/C++ and can be called directly. In contrast, other languages typically require an additional binding layer to use these libraries effectively. We anticipate that this trend will continue, with newer hardware and OS abstractions supporting C/C++ first and foremost.
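To make the point about native APIs concrete, here is a minimal sketch of C++ calling an operating-system C interface with no binding layer in between. It uses plain POSIX `pipe`, `write`, and `read` rather than liburing or DPDK so that it builds without extra dependencies. The `Fd` wrapper and `RoundTripThroughPipe` function are hypothetical names invented for this illustration and are not part of EloqKV.

```cpp
#include <unistd.h>  // POSIX C API: pipe, read, write, close

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical RAII wrapper: closes the descriptor when it goes out of scope.
class Fd {
public:
    explicit Fd(int fd) noexcept : fd_(fd) {}
    Fd(const Fd&) = delete;
    Fd& operator=(const Fd&) = delete;
    ~Fd() {
        if (fd_ >= 0) ::close(fd_);
    }
    int get() const noexcept { return fd_; }

private:
    int fd_;
};

// Writes `msg` into a kernel pipe and reads it back, calling the C API directly.
// Assumption: `msg` is smaller than the pipe buffer, so write() does not block.
// EINTR handling is omitted to keep the sketch short.
std::string RoundTripThroughPipe(const std::string& msg) {
    int fds[2] = {-1, -1};
    if (::pipe(fds) != 0) {
        throw std::system_error(errno, std::generic_category(), "pipe");
    }
    assert(fds[0] >= 0 && fds[1] >= 0);
    Fd read_end(fds[0]);
    Fd write_end(fds[1]);

    const ssize_t written = ::write(write_end.get(), msg.data(), msg.size());
    if (written < 0) {
        throw std::system_error(errno, std::generic_category(), "write");
    }
    if (static_cast<std::size_t>(written) != msg.size()) {
        throw std::runtime_error("short write");
    }

    std::string out(msg.size(), '\0');
    std::size_t total = 0;
    while (total < out.size()) {
        const ssize_t n = ::read(read_end.get(), &out[total], out.size() - total);
        if (n < 0) {
            throw std::system_error(errno, std::generic_category(), "read");
        }
        if (n == 0) break;  // no more data available
        total += static_cast<std::size_t>(n);
    }
    out.resize(total);
    return out;
}
```

The only C++-specific code here is the thin RAII wrapper; the system calls themselves are used exactly as the C headers declare them, which is the property the paragraph above relies on.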
The third advantage of C++ is its longevity and mature toolchain. Infrastructure software often requires continuous updates and improvements over several decades. For instance, Oracle Database is over 45 years old, while MySQL and PostgreSQL have each been around for roughly 30 years. Even relatively newer systems like Cassandra, MongoDB, and Redis are over 15 years old. To develop a reliable infrastructure solution, we need to be prepared to maintain the codebase for potentially half a century. A lot can change in the tech world over such a long period: consider that 20 years ago, Perl was a highly popular language, and Delphi was more widely used than Python.
When building long-lasting software, it’s crucial to consider the long-term survivability of the programming language: continued compiler improvements, actively maintained libraries, and modern IDE, debugger, and profiler support. In this respect, C++ is a much safer bet. Its extensive history, active development community, and proven resilience over time give us confidence that it will continue to be relevant and well-supported for decades to come.
Of course, C++ carries legacy baggage that can present challenges compared to many modern languages. Maximizing productivity on a C++ project requires a certain level of discipline. We won’t elaborate here on the many best practices we’ve adopted to mitigate C++'s shortcomings, since these are well documented and widely discussed elsewhere, but we acknowledge that effective use of the language demands a strong commitment to coding standards and testing methodologies. In particular, the harshest criticism of C++, its lack of memory safety, can be significantly mitigated by restricting development to a modern subset of the language.
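As a rough illustration of what such a modern subset can look like, the sketch below avoids `new`/`delete`, expresses ownership with `std::unique_ptr`, stores bytes in `std::vector`, and checks bounds before every access. The `Page` and `PageCache` types are hypothetical and exist only for this example; they are not taken from the EloqKV codebase, and the article does not specify which rules the team actually follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical fixed-size page; std::vector owns and frees the bytes.
struct Page {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical cache that exclusively owns its pages through std::unique_ptr.
class PageCache {
public:
    // Creates a zero-filled page; a previous page with the same id is freed.
    void Load(std::uint64_t id, std::size_t size) {
        auto page = std::make_unique<Page>();
        page->bytes.assign(size, 0);
        pages_[id] = std::move(page);
    }

    // Bounds-checked write: returns false instead of writing out of range.
    bool WriteByte(std::uint64_t id, std::size_t offset, std::uint8_t value) {
        Page* page = Find(id);
        if (page == nullptr || offset >= page->bytes.size()) return false;
        page->bytes[offset] = value;
        return true;
    }

    // Bounds-checked read: returns a copy, so callers never hold a dangling
    // reference into a page that is later evicted.
    std::optional<std::uint8_t> ReadByte(std::uint64_t id, std::size_t offset) const {
        const Page* page = Find(id);
        if (page == nullptr || offset >= page->bytes.size()) return std::nullopt;
        return page->bytes[offset];
    }

    // Erasing the map entry destroys the unique_ptr and releases the page.
    bool Evict(std::uint64_t id) { return pages_.erase(id) > 0; }

private:
    // Returns a non-owning pointer used only inside this class.
    Page* Find(std::uint64_t id) const {
        auto it = pages_.find(id);
        if (it == pages_.end()) return nullptr;
        assert(it->second != nullptr);  // invariant: cached pages are never null
        return it->second.get();
    }

    std::unordered_map<std::uint64_t, std::unique_ptr<Page>> pages_;
};
```

Patterns like these do not give the compile-time guarantees of Rust's borrow checker, but they remove whole classes of leaks, double frees, and out-of-bounds accesses from everyday code, and sanitizers and static analysis can cover much of what remains.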
Going Forward
At EloqData, we strongly adhere to a modular design philosophy, as we are committed to building a lasting system that will support decades of continued improvements. We recognize that well-designed interfaces are crucial for software development productivity and maintainability. This principle is reflected not only in the overall architecture of Data Substrate, which accommodates various query and storage engines, but also throughout our software development process. We anticipate that advancements will continue to emerge, such as improved memory allocators, more efficient RPC libraries, and optimized hash-table implementations, and we aim to adopt them as they become available.
Because of this modularity, it is relatively straightforward for us to experiment with other programming languages in our projects when appropriate. We are eager to replace certain modules with components implemented in memory-safe languages like Rust where it makes sense. Rust is an exceptional language with a strong following in the systems community, and we aim to use it more in many of our upcoming projects.
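The following sketch illustrates the general idea of a narrow interface that lets implementations be swapped without touching the code above it. It is not the actual Data Substrate API: `StorageEngine`, `OrderedEngine`, `HashEngine`, and `Session` are hypothetical names chosen for this example, and the two engines are toy in-memory maps.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical narrow interface that upper layers program against.
class StorageEngine {
public:
    virtual ~StorageEngine() = default;
    virtual void Put(const std::string& key, std::string value) = 0;
    virtual std::optional<std::string> Get(const std::string& key) const = 0;
};

// Toy engine backed by an ordered tree map.
class OrderedEngine final : public StorageEngine {
public:
    void Put(const std::string& key, std::string value) override {
        data_[key] = std::move(value);
    }
    std::optional<std::string> Get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> data_;
};

// Toy engine backed by a hash table; a drop-in replacement for OrderedEngine.
class HashEngine final : public StorageEngine {
public:
    void Put(const std::string& key, std::string value) override {
        data_[key] = std::move(value);
    }
    std::optional<std::string> Get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> data_;
};

// Upper layer: depends only on the interface, never on a concrete engine.
class Session {
public:
    explicit Session(std::unique_ptr<StorageEngine> engine)
        : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    void Set(const std::string& key, std::string value) {
        engine_->Put(key, std::move(value));
    }
    std::string GetOr(const std::string& key, const std::string& fallback) const {
        return engine_->Get(key).value_or(fallback);
    }

private:
    std::unique_ptr<StorageEngine> engine_;
};
```

Constructing `Session(std::make_unique<HashEngine>())` instead of `Session(std::make_unique<OrderedEngine>())` changes the underlying data structure without altering any caller. The same kind of boundary is what would allow a module to be reimplemented behind a stable interface, including in another language through a C-compatible API.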
In contrast to many startup companies that emphasize rapid iteration, quick feedback loops, and fast prototyping, EloqData has taken a different approach. We place a stronger emphasis on doing things right from the outset to avoid future technical debt. While this focus may slow us down a bit, we believe it is a necessary investment. That said, avoiding future debt is futile if the product and technology lack a viable future to begin with. Ultimately, it will take time to determine whether we made the right choice. Regardless, we take pride in the decisions we’ve made and look forward to seeing how our efforts can help our customers tackle their most challenging data problems.