DuckDB is an embedded, columnar database for data science and analytical workloads. Data analysts usually load data locally into tools such as pandas or data.table to quickly explore patterns and form hypotheses before scaling the solution to a server. We're now using DuckDB for these use cases instead, because it makes larger-than-memory analysis possible. DuckDB supports range joins, vectorized execution and multiversion concurrency control (MVCC) for large transactions, and our teams are quite happy with it.
DuckDB is an embedded, columnar database for data science and analytical workloads. Analysts spend significant time cleaning and visualizing data locally before scaling it to servers. Although databases have been around for decades, most of them are designed for client-server use cases and are therefore not suited to local, interactive queries. To work around this limitation, analysts usually end up using in-memory data-processing tools such as pandas or data.table. These tools are effective, but they limit the scope of analysis to the volume of data that fits in memory. We feel DuckDB neatly fills this gap in tooling with an embedded columnar engine that is optimized for analytics on local, larger-than-memory data sets.