Synthetic Data Vault (SDV) is an ecosystem of libraries for synthetic data generation: it learns the distribution of a data set and generates synthetic data with the same format and statistical properties as the source. In the past, we talked about the downsides of using production data in test environments. However, the nuances of data distribution in production can hardly be replicated manually, and test data that misses them leads to defects and surprises. We've had good experiences using SDV to generate large data sets for performance testing. SDV models a single table well; however, data generation time increases considerably as the number of tables with foreign key constraints grows. Nonetheless, SDV offers great promise for local performance testing. It's a good tool for synthetic data generation and worth considering for your testing needs.
We believe SDV and similar tools can close the gap between manually created test data and production distributions by generating production-like data for single-table, complex multi-table and multivariate time-series data. Although SDV isn't new, we quite like it and decided to highlight it.
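
To make the idea of "learning the distribution of a data set" concrete, here is a deliberately tiny, standard-library-only sketch of the fit-then-sample workflow for a single table. It is not SDV's API or its algorithm: the helpers `fit_columns` and `sample_rows`, the column names and the sample rows are all hypothetical, and the sketch models each column independently (a Gaussian for numeric columns, observed frequencies for categorical ones), so it ignores the cross-column relationships and foreign key constraints that tools like SDV are built to capture.

```python
import random
import statistics
from collections import Counter


def fit_columns(rows):
    """Learn a simple per-column model from a list of dict rows (toy illustration)."""
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            # Numeric column: summarise by mean and population standard deviation.
            model[column] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Categorical column: remember how often each value occurs.
            counts = Counter(values)
            model[column] = ("categorical", list(counts.keys()), list(counts.values()))
    return model


def sample_rows(model, num_rows, seed=None):
    """Draw synthetic rows with the same columns as the source table."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(num_rows):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                _, mean, stdev = spec
                row[column] = rng.gauss(mean, stdev)
            else:
                _, categories, weights = spec
                row[column] = rng.choices(categories, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


# Hypothetical "production-like" source table: most users are on the free plan.
source = [
    {"plan": "free", "latency_ms": 120.0},
    {"plan": "free", "latency_ms": 95.0},
    {"plan": "free", "latency_ms": 110.0},
    {"plan": "pro", "latency_ms": 40.0},
]

model = fit_columns(source)
synthetic = sample_rows(model, num_rows=1000, seed=42)
```

The synthetic rows keep the source's format (same columns and categories) and roughly its per-column statistics, such as the skew towards the free plan. Hand-written fixtures tend to flatten exactly this kind of production nuance.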