Synthetic Data Vault

Technology Radar

Last updated: Apr 26, 2023

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.

Apr 2023

Trial

Synthetic Data Vault (SDV) is an ecosystem of synthetic data generation libraries that can learn the distribution of a data set and generate synthetic data with the same format and statistical properties as the source. In the past, we talked about the downsides of using production data in test environments. Yet the nuances of how production data is distributed are hard to replicate by hand, and missing them leads to defects and surprises. We've had good experiences using SDV to generate large volumes of data for performance testing. SDV performs well when modeling a single table; however, data generation time increases considerably as the number of tables with foreign key constraints grows. Nonetheless, SDV shows great promise for local performance testing. It's a good tool for synthetic data generation and worth considering for your testing needs.
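To make "learn the distribution of a data set" concrete, here is a minimal, standard-library-only sketch of the single-table idea. It is not SDV's API or algorithm: `fit_table` and `sample_table` are hypothetical names, and the model is deliberately naive (a Gaussian per numeric column and observed frequencies per categorical column, with no correlations between columns). SDV's own synthesizers, such as `GaussianCopulaSynthesizer` in SDV 1.x, also capture dependencies between columns.

```python
import random
import statistics
from collections import Counter


def fit_table(rows):
    """Learn a naive per-column model from a list of dict rows.

    Numeric columns are modeled as a Gaussian (mean, population std dev);
    any other column is modeled by its observed value frequencies.
    Correlations between columns are deliberately ignored.
    """
    model = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            is_int = all(isinstance(v, int) for v in values)
            model[col] = ("numeric", statistics.mean(values),
                          statistics.pstdev(values), is_int)
        else:
            counts = Counter(values)
            model[col] = ("categorical", list(counts), list(counts.values()))
    return model


def sample_table(model, n, seed=None):
    """Draw n synthetic rows with the same columns and value types as the source."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mu, sigma, is_int = spec
                value = rng.gauss(mu, sigma)
                # Keep integer columns integer so the output matches the source format.
                row[col] = round(value) if is_int else value
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights)[0]
        rows.append(row)
    return rows


# Hypothetical source table standing in for production data.
source = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 29, "plan": "basic"},
    {"age": 42, "plan": "basic"},
]
model = fit_table(source)
synthetic = sample_table(model, n=10_000, seed=7)
```

In this toy version the fitted model is tiny and each row is sampled independently, so producing a much larger data set for a performance test only means raising `n`.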

Oct 2022

Assess

Synthetic Data Vault (SDV) is an ecosystem of synthetic data generation libraries that can learn the distribution of a data set and generate synthetic data with the same format and statistical properties as the source. In the past, we talked about the downsides of using production data in test environments. Yet the nuances of how production data is distributed are hard to replicate by hand, and missing them leads to defects and surprises. We believe SDV and similar tools can address this gap by generating production-like data for single-table, complex multi-table and multivariate time-series data. Although SDV isn't new, we quite like it and decided to highlight it.
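Multi-table data adds referential integrity on top of per-table distributions. The sketch below is a hypothetical, heavily simplified illustration, not SDV's `HMASynthesizer` or any other SDV API: it learns how many child rows a parent typically has, then generates parents with fresh keys and children that point only at those keys. It shows why relationships make generation more involved (child rows depend on parent keys that must exist first); it makes no claim about how SDV implements this or about SDV's performance. The table names, columns and the `sample_order` child model are all assumptions for illustration.

```python
import random
from collections import Counter


def fit_child_counts(parents, children, pk, fk):
    """Learn the empirical distribution of 'children per parent' for one FK relationship."""
    per_parent = Counter(child[fk] for child in children)
    # Parents with no children count as 0, so the distribution includes them.
    sizes = Counter(per_parent.get(parent[pk], 0) for parent in parents)
    return list(sizes), list(sizes.values())


def sample_related(n_parents, count_model, sample_child, pk, fk, seed=None):
    """Generate parent rows with fresh keys, then child rows that reference them.

    Children can only be produced once their parent's key exists; in this
    approach, each further FK relationship would add another dependent pass.
    """
    rng = random.Random(seed)
    sizes, weights = count_model
    parents, children = [], []
    for key in range(1, n_parents + 1):
        parents.append({pk: key})
        for _ in range(rng.choices(sizes, weights=weights)[0]):
            child = sample_child(rng)
            child[fk] = key  # referential integrity: always points at a generated parent
            children.append(child)
    return parents, children


# Hypothetical source tables.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"customer_id": 1, "total": 20.0},
    {"customer_id": 1, "total": 35.5},
    {"customer_id": 3, "total": 12.0},
]
count_model = fit_child_counts(customers, orders, pk="id", fk="customer_id")


def sample_order(rng):
    # Placeholder child model; a fuller sketch would fit the order columns
    # the same way as a single table.
    return {"total": round(rng.uniform(10.0, 40.0), 2)}


parents, children = sample_related(
    1_000, count_model, sample_order, pk="id", fk="customer_id", seed=1
)
```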

Published: Oct 26, 2022