Versioning data for reproducible analytics

Technology Radar


Published: Nov 14, 2018

Not on the current edition

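This blip is not on the current edition of the Radar. If it was in one of the last few editions, it is likely still relevant. If the blip is older, it may no longer be relevant, and our assessment may be different today. Unfortunately, we do not have the bandwidth to continuously review blips from previous editions of the Radar.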

Nov 2018

Trial

When it comes to large-scale data analysis or machine intelligence problems, being able to reproduce different versions of an analysis run on different data sets and parameters is immensely valuable. To achieve reproducible analysis, both the data and the model (including algorithm choice, parameters and hyperparameters) need to be version controlled. Versioning data is trickier than versioning models because of its size: large data files are poorly suited to conventional version control systems such as git. Tools such as DVC help by letting users commit data files and push them to remote cloud storage using a git-like workflow. This makes it easy for collaborators to pull a specific version of the data and reproduce an analysis.
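The core idea behind this kind of workflow can be illustrated with content addressing: the large file is stored in a cache keyed by a hash of its contents, and only a small pointer file recording that hash is committed to git. Checking out an old pointer and pulling the matching object restores the exact data version. The following is a minimal sketch under stated assumptions, loosely modeled on that idea and not DVC's actual on-disk format or API. The functions `add`, `push` and `pull`, the `.ptr` pointer-file suffix, and the use of a local directory as a stand-in for a cloud storage bucket are all hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, reading in 1 MiB chunks so large data fits in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_path(store: Path, digest: str) -> Path:
    # Content-addressed layout: the first two hex characters become a subdirectory.
    return store / digest[:2] / digest[2:]


def add(data_file: Path, cache: Path) -> Path:
    """Copy the data into the local cache and write a small pointer file meant to be committed to git."""
    digest = file_md5(data_file)
    target = cache_path(cache, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")  # hypothetical pointer-file suffix
    pointer.write_text(json.dumps({"path": data_file.name, "md5": digest}))
    return pointer


def push(cache: Path, remote: Path) -> None:
    """Upload cached objects missing from the remote (a directory standing in for a bucket)."""
    for obj in cache.rglob("*"):
        if obj.is_file():
            dest = remote / obj.relative_to(cache)
            if not dest.exists():
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(obj, dest)


def pull(pointer: Path, remote: Path) -> Path:
    """Restore the exact data version recorded in a pointer file from the remote."""
    meta = json.loads(pointer.read_text())
    out = pointer.with_name(meta["path"])
    shutil.copy2(cache_path(remote, meta["md5"]), out)
    return out


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache, remote = root / "cache", root / "remote"
        data = root / "train.csv"

        data.write_text("x,y\n1,2\n")
        pointer = add(data, cache)
        v1 = pointer.read_text()  # in a real workflow this pointer is what gets committed to git

        data.write_text("x,y\n1,2\n3,4\n")  # the data changes; a new version is added
        add(data, cache)
        push(cache, remote)

        pointer.write_text(v1)  # simulates checking out the old pointer from git
        print(pull(pointer, remote).read_text())  # the first version of the data is restored
```

Because every version is keyed by its content hash, git only ever tracks the tiny pointer files, while the bulky data lives in the cache and the remote. Unchanged content is never stored or uploaded twice.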
