Introducing Agile Analytics
In Agile, the best measure of progress is seeing the data warehousing / business intelligence (DW / BI) system running as early as possible. This is achieved by dividing the work into iterations.
Iterations are usually two to four weeks long, and at the end of each one the results are presented in a demonstration known as a showcase. But how do you build a DW / BI system using agile practices?

DW / BI systems commonly have little or no versioning. It is essential to use a version control system (also called source code management, or SCM) to version all artifacts created during the project. Some of the most widely used open source tools are Git, SVN, and CVS.
The following base structure can be used as a reference for versioning project artifacts:

dw_bi_system              Data Warehousing / Business Intelligence system
├── doc                   System documentation
├── provisioning          Environment provisioning code
└── src                   System source code
    ├── apps              BI application code
    ├── data              Static system data (.csv, .txt, .xml, .sql)
    ├── db_migrations     SQL scripts for database change management
    ├── etls              ETL code (data warehousing)
    ├── reports           Dashboard and report source files
    └── schemas           Metadata schemas or models
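As a rough illustration, the reference structure could be created with a small Python script before the first commit. This is only a sketch: the `scaffold` function and the `.gitkeep` placeholder files are assumptions added here (Git does not track empty directories), not part of the original structure.

```python
from pathlib import Path

# Directory layout copied from the reference structure.
LAYOUT = [
    "doc",
    "provisioning",
    "src/apps",
    "src/data",
    "src/db_migrations",
    "src/etls",
    "src/reports",
    "src/schemas",
]


def scaffold(root: Path) -> list[Path]:
    """Create the reference layout under root and return the directories."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        # exist_ok=True makes the script safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        # Hypothetical placeholder so the empty directory can be committed.
        (path / ".gitkeep").touch()
        created.append(path)
    return created
```

Calling `scaffold(Path("dw_bi_system"))` from an empty working directory would produce the tree, ready to be added to the chosen version control system.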
One of the success factors in agile is automating repetitive tasks, so that development teams can focus on work that adds value to the DW / BI system.
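One repetitive task that lends itself to automation is applying the SQL scripts kept in `db_migrations`. The sketch below is a minimal example under several assumptions that are not in the original article: SQLite stands in for the real database server, scripts are applied in filename order (so names such as `001_...sql` are assumed), and the `schema_migrations` bookkeeping table is a hypothetical name.

```python
import sqlite3
from pathlib import Path


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply every .sql file not yet recorded, in filename order.

    Returns the names of the scripts applied during this call.
    """
    # Hypothetical bookkeeping table that remembers which scripts have run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    already = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}

    applied = []
    for script in sorted(migrations_dir.glob("*.sql")):
        if script.name in already:
            continue  # skip scripts applied in an earlier run
        conn.executescript(script.read_text())
        conn.execute(
            "INSERT INTO schema_migrations (name) VALUES (?)", (script.name,)
        )
        conn.commit()
        applied.append(script.name)
    return applied
```

Because applied scripts are recorded, the function can be run at every deployment and only new scripts are executed. A real project would more likely use a dedicated database migration tool, but the idea is the same.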
For the environments where the DW / BI system will run, automation means writing code that provisions the operating system, base software, database server, settings, tools, and so on. This is usually called Infrastructure as Code (IaC). Ansible is a tool that lets you describe environment provisioning in YAML, and it works together with Vagrant, which manages virtual machine environments.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.