For organizations using Azure as their primary cloud provider, Azure Data Factory is currently the default tool for orchestrating data-processing pipelines. It supports data ingestion, copying data between different storage types on-premises or on Azure, and executing transformation logic. Although our experience with Azure Data Factory has been adequate for simple migrations of data stores from on-premises to the cloud, we discourage using it to orchestrate complex data-processing pipelines and workflows. We've had some success with Azure Data Factory when it's used primarily to move data between systems. For more complex data pipelines, it still has several challenges. Debuggability and error reporting are poor. Observability is limited because Azure Data Factory's logging doesn't integrate with other products such as Azure Data Lake Storage or Databricks, which makes end-to-end observability difficult to put in place. And data source triggering mechanisms are available only in certain regions. At this time, we encourage using open-source orchestration tools such as Airflow for complex data pipelines and limiting Azure Data Factory to data copying or snapshotting. Our teams continue to use Data Factory to move and extract data, but for larger operations we recommend other, more well-rounded workflow tools.
Azure Data Factory (ADF) is currently Azure's default product for orchestrating data-processing pipelines. It supports data ingestion, copying data between different storage types on-premises or on Azure, and executing transformation logic. While our experience with ADF has been reasonable for simple migrations of data stores from on-premises to the cloud, we discourage using ADF to orchestrate complex data-processing pipelines. Our experience there has been challenging for several reasons. Only a limited set of capabilities can be implemented in code, because ADF appears to prioritize low-code platform capabilities. Debuggability and error reporting are poor. Observability is limited because ADF's logging doesn't integrate with other products such as Azure Data Lake Storage or Databricks, which makes end-to-end observability difficult to put in place. And data source triggering mechanisms are available only in certain regions. At this time, we encourage using open-source orchestration tools such as Airflow for complex data pipelines and limiting ADF to data copying or snapshotting. We hope ADF will address these concerns so that it better supports complex data-processing workflows and gives code-first access to its capabilities a higher priority.