OpenLineage

Technology Radar

Published : Apr 26, 2023

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more

Apr 2023

Assess

OpenLineage is an open standard for lineage metadata collection for data pipelines, designed to instrument jobs as they're running. It defines a generic model of run, job and data set entities using consistent naming conventions. The core lineage model is extensible by defining specific facets to enrich those entities. OpenLineage solves the interoperability problem between producers and consumers of lineage data who otherwise would need to know how to speak to each other in various ways. Although there is a risk of it being another "standard in the middle," being a Linux Foundation AI & Data Foundation project increases its chances of gaining widespread adoption. OpenLineage currently supports data collection for multiple platforms, such as Spark, Airflow and dbt, although users need to configure its listeners. Support for OpenLineage data consumers is more limited at this time.