Databricks Overwatch is a Databricks Labs project that enables teams to analyze various operational metrics of Databricks workloads around cost, governance and performance with support to run what-if experiments. It's essentially a set of data pipelines that populate tables in Databricks, which can then be analyzed using tools like notebooks. Overwatch is very much a power tool; however, it's still in its early stages and it may take some effort to set it up — our use of it required Databricks solution architects to help set it up and populate a price reference table for cost calculations — but we expect adoption to get easier over time. The level of analysis made possible by Overwatch is deeper than what is allowed by cloud providers' cost analysis tools. For example, we were able to analyze the cost of job failures — recognizing that failing fast saves money compared to jobs that only fail near the final step — and break down the cost by various groupings (workspace, cluster, job, notebook, team). We also appreciated the improved operational visibility, as we could easily audit access controls around cluster configurations and analyze operational metrics like finding the longest running notebook or largest read/write volume. Overwatch can analyze historical data, but its real-time mode allows for alerting which helps you to add appropriate controls to your Databricks workloads.