Published: Apr 03, 2024
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Apr 2024
Trial

Comparing DataFrames is a common task in data engineering, often done to check that two data transformation approaches produce the same output, with no meaningful deviations or inconsistencies. DataComPy is a Python library that facilitates the comparison of two DataFrames in pandas, Spark and more. The library goes beyond basic equality checks by providing detailed insights into discrepancies at both row and column levels. It also lets you specify an absolute or relative tolerance when comparing numeric columns, and declare known differences that it need not highlight in its report. Some of our teams use it as part of their smoke testing suite; they find it efficient when comparing large and wide DataFrames and consider its reports easy to understand and act upon.
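
To make the idea of a tolerance-aware, row- and column-level comparison concrete, here is a minimal plain-Python sketch. It is not DataComPy's code or API: the names `values_match`, `compare_records` and `ComparisonResult`, the sample data, and the tolerance rule `|a - b| <= abs_tol + rel_tol * |b|` are all assumptions made for illustration, and it works on lists of dictionaries rather than DataFrames.

```python
from dataclasses import dataclass, field


def values_match(a, b, abs_tol=0.0, rel_tol=0.0):
    """Hypothetical helper: exact equality, or closeness for numeric values."""
    def is_number(value):
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if is_number(a) and is_number(b):
        # Assumed tolerance rule for this sketch (absolute plus relative part).
        return abs(a - b) <= abs_tol + rel_tol * abs(b)
    return a == b


@dataclass
class ComparisonResult:
    """Hypothetical result container for this sketch."""
    only_in_left: list = field(default_factory=list)   # keys missing on the right
    only_in_right: list = field(default_factory=list)  # keys missing on the left
    mismatches: dict = field(default_factory=dict)     # column -> [(key, left, right)]

    @property
    def matches(self):
        return not (self.only_in_left or self.only_in_right or self.mismatches)


def compare_records(left, right, key, abs_tol=0.0, rel_tol=0.0):
    """Join two lists of row dictionaries on `key` and collect discrepancies."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Row level: keys present on only one side.
    result = ComparisonResult(
        only_in_left=sorted(left_by_key.keys() - right_by_key.keys()),
        only_in_right=sorted(right_by_key.keys() - left_by_key.keys()),
    )

    # Column level: compare shared columns of rows present on both sides.
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        left_row, right_row = left_by_key[k], right_by_key[k]
        for column in sorted(left_row.keys() & right_row.keys()):
            if not values_match(left_row[column], right_row[column], abs_tol, rel_tol):
                result.mismatches.setdefault(column, []).append(
                    (k, left_row[column], right_row[column])
                )
    return result


# Made-up outputs of two versions of the same transformation.
old_output = [
    {"id": 1, "amount": 100.00, "status": "open"},
    {"id": 2, "amount": 250.50, "status": "closed"},
    {"id": 3, "amount": 75.25, "status": "open"},
]
new_output = [
    {"id": 1, "amount": 100.004, "status": "open"},
    {"id": 2, "amount": 251.50, "status": "closed"},
    {"id": 4, "amount": 10.00, "status": "open"},
]

result = compare_records(old_output, new_output, key="id", abs_tol=0.01)
print("match:", result.matches)
print("only in old:", result.only_in_left)
print("only in new:", result.only_in_right)
for column, rows in result.mismatches.items():
    print("mismatch in", column, rows)
```

With an absolute tolerance of 0.01, the small rounding difference on row 1 is ignored while the larger change on row 2 is reported, which is the kind of distinction a smoke test between two pipeline versions usually needs. For real DataFrames, consult DataComPy's own documentation rather than this sketch.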
