Base class
This part of the project documentation focuses on a task-oriented approach.
Use it as a guide to implementing your own Waterfall class.
Abstract methods
The PandasWaterfall and SparkWaterfall classes inherit from the Waterfall base class. The base Waterfall class has two abstract methods that must be implemented by any class inheriting from it.
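The contract described above can be illustrated with Python's standard abc module. This is a simplified, hypothetical stand-in: SketchWaterfall, its constructor, and the list-of-ints return layout (counts for self.columns first, then distinct counts for self.distinct_columns) are assumptions for illustration, not the package's actual Waterfall class. It shows that a subclass leaving either abstract method unimplemented cannot be instantiated, while one implementing both can. Missing values are represented by None so the sketch needs no third-party libraries.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# Hypothetical table type for this sketch: column name -> list of values.
Table = Dict[str, list]


class SketchWaterfall(ABC):
    """Hypothetical, simplified stand-in for the Waterfall base class."""

    def __init__(self, columns: Optional[List[str]] = None,
                 distinct_columns: Optional[List[str]] = None) -> None:
        self.columns = columns or []
        self.distinct_columns = distinct_columns or []

    @abstractmethod
    def _count_or_distinct(self, table: Table) -> List[int]:
        """Counts and distinct counts, including missing values."""

    @abstractmethod
    def _count_or_distinct_dropna(self, table: Table) -> List[int]:
        """Counts and distinct counts, excluding missing values."""


class IncompleteWaterfall(SketchWaterfall):
    # Implements only one of the two abstract methods.
    def _count_or_distinct(self, table: Table) -> List[int]:
        return []


class DictOfListsWaterfall(SketchWaterfall):
    """Hypothetical subclass that implements both abstract methods."""

    def _count_or_distinct(self, table: Table) -> List[int]:
        # Every entry is counted, None included; None is also a distinct value.
        counts = [len(table[c]) for c in self.columns]
        distinct = [len(set(table[c])) for c in self.distinct_columns]
        return counts + distinct

    def _count_or_distinct_dropna(self, table: Table) -> List[int]:
        # None entries are skipped for both counts and distinct counts.
        counts = [sum(v is not None for v in table[c]) for c in self.columns]
        distinct = [len({v for v in table[c] if v is not None})
                    for c in self.distinct_columns]
        return counts + distinct


try:
    IncompleteWaterfall()
except TypeError as err:
    print("cannot instantiate:", err)

wf = DictOfListsWaterfall(columns=["id"], distinct_columns=["country"])
table = {"id": [1, 2, None], "country": ["NL", None, "NL"]}
print(wf._count_or_distinct(table))         # [3, 2]
print(wf._count_or_distinct_dropna(table))  # [2, 1]
```

Real subclasses such as PandasWaterfall and SparkWaterfall operate on DataFrames instead; the dict-of-lists table only keeps this sketch dependency-free.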
_count_or_distinct
Counts the values of each column in self.columns and the distinct values of each column in self.distinct_columns, including NaNs.
Source code in waterfall_logging/log.py
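For a pandas-backed implementation, the NaN-inclusive counts could be sketched as follows. This assumes that the count for a column in self.columns is its number of rows (so NaNs are counted) and that NaN counts as one extra distinct value via Series.nunique(dropna=False). The free function count_or_distinct and its list-of-ints return layout are hypothetical, not the package's actual code.

```python
from typing import List

import numpy as np
import pandas as pd


def count_or_distinct(table: pd.DataFrame, columns: List[str],
                      distinct_columns: List[str]) -> List[int]:
    """Hypothetical NaN-inclusive counts: one int per column, counts first."""
    # len() of a Series counts every row, NaN entries included.
    counts = [len(table[c]) for c in columns]
    # dropna=False makes NaN count as a distinct value of its own.
    distinct = [int(table[c].nunique(dropna=False)) for c in distinct_columns]
    return counts + distinct


df = pd.DataFrame({"id": [1, 2, np.nan], "country": ["NL", np.nan, "NL"]})
print(count_or_distinct(df, ["id"], ["country"]))  # [3, 2]
```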
_count_or_distinct_dropna
Counts the values of each column in self.columns and the distinct values of each column in self.distinct_columns, excluding NaNs.
Source code in waterfall_logging/log.py
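The NaN-excluding variant can be sketched in pandas with Series.count(), which counts only non-NA values, and Series.nunique(dropna=True), which ignores NaN when counting distinct values. As with any sketch here, the free function count_or_distinct_dropna and its list-of-ints return layout are hypothetical, not the package's actual code.

```python
from typing import List

import numpy as np
import pandas as pd


def count_or_distinct_dropna(table: pd.DataFrame, columns: List[str],
                             distinct_columns: List[str]) -> List[int]:
    """Hypothetical NaN-excluding counts: one int per column, counts first."""
    # Series.count() returns the number of non-NA values.
    counts = [int(table[c].count()) for c in columns]
    # dropna=True (the default) leaves NaN out of the distinct count.
    distinct = [int(table[c].nunique(dropna=True)) for c in distinct_columns]
    return counts + distinct


df = pd.DataFrame({"id": [1, 2, np.nan], "country": ["NL", np.nan, "NL"]})
print(count_or_distinct_dropna(df, ["id"], ["country"]))  # [2, 1]
```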
Contributions
If you find a better way to implement these methods (for instance, a computationally less expensive approach in Spark), please contribute to this package on GitHub!