Base class
This part of the project documentation focuses on a task-oriented approach.
Use it as a guide to implementing your own Waterfall class.
Abstract methods
The PandasWaterfall and SparkWaterfall classes inherit from the Waterfall base class. The base Waterfall class has two abstract methods that must be implemented by any class inheriting from it.
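To illustrate what the two abstract methods mean for a subclass, here is a minimal sketch built on Python's `abc` module. `BaseWaterfall`, `IncompleteWaterfall` and `ListWaterfall` are hypothetical names, not part of waterfall_logging. The sketch assumes the table is a plain list of row dictionaries with `None` marking a missing value, and that both methods return a list of counts (columns first, then distinct columns); the real base class takes more arguments and may use different signatures and return types.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # assumption: one dict per row, None marks a missing value


class BaseWaterfall(ABC):
    """Hypothetical, simplified stand-in for the Waterfall base class."""

    def __init__(self, columns: Sequence[str] = (), distinct_columns: Sequence[str] = ()):
        self.columns = list(columns)
        self.distinct_columns = list(distinct_columns)

    @abstractmethod
    def _count_or_distinct(self, table: List[Row]) -> List[int]:
        """Counts for self.columns and distinct counts for self.distinct_columns, including missing values."""

    @abstractmethod
    def _count_or_distinct_dropna(self, table: List[Row]) -> List[int]:
        """Same as _count_or_distinct, but excluding missing values."""


class IncompleteWaterfall(BaseWaterfall):
    """Implements only one of the two abstract methods, so it cannot be instantiated."""

    def _count_or_distinct(self, table: List[Row]) -> List[int]:
        return [len(table) for _ in self.columns]


class ListWaterfall(BaseWaterfall):
    """Hypothetical subclass that implements both abstract methods."""

    def _count_or_distinct(self, table: List[Row]) -> List[int]:
        # Every row is counted, and None is kept as one distinct value.
        counts = [len(table) for _ in self.columns]
        distinct = [len({row[c] for row in table}) for c in self.distinct_columns]
        return counts + distinct

    def _count_or_distinct_dropna(self, table: List[Row]) -> List[int]:
        # Rows where the column is None are skipped for both counts.
        counts = [sum(row[c] is not None for row in table) for c in self.columns]
        distinct = [
            len({row[c] for row in table if row[c] is not None})
            for c in self.distinct_columns
        ]
        return counts + distinct


table = [{"city": "Utrecht"}, {"city": None}, {"city": "Utrecht"}]
wf = ListWaterfall(columns=["city"], distinct_columns=["city"])
print(wf._count_or_distinct(table))         # [3, 2]
print(wf._count_or_distinct_dropna(table))  # [2, 1]

try:
    IncompleteWaterfall(columns=["city"])
except TypeError as exc:
    # abc refuses to instantiate a class with an unimplemented abstract method.
    print("cannot instantiate:", exc)
```

Because the methods are declared with `@abstractmethod`, forgetting either one fails at instantiation time rather than later, when a log call would first need the counts.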
_count_or_distinct
Computes the count of values in each column of self.columns and the number of distinct values in each column of self.distinct_columns, including NaNs.
Defined in waterfall_logging/log.py.
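Counting distinct values "including NaNs" has a Python-specific catch: `float("nan")` never compares equal to itself, so a plain `set` keeps separate NaN objects as separate entries and overcounts. The sketch below uses a hypothetical stand-alone helper `count_or_distinct`, assuming the table is a list of row dictionaries and that both `None` and float NaN count as missing. It maps every missing value to one sentinel before deduplicating, similar in spirit to pandas' `Series.nunique(dropna=False)`, which counts NaN as a single value.

```python
import math
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # assumption: one dict per row

_MISSING = object()  # single sentinel standing in for every missing value


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_or_distinct(
    table: List[Row], columns: Sequence[str], distinct_columns: Sequence[str]
) -> List[int]:
    """Hypothetical helper: counts and distinct counts, including missing values."""
    # Including missing values, every row contributes to the count.
    counts = [len(table) for _ in columns]
    # Collapse all missing values into one sentinel so they form a single
    # distinct value instead of one entry per NaN object.
    distinct = [
        len({_MISSING if _is_missing(row[c]) else row[c] for row in table})
        for c in distinct_columns
    ]
    return counts + distinct


table = [{"score": 1.0}, {"score": float("nan")}, {"score": float("nan")}, {"score": 1.0}]
naive = len({row["score"] for row in table})  # each NaN object is kept separately
print(naive, count_or_distinct(table, ["score"], ["score"]))  # 3 [4, 2]
```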
_count_or_distinct_dropna
Computes the count of values in each column of self.columns and the number of distinct values in each column of self.distinct_columns, excluding NaNs.
Defined in waterfall_logging/log.py.
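The NaN-excluding variant only changes which values are counted: missing entries are dropped before both the count and the distinct count, in the spirit of pandas' `Series.count()` and `Series.nunique(dropna=True)`. In this minimal sketch, `count_or_distinct_dropna` is a hypothetical stand-alone helper, and the list-of-dicts table and the `None`/NaN definition of "missing" are assumptions rather than the package's actual data model.

```python
import math
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # assumption: one dict per row


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_or_distinct_dropna(
    table: List[Row], columns: Sequence[str], distinct_columns: Sequence[str]
) -> List[int]:
    """Hypothetical helper: counts and distinct counts, excluding missing values."""
    # Only rows with a non-missing value in the column are counted.
    counts = [sum(1 for row in table if not _is_missing(row[c])) for c in columns]
    # Missing values are dropped before deduplication, so NaN never enters the set.
    distinct = [
        len({row[c] for row in table if not _is_missing(row[c])})
        for c in distinct_columns
    ]
    return counts + distinct


table = [{"score": 1.0}, {"score": float("nan")}, {"score": None}, {"score": 2.5}, {"score": 1.0}]
print(count_or_distinct_dropna(table, ["score"], ["score"]))  # [3, 2]
```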
Contributions
If you find a better way to implement these methods (for instance, a computationally cheaper approach in Spark), please contribute to this package on GitHub!