Base class

This part of the project documentation takes a task-oriented approach. Use it as a guide to implementing your own Waterfall class.

Abstract methods

The PandasWaterfall and SparkWaterfall classes inherit from the Waterfall base class. The base Waterfall class has two abstract methods that must be implemented in any class that inherits from it.
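As a rough illustration of what "abstract" means here, the sketch below uses a simplified stand-in for the base class (the constructor signature and the `IncompleteWaterfall` name are assumptions for illustration, not the package's actual code). It shows that Python's `abc` module refuses to instantiate a subclass that leaves either method unimplemented.

```python
import abc
from typing import List


class Waterfall(abc.ABC):
    """Simplified stand-in for the package's base class (assumption, not the real code)."""

    def __init__(self, columns=None, distinct_columns=None):
        self.columns = columns or []
        self.distinct_columns = distinct_columns or []

    @abc.abstractmethod
    def _count_or_distinct(self, table) -> List[int]:
        """Counts including NaNs."""

    @abc.abstractmethod
    def _count_or_distinct_dropna(self, table) -> List[int]:
        """Counts excluding NaNs."""


class IncompleteWaterfall(Waterfall):
    """Hypothetical subclass that implements only one of the two methods."""

    def _count_or_distinct(self, table) -> List[int]:
        return []


try:
    IncompleteWaterfall()
    instantiated = True
except TypeError:
    # abc raises TypeError because _count_or_distinct_dropna is still abstract
    instantiated = False
```

Implementing both methods in the subclass is what makes it instantiable.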

_count_or_distinct

Column count for self.columns and distinct column count for self.distinct_columns including NaNs.

Source code in waterfall_logging/log.py
@abc.abstractmethod
def _count_or_distinct(self, table) -> List[int]:
    """Column count for `self.columns` and distinct column count for `self.distinct_columns` including NaNs."""

_count_or_distinct_dropna

Column count for self.columns and distinct column count for self.distinct_columns excluding NaNs.

Source code in waterfall_logging/log.py
@abc.abstractmethod
def _count_or_distinct_dropna(self, table) -> List[int]:
    """Column count for `self.columns` and distinct column count for `self.distinct_columns` excluding NaNs."""
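To make the two contracts concrete, here is a minimal sketch of a custom backend whose tables are plain lists of row dictionaries. Everything in it is an assumption made for illustration: the stand-in `Waterfall` base class, the `ListWaterfall` name, the `_is_missing` helper, and the interpretation that "including NaNs" means a plain row count for `self.columns` and that all missing values collapse into one distinct value for `self.distinct_columns`. Check the PandasWaterfall source for the package's actual behavior before relying on it.

```python
import abc
import math
from typing import Any, Dict, List


class Waterfall(abc.ABC):
    """Simplified stand-in for the package's base class (assumption, not the real code)."""

    def __init__(self, columns: List[str], distinct_columns: List[str]):
        self.columns = columns
        self.distinct_columns = distinct_columns

    @abc.abstractmethod
    def _count_or_distinct(self, table) -> List[int]:
        """Counts including NaNs."""

    @abc.abstractmethod
    def _count_or_distinct_dropna(self, table) -> List[int]:
        """Counts excluding NaNs."""


def _is_missing(value: Any) -> bool:
    """Hypothetical helper: treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class ListWaterfall(Waterfall):
    """Hypothetical backend: a table is a list of row dicts."""

    def _count_or_distinct(self, table: List[Dict[str, Any]]) -> List[int]:
        # Including NaNs: every row counts, whatever the column holds.
        counts = [len(table) for _ in self.columns]
        distinct = []
        for col in self.distinct_columns:
            values = [row.get(col) for row in table]
            present = {v for v in values if not _is_missing(v)}
            # Assumption: all missing values count as one extra distinct value.
            has_missing = any(_is_missing(v) for v in values)
            distinct.append(len(present) + (1 if has_missing else 0))
        return counts + distinct

    def _count_or_distinct_dropna(self, table: List[Dict[str, Any]]) -> List[int]:
        # Excluding NaNs: only rows with a non-missing value count.
        counts = [
            sum(not _is_missing(row.get(col)) for row in table)
            for col in self.columns
        ]
        distinct = [
            len({row.get(col) for row in table if not _is_missing(row.get(col))})
            for col in self.distinct_columns
        ]
        return counts + distinct


rows = [
    {"user_id": 1, "country": "NL"},
    {"user_id": 2, "country": None},
    {"user_id": None, "country": "NL"},
    {"user_id": 2, "country": float("nan")},
]
waterfall = ListWaterfall(columns=["user_id"], distinct_columns=["country"])
with_nans = waterfall._count_or_distinct(rows)            # [4, 2]
without_nans = waterfall._count_or_distinct_dropna(rows)  # [3, 1]
```

Both methods return one integer per entry in `self.columns` followed by one per entry in `self.distinct_columns`, so the two result lists always have the same length and order.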

Contributions

If you find a better way to implement these methods (for instance, a computationally less expensive way in Spark), please contribute to this package on GitHub!