cascade.utils.tables#

class cascade.utils.tables.CSVDataset(csv_file_path: str, *args: Any, **kwargs: Any)[source]#

Wrapper for .csv files.

__init__(csv_file_path: str, *args: Any, **kwargs: Any) None[source]#

Passes all args and kwargs to pd.read_csv

Parameters:

csv_file_path – path to the .csv file
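A minimal usage sketch (the file name and the sep keyword are illustrative only; extra args and kwargs are forwarded to pd.read_csv):

```python
>>> from cascade.utils.tables import CSVDataset
>>> # "data.csv" and sep are hypothetical; any extra kwargs go to pd.read_csv
>>> ds = CSVDataset("data.csv", sep=",")
>>> len(ds)  # assuming TableDataset behavior: number of rows in the loaded table
```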

class cascade.utils.tables.FeatureTable(table: TableDataset | DataFrame, *args: Any, **kwargs: Any)[source]#
__init__(table: TableDataset | DataFrame, *args: Any, **kwargs: Any) None[source]#

Table dataset that makes it easy to define and compute features

Example

```python
>>> import pandas as pd
>>> from cascade.utils.tables import FeatureTable
>>> df = pd.read_csv(r'data.csv', index_col=0)
>>> df
   id  count name
0   0      1  aaa
1   1      5  bbb
2   2      0  ccc
>>> ft = FeatureTable(df)
>>> ft.get_features()
['id', 'count', ' name']
>>> ft.add_feature('square', lambda df: df['count'] * df['count'])
>>> def counts(df):
...     return df['count'] * 2, df['count'] * 3

>>> ft.add_feature(('count_2', 'count_3'), counts)
>>> ft.get_features()
['id', 'count', ' name', 'square', ('count_2', 'count_3')]
>>> ft.get_table(['count', ('count_2', 'count_3')])
   count  count_2  count_3
0      1        2        3
1      5       10       15
2      0        0        0

```

Parameters:

table (Union[TableDataset, pd.DataFrame]) – The table to wrap

add_feature(name: str | Tuple[str], func: Callable[[DataFrame], Series | Tuple[str]], *args: Any, **kwargs: Any) None[source]#
get_features() List[str | Tuple[str]][source]#

Returns the list of feature names, including any features that were previously added with add_feature

Returns:

List of feature names

Return type:

List[str]

get_meta() List[Dict[Any, Any]][source]#
Returns:

meta – A list where the last element is this dataset’s metadata. Meta can be anything that is worth documenting about the dataset and its data. It is returned as a list to enable cascade-like calls in Modifiers and Samplers.

Return type:

Meta

get_table(features: str | List[str | Tuple[str]] | None = None, dropna: bool = False) DataFrame[source]#
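get_table has no docstring here; judging by the signature and the example above, it returns a pd.DataFrame with the selected features, computing added ones as needed. A hedged sketch reusing the ft object from the example (treating dropna=True as dropping rows with missing values is an assumption):

```python
>>> ft.get_table(['count', 'square'], dropna=True)
   count  square
0      1       1
1      5      25
2      0       0
```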
class cascade.utils.tables.LargeCSVDataset(*args: Any, t=None, **kwargs: Any)[source]#
__init__(*args: Any, t=None, **kwargs: Any) None[source]#
Parameters:

t (optional) – pd.DataFrame or TableDataset to be set as table

class cascade.utils.tables.PartedTableLoader(*args: Any, t=None, **kwargs: Any)[source]#
__init__(*args: Any, t=None, **kwargs: Any) None[source]#
Parameters:

t (optional) – pd.DataFrame or TableDataset to be set as table

class cascade.utils.tables.TableDataset(*args: Any, t: DataFrame | TableDataset | None = None, **kwargs: Any)[source]#

Wrapper for ``pd.DataFrame`` objects that allows managing metadata and performing validation.

__getitem__(index: int) Series[source]#

Returns a row from the table by index

__init__(*args: Any, t: DataFrame | TableDataset | None = None, **kwargs: Any) None[source]#
Parameters:

t (optional) – pd.DataFrame or TableDataset to be set as table

__len__() int[source]#

Returns the length of the table

get_meta() List[Dict[Any, Any]][source]#
Returns:

meta – A list where the last element is this dataset’s metadata. Meta can be anything that is worth documenting about the dataset and its data. It is returned as a list to enable cascade-like calls in Modifiers and Samplers.

Return type:

Meta

to_csv(path: str, **kwargs: Any) None[source]#

Saves the table to a .csv file. Any kwargs are passed to pd.DataFrame.to_csv.
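A minimal sketch of wrapping a pd.DataFrame (the data and the output path are hypothetical; extra kwargs are forwarded to pd.DataFrame.to_csv):

```python
>>> import pandas as pd
>>> from cascade.utils.tables import TableDataset
>>> ds = TableDataset(t=pd.DataFrame({"id": [0, 1, 2], "count": [1, 5, 0]}))
>>> len(ds)
3
>>> ds[0]  # first row as a pd.Series
>>> ds.to_csv("table.csv", index=False)  # index=False is passed through to pd.DataFrame.to_csv
```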

class cascade.utils.tables.TableFilter(dataset: TableDataset, mask: List[bool], *args: Any, **kwargs: Any)[source]#

Filter for table values

__init__(dataset: TableDataset, mask: List[bool], *args: Any, **kwargs: Any) None[source]#
Parameters:
  • dataset (TableDataset) – Dataset to be filtered.

  • mask (Iterable[bool]) – Boolean mask to select values from the table.
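A minimal sketch of filtering rows with a boolean mask (the data is hypothetical; the assumption is that the mask is applied row-wise to the underlying table):

```python
>>> import pandas as pd
>>> from cascade.utils.tables import TableDataset, TableFilter
>>> ds = TableDataset(t=pd.DataFrame({"count": [1, 5, 0]}))
>>> filtered = TableFilter(ds, [True, False, True])  # keep the first and last rows
>>> len(filtered)  # expected to be 2 if the mask is applied row-wise
```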

class cascade.utils.tables.TableIterator(csv_file_path: str, *args: Any, chunk_size: int = 1000, **kwargs: Any)[source]#

Iterates over the table at the given path in chunks.

__init__(csv_file_path: str, *args: Any, chunk_size: int = 1000, **kwargs: Any) None[source]#
Parameters:
  • csv_file_path (str) – Path to the .csv file

  • chunk_size (int, optional) – Number of rows to return on each __next__ call
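A minimal sketch of reading a large .csv in chunks (the file name and chunk size are hypothetical; the standard iterator protocol is assumed):

```python
>>> from cascade.utils.tables import TableIterator
>>> it = TableIterator("large_data.csv", chunk_size=500)
>>> for chunk in it:  # assuming each chunk is a pd.DataFrame of up to 500 rows
...     print(chunk.shape)
```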