cascade.utils.nlp

class cascade.utils.nlp.TextClassificationFolder(path: str, encoding: str = 'utf-8', *args: Any, **kwargs: Any)

Dataset that simplifies loading data for text classification. Texts of each class should be placed in a separate folder.

__init__(path: str, encoding: str = 'utf-8', *args: Any, **kwargs: Any) → None
Parameters:
  • path (str) – Path to the root folder that contains the folders of text files. Each folder should contain texts of only one class.

  • encoding (str, optional) – Encoding used to open the files. Defaults to 'utf-8'.
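
The folder layout described above can be sketched as follows. This is a minimal illustration, not part of the library: the class names `positive` and `negative`, the file names, and the helpers `build_example_layout`, `count_text_files`, and `load_with_cascade` are hypothetical. `count_text_files` only mirrors the documented meaning of `__len__` (total number of files), and `load_with_cascade` assumes cascade is installed and passes only the arguments shown in the signature above.

```python
import tempfile
from pathlib import Path


def build_example_layout(root: Path) -> None:
    """Create a toy layout: one subfolder per class, one text file per sample."""
    # Hypothetical class names and texts, chosen only for illustration.
    samples = {
        "positive": ["Great product.", "Works as described."],
        "negative": ["Stopped working after a week."],
    }
    for class_name, texts in samples.items():
        class_dir = root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i, text in enumerate(texts):
            (class_dir / f"{i}.txt").write_text(text, encoding="utf-8")


def count_text_files(root: Path) -> int:
    """Stand-in for the documented __len__: count files across class folders."""
    return sum(
        1
        for class_dir in root.iterdir()
        if class_dir.is_dir()
        for item in class_dir.iterdir()
        if item.is_file()
    )


def load_with_cascade(root: Path):
    """Build the dataset; assumes cascade is installed (not run in this sketch)."""
    from cascade.utils.nlp import TextClassificationFolder

    return TextClassificationFolder(str(root), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_example_layout(root)
        print(count_text_files(root))
```

With cascade available, `load_with_cascade(root)` would be called on the same root folder; how labels are derived from the folder names is not stated here, so the sketch makes no claim about it.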

__len__() → int

Total number of files.

get_meta() → List[Dict[Any, Any]]
Returns:

meta – A list whose last element is this dataset’s metadata. Meta can be anything worth documenting about the dataset and its data. It is returned as a list to enable cascade-like calls in Modifiers and Samplers.

Return type:

Meta
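
The list-shaped return value can be illustrated with a small sketch. `ToyDataset` and `ToyModifier` are hypothetical stand-ins, not cascade classes, and the metadata keys are invented. The sketch only assumes that a wrapper puts its own entry in front of the wrapped dataset's list, which keeps the dataset's metadata as the last element, as described above.

```python
from typing import Any, Dict, List


class ToyDataset:
    """Hypothetical stand-in for a dataset that documents itself."""

    def get_meta(self) -> List[Dict[Any, Any]]:
        # A single-element list: this dataset's own metadata.
        return [{"name": "ToyDataset", "len": 3}]


class ToyModifier:
    """Hypothetical stand-in for a wrapper around another dataset."""

    def __init__(self, dataset: Any) -> None:
        self._dataset = dataset

    def get_meta(self) -> List[Dict[Any, Any]]:
        # The wrapper's entry goes first; the wrapped dataset's entries follow,
        # so the innermost dataset's metadata stays at the end of the list.
        return [{"name": "ToyModifier"}] + self._dataset.get_meta()


meta = ToyModifier(ToyDataset()).get_meta()
for entry in meta:
    print(entry)
```

Because each layer returns a list, the chained call yields one entry per stage of the pipeline rather than a single merged dictionary.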