API Reference

class chelo.CheLoDataset(selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: ABC

Abstract base class that all CheLo datasets extend.

__init__(selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the dataset with optional selected features and targets.

Parameters:
  • selected_features – List of features to select (default: all).

  • selected_targets – List of targets to select (default: all).

abstractmethod get_dataset_info() → Dict[str, Any]

Provide metadata about the dataset (e.g., source, size, description).

get_features_shape()

Returns the shape of the dataset’s feature data.

Returns:

Tuple representing the feature data shape.

get_target_shape()

Returns the shape of the dataset’s target data.

Returns:

Tuple representing the target data shape.

list_features() → List[str]

Return the list of available features.

list_targets() → List[str]

Return the list of available targets.

abstractmethod load_data() → None

Load the dataset and populate self.raw_features and self.raw_targets.
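The two abstract methods, get_dataset_info() and load_data(), define what a concrete dataset has to supply. The following is a minimal sketch of that contract. It uses a hypothetical stand-in base class (SketchDataset, not chelo.CheLoDataset itself) that mirrors only the documented signatures. The dictionary-of-columns layout of raw_features and raw_targets and the toy values are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SketchDataset(ABC):
    """Hypothetical stand-in mirroring the documented CheLoDataset signatures."""

    def __init__(self, selected_features: Optional[List[str]] = None,
                 selected_targets: Optional[List[str]] = None) -> None:
        # Assumption: raw data is stored as {column name: list of values}.
        self.raw_features: Dict[str, List[Any]] = {}
        self.raw_targets: Dict[str, List[Any]] = {}
        self._selected_features = selected_features
        self._selected_targets = selected_targets

    @abstractmethod
    def load_data(self) -> None:
        """Populate self.raw_features and self.raw_targets."""

    @abstractmethod
    def get_dataset_info(self) -> Dict[str, Any]:
        """Return metadata about the dataset."""

    def size(self) -> int:
        # Number of samples = length of any feature column (0 if nothing is loaded).
        first_column = next(iter(self.raw_features.values()), [])
        return len(first_column)


class ToyDataset(SketchDataset):
    """Illustrative subclass backed by made-up in-memory values."""

    def load_data(self) -> None:
        self.raw_features = {"x1": [0.1, 0.2, 0.3], "x2": [1.0, 2.0, 3.0]}
        self.raw_targets = {"y": [10.0, 20.0, 30.0]}

    def get_dataset_info(self) -> Dict[str, Any]:
        return {"name": "toy", "source": "hard-coded illustration", "size": self.size()}


dataset = ToyDataset()
dataset.load_data()
info = dataset.get_dataset_info()
```

Because both methods are declared abstract, instantiating the stand-in base class directly raises TypeError; only a subclass that implements both can be created.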

preview(n: int = 5) → Dict[str, Dict[str, List[Any]]]

Preview the first n rows of the dataset.

Returns:

A dictionary containing the first n rows.
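The return type, Dict[str, Dict[str, List[Any]]], suggests a nested mapping of column names to value lists. The sketch below shows one shape consistent with that annotation. The top-level keys "features" and "targets", the helper name preview_rows, and the sample values are assumptions, not taken from the library.

```python
from typing import Any, Dict, List


def preview_rows(features: Dict[str, List[Any]], targets: Dict[str, List[Any]],
                 n: int = 5) -> Dict[str, Dict[str, List[Any]]]:
    """Return the first n values of every feature and target column."""
    return {
        # Slicing past the end of a list is safe, so short columns are returned whole.
        "features": {name: values[:n] for name, values in features.items()},
        "targets": {name: values[:n] for name, values in targets.items()},
    }


sample = preview_rows({"pH": [3.1, 3.3, 3.5, 3.2]}, {"quality": [5, 6, 5, 7]}, n=2)
```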

select_features(feature_names: List[str]) → None

Dynamically select features from the dataset.

Parameters:

feature_names – List of feature names to select.

select_targets(target_names: List[str]) → None

Dynamically select targets from the dataset.

Parameters:

target_names – List of target names to select.
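Selecting features or targets by name amounts to filtering a set of named columns. A minimal sketch of that idea follows. The helper select_columns is hypothetical, and rejecting unknown names with ValueError is an assumption; the library may handle them differently.

```python
from typing import Any, Dict, List


def select_columns(columns: Dict[str, List[Any]], names: List[str]) -> Dict[str, List[Any]]:
    """Keep only the requested columns, in the order they were requested."""
    unknown = [name for name in names if name not in columns]
    if unknown:
        # Assumption: unknown names are an error rather than being silently skipped.
        raise ValueError(f"Unknown column(s): {unknown}")
    return {name: columns[name] for name in names}


all_columns = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
chosen = select_columns(all_columns, ["c", "a"])
```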

selected_features()

Return the list of currently selected feature names.

Returns:

List of feature names.

selected_targets()

Return the list of currently selected target names.

Returns:

List of target names.

size() → int

Get the size of the dataset (number of samples).

statistics() → Dict[str, Dict[str, float]]

Compute basic statistics for the features and targets.

Returns:

A dictionary of statistics (mean, std, min, max) for each feature and target.
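To illustrate the documented return shape (one inner dictionary of mean, std, min, and max per column), here is a standard-library sketch. The helper column_statistics and the sample values are hypothetical. The use of the population standard deviation is an assumption, since the reference does not say which variant the library computes.

```python
import statistics
from typing import Dict, List


def column_statistics(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute mean, std, min, and max for every named column."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, values in columns.items():
        summary[name] = {
            "mean": statistics.fmean(values),
            # Assumption: population standard deviation (divides by N, not N - 1).
            "std": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


stats = column_statistics({"temperature": [300.0, 310.0, 320.0]})
```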

to_numpy() → Tuple[ndarray, ndarray]

Convert the dataset to NumPy arrays.

Returns:

Tuple of (features, targets) as NumPy arrays.

to_pandas() → DataFrame

Convert the dataset to a single pandas DataFrame with both features and targets.

Returns:

A pandas DataFrame containing both features and targets.

to_pytorch()

Provide a PyTorch Dataset object.

Returns:

A PyTorch Dataset containing features and targets.

class chelo.DatasetRegistry

Bases: object

A registry to manage available datasets in CheLo.

classmethod get_dataset(name: str, **kwargs: Any) → Any

Retrieve an instance of the specified dataset by name.

Parameters:
  • name – Name of the dataset to retrieve.

  • kwargs – Additional arguments to pass to the dataset constructor.

Returns:

An instance of the dataset.

Raises:

ValueError – If the dataset is not found.

classmethod list_datasets() → List[str]

List all registered datasets.

Returns:

A list of names of registered datasets.

classmethod register(dataset_cls: Type) → None

Register a dataset class with the registry.

Parameters:

dataset_cls – The dataset class to register.

Raises:

ValueError – If the dataset is already registered.
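The three class methods describe a name-to-class registry. The sketch below reproduces the documented behaviour (register, look up by name with constructor keyword arguments, list names, ValueError on duplicates and on unknown names) in a hypothetical stand-in class, SketchRegistry. Using the class name as the registry key is an assumption; the reference does not state how names are derived.

```python
from typing import Any, Dict, List, Type


class SketchRegistry:
    """Hypothetical stand-in for a name-to-class dataset registry."""

    _datasets: Dict[str, Type] = {}

    @classmethod
    def register(cls, dataset_cls: Type) -> None:
        # Assumption: the class name serves as the registry key.
        name = dataset_cls.__name__
        if name in cls._datasets:
            raise ValueError(f"Dataset '{name}' is already registered.")
        cls._datasets[name] = dataset_cls

    @classmethod
    def get_dataset(cls, name: str, **kwargs: Any) -> Any:
        if name not in cls._datasets:
            raise ValueError(f"Dataset '{name}' not found.")
        # Extra keyword arguments are forwarded to the dataset constructor.
        return cls._datasets[name](**kwargs)

    @classmethod
    def list_datasets(cls) -> List[str]:
        return sorted(cls._datasets)


class ToyRegisteredDataset:
    """Made-up dataset class used only to exercise the registry."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale


SketchRegistry.register(ToyRegisteredDataset)
toy = SketchRegistry.get_dataset("ToyRegisteredDataset", scale=2.0)
```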

class chelo.datasets.AmesMutagenicityDataset(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None)

Bases: CheLoDataset

__init__(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None) → None

Initialize the Ames Mutagenicity dataset.

Parameters:
  • selected_features – Features to select (default: all features).

  • selected_targets – Targets to select (default: all targets).

get_dataset_info() → Dict[str, str | Sequence[str]]

Retrieve metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the dataset into memory.

class chelo.datasets.BCFactorDataset(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None)

Bases: CheLoDataset

__init__(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None) → None

Initialize the Bioconcentration Factor (BCF) dataset.

Parameters:
  • selected_features – Features to select (default: all features).

  • selected_targets – Targets to select (default: all targets).

get_dataset_info() → Dict[str, str | Sequence[str]]

Retrieve metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the dataset into memory.

class chelo.datasets.CSTRDataset(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None, window: int | None = None)

Bases: CheLoDataset

__init__(selected_features: Sequence[str] | None = None, selected_targets: Sequence[str] | None = None, window: int | None = None) → None

Initialize the CSTR Dataset.

The dataset contains the concentrations of three species (A, B, and X) over time. The inlet concentrations are fixed.

Parameters:
  • selected_features – Features to select (default: all features).

  • selected_targets – Targets to select (default: all targets).

  • window – Number of previous time steps to include in each feature. The signature default is None; the docstring gives the effective default as 1.
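The window parameter implies that each sample is built from several consecutive time steps. The sketch below shows one common way to form such windows from a single time series. The helper make_windows, the choice of the next value as the target, and the concentration values are all assumptions for illustration and may not match how CSTRDataset builds its samples.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window` consecutive values with the value that follows it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    features: List[List[float]] = []
    targets: List[float] = []
    for end in range(window, len(series)):
        # Feature = the `window` values before index `end`.
        features.append(list(series[end - window:end]))
        # Assumption: the target is the next value in the series.
        targets.append(series[end])
    return features, targets


concentration_a = [1.0, 0.8, 0.65, 0.55, 0.5]  # made-up values
features, targets = make_windows(concentration_a, window=2)
```

With window=2, the five-point series yields three samples, because the first two points have no complete history before them.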

get_dataset_info() → Dict[str, str | Sequence[str]]

Retrieve metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the CSTR dataset.

class chelo.datasets.CoalFiredPlantDataset(selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: CheLoDataset

Dataset class for Coal Fired Power Plant Thermal Performance.

Provides utilities to load, process, and interact with the dataset.

__init__(selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the Coal Fired Power Plant Thermal Performance Dataset.

Parameters:
  • selected_features – List of features to select (default: all features).

  • selected_targets – List of targets to select (default: all targets).

get_dataset_info() → Dict[str, str | List[str]]

Get metadata about the dataset.

Returns:

A dictionary containing dataset metadata including name, description, features, and targets.

load_data() → None

Load the dataset from Kaggle or cache, and preprocess it.

Downloads the dataset if not already cached, removes missing values, and initializes the feature and target sets.
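The download-if-not-cached and drop-missing-values steps described above follow a common pattern, sketched here with the standard library only. Everything specific is an assumption: fake_fetch is a hypothetical stand-in for the Kaggle download, and the CSV layout, column names, and values are made up.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

fetch_calls: List[Path] = []


def fake_fetch(destination: Path) -> None:
    """Hypothetical stand-in for the real download; writes made-up rows."""
    fetch_calls.append(destination)
    destination.write_text("load,efficiency\n100,0.38\n120,\n140,0.41\n")


def load_with_cache(cache_file: Path, fetch: Callable[[Path], None]) -> List[Dict[str, str]]:
    """Fetch the file only when it is not cached, then drop incomplete rows."""
    if not cache_file.exists():
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        fetch(cache_file)
    with cache_file.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Keep a row only if every field is present and non-empty.
    return [row for row in rows if all(value not in ("", None) for value in row.values())]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache" / "plant.csv"
    first = load_with_cache(cache, fake_fetch)   # triggers the fetch
    second = load_with_cache(cache, fake_fetch)  # served from the cached file
```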

class chelo.datasets.OPSDPVDataset(country: str = 'GR', start_date: datetime | None = None, end_date: datetime | None = None, historical_window: int = 48, prediction_horizon: int = 12, prediction_window: int = 24, prediction_step: int = 6, use_future_weather: bool = False, selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: CheLoDataset

A dataset class for the Open Power System Data (OPSD) PV dataset. Provides functionality to download, process, and prepare the dataset for forecasting tasks.

__init__(country: str = 'GR', start_date: datetime | None = None, end_date: datetime | None = None, historical_window: int = 48, prediction_horizon: int = 12, prediction_window: int = 24, prediction_step: int = 6, use_future_weather: bool = False, selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the OPSD PV Dataset.

Parameters:
  • country – The country to use. Must be one of the available countries.

  • start_date – The start date of the dataset. Defaults to the earliest available data if not provided. Format: YYYY-MM-DD HH:MM:SS.

  • end_date – The end date of the dataset. Defaults to the latest available data if not provided. Format: YYYY-MM-DD HH:MM:SS.

  • historical_window – Number of time steps in the historical window for feature processing.

  • prediction_horizon – Time steps into the future for prediction targets.

  • prediction_window – The length of the prediction window.

  • prediction_step – The step size for prediction data.

  • use_future_weather – Whether to use future weather data as a feature (e.g., as a forecast).

  • selected_features – List of selected features to include.

  • selected_targets – List of selected targets to include.
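The four windowing parameters interact, so a small index calculation may help. The sketch below shows one plausible reading of them, not the library's confirmed behaviour: history covers the historical_window steps before a reference step t, and targets start prediction_horizon steps after t, span prediction_window steps, and are sampled every prediction_step steps. The helper window_indices and this interpretation are assumptions.

```python
from typing import Dict, List


def window_indices(t: int, historical_window: int = 48, prediction_horizon: int = 12,
                   prediction_window: int = 24, prediction_step: int = 6) -> Dict[str, List[int]]:
    """Return the time-step indices used as history and as targets for reference step t."""
    # Assumption: history is the `historical_window` steps immediately before t.
    history = list(range(t - historical_window, t))
    # Assumption: targets begin `prediction_horizon` steps after t and are
    # sampled every `prediction_step` steps across `prediction_window` steps.
    start = t + prediction_horizon
    targets = list(range(start, start + prediction_window, prediction_step))
    return {"history": history, "targets": targets}


idx = window_indices(t=100)
```

Under this reading, the documented defaults give 48 history steps and 4 target steps per sample.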

get_dataset_info() → Dict[str, str | List[str]]

Return metadata about the dataset.

load_data() → None

Download, process, and cache the dataset for the specified country and date range.

class chelo.datasets.VLEDataset(selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: CheLoDataset

__init__(selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the VLEDataset.

Parameters:
  • selected_features – Features to select (default: all).

  • selected_targets – Targets to select (default: all).

get_dataset_info() → Dict[str, str | List[str]]

Get metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the VLE dataset.

class chelo.datasets.VaporPressureDataset(selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: CheLoDataset

__init__(selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the vapor pressure dataset.

Parameters:
  • selected_features – Features to select (default: all).

  • selected_targets – Targets to select (default: all).

get_dataset_info() → Dict[str, str | List[str]]

Get metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the vapor pressure dataset.

class chelo.datasets.WineQualityDataset(wine_type: str = 'red', selected_features: List[str] | None = None, selected_targets: List[str] | None = None)

Bases: CheLoDataset

__init__(wine_type: str = 'red', selected_features: List[str] | None = None, selected_targets: List[str] | None = None) → None

Initialize the Wine Quality Dataset.

Parameters:
  • wine_type – Type of wine (‘red’ or ‘white’).

  • selected_features – Features to select (default: all).

  • selected_targets – Targets to select (default: all).

get_dataset_info() → Dict[str, str | List[str]]

Get metadata about the dataset.

Returns:

A dictionary containing dataset metadata.

load_data() → None

Load the dataset from the UCI repository or cache.