nanoml.data module

nanoml.data.load_dataset_flexible(dataset_path: str, *args, **kwargs)

Load a dataset, selecting the appropriate loader based on the dataset path.

Args:

dataset_path (str): The path to the dataset.

Raises:

Exception: If the dataset is not found.

Returns:

datasets.Dataset: The dataset.
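The idea of choosing a loader from the path can be sketched as below. This is a minimal illustration using only the standard library, not nanoml's implementation: the helper name `pick_loader_kind`, the suffix table, the returned labels, and the choice of `FileNotFoundError` are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical suffix-to-loader table (an assumption, not taken from nanoml).
SUFFIX_TO_KIND = {
    ".csv": "csv",
    ".json": "json",
    ".jsonl": "json",
    ".parquet": "parquet",
}


def pick_loader_kind(dataset_path: str) -> str:
    """Return a label for the loader that would handle `dataset_path`."""
    path = Path(dataset_path)
    if path.is_dir():
        # A local directory, e.g. one written earlier by Dataset.save_to_disk.
        return "disk"
    kind = SUFFIX_TO_KIND.get(path.suffix.lower())
    if kind is not None:
        # A recognised file type must exist locally to be loadable.
        if not path.is_file():
            raise FileNotFoundError(f"dataset file not found: {dataset_path}")
        return kind
    # Anything else is treated as a remote identifier such as "user/name".
    return "hub"
```

In this sketch, a real implementation would map each label to an actual loading call and forward any extra positional and keyword arguments to it.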

nanoml.data.split_hf_dataset(dataset, val_size=0.1, test_size=0.1, **kwargs)

Split a Hugging Face dataset into train, validation, and test sets.

Args:

dataset (datasets.Dataset): The dataset to split.
val_size (float | int, optional): The size of the validation set. Defaults to 0.1.
test_size (float | int, optional): The size of the test set. Defaults to 0.1.
**kwargs: Additional keyword arguments to pass to the train_test_split method.

Returns:

datasets.DatasetDict: A dictionary containing the train, validation, and test sets.
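The arithmetic behind a three-way split can be sketched on a plain Python list. This is a minimal sketch under stated assumptions, not nanoml's code: it assumes a float size is a fraction of the full dataset and an int size is an absolute row count, and the names `resolve_size`, `split_three_ways`, `seed`, and the dictionary keys are hypothetical.

```python
import random


def resolve_size(size, total):
    """Turn a fraction (float) or an absolute count (int) into a row count."""
    if isinstance(size, float):
        return int(round(size * total))
    return int(size)


def split_three_ways(rows, val_size=0.1, test_size=0.1, seed=0):
    """Shuffle `rows` and cut it into train / validation / test lists."""
    total = len(rows)
    n_val = resolve_size(val_size, total)
    n_test = resolve_size(test_size, total)
    if n_val + n_test >= total:
        raise ValueError("val_size + test_size must leave at least one training row")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = total - n_val - n_test
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


splits = split_three_ways(list(range(100)))
print({name: len(part) for name, part in splits.items()})
# prints {'train': 80, 'validation': 10, 'test': 10}
```

With the default sizes of 0.1 each, 80% of the rows remain for training under these assumptions. Passing integers instead, for example `val_size=2, test_size=3` on ten rows, yields splits of 5, 2, and 3 rows.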