nanoml.data module

nanoml.data.load_dataset_flexible(dataset_path: str, *args, **kwargs)

Load a dataset, choosing the appropriate loader based on the dataset path.

Args:
    dataset_path (str): The path to the dataset.
Raises:
    Exception: If the dataset is not found.
Returns:
    datasets.Dataset: The dataset.
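The docstring does not say how the loader is chosen. As a minimal sketch, assuming dispatch first by file extension, then by local directory, then falling back to a Hugging Face Hub dataset id, the selection could look like the code below. The names choose_loader, load_sketch, and EXTENSION_TO_BUILDER are hypothetical and not part of nanoml; only datasets.load_dataset, datasets.load_from_disk, and the packaged builders "csv", "json", "parquet", and "text" come from the Hugging Face datasets library.

```python
import os

# Hypothetical extension table (an assumption, not nanoml's actual rules).
# The builder names are packaged builders shipped with Hugging Face datasets.
EXTENSION_TO_BUILDER = {
    ".csv": "csv",
    ".json": "json",
    ".jsonl": "json",
    ".parquet": "parquet",
    ".txt": "text",
}


def choose_loader(dataset_path: str) -> tuple[str, str]:
    """Return (loader_kind, target) for a path. Hypothetical helper."""
    ext = os.path.splitext(dataset_path)[1].lower()
    if ext in EXTENSION_TO_BUILDER:
        # Local data file: load it with the matching packaged builder.
        return EXTENSION_TO_BUILDER[ext], dataset_path
    if os.path.isdir(dataset_path):
        # Directory: assume it was written with Dataset.save_to_disk.
        return "load_from_disk", dataset_path
    # Anything else is treated as a Hugging Face Hub dataset id.
    return "hub", dataset_path


def load_sketch(dataset_path: str, **kwargs):
    """Dispatch to Hugging Face datasets according to choose_loader."""
    # Imported lazily so choose_loader works without datasets installed.
    from datasets import load_dataset, load_from_disk

    kind, target = choose_loader(dataset_path)
    if kind == "load_from_disk":
        return load_from_disk(target)
    # Request a single split so load_dataset returns a Dataset, not a DatasetDict.
    kwargs.setdefault("split", "train")
    if kind == "hub":
        return load_dataset(target, **kwargs)
    return load_dataset(kind, data_files=target, **kwargs)
```

With this sketch, load_sketch("data/train.csv") calls load_dataset("csv", data_files="data/train.csv", split="train"). A missing Hub dataset surfaces as whatever exception load_dataset raises, which is one way the "Raises: Exception" contract above could be met.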

nanoml.data.split_hf_dataset(dataset, val_size=0.1, test_size=0.1, **kwargs)

Split a Hugging Face dataset into train, validation, and test sets.

Args:
    dataset (datasets.Dataset): The dataset to split.
    val_size (float | int, optional): The size of the validation set. Defaults to 0.1.
    test_size (float | int, optional): The size of the test set. Defaults to 0.1.
    **kwargs: Additional keyword arguments passed to Dataset.train_test_split.
Returns:
    datasets.DatasetDict: A dictionary containing the train, validation, and test sets.