nanoml.data module
- nanoml.data.load_dataset_flexible(dataset_path: str, *args, **kwargs)
Load a dataset, choosing the appropriate loader based on the dataset path.
- Args:
dataset_path (str): The path to the dataset.
- Raises:
Exception: If the dataset is not found.
- Returns:
datasets.Dataset: The dataset.
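The documentation does not say how the loader is chosen from the path. As an illustration only, the sketch below shows one plausible dispatch rule: map a file extension to a Hugging Face `datasets` builder name (`csv`, `json`, `parquet`, `text`), treat a directory as something saved with `save_to_disk`, and fall back to a Hub identifier. The helper `pick_loader` and its return labels are hypothetical and are not part of nanoml.

```python
import os

# Builder names that datasets.load_dataset accepts for local files.
_BUILDERS = {
    ".csv": "csv",
    ".json": "json",
    ".jsonl": "json",
    ".parquet": "parquet",
    ".txt": "text",
}


def pick_loader(dataset_path: str) -> str:
    """Hypothetical helper: name the loading strategy a path would map to.

    This is an assumed dispatch rule, not nanoml's actual implementation.
    """
    ext = os.path.splitext(dataset_path)[1].lower()
    if ext in _BUILDERS:
        # A single data file: use the matching packaged builder.
        return _BUILDERS[ext]
    if os.path.isdir(dataset_path):
        # A directory: assume it was written by Dataset.save_to_disk.
        return "load_from_disk"
    # Anything else is treated as a dataset identifier on the Hub.
    return "hub"
```

Under this assumed rule, `pick_loader("data/train.csv")` yields `"csv"`, which a caller could then pass to `datasets.load_dataset("csv", data_files=...)`.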
- nanoml.data.split_hf_dataset(dataset, val_size=0.1, test_size=0.1, **kwargs)
Split a Hugging Face dataset into train, validation, and test sets.
- Args:
- dataset (datasets.Dataset): The dataset to split.
- val_size (float | int, optional): The size of the validation set. Defaults to 0.1.
- test_size (float | int, optional): The size of the test set. Defaults to 0.1.
- **kwargs: Additional keyword arguments to pass to the train_test_split method.
- Returns:
datasets.DatasetDict: A dictionary containing the train, validation, and test sets.
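Because `val_size` and `test_size` accept either a float or an int, it can help to see how such values translate into row counts. The sketch below assumes the usual convention that a float is a fraction of the full dataset and an int is an absolute number of rows. The helper `resolve_split_sizes` is hypothetical, and its rounding (ceiling for fractions) is an assumption that may differ from what nanoml does.

```python
import math


def resolve_split_sizes(n_rows, val_size=0.1, test_size=0.1):
    """Hypothetical helper: return (n_train, n_val, n_test) row counts."""

    def to_count(size):
        # Assumption: floats are fractions of the full dataset,
        # ints are absolute row counts.
        if isinstance(size, float):
            if not 0.0 <= size < 1.0:
                raise ValueError(f"fractional size must be in [0, 1), got {size}")
            return math.ceil(size * n_rows)
        if size < 0:
            raise ValueError(f"size must be non-negative, got {size}")
        return int(size)

    n_val = to_count(val_size)
    n_test = to_count(test_size)
    n_train = n_rows - n_val - n_test
    if n_train <= 0:
        raise ValueError("validation and test sets leave no rows for training")
    return n_train, n_val, n_test
```

For a 1000-row dataset, `resolve_split_sizes(1000, 0.25, 0.25)` gives 500 training rows and 250 rows each for validation and test, while `resolve_split_sizes(1000, 50, 100)` gives 850, 50, and 100.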