Model Selection

class datalib.model_selection.BootstrapSplit(n_splits=5, *, random_state=None, n_samples=None)

Bases: BaseBootstrapSplit

Bootstrap K-Folds cross-validator

Provides train/test indices to split data into bootstrapped train/test sets. The number of folds equals the number of bootstrap rounds. In each round, the train fold is a bootstrap sample of the dataset (drawn with replacement), and the test fold consists of all observations that do not appear in the train fold.
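The mechanism described above can be sketched in a few lines. This is a minimal illustration that assumes each round draws as many indices as there are samples; `bootstrap_split_sketch` and its `seed` argument are hypothetical names, not part of datalib, and the sketch uses only the standard library rather than the class's actual implementation.

```python
import random


def bootstrap_split_sketch(n_samples, n_splits=5, seed=None):
    """Yield (train, test) index lists, one pair per bootstrap round.

    Hypothetical helper for illustration; not datalib's implementation.
    """
    rng = random.Random(seed)
    for _ in range(n_splits):
        # Bootstrap sample: draw n_samples indices with replacement.
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Test fold: every index that was never drawn in this round.
        drawn = set(train)
        test = [i for i in range(n_samples) if i not in drawn]
        yield train, test


splits = list(bootstrap_split_sketch(6, n_splits=3, seed=0))
for train, test in splits:
    print("TRAIN:", train, "TEST:", test)
```

Because sampling is done with replacement, the train fold usually contains repeated indices and the test fold varies in size from round to round. In rare cases every index is drawn and the test fold is empty.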

Parameters

n_splits: int, default=5

Number of bootstrap rounds. Must be at least 2.

random_state: int, RandomState instance or None, default=None

Controls the randomness of the bootstrap sampling in each round. Pass an int for reproducible output across multiple function calls.

References

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning - Sebastian Raschka

Examples

>>> import numpy as np
>>> from datalib.model_selection import BootstrapSplit
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [3, 4], [5, 6]])
>>> y = np.array([0, 1, 0, 1, 0, 1])
>>> boot = BootstrapSplit()
>>> boot.get_n_splits(X)
5
>>> print(boot)
BootstrapSplit(n_splits=5, random_state=None)
>>> for train_index, test_index in boot.split(X, y):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [5 1 5 2 5] TEST: [0, 3, 4]
TRAIN: [3 0 0 4 2] TEST: [1, 5]
TRAIN: [2 4 0 1 2] TEST: [3, 5]
TRAIN: [4 4 3 4 4] TEST: [0, 1, 2, 5]
TRAIN: [5 4 2 1 2] TEST: [0, 3]

Notes

Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
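The effect of a fixed seed can be illustrated without the library. The sketch below uses the standard-library `random.Random` as a stand-in for the splitter's internal generator; `draw_bootstrap` is a hypothetical helper, and the way datalib consumes `random_state` internally is an assumption here.

```python
import random


def draw_bootstrap(n_samples, seed=None):
    """Draw one bootstrap sample of indices (hypothetical helper)."""
    # A fresh generator seeded with the same integer replays the same draws.
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


first = draw_bootstrap(6, seed=42)
second = draw_bootstrap(6, seed=42)
print(first == second)  # True: same seed, same sample
```

With `seed=None` the generator is seeded from a system source, so two calls will generally produce different samples.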