Some classification problems exhibit a large imbalance in the distribution of the target classes: for instance, there could be several times more negative samples than positive samples. In such cases it is recommended to use stratified sampling, as implemented in StratifiedKFold and StratifiedShuffleSplit, to ensure that relative class frequencies are approximately preserved in each train and validation fold.
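
As a rough illustration, the sketch below applies both splitters to a small imbalanced dataset and counts the classes on each side of every split. The data (45 negative and 5 positive samples), the variable names, and the settings `n_splits=5`, `n_splits=3`, `test_size=0.2` and `random_state=0` are assumptions chosen for this example, not recommended values.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Hypothetical toy data: 45 negative samples (class 0), 5 positive (class 1).
X = np.zeros((50, 1))
y = np.array([0] * 45 + [1] * 5)

# StratifiedKFold: every sample lands in exactly one validation fold.
skf = StratifiedKFold(n_splits=5)
kfold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count samples per class on each side of the split.
    train_counts = np.bincount(y[train_idx], minlength=2).tolist()
    test_counts = np.bincount(y[test_idx], minlength=2).tolist()
    kfold_counts.append((train_counts, test_counts))
    print("KFold   train:", train_counts, "validation:", test_counts)

# StratifiedShuffleSplit: independent random splits that may overlap.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
shuffle_counts = []
for train_idx, test_idx in sss.split(X, y):
    train_counts = np.bincount(y[train_idx], minlength=2).tolist()
    test_counts = np.bincount(y[test_idx], minlength=2).tolist()
    shuffle_counts.append((train_counts, test_counts))
    print("Shuffle train:", train_counts, "validation:", test_counts)
```

With this toy data, each training set should keep the 9:1 ratio of the full dataset (36 negatives and 4 positives), as should each validation set (9 negatives and 1 positive). On real data the class counts rarely divide evenly, so the ratios are only approximately preserved.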