CostSensitiveSampler
- class empulse.samplers.CostSensitiveSampler(method='rejection sampling', *, oversampling_norm=0.1, percentile_threshold=0.975, random_state=None, fp_cost=0.0, fn_cost=0.0)
Sampler which performs cost-proportionate resampling.
This method adjusts the sampling probability of each sample based on the cost of misclassification. This is done either by rejection sampling [1] or oversampling [2].
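Cost-proportionate rejection sampling [1] can be sketched in a few lines of NumPy: each sample is kept with probability proportional to its misclassification cost, so costly examples are over-represented in the resampled set. This is a minimal sketch, not empulse's implementation; the helper name `rejection_sample` and the normalization by the maximum cost are illustrative assumptions (CostSensitiveSampler itself normalizes costs using percentile_threshold).

```python
import numpy as np


def rejection_sample(X, y, costs, random_state=None):
    """Cost-proportionate rejection sampling (hypothetical helper, not part of empulse).

    Sample i is kept with probability costs[i] / max(costs).
    Assumes at least one cost is strictly positive.
    """
    rng = np.random.default_rng(random_state)
    costs = np.asarray(costs, dtype=float)
    # Acceptance probability in [0, 1]; the costliest samples are always kept.
    accept_prob = costs / costs.max()
    # One uniform draw in [0, 1) per sample; keep it if the draw is below its probability.
    keep = rng.random(len(costs)) < accept_prob
    indices = np.flatnonzero(keep)
    return X[indices], y[indices], indices


# Zero-cost samples are never kept; maximum-cost samples always are.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])
costs = np.array([0.0, 0.0, 5.0, 10.0, 10.0, 10.0])
X_re, y_re, idx = rejection_sample(X, y, costs, random_state=0)
```

Rejection sampling never duplicates rows, so the resampled set is at most as large as the input; oversampling [2] instead repeats costly rows.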
Read more in the User Guide.
- Parameters:
- method : {'rejection sampling', 'oversampling'}, default='rejection sampling'
Method to perform the cost-proportionate sampling, either 'rejection sampling' or 'oversampling'.
- oversampling_norm : float, default=0.1
Oversampling norm for the cost, used when method='oversampling'. The smaller the oversampling_norm, the more samples are generated.
- percentile_threshold : float, default=0.975
Outlier adjustment for the cost. Costs are normalized, and cost values above the percentile_threshold quantile (the 97.5th percentile by default) are set to 1.
- random_state : int or numpy.random.RandomState, optional
Random number generator seed for reproducibility.
- fp_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification. It is overwritten if fp_cost is also passed to the fit_resample method.
Note: It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit_resample method.
- fn_cost : float or array-like, shape=(n_samples,), default=0.0
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification. It is overwritten if fn_cost is also passed to the fit_resample method.
Note: It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit_resample method.
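The interplay between fp_cost, fn_cost, percentile_threshold and oversampling_norm can be sketched as follows. The sketch assumes the approach of the costcla routine this class was adapted from, and empulse's details may differ: each sample's misclassification cost is fp_cost for actual negatives and fn_cost for actual positives, costs are divided by their percentile_threshold quantile and clipped to 1, and oversampling then repeats each sample ceil(weight / oversampling_norm) times. The helpers `normalized_weights` and `oversample` are hypothetical.

```python
import numpy as np


def normalized_weights(y, fp_cost, fn_cost, percentile_threshold=0.975):
    """Per-sample misclassification weights in [0, 1] (hypothetical helper).

    Assumes the percentile_threshold quantile of the costs is positive.
    """
    y = np.asarray(y)
    # Scalars are broadcast so that every sample has its own cost.
    fp_cost = np.broadcast_to(np.asarray(fp_cost, dtype=float), y.shape)
    fn_cost = np.broadcast_to(np.asarray(fn_cost, dtype=float), y.shape)
    # An actual negative can only become a false positive, an actual positive a false negative.
    costs = np.where(y == 0, fp_cost, fn_cost)
    # Divide by the percentile_threshold quantile and clip outliers to 1.
    return np.minimum(costs / np.quantile(costs, percentile_threshold), 1.0)


def oversample(X, y, weights, oversampling_norm=0.1):
    """Repeat sample i ceil(weights[i] / oversampling_norm) times (hypothetical sketch)."""
    repeats = np.ceil(weights / oversampling_norm).astype(int)
    indices = np.repeat(np.arange(len(y)), repeats)
    return X[indices], y[indices], indices


X = np.arange(8).reshape(4, 2)
y = np.array([0, 0, 1, 1])
# False positives cost 10, false negatives cost 1.
weights = normalized_weights(y, fp_cost=10.0, fn_cost=1.0)
X_re, y_re, idx = oversample(X, y, weights)
```

In this sketch the negatives get weight 1.0 and are repeated ten times each, while the positives get weight 0.1 and appear once; halving oversampling_norm roughly doubles the output size, in line with "the smaller the oversampling_norm, the more samples are generated".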
- Attributes:
- sample_indices_ : numpy.ndarray
Indices of the samples that were selected.
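Because resampling drops or repeats rows, the indices of the selected samples are what keep other per-sample arrays (group ids, weights, instance-dependent costs) aligned with the resampled X and y. The sketch below assumes sample_indices_ holds integer positions into the original data, possibly repeated after oversampling; the arrays are made-up placeholders, and in real code the indices would come from the fitted sampler.

```python
import numpy as np

# Hypothetical stand-ins: original data plus an extra per-sample array.
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
y = np.array([0, 1, 0])
group_ids = np.array([101, 102, 103])

# Assumed form of sample_indices_: integer positions, repeats allowed.
sample_indices = np.array([0, 0, 2, 1])

# Indexing every aligned array with the same positions keeps rows in sync.
X_re = X[sample_indices]
y_re = y[sample_indices]
groups_re = group_ids[sample_indices]

# A boolean keep-mask, such as a rejection step might produce, converts to positions like this.
mask = np.array([True, False, True])
positions = np.flatnonzero(mask)  # array([0, 2])
```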
Notes
The code is modified from costcla.sampling.cost_sampling.
References
[1] B. Zadrozny, J. Langford, N. Abe, "Cost-sensitive learning by cost-proportionate example weighting", in Proceedings of the Third IEEE International Conference on Data Mining, 435-442, 2003.
[2] C. Elkan, "The foundations of cost-sensitive learning", in Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 973-978, 2001.
Examples
    import numpy as np
    from empulse.samplers import CostSensitiveSampler
    from sklearn.datasets import make_classification

    X, y = make_classification()
    # False positives are ten times as costly as false negatives.
    fp_cost = np.ones_like(y) * 10
    fn_cost = np.ones_like(y)
    sampler = CostSensitiveSampler(method='oversampling', random_state=42)
    X_re, y_re = sampler.fit_resample(X, y, fp_cost=fp_cost, fn_cost=fn_cost)
- fit(X, y, **params)
Check inputs and statistics of the sampler.
You should use fit_resample in all cases.
- Parameters:
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Data array.
- y : array-like of shape (n_samples,)
Target array.
- Returns:
- self : object
Return the instance itself.
- fit_resample(X, y, *, fp_cost=Parameter.UNCHANGED, fn_cost=Parameter.UNCHANGED)
Resample the dataset.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Data array.
- y : array-like of shape (n_samples,)
Target array.
- fp_cost : float or array-like, shape=(n_samples,), default=Parameter.UNCHANGED
Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification. If left unchanged, the fp_cost passed to the constructor is used.
- fn_cost : float or array-like, shape=(n_samples,), default=Parameter.UNCHANGED
Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification. If left unchanged, the fn_cost passed to the constructor is used.
- Returns:
- X_resampled : ndarray of shape (n_samples_new, n_features)
The array containing the resampled data.
- y_resampled : ndarray of shape (n_samples_new,)
The corresponding label of X_resampled.
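The precedence between constructor costs and fit_resample costs (a value passed to fit_resample wins, while the default falls back to the constructor value) can be sketched with a sentinel default. The `Parameter` enum and `resolve_cost` helper below are hypothetical stand-ins written for illustration, not empulse's own objects.

```python
from enum import Enum


class Parameter(Enum):
    """Hypothetical sentinel mirroring the documented default."""

    UNCHANGED = "unchanged"


def resolve_cost(init_value, call_value=Parameter.UNCHANGED):
    """Return the fit_resample value if one was given, else the __init__ value."""
    if call_value is Parameter.UNCHANGED:
        return init_value
    return call_value


# The constructor set fp_cost=0.0; per-sample costs given at resampling time take precedence.
default_cost = resolve_cost(0.0)
per_sample_cost = resolve_cost(0.0, [1.0, 5.0])
```

A dedicated sentinel keeps "not passed" distinct from every real value, including an explicit 0.0.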
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
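The input_features rules above can be summarized in a small function. This is a hypothetical sketch of the described resolution logic, not the library's code; the helper name and error messages are assumptions. Because a sampler only selects rows, the output names equal the resolved input names.

```python
import numpy as np


def resolve_feature_names(n_features_in, feature_names_in=None, input_features=None):
    """Sketch of the get_feature_names_out rules (hypothetical helper)."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit (for example DataFrame columns) are reused.
            return np.asarray(feature_names_in, dtype=object)
        # Otherwise generate x0, x1, ..., x(n_features_in - 1).
        return np.asarray([f"x{i}" for i in range(n_features_in)], dtype=object)
    input_features = np.asarray(input_features, dtype=object)
    # Explicit names must agree with the names seen during fit, if any.
    if feature_names_in is not None and not np.array_equal(input_features, feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    if len(input_features) != n_features_in:
        raise ValueError("input_features must have length n_features_in_")
    return input_features


generated = resolve_feature_names(3)
from_fit = resolve_feature_names(2, feature_names_in=["age", "income"])
```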
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_resample_request(*, fn_cost='$UNCHANGED$', fp_cost='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit_resample method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit_resample if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit_resample.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
- Parameters:
- fn_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the fn_cost parameter in fit_resample.
- fp_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the fp_cost parameter in fit_resample.
- Returns:
- self : object
The updated object.
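The four kinds of request value decide what a meta-estimator forwards to fit_resample. The pure-Python model below is a hypothetical illustration of the documented rules, not scikit-learn's routing machinery; the function name `route` and its error messages are assumptions.

```python
def route(request, provided, name):
    """Model what a meta-estimator forwards for one metadata parameter (hypothetical).

    request:  True, False, None, or an alias string, as set via set_fit_resample_request
    provided: dict of metadata the user passed to the meta-estimator
    name:     the parameter name in fit_resample, e.g. "fp_cost"
    Returns a dict holding the keyword argument to forward (possibly empty).
    """
    if request is True:
        # Requested: forwarded if supplied, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded, even if supplied.
        return {}
    if request is None:
        # Not requested, and supplying it anyway is an error.
        if name in provided:
            raise ValueError(f"{name} was passed but is not explicitly requested")
        return {}
    if isinstance(request, str):
        # Alias: supplied under the alias, forwarded under the original name.
        return {name: provided[request]} if request in provided else {}
    raise TypeError(f"invalid request value: {request!r}")
```

In real code, routing is switched on with sklearn.set_config(enable_metadata_routing=True), after which a call such as sampler.set_fit_resample_request(fp_cost=True, fn_cost=True) asks a meta-estimator to forward both cost arrays.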
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
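The <component>__<parameter> convention can be illustrated by splitting each key on its first double underscore and grouping the values per component. This hypothetical sketch shows the naming scheme only; set_params itself also validates the names and assigns the values on the estimators. The step name "sampler" is an assumption.

```python
from collections import defaultdict


def group_nested_params(params):
    """Split '<component>__<parameter>' keys into per-component dicts (hypothetical sketch)."""
    own, nested = {}, defaultdict(dict)
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # 'sampler__method' -> nested['sampler']['method']
            nested[component][sub_key] = value
        else:
            # Keys without '__' belong to the outer estimator itself.
            own[key] = value
    return own, dict(nested)


own, nested = group_nested_params(
    {"sampler__method": "oversampling", "sampler__oversampling_norm": 0.05, "memory": None}
)
```

Because only the first "__" is split, a key such as "a__b__c" is handed to component "a" as "b__c", which is how deeper nesting resolves one level at a time.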