BiasRelabler
- class empulse.samplers.BiasRelabler(estimator, *, strategy='statistical parity', transform_feature=None)
Sampler which relabels instances to remove bias against a subgroup.
Read more in the User Guide.
- Parameters:
- estimator : Estimator instance
Base estimator which is used to determine the number of promotion and demotion pairs.
- strategy : {'statistical parity', 'demographic parity'} or Callable, default='statistical parity'
Determines how the group weights are computed. Group weights determine how many instances to relabel for each combination of target and sensitive_feature.
  - 'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across subgroups of the sensitive feature.
  - Callable: function which computes the number of label swaps from the target and the sensitive feature. It accepts two arguments, y_true and sensitive_feature, and returns the number of pairs to swap.
- transform_feature : Optional[Callable[[numpy.ndarray], numpy.ndarray]], default=None
Function which transforms sensitive feature before resampling the training data. The function takes in the sensitive feature in the form of a numpy.ndarray and outputs the transformed sensitive feature as a numpy.ndarray. This can be useful if you want to transform a continuous variable to a binary variable at fit time.
- Attributes:
- estimator_ : Estimator instance
Fitted estimator.
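The Callable form of strategy described above takes y_true and sensitive_feature and returns a number of pairs to swap. A minimal sketch of such a function is shown here, assuming a binary target and a binary sensitive feature. The name pairs_for_parity and the formula (gap in positive rates, scaled by the group sizes) are illustrative assumptions and are not taken from the empulse source.

```python
import numpy as np


def pairs_for_parity(y_true, sensitive_feature):
    """Hypothetical strategy: pairs to swap so both groups share one positive rate."""
    y_true = np.asarray(y_true)
    group = np.asarray(sensitive_feature).astype(bool)

    n_group = int(group.sum())
    n_rest = int((~group).sum())
    if n_group == 0 or n_rest == 0:
        # Only one subgroup is present, so there is nothing to balance.
        return 0

    # Absolute gap between the positive rates of the two subgroups.
    gap = abs(y_true[group].mean() - y_true[~group].mean())

    # Each swapped pair moves one positive label from one group to the other.
    return int(round(gap * n_group * n_rest / y_true.size))


y_demo = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0])
n_pairs_demo = pairs_for_parity(y_demo, group_demo)
```

Under the signature documented above, a function of this shape could be passed as strategy=pairs_for_parity when constructing the sampler.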
References
[1]Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159
Examples
import numpy as np
from empulse.samplers import BiasRelabler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification()
high_clv = np.random.randint(0, 2, y.shape)

sampler = BiasRelabler(LogisticRegression())
sampler.fit_resample(X, y, sensitive_feature=high_clv)
Example with passing high-clv indicator through cross-validation:
import numpy as np
from empulse.samplers import BiasRelabler
from imblearn.pipeline import Pipeline
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

set_config(enable_metadata_routing=True)

X, y = make_classification()
high_clv = np.random.randint(0, 2, y.shape)

pipeline = Pipeline([
    ('sampler', BiasRelabler(
        LogisticRegression()
    ).set_fit_resample_request(sensitive_feature=True)),
    ('model', LogisticRegression())
])

cross_val_score(pipeline, X, y, params={'sensitive_feature': high_clv})
Example with passing clv through a grid search and dynamically determining high-CLV customers from the training data:
import numpy as np
from empulse.samplers import BiasRelabler
from imblearn.pipeline import Pipeline
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

set_config(enable_metadata_routing=True)

X, y = make_classification()
clv = np.random.rand(y.size)

def to_high_clv(clv: np.ndarray) -> np.ndarray:
    return (clv > np.median(clv)).astype(np.int8)

pipeline = Pipeline([
    ('sampler', BiasRelabler(
        LogisticRegression(),
        transform_feature=to_high_clv
    ).set_fit_resample_request(sensitive_feature=True)),
    ('model', LogisticRegression())
])
param_grid = {'model__C': np.logspace(-5, 2, 10)}

grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X, y, sensitive_feature=clv)
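A variation on the to_high_clv function above, sketched with NumPy only: a small factory that builds a transform_feature callable for an arbitrary quantile. The name make_top_quantile_indicator is hypothetical. Because the threshold is computed from whatever array the callable receives, it would be derived from the data the sampler sees at fit time rather than from the full dataset.

```python
import numpy as np


def make_top_quantile_indicator(q):
    """Return a callable that flags values strictly above the q-th quantile."""

    def indicator(values: np.ndarray) -> np.ndarray:
        values = np.asarray(values, dtype=float)
        # The threshold depends only on the array passed in.
        threshold = np.quantile(values, q)
        return (values > threshold).astype(np.int8)

    return indicator


to_top_quartile = make_top_quantile_indicator(0.75)
flags = to_top_quartile(np.array([1, 2, 3, 4, 5, 6, 7, 8]))
```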
- fit(X, y, **params)
Check inputs and statistics of the sampler.
You should use fit_resample in all cases.
- Parameters:
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Data array.
- y : array-like of shape (n_samples,)
Target array.
- Returns:
- self : object
Return the instance itself.
- fit_resample(X, y, *, sensitive_feature=None)
Fit the estimator and relabel the data according to the strategy.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_features)
- y : 1D array-like, shape=(n_samples,)
- sensitive_feature : 1D array-like, shape=(n_samples,)
Sensitive feature used to determine the number of promotion and demotion pairs.
- Returns:
- X : 2D array-like, shape=(n_samples, n_features)
Original training data.
- y : np.ndarray
Relabeled target values.
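To make "promotion and demotion pairs" concrete, the following is a rough NumPy sketch of score-based relabeling in general, not the empulse implementation. It assumes a binary target, a binary sensitive feature, and scores that stand in for the base estimator's predicted probabilities. The function relabel_pairs and the tie-breaking are illustrative assumptions.

```python
import numpy as np


def relabel_pairs(y, sensitive_feature, scores, n_pairs):
    """Swap n_pairs labels between two subgroups, guided by model scores."""
    y = np.asarray(y).copy()
    group = np.asarray(sensitive_feature).astype(bool)
    scores = np.asarray(scores, dtype=float)

    # The subgroup with the lower positive rate receives the promotions.
    if y[group].mean() < y[~group].mean():
        low, high = group, ~group
    else:
        low, high = ~group, group

    # Promotion candidates: negatives in the low-rate group, highest score first.
    promote = np.flatnonzero(low & (y == 0))
    promote = promote[np.argsort(-scores[promote], kind="stable")][:n_pairs]

    # Demotion candidates: positives in the high-rate group, lowest score first.
    demote = np.flatnonzero(high & (y == 1))
    demote = demote[np.argsort(scores[demote], kind="stable")][:n_pairs]

    y[promote] = 1
    y[demote] = 0
    return y


y_before = np.array([1, 1, 1, 0, 1, 0, 0, 0])
in_group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
demo_scores = np.array([0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2])
y_after = relabel_pairs(y_before, in_group, demo_scores, n_pairs=1)
```

In this toy input the positive rates start at 0.75 and 0.25. One swapped pair brings both subgroups to 0.5, while the feature matrix would be left untouched, in line with the Returns description above.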
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters:
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns:
- feature_names_out : ndarray of str objects
Same as input features.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_fit_resample_request(*, sensitive_feature='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit_resample method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to fit_resample if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to fit_resample.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sensitive_feature : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the sensitive_feature parameter in fit_resample.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
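The <component>__<parameter> convention can be illustrated with a plain scikit-learn Pipeline as a stand-in. The step names scaler and model are arbitrary choices for this sketch. By the same convention, parameters of the estimator wrapped by the sampler would presumably be addressed with an estimator__ prefix, though that is an assumption and is not shown here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# The prefix before the double underscore selects the step,
# the suffix selects the parameter on that step.
demo_pipeline.set_params(model__C=0.1, scaler__with_mean=False)

# get_params(deep=True) reports nested parameters under the same names.
nested_params = demo_pipeline.get_params(deep=True)
```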