BiasResamplingClassifier
- class empulse.models.BiasResamplingClassifier(estimator, *, strategy='statistical parity', transform_feature=None)
Classifier which resamples instances during training to remove bias against a subgroup.
Read more in the User Guide.
- Parameters:
- estimator : Estimator instance
Base estimator which is used for fitting and predicting.
- strategy : {'statistical parity', 'demographic parity'} or Callable, default='statistical parity'
Determines how the group weights are computed. Group weights determine how much to over- or undersample each combination of target and sensitive feature. For example, a weight of 2 for the pair (y_true == 1, sensitive_feature == 0) means that the resampled dataset should contain twice as many instances with y_true == 1 and sensitive_feature == 0 as the original dataset.
  - 'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across subgroups of the sensitive feature.
  - Callable: function which computes the group weights from the target and the sensitive feature. It accepts two arguments, y_true and sensitive_feature, and returns the group weights as a 2x2 matrix in which the rows represent the target variable and the columns represent the sensitive feature. The element at position (i, j) is the weight for the pair (y_true == i, sensitive_feature == j).
- transform_feature : Optional[Callable], default=None
Function which transforms the sensitive feature before the training data is resampled.
- Attributes:
- classes_ : numpy.ndarray, shape=(n_classes,)
Unique classes in the target.
- estimator_ : Estimator instance
Fitted base estimator.
References
[1] Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159
Examples
Using the BiasResamplingClassifier with a logistic regression model:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasResamplingClassifier

    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    model = BiasResamplingClassifier(estimator=LogisticRegression())
    model.fit(X, y, sensitive_feature=high_clv)
Converting a continuous attribute to a binary attribute:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasResamplingClassifier

    X, y = make_classification()
    clv = np.random.rand(X.shape[0]) * 100

    model = BiasResamplingClassifier(
        estimator=LogisticRegression(),
        transform_feature=lambda clv: (clv > np.quantile(clv, 0.8)).astype(int)
    )
    model.fit(X, y, sensitive_feature=clv)
Using a custom strategy function:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasResamplingClassifier

    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    # Simple strategy to double the weight for the sensitive feature
    def strategy(y_true, sensitive_feature):
        return np.array([
            [1, 2],
            [1, 2]
        ])

    model = BiasResamplingClassifier(
        estimator=LogisticRegression(),
        strategy=strategy
    )
    model.fit(X, y, sensitive_feature=high_clv)
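The custom strategy above returns fixed weights. A strategy can also derive the weights from the data. The sketch below is one possible choice, not necessarily how the built-in 'statistical parity' strategy is computed (the source does not say): each cell of the 2x2 matrix is the count expected if target and sensitive feature were independent, divided by the observed count. The name `independence_weights` is hypothetical, and both inputs are assumed to be binary 0/1 arrays.

```python
import numpy as np


def independence_weights(y_true, sensitive_feature):
    """Hypothetical strategy: expected / observed count per (target, group) cell."""
    y_true = np.asarray(y_true).astype(int)
    sensitive_feature = np.asarray(sensitive_feature).astype(int)
    n = y_true.shape[0]

    # Rows index the target, columns index the sensitive feature.
    weights = np.ones((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            observed = np.sum((y_true == i) & (sensitive_feature == j))
            # Count expected in this cell if target and group were independent.
            expected = np.sum(y_true == i) * np.sum(sensitive_feature == j) / n
            if observed > 0:  # leave the weight at 1 for empty cells
                weights[i, j] = expected / observed
    return weights


# Toy data: positives are concentrated in group 1.
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
s = np.array([1, 1, 1, 1, 0, 0, 0, 0])
w = independence_weights(y, s)
```

Cells that are over-represented relative to independence receive a weight below 1 (undersampled), and under-represented cells receive a weight above 1 (oversampled). Such a function could be passed as `strategy=independence_weights`, following the calling convention documented for the Callable option.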
Passing the sensitive feature in a cross-validation grid search:
    import numpy as np
    from sklearn import config_context
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from empulse.models import BiasResamplingClassifier

    with config_context(enable_metadata_routing=True):
        X, y = make_classification()
        high_clv = np.random.randint(0, 2, size=X.shape[0])

        param_grid = {'model__estimator__C': [0.1, 1, 10]}
        pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('model', BiasResamplingClassifier(LogisticRegression()).set_fit_request(sensitive_feature=True))
        ])
        search = GridSearchCV(pipeline, param_grid)
        search.fit(X, y, sensitive_feature=high_clv)
- fit(X, y, *, sensitive_feature=None, **fit_params)
Resample the training instances according to the strategy and fit the estimator on the resampled data.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_dim)
Training data.
- y : 1D array-like, shape=(n_samples,)
Target values.
- sensitive_feature : 1D array-like, shape=(n_samples,), default=None
Sensitive feature used to determine the group weights.
- **fit_params : dict
Additional parameters passed to the estimator’s fit method.
- Returns:
- self : BiasResamplingClassifier
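To make the resampling step concrete, here is a minimal sketch of how a 2x2 group-weight matrix could be turned into row indices for a resampled training set. It is an illustration under stated assumptions, not the library's implementation: the helper name `resample_indices`, the rounding of target counts, and the choice to oversample with replacement and undersample without replacement are all hypothetical.

```python
import numpy as np


def resample_indices(y_true, sensitive_feature, group_weights, seed=0):
    """Hypothetical helper: pick row indices so each cell is scaled by its weight."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true).astype(int)
    sensitive_feature = np.asarray(sensitive_feature).astype(int)

    selected = []
    for i in (0, 1):
        for j in (0, 1):
            cell = np.flatnonzero((y_true == i) & (sensitive_feature == j))
            if cell.size == 0:
                continue
            # A weight of 2 doubles the cell, a weight of 0.5 halves it.
            target = int(round(group_weights[i, j] * cell.size))
            # Oversampling needs replacement; undersampling does not.
            replace = target > cell.size
            selected.append(rng.choice(cell, size=target, replace=replace))
    return np.concatenate(selected)


y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
s = np.array([1, 1, 1, 1, 0, 0, 0, 0])
weights = np.array([[2 / 3, 2.0], [2.0, 2 / 3]])

idx = resample_indices(y, s, weights)
y_res, s_res = y[idx], s[idx]
```

With these toy weights every (target, group) cell ends up with two rows, so the share of positive targets is the same in both groups of the resampled data. A base estimator would then be fitted on the rows selected by `idx`.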
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict class labels for X.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_dim)
- Returns:
- y_pred : 1D numpy.ndarray, shape=(n_samples,)
Predicted class labels.
- predict_proba(X)
Predict class probabilities for X.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_dim)
- Returns:
- y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)
Predicted class probabilities.
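As a rough illustration of how the two outputs usually relate, the sketch below maps a probability matrix of shape (n_samples, n_classes) to class labels by taking the most probable column and looking it up in the class array. This follows the common scikit-learn convention and is an assumption here; the wrapped base estimator decides how its own predict behaves. The arrays are made-up values.

```python
import numpy as np

# Stand-ins for model.classes_ and model.predict_proba(X); values are invented.
classes = np.array([0, 1])
proba = np.array([
    [0.90, 0.10],
    [0.35, 0.65],
    [0.50, 0.50],
])

# Column k of proba corresponds to classes[k]; ties resolve to the first class.
labels = classes[np.argmax(proba, axis=1)]
```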
- score(X, y, sample_weight=None)
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
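The effect of sample_weight on the mean accuracy can be sketched with plain NumPy. The helper name `weighted_accuracy` is hypothetical and the labels are invented; the computation is a weighted average of the per-sample correctness indicator.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: (weighted) share of correctly predicted samples."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    # With weights=None this reduces to the plain mean.
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])  # third sample is misclassified

unweighted = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```

Giving the misclassified sample twice the weight lowers the score from 3/4 to 3/5.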
- set_fit_request(*, sensitive_feature='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to fit.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sensitive_feature : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sensitive_feature parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
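The <component>__<parameter> naming can be illustrated with a plain scikit-learn Pipeline. This sketch deliberately uses LogisticRegression directly as a stand-in so that it runs without empulse; the step names 'scaler' and 'model' are arbitrary.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Double underscores address a parameter of a named component.
pipeline.set_params(model__C=0.1, scaler__with_mean=False)

# get_params(deep=True) exposes the nested parameters under the same keys.
params = pipeline.get_params(deep=True)
```

When the 'model' step is a BiasResamplingClassifier wrapping LogisticRegression, the same parameter sits one level deeper, which is why the grid search example in this page uses the key model__estimator__C.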
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to score.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.