BiasReweighingClassifier
- class empulse.models.BiasReweighingClassifier(estimator, *, strategy='statistical parity', transform_feature=None)
Classifier which reweighs instances during training to remove bias against a subgroup.
Read more in the User Guide.
- Parameters:
- estimator : Estimator instance
Base estimator which is used for fitting and predicting. Base estimator must accept sample_weight as an argument in its fit method.
- strategy : {'statistical parity', 'demographic parity'} or Callable, default='statistical parity'
Determines how the sample weights are computed. Sample weights are passed to the estimator’s fit method.
  - 'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across subgroups of the sensitive feature.
  - Callable: function that computes the sample weights from the target and the sensitive feature. It accepts two arguments, y_true and sensitive_feature, and returns the sample weights as a numpy array in which each element is the weight given to the corresponding instance. Sample weights should be normalized to fall between 0 and 1.
- transform_feature : Optional[Callable], default=None
Function which transforms the sensitive feature before the sample weights are computed.
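The page does not spell out the formula behind 'statistical parity'. As a rough illustration of a Callable strategy, the sketch below assumes the reweighing scheme of Kamiran and Calders, where each instance is weighted by P(s) * P(y) / P(s, y), and then scales the weights so the largest is 1. The name reweighing_weights is hypothetical and not part of empulse; the built-in strategy may differ in its details.

```python
import numpy as np


def reweighing_weights(y_true, sensitive_feature):
    """Assumed scheme: weight = P(s) * P(y) / P(s, y), scaled so the maximum is 1."""
    y_true = np.asarray(y_true)
    sensitive_feature = np.asarray(sensitive_feature)
    weights = np.ones(len(y_true), dtype=float)
    for s in np.unique(sensitive_feature):
        for label in np.unique(y_true):
            in_group = sensitive_feature == s
            has_label = y_true == label
            cell = in_group & has_label
            if cell.any():
                # Frequency expected under independence divided by observed frequency.
                weights[cell] = in_group.mean() * has_label.mean() / cell.mean()
    # Scale into (0, 1], following the "between 0 and 1" guidance above.
    return weights / weights.max()


y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
s = np.array([1, 1, 1, 1, 0, 0, 0, 0])
w = reweighing_weights(y, s)

# Weighted share of positives per subgroup after reweighing.
rate_group_1 = np.sum(w[s == 1] * y[s == 1]) / np.sum(w[s == 1])
rate_group_0 = np.sum(w[s == 0] * y[s == 0]) / np.sum(w[s == 0])
```

In this toy data the raw positive rates are 0.75 and 0.25; after reweighing, both weighted rates coincide. A function with this signature can be passed as strategy=reweighing_weights.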
References
[1]Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159
Examples
Using the BiasReweighingClassifier with a logistic regression model:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasReweighingClassifier

    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    model = BiasReweighingClassifier(estimator=LogisticRegression())
    model.fit(X, y, sensitive_feature=high_clv)
Converting a continuous attribute to a binary attribute:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasReweighingClassifier

    X, y = make_classification()
    clv = np.random.rand(X.shape[0]) * 100

    model = BiasReweighingClassifier(
        estimator=LogisticRegression(),
        transform_feature=lambda clv: (clv > np.quantile(clv, 0.8)).astype(int)
    )
    model.fit(X, y, sensitive_feature=clv)
Using a custom strategy function:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from empulse.models import BiasReweighingClassifier

    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    # Simple strategy: halve the weight of instances whose sensitive feature is 0,
    # so instances whose sensitive feature is 1 count twice as much
    def strategy(y_true, sensitive_feature):
        sample_weights = np.ones(len(sensitive_feature))
        sample_weights[np.where(sensitive_feature == 0)] = 0.5
        return sample_weights

    model = BiasReweighingClassifier(
        estimator=LogisticRegression(),
        strategy=strategy
    )
    model.fit(X, y, sensitive_feature=high_clv)
Passing the sensitive feature in a cross-validation grid search:
    import numpy as np
    from sklearn import config_context
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from empulse.models import BiasReweighingClassifier

    with config_context(enable_metadata_routing=True):
        X, y = make_classification()
        high_clv = np.random.randint(0, 2, size=X.shape[0])

        param_grid = {'model__estimator__C': [0.1, 1, 10]}
        pipeline = Pipeline([
            ('scaler', StandardScaler()),
            ('model', BiasReweighingClassifier(LogisticRegression()).set_fit_request(sensitive_feature=True))
        ])
        search = GridSearchCV(pipeline, param_grid)
        search.fit(X, y, sensitive_feature=high_clv)
- fit(X, y, *, sensitive_feature=None, **fit_params)
Fit the estimator and reweigh the instances according to the strategy.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_features)
- y : 1D array-like, shape=(n_samples,)
- sensitive_feature : 1D array-like, shape=(n_samples,), default=None
Sensitive attribute used to determine the sample weights.
- fit_params : dict
Additional parameters passed to the estimator’s fit method.
- Returns:
- self : BiasReweighingClassifier
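As a rough mental model of the fit step, the sketch below transforms the sensitive feature, computes sample weights with a strategy callable, and forwards them to the base estimator as sample_weight. This is an assumed flow based on the parameter descriptions above, not the library's actual implementation; fit_reweighed, WeightedMajorityClassifier and downweight_group_zero are hypothetical names used only for illustration.

```python
import numpy as np


class WeightedMajorityClassifier:
    """Toy estimator (hypothetical): predicts the class with the largest total weight."""

    def fit(self, X, y, sample_weight=None):
        y = np.asarray(y)
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        sample_weight = np.asarray(sample_weight, dtype=float)
        self.classes_ = np.unique(y)
        totals = [np.sum(sample_weight[y == c]) for c in self.classes_]
        self.majority_ = self.classes_[int(np.argmax(totals))]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


def fit_reweighed(estimator, X, y, sensitive_feature, strategy, transform_feature=None):
    """Assumed flow: transform the sensitive feature, compute weights, fit with them."""
    sensitive_feature = np.asarray(sensitive_feature)
    if transform_feature is not None:
        sensitive_feature = transform_feature(sensitive_feature)
    sample_weight = strategy(np.asarray(y), sensitive_feature)
    return estimator.fit(X, y, sample_weight=sample_weight)


def downweight_group_zero(y_true, sensitive_feature):
    # Instances in group 0 get a quarter of the weight of instances in group 1.
    weights = np.ones(len(y_true))
    weights[sensitive_feature == 0] = 0.25
    return weights


X = np.zeros((5, 1))
y = np.array([0, 0, 0, 1, 1])
clv = np.array([10.0, 20.0, 30.0, 90.0, 95.0])

unweighted = WeightedMajorityClassifier().fit(X, y)
reweighed = fit_reweighed(
    WeightedMajorityClassifier(), X, y, clv,
    strategy=downweight_group_zero,
    transform_feature=lambda v: (v > 50).astype(int),
)
```

Without weights the toy estimator predicts class 0 (three instances against two); with the weights it predicts class 1, because the three class-0 instances now carry a total weight of 0.75 against 2.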
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict class labels for X.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_dim)
Features to predict.
- Returns:
- y_pred : 1D numpy.ndarray, shape=(n_samples,)
Predicted class labels.
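To check how close a set of predicted labels is to statistical parity, one option is to compare the positive-prediction rate between subgroups. The helper below is a minimal sketch that assumes binary 0/1 labels and a binary 0/1 sensitive feature; positive_rate_gap is a hypothetical name, not an empulse function.

```python
import numpy as np


def positive_rate_gap(y_pred, sensitive_feature):
    """Positive-prediction rate of group 1 minus that of group 0."""
    y_pred = np.asarray(y_pred)
    sensitive_feature = np.asarray(sensitive_feature)
    rate_group_1 = y_pred[sensitive_feature == 1].mean()
    rate_group_0 = y_pred[sensitive_feature == 0].mean()
    return rate_group_1 - rate_group_0


y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
gap = positive_rate_gap(y_pred, group)  # 0.75 - 0.25
```

A gap of 0 means both subgroups receive positive predictions at the same rate; in practice y_pred would come from model.predict(X).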
- predict_proba(X)
Predict class probabilities for X.
- Parameters:
- X : 2D array-like, shape=(n_samples, n_dim)
Features to predict.
- Returns:
- y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)
Predicted class probabilities.
- score(X, y, sample_weight=None)
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
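When sample_weight is given, mean accuracy becomes a weighted average of the per-sample correctness. The sketch below reproduces that arithmetic with numpy for single-label targets; weighted_accuracy is a hypothetical helper shown only to illustrate the calculation.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted share of correctly predicted samples (plain mean if no weights)."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])

plain = weighted_accuracy(y_true, y_pred)                   # 3 of 4 correct
weighted = weighted_accuracy(y_true, y_pred, [1, 1, 2, 1])  # the error counts double
```

Here the unweighted accuracy is 0.75, while doubling the weight of the misclassified sample lowers it to 3 / 5 = 0.6.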
- set_fit_request(*, sensitive_feature='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to fit.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sensitive_feature : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sensitive_feature parameter in fit.
- Returns:
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
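The <component>__<parameter> naming can be illustrated with a plain scikit-learn Pipeline. This sketch uses only scikit-learn classes and the step names 'scaler' and 'model' chosen here; with BiasReweighingClassifier as a step, the wrapped estimator adds one more level, as in the 'model__estimator__C' key used in the grid search example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression()),
])

# <component>__<parameter>: update C on the 'model' step and with_mean on 'scaler'.
pipeline.set_params(model__C=0.1, scaler__with_mean=False)

# get_params(deep=True) exposes the same nested names.
params = pipeline.get_params(deep=True)
```

After the call, params['model__C'] and the C attribute of the 'model' step both hold the new value.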
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to score.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.