BiasReweighingClassifier

class empulse.models.BiasReweighingClassifier(estimator, *, strategy='statistical parity', transform_feature=None)

Classifier which reweighs instances during training to remove bias against a subgroup.

Read more in the User Guide.

Parameters:
estimator : Estimator instance

Base estimator used for fitting and predicting. It must accept sample_weight as an argument in its fit method.

strategy : {'statistical parity', 'demographic parity'} or Callable, default='statistical parity'

Determines how the sample weights are computed. Sample weights are passed to the estimator’s fit method.

  • 'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across the subgroups of the sensitive feature.

  • Callable: function that computes the sample weights from the target and the sensitive feature. The callable accepts two arguments, y_true and sensitive_feature, and returns the sample weights as a numpy array in which each element is the weight given to the corresponding instance. Sample weights should be normalized to fall between 0 and 1.

transform_feature : Optional[Callable], default=None

Function that transforms the sensitive feature before the sample weights are computed.
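The exact formula behind the built-in 'statistical parity' strategy is not given on this page. To illustrate the Callable contract, the sketch below implements the classic reweighing scheme of Kamiran and Calders, w(s, y) = P(S=s) · P(Y=y) / P(S=s, Y=y), and rescales the result so that all weights fall in (0, 1]. The function name reweighing_strategy is hypothetical, and the scheme is an assumption, not necessarily what empulse computes internally. Under these weights the weighted positive rate is identical in every subgroup.

```python
import numpy as np


def reweighing_strategy(y_true, sensitive_feature):
    """Hypothetical custom strategy: Kamiran-Calders style reweighing.

    Assumption: this is an illustrative scheme, not empulse's built-in one.
    Each (group, label) cell gets weight P(s) * P(y) / P(s, y).
    """
    y_true = np.asarray(y_true)
    sensitive_feature = np.asarray(sensitive_feature)
    weights = np.zeros(len(y_true), dtype=float)
    for s in np.unique(sensitive_feature):
        for y in np.unique(y_true):
            mask = (sensitive_feature == s) & (y_true == y)
            p_joint = mask.mean()
            if p_joint == 0:
                continue  # empty cell: no instances to weight
            p_s = (sensitive_feature == s).mean()
            p_y = (y_true == y).mean()
            weights[mask] = p_s * p_y / p_joint
    # Rescale so the largest weight is 1, keeping every weight in (0, 1]
    return weights / weights.max()
```

Such a function would be passed as BiasReweighingClassifier(estimator=LogisticRegression(), strategy=reweighing_strategy). Dividing by the maximum keeps the relative weights, and therefore the parity property, unchanged.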

References

[1] Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159

Examples

  1. Using the BiasReweighingClassifier with a logistic regression model:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasReweighingClassifier

X, y = make_classification()
high_clv = np.random.randint(0, 2, size=X.shape[0])

model = BiasReweighingClassifier(estimator=LogisticRegression())
model.fit(X, y, sensitive_feature=high_clv)

  2. Converting a continuous attribute to a binary attribute:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasReweighingClassifier

X, y = make_classification()
clv = np.random.rand(X.shape[0]) * 100

model = BiasReweighingClassifier(
    estimator=LogisticRegression(),
    transform_feature=lambda clv: (clv > np.quantile(clv, 0.8)).astype(int)
)
model.fit(X, y, sensitive_feature=clv)

  3. Using a custom strategy function:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasReweighingClassifier

X, y = make_classification()
high_clv = np.random.randint(0, 2, size=X.shape[0])

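# Simple strategy: instances in group 1 of the sensitive feature get twice
# the weight of instances in group 0 (all weights stay between 0 and 1)
def strategy(y_true, sensitive_feature):
    sensitive_feature = np.asarray(sensitive_feature)
    sample_weights = np.ones(len(sensitive_feature))
    sample_weights[sensitive_feature == 0] = 0.5
    return sample_weights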

model = BiasReweighingClassifier(
    estimator=LogisticRegression(),
    strategy=strategy
)
model.fit(X, y, sensitive_feature=high_clv)

  4. Passing the sensitive feature in a cross-validation grid search:

import numpy as np
from sklearn import config_context
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from empulse.models import BiasReweighingClassifier

with config_context(enable_metadata_routing=True):
    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    param_grid = {'model__estimator__C': [0.1, 1, 10]}
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', BiasReweighingClassifier(LogisticRegression()).set_fit_request(sensitive_feature=True))
    ])
    search = GridSearchCV(pipeline, param_grid)
    search.fit(X, y, sensitive_feature=high_clv)

fit(X, y, *, sensitive_feature=None, **fit_params)

Compute sample weights for the training instances according to the strategy, then fit the base estimator with those weights.

Parameters:
X : 2D array-like, shape=(n_samples, n_features)

Training features.

y : 1D array-like, shape=(n_samples,)

Target values.

sensitive_feature : 1D array-like, shape=(n_samples,), default=None

Sensitive attribute used to determine the sample weights.

fit_params : dict

Additional parameters passed to the estimator’s fit method.

Returns:
self : BiasReweighingClassifier
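The steps described for fit can be pictured with the simplified wrapper below. It is a hypothetical sketch, not empulse's source: the class name SimpleReweighingClassifier is illustrative, and it assumes that transform_feature is applied first, that the strategy's output is passed to a fresh copy of the base estimator as sample_weight, and that the estimator is fitted without weights when no sensitive feature is given.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression


class SimpleReweighingClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical, simplified stand-in illustrating the fit flow."""

    def __init__(self, estimator, strategy, transform_feature=None):
        self.estimator = estimator
        self.strategy = strategy
        self.transform_feature = transform_feature

    def fit(self, X, y, sensitive_feature=None, **fit_params):
        sample_weight = None  # assumption: unweighted fit without a sensitive feature
        if sensitive_feature is not None:
            sensitive_feature = np.asarray(sensitive_feature)
            if self.transform_feature is not None:
                # e.g. binarize a continuous attribute before weighting
                sensitive_feature = self.transform_feature(sensitive_feature)
            sample_weight = self.strategy(np.asarray(y), sensitive_feature)
        # Fit a fresh copy of the base estimator with the computed weights
        self.estimator_ = clone(self.estimator).fit(
            X, y, sample_weight=sample_weight, **fit_params
        )
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)
```

For example, SimpleReweighingClassifier(LogisticRegression(), strategy=lambda y_true, s: np.where(s == 1, 1.0, 0.5)).fit(X, y, sensitive_feature=group) fits the logistic regression with weight 1.0 for group 1 and 0.5 for group 0.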

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class labels for X.

Parameters:
X : 2D array-like, shape=(n_samples, n_features)

Features to predict.

Returns:
y_pred : 1D numpy.ndarray, shape=(n_samples,)

Predicted class labels.

predict_proba(X)

Predict class probabilities for X.

Parameters:
X : 2D array-like, shape=(n_samples, n_features)

Features to predict.

Returns:
y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)

Predicted class probabilities.

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
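This description matches scikit-learn's ClassifierMixin.score, which (presumably inherited here) computes accuracy_score(y, self.predict(X), sample_weight=sample_weight). The sketch below shows, on made-up toy arrays, how sample_weight changes the result: each correct prediction contributes its weight, and the total is divided by the sum of all weights.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])             # third sample is misclassified
weights = np.array([1.0, 1.0, 2.0, 1.0, 1.0])  # ...and counts double

# Unweighted accuracy: fraction of correct predictions (4 of 5)
plain = np.mean(y_pred == y_true)

# Weighted accuracy: weight of correct predictions / total weight (4 of 6)
weighted = np.average(y_pred == y_true, weights=weights)

# Both agree with scikit-learn's accuracy_score
print(plain, accuracy_score(y_true, y_pred))
print(weighted, accuracy_score(y_true, y_pred, sample_weight=weights))
```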

set_fit_request(*, sensitive_feature='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sensitive_feature : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sensitive_feature parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
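The <component>__<parameter> syntax is what the 'model__estimator__C' key in the grid-search example relies on: the base estimator of BiasReweighingClassifier is stored under its estimator parameter, so model.set_params(estimator__C=10) should reach the wrapped LogisticRegression. The sketch below shows the same mechanism on a plain scikit-learn Pipeline, so it runs without empulse.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])

# <component>__<parameter>: set C on the step named "model"
pipe.set_params(model__C=10.0)

# deep=True also lists the parameters of every nested step
params = pipe.get_params(deep=True)
print(params["model__C"])  # 10.0
```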

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.