BiasResamplingClassifier

class empulse.models.BiasResamplingClassifier(estimator, *, strategy='statistical parity', transform_feature=None)

Classifier which resamples instances during training to remove bias against a subgroup.

Read more in the User Guide.

Parameters:
estimator : Estimator instance

Base estimator which is used for fitting and predicting.

strategy : {'statistical parity', 'demographic parity'} or Callable, default='statistical parity'

Determines how the group weights are computed. Group weights determine how much to over- or undersample each combination of target and sensitive feature. For example, a weight of 2 for the pair (y_true == 1, sensitive_feature == 0) means that the resampled dataset should contain twice as many instances with y_true == 1 and sensitive_feature == 0 as the original dataset.

  • 'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across the subgroups of the sensitive feature.

  • Callable: function which computes the group weights from the target and the sensitive feature. The callable accepts two arguments, y_true and sensitive_feature, and returns the group weights. Group weights are a 2x2 matrix where the rows represent the target variable and the columns represent the sensitive feature. The element at position (i, j) is the weight for the pair (y_true == i, sensitive_feature == j).
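
A custom strategy can be sketched as follows. This is a minimal illustration, assuming the classic reweighing formula P(y) * P(s) / P(y, s) as the target weight for each cell; it is not necessarily how the built-in 'statistical parity' strategy computes its weights, and the function name reweighing_weights is hypothetical.

```python
import numpy as np


def reweighing_weights(y_true, sensitive_feature):
    """Return a 2x2 matrix of group weights indexed as [y_true, sensitive_feature].

    Each weight is P(y) * P(s) / P(y, s): the cell frequency expected if target
    and sensitive feature were independent, divided by the observed frequency.
    """
    y_true = np.asarray(y_true).astype(int)
    sensitive_feature = np.asarray(sensitive_feature).astype(int)

    # Observed joint counts: rows = target value, columns = sensitive feature value.
    counts = np.zeros((2, 2))
    np.add.at(counts, (y_true, sensitive_feature), 1)

    joint = counts / counts.sum()
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))

    # Empty cells keep a neutral weight of 1 to avoid dividing by zero.
    return np.divide(expected, joint, out=np.ones_like(joint), where=joint > 0)


# Toy data: positives are overrepresented in group 0 and underrepresented in group 1.
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
weights = reweighing_weights(y, s)
print(weights)
```

Following the Callable contract described above, such a function could be passed as strategy=reweighing_weights. With these weights, every (target, sensitive feature) cell of the toy data ends up with the same weighted size, so the weighted positive rate is identical in both subgroups.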

transform_feature : Optional[Callable], default=None

Function which transforms the sensitive feature before the training data is resampled, for example to convert a continuous attribute into a binary one.
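
As an illustration, a transform for a categorical attribute might look like the sketch below. It assumes the callable receives the array passed as sensitive_feature and should return a binary array; the segment labels and the function name is_small_business are hypothetical.

```python
import numpy as np


def is_small_business(segment):
    """Map hypothetical customer segment labels to a binary sensitive feature."""
    segment = np.asarray(segment)
    # 1 for the subgroup of interest, 0 for everyone else.
    return np.isin(segment, ["micro", "small"]).astype(int)


segments = np.array(["micro", "large", "small", "medium"])
print(is_small_business(segments))  # [1 0 1 0]
```

A named function like this is easier to test and to pickle than a lambda, which can matter when the fitted model is saved to disk.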

Attributes:
classes_ : numpy.ndarray, shape=(n_classes,)

Unique classes in the target.

estimator_ : Estimator instance

Fitted base estimator.

References

[1] Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159

Examples

  1. Using the BiasResamplingClassifier with a logistic regression model:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasResamplingClassifier

X, y = make_classification()
high_clv = np.random.randint(0, 2, size=X.shape[0])

model = BiasResamplingClassifier(estimator=LogisticRegression())
model.fit(X, y, sensitive_feature=high_clv)

  2. Converting a continuous attribute to a binary attribute:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasResamplingClassifier

X, y = make_classification()
clv = np.random.rand(X.shape[0]) * 100

model = BiasResamplingClassifier(
    estimator=LogisticRegression(),
    transform_feature=lambda clv: (clv > np.quantile(clv, 0.8)).astype(int)
)
model.fit(X, y, sensitive_feature=clv)

  3. Using a custom strategy function:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from empulse.models import BiasResamplingClassifier

X, y = make_classification()
high_clv = np.random.randint(0, 2, size=X.shape[0])

# Simple strategy which doubles the weight of instances with sensitive_feature == 1
# (rows index the target, columns index the sensitive feature)
def strategy(y_true, sensitive_feature):
    return np.array([
        [1, 2],
        [1, 2]
    ])

model = BiasResamplingClassifier(
    estimator=LogisticRegression(),
    strategy=strategy
)
model.fit(X, y, sensitive_feature=high_clv)

  4. Passing the sensitive feature in a cross-validation grid search:

import numpy as np
from sklearn import config_context
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from empulse.models import BiasResamplingClassifier

with config_context(enable_metadata_routing=True):
    X, y = make_classification()
    high_clv = np.random.randint(0, 2, size=X.shape[0])

    param_grid = {'model__estimator__C': [0.1, 1, 10]}
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('model', BiasResamplingClassifier(LogisticRegression()).set_fit_request(sensitive_feature=True))
    ])
    search = GridSearchCV(pipeline, param_grid)
    search.fit(X, y, sensitive_feature=high_clv)

fit(X, y, *, sensitive_feature=None, **fit_params)

Resample the training instances according to the strategy, then fit the estimator on the resampled data.

Parameters:
X : 2D array-like, shape=(n_samples, n_dim)

Training data.

y : 1D array-like, shape=(n_samples,)

Target values.

sensitive_feature : 1D array-like, shape=(n_samples,), default=None

Sensitive feature used to determine the group weights.

fit_params : dict

Additional parameters passed to the estimator's fit method.

Returns:
self : BiasResamplingClassifier
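
What resampling by group weights means can be sketched with plain NumPy. This is an assumed, simplified procedure for illustration only (the library's internal resampling may differ), and resample_indices is a hypothetical helper: each (y_true, sensitive_feature) cell is over- or undersampled until it holds roughly weight * count instances.

```python
import numpy as np


def resample_indices(y_true, sensitive_feature, weights, random_state=None):
    """Return row indices of a dataset resampled according to 2x2 group weights."""
    rng = np.random.default_rng(random_state)
    y_true = np.asarray(y_true).astype(int)
    sensitive_feature = np.asarray(sensitive_feature).astype(int)
    weights = np.asarray(weights)

    selected = []
    for target in (0, 1):
        for group in (0, 1):
            idx = np.flatnonzero((y_true == target) & (sensitive_feature == group))
            if idx.size == 0:
                continue
            n_wanted = int(round(float(weights[target, group]) * idx.size))
            # Undersample without replacement, oversample with replacement.
            replace = n_wanted > idx.size
            selected.append(rng.choice(idx, size=n_wanted, replace=replace))
    return np.concatenate(selected)


y = np.array([0, 0, 1, 1, 1, 1])
s = np.array([0, 1, 0, 0, 1, 1])
weights = np.array([[1, 2], [1, 2]])  # double every cell with sensitive_feature == 1

idx = resample_indices(y, s, weights, random_state=0)
print(len(idx), (s[idx] == 1).sum())
```

Under these assumptions, the base estimator would then be fitted on X[idx] and y[idx] instead of the original training data.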

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class labels for X.

Parameters:
X : 2D array-like, shape=(n_samples, n_dim)

Samples for which to predict class labels.

Returns:
y_pred : 1D numpy.ndarray, shape=(n_samples,)

Predicted class labels.

predict_proba(X)

Predict class probabilities for X.

Parameters:
X : 2D array-like, shape=(n_samples, n_dim)

Samples for which to predict class probabilities.

Returns:
y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)

Predicted class probabilities.

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
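
The quantity returned by score can be reproduced by hand. The sketch below is a plain NumPy illustration of (weighted) mean accuracy for single-label targets; the helper name accuracy is hypothetical and stands in for what the method computes from self.predict(X) and y.

```python
import numpy as np


def accuracy(y_true, y_pred, sample_weight=None):
    """Weighted share of predictions that match the true labels."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    return float(np.average(correct, weights=sample_weight))


y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])

print(accuracy(y_true, y_pred))                              # 0.75
print(accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1]))  # 0.6
```

Giving the misclassified third sample a weight of 2 lowers the score from 3/4 to 3/5, which shows how sample_weight shifts the metric toward the instances that matter most.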

set_fit_request(*, sensitive_feature='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sensitive_feature : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sensitive_feature parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
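
The <component>__<parameter> convention can be illustrated with a small scikit-learn pipeline. This sketch uses only scikit-learn objects; the same naming scheme is what the grid search example relies on when it addresses the wrapped estimator as model__estimator__C.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# Update parameters of nested components through their step names.
pipeline.set_params(model__C=0.1, scaler__with_mean=False)

print(pipeline.get_params()["model__C"])  # 0.1
```

Because get_params(deep=True) exposes the nested names, printing sorted(pipeline.get_params()) is a quick way to discover which keys are valid in a parameter grid.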

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.