BiasRelabler

class empulse.samplers.BiasRelabler(estimator, *, strategy='statistical parity', transform_feature=None)

Sampler which relabels instances to remove bias against a subgroup.

Read more in the User Guide.

Parameters:

estimator (Estimator instance)

Base estimator which is used to determine the number of promotion and demotion pairs.

strategy ({'statistical parity', 'demographic parity'} or Callable, default='statistical parity')

Determines how the group weights are computed. Group weights determine how many instances to relabel for each combination of target and sensitive_feature.

'statistical parity' or 'demographic parity': the probability of a positive prediction is equal across subgroups of the sensitive feature.
Callable: function which computes the number of label swaps from the target and the sensitive feature. The callable accepts two arguments, y_true and sensitive_feature, and returns the number of pairs to be swapped.
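
A custom strategy can be sketched as a plain function of y_true and sensitive_feature. The example below is illustrative only: the name parity_gap_pairs is hypothetical, and it assumes a binary target, a binary sensitive feature coded 0/1, and that each swap promotes one instance in one subgroup while demoting one in the other. Whether this matches the built-in 'statistical parity' computation is not confirmed by this page.

```python
import numpy as np


def parity_gap_pairs(y_true: np.ndarray, sensitive_feature: np.ndarray) -> int:
    """Hypothetical strategy: label swaps needed to equalize positive rates."""
    y_true = np.asarray(y_true)
    group = np.asarray(sensitive_feature).astype(bool)

    n_in, n_out = int(group.sum()), int((~group).sum())
    if n_in == 0 or n_out == 0:
        return 0  # only one subgroup present, nothing to balance

    # Absolute difference in observed positive rates between the subgroups.
    gap = abs(y_true[group].mean() - y_true[~group].mean())

    # One swap (a promotion in one group, a demotion in the other)
    # shrinks the gap by 1/n_in + 1/n_out.
    return int(round(gap * n_in * n_out / (n_in + n_out)))
```

Under these assumptions the function could be supplied as strategy=parity_gap_pairs when constructing the sampler.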

transform_feature (Optional[Callable[[numpy.ndarray], numpy.ndarray]], default=None)

Function which transforms the sensitive feature before the training data is resampled. It takes the sensitive feature as a numpy.ndarray and returns the transformed sensitive feature as a numpy.ndarray. This is useful when a continuous variable has to be converted to a binary variable at fit time.
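
As an illustration, a transform_feature function might binarize a continuous value at a quantile of the array it receives. The name top_quartile and the 0.75 cut-off are assumptions made for this sketch.

```python
import numpy as np


def top_quartile(values: np.ndarray) -> np.ndarray:
    """Hypothetical transform: flag values above the 75th percentile as 1."""
    threshold = np.quantile(values, 0.75)  # derived from the array passed in
    return (values > threshold).astype(np.int8)


clv = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
print(top_quartile(clv))  # [0 0 0 0 0 0 1 1]
```

Because the threshold is derived from the input array, a sampler fitted inside cross-validation would compute the cut-off from each training fold rather than from the full data set.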

Attributes:

estimator_ (Estimator instance): Fitted estimator.

References

[1]

Rahman, S., Janssens, B., & Bogaert, M. (2025). Profit-driven pre-processing in B2B customer churn modeling using fairness techniques. Journal of Business Research, 189, 115159. doi:10.1016/j.jbusres.2024.115159

Examples

import numpy as np
from empulse.samplers import BiasRelabler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification()
high_clv = np.random.randint(0, 2, y.shape)

sampler = BiasRelabler(LogisticRegression())
sampler.fit_resample(X, y, sensitive_feature=high_clv)

Example of passing the high-CLV indicator through cross-validation:

import numpy as np
from empulse.samplers import BiasRelabler
from imblearn.pipeline import Pipeline
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

set_config(enable_metadata_routing=True)

X, y = make_classification()
high_clv = np.random.randint(0, 2, y.shape)

pipeline = Pipeline([
    ('sampler', BiasRelabler(
        LogisticRegression()
    ).set_fit_resample_request(sensitive_feature=True)),
    ('model', LogisticRegression())
])

cross_val_score(pipeline, X, y, params={'sensitive_feature': high_clv})

Example of passing CLV through a grid search and dynamically determining high-CLV customers from the training data:

import numpy as np
from empulse.samplers import BiasRelabler
from imblearn.pipeline import Pipeline
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

set_config(enable_metadata_routing=True)

X, y = make_classification()
clv = np.random.rand(y.size)

def to_high_clv(clv: np.ndarray) -> np.ndarray:
    return (clv > np.median(clv)).astype(np.int8)

pipeline = Pipeline([
    ('sampler', BiasRelabler(
        LogisticRegression(),
        transform_feature=to_high_clv
    ).set_fit_resample_request(sensitive_feature=True)),
    ('model', LogisticRegression())
])
param_grid = {'model__C': np.logspace(-5, 2, 10)}

grid_search = GridSearchCV(pipeline, param_grid=param_grid)
grid_search.fit(X, y, sensitive_feature=clv)

fit(X, y, **params)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:

X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)): Data array.
y (array-like of shape (n_samples,)): Target array.

Returns:

self (object): Return the instance itself.

fit_resample(X, y, *, sensitive_feature=None)

Fit the estimator and relabel the data according to the strategy.

Parameters:

X (2D array-like, shape=(n_samples, n_features))
y (1D array-like, shape=(n_samples,))
sensitive_feature (1D array-like, shape=(n_samples,)): Sensitive feature used to determine the number of promotion and demotion pairs.

Returns:

X (2D array-like, shape=(n_samples, n_features)): Original training data.
y (np.ndarray): Relabeled target values.
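
To make the idea of promotion and demotion pairs concrete, the sketch below relabels a toy data set by hand. It is not the library's implementation: the helper relabel_pairs is hypothetical, the scores stand in for a fitted estimator's predicted probabilities, and it assumes a massaging-style rule in which the highest-scoring negatives of the disadvantaged group are promoted and the lowest-scoring positives of the other group are demoted.

```python
import numpy as np


def relabel_pairs(y, scores, disadvantaged, n_pairs):
    """Hypothetical helper: swap labels for up to n_pairs promotion/demotion pairs."""
    y = np.asarray(y).copy()
    scores = np.asarray(scores, dtype=float)
    disadvantaged = np.asarray(disadvantaged, dtype=bool)

    # Promotion candidates: negatives in the disadvantaged group, highest score first.
    promote = np.flatnonzero(disadvantaged & (y == 0))
    promote = promote[np.argsort(-scores[promote])]

    # Demotion candidates: positives in the other group, lowest score first.
    demote = np.flatnonzero(~disadvantaged & (y == 1))
    demote = demote[np.argsort(scores[demote])]

    # Swap in pairs so the overall number of positives stays the same.
    k = min(n_pairs, promote.size, demote.size)
    y[promote[:k]] = 1
    y[demote[:k]] = 0
    return y


y = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.9, 0.2, 0.8, 0.6, 0.4, 0.1])
disadvantaged = np.array([1, 1, 1, 0, 0, 0])
print(relabel_pairs(y, scores, disadvantaged, n_pairs=1))  # [1 0 1 1 0 0]
```

Only the labels change in this sketch; the feature matrix is left untouched, in line with the return values documented above.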

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features (array-like of str or None, default=None)

Input features.

If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out (ndarray of str objects): Same as input features.
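
The generated fallback names can be illustrated with a few lines of plain Python. The helper default_feature_names is hypothetical; it only mimics the documented naming pattern and skips the consistency check against feature_names_in_.

```python
def default_feature_names(n_features_in, input_features=None):
    """Hypothetical helper mirroring the documented fallback naming."""
    if input_features is not None:
        # Names supplied by the caller are returned unchanged.
        return list(input_features)
    # No names available: generate x0, x1, ..., x(n_features_in - 1).
    return [f"x{i}" for i in range(n_features_in)]


print(default_feature_names(3))  # ['x0', 'x1', 'x2']
print(default_feature_names(2, ["age", "clv"]))  # ['age', 'clv']
```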

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.

set_fit_resample_request(*, sensitive_feature='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit_resample method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit_resample if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit_resample.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
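
The four options can be summarised with a toy resolver. This is a simplified model written for illustration, not scikit-learn's routing code; resolve_request and its arguments are hypothetical, and a plain ValueError stands in for the error a meta-estimator would raise.

```python
def resolve_request(request, name, provided):
    """Toy model: return the keyword arguments a meta-estimator would forward.

    request  -- True, False, None, or an alias string
    name     -- the method's parameter name, e.g. 'sensitive_feature'
    provided -- dict of metadata the user passed to the meta-estimator
    """
    if isinstance(request, str):
        # Alias: look the metadata up under the alias, forward it under `name`.
        return {name: provided[request]} if request in provided else {}
    if request is True:
        # Requested: forwarded when present, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    # request is None: providing the metadata is treated as an error.
    if name in provided:
        raise ValueError(f"{name!r} was passed but is not explicitly requested")
    return {}


meta = {"sensitive_feature": [0, 1], "clv": [5.0, 9.0]}
print(resolve_request(True, "sensitive_feature", meta))
print(resolve_request("clv", "sensitive_feature", meta))
```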

Added in version 1.3.

Parameters:

sensitive_feature (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sensitive_feature parameter in fit_resample.

Returns:

self (object): The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
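
The <component>__<parameter> convention can be read as splitting each key at its first double underscore. The sketch below shows only that splitting step; split_nested_params is a hypothetical helper, not part of scikit-learn or empulse.

```python
from collections import defaultdict


def split_nested_params(params):
    """Hypothetical helper: group keys by the component before the first '__'."""
    own, nested = {}, defaultdict(dict)
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value  # forwarded to that component
        else:
            own[name] = value  # set directly on the estimator itself
    return own, dict(nested)


print(split_nested_params({"strategy": "demographic parity", "estimator__C": 0.1}))
# ({'strategy': 'demographic parity'}, {'estimator': {'C': 0.1}})
```

For this sampler, a key such as estimator__C would address the C parameter of the wrapped estimator, assuming the wrapped estimator exposes one, as LogisticRegression does.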