B2BoostClassifier

class empulse.models.B2BoostClassifier(estimator=None, *, accept_rate=0.3, clv=200, incentive_fraction=0.05, contact_cost=15)

Cost-sensitive gradient boosting classifier for B2B customer churn.

B2BoostClassifier supports xgboost.XGBClassifier, lightgbm.LGBMClassifier and catboost.CatBoostClassifier as the underlying estimator. By default, it uses an XGBoost classifier with default hyperparameters.

Read more in the User Guide.

Parameters:
estimator : xgboost.XGBClassifier, lightgbm.LGBMClassifier or catboost.CatBoostClassifier, optional

XGBoost, LightGBM or CatBoost classifier to be fit with the desired hyperparameters. If not provided, an XGBoost classifier with default hyperparameters is used.

accept_rate : float, default=0.3

Probability of a customer responding to the retention offer (0 < accept_rate < 1). It is overwritten if another accept_rate is passed to the fit method.

clv : float or 1D array-like, shape=(n_samples,), default=200

If float: constant customer lifetime value per retained customer (clv > incentive_cost). If array: individualized customer lifetime value of each customer when retained (mean(clv) > incentive_cost). It is overwritten if another clv is passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

incentive_fraction : float, default=0.05

Cost of the incentive offered to a customer, as a fraction of customer lifetime value (0 < incentive_fraction < 1). It is overwritten if another incentive_fraction is passed to the fit method.

contact_cost : float, default=15

Constant cost of contact (contact_cost > 0). It is overwritten if another contact_cost is passed to the fit method.
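The scalar ranges stated in the parameter descriptions can be checked before fitting. The following is a minimal sketch and not part of empulse: check_churn_params is a hypothetical helper that only encodes the ranges documented here. The clv condition is left out because incentive_cost is not a parameter of this class.

```python
def check_churn_params(accept_rate=0.3, incentive_fraction=0.05, contact_cost=15):
    """Hypothetical helper: validate the documented scalar parameter ranges."""
    if not 0 < accept_rate < 1:
        raise ValueError("accept_rate must lie strictly between 0 and 1")
    if not 0 < incentive_fraction < 1:
        raise ValueError("incentive_fraction must lie strictly between 0 and 1")
    if not contact_cost > 0:
        raise ValueError("contact_cost must be positive")


# The documented defaults satisfy all three constraints.
check_churn_params()

# An out-of-range value is rejected.
try:
    check_churn_params(accept_rate=1.5)
except ValueError as error:
    print(error)
```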

Attributes:
classes_ : numpy.ndarray, shape=(n_classes,)

Unique classes in the target.

estimator_ : xgboost.XGBClassifier

Fitted XGBoost classifier.

Notes

The instance-specific cost function for customer churn is defined as [1]:

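\[C(s_i) = y_i[s_i(f-\gamma (1-\delta )CLV_i)] + (1-y_i)[s_i(\delta CLV_i + f)]\]

Here \(y_i\) is the true label and \(s_i\) the predicted churn probability of customer \(i\), \(f\) is the contact cost (contact_cost), \(\gamma\) the acceptance rate (accept_rate), \(\delta\) the incentive fraction (incentive_fraction) and \(CLV_i\) the customer lifetime value (clv).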

As originally defined, the measure requires the churn class to be encoded as 0, and the class encoding is NOT interchangeable. This implementation, however, assumes the standard notation (‘churn’: 1, ‘no churn’: 0).
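To make the cost function concrete, the sketch below evaluates it with NumPy. It is an illustration under assumptions, not the library's implementation: churn_cost is a hypothetical helper, and it assumes that f, γ and δ correspond to contact_cost, accept_rate and incentive_fraction, with churn encoded as 1.

```python
import numpy as np


def churn_cost(y_true, scores, clv, accept_rate=0.3, incentive_fraction=0.05, contact_cost=15):
    """Hypothetical helper: per-customer cost C(s_i) from the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    clv = np.asarray(clv, dtype=float)  # scalar or one value per customer

    # Churner (y_i = 1): contact cost minus the expected retained value.
    cost_churner = scores * (contact_cost - accept_rate * (1 - incentive_fraction) * clv)
    # Non-churner (y_i = 0): the incentive and the contact cost are wasted.
    cost_non_churner = scores * (incentive_fraction * clv + contact_cost)

    return y_true * cost_churner + (1 - y_true) * cost_non_churner


# One churner and one non-churner, both targeted with certainty (s_i = 1).
print(churn_cost([1, 0], [1.0, 1.0], clv=200))
```

With the default parameters and clv=200, targeting a true churner costs 15 - 0.3 * 0.95 * 200 = -42 (a profit), while targeting a non-churner costs 0.05 * 200 + 15 = 25.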

See also

create_objective_churn : Creates the instance-dependent cost function for customer churn.

References

[1]

Janssens, B., Bogaert, M., Bagué, A., & Van den Poel, D. (2022). B2Boost: Instance-dependent profit-driven modelling of B2B churn. Annals of Operations Research, 1-27.

Examples

import numpy as np
from empulse.models import B2BoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification()
clv = np.random.rand(y.size) * 100

model = B2BoostClassifier()
model.fit(X, y, clv=clv, incentive_fraction=0.1)

Passing instance-dependent costs through cross-validation with metadata routing:

import numpy as np
from empulse.models import B2BoostClassifier
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Enable metadata routing so that clv can be routed to the model's fit method
set_config(enable_metadata_routing=True)

X, y = make_classification(n_samples=50)
clv = np.random.rand(y.size) * 100

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', B2BoostClassifier(contact_cost=10).set_fit_request(clv=True))
])

cross_val_score(pipeline, X, y, params={'clv': clv})

Passing instance-dependent costs through a grid search with metadata routing:

import numpy as np
from empulse.metrics import empb_score
from empulse.models import B2BoostClassifier
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Enable metadata routing so that clv reaches both the model and the scorer
set_config(enable_metadata_routing=True)

X, y = make_classification()
clv = np.random.rand(y.size) * 100
contact_cost = 10

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', B2BoostClassifier(
        XGBClassifier(n_jobs=2, n_estimators=10),
        contact_cost=contact_cost
    ).set_fit_request(clv=True))
])
param_grid = {
    'model__estimator__learning_rate': np.logspace(-5, 0, 5),
}
scorer = make_scorer(
    empb_score,
    response_method='predict_proba',
    contact_cost=contact_cost
)
scorer = scorer.set_score_request(clv=True)

grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring=scorer)
grid_search.fit(X, y, clv=clv)

fit(X, y, *, accept_rate=Parameter.UNCHANGED, clv=Parameter.UNCHANGED, incentive_fraction=Parameter.UNCHANGED, contact_cost=Parameter.UNCHANGED, fit_params=None, **loss_params)

Fit the model. Cost parameters that are not passed here keep the values set at initialization.

Parameters:
X : array-like of shape (n_samples, n_features)
y : array-like of shape (n_samples,)
accept_rate : float, default=0.3

Probability of a customer responding to the retention offer (0 < accept_rate < 1).

clv : float or 1D array-like, shape=(n_samples,), default=200

If float: constant customer lifetime value per retained customer (clv > incentive_cost). If array: individualized customer lifetime value of each customer when retained (mean(clv) > incentive_cost).

incentive_fraction : float, default=0.05

Cost of incentive offered to a customer, as a fraction of customer lifetime value (0 < incentive_fraction < 1).

contact_cost : float, default=15

Constant cost of contact (contact_cost > 0).

fit_params : dict, optional

Additional parameters to pass to the estimator’s fit method.

loss_params : dict

Additional keyword arguments to pass to the loss function.

Returns:
self : B2BoostClassifier

Fitted B2Boost model.
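The Parameter.UNCHANGED defaults in the fit signature suggest a sentinel pattern, in which a value passed to fit takes precedence over the one given to __init__. The sketch below illustrates that pattern only; UNCHANGED and ToyModel are hypothetical stand-ins and do not reproduce the library's implementation.

```python
UNCHANGED = object()  # hypothetical sentinel standing in for Parameter.UNCHANGED


class ToyModel:
    """Toy estimator illustrating init-time defaults with fit-time overrides."""

    def __init__(self, accept_rate=0.3, contact_cost=15):
        self.accept_rate = accept_rate
        self.contact_cost = contact_cost

    def fit(self, *, accept_rate=UNCHANGED, contact_cost=UNCHANGED):
        # Use the fit-time value when one is given, else fall back to __init__.
        self.used_params_ = {
            "accept_rate": self.accept_rate if accept_rate is UNCHANGED else accept_rate,
            "contact_cost": self.contact_cost if contact_cost is UNCHANGED else contact_cost,
        }
        return self


model = ToyModel(contact_cost=10).fit(accept_rate=0.5)
print(model.used_params_)  # {'accept_rate': 0.5, 'contact_cost': 10}
```

A dedicated sentinel, rather than None, keeps "not passed" distinguishable from any value a caller might legitimately supply.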

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class labels for samples in X.

Parameters:
X : array-like of shape (n_samples, n_features)

Features.

Returns:
y_pred : ndarray of shape (n_samples,)

Predicted labels for each sample.

predict_proba(X)

Predict class probabilities for X.

Parameters:
X : 2D numpy.ndarray, shape=(n_samples, n_features)

Returns:
y_pred : 2D numpy.ndarray, shape=(n_samples, n_classes)

Predicted class probabilities.
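Class probabilities are commonly converted to labels by taking the most probable class in each row. The sketch below shows this with NumPy, assuming predict follows that usual scikit-learn convention (the source does not state it). The classes and proba arrays are made-up illustration data.

```python
import numpy as np

classes = np.array([0, 1])  # stands in for the fitted classes_ attribute
proba = np.array([
    [0.9, 0.1],  # most likely 'no churn'
    [0.2, 0.8],  # most likely 'churn'
    [0.4, 0.6],  # most likely 'churn'
])

# Pick, for each sample, the class whose column holds the highest probability.
labels = classes[np.argmax(proba, axis=1)]
print(labels)  # [0 1 1]
```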

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
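Mean accuracy is the (optionally weighted) fraction of samples whose predicted label matches the true label. A minimal NumPy sketch of that computation follows; mean_accuracy is a hypothetical helper and the labels are made-up illustration data.

```python
import numpy as np


def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per sample."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    return float(np.average(correct, weights=sample_weight))


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Three of four predictions are correct.
print(mean_accuracy(y_true, y_pred))
# Doubling the weight of the misclassified sample lowers the score to 3/5.
print(mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1]))
```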

set_fit_request(*, accept_rate='$UNCHANGED$', clv='$UNCHANGED$', contact_cost='$UNCHANGED$', fit_params='$UNCHANGED$', incentive_fraction='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
accept_rate : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for accept_rate parameter in fit.

clv : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for clv parameter in fit.

contact_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for contact_cost parameter in fit.

fit_params : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for fit_params parameter in fit.

incentive_fraction : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for incentive_fraction parameter in fit.

Returns:
self : object

The updated object.
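The four request options listed for set_fit_request can be read as a small decision rule applied to each piece of metadata. The sketch below is a toy model of that rule and not scikit-learn's routing code: resolve_request is a hypothetical function, and the ValueError stands in for whatever error the meta-estimator actually raises.

```python
def resolve_request(name, request, provided):
    """Toy model: return the value forwarded to fit, or None if nothing is."""
    if request is True:
        # Requested: forwarded when the caller supplied it, ignored otherwise.
        return provided.get(name)
    if request is False:
        # Not requested: never forwarded.
        return None
    if request is None:
        # Not requested, and supplying it anyway is treated as an error.
        if name in provided:
            raise ValueError(f"{name} was provided but not requested")
        return None
    # A string is an alias: the value is looked up under that name instead.
    return provided.get(request)


provided = {"clv": [100.0, 250.0], "customer_value": [80.0, 90.0]}
print(resolve_request("clv", True, provided))              # [100.0, 250.0]
print(resolve_request("clv", False, provided))             # None
print(resolve_request("clv", "customer_value", provided))  # [80.0, 90.0]
```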

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
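The <component>__<parameter> naming can be illustrated without any estimator at all. In the sketch below, nested dictionaries stand in for nested objects and set_nested is a hypothetical helper; it only shows how a name such as model__estimator__learning_rate is split at each double underscore, not how scikit-learn implements set_params.

```python
def set_nested(params, name, value):
    """Set a value in nested dicts using a '<component>__<parameter>' name."""
    component, delimiter, remainder = name.partition("__")
    if delimiter:
        # Descend into the named component and resolve the rest of the name.
        set_nested(params[component], remainder, value)
    else:
        params[component] = value


# Nested dicts standing in for a pipeline step that wraps an estimator.
config = {"model": {"contact_cost": 15, "estimator": {"learning_rate": 0.3}}}

set_nested(config, "model__estimator__learning_rate", 0.01)
set_nested(config, "model__contact_cost", 10)
print(config)
```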

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.