CSLogitClassifier

class empulse.models.CSLogitClassifier(*, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0, loss=None, C=1.0, fit_intercept=True, soft_threshold=False, l1_ratio=1.0, optimize_fn=None, optimizer_params=None)

Cost-sensitive logistic regression classifier.

Read more in the User Guide.

See also

CSBoostClassifier : Cost-sensitive gradient boosting classifier.

CSTreeClassifier : Cost-sensitive decision tree classifier.

CSForestClassifier : Cost-sensitive random forest classifier.

Parameters:
tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If a float, all true positives have the same cost. If array-like, it gives the cost of each true positive classification. Overridden by any tp_cost passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If a float, all false positives have the same cost. If array-like, it gives the cost of each false positive classification. Overridden by any fp_cost passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If a float, all true negatives have the same cost. If array-like, it gives the cost of each true negative classification. Overridden by any tn_cost passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If a float, all false negatives have the same cost. If array-like, it gives the cost of each false negative classification. Overridden by any fn_cost passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
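To make the role of the four cost parameters concrete, the following is a minimal sketch of an average expected cost, a formulation commonly used in instance-dependent cost-sensitive learning. It is an illustration in plain Python, not the library's internal loss; the function name `expected_cost` and the helper `per_sample` are hypothetical.

```python
def expected_cost(y_true, y_proba, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0):
    """Average expected cost; each cost is a scalar or a per-sample sequence."""
    n = len(y_true)

    def per_sample(cost):
        # Broadcast a scalar cost to every sample; keep sequences as given.
        return list(cost) if hasattr(cost, "__len__") else [cost] * n

    tp, tn, fn, fp = (per_sample(c) for c in (tp_cost, tn_cost, fn_cost, fp_cost))
    total = 0.0
    for i in range(n):
        p = y_proba[i]  # predicted probability of the positive class
        if y_true[i] == 1:
            # A positive is either caught (TP) or missed (FN).
            total += p * tp[i] + (1 - p) * fn[i]
        else:
            # A negative is either flagged (FP) or correctly passed (TN).
            total += p * fp[i] + (1 - p) * tn[i]
    return total / n


y_true = [1, 0, 1, 0]
y_proba = [0.9, 0.2, 0.4, 0.7]
fn_cost = [10.0, 0.0, 20.0, 0.0]  # instance-dependent cost
avg = expected_cost(y_true, y_proba, fn_cost=fn_cost, fp_cost=5.0)  # constant fp_cost
```

A scalar cost applies to every sample, while a sequence assigns one cost per sample, which mirrors the float versus array-like distinction in the parameter descriptions.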

loss : empulse.metrics.Metric, default=None

Loss function to optimize.

  • If Metric, metric parameters are passed as loss_params to the fit method.

C : float, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

soft_threshold : bool, default=False

If True, apply soft-thresholding to the regression coefficients.
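For readers unfamiliar with the term, the standard soft-thresholding operator shrinks a coefficient toward zero and sets it to exactly zero when its magnitude falls below the threshold. The sketch below shows that operator only; the threshold value the classifier uses and the point at which it applies the operator are not stated here, so `threshold` is a hypothetical argument.

```python
def soft_threshold(weight, threshold):
    """Standard soft-thresholding: sign(w) * max(|w| - threshold, 0)."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    # Coefficients within [-threshold, threshold] become exactly zero.
    return 0.0


coefficients = [0.5, -0.5, 0.1]
shrunk = [soft_threshold(w, 0.2) for w in coefficients]
```

Small coefficients become exactly zero, which is why the operator is typically paired with an L1 penalty to obtain sparse models.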

l1_ratio : float, default=1.0

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1.

  • For l1_ratio = 0 the penalty is an L2 penalty.

  • For l1_ratio = 1 it is an L1 penalty.

  • For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
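The interaction between C and l1_ratio can be sketched with one common elastic net convention. The exact scaling used by the library is not given on this page, so both the factor 0.5 on the L2 term and the multiplication by 1/C are assumptions made for illustration; `elastic_net_penalty` is a hypothetical helper.

```python
def elastic_net_penalty(weights, C=1.0, l1_ratio=1.0):
    """Elastic net penalty under an assumed convention (not confirmed for empulse)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    mixed = l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2
    # C is the inverse regularization strength: smaller C gives a larger penalty.
    return mixed / C


weights = [1.0, -2.0]
pure_l1 = elastic_net_penalty(weights, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(weights, l1_ratio=0.0)
mixed = elastic_net_penalty(weights, l1_ratio=0.5)
stronger = elastic_net_penalty(weights, C=0.1, l1_ratio=1.0)
```

Under this convention, lowering C from 1.0 to 0.1 makes the same coefficients ten times more expensive, which matches the description of C as the inverse of regularization strength.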

optimize_fn : Callable, optional

Optimization algorithm. Should be a Callable with signature optimize(objective, X). See Profit-Driven Logistic Regression (ProfLogit) for more information.

optimizer_params : dict[str, Any], optional

Additional keyword arguments passed to optimize_fn.

Attributes:
classes_ : numpy.ndarray

Unique classes in the target found during fit.

result_ : scipy.optimize.OptimizeResult

Optimization result.

coef_ : numpy.ndarray, shape=(n_features,)

Coefficients of the logit model.

intercept_ : float

Intercept of the logit model. Only available when fit_intercept=True.

References

[1] Höppner, S., Baesens, B., Verbeke, W., & Verdonck, T. (2022). Instance-dependent cost-sensitive learning for detecting transfer fraud. European Journal of Operational Research, 297(1), 291-300.

Examples

import numpy as np
from empulse.models import CSLogitClassifier
from sklearn.datasets import make_classification

X, y = make_classification()
fn_cost = np.random.rand(y.size)  # instance-dependent cost
fp_cost = 5  # constant cost

model = CSLogitClassifier(C=0.1)
model.fit(X, y, fn_cost=fn_cost, fp_cost=fp_cost)
y_proba = model.predict_proba(X)

Example with passing instance-dependent costs through cross-validation:

import numpy as np
from empulse.models import CSLogitClassifier
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

set_config(enable_metadata_routing=True)

X, y = make_classification()
fn_cost = np.random.rand(y.size)
fp_cost = 5

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', CSLogitClassifier(C=0.1).set_fit_request(fn_cost=True, fp_cost=True)),
])

cross_val_score(pipeline, X, y, params={'fn_cost': fn_cost, 'fp_cost': fp_cost})

Example with passing instance-dependent costs through a grid search:

import numpy as np
from empulse.metrics import expected_cost_loss
from empulse.models import CSLogitClassifier
from sklearn import set_config
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

set_config(enable_metadata_routing=True)

X, y = make_classification(n_samples=50)
fn_cost = np.random.rand(y.size)
fp_cost = 5

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', CSLogitClassifier().set_fit_request(fn_cost=True, fp_cost=True)),
])
param_grid = {'model__C': np.logspace(-5, 2, 5)}
scorer = make_scorer(
    expected_cost_loss,
    response_method='predict_proba',
    greater_is_better=False,
    normalize=True,
)
scorer = scorer.set_score_request(fn_cost=True, fp_cost=True)

grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring=scorer)
grid_search.fit(X, y, fn_cost=fn_cost, fp_cost=fp_cost)

fit(X, y, *, tp_cost=Parameter.UNCHANGED, fp_cost=Parameter.UNCHANGED, tn_cost=Parameter.UNCHANGED, fn_cost=Parameter.UNCHANGED, **loss_params)

Fit the model according to the given training data.

Parameters:
X : array-like of shape (n_samples, n_features)

Training data.

y : array-like of shape (n_samples,)

Target values.

tp_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$

Cost of true positives. If float, then all true positives have the same cost. If array-like, then it is the cost of each true positive classification.

fp_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$

Cost of false positives. If float, then all false positives have the same cost. If array-like, then it is the cost of each false positive classification.

tn_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$

Cost of true negatives. If float, then all true negatives have the same cost. If array-like, then it is the cost of each true negative classification.

fn_cost : float or array-like, shape=(n_samples,), default=$UNCHANGED$

Cost of false negatives. If float, then all false negatives have the same cost. If array-like, then it is the cost of each false negative classification.
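The UNCHANGED default lets fit distinguish "no value passed" from an explicit value such as 0.0. The toy class below sketches that sentinel pattern: a cost given to fit takes precedence, otherwise the constructor value is used. Everything here is hypothetical, including the `UNCHANGED` object, the `ToyCostModel` class, and the trailing-underscore attributes; how the library stores the resolved costs is not documented on this page.

```python
UNCHANGED = object()  # hypothetical stand-in for the library's sentinel default


class ToyCostModel:
    """Toy estimator showing fit-time costs overriding constructor costs."""

    def __init__(self, *, fp_cost=0.0, fn_cost=0.0):
        self.fp_cost = fp_cost
        self.fn_cost = fn_cost

    def fit(self, X, y, *, fp_cost=UNCHANGED, fn_cost=UNCHANGED):
        # Use the fit-time value when one was passed, else the constructor value.
        self.fp_cost_ = self.fp_cost if fp_cost is UNCHANGED else fp_cost
        self.fn_cost_ = self.fn_cost if fn_cost is UNCHANGED else fn_cost
        return self


X = [[0.0], [1.0]]
y = [0, 1]
model = ToyCostModel(fp_cost=5.0).fit(X, y, fn_cost=[1.0, 2.0])
```

Passing instance-dependent costs at fit time keeps them aligned with the rows of X, which is the motivation given in the notes on the constructor parameters.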

loss_params : Any

Additional parameters passed to the loss function.

Returns:
self

Fitted estimator.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict class labels for samples in X.

Parameters:
X : array-like of shape (n_samples, n_features)

Features.

Returns:
y_pred : ndarray of shape (n_samples,)

Predicted labels for each sample.

predict_proba(X)

Compute predicted probabilities.

Parameters:
X : 2D array-like, shape=(n_samples, n_features)

Features.

Returns:
y_pred : 2D numpy.ndarray, shape=(n_samples, 2)

Predicted probabilities.
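The two-column output can be understood from the standard logistic model, in which the positive-class probability is the sigmoid of a linear score built from the coefficients and intercept. The sketch below assumes that standard form and the usual scikit-learn column order (class 0 first, class 1 second); it is a plain-Python illustration, not the library's implementation, and `logit_predict_proba` is a hypothetical name.

```python
import math


def logit_predict_proba(X, coef, intercept=0.0):
    """Return [P(class 0), P(class 1)] per row for a standard logit model."""
    rows = []
    for x in X:
        # Linear score: intercept plus the dot product of coefficients and features.
        z = intercept + sum(w * v for w, v in zip(coef, x))
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid; not hardened against overflow
        rows.append([1.0 - p, p])
    return rows


X = [[0.0, 0.0], [2.0, 1.0], [-1.0, -1.0]]
proba = logit_predict_proba(X, coef=[1.0, 1.0], intercept=0.0)
```

Each row sums to one, and the second column is the value typically passed to cost-based metrics as the positive-class probability.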

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
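As a reminder of what mean accuracy with sample weights computes, the sketch below divides the total weight of correctly predicted samples by the total weight. It is a plain-Python illustration of the usual weighted accuracy definition; `weighted_accuracy` is a hypothetical helper, not part of the library.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weight of correctly predicted samples divided by total weight."""
    if sample_weight is None:
        # Without weights every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
plain = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1.0, 1.0, 2.0, 1.0])
```

Accuracy ignores misclassification costs, so for cost-sensitive models a cost-based scorer is often more informative than this default.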

set_fit_request(*, fn_cost='$UNCHANGED$', fp_cost='$UNCHANGED$', tn_cost='$UNCHANGED$', tp_cost='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
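The four request options can be summarised with a toy resolver that decides what a meta-estimator would forward to fit. This is a simplified model of the semantics described above, not scikit-learn's routing code; the function `route` is hypothetical, and the ValueError stands in for the library's own error type.

```python
def route(request, provided, name):
    """Return the keyword arguments a toy meta-estimator would forward for `name`."""
    if request is True:
        # Requested: forward it when the caller supplied it, otherwise ignore.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # No decision recorded: supplying the metadata is treated as an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but never requested")
        return {}
    # String alias: look the value up under the alias, forward under `name`.
    return {name: provided[request]} if request in provided else {}


forwarded = route(True, {"fn_cost": [1, 2]}, "fn_cost")
aliased = route("miss_cost", {"miss_cost": [3]}, "fn_cost")
```

In the pipeline examples above, set_fit_request(fn_cost=True, fp_cost=True) corresponds to the first branch: the costs reach fit only because they were both requested and provided.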

Added in version 1.3.

Parameters:
fn_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for fn_cost parameter in fit.

fp_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for fp_cost parameter in fit.

tn_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for tn_cost parameter in fit.

tp_cost : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for tp_cost parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
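The `<component>__<parameter>` naming can be illustrated with a toy class that splits each key on the first double underscore and delegates the remainder to the named component. This is a simplified sketch of the convention; `ToyEstimator` is hypothetical and, unlike scikit-learn, performs no validation of parameter names.

```python
class ToyEstimator:
    """Toy object supporting nested parameter names such as 'model__C'."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            # Split on the first '__': component name, then the nested parameter.
            name, separator, rest = key.partition("__")
            if separator:
                getattr(self, name).set_params(**{rest: value})
            else:
                setattr(self, name, value)
        return self


pipeline = ToyEstimator(
    scaler=ToyEstimator(with_mean=True),
    model=ToyEstimator(C=1.0),
)
pipeline.set_params(model__C=0.1)
```

The same naming is what the grid search example relies on when it lists 'model__C' in param_grid.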

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.