2.1. Cost-Sensitive Logistic Regression (CSLogit)#
CSLogit is a cost-sensitive logistic regression model that optimizes the
expected_cost_loss during training, combined with elastic net regularization [1].
Because it is based on a logistic model, it is interpretable and trains relatively quickly.
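CSLogitClassifier follows the scikit-learn estimator interface, so basic usage looks like any other classifier. The snippet below is a minimal sketch; the dataset and cost values are purely illustrative.

from empulse.models import CSLogitClassifier
from sklearn.datasets import make_classification

X, y = make_classification(random_state=42)
cslogit = CSLogitClassifier(fp_cost=1, fn_cost=5)  # illustrative class-dependent costs
cslogit.fit(X, y)
y_proba = cslogit.predict_proba(X)  # predicted probabilities, as with any scikit-learn classifier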
2.1.1. Regularization#
The strength of regularization can be controlled by the C parameter.
The l1_ratio parameter controls the ratio of L1 regularization to L2 regularization.
By default, l1_ratio is set to 1, which means only L1 regularization is used
and a sparse solution is found for the coefficients.
from empulse.models import CSLogitClassifier
cslogit = CSLogitClassifier(C=100, l1_ratio=0.2)
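As an illustration, the mix of penalties can be varied through l1_ratio. This is a sketch that assumes the same convention as scikit-learn's elastic net, where l1_ratio=0 corresponds to pure L2 regularization:

cslogit_l1 = CSLogitClassifier(C=1.0, l1_ratio=1.0)   # pure L1 penalty: sparse coefficients
cslogit_l2 = CSLogitClassifier(C=1.0, l1_ratio=0.0)   # assumed pure L2 penalty: shrinkage without sparsity
cslogit_mix = CSLogitClassifier(C=1.0, l1_ratio=0.5)  # equal mix of L1 and L2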
2.1.2. Cost Matrix#
CSLogitClassifier allows constant class-dependent costs to be passed during instantiation.
cslogit = CSLogitClassifier(fp_cost=5, fn_cost=1, tp_cost=1, tn_cost=1)
Instance-dependent costs can be passed during training in the fit method.
import numpy as np
from empulse.models import CSLogitClassifier
from sklearn.datasets import make_classification

X, y = make_classification()
fp_cost = np.random.rand(X.shape[0])  # instance-dependent costs, one per sample
cslogit = CSLogitClassifier(fn_cost=1, tp_cost=1, tn_cost=1)  # class-dependent costs
cslogit.fit(X, y, fp_cost=fp_cost)
Note that class-dependent costs can also still be passed during training. If costs are passed both during instantiation and during training, the costs passed during training take precedence.
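For example (a short sketch reusing X and y from the snippet above; the cost values are arbitrary):

cslogit = CSLogitClassifier(fp_cost=5)  # class-dependent cost set at instantiation
cslogit.fit(X, y, fp_cost=1)            # fp_cost passed to fit takes precedence over fp_cost=5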
2.1.3. Optimization#
By default, CSLogit minimizes the average expected cost using L-BFGS-B optimization. However, CSLogit is flexible: you can use a different loss function and change the optimization algorithm.
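Concretely, the average expected cost combines the four costs with the model's predicted probabilities. The function below is an illustrative sketch of that objective, not the library's internal implementation:

import numpy as np

def average_expected_cost(y_true, y_proba, tp_cost, fp_cost, fn_cost, tn_cost):
    # expected cost of each instance, weighting the positive- and negative-prediction
    # costs by the predicted probability of the positive class
    expected_costs = (
        y_true * (y_proba * tp_cost + (1 - y_proba) * fn_cost)
        + (1 - y_true) * (y_proba * fp_cost + (1 - y_proba) * tn_cost)
    )
    return np.mean(expected_costs)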
2.1.3.1. Custom Loss Functions#
CSLogit allows the use of any cost-sensitive metric as the loss function.
To use a different metric, pass the metric as the loss argument of the CSLogitClassifier initializer.
from empulse.metrics import Metric, Savings, CostMatrix
from empulse.models import CSLogitClassifier

savings_score = Metric(
    cost_matrix=CostMatrix().add_fp_cost('fp').add_fn_cost('fn'),
    strategy=Savings(),
)

cslogit = CSLogitClassifier(loss=savings_score)
2.1.3.2. Custom Optimization Algorithms#
CSLogit also supports other optimization algorithms.
Any algorithm that can be wrapped in an optimize function can be used to optimize the loss function.
For instance, to use the L-BFGS-B algorithm from scipy.optimize
with the coefficients bounded between -5 and 5, you can do the following:
import numpy as np
from scipy.optimize import minimize, OptimizeResult

def optimize(objective, X, max_iter=10000, **kwargs) -> OptimizeResult:
    initial_guess = np.zeros(X.shape[1])
    bounds = [(-5, 5)] * X.shape[1]  # bound each coefficient between -5 and 5
    result = minimize(
        objective,
        initial_guess,
        method='L-BFGS-B',
        bounds=bounds,
        options={
            'maxiter': max_iter,
            'ftol': 1e-4,
        },
        **kwargs
    )
    return result

cslogit = CSLogitClassifier(optimize_fn=optimize)
Any arguments passed to optimizer_params will be passed on to optimize_fn during training.
In this case, that lets us dynamically change the maximum number of iterations of the optimizer.
def optimize(objective, X, max_iter=10000, **kwargs) -> OptimizeResult:
    initial_guess = np.zeros(X.shape[1])
    bounds = [(-5, 5)] * X.shape[1]
    result = minimize(
        objective,
        initial_guess,
        method='L-BFGS-B',
        bounds=bounds,
        options={
            'maxiter': max_iter,
            'ftol': 1e-4,
        },
        **kwargs
    )
    return result

cslogit = CSLogitClassifier(optimize_fn=optimize, optimizer_params={'max_iter': 1000})
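After configuring the custom optimizer, training proceeds as usual (a sketch reusing X, y, and fp_cost from the earlier example):

cslogit.fit(X, y, fp_cost=fp_cost)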