2.4. Robust Cost-Sensitive Classification (RobustCS)
Instance-dependent cost-sensitive (IDCS) learning methods have proven
useful for binary classification tasks where individual instances are associated
with variable misclassification costs.
However, IDCS methods are sensitive to noise and outliers in the instance-dependent misclassification
costs, and their performance depends strongly on the cost distribution of the data sample.
The robust cost-sensitive classifier (RobustCSClassifier) makes IDCS methods more robust by
applying a three-step framework:
1. Outlier detection: Outliers are detected by training a HuberRegressor on the instance-dependent costs.
2. Outlier correction: Outlier costs are corrected using the predictions of the Huber regressor.
3. Robust cost-sensitive classification: The corrected costs are used to train a cost-sensitive classifier.
For an in-depth explanation of the robust cost-sensitive framework, please refer to the paper [1].
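The first two steps of the framework can be illustrated with scikit-learn alone. The sketch below is not the RobustCSClassifier implementation. It assumes that the Huber regressor is fit on the features with the costs as the target, that residuals are standardized by dividing by their standard deviation, and that their absolute value is compared against a threshold of 2.5. The synthetic costs and the injected outliers are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, random_state=0)

# Hypothetical instance-dependent costs: roughly linear in one feature,
# with five extreme values injected to act as outliers.
costs = 10.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=X.shape[0])
costs[:5] += 100.0

# Step 1 (outlier detection): fit a Huber regressor on the costs and
# flag costs whose standardized residual is large (assumed rule).
huber = HuberRegressor().fit(X, costs)
predicted = huber.predict(X)
residuals = costs - predicted
standardized = residuals / residuals.std()
is_outlier = np.abs(standardized) > 2.5

# Step 2 (outlier correction): replace flagged costs with the prediction.
corrected = np.where(is_outlier, predicted, costs)

print(f"flagged {is_outlier.sum()} of {costs.size} costs")
```

Step 3 then amounts to passing the corrected costs, rather than the raw ones, to the cost-sensitive classifier when fitting it.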
2.4.1. Usage
To make any cost-sensitive classifier robust, you can use the RobustCSClassifier class.
Simply pass the cost-sensitive classifier you want to make robust as the estimator parameter.
from empulse.models import RobustCSClassifier
from empulse.models import CSLogitClassifier
robust_cslogit = RobustCSClassifier(estimator=CSLogitClassifier())
By default, the robust cost-sensitive classifier uses the HuberRegressor with default parameters.
You can customize the outlier detection step by passing a custom outlier detector to the outlier_estimator parameter.
from sklearn.linear_model import HuberRegressor
robust_cslogit = RobustCSClassifier(
    CSLogitClassifier(),
    outlier_estimator=HuberRegressor(max_iter=50)
)
RobustCSClassifier considers a cost an outlier if the standardized residual between the cost
and the value predicted by the Huber regressor is larger than 2.5.
You can change this threshold by setting the outlier_threshold parameter.
robust_cslogit = RobustCSClassifier(CSLogitClassifier(), outlier_threshold=3)
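To get a feel for what the threshold does, the NumPy-only sketch below applies two thresholds to a handful of made-up standardized residuals. It assumes the comparison is made on absolute values; the residuals themselves are hypothetical and do not come from a fitted model.

```python
import numpy as np

# Hypothetical standardized residuals for six instances.
standardized_residuals = np.array([0.3, -1.2, 2.7, -2.9, 3.4, 0.8])

# A lower threshold flags more costs as outliers.
flagged_default = np.abs(standardized_residuals) > 2.5
flagged_strict = np.abs(standardized_residuals) > 3.0

print(flagged_default.sum())  # 3
print(flagged_strict.sum())   # 1
```

Raising the threshold therefore leaves more of the original costs untouched, while lowering it replaces more of them with the regressor's predictions.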
By default, all instance-dependent costs are corrected (class-dependent costs are ignored).
If you wish to only correct particular costs, you can change the detect_outliers_for parameter.
For instance, to only correct false positive costs, you can set detect_outliers_for='fp_cost'.
robust_cslogit = RobustCSClassifier(CSLogitClassifier(), detect_outliers_for='fp_cost')
Or, if you want to correct multiple costs, you can pass a list of cost names.
robust_cslogit = RobustCSClassifier(
    CSLogitClassifier(),
    detect_outliers_for=['fp_cost', 'fn_cost']
)
To fit the robust cost-sensitive classifier, you can use the fit method with the
instance-dependent costs as you would with any other cost-sensitive model.
import numpy as np
from sklearn.datasets import make_classification
X, y = make_classification()
fp_cost = np.random.rand(X.shape[0]) # instance-dependent costs
robust_cslogit = RobustCSClassifier(CSLogitClassifier())
robust_cslogit.fit(X, y, fp_cost=fp_cost)
After fitting, you can inspect the corrected costs using the costs_ attribute.
print(robust_cslogit.costs_)