5.2. Churn in a TV Subscription Company
5.2.1. Summary
This is a private dataset provided by a TV cable provider [1]. It consists of the company's active customers during the first half of 2014. The dataset contains 9,410 records, each with 45 attributes, including a churn label indicating whether the customer churned. This label was created internally by the company and can be regarded as highly accurate. Only 455 customers are churners, giving a churn rate of about 4.8 % (455 / 9,410).
The feature names are anonymized to protect customer privacy.
| Property | Value |
|---|---|
| Classes | 2 |
| Churners | 455 |
| Non-churners | 8,955 |
| Samples | 9,410 |
| Features | 45 |
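The churn rate follows directly from the class counts in the table. A quick check in Python; the helper `churn_rate` is illustrative only and is not part of empulse:

```python
def churn_rate(n_churners: int, n_samples: int) -> float:
    """Fraction of customers labelled as churners."""
    return n_churners / n_samples


n_churners, n_non_churners = 455, 8955
n_samples = n_churners + n_non_churners  # 9410, matching the Samples row

rate = churn_rate(n_churners, n_samples)
print(f"{rate:.2%}")  # prints 4.84%
```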
5.2.2. Using the Dataset
The dataset can be loaded through the `load_churn_tv_subscriptions` function. This returns a `Dataset` object with the following attributes:

- `data`: the feature matrix
- `target`: the target vector
- `tp_cost`: the cost of a true positive
- `fp_cost`: the cost of a false positive
- `fn_cost`: the cost of a false negative
- `tn_cost`: the cost of a true negative
- `feature_names`: the feature names
- `target_names`: the target names
- `DESCR`: the full description of the dataset
```python
from empulse.datasets import load_churn_tv_subscriptions

dataset = load_churn_tv_subscriptions()
```
Alternatively, the load function can return the features, target, and costs separately by setting `return_X_y_costs=True`. You can also request the output in `pandas.DataFrame` format by setting `as_frame=True`.
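Because every customer carries its own cost for each outcome, the cost of a set of hard predictions is the sum, over customers, of the cost entry selected by that customer's (prediction, label) pair. The NumPy sketch below assumes the four costs are either scalars or per-customer arrays aligned with the target; `total_cost` is a hypothetical helper written for illustration, not an empulse function, and the toy costs are made up.

```python
import numpy as np


def total_cost(y_true, y_pred, tp_cost, fp_cost, fn_cost, tn_cost):
    """Sum of instance-dependent costs for binary predictions.

    Each cost may be a scalar or an array with one entry per customer.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Broadcast scalar costs to one value per customer.
    tp, fp, fn, tn = (
        np.broadcast_to(np.asarray(c, dtype=float), y_true.shape)
        for c in (tp_cost, fp_cost, fn_cost, tn_cost)
    )
    # Pick the cost of the cell each customer falls into.
    cost = np.where(
        y_pred == 1,
        np.where(y_true == 1, tp, fp),  # predicted churner
        np.where(y_true == 1, fn, tn),  # predicted non-churner
    )
    return float(cost.sum())


# Toy example: three customers with illustrative costs.
y_true = np.array([1, 0, 1])
y_pred = np.array([1, 1, 0])
print(total_cost(
    y_true,
    y_pred,
    tp_cost=np.array([5.0, 5.0, 5.0]),
    fp_cost=np.array([4.0, 4.0, 4.0]),
    fn_cost=np.array([100.0, 80.0, 120.0]),
    tn_cost=0.0,
))  # prints 129.0
```

With the real data, one would pass the target, a model's predictions, and the `tp_cost`, `fp_cost`, `fn_cost`, and `tn_cost` values returned by the loader.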
The following code snippet demonstrates how to load the dataset and fit a model using the `CSLogitClassifier`:
```python
from empulse.datasets import load_churn_tv_subscriptions
from empulse.models import CSLogitClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the features, target, and the four cost components as separate objects
X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_churn_tv_subscriptions(
    return_X_y_costs=True,
    as_frame=True
)

# Standardize the features before fitting the cost-sensitive logistic regression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', CSLogitClassifier())
])

# Route the costs to the 'model' step using the '<step>__<param>' convention
pipeline.fit(
    X,
    y,
    model__tp_cost=tp_cost,
    model__fp_cost=fp_cost,
    model__fn_cost=fn_cost,
    model__tn_cost=tn_cost
)
```
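Cost-sensitive logistic regression replaces the usual log-loss with an objective built from the per-customer costs. One common formulation is the average expected cost, where each customer's predicted churn probability weights the costs of the two possible predictions. The NumPy sketch below evaluates that objective for given coefficients; it illustrates the idea under this assumption and is not empulse's internal implementation of `CSLogitClassifier`, whose exact loss and regularization may differ. The function name `expected_cost_loss` and the toy data are hypothetical.

```python
import numpy as np


def expected_cost_loss(coef, intercept, X, y, tp_cost, fp_cost, fn_cost, tn_cost):
    """Average expected cost of a logistic model's churn probabilities.

    For customer i with churn probability s_i the expected cost is
    y_i * (s_i * tp_i + (1 - s_i) * fn_i)
    + (1 - y_i) * (s_i * fp_i + (1 - s_i) * tn_i).
    """
    s = 1.0 / (1.0 + np.exp(-(X @ coef + intercept)))  # sigmoid probabilities
    cost_if_churner = s * tp_cost + (1 - s) * fn_cost
    cost_if_stayer = s * fp_cost + (1 - s) * tn_cost
    return float(np.mean(y * cost_if_churner + (1 - y) * cost_if_stayer))


# Toy data: 4 customers, 2 standardized features, illustrative costs.
X = np.array([[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8], [-1.2, -0.4]])
y = np.array([1, 1, 0, 0])
loss = expected_cost_loss(np.zeros(2), 0.0, X, y,
                          tp_cost=5.0, fp_cost=4.0, fn_cost=100.0, tn_cost=0.0)
print(loss)  # prints 27.25
```

Minimizing such an objective over `coef` and `intercept` (for example with a general-purpose optimizer) yields a model that trades off missed churners against the cost of targeting customers, rather than treating all errors equally.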
5.2.3. Cost Matrix
|  | Actual positive \(y_i = 1\) | Actual negative \(y_i = 0\) |
|---|---|---|
| Predicted positive \(\hat{y}_i = 1\) |  |  |
| Predicted negative \(\hat{y}_i = 0\) |  |  |
where:

- \(\gamma_i\): probability that customer \(i\) accepts the retention offer
- \(CLV_i\): customer lifetime value of the retained customer
- \(d_i\): cost of the incentive offered to the customer
- \(c_i\): cost of contacting the customer
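Once the four cost entries are known for a customer, a churn probability \(p_i\) can be turned into a decision by comparing the expected cost of each prediction. Predicting churn is cheaper when \(p_i\,tp_i + (1 - p_i)\,fp_i \le p_i\,fn_i + (1 - p_i)\,tn_i\), which rearranges to the customer-specific threshold \(p_i \ge (fp_i - tn_i) / ((fp_i - tn_i) + (fn_i - tp_i))\). This is the general Bayes minimum-risk decision rule, shown as a sketch rather than a procedure prescribed for this dataset; it assumes the denominator is positive, and the helper names and costs below are hypothetical.

```python
import numpy as np


def bayes_threshold(tp_cost, fp_cost, fn_cost, tn_cost):
    """Churn probability above which predicting churn has lower expected cost."""
    tp, fp, fn, tn = (
        np.asarray(c, dtype=float) for c in (tp_cost, fp_cost, fn_cost, tn_cost)
    )
    return (fp - tn) / ((fp - tn) + (fn - tp))


def bayes_decision(proba, tp_cost, fp_cost, fn_cost, tn_cost):
    """Return 1 (target as churner) where proba reaches the cost-based threshold."""
    threshold = bayes_threshold(tp_cost, fp_cost, fn_cost, tn_cost)
    return (np.asarray(proba) >= threshold).astype(int)


# Illustrative costs: missing a churner is far more expensive than a wasted offer.
print(bayes_threshold(tp_cost=5.0, fp_cost=4.0, fn_cost=100.0, tn_cost=0.0))
print(bayes_decision([0.01, 0.05, 0.30],
                     tp_cost=5.0, fp_cost=4.0, fn_cost=100.0, tn_cost=0.0))  # [0 1 1]
```

Because the costs can be per-customer arrays, the threshold broadcasts element-wise, so customers with a high lifetime value are targeted at lower churn probabilities than customers with a low one.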