5.2. Churn in a TV Subscription Company

5.2.1. Summary

This is a private dataset provided by a TV cable provider [1]. It covers customers who were active during the first semester of 2014. The dataset contains 9,410 individual records, each with 45 attributes, including a churn label indicating whether the customer is a churner. The label was created internally by the company and can be regarded as highly accurate. Only 455 customers are churners, giving a churn ratio of 4.84% (455 / 9,410).

The feature names are anonymized to protect the privacy of the customers.

Classes	2
Churners	455
Non-churners	8955
Samples	9410
Features	45
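
The class counts above imply a strongly imbalanced problem. The short sketch below recomputes the churn ratio from those counts and also shows the accuracy a trivial "nobody churns" predictor would reach, which is one reason plain accuracy is a weak yardstick here. The variable names are illustrative and not part of the library.

```python
# Class counts taken from the summary table above.
n_churners = 455
n_non_churners = 8955

n_samples = n_churners + n_non_churners
churn_ratio = n_churners / n_samples

# A model that always predicts "non-churner" is correct for every non-churner
# and wrong for every churner.
majority_baseline_accuracy = n_non_churners / n_samples

print(f"samples: {n_samples}")
print(f"churn ratio: {churn_ratio:.2%}")
print(f"majority-class accuracy: {majority_baseline_accuracy:.2%}")
```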

5.2.2. Using the Dataset

The dataset can be loaded through the load_churn_tv_subscriptions function. This returns a Dataset object with the following attributes:

data: the feature matrix
target: the target vector
tp_cost: the cost of a true positive
fp_cost: the cost of a false positive
fn_cost: the cost of a false negative
tn_cost: the cost of a true negative
feature_names: the feature names
target_names: the target names
DESCR: the full description of the dataset

from empulse.datasets import load_churn_tv_subscriptions

dataset = load_churn_tv_subscriptions()

Alternatively, the load function can return the features, target, and costs separately when called with `return_X_y_costs=True`. Setting `as_frame=True` returns the data in `pandas.DataFrame` format.

The following code snippet demonstrates how to load the dataset and fit a model using the CSLogitClassifier:

from empulse.datasets import load_churn_tv_subscriptions
from empulse.models import CSLogitClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the features, target, and the four cost components separately.
X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_churn_tv_subscriptions(
    return_X_y_costs=True,
    as_frame=True,
)

# Standardize the features before fitting the cost-sensitive model.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', CSLogitClassifier()),
])

# The model__ prefix routes each cost argument to the 'model' step.
pipeline.fit(
    X,
    y,
    model__tp_cost=tp_cost,
    model__fp_cost=fp_cost,
    model__fn_cost=fn_cost,
    model__tn_cost=tn_cost,
)
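
To make the role of the four cost arguments concrete, the following is a minimal sketch of how a set of predictions could be scored by total cost. It is not the library's own metric: `total_cost` is a hypothetical helper, it assumes every cost is supplied as one value per customer, and the numbers are made up for illustration.

```python
def total_cost(y_true, y_pred, tp_cost, fp_cost, fn_cost, tn_cost):
    """Sum the instance-dependent cost of each prediction.

    Hypothetical helper: every argument is a sequence with one entry
    per customer.
    """
    total = 0.0
    rows = zip(y_true, y_pred, tp_cost, fp_cost, fn_cost, tn_cost)
    for y, y_hat, tp, fp, fn, tn in rows:
        if y == 1:
            # Actual churner: flagged (true positive) or missed (false negative).
            total += tp if y_hat == 1 else fn
        else:
            # Actual non-churner: flagged (false positive) or left alone (true negative).
            total += fp if y_hat == 1 else tn
    return total


# Made-up values for four customers, for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
example_tp = [50.0, 60.0, 55.0, 45.0]
example_fp = [12.0, 12.0, 12.0, 12.0]
example_fn = [200.0, 300.0, 250.0, 150.0]
example_tn = [0.0, 0.0, 0.0, 0.0]

cost = total_cost(y_true, y_pred, example_tp, example_fp, example_fn, example_tn)
print(cost)  # 50 (TP) + 300 (FN) + 12 (FP) + 0 (TN) = 362.0
```

In this toy example the single missed churner accounts for most of the total, which is the behavior a cost-sensitive classifier is meant to take into account.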

5.2.3. Cost Matrix

	Actual positive \(y_i = 1\)	Actual negative \(y_i = 0\)
Predicted positive \(\hat{y}_i = 1\)	`tp_cost` \(= \gamma_i d_i + (1 - \gamma_i) (CLV_i + c_i)\)	`fp_cost` \(= d_i + c_i\)
Predicted negative \(\hat{y}_i = 0\)	`fn_cost` \(= CLV_i\)	`tn_cost` \(= 0\)

with

\(\gamma_i\) : probability of the customer accepting the retention offer
\(CLV_i\) : customer lifetime value of the retained customer
\(d_i\) : cost of incentive offered to the customer
\(c_i\) : cost of contacting the customer
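
As a worked example of the formulas above, the sketch below evaluates the four cost entries for a single customer. `CostMatrix` and `churn_cost_matrix` are hypothetical names introduced for illustration, and the parameter values are invented; the dataset ships with its own precomputed costs.

```python
from typing import NamedTuple


class CostMatrix(NamedTuple):
    """Cost entries for one customer (hypothetical container)."""
    tp_cost: float
    fp_cost: float
    fn_cost: float
    tn_cost: float


def churn_cost_matrix(gamma, clv, d, c):
    """Evaluate the cost matrix formulas for a single customer.

    gamma: probability of the customer accepting the retention offer
    clv:   customer lifetime value of the retained customer
    d:     cost of the incentive offered to the customer
    c:     cost of contacting the customer
    """
    return CostMatrix(
        tp_cost=gamma * d + (1 - gamma) * (clv + c),
        fp_cost=d + c,
        fn_cost=clv,
        tn_cost=0.0,
    )


# Invented parameter values, for illustration only.
costs = churn_cost_matrix(gamma=0.5, clv=200.0, d=10.0, c=1.0)
print(costs)
# tp_cost = 0.5 * 10 + 0.5 * (200 + 1) = 105.5
# fp_cost = 10 + 1 = 11.0
# fn_cost = 200.0
# tn_cost = 0.0
```

With these values, correctly flagging a churner (105.5) costs less than missing one (200.0), while wrongly flagging a loyal customer costs only the incentive plus the contact (11.0).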

5.2.4. References