load_churn_tv_subscriptions

empulse.datasets.load_churn_tv_subscriptions(*, as_frame=False, return_X_y_costs=False)

Load the TV Subscription Churn dataset (binary classification).

The goal is to predict whether a customer will churn. The target variable encodes whether the customer churned: ‘yes’ = 1 and ‘no’ = 0.

This dataset comes from a cable TV provider and contains all 9410 customers who were active during the first semester of 2014. Feature names are anonymized to protect the privacy of the customers.

For additional information about the dataset, consult the User Guide.

Classes	2
Churners	455
Non-churners	8955
Samples	9410
Features	45

Parameters:

as_frame : bool, default=False
    If True, the output will be pandas DataFrames or Series instead of numpy arrays.
return_X_y_costs : bool, default=False
    If True, return (data, target, tp_cost, fp_cost, tn_cost, fn_cost) instead of a Dataset object.

Returns:

dataset : Dataset or tuple of (data, target, tp_cost, fp_cost, tn_cost, fn_cost)
    Returns a Dataset object if return_X_y_costs=False (default), otherwise a tuple.

Notes

Cost matrix

	Actual positive \(y_i = 1\)	Actual negative \(y_i = 0\)
Predicted positive \(\hat{y}_i = 1\)	`tp_cost` \(= \gamma_i d_i + (1 - \gamma_i) (CLV_i + c_i)\)	`fp_cost` \(= d_i + c_i\)
Predicted negative \(\hat{y}_i = 0\)	`fn_cost` \(= CLV_i\)	`tn_cost` \(= 0\)

with

\(\gamma_i\) : probability of the customer accepting the retention offer
\(CLV_i\) : customer lifetime value of the retained customer
\(d_i\) : cost of incentive offered to the customer
\(c_i\) : cost of contacting the customer
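
The cost matrix above can be evaluated for a single customer with a few lines of plain Python. The following is a minimal sketch, not part of empulse: the names `ChurnCosts`, `churn_costs`, and `cost_of` are hypothetical helpers introduced only for illustration, and the input values are made up rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChurnCosts:
    """Costs of the four prediction outcomes for a single customer."""

    tp_cost: float
    fp_cost: float
    fn_cost: float
    tn_cost: float

    def cost_of(self, y_true: int, y_pred: int) -> float:
        """Return the cost incurred for one (actual, predicted) pair."""
        if y_pred == 1:
            return self.tp_cost if y_true == 1 else self.fp_cost
        return self.fn_cost if y_true == 1 else self.tn_cost


def churn_costs(gamma: float, clv: float, d: float, c: float) -> ChurnCosts:
    """Evaluate the cost matrix for one customer.

    gamma: probability of the customer accepting the retention offer
    clv:   customer lifetime value of the retained customer
    d:     cost of the incentive offered to the customer
    c:     cost of contacting the customer
    """
    return ChurnCosts(
        tp_cost=gamma * d + (1 - gamma) * (clv + c),
        fp_cost=d + c,
        fn_cost=clv,
        tn_cost=0.0,
    )


# Hypothetical values for illustration; not taken from the dataset.
costs = churn_costs(gamma=0.5, clv=200.0, d=10.0, c=1.0)
print(costs)
print(costs.cost_of(y_true=1, y_pred=0))  # cost of missing a churner
```

With these illustrative numbers, contacting a true churner costs 105.5 while missing that churner costs 200.0, so a false negative is the most expensive outcome for this customer.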

References

[1]

A. Correa Bahnsen, D. Aouada, B. Ottersten, “A novel cost-sensitive framework for customer churn predictive modeling”, Decision Analytics, 2:5, 2015.

Examples

from empulse.datasets import load_churn_tv_subscriptions
from sklearn.model_selection import train_test_split

dataset = load_churn_tv_subscriptions()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, random_state=42
)