load_churn_tv_subscriptions
- empulse.datasets.load_churn_tv_subscriptions(*, as_frame=False, return_X_y_costs=False)
Load the TV Subscription Churn dataset (binary classification).
The goal is to predict whether a customer will churn. The target variable encodes whether the customer churned: 1 for ‘yes’ and 0 for ‘no’.
The dataset comes from a cable TV provider and contains all 9410 customers who were active during the first semester of 2014. Feature names are anonymized to protect customer privacy.
For additional information about the dataset, consult the User Guide.
Classes: 2
Churners: 455
Non-churners: 8955
Samples: 9410
Features: 45
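The class counts listed above imply a strongly imbalanced target. The following is a small arithmetic check that uses only those figures; the variable names are illustrative and not part of the library.

```python
# Class counts taken from the dataset summary.
n_churners = 455
n_non_churners = 8955

# The two classes should add up to the reported number of samples.
n_samples = n_churners + n_non_churners

# Share of positive (churn) labels in the dataset.
churn_rate = n_churners / n_samples

print(f"samples: {n_samples}, churn rate: {churn_rate:.2%}")
```

With fewer than 5% positives, a stratified split and cost-aware evaluation are usually worth considering when working with this data.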
- Parameters:
- as_frame : bool, default=False
If True, the output will be pandas DataFrames or Series instead of numpy arrays.
- return_X_y_costs : bool, default=False
If True, return (data, target, tp_cost, fp_cost, tn_cost, fn_cost) instead of a Dataset object.
- Returns:
- dataset : Dataset or tuple of (data, target, tp_cost, fp_cost, tn_cost, fn_cost)
Returns a Dataset object if return_X_y_costs=False (default), otherwise a tuple.
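As a minimal sketch of the tuple form described above, the two keyword arguments can be combined. The helper name `load_churn_with_costs` is hypothetical, and the sketch assumes empulse and pandas are installed; the import is kept inside the function so that defining it does not require the library.

```python
def load_churn_with_costs():
    """Hypothetical helper: return features, target, and the four cost terms."""
    # Imported lazily so the function can be defined without empulse installed.
    from empulse.datasets import load_churn_tv_subscriptions

    # The unpacking order follows the documented return tuple:
    # (data, target, tp_cost, fp_cost, tn_cost, fn_cost).
    X, y, tp_cost, fp_cost, tn_cost, fn_cost = load_churn_tv_subscriptions(
        as_frame=True, return_X_y_costs=True
    )
    costs = {
        "tp_cost": tp_cost,
        "fp_cost": fp_cost,
        "tn_cost": tn_cost,
        "fn_cost": fn_cost,
    }
    return X, y, costs
```

Calling `load_churn_with_costs()` should give pandas objects for the features and target. Whether each cost term is a per-customer array or a scalar is not stated here, so it is worth inspecting the values before using them.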
Notes
Cost matrix
Predicted positive \(\hat{y}_i = 1\), actual positive \(y_i = 1\): tp_cost \(= \gamma_i d_i + (1 - \gamma_i) (CLV_i + c_i)\)
Predicted positive \(\hat{y}_i = 1\), actual negative \(y_i = 0\): fp_cost \(= d_i + c_i\)
Predicted negative \(\hat{y}_i = 0\), actual positive \(y_i = 1\): fn_cost \(= CLV_i\)
Predicted negative \(\hat{y}_i = 0\), actual negative \(y_i = 0\): tn_cost \(= 0\)
with
\(\gamma_i\) : probability of the customer accepting the retention offer
\(CLV_i\) : customer lifetime value of the retained customer
\(d_i\) : cost of incentive offered to the customer
\(c_i\) : cost of contacting the customer
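The cost matrix can be illustrated with a short NumPy sketch. It evaluates the four formulas for made-up inputs and sums the cost of a set of predictions. The function names `churn_costs` and `total_cost` and all numeric values are hypothetical; this is not the library's implementation and it does not reproduce the values shipped with the dataset.

```python
import numpy as np


def churn_costs(gamma, clv, d, c):
    """Evaluate the four cost-matrix entries per customer (illustrative only)."""
    # Broadcast scalars and arrays to a common per-customer shape.
    gamma, clv, d, c = np.broadcast_arrays(
        *(np.asarray(a, dtype=float) for a in (gamma, clv, d, c))
    )
    tp_cost = gamma * d + (1 - gamma) * (clv + c)  # churner targeted
    fp_cost = d + c                                # non-churner targeted
    fn_cost = clv.copy()                           # churner missed
    tn_cost = np.zeros(clv.shape)                  # non-churner left alone
    return tp_cost, fp_cost, tn_cost, fn_cost


def total_cost(y_true, y_pred, tp_cost, fp_cost, tn_cost, fn_cost):
    """Sum the cost incurred by binary predictions under the cost matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_customer = (
        y_true * (y_pred * tp_cost + (1 - y_pred) * fn_cost)
        + (1 - y_true) * (y_pred * fp_cost + (1 - y_pred) * tn_cost)
    )
    return float(per_customer.sum())


# Made-up inputs: two customers with different lifetime values.
tp, fp, tn, fn = churn_costs(gamma=0.3, clv=[100.0, 200.0], d=10.0, c=1.0)
print(tp, fp, tn, fn)
print(total_cost([1, 0], [1, 1], tp, fp, tn, fn))
```

Since tn_cost is zero and fn_cost equals the full customer lifetime value, missing a churner is the most expensive outcome whenever \(CLV_i\) exceeds the offer and contact costs.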
References
[1] A. Correa Bahnsen, D. Aouada, B. Ottersten, “A novel cost-sensitive framework for customer churn predictive modeling”, Decision Analytics, 2:5, 2015.
Examples
from empulse.datasets import load_churn_tv_subscriptions
from sklearn.model_selection import train_test_split

dataset = load_churn_tv_subscriptions()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, random_state=42
)