load_upsell_bank_telemarketing

empulse.datasets.load_upsell_bank_telemarketing(*, as_frame=False, return_X_y_costs=False, interest_rate=0.02463333, term_deposit_fraction=0.25, contact_cost=1)

Load the bank telemarketing dataset (binary classification).

The goal is to predict whether a client will subscribe to a term deposit after being called by the bank. The target variable is whether the client subscribed to the term deposit, ‘yes’ = 1 and ‘no’ = 0.

The dataset comes from the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted by phone. Often more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed to.

Features that are not available before the contact event are removed from the original dataset [1] to avoid data leakage. Only clients with a positive balance are considered, since clients in debt are not eligible for term deposits.

For a full data description and additional information about the dataset, consult the User Guide.

================  ======
Classes                2
Subscribers         4787
Non-subscribers    33144
Samples            37931
Features              10
================  ======
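The counts above imply a strongly imbalanced target. As a quick check, the following sketch recomputes the sample total and the share of subscribers from the listed figures only (nothing is loaded from the library; the variable names are illustrative):

```python
# Class counts copied from the summary table above.
n_subscribers = 4787
n_non_subscribers = 33144

# The two classes should add up to the reported number of samples.
n_samples = n_subscribers + n_non_subscribers

# Share of positive examples (clients who subscribed).
subscriber_rate = n_subscribers / n_samples

print(f"{n_samples} samples, {subscriber_rate:.1%} subscribers")
```

With roughly one subscriber for every seven non-subscribers, plain accuracy is a weak yardstick, which is one reason the dataset ships with a cost matrix.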

Parameters:
as_frame : bool, default=False

If True, the output will be pandas DataFrames or Series instead of numpy arrays.

return_X_y_costs : bool, default=False

If True, return (data, target, tp_cost, fp_cost, tn_cost, fn_cost) instead of a Dataset object.

interest_rate : float, default=0.02463333

Interest rate of the term deposit.

term_deposit_fraction : float, default=0.25

Fraction of the client’s balance that is deposited in the term deposit.

contact_cost : float, default=1

Cost of contacting the client.

Returns:
dataset : Dataset or tuple of (data, target, tp_cost, fp_cost, tn_cost, fn_cost)

Returns a Dataset object if return_X_y_costs=False (default), otherwise a tuple.

Notes

Cost matrix

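+-------------------------------------+-----------------------------+-----------------------------+
|                                     | Actual positive $y_i = 1$   | Actual negative $y_i = 0$   |
+-------------------------------------+-----------------------------+-----------------------------+
| Predicted positive $\hat{y}_i = 1$  | tp_cost $= c$               | fp_cost $= c$               |
+-------------------------------------+-----------------------------+-----------------------------+
| Predicted negative $\hat{y}_i = 0$  | fn_cost $= r d_i b_i$       | tn_cost $= 0$               |
+-------------------------------------+-----------------------------+-----------------------------+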

with
  • $c$ : cost of contacting the client

  • $r$ : interest rate of the term deposit

  • $d_i$ : fraction of the client’s balance that is deposited in the term deposit

  • $b_i$ : client’s balance

Using default parameters, it is assumed that $c = 1$, $r = 0.02463333$, and $d_i = 0.25$ for all clients.
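To make the cost matrix concrete, here is a minimal pure-Python sketch that evaluates the four entries for a single client and sums the cost of a set of predictions. The helpers `ClientCosts`, `client_costs` and `total_cost` are hypothetical names introduced for illustration and are not part of empulse; the balances and labels are made up, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientCosts:
    """Cost-matrix entries for one client (illustrative helper, not an empulse class)."""
    tp_cost: float
    fp_cost: float
    fn_cost: float
    tn_cost: float


def client_costs(balance, interest_rate=0.02463333,
                 term_deposit_fraction=0.25, contact_cost=1.0):
    """Evaluate the cost matrix above for a client with the given balance."""
    return ClientCosts(
        tp_cost=contact_cost,   # c: the client was contacted
        fp_cost=contact_cost,   # c: the client was contacted
        fn_cost=interest_rate * term_deposit_fraction * balance,  # r * d_i * b_i
        tn_cost=0.0,            # no contact, no subscription
    )


def total_cost(y_true, y_pred, balances, **params):
    """Sum the cost-matrix entry selected by each (actual, predicted) pair."""
    total = 0.0
    for actual, predicted, balance in zip(y_true, y_pred, balances):
        costs = client_costs(balance, **params)
        if predicted == 1:
            total += costs.tp_cost if actual == 1 else costs.fp_cost
        else:
            total += costs.fn_cost if actual == 1 else costs.tn_cost
    return total


# Three made-up clients: one true positive, one false positive, one false negative.
balances = [1000.0, 200.0, 5000.0]
y_true = [1, 0, 1]
y_pred = [1, 1, 0]

costs_first = client_costs(balances[0])
cost_of_predictions = total_cost(y_true, y_pred, balances)
```

Because fn_cost scales with the balance while the contact cost is fixed, missing a subscriber with a large balance outweighs many unnecessary calls under the default parameters.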

References

[1]

Moro, S., Rita, P., & Cortez, P. (2014). Bank Marketing [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

[2]

S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM’2011, pp. 117-121, Guimaraes, Portugal, October, 2011. EUROSIS.

[3]

A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the fourteenth SIAM International Conference on Data Mining, 677-685, 2014.

Examples

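from empulse.datasets import load_upsell_bank_telemarketing
from sklearn.model_selection import train_test_split

# Load the dataset as a Dataset object (the default return type).
dataset = load_upsell_bank_telemarketing()

# Hold out a test set; a fixed random_state keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, random_state=42
)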