5.1. Bank Telemarketing Upsell Campaign

5.1.1. Summary

This dataset comes from the direct marketing campaigns of a Portuguese banking institution, which were conducted by phone. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed to.

Features that only become known during or after the contact event are removed from the original dataset [1] to avoid data leakage. Only clients with a positive balance are considered, since clients in debt are not eligible for term deposits.

  • Classes: 2

  • Subscribers: 4787

  • Non-subscribers: 33144

  • Samples: 37931

  • Features: 10

Other relevant information can be found in [2] and [3].
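The class counts above imply a fairly imbalanced target. As a quick illustration, the short sketch below derives the subscriber rate and the imbalance ratio from the listed counts only; it does not load the dataset, and the variable names are chosen here for illustration.

```python
# Class counts as listed in the summary above.
n_subscribers = 4787
n_non_subscribers = 33144

n_samples = n_subscribers + n_non_subscribers
subscriber_rate = n_subscribers / n_samples
imbalance_ratio = n_non_subscribers / n_subscribers

print(f"samples: {n_samples}")
print(f"subscriber rate: {subscriber_rate:.1%}")
print(f"non-subscribers per subscriber: {imbalance_ratio:.1f}")
```

Roughly one client in eight subscribes, which is one reason a cost-based evaluation can be more informative than plain accuracy on this dataset.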

5.1.2. Using the Dataset

The dataset can be loaded through the load_upsell_bank_telemarketing function. This returns a Dataset object with the following attributes:

  • data: the feature matrix

  • target: the target vector

  • tp_cost: the cost of a true positive

  • fp_cost: the cost of a false positive

  • fn_cost: the cost of a false negative

  • tn_cost: the cost of a true negative

  • feature_names: the feature names

  • target_names: the target names

  • DESCR: the full description of the dataset

from empulse.datasets import load_upsell_bank_telemarketing

dataset = load_upsell_bank_telemarketing()

Alternatively, the load function can return the features, target, and costs separately by setting return_X_y_costs=True. To get the output in a pandas.DataFrame format, set as_frame=True.

The following code snippet demonstrates how to load the dataset and fit a model using the CSLogitClassifier:

from empulse.datasets import load_upsell_bank_telemarketing
from empulse.models import CSLogitClassifier
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, TargetEncoder

# Load the features, target, and the four cost-matrix entries separately
X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_upsell_bank_telemarketing(
    return_X_y_costs=True,
    as_frame=True
)
# Standardize the numeric columns and target-encode the categorical columns
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', StandardScaler(), X.select_dtypes(include=['number']).columns),
        ('cat', TargetEncoder(), X.select_dtypes(include=['category']).columns)
    ])),
    ('model', CSLogitClassifier())
])
# The model__ prefix routes each cost argument to the 'model' step's fit method
pipeline.fit(
    X,
    y,
    model__tp_cost=tp_cost,
    model__fp_cost=fp_cost,
    model__fn_cost=fn_cost,
    model__tn_cost=tn_cost
)
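To see how the four cost entries combine into a single number for a set of predictions, the sketch below sums the cost of each prediction according to its confusion-matrix cell. It is a plain-Python illustration: total_cost and cost_at are hypothetical helpers written for this example, not part of empulse, and the toy labels and costs are made up rather than taken from the dataset. Each cost may be a single number or a per-sample sequence.

```python
from typing import Sequence, Union

Cost = Union[float, Sequence[float]]


def cost_at(cost: Cost, i: int) -> float:
    """Return the cost for sample i, whether cost is a scalar or per-sample."""
    if isinstance(cost, (int, float)):
        return float(cost)
    return float(cost[i])


def total_cost(y_true, y_pred, tp_cost: Cost = 0.0, fp_cost: Cost = 0.0,
               fn_cost: Cost = 0.0, tn_cost: Cost = 0.0) -> float:
    """Sum the cost of every prediction according to its confusion-matrix cell."""
    total = 0.0
    for i, (actual, predicted) in enumerate(zip(y_true, y_pred)):
        if predicted == 1 and actual == 1:
            total += cost_at(tp_cost, i)   # contacted a subscriber
        elif predicted == 1 and actual == 0:
            total += cost_at(fp_cost, i)   # contacted a non-subscriber
        elif predicted == 0 and actual == 1:
            total += cost_at(fn_cost, i)   # missed a subscriber
        else:
            total += cost_at(tn_cost, i)   # correctly left alone
    return total


# Toy example with made-up values (not taken from the dataset).
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
fn_cost = [6.0, 2.0, 12.0, 3.0]  # per-client cost of a missed subscriber

print(total_cost(y_true, y_pred, tp_cost=1.0, fp_cost=1.0,
                 fn_cost=fn_cost, tn_cost=0.0))
```

In the toy example the two contacts cost 1 each and the missed subscriber costs 12, so the total is 14.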

5.1.3. Cost Matrix

| | Actual positive \(y_i = 1\) | Actual negative \(y_i = 0\) |
|---|---|---|
| Predicted positive \(\hat{y}_i = 1\) | tp_cost \(= c\) | fp_cost \(= c\) |
| Predicted negative \(\hat{y}_i = 0\) | fn_cost \(= r \cdot d_i \cdot b_i\) | tn_cost \(= 0\) |

with
  • \(c\) : cost of contacting the client

  • \(r\) : interest rate of the term deposit

  • \(d_i\) : fraction of the client’s balance that is deposited in the term deposit

  • \(b_i\) : client’s balance

With the default parameters, it is assumed that \(c = 1\), \(r = 0.02463333\), and \(d_i = 0.25\) for all clients. The default parameters are based on [4].
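As a worked example of the cost matrix, the sketch below computes the four entries for a single client and the subscription probability above which contacting that client has the lower expected cost. The reasoning is an assumption layered on the matrix rather than something the source states: contacting costs \(c\) whether or not the client subscribes, while not contacting costs \(p \cdot r \cdot d_i \cdot b_i\) in expectation, so contacting is cheaper when \(p > c / (r \cdot d_i \cdot b_i)\). The function names and the balance of 1000 are illustrative only.

```python
DEFAULT_CONTACT_COST = 1.0          # c
DEFAULT_INTEREST_RATE = 0.02463333  # r
DEFAULT_DEPOSIT_FRACTION = 0.25     # d_i


def cost_entries(balance,
                 contact_cost=DEFAULT_CONTACT_COST,
                 interest_rate=DEFAULT_INTEREST_RATE,
                 term_deposit_fraction=DEFAULT_DEPOSIT_FRACTION):
    """Return the four cost-matrix entries for one client with the given balance."""
    return {
        "tp_cost": contact_cost,
        "fp_cost": contact_cost,
        "fn_cost": interest_rate * term_deposit_fraction * balance,
        "tn_cost": 0.0,
    }


def contact_threshold(costs):
    """Subscription probability above which contacting has the lower expected cost.

    Contacting costs c either way; not contacting costs p * fn_cost in expectation.
    Assumes fn_cost > 0, which holds because only positive balances are included.
    """
    return costs["tp_cost"] / costs["fn_cost"]


costs = cost_entries(balance=1000.0)  # illustrative balance, not from the dataset
threshold = contact_threshold(costs)

print(f"fn_cost: {costs['fn_cost']:.4f}")
print(f"contact if p > {threshold:.4f}")
```

For a balance of 1000 under the default parameters, a missed subscriber costs about 6.16, so contacting pays off once the predicted subscription probability exceeds roughly 0.16.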

These assumptions can be changed by passing your own values to the load_upsell_bank_telemarketing function:

from empulse.datasets import load_upsell_bank_telemarketing

X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_upsell_bank_telemarketing(
    return_X_y_costs=True,
    interest_rate=0.05,
    term_deposit_fraction=0.30,
    contact_cost=10,
)
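To get a feel for what these parameters change, the sketch below compares the default and the custom settings by their break-even balance, the balance at which the false negative cost \(r \cdot d_i \cdot b_i\) equals the contact cost \(c\). Below that balance, a missed subscriber costs less than a phone call under this cost matrix. The helper break_even_balance is hypothetical and written for this example; it does not call empulse.

```python
def break_even_balance(contact_cost, interest_rate, term_deposit_fraction):
    """Balance at which fn_cost = r * d * b equals the contact cost c."""
    return contact_cost / (interest_rate * term_deposit_fraction)


# Default parameters of the cost matrix.
default_settings = dict(contact_cost=1, interest_rate=0.02463333,
                        term_deposit_fraction=0.25)
# Custom parameters from the snippet above.
custom_settings = dict(contact_cost=10, interest_rate=0.05,
                       term_deposit_fraction=0.30)

default_balance = break_even_balance(**default_settings)
custom_balance = break_even_balance(**custom_settings)

print(f"default break-even balance: {default_balance:.2f}")
print(f"custom break-even balance:  {custom_balance:.2f}")
```

The custom settings raise the break-even balance from about 162 to about 667: the higher interest rate and deposit fraction make each subscriber more valuable, but the tenfold contact cost outweighs that gain.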

5.1.4. Data Description

| Variable Name | Description | Type |
|---|---|---|
| age | Age of the client | numeric |
| balance | Average yearly balance | numeric |
| previous | Number of contacts performed before this campaign and for this client | numeric |
| job | Type of job (e.g., ‘admin.’, ‘blue-collar’, ‘entrepreneur’, etc.) | categorical |
| marital | Marital status (‘divorced’, ‘married’, ‘single’) | categorical |
| education | Education level (‘primary’, ‘secondary’, ‘tertiary’, ‘unknown’) | categorical |
| has_credit_in_default | Has credit in default? (‘yes’ = 1, ‘no’ = 0) | binary |
| has_housing_loan | Has housing loan? (‘yes’ = 1, ‘no’ = 0) | binary |
| has_personal_loan | Has personal loan? (‘yes’ = 1, ‘no’ = 0) | binary |
| previous_outcome | Outcome of the previous marketing campaign (‘success’, ‘failure’, ‘other’, ‘unknown’) | categorical |
| subscribed | Has the client subscribed to a term deposit? (‘yes’ = 1, ‘no’ = 0) | binary |

5.1.5. References