5.1. Bank Telemarketing Upsell Campaign
5.1.1. Summary
This dataset relates to the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted by phone, and more than one contact with the same client was often required to assess whether the client would subscribe to the product (a bank term deposit).
Features recorded before the contact event are removed from the original dataset [1] to avoid data leakage. Only clients with a positive balance are considered, since clients in debt are not eligible for term deposits.
Classes | 2 |
Subscribers | 4787 |
Non-subscribers | 33144 |
Samples | 37931 |
Features | 10 |
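The counts above imply a strongly imbalanced target: roughly one client in eight subscribed. The short sketch below only redoes that arithmetic from the table; it does not load the dataset, and the variable names are illustrative.

```python
# Class counts copied from the summary table above.
subscribers = 4787
non_subscribers = 33144

# Total number of samples and the share of the positive class.
samples = subscribers + non_subscribers
positive_rate = subscribers / samples

# How many non-subscribers there are for every subscriber.
imbalance_ratio = non_subscribers / subscribers

print(f"samples: {samples}")
print(f"share of subscribers: {positive_rate:.1%}")
print(f"non-subscribers per subscriber: {imbalance_ratio:.1f}")
```

With this level of imbalance, plain accuracy is a weak yardstick, which is one reason the dataset ships with instance-level costs.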
5.1.2. Using the Dataset
The dataset can be loaded through the load_upsell_bank_telemarketing function. This returns a Dataset object with the following attributes:
- data: the feature matrix
- target: the target vector
- tp_cost: the cost of a true positive
- fp_cost: the cost of a false positive
- fn_cost: the cost of a false negative
- tn_cost: the cost of a true negative
- feature_names: the feature names
- target_names: the target names
- DESCR: the full description of the dataset
from empulse.datasets import load_upsell_bank_telemarketing
dataset = load_upsell_bank_telemarketing()
Alternatively, the load function can also return the features, target, and costs separately, by setting return_X_y_costs=True. Additionally, you can specify that you want the output in a pandas.DataFrame format, by setting as_frame=True.
The following code snippet demonstrates how to load the dataset and fit a model using the CSLogitClassifier:
from empulse.datasets import load_upsell_bank_telemarketing
from empulse.models import CSLogitClassifier
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, TargetEncoder

# Load features, target, and the four cost terms as pandas objects.
X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_upsell_bank_telemarketing(
    return_X_y_costs=True,
    as_frame=True,
)

# Scale numeric columns and target-encode categorical columns.
pipeline = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', StandardScaler(), X.select_dtypes(include=['number']).columns),
        ('cat', TargetEncoder(), X.select_dtypes(include=['category']).columns),
    ])),
    ('model', CSLogitClassifier()),
])

# The model__ prefix routes the costs to the fit method of the 'model' step.
pipeline.fit(
    X,
    y,
    model__tp_cost=tp_cost,
    model__fp_cost=fp_cost,
    model__fn_cost=fn_cost,
    model__tn_cost=tn_cost,
)
5.1.3. Cost Matrix
 | Actual positive \(y_i = 1\) | Actual negative \(y_i = 0\) |
Predicted positive \(\hat{y}_i = 1\) | | |
Predicted negative \(\hat{y}_i = 0\) | | |
with
- \(c\): cost of contacting the client
- \(r\): interest rate of the term deposit
- \(d_i\): fraction of the client’s balance that is deposited in the term deposit
- \(b_i\): client’s balance
With the default parameters, it is assumed that \(c = 1\), \(r = 0.02463333\), and \(d_i = 0.25\) for all clients. The default parameters are based on [4].
These assumptions can be changed by passing your own values to the load_upsell_bank_telemarketing function:
from empulse.datasets import load_upsell_bank_telemarketing

X, y, tp_cost, fp_cost, fn_cost, tn_cost = load_upsell_bank_telemarketing(
    return_X_y_costs=True,
    interest_rate=0.05,
    term_deposit_fraction=0.30,
    contact_cost=10,
)
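To get a feel for how these parameters interact, it can help to evaluate the quantities defined above by hand. The sketch below computes the interest \(r \cdot d_i \cdot b_i\) earned on the deposited fraction of a balance, and the balance at which that interest equals the contact cost \(c\). The helper functions are hypothetical and only mirror the keyword names of the loader; they evaluate the symbols as defined and are not a reproduction of the library's cost matrix. The balance of 1000 is an arbitrary illustrative value.

```python
def deposit_interest(balance, interest_rate=0.02463333, term_deposit_fraction=0.25):
    """Interest r * d_i * b_i on the part of the balance placed in the deposit."""
    return interest_rate * term_deposit_fraction * balance


def break_even_balance(contact_cost=1.0, interest_rate=0.02463333, term_deposit_fraction=0.25):
    """Balance b_i at which r * d_i * b_i equals the contact cost c."""
    return contact_cost / (interest_rate * term_deposit_fraction)


# Default parameters, illustrative balance of 1000.
default_interest = deposit_interest(1000)
default_break_even = break_even_balance()

# Custom parameters matching the loader call above.
custom_interest = deposit_interest(1000, interest_rate=0.05, term_deposit_fraction=0.30)
custom_break_even = break_even_balance(
    contact_cost=10, interest_rate=0.05, term_deposit_fraction=0.30
)

print(f"default: interest {default_interest:.2f}, break-even {default_break_even:.2f}")
print(f"custom:  interest {custom_interest:.2f}, break-even {custom_break_even:.2f}")
```

Under the defaults, a balance of 1000 yields about 6.16 in interest against a contact cost of 1. With the custom values, the interest rises to 15 but the contact cost rises to 10, so the break-even balance moves from roughly 162 to roughly 667.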
5.1.4. Data Description
Variable Name | Description | Type |
---|---|---|
age | Age of the client | numeric |
balance | Average yearly balance | numeric |
previous | Number of contacts performed before this campaign for this client | numeric |
job | Type of job (e.g., ‘admin.’, ‘blue-collar’, ‘entrepreneur’, etc.) | categorical |
marital | Marital status (‘divorced’, ‘married’, ‘single’) | categorical |
education | Education level (‘primary’, ‘secondary’, ‘tertiary’, ‘unknown’) | categorical |
has_credit_in_default | Has credit in default? (‘yes’ = 1, ‘no’ = 0) | binary |
has_housing_loan | Has housing loan? (‘yes’ = 1, ‘no’ = 0) | binary |
has_personal_loan | Has personal loan? (‘yes’ = 1, ‘no’ = 0) | binary |
previous_outcome | Outcome of the previous marketing campaign (‘success’, ‘failure’, ‘other’, ‘unknown’) | categorical |
subscribed | Has the client subscribed to a term deposit? (‘yes’ = 1, ‘no’ = 0) | binary |
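When building a preprocessing pipeline, it can be convenient to group the variables by the types listed above. The sketch below transcribes the table into a plain dictionary and splits the feature names by type; the names `COLUMN_TYPES`, `TARGET`, and `features_of_type` are illustrative, and the actual dtypes in the loaded DataFrame may differ from these labels.

```python
# Variable types transcribed from the data description table.
COLUMN_TYPES = {
    "age": "numeric",
    "balance": "numeric",
    "previous": "numeric",
    "job": "categorical",
    "marital": "categorical",
    "education": "categorical",
    "has_credit_in_default": "binary",
    "has_housing_loan": "binary",
    "has_personal_loan": "binary",
    "previous_outcome": "categorical",
    "subscribed": "binary",
}

# The target variable is listed in the table but is not a feature.
TARGET = "subscribed"


def features_of_type(kind):
    """Return the feature names (target excluded) with the given type label."""
    return [
        name
        for name, column_type in COLUMN_TYPES.items()
        if column_type == kind and name != TARGET
    ]


numeric_features = features_of_type("numeric")
categorical_features = features_of_type("categorical")
binary_features = features_of_type("binary")

print(numeric_features)
print(categorical_features)
print(binary_features)
```

The three groups contain 3, 4, and 3 names, which matches the 10 features reported in the summary.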