ProfTreeClassifier

class empulse.models.ProfTreeClassifier(*, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0, loss=None, alpha=0.0, patience=100, tolerance=0.0001, max_depth=10, min_samples_split=20, min_samples_leaf=7, max_iter=1000, population_size=None, crossover_rate=0.2, grow_rate=0.2, prune_rate=0.2, mutate_split_rate=0.2, mutate_value_rate=0.2, n_jobs=1, random_state=None)

Profit-driven evolutionary decision tree classifier.

The ProfTree classifier is a decision tree classifier trained with a genetic algorithm. The algorithm evolves a population of trees over multiple generations. Each tree is scored by a fitness function, and the scores determine which trees are selected for crossover and mutation.
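To make the evolve, score, and select cycle concrete, here is a minimal sketch of a genetic algorithm on a deliberately tiny problem: evolving the threshold of a single split. It is not the ProfTree implementation. The functions `fitness` and `evolve_threshold`, the accuracy-based fitness, and the blend-and-jitter operators are all assumptions chosen for brevity; only the parameter names `population_size` and `max_iter` are borrowed from the signature above.

```python
import random


def fitness(threshold, xs, ys):
    """Accuracy of the one-split rule 'predict 1 when x > threshold'."""
    hits = sum((x > threshold) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


def evolve_threshold(xs, ys, population_size=10, max_iter=30, seed=0):
    """Toy genetic algorithm (hypothetical helper, not part of empulse)."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    # Initial population: random candidate thresholds.
    population = [rng.uniform(lo, hi) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(max_iter):
        # Score every candidate and rank them, best first.
        ranked = sorted(population, key=lambda t: fitness(t, xs, ys), reverse=True)
        best_per_generation.append(fitness(ranked[0], xs, ys))
        # Selection: the fitter half survives unchanged.
        parents = ranked[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2  # crossover: blend two parents
            child += rng.gauss(0.0, 0.1 * (hi - lo))  # mutation: small random shift
            children.append(child)
        population = parents + children
    best = max(population, key=lambda t: fitness(t, xs, ys))
    return best, best_per_generation


xs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.6]
ys = [0, 0, 0, 1, 1, 1, 0, 1]
best, history = evolve_threshold(xs, ys)
print(f"best threshold: {best:.3f}, fitness: {fitness(best, xs, ys):.2f}")
```

Because the fitter half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. A real tree-based variant would replace the numeric blend with structural operators such as growing, pruning, and swapping subtrees.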

Parameters:

tp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true positives. If a float, all true positives have the same cost. If array-like, it gives the cost of each true positive classification. This value is overridden if tp_cost is also passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

fp_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false positives. If a float, all false positives have the same cost. If array-like, it gives the cost of each false positive classification. This value is overridden if fp_cost is also passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

tn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of true negatives. If a float, all true negatives have the same cost. If array-like, it gives the cost of each true negative classification. This value is overridden if tn_cost is also passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.

fn_cost : float or array-like, shape=(n_samples,), default=0.0

Cost of false negatives. If a float, all false negatives have the same cost. If array-like, it gives the cost of each false negative classification. This value is overridden if fn_cost is also passed to the fit method.

Note

It is not recommended to pass instance-dependent costs to the __init__ method. Instead, pass them to the fit method.
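The difference between a float cost and an array-like cost can be illustrated with a small, library-free sketch. The helper `total_cost` below is hypothetical and is not how empulse computes costs internally; it only shows that a scalar applies to every sample while a sequence supplies one cost per sample, indexed by position.

```python
def total_cost(y_true, y_pred, tp_cost=0.0, tn_cost=0.0, fn_cost=0.0, fp_cost=0.0):
    """Sum misclassification costs (hypothetical helper, not part of empulse)."""

    def cost_at(cost, i):
        # A scalar applies to every sample; a sequence holds one cost per sample.
        return cost[i] if isinstance(cost, (list, tuple)) else cost

    total = 0.0
    for i, (truth, pred) in enumerate(zip(y_true, y_pred)):
        if truth == 1 and pred == 1:
            total += cost_at(tp_cost, i)
        elif truth == 0 and pred == 0:
            total += cost_at(tn_cost, i)
        elif truth == 1 and pred == 0:
            total += cost_at(fn_cost, i)
        else:  # truth == 0 and pred == 1
            total += cost_at(fp_cost, i)
    return total


y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]

# Class-dependent false positive cost, instance-dependent false negative cost.
cost = total_cost(y_true, y_pred, fp_cost=5.0, fn_cost=[10.0, 20.0, 30.0, 40.0])
print(cost)  # 35.0: one false positive (5.0) plus the false negative at index 2 (30.0)
```

An instance-dependent array must line up with the samples it describes, which is one plausible reason for supplying it together with the training data in fit rather than at construction time.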

loss : Metric or None, default=None

Fitness function for the genetic algorithm to maximize. If None, the max_profit_score is used.

alpha : float, default=0.0

Complexity penalty for the fitness function. A way to control overfitting.

When alpha is 0.0, the fitness is not penalized for the number of nodes in the tree. When alpha is greater than 0.0, trees with more nodes receive a lower fitness.
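As an illustration of how a complexity penalty can change which tree wins, the sketch below assumes a linear penalty of alpha per node. The exact form of the penalty is not stated here, so both the formula and the function name `penalized_fitness` are assumptions rather than the library's definition.

```python
def penalized_fitness(raw_fitness, n_nodes, alpha=0.0):
    """Assumed linear penalty: subtract alpha for every node in the tree."""
    return raw_fitness - alpha * n_nodes


# A large tree with a slightly better raw score versus a small tree.
large = penalized_fitness(0.80, n_nodes=15, alpha=0.005)
small = penalized_fitness(0.78, n_nodes=5, alpha=0.005)
print(large < small)  # True: with this alpha the smaller tree is preferred
```

With alpha left at 0.0 the raw scores are compared directly, so the larger tree would win in this example.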

patience : int, default=100

Number of iterations to wait for improvement before stopping early.

tolerance : float, default=1e-4

Minimum relative improvement in fitness required to consider a solution better.
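The interaction between patience and tolerance can be sketched as a simple early-stopping loop. The function `run_with_early_stopping` is hypothetical, and the definition of relative improvement used here (change divided by the absolute value of the best fitness so far) is an assumption, not a statement about the library's internals.

```python
def run_with_early_stopping(fitness_per_iteration, patience=100, tolerance=1e-4):
    """Return (iterations_run, best_fitness) for a sequence of best-of-generation scores."""
    best = None
    stale = 0  # consecutive iterations without a sufficient improvement
    iteration = 0
    for iteration, current in enumerate(fitness_per_iteration, start=1):
        if best is None:
            best = current
            continue
        # Assumed definition of relative improvement over the best score so far.
        improvement = (current - best) / max(abs(best), 1e-12)
        if improvement > tolerance:
            best = current
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return iteration, best


scores = [0.50, 0.60, 0.60001, 0.60002, 0.60001, 0.70]
result = run_with_early_stopping(scores, patience=3, tolerance=1e-4)
print(result)  # (5, 0.6): three stale iterations in a row stop the run before 0.70 is seen
```

The example also shows the trade-off: a small patience can end the search just before a later jump in fitness, while a large patience spends more iterations waiting.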

max_iter : int, default=1000

Maximum number of iterations, that is, the maximum number of generations the genetic algorithm runs.

max_depth : int or None, default=10

Maximum depth of the tree. Computation time scales exponentially with depth, so be careful with higher values.

min_samples_split : int or float, default=20

The minimum number of samples required to split an internal node:

If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

min_samples_leaf : int or float, default=7

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
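The int-versus-float rule shared by min_samples_split and min_samples_leaf amounts to the following small calculation. The helper name `resolve_min_samples` is hypothetical; the ceil formula is the one stated above.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn an int count or a float fraction into an absolute sample count."""
    if isinstance(value, float):
        # A float is a fraction of the training set, rounded up.
        return math.ceil(value * n_samples)
    return value  # an int is already an absolute count


print(resolve_min_samples(20, 30))    # 20
print(resolve_min_samples(0.25, 30))  # 8, since ceil(7.5) == 8
print(resolve_min_samples(0.5, 31))   # 16, since ceil(15.5) == 16
```

Note that the type decides the interpretation: under this rule 1 means one sample, while 1.0 means the whole training set.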

population_size : int or None, default=None

Number of decision trees in the population. If None, population_size is set to 10 * n_features. Will be at least 2 trees.
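The default rule can be written out as a short sketch. The helper `resolve_population_size` is hypothetical, and applying the minimum of 2 to explicitly passed values as well as to the default is an assumption about how the two sentences above combine.

```python
def resolve_population_size(population_size, n_features):
    """Default to 10 trees per feature, with an assumed floor of 2 trees."""
    if population_size is None:
        population_size = 10 * n_features
    return max(2, population_size)


print(resolve_population_size(None, 5))  # 50
print(resolve_population_size(30, 5))    # 30
print(resolve_population_size(1, 5))     # 2
```

Since the default grows linearly with the number of features, wide datasets may warrant an explicit, smaller population_size to keep each generation affordable.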

crossover_rate : float, default=0.2

Probability of crossover. Must be in [0, 1].