ThresholdingOptimization

class costcla.models.ThresholdingOptimization(calibration=True)

Classifier based on finding the threshold that minimizes the total cost on a given set.

Parameters:

calibration : bool, optional (default=True)

Whether or not to calibrate the probabilities.

References

V. Sheng, C. Ling, “Thresholding for making classifiers cost-sensitive”, in Proceedings of the National Conference on Artificial Intelligence, 2006.

Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split  # sklearn.cross_validation in scikit-learn < 0.18
>>> from costcla.datasets import load_creditscoring1
>>> from costcla.models import ThresholdingOptimization
>>> from costcla.metrics import savings_score
>>> data = load_creditscoring1()
>>> sets = train_test_split(data.data, data.target, data.cost_mat, test_size=0.33, random_state=0)
>>> X_train, X_test, y_train, y_test, cost_mat_train, cost_mat_test = sets
>>> f = RandomForestClassifier(random_state=0).fit(X_train, y_train)
>>> y_prob_train = f.predict_proba(X_train)
>>> y_prob_test = f.predict_proba(X_test)
>>> y_pred_test_rf = f.predict(X_test)
>>> f_t = ThresholdingOptimization().fit(y_prob_train, cost_mat_train, y_train)
>>> y_pred_test_rf_t = f_t.predict(y_prob_test)
>>> # Savings using only RandomForest
>>> print(savings_score(y_test, y_pred_test_rf, cost_mat_test))
0.12454256594
>>> # Savings using RandomForest and ThresholdingOptimization
>>> print(savings_score(y_test, y_pred_test_rf_t, cost_mat_test))
0.401816361581
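
The savings values printed above compare a classifier's total cost with a trivial baseline. The sketch below is a plain NumPy illustration and not the costcla implementation: it assumes savings is defined as one minus the ratio of the classifier's cost to the cost of the cheaper of two trivial strategies (predict every example negative, or every example positive). The helper names `total_cost` and `savings` and the toy data are hypothetical; consult `costcla.metrics.savings_score` for the library's exact definition.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Sum per-example costs; cost_mat columns are FP, FN, TP, TN."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp, fn, tp, tn = np.asarray(cost_mat, dtype=float).T
    cost = y_true * ((1 - y_pred) * fn + y_pred * tp)
    cost += (1 - y_true) * (y_pred * fp + (1 - y_pred) * tn)
    return float(cost.sum())

def savings(y_true, y_pred, cost_mat):
    """Assumed definition: 1 - cost / cost of the cheaper trivial classifier."""
    n = len(y_true)
    base = min(total_cost(y_true, np.zeros(n, dtype=int), cost_mat),
               total_cost(y_true, np.ones(n, dtype=int), cost_mat))
    return 1.0 - total_cost(y_true, y_pred, cost_mat) / base

# Toy data (hypothetical): a false positive costs 1, a false negative costs 10.
y_true = np.array([0, 0, 1, 1])
cost_mat = np.tile([1.0, 10.0, 0.0, 0.0], (4, 1))
savings_tuned = savings(y_true, np.array([0, 1, 1, 1]), cost_mat)
savings_default = savings(y_true, np.array([0, 0, 0, 1]), cost_mat)
print(savings_tuned, savings_default)
```

Under this definition a positive value means the classifier is cheaper than the best trivial strategy, and a negative value means it is more expensive.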

Attributes

threshold_

(float) Selected threshold.

Methods

`fit`
`predict`

fit(y_prob, cost_mat, y_true)

Find the threshold that minimizes the total cost on the given set and store it in threshold_.

Parameters:

y_prob : array-like of shape = [n_samples, 2]

Predicted probabilities.

cost_mat : array-like of shape = [n_samples, 4]

Cost matrix of the classification problem, where the columns represent the costs of false positives, false negatives, true positives, and true negatives for each example.

y_true : array-like of shape = [n_samples]

True class labels.

Returns:

self
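
To make the cost matrix layout and the threshold search concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the library's code: it assumes column 1 of `y_prob` holds the positive-class probability, that candidate thresholds are the observed probabilities, and that an example is labeled positive when its probability is greater than or equal to the threshold. It skips calibration entirely. The names `total_cost` and `search_threshold` are hypothetical.

```python
import numpy as np

def total_cost(y_true, y_pred, cost_mat):
    """Sum per-example costs; cost_mat columns are FP, FN, TP, TN."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp, fn, tp, tn = np.asarray(cost_mat, dtype=float).T
    # Positives cost FN when missed and TP when caught.
    cost = y_true * ((1 - y_pred) * fn + y_pred * tp)
    # Negatives cost FP when flagged and TN when left alone.
    cost += (1 - y_true) * (y_pred * fp + (1 - y_pred) * tn)
    return float(cost.sum())

def search_threshold(y_prob, cost_mat, y_true):
    """Return the observed probability that gives the lowest total cost."""
    p_pos = np.asarray(y_prob, dtype=float)[:, 1]  # assumed positive-class column
    candidates = np.unique(p_pos)
    costs = [total_cost(y_true, (p_pos >= t).astype(int), cost_mat)
             for t in candidates]
    return float(candidates[int(np.argmin(costs))])

# Toy data (hypothetical): a false positive costs 1, a false negative costs 10.
y_prob = np.array([[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]])
y_true = np.array([0, 0, 1, 1])
cost_mat = np.tile([1.0, 10.0, 0.0, 0.0], (4, 1))
best = search_threshold(y_prob, cost_mat, y_true)
print(best)
```

Because false negatives are ten times as expensive as false positives in this toy data, the search settles on a threshold below 0.5, accepting one false positive to avoid a false negative.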

predict(y_prob)

Predict classes by applying the selected threshold to the predicted probabilities.

Parameters:

y_prob : array-like of shape = [n_samples, 2]

Predicted probabilities.

Returns:

y_pred : array-like of shape = [n_samples]

Predicted class labels.
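
As a rough illustration of what applying a threshold means, the following NumPy sketch turns a two-column probability array into class labels. It assumes column 1 is the positive-class probability and that a probability equal to the threshold counts as positive; both are assumptions of this sketch, and it omits any calibration step that the class may apply when calibration=True. The function name `apply_threshold` is hypothetical.

```python
import numpy as np

def apply_threshold(y_prob, threshold):
    """Label an example 1 when its positive-class probability >= threshold."""
    p_pos = np.asarray(y_prob, dtype=float)[:, 1]  # assumed positive-class column
    return (p_pos >= threshold).astype(int)

y_prob = np.array([[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]])
y_pred_low = apply_threshold(y_prob, 0.35)  # cost-sensitive style threshold
y_pred_half = apply_threshold(y_prob, 0.5)  # conventional 0.5 cut-off
print(y_pred_low, y_pred_half)
```

Lowering the threshold from 0.5 to 0.35 flips the second and third examples to the positive class, which is the kind of shift a cost-minimizing threshold produces when false negatives are expensive.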