BayesMinimumRiskClassifier

class costcla.models.BayesMinimumRiskClassifier(calibration=True)

An example-dependent cost-sensitive binary Bayes minimum risk classifier.

Parameters:

calibration : bool, optional (default=True)

Whether or not to calibrate the probabilities.

References

[R1] A. Correa Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, “Improving Credit Card Fraud Detection with Calibrated Probabilities”, in Proceedings of the Fourteenth SIAM International Conference on Data Mining, pp. 677-685, 2014.

Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split  # sklearn.cross_validation in scikit-learn < 0.18
>>> from costcla.datasets import load_creditscoring1
>>> from costcla.models import BayesMinimumRiskClassifier
>>> from costcla.metrics import savings_score
>>> data = load_creditscoring1()
>>> sets = train_test_split(data.data, data.target, data.cost_mat, test_size=0.33, random_state=0)
>>> X_train, X_test, y_train, y_test, cost_mat_train, cost_mat_test = sets
>>> f = RandomForestClassifier(random_state=0).fit(X_train, y_train)
>>> y_prob_test = f.predict_proba(X_test)
>>> y_pred_test_rf = f.predict(X_test)
>>> f_bmr = BayesMinimumRiskClassifier()
>>> f_bmr.fit(y_test, y_prob_test)
>>> y_pred_test_bmr = f_bmr.predict(y_prob_test, cost_mat_test)
>>> # Savings using only RandomForest
>>> print(savings_score(y_test, y_pred_test_rf, cost_mat_test))
0.12454256594
>>> # Savings using RandomForest and Bayes Minimum Risk
>>> print(savings_score(y_test, y_pred_test_bmr, cost_mat_test))
0.413425845555

Methods

fit
fit_predict
get_params
predict
set_params

fit(y_true_cal=None, y_prob_cal=None)

If calibration is True, fit the probability calibration using the supplied true classes and predicted probabilities.

Parameters:

y_true_cal : array-like of shape = [n_samples], optional (default=None)

True class to be used for calibrating the probabilities.

y_prob_cal : array-like of shape = [n_samples, 2], optional (default=None)

Predicted probabilities to be used for calibrating the probabilities.

Returns:

self : object

Returns self.

fit_predict(y_prob, cost_mat, y_true_cal=None, y_prob_cal=None)

Fit the calibration (when calibration is enabled) and calculate the prediction using the Bayes minimum risk classifier.

Parameters:

y_prob : array-like of shape = [n_samples, 2]

Predicted probabilities.

cost_mat : array-like of shape = [n_samples, 4]

Cost matrix of the classification problem, where the columns represent the costs of false positives, false negatives, true positives and true negatives, respectively, for each example.

y_true_cal : array-like of shape = [n_samples], optional (default=None)

True class to be used for calibrating the probabilities.

y_prob_cal : array-like of shape = [n_samples, 2], optional (default=None)

Predicted probabilities to be used for calibrating the probabilities.

Returns:

y_pred : array-like of shape = [n_samples]

Predicted class
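To make the cost_mat column layout concrete, the following is a minimal sketch in plain Python of how one row [FP, FN, TP, TN] turns a prediction into a cost. It is an illustration only, not costcla's implementation; the helper names example_cost and total_cost and all numbers are hypothetical.

```python
def example_cost(y_true, y_pred, cost_row):
    """Cost of one prediction given one cost_mat row [FP, FN, TP, TN]."""
    c_fp, c_fn, c_tp, c_tn = cost_row
    if y_true == 1:
        # Actual positive: either a true positive or a false negative.
        return c_tp if y_pred == 1 else c_fn
    # Actual negative: either a false positive or a true negative.
    return c_fp if y_pred == 1 else c_tn


def total_cost(y_true, y_pred, cost_mat):
    """Sum the example-dependent costs over all samples."""
    return sum(
        example_cost(t, p, row) for t, p, row in zip(y_true, y_pred, cost_mat)
    )


# Hypothetical data: three examples, each with its own costs.
cost_mat = [
    [5.0, 100.0, 0.0, 0.0],
    [5.0, 20.0, 0.0, 0.0],
    [5.0, 300.0, 0.0, 0.0],
]
y_true = [1, 0, 1]
y_pred = [1, 1, 0]

print(total_cost(y_true, y_pred, cost_mat))  # 0 (TP) + 5 (FP) + 300 (FN) = 305.0
```

Because every example carries its own row, the same kind of mistake (here a false negative) can cost 20 on one example and 300 on another, which is what "example-dependent" refers to.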

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.

predict(y_prob, cost_mat)

Calculate the prediction using the Bayes minimum risk classifier.

Parameters:

y_prob : array-like of shape = [n_samples, 2]

Predicted probabilities.

cost_mat : array-like of shape = [n_samples, 4]

Cost matrix of the classification problem, where the columns represent the costs of false positives, false negatives, true positives and true negatives, respectively, for each example.

Returns:

y_pred : array-like of shape = [n_samples]

Predicted class
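The Bayes minimum risk idea can be sketched as follows: for each example, compare the expected cost of predicting the positive class with the expected cost of predicting the negative class, and pick the cheaper one. This is a plain-Python illustration under stated assumptions, not the library's code: the function name bayes_minimum_risk is hypothetical, the second column of y_prob is assumed to hold the positive-class probability, and the tie-breaking toward class 0 is an arbitrary choice. Calibration is left out.

```python
def bayes_minimum_risk(y_prob, cost_mat):
    """Pick, per example, the class with the lower expected cost.

    y_prob rows are assumed to be [P(class 0), P(class 1)];
    cost_mat rows are [FP, FN, TP, TN].
    """
    y_pred = []
    for (_, p_pos), (c_fp, c_fn, c_tp, c_tn) in zip(y_prob, cost_mat):
        # Expected cost of predicting positive: TP if truly positive, FP otherwise.
        risk_pos = c_tp * p_pos + c_fp * (1.0 - p_pos)
        # Expected cost of predicting negative: FN if truly positive, TN otherwise.
        risk_neg = c_fn * p_pos + c_tn * (1.0 - p_pos)
        # Ties go to class 0 here (an arbitrary choice for this sketch).
        y_pred.append(1 if risk_pos < risk_neg else 0)
    return y_pred


# Hypothetical inputs: the first two examples share the same probability
# but have different false-negative costs.
y_prob = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6]]
cost_mat = [
    [5.0, 100.0, 0.0, 0.0],
    [5.0, 20.0, 0.0, 0.0],
    [50.0, 10.0, 0.0, 0.0],
]

print(bayes_minimum_risk(y_prob, cost_mat))  # [1, 0, 0]
```

The first two examples have the same positive-class probability (0.1) yet receive different predictions, because a missed positive costs 100 in one case and 20 in the other. The third example is predicted negative despite a probability of 0.6, since a false positive there is five times as costly as a false negative.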

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Returns:

self