CostSensitiveLogisticRegression

class costcla.models.CostSensitiveLogisticRegression(C=1.0, fit_intercept=True, max_iter=100, random_state=None, solver='ga', tol=0.0001, verbose=0)[source]

An example-dependent cost-sensitive Logistic Regression classifier.
Parameters: C : float, optional (default=1.0)
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
fit_intercept : bool, default: True
Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
max_iter : int
Useful only for the 'ga' and 'bfgs' solvers. Maximum number of iterations taken for the solvers to converge.
random_state : int seed, RandomState instance, or None (default)
The seed of the pseudo random number generator to use when shuffling the data.
solver : {‘ga’, ‘bfgs’}
Algorithm to use in the optimization problem.
tol : float, optional
Tolerance for stopping criteria.
verbose : int, optional (default=0)
Controls the verbosity of the optimization process.
See also
sklearn.tree.DecisionTreeClassifier
References
[R4] A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring", in Proceedings of the International Conference on Machine Learning and Applications, 2014.

Examples
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.cross_validation import train_test_split
>>> from costcla.datasets import load_creditscoring2
>>> from costcla.models import CostSensitiveLogisticRegression
>>> from costcla.metrics import savings_score
>>> data = load_creditscoring2()
>>> sets = train_test_split(data.data, data.target, data.cost_mat, test_size=0.33, random_state=0)
>>> X_train, X_test, y_train, y_test, cost_mat_train, cost_mat_test = sets
>>> y_pred_test_lr = LogisticRegression(random_state=0).fit(X_train, y_train).predict(X_test)
>>> f = CostSensitiveLogisticRegression()
>>> f.fit(X_train, y_train, cost_mat_train)
>>> y_pred_test_cslr = f.predict(X_test)
>>> # Savings using Logistic Regression
>>> print(savings_score(y_test, y_pred_test_lr, cost_mat_test))
0.00283419465107
>>> # Savings using Cost Sensitive Logistic Regression
>>> print(savings_score(y_test, y_pred_test_cslr, cost_mat_test))
0.142872237978
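The objective that the 'ga' and 'bfgs' solvers minimize is the example-dependent cost of the probabilistic predictions, following the formulation in reference [R4]. A minimal sketch, assuming costcla's cost-matrix column order and using illustrative values (the helper name `logistic_cost` is hypothetical, not part of the library):

```python
import numpy as np

def logistic_cost(y, p, cost_mat):
    """Mean example-dependent cost of probabilistic predictions p.

    cost_mat columns follow the docstring convention (an assumption here):
    [false positives, false negatives, true positives, true negatives].
    """
    c_fp, c_fn, c_tp, c_tn = cost_mat.T
    # Expected cost of each example under prediction probability p
    cost = y * (p * c_tp + (1 - p) * c_fn) + (1 - y) * (p * c_fp + (1 - p) * c_tn)
    return cost.mean()

# Illustrative labels, predicted probabilities, and per-example costs
y = np.array([1, 0, 1])
p = np.array([0.9, 0.1, 0.8])
cost_mat = np.array([[4.0, 10.0, 0.0, 0.0],
                     [4.0, 10.0, 0.0, 0.0],
                     [4.0, 10.0, 0.0, 0.0]])
print(round(logistic_cost(y, p, cost_mat), 4))  # 1.1333
```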
Attributes
coef_ : array, shape (n_classes, n_features)
Coefficient of the features in the decision function.

intercept_ : array, shape (n_classes,)
Intercept (a.k.a. bias) added to the decision function. If fit_intercept is set to False, the intercept is set to zero.

Methods
fit
get_params
predict
predict_proba
set_params
fit(X, y, cost_mat)[source]

Build an example-dependent cost-sensitive logistic regression from the training set (X, y, cost_mat).
Parameters: X : array-like of shape = [n_samples, n_features]
The input samples.
y : array indicator matrix
Ground truth (correct) labels.
cost_mat : array-like of shape = [n_samples, 4]
Cost matrix of the classification problem, where the columns represent the costs of: false positives, false negatives, true positives, and true negatives, for each example.
Returns: self : object
Returns self.
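The cost_mat argument can be built per example from domain quantities. A minimal sketch, assuming a credit-scoring-style problem with hypothetical credit-line amounts and cost rates (all values here are illustrative, not from the library):

```python
import numpy as np

n_samples = 5
credit_line = np.array([1000.0, 2500.0, 500.0, 4000.0, 1500.0])  # assumed amounts

# Column order follows the docstring: [FP, FN, TP, TN]
c_fp = np.full(n_samples, 3.0)   # cost of rejecting a good customer (assumed flat fee)
c_fn = 0.75 * credit_line        # cost of accepting a defaulter (assumed loss fraction)
c_tp = np.zeros(n_samples)       # correct decisions assumed costless
c_tn = np.zeros(n_samples)

cost_mat = np.column_stack([c_fp, c_fn, c_tp, c_tn])
print(cost_mat.shape)  # (5, 4)
```

A matrix of this shape is then passed as the third argument to fit alongside X and y.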
get_params(deep=True)

Get parameters for this estimator.
Parameters: deep: boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.
predict(X, cut_point=0.5)[source]

Predict the class of each sample, using cut_point as the probability threshold.
Parameters: X : array-like, shape = [n_samples, n_features]
Returns: T : array-like, shape = [n_samples]
Returns the predicted class of each sample.
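The role of cut_point can be sketched as thresholding the positive-class probability; a simplified illustration (the helper name is hypothetical, and whether the comparison is strict or inclusive at the boundary is an assumption):

```python
import numpy as np

def predict_from_proba(proba, cut_point=0.5):
    # proba has shape [n_samples, 2]; column 1 holds the positive-class probability
    return (proba[:, 1] >= cut_point).astype(int)

proba = np.array([[0.7, 0.3],
                  [0.2, 0.8],
                  [0.45, 0.55]])
print(predict_from_proba(proba))                 # default threshold 0.5 -> [0 1 1]
print(predict_from_proba(proba, cut_point=0.6))  # stricter threshold   -> [0 1 0]
```

Raising cut_point trades false positives for false negatives, which matters when their example-dependent costs differ.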
predict_proba(X)[source]

Probability estimates for the samples in X.
Parameters: X : array-like, shape = [n_samples, n_features]
Returns: T : array-like, shape = [n_samples, 2]
Returns the probability of the sample for each class in the model.
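For a fitted logistic model, these probabilities follow from the coef_ and intercept_ attributes via the logistic sigmoid. A minimal sketch of that mapping (the helper name and input values are illustrative, not library API):

```python
import numpy as np

def proba_from_coefficients(X, coef, intercept):
    # Logistic model: P(y=1 | x) = sigmoid(x . coef + intercept)
    p1 = 1.0 / (1.0 + np.exp(-(X @ coef + intercept)))
    # Columns ordered as [P(y=0), P(y=1)], matching the [n_samples, 2] return shape
    return np.column_stack([1.0 - p1, p1])

X = np.array([[0.0, 0.0],
              [1.0, 2.0]])
coef = np.array([0.5, -0.25])
intercept = 0.1
print(proba_from_coefficients(X, coef, intercept))
```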
set_params(**params)

Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Returns: self