https://medium.com/analytics-vidhya/confusion-matrix-accuracy-precision-recall-f1-score-ade299cf63cd

https://www.scikit-yb.org/en/latest/api/classifier/classification_report.html

  1. A person who is actually pregnant (positive) and classified as pregnant (positive). This is called TRUE POSITIVE (*TP*).

  2. A person who is actually not pregnant (negative) and classified as not pregnant (negative). This is called TRUE NEGATIVE (*TN*).

  3. A person who is actually not pregnant (negative) and classified as pregnant (positive). This is called FALSE POSITIVE (*FP*).

  4. A person who is actually pregnant (positive) and classified as not pregnant (negative). This is called FALSE NEGATIVE (*FN*).
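A minimal sketch of how these four counts map onto scikit-learn's confusion_matrix; the labels below are made up purely for illustration (1 = pregnant/positive, 0 = not pregnant/negative):

from sklearn.metrics import confusion_matrix

# hypothetical toy labels: 1 = pregnant (positive), 0 = not pregnant (negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# with labels=[0, 1] the matrix is laid out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tp, tn, fp, fn)  # 2 3 1 2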

accuracy

The fraction of all predictions that are correct:

accuracy = (TP + TN) / (TP + TN + FP + FN)

precision

Of everything predicted as positive, the fraction that is actually positive:

precision = TP / (TP + FP)

recall

Of everything that is actually positive, the fraction that is predicted as positive:

recall = TP / (TP + FN)

F1-score

The F1 score is the harmonic mean of precision and recall; because it balances the two, it is often a more informative measure than accuracy, especially on imbalanced data.
The F1 score reaches 1 only when precision and recall are both 1, and it is high only when both precision and recall are high.

F1 = 2 * precision * recall / (precision + recall)
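As a quick sanity check, these formulas can be reproduced with scikit-learn's metric functions. The toy labels below reuse the ones from the confusion-matrix sketch earlier (an illustrative assumption, not part of the original example):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # against y_pred below: TP=2, FN=2, FP=1, TN=3
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total = 5 / 8 = 0.625
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 2 / 3 ≈ 0.667
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 2 / 4 = 0.5
print(f1_score(y_true, y_pred))         # 2 * (2/3) * 0.5 / ((2/3) + 0.5) ≈ 0.571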

support

Support is the number of actual occurrences of the class in the specified dataset.

Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing.

Support doesn’t change between models but instead diagnoses the evaluation process.
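A minimal sketch of the stratified sampling mentioned above: passing stratify=y to train_test_split keeps the class proportions, and therefore the per-class support, comparable between the train and test splits (the 90/10 labels here are a made-up example of imbalance):

import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature column, just for illustration

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=3
)

# both splits keep roughly the original 9:1 ratio of class 0 to class 1
print(np.bincount(y_train))
print(np.bincount(y_test))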

An example with sklearn.metrics.classification_report

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# load & split
dataset = load_iris()
(trainX, testX, trainY, testY) = train_test_split(
    dataset.data, dataset.target, random_state=3, test_size=0.25
)

# train: fit
model = KNeighborsClassifier(n_neighbors=1)
model.fit(trainX, trainY)

# predict & evaluate
predictions = model.predict(testX)
print(classification_report(testY, predictions, target_names=dataset.target_names))

Output:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      0.92      0.92        12
   virginica       0.91      0.91      0.91        11

    accuracy                           0.95        38
   macro avg       0.94      0.94      0.94        38
weighted avg       0.95      0.95      0.95        38

setosa, versicolor, and virginica are the three classes in this dataset. In the report, macro avg is the unweighted mean of the per-class scores, while weighted avg weights each class by its support.
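Continuing the example above: if the per-class numbers need to be post-processed rather than just printed, classification_report also accepts output_dict=True, which returns the same table as a nested dict instead of a formatted string:

# same call as above, but returning a dict
report = classification_report(
    testY, predictions, target_names=dataset.target_names, output_dict=True
)
print(report["versicolor"]["recall"])   # per-class recall
print(report["accuracy"])               # overall accuracy
print(report["macro avg"]["f1-score"])  # macro-averaged F1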
