Confusion Matrix & Cyber Crime
A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
Let’s understand TP, FP, FN, TN
◼ True Positive: Interpretation: You predicted positive and it's true.
◼ True Negative: Interpretation: You predicted negative and it's true.
◼ False Positive (Type 1 Error): Interpretation: You predicted positive and it's false.
◼ False Negative (Type 2 Error): Interpretation: You predicted negative and it's false.
Just remember: Positive and Negative describe the predicted class, while True and False describe whether that prediction matches the actual class.
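For concreteness, here is a minimal sketch (using scikit-learn on made-up toy labels, not real data) of how the four counts can be read off a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # predicted classes

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```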
What can we learn from this?
A natural question is what we can actually do with this matrix. Several important metrics are derived from it:
Precision:
It tells us how many of the cases predicted as positive actually turned out to be positive: Precision = TP / (TP + FP).
Recall:
It tells us how many of the actual positive cases we were able to predict correctly with our model: Recall = TP / (TP + FN).
F1 Score:
It is the harmonic mean of Precision and Recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). Because a harmonic mean suppresses extreme values, comparing two models with this metric takes both False Positives and False Negatives into account at the same time.
Accuracy:
It is the proportion of cases that are identified correctly, irrespective of whether they are positives or negatives, so all True Positives and True Negatives are counted: Accuracy = (TP + TN) / (TP + TN + FP + FN).
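As a quick illustration, the sketch below computes all four metrics directly from TP, TN, FP, FN counts; the numbers are invented purely for demonstration:

```python
# Illustrative counts, not from any real model
tp, tn, fp, fn = 90, 40, 10, 60

precision = tp / (tp + fp)                                  # of predicted positives, how many were right
recall    = tp / (tp + fn)                                  # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # all correct predictions over all cases

print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}, Accuracy={accuracy:.2f}")
```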
Using the Confusion Matrix to monitor Cyber Attacks:
The data set comes from the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between bad connections, called intrusions or attacks, and good, normal connections. The database contains a standard set of data to be audited, including a wide variety of intrusions simulated in a military network environment.
In KDD Cup 99, the criterion used to evaluate participant entries was the Cost Per Test (CPT), computed from the confusion matrix and a given cost matrix.
• True Positive (TP): The number of connections detected as attacks that are actually attacks.
• True Negative (TN): The number of connections detected as normal that are actually normal.
• False Positive (FP): The number of connections detected as attacks that are actually normal (false alarms).
• False Negative (FN): The number of connections detected as normal that are actually attacks.
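To illustrate the idea, here is a small sketch of computing a Cost Per Test value as the average misclassification cost over all connections. The 2×2 confusion matrix and cost values below are assumed purely for the example; they are not the actual 5×5 matrices used in KDD Cup 99.

```python
import numpy as np

# Rows = actual class, columns = predicted class; order: [normal, attack]
conf_matrix = np.array([[9500,  500],   # actual normal: TN, FP
                        [ 300, 9700]])  # actual attack: FN, TP

# Assumed costs: correct predictions cost 0, a missed attack (FN) costs more than a false alarm (FP)
cost_matrix = np.array([[0, 1],
                        [2, 0]])

# Cost Per Test = total misclassification cost divided by the number of test records
cpt = (conf_matrix * cost_matrix).sum() / conf_matrix.sum()
print(f"Cost Per Test: {cpt:.4f}")
```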
In summary, a confusion matrix is a tabular summary of the correct and incorrect predictions made by a classifier. It is used to evaluate the performance of a classification model through metrics such as accuracy, precision, recall, and F1-score.