How Is the Confusion Matrix Used in Fighting Cyber Crime?

Samriddhi Mishra
Jun 10, 2021

What is Cyber Crime?

Cybercrime is any criminal activity that involves a computer, networked device, or network. While most cybercrimes are carried out in order to generate profit for the cybercriminals, some cybercrimes are carried out against computers or devices directly to damage or disable them, while others use computers or networks to spread malware, illegal information, images, or other materials.

Because internet connectivity has become a necessity, the volume and pace of cybercrime have increased: the criminal no longer needs to be physically present when committing a crime. The internet’s speed, convenience, anonymity, and lack of borders make computer-based variations of financial crimes (such as ransomware, fraud, and money laundering), as well as crimes such as stalking and bullying, easier to carry out.

Cybercriminal activity may be carried out by individuals or small groups with relatively little technical skill, or by highly organized global criminal groups that may include skilled developers and others with relevant expertise.

Types of cybercrime

  1. Cyberextortion: A crime involving an attack or threat of an attack coupled with a demand for money to stop the attack. One form of cyberextortion is the ransomware attack.
  2. Identity theft: An attack that occurs when an individual accesses a computer to glean a user’s personal information, which they then use to steal that person’s identity or access their valuable accounts, such as banking and credit cards.
  3. Cyberespionage: A crime involving a cybercriminal who hacks into systems or networks to gain access to confidential information held by a government or other organization. Attacks may be motivated by profit or by ideology.
  4. Cryptojacking: An attack that uses scripts to mine cryptocurrencies within browsers without the user’s consent. Cryptojacking attacks may involve loading cryptocurrency mining software to the victim’s system.

What is Cybersecurity?

Cybersecurity refers to the protection of computers and similar devices from theft of information, damage to software or hardware, and loss of intellectual property. It matters because every sector of society, including governments, corporations, the military, and financial institutions, is driven by data.

What are Artificial Intelligence and Machine Learning?

Machine learning and artificial intelligence are data-driven approaches to making decisions without explicitly programmed rules. With the help of artificial intelligence, processes can be automated, which reduces the need for human intervention and can reduce human bias.

Cybersecurity involves a lot of data points that can make use of artificial intelligence, as AI is all about data clustering, classification, processing, filtering, and management.

What is a Confusion Matrix?

The confusion matrix traces back to the contingency table, which Karl Pearson introduced in 1904. A confusion matrix is a performance measurement technique for machine learning classification problems. It is a simple table that shows how a classification model performs on test data for which the true values are known.

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. From it we can calculate performance metrics such as accuracy, precision, recall, and F1-score.

Need for Confusion Matrix in Machine learning

  • It evaluates the performance of a classification model when it makes predictions on test data, and tells us how good the model is.
  • It tells us not only how many errors the classifier made but also their type: Type I or Type II.
  • With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy and precision.

The confusion matrix can only be built if the true values for the test data are known. The matrix itself is easy to understand and to implement when testing an ML model.
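As a minimal sketch of how these metrics follow from the four cells of a binary confusion matrix (true positives, true negatives, false positives, false negatives), the Python snippet below computes accuracy, precision, recall, and F1-score. The function name `classification_metrics` and the counts passed to it are made up for illustration and do not come from any real detector.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything flagged positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
m = classification_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy alone can look good even when recall is poor, which is why the other metrics are usually reported alongside it.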

A confusion matrix exposes two types of error: Type I and Type II.

  1. Type I Error:

Type I error refers to the False Positive error (FP). This type of error is not as dangerous as the Type II error but can still be troublesome. A false positive occurs when the model predicts a positive value but the actual value is negative.

In our case, a false positive is when the intrusion detection system (IDS) identifies an activity as an attack but the activity is acceptable behavior. A false positive is a false alarm. This type of error does not let an attack through, but frequent false alarms waste analysts' time and can cause real alerts to be overlooked.

  2. Type II Error:

The Type II error refers to the False Negative error (FN). Of the two types of error, this is the more dangerous one.

A false negative is when the IDS identifies an activity as acceptable when the activity is actually an attack. That is, the IDS fails to catch an attack. This is the most dangerous state because the security professional has no idea that an attack took place. This type of error can lead to data breaches, malware infections, and many other kinds of damage that go unnoticed.
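To put numbers on the two error types, a detector's false positive rate (Type I) and false negative rate (Type II) can be computed from the confusion matrix counts. The sketch below is illustrative only: the function name `error_rates` and the counts are hypothetical.

```python
def error_rates(tp, tn, fp, fn):
    """Return (false positive rate, false negative rate)."""
    # Type I: share of normal activity that raised a false alarm.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Type II: share of real attacks that the detector missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical IDS results: 900 normal connections, 100 attacks.
fpr, fnr = error_rates(tp=90, tn=880, fp=20, fn=10)
print(f"False positive rate (Type I): {fpr:.3f}")
print(f"False negative rate (Type II): {fnr:.3f}")
```

Lowering one rate usually raises the other, so an IDS is typically tuned toward whichever error is costlier. In the framing above, that is the false negative.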

Confusion Matrix’s implementation in monitoring Cyber Attacks

The data set discussed here was used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections. The data set contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

In the KDD99 dataset, the four attack categories (DoS, U2R, R2L, and probe) are subdivided into 22 specific attack types.

In the KDD Cup 99, the criterion used to evaluate participant entries is the Cost Per Test (CPT), computed from the confusion matrix and a given cost matrix.
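One way to read "cost per test" is as an average: multiply each confusion matrix cell by the cost assigned to that kind of outcome, sum the products, and divide by the number of test examples. The sketch below assumes that reading. The two-class layout and the cost values are hypothetical and are not the official competition cost matrix.

```python
def cost_per_test(confusion, cost):
    """Average misclassification cost over all test examples.

    Both arguments are square lists of lists with rows for the actual
    class and columns for the predicted class.
    """
    total_cost = 0
    total_examples = 0
    for count_row, cost_row in zip(confusion, cost):
        for count, unit_cost in zip(count_row, cost_row):
            total_cost += count * unit_cost
            total_examples += count
    if total_examples == 0:
        raise ValueError("confusion matrix contains no test examples")
    return total_cost / total_examples


# Class order: [normal, attack]. All numbers are hypothetical.
confusion = [[880, 20],   # actual normal: 880 correct, 20 false alarms
             [10, 90]]    # actual attack: 10 missed, 90 detected
cost = [[0, 1],           # a false alarm costs 1
        [3, 0]]           # a missed attack costs 3
cpt = cost_per_test(confusion, cost)
print(f"Cost per test: {cpt:.3f}")
```

Charging more for a missed attack than for a false alarm is how a cost matrix encodes the idea that Type II errors are the more serious ones.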

• True Positive (TP): The number of connections flagged as attacks that are actually attacks.

• True Negative (TN): The number of connections classified as normal that are actually normal.

• False Positive (FP): The number of connections flagged as attacks that are actually normal (false alarms).

• False Negative (FN): The number of connections classified as normal that are actually attacks.
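The four counts can be tallied directly from paired actual and predicted labels. The following is a small sketch, assuming that any label other than "normal" counts as an attack. The function name `tally_outcomes` and the handful of records are made up for illustration.

```python
def tally_outcomes(actual_labels, predicted_labels, normal="normal"):
    """Count TP, TN, FP, and FN, treating every non-normal label as an attack."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(actual_labels, predicted_labels):
        actual_attack = actual != normal
        predicted_attack = predicted != normal
        if predicted_attack and actual_attack:
            counts["TP"] += 1      # attack correctly flagged
        elif not predicted_attack and not actual_attack:
            counts["TN"] += 1      # normal traffic correctly passed
        elif predicted_attack:
            counts["FP"] += 1      # false alarm on normal traffic
        else:
            counts["FN"] += 1      # attack that slipped through
    return counts


# Made-up records for illustration only.
actual = ["normal", "smurf", "neptune", "normal", "portsweep", "normal"]
predicted = ["normal", "smurf", "normal", "neptune", "portsweep", "normal"]
counts = tally_outcomes(actual, predicted)
print(counts)
```

This binary view ignores whether the detector named the right attack type. A multi-class confusion matrix would track that as well.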
