Unsupervised Learning for Cybersecurity

08:29:2024

BY Bradley Hartlove


Dashboards and automated alerts remain fundamental components of nearly every cybersecurity team’s toolbelt. Peel back the layers of a network monitoring tool suite, and you’ll find every team watching numerous visualizations supplied by its security information and event management (SIEM) system of choice, along with automated alerts that flag events of interest on the network. These tools play a crucial role in ensuring network hygiene and security. However, cyber operators still spend countless hours digging through this information, alongside thousands of log events, to search for events that may have slipped through the cracks. Simply put, dashboards and alerts cannot catch everything. The job therefore falls on seasoned operators to read between the lines and pick up on interesting trends or deviations from the norm that traditional tooling does not report.

The Machine Learning Curve

Machine learning (ML) innovations in recent years have produced a plethora of anomaly detection use cases, from detecting credit card fraud based on transaction history to spotting possible malfunctions in IT equipment. Along those lines, anomaly detection has proved to be a valuable asset for analyzing network traffic, such as identifying an uncharacteristically large payload for a given machine or pointing out a spike of activity during off-hours for a specific network.

This blog dives into how SealingTech is empowering customers with ML-powered solutions that automate the detection of anomalous traffic within their unique network environments at scale.

Unsupervised Anomaly Detection

Within the field of machine learning, there are two primary categories of algorithms – supervised and unsupervised (while others exist, they’re outside the scope of this topic). With a supervised model, the dataset has known outputs, meaning the model can be trained to get closer to the correct answer over time (i.e., reducing the loss/error). However, in many cases such as detecting network anomalies, the outputs are unknown. Network defenders have a catalog of network logs, but no one has gone through and labeled every single Zeek connection as ‘benign’ or ‘anomalous’. This is where unsupervised learning comes into play. Instead of trying to train on a set of inputs and outputs, the model separates the inputs into different groups based on patterns.

Figure 1: Supervised vs Unsupervised Learning
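
To make the unsupervised idea concrete, here is a minimal sketch of a model grouping inputs without any labels. It is a tiny one-dimensional k-means with two clusters, written for illustration only; it is not SealingTech’s model, and the `payload_bytes` values and the `two_means` helper are hypothetical.

```python
from statistics import mean


def two_means(values, iterations=20):
    """Split 1-D values into two groups with a tiny k-means (k=2)."""
    # Deterministic starting centers: the smallest and largest value.
    centers = [min(values), max(values)]
    assignment = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        assignment = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Move each center to the mean of the values assigned to it.
        for k in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:
                centers[k] = mean(members)
    return assignment, centers


# Hypothetical bytes-per-connection values; no 'benign'/'anomalous' labels exist.
payload_bytes = [512, 480, 530, 495, 505, 520, 98_000, 101_500]
groups, centers = two_means(payload_bytes)

# The much smaller group is the candidate set of unusual connections.
for value, group in zip(payload_bytes, groups):
    print(value, "-> group", group)
```

No one told the algorithm which connections were unusual; the two large payloads separate from the rest purely because of the pattern in the data.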

While unsupervised models will naturally be less accurate because they have no labels to train on, they are also not tied to any one specific network. They can therefore be applied broadly and pick up on anomalies unique to the given environment. This lends itself well to the diverse set of networked environments customers are monitoring, allowing for flexibility in anomaly detection capabilities.

Building on this well-established background in unsupervised anomaly detection, SealingTech has built a unique approach to applying these models to customer network data.

SealingTech’s Approach to Anomaly Detection

Our researched approach to anomaly detection differs from most, and it has been proven to enhance our customers’ capabilities and mission success. It rests on two tenets. First, agreement between numerous unsupervised approaches lends more legitimacy to a label, which in turn helps mitigate the limitations of unsupervised models. Second, a label in and of itself is not very useful to the end customer; explainability is key.

The first tenet stems from the idea that unsupervised anomaly detection is, by nature, less accurate than a labeled approach. However, if five of six models report that a log record appears anomalous, it’s likely more interesting than a record flagged by only one of six. From this consensus across multiple models, SealingTech’s pipeline provides a ‘health-score’ for each record.
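
As a rough illustration of the consensus idea, the sketch below turns per-model votes into a single score. The exact formula behind the ‘health-score’ is not described here, so the fraction-of-models-flagging calculation, the `consensus_score` function, and the model names are all assumptions made for the example.

```python
def consensus_score(votes):
    """Return the fraction of models that flagged a record (0.0 to 1.0).

    `votes` maps a model name to True if that model flagged the record.
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    flagged = sum(1 for is_anomalous in votes.values() if is_anomalous)
    return flagged / len(votes)


# Hypothetical votes from six unsupervised models on two log records.
record_a = {"m1": True, "m2": True, "m3": True, "m4": True, "m5": True, "m6": False}
record_b = {"m1": False, "m2": False, "m3": True, "m4": False, "m5": False, "m6": False}

score_a = consensus_score(record_a)  # five of six models agree
score_b = consensus_score(record_b)  # one of six models flagged it

# Records with broader agreement sort to the top of an operator's queue.
ranked = sorted([("record_a", score_a), ("record_b", score_b)],
                key=lambda item: item[1], reverse=True)
```

A simple fraction treats every model equally; a weighted vote would be a natural variation if some models prove more reliable for a given network.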

Secondly, a label by itself does not give the operator much to work from. Therefore, our team invested resources into researching and integrating model explainability algorithms into the anomaly detection pipeline. Instead of only providing a ‘benign’ or ‘anomalous’ label, the workflow can explain which features contributed to the label, such as a packet size more than three standard deviations above the network’s average.
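
The three-standard-deviation example above can be sketched as a simple per-feature check. This is a deliberately basic stand-in for the explainability algorithms mentioned, which are not specified here; the `explain_record` function, feature names, and baseline values are hypothetical.

```python
from statistics import mean, pstdev


def explain_record(record, baseline, threshold=3.0):
    """Return features lying more than `threshold` standard deviations
    from the baseline mean, mapped to their z-scores."""
    reasons = {}
    for feature, value in record.items():
        history = baseline[feature]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue  # constant feature: a z-score is undefined
        z = (value - mu) / sigma
        if abs(z) > threshold:
            reasons[feature] = round(z, 2)
    return reasons


# Hypothetical baseline observations for one network.
baseline = {
    "packet_size": [500, 520, 480, 510, 490, 505, 495, 515],
    "duration_s": [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1],
}
# A flagged record: the packet size is far above normal, the duration is typical.
record = {"packet_size": 1400, "duration_s": 1.05}

reasons = explain_record(record, baseline)
print(reasons)  # only the features that explain the flag are listed
```

The operator sees not just that the record was flagged but that packet size, and not duration, is what set it apart.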

After executing all the models simultaneously across a distributed environment, the results are aggregated, passed through the model explainability flows, and pushed into a visualization platform to provide an easy-to-navigate interface to the anomaly detection results.
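
The run-in-parallel-then-aggregate flow might look something like the sketch below. A local thread pool stands in for the distributed environment, and three simple rule functions stand in for real unsupervised models; every name here (`DETECTORS`, `score_records`, the record fields) is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in detectors: each returns True if a record looks unusual.
def large_payload(record):
    return record["bytes"] > 50_000


def off_hours(record):
    return record["hour"] < 6 or record["hour"] >= 22


def rare_port(record):
    return record["port"] not in {53, 80, 443}


# Detectors live in a plain dict, so adding or removing one is a one-line change.
DETECTORS = {
    "large_payload": large_payload,
    "off_hours": off_hours,
    "rare_port": rare_port,
}


def run_detector(detector, records):
    """Apply one detector to every record, returning one vote per record."""
    return [detector(r) for r in records]


def score_records(records, detectors=DETECTORS):
    """Run all detectors concurrently, then aggregate their votes per record."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(run_detector, detector, records)
            for name, detector in detectors.items()
        }
        votes = {name: future.result() for name, future in futures.items()}
    results = []
    for i, record in enumerate(records):
        flagged_by = [name for name in detectors if votes[name][i]]
        results.append({
            "record": record,
            "flagged_by": flagged_by,
            "score": len(flagged_by) / len(detectors),
        })
    return results


records = [
    {"bytes": 1_200, "hour": 14, "port": 443},
    {"bytes": 98_000, "hour": 3, "port": 4444},
]
results = score_records(records)
```

Each result carries both the aggregate score and the names of the detectors that fired, which is the kind of structure a visualization layer could consume directly.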

Due to the modularity of the execution environment, models can easily be added or removed to improve detection capabilities based on user feedback, with the goal of long-term label propagation to assist in tweaking the models over time.

Dashboards and alerts are here to stay as a vital part of the network defender toolbox, even as the cybersecurity landscape continues to shift and evolve in complexity. SealingTech strives to provide innovative, quality tools to our customers to enable mission success. Through the use of emerging machine learning technologies, our proven approach to anomaly detection provides our customers with a new capability to shift the landscape in their favor.

Interested in learning more? Contact us today.
