Challenges and Tradeoffs of Zero Trust Architecture in High Performance Computing

11:10:2025

BY Walker Haddock

Implementing Zero Trust Architecture (ZTA) in High Performance Computing (HPC) enclaves is challenging. ZTA brings real benefits, but we need to acknowledge what it will cost and decide how to trade the security risks it mitigates against its drawbacks, including financial and performance impacts. In this blog, we’ll explore these trade-offs to gain a deeper understanding of the realities of ZTA in HPC environments. We’ll also ask the question: What level of risk is acceptable for high security systems?

A Brief History of Zero Trust Architecture

Implementing Zero Trust Architecture in HPC enclaves presents both benefits and challenges.

In 2020, NIST published SP 800-207, “Zero Trust Architecture.” In 2022, memorandum M-22-09 required all agencies to plan and implement ZTA. Zero Trust is a modern security strategy based on the principle: never trust, always verify. Instead of assuming that everything behind a firewall is safe, the Zero Trust model assumes a breach has already occurred and verifies each request as though it originated from an open network.

On January 26, 2022, the Office of Management and Budget (OMB), part of the Executive Office of the President, released memorandum M-22-09, “Moving the U.S. Government Toward Zero Trust Cybersecurity Principles.” Executive agencies have also issued their own Zero Trust guidance and policies. For example, the National Security Agency (NSA) released “Embracing a Zero Trust Security Model.” The US Government has invested significantly in HPC systems that provide valuable services to the nation. M-22-09 applies to these systems and will have an impact moving forward.

The Department of Energy (DOE) Cybersecurity Plan (approved 4-30-2024) requires all departments to implement ZTA in accordance with M-22-09.

The US Army recently embarked on the “Army’s Unified Network Plan 2.0” which is a Zero Trust initiative. The goals are to:

  1. Operationalize the Unified Network to be truly data-centric
  2. Extend global network standards consistently into tactical theaters
  3. Reduce IT complexity at the tactical edge
  4. Centralize IT service delivery for efficiency
  5. Ensure the secure sharing of data across formations and with mission partners

While US agencies are beginning to transition to ZTA, organizations that rely on HPC systems for very sensitive purposes have traditionally built those systems in siloed enclaves, each supporting a specific program. This is a very expensive way to provide the required protection. For these systems, security comes from restricting access, including data flow across the boundary. This approach also makes it difficult to share information that is essential to the missions of government agencies. In contrast to the isolated approach, ZTA encourages a cloud model where users can connect from external systems. The fundamental guiding assumption in ZTA is that the network is already compromised, and applications must not implicitly trust users, systems, or other applications.

Secure Scientific Service Mesh

Let’s discuss ZTA in the context of next-generation HPC systems that will provide more automation and cloud-like interfaces for composing infrastructure. An example of such a system is given in a paper published by staff at Oak Ridge National Laboratory (ORNL) [1]. The Secure Scientific Service Mesh (S3M) provides a secure, API-driven infrastructure that intelligent agents and experimental facilities can interact with.

Another excellent example can be found in the HPCIC Tutorial 2025: Flux, at the 3:08 mark [2]. The ORNL paper describes a representative architecture for a user-configurable, API-driven infrastructure that can support modern HPC workflows. It provides an API for users to submit infrastructure configurations, construct microservices applications, and allocate compute, network, and storage resources for executing their jobs. The architecture provides:

  1. Slurm for HPC job scheduling
  2. Argo Workflows for orchestrating multi-step processes
  3. Globus Flows for moving data between connected systems
  4. Tapis Platform for fine-grained authorization, data management, and code execution capabilities
  5. OpenShift providing Kubernetes, storage and network provisioning

The S3M architecture demonstrates compliance with ZTA by providing an enterprise authentication service. The API grants or rejects requests based on the user’s cryptographic identity and authorizes access based on least privilege. The S3M provides strong isolation between different users and allocates resources based on the configuration requests submitted to the API. Users can monitor their jobs and orchestrate their workflows.
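To make the idea of a per-request access decision concrete, here is a minimal Python sketch. It is not S3M’s actual API. The signing key, the `POLICY` table, and the `sign` and `authorize` helpers are all hypothetical names invented for illustration, and a shared-secret HMAC stands in for the asymmetric credentials a real enterprise authentication service would issue.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared secret, for illustration only. A real deployment would
# rely on asymmetric keys issued by the enterprise authentication service.
SIGNING_KEY = b"demo-only-secret"

# Hypothetical least-privilege policy: each identity maps to the only
# (tenant, action) pairs it is allowed to use. Anything else is denied.
POLICY = {
    "alice": {("tenant-a", "submit_job"), ("tenant-a", "read_status")},
    "bob": {("tenant-b", "read_status")},
}


def sign(claims: dict) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical JSON claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def authorize(claims: dict, signature: str, tenant: str, action: str,
              now: Optional[float] = None) -> bool:
    """Decide a single request: verify identity, freshness, then privilege."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(sign(claims), signature):
        return False  # the claimed identity could not be verified
    if claims.get("expires", 0) <= now:
        return False  # the credential has expired
    allowed = POLICY.get(claims.get("user"), set())
    return (tenant, action) in allowed  # default deny
```

Every call to `authorize` repeats the full check, so nothing is trusted because an earlier request succeeded. A request from `alice` against `tenant-b` is rejected even with a valid signature, which is the isolation between users described above.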

Zero Trust Architecture

Finding the right balance between performance and security is key for HPC environments implementing Zero Trust Architecture amid growing cyber threats.

HPC environments must consider their specific use cases and requirements and weigh them against the risk associated with their applications, users, and data. Balance can be achieved. If organizations dive into ZTA without identifying that balance, the negative impacts will be palpable to administrators, end users, and other stakeholders, such as those who must procure new HPC systems because the existing ones can no longer meet users’ demands.

That said, there are clear and present dangers that make ZTA an immediate imperative, as codified in M-22-09. Some of these dangers include:

  1. Increased remote work force
  2. Increased usage of cloud-based technologies
  3. Multi-cloud and hybrid IT complexity
  4. Insider threat
  5. Ransomware on the rise
  6. Supply chain attacks
  7. Cyberattacks
  8. Sophisticated Advanced Persistent Threats (APTs)

NIST SP 800-207 defines Zero Trust as a collection of concepts and ideas designed to minimize uncertainty in enforcing accurate, least privilege per-request access decisions in information systems and services in the face of a network viewed as compromised. Zero Trust Architecture is an enterprise’s cybersecurity plan that utilizes Zero Trust concepts and encompasses component relationships, workflow planning, and access policies. Therefore, a zero trust enterprise is the network infrastructure (physical and virtual) and operational policies that are in place for an enterprise as a product of a zero trust architecture plan.

Zero Trust focuses on the data and applications using:

Micro segmentation: Dividing the system into smaller functional pieces, where each piece contains its own security policies and controls. For example, in HPC, this might be a set of compute nodes allocated to a specific tenant. 

Identity, authentication, authorization, and access management: Creating a strong identity and authentication system based on cryptography. Using this identity mechanism to enforce privilege and policies.

Integrity assurance: Implementing strong, cryptographic integrity protection over logs, automated actions, and similar artifacts. The integrity mechanism should be decentralized so that agents consuming these artifacts can verify for themselves that they have not been tampered with.

Automated continuous monitoring and defense: Integrating automated cyber defense mechanisms to monitor and control operations in a way to reduce the requirement for continuous human intervention.
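As one illustration of integrity assurance, the sketch below builds a hash-chained log in Python. Each entry commits to the hash of the entry before it, so any consumer holding the log can recompute the chain and detect edits. The `GENESIS` constant and the `entry_hash`, `append`, and `verify` helpers are hypothetical names for this example, not part of any product mentioned here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record that is chained to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(log: list) -> bool:
    """Recompute the whole chain; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A bare hash chain only detects tampering if the verifier holds a trusted copy of the latest hash, since an attacker who can rewrite the entire log can also recompute every hash. A production design would likely sign entries or publish the head hash to independent parties.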

In the article “High Performance Computing Infrastructure and Zero Trust Architecture,” Tyson Macaulay and Daksha Bhasker surveyed many professionals in the HPC infrastructure space to measure the effort required to implement ZTA in HPC [3]. They presented their findings using the maturity levels for each ZTA pillar given in CISA’s “Zero Trust Maturity Model,” Version 2.0, April 2023 [4]. The authors also provided an appendix with a detailed analysis of each control within each pillar.

Based on interviews with HPC owners and experts, Macaulay and Bhasker identify the following challenges for ZTA in HPC, with reference to the HPC security zones given in NIST SP 800-223, High-Performance Computing (HPC) Security:

In the Identity Pillar, an optimal level of implementation would likely be cost-prohibitive, highly complex, and restrict practical usability of the HPC assets.

With Devices in the Management and Compute and Storage zones, there is a potential for automation-based false positives to interfere with workload execution. Failure in workload execution or interruption due to highly automated security processes on devices would potentially impact workload completion times. This would restrict practical usability of the HPC assets.

The Network Pillar will require great effort and capital to provide the key management capabilities necessary to enable end-to-end encryption between all processes in a multi-computer HPC system. Staff, skills, procedures, and license costs for key management that can meet this requirement will be expensive.

Applications and Workloads will experience huge impacts on performance, usability, memory requirements, and latency due to continuous assessment and context awareness.

Data and Storage will have serious performance impacts for advanced and optimized ZTA implementation. Technology like homomorphic encryption or Trusted Execution Environments will likely add large amounts of latency.
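Two of these concerns lend themselves to quick back-of-the-envelope arithmetic. The Python sketch below is illustrative only and is not taken from the study. It assumes one encrypted channel (and therefore one key to manage) per pair of communicating processes, and it assumes automated security checks produce false positives independently at a fixed rate. Both function names are hypothetical.

```python
def pairwise_channels(processes: int) -> int:
    """Channels needed if every pair of processes gets its own key."""
    return processes * (processes - 1) // 2


def interruption_probability(false_positive_rate: float, checks: int) -> float:
    """Chance that at least one of `checks` independent checks misfires."""
    return 1.0 - (1.0 - false_positive_rate) ** checks
```

Under these assumptions, a 1,000-process job needs 499,500 pairwise channels, which hints at why key management for the Network Pillar gets expensive. Likewise, a check with a 0.1% false positive rate that runs 1,000 times during a job has roughly a 63% chance of misfiring at least once, which is how automation-based false positives could stretch workload completion times.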

Charting a Path Forward

Successfully implementing ZTA in HPC environments requires skilled staff, robust tools, and careful planning to balance security, performance, cost, and usability.

Zero Trust is not a new concept. It has been used in high security systems for decades to protect the most sensitive secrets. ZTA does provide a great opportunity to shore up the existing NIST RMF processes, where the SP 800-53 control catalog is still very relevant. ZTA’s effort to task the entire enterprise with the mission of information security is of great value.

However, the mandate should come with sufficient staff who possess the skills required to implement ZTA so we can continue to improve the security of every information system asset. It must supply the resources needed to process security fixes and put them into operation quickly. It will need to furnish us with better tools and skills to monitor security events. It must also provide better tools to check configurations, more help to develop and test configurations, and automation tools that can provide strong and correct configuration management.

Great contributions like the Tri-Lab Operating System Stack (TOSS) can be very effective in reducing risks in Federal Government information systems. If we can create a commercial interest in making these types of products available to government and public HPC system owners, it will be of great value.

There is another security compliance space that will be relevant in some use cases: the Cross Domain Solution (CDS) space, which takes an aggressive look at the concepts included in ZTA. The National Cross Domain Strategy and Management Office (NCDSMO) publishes a “Raise the Bar” document that provides design patterns for connecting and operating high security systems. HPC systems that fall under this compliance space will feel the impact of its security controls. We need to carefully consider whether it is more prudent to build HPC systems that each operate at a single security level to reduce the impact of security controls. If we do this, we can connect HPC systems at different levels using COTS CDS systems as necessary.

As we work to transition HPC into ZTA, we will have to carefully weigh the trade-offs among security, performance, cost, usability, and complexity. New designs, implementations, or technologies may provide these controls with less impact on performance, but they will most likely come at a high cost and may require special expertise to operate correctly.

 

References:

[1] Cornell University (2025) “Secure API-Driven Research Automation to Accelerate Scientific Discovery”

[2] The Flux Framework Tutorial (2025) LLNL’s High Performance Computing Innovation Center: 2025 Software Tutorials: Flux

[3] Pulse & Praxis: The Journal for Critical Infrastructure Protection, Security and Resilience (2024) “High Performance Computing Infrastructure and Zero Trust Architecture”

[4] Cybersecurity and Infrastructure Security Agency (CISA – 2023 & 2025) Zero Trust Maturity Model, Version 2.0