Protect Cloud Workloads: 3D Chess as a Model for Cloud Threats
Edward Wu
June 22, 2021
Over the last few years, the growing digitization of every aspect of business and the rapid pace at which application teams are asked to deliver have pushed cloud workload deployment and migration to the top of most IT organizations' project lists. While a record number of organizations already run, or are actively looking into running, business-critical workloads in the cloud, many are still learning which tactics an attacker might use and what it takes to secure those workloads.
At ExtraHop, we launched our first virtual sensor for cloud workloads in 2013, followed by a cloud-native network detection and response (NDR) solution in early 2018. We have observed firsthand how attackers infiltrate and how security practitioners defend workloads in the cloud.
Breaking Down Cloud Workload Security
To help practitioners understand the dynamics and complexity of cloud workload security, we'll use 3D chess as a way to model threats and defenses for cloud workloads. Specifically, the cloud workload security landscape can be modeled as a 3D chess game played on two parallel planes.
In this 3D game, chess pieces can move forward or defend their positions on the two parallel planes. This is similar to traditional chess with the added complexity that pieces from one plane can also zap into the same position on the other plane.
As you can imagine, in the game of 3D chess, to be successful at stopping attackers from penetrating all layers of defenses, defenders need to have great defenses on each plane of attack and also be prepared for "vertical attacks" that originate from the parallel plane. We are going to use this 3D chess analogy to frame cloud workload security. In the cloud, there are two planes that attackers and defenders make their moves on: the management plane and the data plane.
The Management Plane
The management plane consists of cloud service provider (CSP) management APIs that enable organizations to create, modify, and manage infrastructure in the cloud. Compared to traditional on-premises data centers, where adding new servers typically involves racking and connecting new hardware, CSP management APIs allow application teams to provision new compute, storage capacity, and infrastructure in the blink of an eye.
While this flexibility makes operation and development easier in the cloud, it also opens up a whole suite of novel attack vectors and techniques. For defenders, the key focus of management plane security revolves around user credentials and access policies.
In a typical CSP account containing cloud workloads, there could be thousands of different user credentials, each with its own slice of authority to make different CSP management API calls. To manage these credentials and minimize security risk, security practitioners often apply the principle of least privilege to varying degrees, giving each credential only the bare minimum permissions it needs for business operation.
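One way to make least privilege concrete is to compare what each credential is granted against what it actually needs. The following is a minimal sketch, not a real CSP policy evaluator: the credential names, the flat "service:Action" pattern lists, and the `NEEDED` inventory are all hypothetical, and real policy languages have conditions, resources, and deny rules that this ignores.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified policy shape: each credential maps to the action
# patterns it is granted ("service:Action", with "*" as a wildcard).
GRANTED = {
    "ci-deployer": ["ec2:RunInstances", "ec2:Describe*"],
    "report-job": ["*"],
    "backup-svc": ["s3:GetObject", "s3:PutObject", "iam:*"],
}

# Hypothetical inventory of the actions each credential actually needs.
NEEDED = {
    "ci-deployer": {"ec2:RunInstances", "ec2:DescribeInstances"},
    "report-job": {"s3:GetObject"},
    "backup-svc": {"s3:GetObject", "s3:PutObject"},
}


def unused_grants(patterns, needed):
    """Return granted patterns that match none of the needed actions."""
    return [p for p in patterns if not any(fnmatchcase(a, p) for a in needed)]


def broad_grants(patterns):
    """Return patterns that grant a whole service, or everything."""
    return [p for p in patterns if p == "*" or p.endswith(":*")]


def audit(granted_map, needed_map):
    """Collect per-credential findings; credentials with none are omitted."""
    report = {}
    for cred, patterns in granted_map.items():
        findings = {
            "unused": unused_grants(patterns, needed_map.get(cred, set())),
            "broad": broad_grants(patterns),
        }
        if findings["unused"] or findings["broad"]:
            report[cred] = findings
    return report


REPORT = audit(GRANTED, NEEDED)
```

In this toy data, `ci-deployer` passes, `report-job` is flagged for holding a catch-all grant, and `backup-svc` is flagged for an identity-management grant it never uses.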
In addition to that, most CSPs also offer some form of detection and response services for the management plane to monitor and detect unauthorized or unusual user credential activity. However, one common challenge in building reliable detection and response programs for the management plane is the lack of organizational and operational context of different workloads in the cloud environment: It can be quite difficult to figure out whether a specific service credential should be performing a high-privilege management operation without knowing what that service is supposed to do.
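To illustrate why operational context matters, here is a small sketch of a triage function that can only raise a confident alert when it knows what a service credential is supposed to do. The credential names, the `EXPECTED` map, and the choice of which operations count as high privilege are assumptions made for the example, not part of any CSP's detection service.

```python
# Hypothetical set of management operations treated as high privilege.
HIGH_PRIVILEGE = {"iam:CreateAccessKey", "iam:AttachUserPolicy", "ec2:RunInstances"}

# Hypothetical context supplied by application teams: what each service
# credential is expected to do.
EXPECTED = {
    "autoscaler": {"ec2:RunInstances", "ec2:TerminateInstances"},
    "log-shipper": {"s3:PutObject"},
}


def triage(event, expected=EXPECTED, high_privilege=HIGH_PRIVILEGE):
    """Classify one management-plane event as 'ok', 'review', or 'alert'."""
    cred, action = event["credential"], event["action"]
    if cred not in expected:
        # No context recorded: benign and malicious look the same, so the
        # best we can do for a high-privilege call is ask a human.
        return "review" if action in high_privilege else "ok"
    if action in expected[cred]:
        return "ok"
    return "alert" if action in high_privilege else "review"


EVENTS = [
    {"credential": "autoscaler", "action": "ec2:RunInstances"},
    {"credential": "log-shipper", "action": "ec2:RunInstances"},
    {"credential": "log-shipper", "action": "s3:GetObject"},
    {"credential": "unknown-svc", "action": "iam:CreateAccessKey"},
]
VERDICTS = [triage(e) for e in EVENTS]
```

The same `ec2:RunInstances` call is routine for the autoscaler and alarming for the log shipper. Without an entry in `EXPECTED`, the last event can only be queued for manual review.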
The Data Plane
The data plane is where different hosts on the network communicate with each other via network packets. It should be quite familiar to security practitioners because this plane of attack is the battleground of traditional on-premises data center and corporate network security. In the data plane, common defense techniques include adopting Zero Trust networking (or network micro-segmentation), deploying firewalls, and building detection and response programs that use a combination of endpoint, network, and log data.
While most of the attack techniques and defense technologies from on-premises data center security can also be directly applied to the data plane, there are important differences. For example, endpoint detection and response (EDR) tools provide little visibility into PaaS, SaaS, containerized, and serverless workloads, because these high-abstraction services offer limited access to the underlying OS. In addition, the ephemeral and dynamic nature of the cloud creates new challenges for security practitioners, as technologies struggle to piece together the behavioral context of different workloads and to distinguish malicious behavior from benign, normal business activity.
How Attackers Play on Both Planes
Let's now take a look at how attackers could combine the management and data planes to advance their objectives. To move from the data plane to the management plane, attackers typically combine lateral movement with credential harvesting. For example, by pivoting to a more privileged host via password brute force or CVE exploitation, attackers would very likely gain access to additional credentials that provide expanded management plane access.
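The data-plane-to-management-plane move can be reasoned about as a graph walk: starting from a foothold, which hosts can be reached, and which management credentials sit on them? The sketch below assumes a hypothetical environment; the host names, the reachability edges, and the credential placement are invented for illustration.

```python
from collections import deque

# Hypothetical environment: lateral movement edges between hosts, and the
# management-plane credentials stored on each host.
REACHABLE = {
    "web-1": ["app-1"],
    "app-1": ["build-1", "db-1"],
    "build-1": [],
    "db-1": [],
    "bastion": ["db-1"],
}
CREDENTIALS_ON_HOST = {
    "app-1": {"app-role"},
    "build-1": {"deploy-admin"},
}


def exposed_credentials(foothold, reachable=REACHABLE, creds=CREDENTIALS_ON_HOST):
    """Breadth-first walk from the foothold, collecting credentials found."""
    seen, queue, exposed = {foothold}, deque([foothold]), set()
    while queue:
        host = queue.popleft()
        exposed |= creds.get(host, set())
        for nxt in reachable.get(host, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return exposed


FROM_WEB = exposed_credentials("web-1")
FROM_BASTION = exposed_credentials("bastion")
```

Here a compromise of `web-1` eventually exposes both `app-role` and `deploy-admin`, while a compromise of `bastion` exposes none. Defenders can run the same walk to decide which hosts should never hold privileged credentials.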
To move from the management plane to the data plane, attackers also have a few options. They can take full advantage of the dynamism in the cloud and "airdrop" workloads under their control directly behind defenses in the data plane. Attackers could also purposely provision malicious workloads in sensitive areas of the environment to directly access high-value assets that defenders thought were safely segmented.
Alternatively, attackers could inject malicious code into existing cloud workloads from the management plane. For example, AWS Systems Manager Agent (SSM Agent) allows users to execute arbitrary commands on supported EC2 instances directly from the management plane. User data is another common mechanism, supported by all large cloud service providers, that enables the management plane to inject custom code into a virtual machine during the start-up process.
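A defender might watch the management plane for exactly these injection channels. The following is a rough sketch under stated assumptions: `SendCommand`, `RunInstances`, and `ModifyInstanceAttribute` are real AWS API operations, but the flat record shape (`principal`, `action`, `params`), the principal names, and the approved list are simplifications invented here and do not mirror any audit log format exactly.

```python
def injection_channel(record):
    """Return a label if the call could carry code into a workload, else None."""
    action = record["action"]
    params = record.get("params", {})
    if action == "ssm:SendCommand":
        return "remote-command"
    user_data_actions = ("ec2:RunInstances", "ec2:ModifyInstanceAttribute")
    if action in user_data_actions and "userData" in params:
        return "user-data"
    return None


def unapproved_injections(records, approved_principals):
    """List (principal, channel) pairs for injections by unapproved principals."""
    findings = []
    for record in records:
        channel = injection_channel(record)
        if channel and record["principal"] not in approved_principals:
            findings.append((record["principal"], channel))
    return findings


# Hypothetical audit records and approved automation principals.
RECORDS = [
    {"principal": "patch-automation", "action": "ssm:SendCommand", "params": {}},
    {"principal": "dev-alice", "action": "ssm:SendCommand", "params": {}},
    {"principal": "dev-alice", "action": "ec2:RunInstances",
     "params": {"imageId": "ami-example"}},
    {"principal": "build-bot", "action": "ec2:ModifyInstanceAttribute",
     "params": {"userData": "<redacted>"}},
]
FINDINGS = unapproved_injections(RECORDS, approved_principals={"patch-automation"})
```

The plain instance launch without user data is not flagged, which keeps the check focused on calls that can actually deliver code.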
Real-World Examples
A high-profile attack reported in early 2021 is a real-world case study demonstrating how cybercriminals used the interaction of the two planes of attack to their advantage. In this breach, attackers completely dominated the management plane by compromising various administrative credentials of the company's AWS accounts.
Even though the defenders reportedly used AWS security groups to restrict access to high-value data stores, the attackers were able to directly "warp" to the front door of these data stores by spawning malicious Linux instances within the "secure network segment" via the management plane. This is a classic example of how attackers can bypass a defense mechanism in the data plane by leveraging their advanced position in the management plane. Due to the dynamic nature of cloud workloads, most data plane security mechanisms have a hard time defending against this type of cross-dimensional attack.
Threat actors have also been observed adapting their offensive techniques to exploit the two parallel planes of attack, sidestepping security mechanisms in one plane by sneaking across the other. Unlike defenders of traditional on-premises data centers, who focus on securing a single plane of attack, cloud defenders need to pay attention to both planes and understand that cloud workload security is ultimately only as strong as the weaker of the two.
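The bypass can be shown with a toy model. This is a sketch only: the `Workload` type, the segment names, the principals, and the `EXPECTED_LAUNCHERS` map are hypothetical, and the segment check is a stand-in for a real security group rule rather than a reproduction of one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    segment: str      # network segment the workload was placed in
    launched_by: str  # management-plane principal that created it


def data_plane_allows(src: Workload) -> bool:
    """Data-plane rule: only the 'secure' segment may reach the data store."""
    return src.segment == "secure"


# Hypothetical management-plane context: who normally launches workloads
# into each segment.
EXPECTED_LAUNCHERS = {"secure": {"db-platform-team"}}


def looks_legitimate(src: Workload) -> bool:
    """Combine the data-plane rule with management-plane launch context."""
    expected = EXPECTED_LAUNCHERS.get(src.segment, set())
    return data_plane_allows(src) and src.launched_by in expected


legit = Workload("orders-api", "secure", "db-platform-team")
rogue = Workload("linux-tmp", "secure", "stolen-admin-key")
outsider = Workload("web-frontend", "public", "web-team")
```

The segment rule alone treats `rogue` exactly like `legit`, because both sit inside the secure segment. Only the launch context from the management plane tells them apart.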
Defenders should aim to achieve comprehensive visibility and build detection and response programs for both planes of attack by relying on a variety of cybersecurity data sources available in the cloud. In addition, it is also vitally important to leverage contextual information gained from one plane to assist with detection and response in the other plane.
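As one possible shape for carrying context across planes, the sketch below raises the severity of a management-plane event when the credential behind it belongs to a host that already has open data-plane detections. The host names, the credential-to-host mapping, and the two-level severity scale are assumptions made for the example.

```python
# Hypothetical inputs: hosts with open data-plane (network) detections, and
# the host each management-plane credential is issued to.
HOSTS_WITH_NETWORK_ALERTS = {"app-1"}
CREDENTIAL_HOME = {"app-role": "app-1", "deploy-admin": "build-1"}


def severity(event, base="low"):
    """Escalate events whose credential lives on a host under suspicion."""
    home = CREDENTIAL_HOME.get(event["credential"])
    if home in HOSTS_WITH_NETWORK_ALERTS:
        return "high"
    return base


EVENTS = [
    {"credential": "app-role", "action": "iam:ListRoles"},
    {"credential": "deploy-admin", "action": "iam:ListRoles"},
    {"credential": "unmapped-key", "action": "iam:ListRoles"},
]
SEVERITIES = [severity(e) for e in EVENTS]
```

The same low-noise API call is ranked differently depending on what the network saw, which is the kind of enrichment a single-plane tool cannot do on its own.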
To see how you can detect, investigate, and stop an attack in a real AWS environment, start the Reveal(x) demo, an unthrottled version of our network detection and response solution running on example data.