How to Effectively Implement & Monitor Cloud Infrastructure

Written by Prasad Wanigasinghe | 12/12/19

As organisations migrate more of their computing to the cloud, they become more exposed to malicious attacks. When it comes to the cloud, there’s a difference between what a cloud provider sees and what an attacker sees.

A cloud provider’s perspective

Cloud is ever present, ever accessible
Provides a wide range of computing services
Enables rapid development and deployment
Cloud consumption is rapidly increasing

An attacker’s perspective

Can be continuously and relentlessly attacked
A wide surface area to attack
Easy to make mistakes and configuration errors
Makes a super attractive target

Most attacks are not new: malware, password brute forcing, credential theft, DDoS, and SQL injection (SQLi) are all common in legacy and on-premise systems. Alongside these, new types of attack are emerging in cloud environments, such as password spraying, crypto miners, harvesting of secrets and subscription keys, and file-less attacks.

For instance, password spraying takes one password and tries it against many accounts, while password brute forcing takes one account and tries many passwords against it. There have also been many reports of attacks along the supply chain and attacks that exploit misconfigurations in cloud infrastructure.
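The difference between the two patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record shape (account, password fingerprint), the function name `classify_failed_logins`, and the threshold of 3 are all hypothetical, and a real detection pipeline usually cannot see the attempted password at all and has to infer spraying from timing and source information instead.

```python
from collections import defaultdict

# Hypothetical failed-login records: (account, password_fingerprint).
# The fingerprint only stands in for "same guess" vs "different guess";
# real systems should never log raw passwords.

def classify_failed_logins(failures, threshold=3):
    """Label brute-force and spray patterns in a batch of failed logins.

    'brute_force': accounts hit with at least `threshold` distinct passwords
    'spray': password fingerprints tried on at least `threshold` distinct accounts
    """
    passwords_per_account = defaultdict(set)
    accounts_per_password = defaultdict(set)
    for account, fingerprint in failures:
        passwords_per_account[account].add(fingerprint)
        accounts_per_password[fingerprint].add(account)

    return {
        "brute_force": {
            acct for acct, pws in passwords_per_account.items() if len(pws) >= threshold
        },
        "spray": {
            fp for fp, accts in accounts_per_password.items() if len(accts) >= threshold
        },
    }


failures = [
    ("alice", "p1"), ("alice", "p2"), ("alice", "p3"),  # many passwords, one account
    ("bob", "p9"), ("carol", "p9"), ("dave", "p9"),     # one password, many accounts
]
result = classify_failed_logins(failures)
```

Here `alice` is flagged as a brute-force target and the fingerprint `p9` as a spray, which mirrors the two definitions above.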

When we think about attacks on the cloud, we can group them as follows:

Tenant level (Any organisation that puts their infrastructure in the cloud)

User elevated to tenant admin
Multi-factor authentication changed

Subscription level

External accounts added to subscription
Stale accounts with access to subscription
Attack detection service not configured properly

IaaS

Known hacker/malicious tool/process found
Account password hash accessed
Anti-malware service disabled
Brute force login attack detected
Communication with a malicious IP
TOR IP detected
File-less attack technique detected
Outgoing DDoS attacks

PaaS

Malicious key vault access — keys enumerated
Anonymous storage accessed
Activity from unfamiliar location
SQL injection detected
Hadoop YARN exploit
Open management ports on Kubernetes nodes
Authentication disabled for App/Web services

SaaS

A potentially malicious URL click was detected
Unusual volume of external file sharing
Password spray login attack

All of these areas need to be secured. Besides securing cloud infrastructure, it is also important to put a good monitoring mechanism in place so that we can respond to any kind of incident. But the question is: are SOCs (Security Operations Centres) really prepared?

There are many challenges surrounding the implementation of a cloud monitoring system that prevent SOCs from keeping up to date.

Most cloud platforms are tenants or are based on subscription models, therefore creating new boundaries
Many cloud services = many attack types, and these attacks are becoming more sophisticated
Since cloud environments are still relatively new, gaining familiarity with this new technology involves a steep learning curve
If you have an on-premise SOC and you want to create a hybrid environment, it makes detection and investigation complex
Cloud infrastructure and services are a lot more dynamic in nature. Organisations keep launching new services while cloud service providers rapidly release new features. Furthermore, DevOps and SRE teams make frequent changes to their production systems. It takes a huge amount of effort to keep SOCs up to date with all of this.

If our servers are on-premise, we have control over the network. If an incident happens, we can take actions such as blocking an IP address or taking down a machine. However, we may not have the same flexibility in the cloud. Monitoring requires partnerships between SOC analysts, cloud resource owners, subscription owners, and cloud service providers. SOC analysts may even need help from cloud resource owners to obtain the resources they need for an investigation or to implement remediation steps.

To implement an effective cloud monitoring system, we have to identify the logs and events that the aforementioned attacks generate. We can divide event types into four categories:

Control plane logs: e.g. create, update, and delete operations on cloud resources
Data plane logs: e.g. events logged as part of cloud resource usage, Windows events in a VM, SQL audit logs
Identity logs: e.g. authentication (AuthN) and authorisation (AuthZ) events, Azure Active Directory (AAD) logs. When you design cloud infrastructure, define the identity architecture so that every action can be mapped to an identity.
Baked alerts: ready-to-consume security alerts, e.g. from Azure Security Center (ASC) or a CASB

It’s very beneficial to have a common repository for raw events and a common alert/log template to support log analytics. The template should include at least these fields:

Event ID, Event name, Subscription ID, Resource name, Resource ID, Event time, Data centre, Meta data, Prod or dev, Owner ID, User ID, Success or failure.
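One way to picture such a template is as a single record type that every source is normalised into. The sketch below is an assumption-laden illustration, not a prescribed schema: the class name `SecurityEvent`, the `normalise` helper, and the shape of the raw provider record (`id`, `operation`, `caller`, and so on) are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class SecurityEvent:
    """Common template holding the fields listed above."""
    event_id: str
    event_name: str
    subscription_id: str
    resource_name: str
    resource_id: str
    event_time: datetime
    data_centre: str
    environment: str  # "prod" or "dev"
    owner_id: str
    user_id: str
    success: bool
    metadata: dict[str, Any] = field(default_factory=dict)


def normalise(raw: dict[str, Any]) -> SecurityEvent:
    """Map a hypothetical provider-specific record onto the common template."""
    return SecurityEvent(
        event_id=raw["id"],
        event_name=raw["operation"],
        subscription_id=raw["subscription"],
        resource_name=raw["resource"]["name"],
        resource_id=raw["resource"]["id"],
        # Store all timestamps in UTC so events from different regions line up.
        event_time=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
        data_centre=raw.get("region", "unknown"),
        environment=raw.get("env", "prod"),
        owner_id=raw.get("owner", "unknown"),
        user_id=raw.get("caller", "unknown"),
        success=raw.get("status") == "Succeeded",
        # Anything prefixed with "x_" is kept as free-form metadata (an assumption).
        metadata={k: v for k, v in raw.items() if k.startswith("x_")},
    )


raw = {
    "id": "evt-001",
    "operation": "VirtualMachine.Delete",
    "subscription": "sub-123",
    "resource": {"name": "vm-01", "id": "/vm/vm-01"},
    "time": "2019-12-12T10:00:00+00:00",
    "region": "westeurope",
    "env": "prod",
    "owner": "team-a",
    "caller": "u-42",
    "status": "Succeeded",
    "x_source_ip": "203.0.113.7",
}
event = normalise(raw)
```

Because every source ends up in the same shape, `asdict(event)` can be written straight to the shared repository and queried with the same field names regardless of where the event came from.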

This can help build your custom monitoring scenarios and help your SOC to run investigations.

Some alerts and logs are false positives, and they can generate a lot of load for the SOC. To prevent overloading, we can configure thresholds so that alerts below them are sent to the resource owner instead. If the resource owner decides that a certain alert or log needs an investigation, they can then forward it to the SOC.
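A minimal sketch of that routing rule might look like the following. The 1 to 5 severity scale, the `Alert` fields, and the `soc_min_severity` cut-off are assumptions made for illustration; a real deployment would tune them against its own alert volumes.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)
    owner_id: str
    escalated_by_owner: bool = False  # set when the owner asks for an investigation


def route(alert: Alert, soc_min_severity: int = 4) -> str:
    """Return 'soc' or 'owner:<id>' depending on severity and escalation."""
    if alert.escalated_by_owner or alert.severity >= soc_min_severity:
        return "soc"
    return f"owner:{alert.owner_id}"


low = Alert("Activity from unfamiliar location", severity=2, owner_id="team-a")
high = Alert("TOR IP detected", severity=5, owner_id="team-a")
escalated = Alert("Activity from unfamiliar location", severity=2,
                  owner_id="team-a", escalated_by_owner=True)
```

With these defaults, `route(low)` goes to the owner, while `route(high)` and `route(escalated)` both reach the SOC.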

SIEM (security information and event management) design and architecture are evolving too. If your on-premise infrastructure already has a SIEM set up, a good first step is to bring cloud events into it. Most cloud providers offer connectors to popular SIEMs, which simplifies integration. Over time, you can consider moving to a cloud-based SIEM and sending your on-premise events to it instead. A third approach is to combine cloud and on-premise events in one big data platform, which provides more flexibility and a better user experience.

There are various mechanisms to fetch events:

REST API calls
Connectors by SIEM vendors
Conversion to standard Syslog format
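As an illustration of the last option, the sketch below formats an event as an RFC 5424 syslog line. The event dictionary, the `hostname` and `app_name` defaults, and the choice of the `local0` facility are assumptions; the header layout (priority, version, timestamp, host, app name, then nil values for process ID, message ID, and structured data) follows RFC 5424.

```python
from datetime import datetime, timezone

LOCAL0 = 16        # syslog facility reserved for local use
SEV_WARNING = 4    # syslog severity: warning
SEV_INFO = 6       # syslog severity: informational


def to_syslog(event: dict, hostname: str = "cloud-collector",
              app_name: str = "cloudmon") -> str:
    """Render a (hypothetical) normalised event as one RFC 5424 line."""
    # Failed operations are raised to warning; this mapping is an assumption.
    severity = SEV_INFO if event["success"] else SEV_WARNING
    pri = LOCAL0 * 8 + severity  # PRI = facility * 8 + severity
    timestamp = event["event_time"].astimezone(timezone.utc).isoformat()
    fields = ("event_id", "event_name", "resource_id", "user_id")
    msg = " ".join(f'{key}="{event[key]}"' for key in fields)
    # The three "-" are the nil PROCID, MSGID, and STRUCTURED-DATA fields.
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - - {msg}"


event = {
    "event_id": "evt-001",
    "event_name": "VirtualMachine.Delete",
    "resource_id": "/vm/vm-01",
    "user_id": "u-42",
    "event_time": datetime(2019, 12, 12, 10, 0, tzinfo=timezone.utc),
    "success": False,
}
line = to_syslog(event)
```

The resulting line can be forwarded to any collector that accepts RFC 5424 input; field values containing quotes would need escaping in a production version.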

Skilling up your analysts and engineers is the key to success. Start by providing training on cloud concepts such as IaaS, PaaS, and SaaS. You can begin with IaaS, as it is closest to on-premise, before moving on to PaaS, which is more complex. Avoid over-specialising, stay flexible, find people who understand data, and keep learning.

To implement a proper monitoring system successfully, we have to configure it correctly. Benchmarks such as the CIS benchmark for Azure can help with this. Prioritisation is critical: we have limited resources but hundreds of use cases. We can use threat modelling to prioritise monitoring scenarios and cut noise.
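One simple way to turn a threat model into a priority list is to score each scenario and monitor only the top few that the team can handle. The sketch below uses likelihood times impact, a common risk-scoring heuristic rather than anything this article prescribes; the scores and the `capacity` value are made up for illustration.

```python
def prioritise(scenarios, capacity):
    """Return the names of the `capacity` highest-risk scenarios.

    Risk is scored as likelihood * impact, each on a hypothetical 1-5 scale.
    """
    ranked = sorted(
        scenarios,
        key=lambda s: s["likelihood"] * s["impact"],
        reverse=True,
    )
    return [s["name"] for s in ranked[:capacity]]


scenarios = [
    {"name": "Password spray login attack", "likelihood": 4, "impact": 4},
    {"name": "Malicious key vault access", "likelihood": 2, "impact": 5},
    {"name": "Open management ports on Kubernetes nodes", "likelihood": 3, "impact": 4},
    {"name": "Unusual volume of external file sharing", "likelihood": 2, "impact": 2},
]
top = prioritise(scenarios, capacity=2)
```

With these illustrative scores, the password spray and open management port scenarios are monitored first, and the remaining ones wait until capacity grows.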

And last but not least, we cannot forget the importance of constantly scaling up team skills, designing the right SIEM architecture, and establishing a mechanism to keep up with new features in the cloud.
