Activity Aware IDS for AWS: The Simple Way to Stay Aware

Today we're open sourcing the internal tool we use at Giftbit to notify our team of suspicious activity in our AWS Account. Given our least privilege approach to IAM policies, Activity Aware IDS for AWS provides an integral part of our configuration debugging and intrusion detection (IDS).

If you're already sold, feel free to check out the Repository, or read on to find out more about the problems it solves, how it works, and how it can help you maintain the security of your AWS account.

Activity Aware IDS helps you be more aware of activity in your AWS account, including activity that might suggest a potential account compromise. In this article, we will discuss the common use cases for Activity Aware IDS for AWS, give an overview of its architecture, and show how you can start using it today. Before we get to that, it's important to understand the security threats you face as an AWS customer, your responsibility in protecting against them, and the principle of least privilege as a best practice in thinking about security and access control.

The Importance of the Principle of Least Privilege

A decent understanding of the principle of least privilege is vital when talking about security and access controls. In this section, we will briefly describe the principle of least privilege and walk through an example of applying it.

Activity Aware IDS is most effective when you follow the principle of least privilege: in any system, every identity (users, programs, systems, etc.) that is granted privileges to access resources or information should be granted only the minimum privileges necessary to perform its tasks. In its default configuration, Activity Aware IDS will inform your team when users and roles attempt to use actions or access resources beyond their privileges. It is possible to configure Activity Aware IDS to notify you of AWS API actions even when these actions are permitted, but this requires knowing the specific actions your team wants to be informed about.

As an example of following the principle of least privilege, let us look at a recent scenario we faced at Giftbit. We wanted to back up the logs from our instances into an S3 bucket so that if anything happened to one of our instances, we could always get access to its recent logs for investigating the root cause.

To allow instances to upload their logs, we could have used the Identity and Access Management (IAM) system to attach the AmazonS3FullAccess managed policy to the roles associated with the EC2 instances. With those permissions, they could easily have uploaded their logs, but they would also be able to do other things, like reading or deleting the log files. If an instance had these extra privileges and, for some reason, attempted to delete all the log files, it would succeed. Under the default configuration, we would not be informed of this because the instance only performed actions it had been granted. Instead, we could give the instances an inline policy like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "*"
    }
  ]
}

This would restrict the instances to uploading to S3, without allowing them to read or delete from the buckets. This is still not least privilege because, although an instance can only upload files, it can upload them to any bucket in your account. Instead, we could restrict the privilege to a specific bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mybucket/*"
    }
  ]
}

So now that we've restricted our instances to uploading only to a specific bucket, we must be good, right? Well, we wanted to be able to find the logs for a specific instance, so we wanted each instance to put its logs under a unique prefix. For this, we used a prefix that combines the unique id of the instance's primary role with the instance id, like role-id:instance-id. This gives us something special when it comes to IAM policies. We can make the above policy even more restrictive:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mybucket/${aws:userid}/*"
    }
  ]
}

In the policy above, ${aws:userid} is what is known as an IAM policy variable. This variable resolves to the unique identifier of whichever role, user, or other identity is making a particular request. In this example, the request comes from an EC2 instance. If you consult the table on the IAM policy variables page, you will find that for an EC2 instance, this resolves to role-id:ec2-instance-id, which is what we used for the prefix above. Thus, our instance is only able to upload its log files into a specific bucket, under the specific prefix for its instance. Under the default configuration of Activity Aware IDS, if this instance attempted to perform any other action on this or any other bucket, or to upload its logs to a different prefix, a notification would be sent to the Giftbit Slack channel, informing the team of the aberration.
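To make the effect of the policy variable concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is a deliberately simplified model and not how IAM is actually implemented: it ignores Deny statements, conditions, and case-insensitive action matching. The variable is written `${aws:userid}`, as in the IAM documentation, and the role id and instance id in the usage note are made-up placeholders.

```python
import re

# The prefix-restricted policy from the example, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::mybucket/${aws:userid}/*",
        }
    ],
}


def resolve_variables(pattern: str, context: dict) -> str:
    """Substitute ${key} placeholders with values from the request context."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: context[m.group(1)], pattern)


def wildcard_match(pattern: str, value: str) -> bool:
    """Match IAM-style wildcards: * is any run of characters, ? is one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def is_allowed(policy: dict, action: str, resource: str, context: dict) -> bool:
    """Return True if some Allow statement covers both the action and the resource."""
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if not wildcard_match(statement["Action"], action):
            continue
        resolved = resolve_variables(statement["Resource"], context)
        if wildcard_match(resolved, resource):
            return True
    return False
```

With a hypothetical context of `{"aws:userid": "AROAEXAMPLEROLEID:i-0123456789abcdef0"}`, a PutObject to `arn:aws:s3:::mybucket/AROAEXAMPLEROLEID:i-0123456789abcdef0/app.log` is allowed, while a PutObject under any other prefix, or a GetObject on the same key, is not.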

So now we must be good, right? Well, that's the hard thing about the principle of least privilege. If we happened to know that the instances only upload their logs at a specific time, we could reduce their privilege even further. The purpose of the principle of least privilege is to reduce the attack surface, that is, the different contexts in which a malicious user could attempt to cause damage to your systems. This damage could be caused by creating bad data, getting hold of your good data, using your resources like your EC2 instances, or creating EC2 instances that get charged to your account. But since we want to back up our logs frequently throughout the day, for all reasonable intents and purposes, we will call this example done.

Setting up proper IAM permissions like the ones above isn't just good practice; it's one part of what AWS customers are responsible for under the Shared Responsibility Model. Let us explore this a little further.

Security in the AWS Cloud: The Shared Responsibility Model

As a customer of any cloud provider, it's important to understand your responsibilities. In this section, we'll cover AWS' Shared Responsibility Model and review your responsibilities as an AWS Customer.

When securing your resources, the cornerstone of security in the AWS Cloud is the Shared Responsibility Model. At a high level, this is a delineation between what AWS takes responsibility to secure, and what you as an AWS Customer are responsible for securing. Most simply, AWS takes care of security OF the cloud, and you are responsible for security IN the cloud.

On its side, AWS takes responsibility for securing the building blocks used to compose your systems. These include the Compute (EC2 hosts), Storage (S3 infrastructure), Databases (RDS Hosts), Networking infrastructure, and so on. That's not to say that you can just throw all of your customers' credit card data into an S3 bucket and consider your security responsibility complete.

As an AWS Customer, you are responsible for maintaining the security of the operating systems of your EC2 instances, your application, network and firewall configurations, identity and access management, encryption, and customer data. A big part here is that AWS provides the building blocks, like Storage (S3) and Encryption (KMS), to protect the security of this data, but it's your responsibility to use those services properly and to rotate your access keys regularly. If we stored our data with overly broad permissions, and an attacker gained access to our customer data and leaked it, we, as the AWS customer, would be responsible for that.

If this doesn't sound obvious, or you just need to review, I recommend you check out The AWS Shared Security Responsibility Model in Practice video as a starting point.

Now that we have a better understanding of what we as AWS Customers are responsible for security-wise, let us go on to an overview of the potential security threats you need to consider when operating in AWS.

Security Threats in the AWS Cloud

When trying to secure your resources in the AWS Cloud, it's valuable to understand what kind of security threats you're likely to face. In this section, we will briefly cover the different types of security incidents you need to consider.

Although there are a number of different potential security threats you need to be aware of in the AWS Cloud, they have been nicely broken down into the 3 major types of security incidents in AWS by Adobe. These are Infrastructure Impact, Host Compromise, and Account Compromise.

Infrastructure Impact includes external attacks on the underlying infrastructure of your application. This type of attack largely consists of Distributed Denial of Service (DDoS) attacks, where an attacker sends a large volume of traffic at your site. The main objective of this type of attack is to occupy your resources with garbage traffic so that legitimate traffic cannot get through, thus taking your site offline. DDoS protection tools, like the new AWS Shield, can help to alleviate these types of threats by identifying common patterns in the DDoS traffic and blocking it before it gets to your servers.

Host Compromise involves using a technique like command injection to gain access to your existing resources, such as your EC2 instances. When an attacker gains access to one of your instances, they are usually after one of two things: using the instance for its compute resources, such as Bitcoin mining; or using the instance to get at its data or to approach another instance that is likely to hold valuable data, a pattern associated with an Advanced Persistent Threat (APT). If the attacker can't find valuable data on the instance they gain access to, they will sometimes use that instance to look for other potentially vulnerable instances on the same network (or VPC). Host-based intrusion detection and intrusion prevention are the most common methods of alleviating this type of threat. These protections regularly involve placing some software or agent on your host, which scans incoming and outgoing traffic for common attack patterns, and informs you and/or stops this traffic automatically.

Account Compromise involves an attacker gaining access to some identity (either users or roles on an instance), and then using that identity for the data it has access to or the resources it can create. As in the Host Compromise case above, if the attacker is interested in mining bitcoins, they might spin up a number of large (potentially 4xLarge or 8xLarge) instances, use them to mine bitcoins, then leave you with the bill for the instances. Alternatively, if they're interested in the data, existing safeguards might not allow them to export a snapshot of your database directly, for example, but if they can create an instance that has access to the database, they can export it from there.

When it comes to protecting against Infrastructure Impact and Host Compromise, there seems to be no shortage of tools to help with these threats. When it comes to Account Compromise, the solution space seems to be a lot more open. This is where Activity Aware IDS for AWS fits in. When an attacker is attempting to determine the permissions of a compromised identity, they will likely be denied access to a number of the actions and resources they attempt to use while probing the identity. Activity Aware IDS notifies you of these denials where your team will notice them most, like Slack.

Now that we have a better idea of the common types of security incidents that might occur in the AWS Cloud, let us look at the use cases for Activity Aware IDS for AWS.

Activity Aware IDS for AWS Use Cases

Intrusion Detection

As we covered briefly in the last section, Activity Aware IDS for AWS can help you discover Account Compromises. This works by using CloudTrail, and the principle of least privilege. CloudTrail is essentially a log of all of the activity in your account, and the principle of least privilege, as we covered above, means giving any identity (users and roles) only access to the permissions it needs to fulfill its tasks.

Now let us imagine that a member of your team, Alice, had her credentials compromised by a malicious attacker, Mallory. Mallory might want to see what Groups and Roles she now has access to through Alice's account. For this example, let's assume that Alice doesn't have access to view the Groups she is associated with, or at the very least, doesn't have access to view the policies attached to those groups. If Mallory attempts to view the policies of those groups, she will be denied access, and the denial gets logged to CloudTrail. At this point, Activity Aware IDS receives the denial log, converts it into a friendly format, and sends it to a Slack Channel monitored by your team. Once the message arrives in Slack, your team will see that there are strange "Access Denied" messages associated with Alice, speak with her about the denials, find out that she is not the one performing the actions, and replace her credentials.
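The flow above (denial logged, picked out, and turned into a readable message) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Activity Aware IDS actually runs: the set of error codes is illustrative rather than exhaustive, and the function names and message wording are hypothetical. The field names (`Records`, `errorCode`, `eventName`, `eventSource`, `userIdentity`, `sourceIPAddress`) are those found in CloudTrail log records.

```python
import json
from typing import List

# Error codes treated as "access denied"; illustrative, not an exhaustive list.
DENIED_ERROR_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def is_denied(record: dict) -> bool:
    """Return True if a CloudTrail record describes a denied API call."""
    return record.get("errorCode") in DENIED_ERROR_CODES


def summarize(record: dict) -> str:
    """Turn a denied CloudTrail record into a one-line, human-friendly message."""
    identity = record.get("userIdentity", {})
    who = identity.get("arn") or identity.get("principalId", "unknown identity")
    action = record.get("eventName", "unknown action")
    service = record.get("eventSource", "unknown service")
    source_ip = record.get("sourceIPAddress", "unknown IP")
    return f"Access denied: {who} attempted {action} on {service} from {source_ip}"


def denied_summaries(log_file_text: str) -> List[str]:
    """Parse a CloudTrail log file body and summarize only the denied calls."""
    records = json.loads(log_file_text).get("Records", [])
    return [summarize(record) for record in records if is_denied(record)]
```

In Alice's scenario, a record with `eventName` set to `ListGroupPolicies` and `errorCode` set to `AccessDenied` would produce one summary line naming Alice's ARN, while her ordinary successful calls would produce none.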

Principle of Least Privilege Debugging

In the example above, the attacker Mallory was blocked from performing actions due to the principle of least privilege. Although the principle of least privilege is a common security recommendation, it can be difficult to find the exact set of permissions that a User or Role should have. Activity Aware IDS can also assist with this.

Imagine that you are deploying a new service, and you want to give it permissions following the Principle of Least Privilege. The easiest place to start with the principle of least privilege is to create a role with no permissions. Now let's assume that the system needs to send logs to CloudWatch Logs (this is a common requirement for AWS Lambda). When the system attempts to create a new Log Group, for example, it will be denied, because the role has no permissions. This denial will get logged to CloudTrail, then Activity Aware IDS for AWS will send the denial to your Slack, where you can see the specific action being attempted, the Role attempting to perform the action, and even the ARN of the specific resource it's trying to perform the action on. At this point, you can add a statement to the policy for the role allowing it access to "CreateLogGroup" and even provide a fairly restrictive resource description.
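As a rough illustration of that last step, the sketch below turns a denied CloudTrail record into a candidate policy statement. It is a hypothetical helper, not part of Activity Aware IDS. It assumes the IAM service prefix can be read off the first label of `eventSource` (true for `logs.amazonaws.com`, but not for every service, so check the result before using it), and it takes the resource ARN as an argument rather than trying to extract it from the record.

```python
def suggest_statement(record: dict, resource: str = "*") -> dict:
    """Build a candidate Allow statement for the action in a denied record.

    Assumption: the IAM service prefix equals the first label of eventSource,
    e.g. "logs.amazonaws.com" -> "logs". This is a heuristic only.
    """
    service_prefix = record["eventSource"].split(".")[0]
    return {
        "Effect": "Allow",
        "Action": f"{service_prefix}:{record['eventName']}",
        "Resource": resource,
    }


# Example: the denied CreateLogGroup call described above. The account id
# and function name in the ARN are placeholders.
denied = {"eventSource": "logs.amazonaws.com", "eventName": "CreateLogGroup"}
statement = suggest_statement(
    denied,
    resource="arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/my-function:*",
)
```

Passing a specific ARN instead of the default `"*"` keeps the suggested statement in line with the restrictive resource description mentioned above.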

Architectural Overview

There is usually value in understanding, at least at a high level, how the systems you use work. In this section, we'll briefly go over the architecture of Activity Aware IDS, its major components, and how they work together.

The architecture of Activity Aware IDS for AWS is fairly simple. It's composed mainly of 3 pieces: Source Lambdas, Destination Lambdas, and an SNS Topic for message passing.

[Figure: Architectural Overview of Activity Aware IDS for AWS]

Above is an architectural overview of Activity Aware IDS for AWS.

On the left side, we have Sources. These are the services that generate the events of which you want to be aware. Each of these services is coupled with a specific lambda function that knows the structure of events for that service, converts them into a common message format, and sends them to the SNS Topic. In the initial release, we have a CloudTrail source, but we've designed the system to be extensible with additional sources, such as VPC Flow logs.

On the right side, we have Destinations. These are the services or methods of communication by which you want to be informed. Each of these destination services is coupled with a specific lambda function that is subscribed to the SNS Topic, knows how to interpret the common message format, and sends a message to the given destination service. In the initial release, we have a Slack destination, but as with Sources, the system is designed to be extensible and to support additional destinations, such as SES for email, PagerDuty, and more.

The SNS Topic is a simple, but central messaging hub in Activity Aware IDS. It receives messages from the source lambdas and sends them out to the subscribed destination lambdas.
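The source, topic, and destination roles described above can be modeled in plain Python. This is an in-memory sketch of the pattern only: the `Topic` class stands in for SNS, the destination appends to a list instead of calling Slack, and the fields of the common message format (`source`, `identity`, `action`, `detail`) are hypothetical, since the actual format is not described in this article.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical common message format shared by sources and destinations."""
    source: str
    identity: str
    action: str
    detail: str


class Topic:
    """In-memory stand-in for the SNS Topic: fans each payload out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, payload: str) -> None:
        for handler in self._subscribers:
            handler(payload)


def cloudtrail_source(record: dict, topic: Topic) -> None:
    """Source: convert a service-specific event into the common format and publish it."""
    message = Message(
        source="cloudtrail",
        identity=record.get("userIdentity", {}).get("arn", "unknown"),
        action=record.get("eventName", "unknown"),
        detail=record.get("errorCode", ""),
    )
    topic.publish(json.dumps(asdict(message)))


def make_slack_destination(outbox: List[str]) -> Callable[[str], None]:
    """Destination: interpret the common format and 'send' it (here, append to a list)."""
    def handler(payload: str) -> None:
        message = json.loads(payload)
        outbox.append(
            f"[{message['source']}] {message['identity']}: "
            f"{message['action']} ({message['detail']})"
        )
    return handler
```

Because sources and destinations only share the message format and the topic, adding a new destination in this model means writing one more handler and subscribing it; no source has to change.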

So there you have it. It's fairly simple, but with its extensibility, it can also be quite powerful.

Getting Started

If you've made it this far, you're probably still interested, so why don't you give it a shot? Check out our Getting Started guide for the requirements you'll need, and how to install Activity Aware IDS for AWS in your AWS account.

How you can help

Do you have a source you'd like to get notified about in your Slack? Maybe you'd love to have your CloudTrail events go to a different destination, like Email? Or maybe you found a bug in the system? We love getting all of this feedback. Be sure to add an issue for your Source and Destination requests, as well as any bugs you've located in our Issues tracker.

Feel like getting your hands dirty? If you want to contribute to the project, we'd be happy to have it. Create an issue for it first, so that we might give you pointers, and so that others know if a feature they're excited about is being worked on.