PCG logo
Case Study

Web Application Firewall WAF

The Challenge

A customer has a public website that is often visited by automated bots.

An automated bot, or simply bot, is a software application or script that performs tasks on the internet in an automated and repetitive manner. Bots are designed to perform various functions and are typically programmed to interact with websites, applications, or other online services. They can be created for both legitimate and malicious purposes.

The problem which the customer faces is that some bots or other unidentified web clients consume the resources of its web servers and make them less responsive.

The customer asked for a solution to this problem. The solution should preferably not require any changes to the existing web server infrastructure.

The Solution

We chose AWS WAF because it is easy to integrate into the customer’s setup and offers a predefined set of rules managed by AWS (AWS WAF Bot Control rule group), that can detect different categories of bots. In addition, it is possible to use rate-based rules and rules based on specific request attributessuch as HTTP methods, URL paths, HTTP headers or IP address ranges.

Bot categories

List of all bot categories known to AWS WAF Bot Control:

Category: Advertising

Description: Bots that are used for advertising purposes

Category: Archiver

Description: Bots that are used for archiving purposes

Category: Content Fetcher

Description: Bots that are fetching content on behalf of a user

Category: Email Client

Description: Email clients

Category: Http Library

Description: HTTP libraries that are used by bots

Category: Link Checker

Description: Bots that check for broken links

Category: Miscellaneous

Description: Miscellaneous bots

Category: Monitoring

Description: Bots that are used for monitoring purposes

Category: Scraping Framework

Description: Web scraping frameworks

Category: Search Engine

Description: Search engine bots

Category: Security

Description: Security-related bots

Category: SEO

Description: Bots that are used for search engine optimization

Category: Social Media

Description: Bots that are used by social media platforms to provide content summaries

Category: AI

Description: Artificial intelligence (AI) bots

Category: Automated Browser

Description: Inspects the request’s token for indicators that the client browser might be automated

Category: Known Bot Data Center

Description: Inspects for data centers that are typically used by bots

Category: Non Browser User Agent

Description: Inspects for user agent strings that don’t seem to be from a web browser

Preparations

The next steps are performed in a dedicated test environment before the setup is deployed to production.

We start by writing some terraform code that creates an Application Load Balancer (ALB) to attach the WAF. We put the existing web servers into a target group to which the ALB will route its traffic. The ALB will play the role of an additional reverse proxy layer in front of the web servers.

We also create a Route 53 Alias A record pointing to the ALB and a TLS certificate in the AWS Certificate Manager (ACM) for that DNS record, since we only allow HTTPS traffic at the ALB. HTTP traffic is permanently redirected to HTTPS using HTTP Status code 302 (the HTTP Strict-Transport-Security response header is handled by the caddy reverse proxy layer behind the ALB.)

In addition, the ALB and WAF logs are configured to be sent to Datadog.

Adding WAF rules

Filtering specific URLs

First, we add a filtering rule to the WAF (block_webfonts_path) that blocks the URI path /webfonts/2775253ec3a5.css to a resource that no longer exists on the web servers but is still frequently referenced by some misconfigured resources.

Bot Control

Then we add the AWSManagedRulesBotControlRuleSet AWS WAF Bot Control managed rule group to the WAF and select the “common” inspection level. Using this inspection level, bots are detected by the WAF by statically analyzing request data.

The second rule, bot-control-default, will be processed next if the first rule does not match.

This rule will do the following:

Should a request match against a signature of a verified bot, for example Microsoft’s Bing search engine, the request will be labeled by the WAF and the next rule will be processed (rules that only add labels as an action are not terminating rules, which means that the next rule is processed even if there’s a match).

If the rule matches, the following labels are added:

image-13e46e9eef27

Should the WAF not be able to match a request against one of its signatures of verified bots, it will label it as unverified and block the request. No further WAF rule will be processed.

In this scenario, where our main goal is to prevent the web servers from running out of resources, we start with the most restrictive bot control rule set. We do not only block unverified bots, we also want to block verified bots. This is done with rule bot-control-block-verified.

This rule looks for requests which are labeled awswaf:managed:aws:bot-control:bot:verifiedand blocks them.

Rate-based rule

As the last rule, we add a rate-based rule (rate-based-captcha) with the following condition:

If a specific source IP sends more than 500 requests in 5 minutes, that client will be presented with a CAPTCHA for 5 minutes.

Final rule set

Our final set of rules looks like this:

Rule name: block_webfonts_path

Rule priority: 0

Purpose: Blocks URI path

Rule name: bot-control-default

Rule priority: 1

Purpose: Blocks unverified bots, labels verified bots

Rule name: bot-control-block-verified

Rule priority: 2

Purpose: Blocks verified bots by matching label

Rule name: rate-based-captcha

Rule priority: 3

Purpose: Blocks source IPs that exceed the rate limit

After installing the ruleset on the test environment, we waited for a few days to collect enough logs to be able to generate some traffic statistics and also to check for false positives (WAF blocks requests that should be allowed) or false negatives (WAF fails to detect harmful traffic).

We found no problems, which allowed us to point the DNS record of the production environment to the production ALB where the WAF is also attached. The DNS change now routes the traffic of the production environment to the WAF.

Datadog WAF Logs Example Views

image-bb1865d52a3b

Datadog showing requests blocked by the WAF using rule block_webfonts_path in the last 24 hours.

image-a2b0fa6410aa

Datadog statistics for blocked requests from unverified bots for the last 24 hours.

image-4cb54aac6fc6

Datadog statistics for blocked requests from verified bots for the last 24 hours.

Results and Benefits

By using this set of rules, we were able to reduce the amount of unwanted requests reaching the web servers and could so prevent further outages.

About PCG

Public Cloud Group (PCG) supports companies in their digital transformation through the use of public cloud solutions.

With a product portfolio designed to accompany organisations of all sizes in their cloud journey and competence that is a synonym for highly qualified staff that clients and partners like to work with, PCG is positioned as a reliable and trustworthy partner for the hyperscalers, relevant and with repeatedly validated competence and credibility.

We have the highest partnership status with the three relevant hyperscalers: Amazon Web Services (AWS), Google, and Microsoft. As experienced providers, we advise our customers independently with cloud implementation, application development, and managed services.


Continue Reading

Article
Gemini for Google Workspace: Prompting Guide for Efficient Use of AI

Your cheat code for Gemini for Google Workspace: Get the most out of the AI features with our prompting guide.

Learn more
Case Study
Software
DevOps
Accounting Accelerates

What began as a start-up in their parents' basement has developed into a leading provider of cloud-based accounting and financial software within just a few years: sevDesk.

Learn more
Case Study
Media & Entertainment
Big data platform supports publishing house with data processing

Axel Springer adopts modern, big data technology to manage a substantial amount of operational data.

Learn more
Case Study
Energy & Utilities
Cloud Migration
Energy (over)Flow

"Fish" that generate energy: This is how Energyminer envisions the future of hydropower. PCG was responsible for setting up their data platform in the AWS cloud.

Learn more
See all

Let's work together

United Kingdom
Arrow Down