Use Dissector Bot-Manager to defend your website against bad Bots

July 2018 · 5 minute read

Motivation

Compute resources and bandwidth are cheap and keep getting cheaper. One side effect is that more and more Bots rampage through the internet and hit our beloved websites, bulletin boards, blogs, etc.
Most of these Bots are merely annoying and consume only a small amount of resources. But there are also bad Bots that spam comment sections, harvest content to republish it on other sites, or even probe for security issues in order to inject malware or turn the host into another Bot instance. The possibilities are endless.

Stupid and simple Bots can be banned with simple methods, but they evolve quickly, and with each iteration it gets harder to catch and block them without affecting real human visitors. As a consequence, a whole new industry sector has emerged that specializes in detecting unwanted Bots and blocking them. One of these companies is Dissectix with its lightweight and flexible Bot-Manager Dissector.

Requirements

  • A website to protect (e.g. geek1.de)
  • Access to your webserver configuration (e.g. Nginx or HAProxy)

Setup

Dissector offers different setup options (from Cloud to On-premise). From my perspective, the hidden Cloud setup provides the best balance between easy integration and maintainability.
All client traffic still terminates on the origin webserver. However, instead of being forwarded directly to the application server, all traffic first passes through the Dissectix Cloud, where it is filtered, and is then passed back to the origin webserver, which finally hands it to the origin application server (e.g. Magento, osCommerce, WordPress, GitLab, etc.).
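To make the request path concrete, here is a toy Python model of the hidden Cloud setup. The node names are made up for illustration and blocked requests are not modeled. It shows why the setup does not loop: the origin sends unfiltered traffic to the Cloud and hands traffic that comes back from the Cloud to the application, so every request passes the Cloud exactly once.

```python
from typing import List

# Hypothetical node names, for illustration only.
CLOUD = "dissector-cloud"  # stands in for the Dissectix Cloud
ORIGIN = "origin-nginx"    # origin webserver that terminates all client traffic
APP = "app-server"         # e.g. Magento, WordPress, GitLab


def next_hop(node: str, peer: str) -> str:
    """Return where `node` forwards a request that arrived from `peer`."""
    if node == ORIGIN:
        # Filtered traffic coming back from the Cloud goes to the application;
        # everything else is sent to the Cloud first.
        return APP if peer == CLOUD else CLOUD
    if node == CLOUD:
        # The Cloud passes allowed requests back to the origin webserver.
        return ORIGIN
    raise ValueError(f"{node} does not forward requests")


def trace(client: str) -> List[str]:
    """Follow one request from `client` until it reaches the application."""
    path, peer, node = [client], client, ORIGIN
    while node != APP:
        path.append(node)
        peer, node = node, next_hop(node, peer)
    path.append(APP)
    return path


print(" -> ".join(trace("client")))
# client -> origin-nginx -> dissector-cloud -> origin-nginx -> app-server
```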

Configure Dissector

After getting an account from Dissectix, the domain must be configured in the Cloud. The bare minimum setup requires the base domain (e.g. geek1.de) and the origin IP and port.
[Screenshots: "Dissector: Add domain" and "Dissector: Init domain"]

Now Dissector can handle your traffic and pass it back to your origin. There are many configuration options to fine-tune the Bot-Manager to your needs (among others, SSL between origin and Cloud), but they are out of scope for this post.

Configure origin

Example configuration of Nginx without Dissector:

server {
  listen 443 ssl http2;
  listen [::]:443 ssl http2;
  server_name geek1.de www.geek1.de;

  root /var/www/geek1.de/htdocs;

  include /etc/ssl/le/geek1.de/includes/nginx.conf;
  include /etc/nginx/ssl.conf;
  
  if ($host !~ ^www\.) {
    return 301 $scheme://www.$host$request_uri;
  }

  location / {
    proxy_pass https://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_ssl_verify on;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 4;
    proxy_ssl_name $host;
    proxy_ssl_server_name on;
    proxy_ssl_session_reuse on;
    proxy_ssl_protocols TLSv1.1 TLSv1.2;
    proxy_ssl_ciphers "EECDH+CHACHA20:EDH+CHACHA20:EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
  }

  access_log /var/log/nginx/access_geek1.de.log;
  error_log /var/log/nginx/error_geek1.de.log;
}

The includes /etc/ssl/le/geek1.de/includes/nginx.conf and /etc/nginx/ssl.conf contain normalized SSL settings that are generated and stored in a well-known place to simplify the Let’s Encrypt setup. This part of the configuration will be the topic of a follow-up post. Plain HTTP is left out as well, because all hosted systems are SSL-only and all HTTP traffic is redirected to HTTPS. Neither part is relevant for the Dissector integration.

Example configuration of Nginx with Dissector:

set_real_ip_from  203.0.113.1/32;
set_real_ip_from  203.0.113.2/32;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;

geo $realip_remote_addr $g1_upstream  {
  203.0.113.1/32 g1_origin;
  203.0.113.2/32 g1_origin;
  default g1_dsx;
}

upstream g1_origin {
  server 127.0.0.1:3000;
}

upstream g1_dsx {
  server cloud.dissectix.io:443 max_fails=2 fail_timeout=60s;
  server 127.0.0.1:3000 backup;
  keepalive 16;
}

server {
  listen 443 ssl http2;
  listen [::]:443 ssl http2;
  server_name geek1.de www.geek1.de;

  root /var/www/geek1.de/htdocs;

  include /etc/ssl/le/geek1.de/includes/nginx.conf;
  include /etc/nginx/ssl.conf;
  
  if ($host !~ ^www\.) {
    return 301 $scheme://www.$host$request_uri;
  }

  location / {
    proxy_pass https://$g1_upstream;
    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_read_timeout 5s;
    proxy_send_timeout 2s;
    proxy_connect_timeout 2s;
    proxy_set_header Connection "";
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_ssl_verify on;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 4;
    proxy_ssl_name $host;
    proxy_ssl_server_name on;
    proxy_ssl_session_reuse on;
    proxy_ssl_protocols TLSv1.1 TLSv1.2;
    proxy_ssl_ciphers "EECDH+CHACHA20:EDH+CHACHA20:EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
  }

  access_log /var/log/nginx/access_geek1.de.log;
  error_log /var/log/nginx/error_geek1.de.log;
}

In the example above, 203.0.113.1/32 and 203.0.113.2/32 are placeholder addresses from the documentation range 203.0.113.0/24 (TEST-NET-3, RFC 5737), not the IPs the Dissector Cloud uses to contact your origin server. You’ll get the real IPs after creating an account. The same applies to the upstream host cloud.dissectix.io.

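Apart from a few added timeouts and the empty Connection header (together with proxy_http_version 1.1, this is what Nginx needs to reuse keepalive connections to an upstream), the server block barely changed: instead of the plain upstream host 127.0.0.1:3000, proxy_pass now uses the Nginx variable $g1_upstream, whose value is the name of one of the two upstream groups. The interesting part is the geo and real_ip configuration above the server block.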

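The following block uses the Nginx module ngx_http_realip_module (not built by default when compiling Nginx from source, where it is enabled with --with-http_realip_module; many distribution packages include it) to trust the X-Forwarded-For header sent by the Dissector Cloud. This is the same header that the origin injects in the server block (proxy_set_header X-Forwarded-For $remote_addr) before forwarding a request to Dissector. With real_ip_recursive on, trusted addresses in the header are skipped, so the last untrusted address becomes the client IP. This configuration can easily be adapted to your needs (e.g. if your application server expects something like X-Real-IP). But don’t forget to update the header name in your Cloud account as well; otherwise Dissector won’t know how to detect the real client IP.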
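The following Python sketch approximates how ngx_http_realip_module picks the client address with these settings. It is a simplified model, not Nginx's implementation: it assumes well-formed addresses without ports, and the helper names real_client_ip and is_trusted are hypothetical. TRUSTED mirrors the set_real_ip_from lines and uses the same placeholder addresses.

```python
import ipaddress
from typing import Optional

# Mirrors the set_real_ip_from lines (placeholder Dissector addresses).
TRUSTED = [ipaddress.ip_network(n) for n in ("203.0.113.1/32", "203.0.113.2/32")]


def is_trusted(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED)


def real_client_ip(peer: str, xff: Optional[str], recursive: bool = True) -> str:
    """Approximate the client address chosen by ngx_http_realip_module.

    peer is the address of the connecting host; xff is the X-Forwarded-For value.
    """
    # The header is only honored if the connecting peer is trusted.
    if xff is None or not is_trusted(peer):
        return peer
    addrs = [a.strip() for a in xff.split(",") if a.strip()]
    if not addrs:
        return peer
    if not recursive:
        # real_ip_recursive off: the last address in the header is used.
        return addrs[-1]
    # real_ip_recursive on: walk from right to left, skipping trusted addresses;
    # if every address is trusted, the leftmost one is used.
    for addr in reversed(addrs):
        if not is_trusted(addr):
            return addr
    return addrs[0]
```

For example, real_client_ip("203.0.113.1", "192.0.2.10, 198.51.100.7, 203.0.113.2") returns "198.51.100.7" because the trusted 203.0.113.2 is skipped, while real_client_ip("198.51.100.7", "192.0.2.99") returns "198.51.100.7": a client talking to the origin directly cannot spoof its address with its own X-Forwarded-For header.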

set_real_ip_from  203.0.113.1/32;
set_real_ip_from  203.0.113.2/32;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;

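The next block uses the standard Nginx module ngx_http_geo_module to set the variable $g1_upstream based on the address of the directly connecting peer. Because the realip module has already replaced $remote_addr with the client IP taken from X-Forwarded-For, the geo block uses $realip_remote_addr, which keeps the original connection address. As a result, all requests coming from Dissector are forwarded to your origin (g1_origin) and all other requests are forwarded to Dissector (g1_dsx).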
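The geo lookup and the variable-based proxy_pass can be sketched in Python as follows. geo_lookup is a hypothetical helper that imitates the module's CIDR matching (the most specific matching network wins, otherwise the default applies); the dictionaries simply mirror the configuration and use the same placeholder addresses.

```python
import ipaddress

# Mirrors the geo block: network -> value (placeholder addresses from the post).
GEO_MAP = {
    ipaddress.ip_network("203.0.113.1/32"): "g1_origin",
    ipaddress.ip_network("203.0.113.2/32"): "g1_origin",
}
GEO_DEFAULT = "g1_dsx"

# Mirrors the upstream blocks; proxy_pass https://$g1_upstream uses the
# variable's value as the name of one of these groups.
UPSTREAMS = {
    "g1_origin": ["127.0.0.1:3000"],
    "g1_dsx": ["cloud.dissectix.io:443", "127.0.0.1:3000"],  # 2nd entry is the backup
}


def geo_lookup(realip_remote_addr: str) -> str:
    """Return the $g1_upstream value for the address of the connecting peer."""
    ip = ipaddress.ip_address(realip_remote_addr)
    # Networks of the other IP version never match (`in` returns False).
    matches = [net for net in GEO_MAP if ip in net]
    if not matches:
        return GEO_DEFAULT
    # The most specific network (longest prefix) wins.
    return GEO_MAP[max(matches, key=lambda net: net.prefixlen)]
```

geo_lookup("203.0.113.1") returns "g1_origin", so a request that Dissector sends back goes to 127.0.0.1:3000, while geo_lookup("198.51.100.7") falls through to the default "g1_dsx" and the request is sent to the Cloud.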

geo $realip_remote_addr $g1_upstream  {
  203.0.113.1/32 g1_origin;
  203.0.113.2/32 g1_origin;
  default g1_dsx;
}

Finally, both upstreams must be defined:

upstream g1_origin {
  server 127.0.0.1:3000;
}

upstream g1_dsx {
  server cloud.dissectix.io:443 max_fails=2 fail_timeout=60s;
  server 127.0.0.1:3000 backup;
  keepalive 16;
}

As you can see, the Dissector upstream also lists the origin application as a backup entry. This ensures that even if the Dissector Cloud is not reachable from your origin, Nginx simply falls back to the previous setup and forwards the traffic directly to your origin application server until the Dissector Cloud is reachable again. With max_fails=2 fail_timeout=60s, the Cloud is considered unavailable after two failed attempts within 60 seconds and is tried again after 60 seconds.
That’s one of the huge benefits of the hidden Cloud setup: the origin webserver or load balancer can easily route around the whole Dissector Cloud if a problem occurs.
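A deliberately simplified Python model of this failover behavior follows. It implements the rule described in the Nginx upstream documentation (max_fails failed attempts within fail_timeout mark a server unavailable for fail_timeout, and backup servers only receive requests while the primary servers are unavailable). The Server class and choose function are hypothetical and ignore details such as proxy_next_upstream, weights and round-robin balancing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """One upstream entry; defaults follow Nginx (max_fails=1, fail_timeout=10s)."""
    name: str
    backup: bool = False
    max_fails: int = 1
    fail_timeout: float = 10.0
    fails: int = 0             # failed attempts in the current window
    window_start: float = 0.0
    down_until: float = 0.0    # unavailable until this time (seconds)

    def available(self, now: float) -> bool:
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        # Start a new counting window once fail_timeout has elapsed.
        if now - self.window_start > self.fail_timeout:
            self.fails, self.window_start = 0, now
        self.fails += 1
        if self.fails >= self.max_fails:
            # Too many failures: take the server out for fail_timeout seconds.
            self.down_until = now + self.fail_timeout
            self.fails = 0

    def record_success(self) -> None:
        self.fails = 0


def choose(servers: List[Server], now: float) -> Server:
    """Prefer available primary servers; fall back to available backups."""
    for want_backup in (False, True):
        for server in servers:
            if server.backup == want_backup and server.available(now):
                return server
    raise RuntimeError("no upstream server available")
```

With the values from the configuration, Server("cloud.dissectix.io:443", max_fails=2, fail_timeout=60) and a backup Server("127.0.0.1:3000", backup=True), two failures within 60 seconds send subsequent requests to 127.0.0.1:3000, and the Cloud is tried again once the 60 seconds have passed.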