Real-Time Traffic Analysis 2: Varnish
Varnish is a very powerful caching reverse proxy that features its own configuration language (VCL) and a set of tools for analysing traffic. I use it primarily to cache anonymous Drupal page requests so that a site can handle a massive spike. When your application is configured correctly to work with Varnish, you will soon find that the next major bottleneck to deal with is your server’s Internet connection :-).
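To give a flavour of that setup, here is a minimal sketch of the anonymous-caching logic in vcl_recv. It assumes the stock Drupal session cookie, which starts with SESS; a production configuration needs more care around other cookies:

sub vcl_recv {
    # Logged-in Drupal users carry a SESS* session cookie; requests
    # without one are anonymous, so strip their cookies and serve from cache.
    if (req.http.Cookie !~ "SESS") {
        unset req.http.Cookie;
        return (lookup);
    }
    return (pass);
}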
In the last post we went over how to use GoAccess to analyse Apache logs and find potentially malicious clients. Varnish can lend a hand towards this purpose as well, and can even be used to thwart an attack.
The primary tool I use for this is varnishtop, in particular to see incoming request headers.
root@server:~# varnishtop -i RxHeader
Here is example output from just running curl against my frontpage over and over again:
list length 14 aminastaneh.net
14.86 RxHeader Host: aminastaneh.net
14.86 RxHeader Accept: */*
13.86 RxHeader User-Agent: curl/7.29.0
1.97 RxHeader Server: nginx/0.7.65
1.97 RxHeader Content-Type: text/html
1.97 RxHeader Last-Modified: Sun, 28 Jul 2013 21:45:26 GMT
1.97 RxHeader Connection: keep-alive
1.97 RxHeader Accept-Ranges: bytes
1.00 RxHeader User-agent: Mozilla/5.0 (compatible; Ezooms/1.0; ezooms
1.00 RxHeader Accept-Charset: utf-8;q=0.7,iso-8859-1;q=0.2,*;q=0.1
1.00 RxHeader Date: Thu, 22 Aug 2013 01:47:30 GMT
1.00 RxHeader Content-Length: 11270
0.98 RxHeader Date: Thu, 22 Aug 2013 01:47:08 GMT
0.98 RxHeader Content-Length: 37999
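Note that this output mixes client request headers with backend response headers (the Server: and Content-Type: lines), because RxHeader covers everything Varnish receives on either side. On Varnish 3 you can narrow the view; the invocations below are just examples, not the only options:

# client-side records only, so backend response headers drop out
varnishtop -c -i RxHeader

# rank the most frequently requested URLs
varnishtop -i RxURL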
With this information I can see which user agent is most prominent. Now, assume a really unsophisticated script kiddie wants to take my site down. He just wants to generate a bunch of pageloads using some random script he downloaded off a site somewhere:
#!/bin/bash
# hammer the URL given as the first argument with a distinctive User-Agent
for i in $(seq 1 100); do
    curl "$1" -H 'User-Agent: EvilBot 0.1 (Linux x86_64)'
done
Nagios wakes me up because my uptime is starting to suffer. Once I conclude that this is a traffic-level problem and analyse what’s coming in, I will see this bubbling up to the top of the varnishtop output:
95.50 RxHeader User-Agent: EvilBot 0.1 (Linux x86_64)
Sweet! Now I can use VCL to block the offending user-agent.
sub vcl_recv {
    ...
    # Serve a synthetic 404 to anything matching the bot's User-Agent
    if (req.http.User-Agent ~ "EvilBot") {
        error 404 "Not Found";
    }
    ...
}
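On Debian-style systems, service varnish reload will compile and activate the new VCL for you. The varnishadm equivalent is sketched below; the configuration name evilbot_fix is just an arbitrary label, and the path assumes the default VCL location:

varnishadm vcl.load evilbot_fix /etc/varnish/default.vcl
varnishadm vcl.use evilbot_fix

Loading VCL this way swaps the configuration in place without restarting the daemon, so the cache survives.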
After the reload, the attacker’s terrible script starts returning this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>404 Not Found</title>
  </head>
  <body>
    <h1>Error 404 Not Found</h1>
    <p>Not Found</p>
    <h3>Guru Meditation:</h3>
    <p>XID: 1958276348</p>
    <hr>
    <address>
      <a href="http://www.varnish-cache.org/">Varnish cache server</a>
    </address>
  </body>
</html>
Of course, this is an extremely simplistic example. However, I have used varnishtop and VCL-driven 404s in this fashion to great success against several patterns:
- Accept-Language headers unique to a certain region
- Obviously nonexistent URLs, protecting the application from needlessly running code only to return a 404 (see the sketch after this list)
- Videos embedded in a frontpage instead of using a streaming service
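As a sketch of the nonexistent-URL case: the paths below are only illustrative, but no Drupal site serves them legitimately, so any request for them is a scanner or a bot:

sub vcl_recv {
    ...
    # These paths never exist on a Drupal site; answer from Varnish
    # instead of bootstrapping Drupal just to generate a 404.
    if (req.url ~ "^/(wp-login\.php|wp-admin|phpmyadmin)") {
        error 404 "Not Found";
    }
    ...
}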
Some guidelines:
- This technique shouldn’t be used as a long-term fix, but it can be a lifesaver in the middle of an outage. Consider moving to a CDN if these issues become common.
- Look for out-of-the-ordinary but prominent headers in incoming requests.
- Be careful to not accidentally block legitimate traffic; a quick check like the one below helps.
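On that last point, it is cheap to verify a new rule from the command line before walking away. Substitute your own site for example.com:

# the blocked user agent should now get a 404...
curl -s -o /dev/null -w '%{http_code}\n' -H 'User-Agent: EvilBot 0.1 (Linux x86_64)' http://example.com/

# ...while an ordinary request should still return 200
curl -s -o /dev/null -w '%{http_code}\n' http://example.com/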
Got a question or an interesting Varnish-related story to tell? Please let me know in the comments!