Real-Time Traffic Analysis 1: GoAccess

When a web application has performance problems, there’s always the possibility that the root cause is malicious traffic. The difficulty lies in finding identifiers unique to the attack (addresses, HTTP headers, etc.). This series of posts will go over tools and techniques that I’ve depended on when I needed to quickly isolate attackers and take steps to mitigate them.

The first tool is GoAccess, an ncurses-based log analyser. I use it on the NCSA-format logs generated by Apache, Nginx, and Varnish web servers.

You can point it directly at a log file, which lets you analyse the log as it is being written:

    root@host:~# ./goaccess -f /var/log/nginx/access.log

You can even generate an HTML report by redirecting output to a file (newer GoAccess releases also accept -o report.html for this):

    root@host:~# ./goaccess -f /var/log/nginx/access.log > report.html

Here’s an example report.

Key information to look at would be:

  • Top Connecting IPs: is the bulk of the traffic coming from one IP or from many?
  • Top Requested Files: is one page, or a small set of pages, being requested far more than the rest?
  • HTTP Status Codes: is the stack surviving the traffic (200) or failing and timing out (503, 504)?
  • Top 404 requests: some web applications (e.g. Drupal) can take a performance hit when a lot of 404s are being emitted
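
When GoAccess isn’t installed on a box, the same four questions can be roughly answered from the shell. The following is a minimal sketch, not a replacement for the tool: the function names are my own, and it assumes the NCSA common/combined layout, where whitespace-separated field 1 is the client IP, field 7 the request path, and field 9 the status code.

```shell
#!/usr/bin/env bash
# Rough command-line equivalents of the four panels listed above.
# Assumption: NCSA common/combined log lines, e.g.
#   10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
# so that $1 = client IP, $7 = request path, $9 = status code.

# Hypothetical helper: count the values of one field, most frequent first.
top_field() {
    local field="$1" log="$2" limit="${3:-10}"
    awk -v f="$field" '{ print $f }' "$log" | sort | uniq -c | sort -rn | head -n "$limit"
}

top_ips()      { top_field 1 "$1" "${2:-10}"; }   # Top Connecting IPs
top_paths()    { top_field 7 "$1" "${2:-10}"; }   # Top Requested Files
status_codes() { top_field 9 "$1" "${2:-10}"; }   # HTTP Status Codes

# Top 404 requests: paths that returned 404, most frequent first.
top_404s() {
    local log="$1" limit="${2:-10}"
    awk '$9 == 404 { print $7 }' "$log" | sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, after sourcing the file, top_ips /var/log/nginx/access.log 5 lists the five busiest client addresses with their request counts. The field positions shift if your log format puts extra fields in front of the request, so check one line by eye first.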

With this information, you will gain a clearer understanding of the traffic hitting a single host. If you need to analyse multiple hosts, you can use a parallel shell like pdsh to tail every webhead’s log and redirect the combined output to one file as the logs are being written:

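    pdsh -N -w web1,web2,web3 "sudo tail -f /var/log/nginx/access.log" > /tmp/analysis.log

The -N flag stops pdsh from prefixing every line with the hostname, which would otherwise keep the lines from parsing as a normal access log.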

Then run GoAccess against the file. I use this technique all the time against groups of web servers.
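
One detail worth knowing about the combined file: unless told otherwise (its -N flag), pdsh prefixes every output line with the name of the host that produced it, e.g. "web1: ". If your capture has that prefix, a sketch along these lines can strip it before analysis, or use it to see how requests are spread across webheads. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a capture taken WITHOUT pdsh's -N flag,
# where every line looks like "web1: <original log line>".

# Strip the leading "host: " prefix so the lines are plain NCSA again.
# Lines that do not start with such a prefix pass through unchanged.
strip_pdsh_prefix() {
    sed -E 's/^[^:[:space:]]+: //'
}

# Count requests per webhead using that same prefix, busiest first.
requests_per_host() {
    awk -F': ' '{ print $1 }' | sort | uniq -c | sort -rn
}
```

Both read standard input, so usage would look like strip_pdsh_prefix < /tmp/analysis.log > /tmp/analysis.clean.log, after which GoAccess can be pointed at the cleaned file.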

Next time we’ll go over the tool suite that the Varnish caching reverse proxy offers to isolate and block pesky crawlers and botnets. You are using Varnish, aren’t you? :D
