Since moving from shared hosting to a virtual server, I felt I had to do something about the endless number of robots that harvest my sites for email addresses, steal content, or run security scanners against my web server to find exploits to report to their human owners. This is the solution I came up with:
A security system to ban malicious web bots
This article covers a solution that tries to detect suspicious behavior and ban the offending IPs using the Linux 'iptables' command. Iptables will silently drop all TCP packets from the bot's IP address, so it will appear as if the server is just busy and the bot has to wait until its connection times out. I like this much better than simply blocking spambots with a 403 response, as it is a way to harass them and slow them down. Because IPs are often dynamically assigned to different hosts, they will be banned for a limited time only. Each time another attempt is made from the same IP, it will be banned for twice as long.
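Under the hood, a ban is nothing more than an iptables DROP rule. To illustrate (the IP address here is just a placeholder), this is the kind of rule the daemon described below will manage for you:
# Silently drop all packets coming from one offending address
iptables -A INPUT -s 203.0.113.5 -j DROP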
This software is inspired by Neil Gunton's Spambot Trap article. Be sure to check out his page. I adapted the idea of his Perl script in Ruby and made some improvements: IP ban expiration dates are kept in memory instead of regularly polling the database, which eliminates unnecessary load on the MySQL server. Furthermore, I wanted my software to ban not only email spambots, but all kinds of harmful robots that are common on the internet, like scripts that run brute-force attacks against phpMyAdmin or guestbooks.
Some prerequisites
It is assumed that your server meets the following prerequisites:
- Root access to your server running some sort of Linux/Unix that supports the 'iptables' command. A virtual server is fine.
- Apache
- MySQL
- PHP
- Ruby
How to detect spambots
The first problem for an automatic security system is to detect potentially harmful behavior. I found three patterns that have caught almost every bad robot that visited my server during the few months I have been running this system:
- Using a user agent string in %{HTTP_USER_AGENT} that is known to be harmful.
- Accessing my site directly by requesting the IP address. Very common for brute force attacks.
- Requesting URLs that are disallowed in your robots.txt file.
1. Banning by user agent is a good start, but not enough by itself. Many spambots fake their identity, since spoofing the user agent is easy. To keep your server configuration light and to avoid banning innocent users, I would not recommend using third-party blacklists. Just look through Apache's log files once in a while and add malicious user agents by hand (see the log one-liner after this list).
2. Looking through my Apache log files, I noticed thousands of requests guessing paths and passwords for phpMyAdmin. They never accessed my server through one of my registered domain names but used its IP address instead. It seems to be common practice for script kiddies to scan whole IP ranges known to host virtual servers and brute-force the phpMyAdmin login. So I decided to punish direct access to my IP via TCP ports 80 and 443. Because accessing a site by IP is no crime in itself and such a rule can easily be turned against yourself, it is strongly recommended to ban the host IP for a couple of seconds only.
3. The third rule should be obvious: Whatever user agent reads my robots.txt file and decides to follow disallowed URLs will be banned. Period.
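To get an overview of which user agents are hitting your server, a quick one-liner over the access log helps. This is just a sketch: it assumes Apache's combined log format and a Debian-style log path, so adjust both to your setup.
# List the most frequent user agents in your access log
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20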
Parts of the security system
So how does it all work? These are the parts of the security system used to dynamically ban IPs:
- The badrobot daemon running in the background to update the firewall every time a new IP is added to the database or an IP ban expires.
- A PHP script that saves IPs in the database: badrobot.php.
- Rewrite rules for Apache to redirect requests from spambots to badrobot.php.
- The honey pot: a robots.txt with some URLs that spambots can't resist following.
Step 1: Create the MySQL database
We start off by creating the table that keeps a record of bad robots. Create a database called 'badrobot' first if necessary, then execute this statement in your MySQL console:
CREATE TABLE `hosts_ban` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip` varchar(19) NOT NULL,
`power` int(11) NOT NULL DEFAULT '1',
`ban_time` int(11) NOT NULL DEFAULT '5',
`expiry` datetime NOT NULL,
`last_access` datetime NOT NULL,
`reason` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ip` (`ip`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
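You can check at any time which hosts are currently banned with a quick query against this table:
SELECT ip, reason, power, expiry FROM hosts_ban WHERE expiry > NOW();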
Step 2: Recording IPs with a PHP script
Whenever a robot makes a request that you defined as forbidden, it will be redirected to the following PHP script, which records its IP in the database. You can specify how long the IP should be banned in the variable $ban_time. Every time the robot makes another request, $ban_time will be doubled.
I use different scripts for different $ban_time values, because it is a potential security risk to control the script's variables with URL parameters. Use this as an example and modify it to your taste. Then create a directory to contain this file and all the other files we will create. On my server it is '/srv/www/shared/security'.
<?php
echo "Bad Robot! You fell into a honey pot.";
# Change this line to match your MySQL configuration
$db = mysql_connect("localhost", "mysql_user", "mysql_password");
mysql_select_db("badrobot", $db);
$ip = $_SERVER['REMOTE_ADDR'];
# How long should the IP be banned?
$ban_time = 600; // seconds = 10 minutes
# Describe the reason for ban
$reason = "Fell in Honeypot: ".$_SERVER['HTTP_USER_AGENT'];
$sql = "INSERT INTO hosts_ban (ip, reason, ban_time, expiry, last_access) VALUES('$ip','$reason', '$ban_time', NOW() + INTERVAL '$ban_time' SECOND, NOW()) ON DUPLICATE KEY UPDATE power = power + 1, expiry = NOW() + INTERVAL (POWER(2,(power-1))*ban_time) SECOND, last_access = NOW();";
mysql_query($sql);
# Notify the daemon only after the row has been written
exec("/usr/bin/touch /tmp/badrobot");
?>
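For the direct IP access rule mentioned above, 'badrobot_ipaccess.php' can simply be a copy of this script in which only the ban time and the reason differ, for example:
$ban_time = 10; // seconds: just enough to slow down IP-range scanners
$reason = mysql_real_escape_string("Direct IP access: ".$_SERVER['HTTP_USER_AGENT']);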
Step 3: A robots.txt to lure spambots
Create a file called 'robots.txt' in the security directory you just created and paste in the following text:
# Some honeypots to trap bad robots
User-agent: *
Disallow: /guestbook-emails/
Disallow: /top-secret/
Warning: don't try this out by typing these URLs into your browser! You will just have to believe me: you would be banned for at least ten minutes.
You could leave it like this and trap bots that read robots.txt and request forbidden pages on purpose. You could also link to these forbidden URLs with links hidden from humans, like links around images with zero width and height, as in the example below. Well-behaved bots like Googlebot will not request these URLs, because they read and obey robots.txt. Keep in mind that some bots don't read robots.txt on every visit but use a cached version.
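Such a hidden link could look like this (the image path is just a placeholder):
<!-- Invisible to humans, tempting for harvesters -->
<a href="/guestbook-emails/"><img src="/images/blank.gif" width="0" height="0" alt=""></a>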
Step 4: Apache configuration
Apache will now be configured to redirect all requests that you defined as harmful to our badrobot.php script. To make sure the security system works on all your virtual hosts, create a file called 'security.inc' and place it in your Apache configuration directory.
# Allow web access to the security directory
<Directory /srv/www/shared/security>
Order allow,deny
Allow from all
</Directory>
Alias /robots.txt /srv/www/shared/security/robots.txt
RewriteEngine On
# Honeypot for bad robots
RewriteCond %{REQUEST_URI} ^/(guestbook-emails|top-secret)(/)?$
RewriteRule ^.* /security/badrobot.php [PT,L]
# Uncomment the following lines to permit IP access via WWW. Create 'badrobot_ipaccess.php' first
# Your server's IP address goes on the next line:
# RewriteCond %{SERVER_NAME} 123.123.123.123
# RewriteCond %{REQUEST_URI} !favicon.ico
# RewriteRule ^.* /security/badrobot_ipaccess.php [PT,L]
# Uncomment to ban by user agent. Create 'badrobot_useragent.php' first.
# RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
# RewriteRule ^.* /security/badrobot_useragent.php [PT,L]
# Place your own bad robot directives here...
# This must follow your last bad robot directive
Alias /security /srv/www/shared/security
If you installed the badrobot.php file to a directory other than /srv/www/shared/security, you will have to modify security.inc accordingly. Note the [PT,L] at the end of every RewriteRule: it passes the rewritten URL on to the alias in the last line. This is a hack to make mod_rewrite work with aliases; usually, rewrites to aliased paths are processed before the alias and will fail.
In your virtual host containers, include security.inc like this (server name and paths are examples; adjust them to your setup):
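<VirtualHost *:80>
ServerName www.example.com
DocumentRoot /srv/www/example
# Pull in the bad robot rules
Include /etc/apache2/security.inc
</VirtualHost>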
Step 5: Download the badrobot daemon
The badrobot daemon is a short script written in Ruby that runs in the background and should not impact your system's performance. Once every second, it checks whether the file /tmp/badrobot has been touched. If so, it reloads the banned IPs from the MySQL database and adds them to your firewall. It also remembers the next expiration date, so it can refresh the firewall rules whenever an IP ban expires.
#!/usr/bin/env ruby
# This is the Badrobot daemon.
# Version 1.1
# Released under the Public Domain
require 'rubygems'
require 'mysql'
# -- Configuration -- #
# MySQL database:
@db_host = "localhost"
@db_name = "badrobot"
@db_user = "badrobot"
@db_password = ""
# The user your web server runs under:
www_user = "wwwrun"
# Your iptables firewall startup script: (Empty string for none)
firewall_script = ""
# Location of touchfile that gets notified when the database changes:
ban_touchfile = "/tmp/badrobot"
# Check touchfile every n seconds:
loop_time = 1
# -- Don't change below this line unless you know what you are doing -- #
# Initial values
last_ban = 0
@current_chain = 0
@next_expiry = nil
# Restart firewall to clean up any mess we may have made
system(firewall_script) unless firewall_script.empty?
# Create touch file, if necessary
system("touch #{ban_touchfile} && chown #{www_user}.root #{ban_touchfile}")
# The IP blocking method
def ipblock
# Alternate chains
@old_chain = @current_chain
@current_chain = (@current_chain == 1) ? 0 : 1
# Flush chain
system("iptables -N banned_ips#{@current_chain} 2>/dev/null")
system("iptables -F banned_ips#{@current_chain}")
dbh = Mysql.real_connect(@db_host, @db_user, @db_password, @db_name)
# Get all banned IPs
result = dbh.query("SELECT ip, expiry FROM hosts_ban WHERE expiry > NOW()")
while row = result.fetch_hash do
# Add IP to chain
system("iptables -A banned_ips#{@current_chain} -s #{row["ip"]} -j DROP")
end
result.free if result
# Get next expiry date
result = dbh.query("SELECT MIN(expiry) AS next_expiry FROM hosts_ban WHERE expiry > NOW()")
if row = result.fetch_hash and row["next_expiry"]
t = row["next_expiry"].split(/-|:|\s/)
@next_expiry = Time.mktime(t[0], t[1], t[2], t[3], t[4], t[5])
else
@next_expiry = nil
end
dbh.close if dbh
# Insert chain in INPUT
system("iptables -I INPUT -j banned_ips#{@current_chain}")
# Delete old chain
system("iptables -D INPUT -j banned_ips#{@old_chain} 2>/dev/null")
system("iptables -F banned_ips#{@old_chain}")
system("iptables -X banned_ips#{@old_chain}")
end
# Main loop
loop do
# IP block
if File.mtime(ban_touchfile) != last_ban or (@next_expiry and Time.now > @next_expiry)
ipblock
last_ban = File.mtime(ban_touchfile)
end
# Wait loop_time seconds
sleep(loop_time)
end
Change the values for db_* to match your MySQL configuration and set www_user to your Apache user. If you are not sure, type 'ps aux' to get a list of all running processes along with their usernames. If you don't already have the mysql gem on your computer, install it like this:
$ gem install mysql
After you have downloaded and put all required files in place, you need to change into the script's directory and make the badrobotd script executable:
$ cd /path/to/badrobot
$ chmod u+x badrobotd
# Start the daemon:
$ ./badrobotd >/dev/null 2>&1 &
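Once the daemon is running, you can test the whole setup by tripping the honeypot yourself. Do it from a machine you can afford to lock out for ten minutes (the domain here is a placeholder), then look at the firewall from a different connection:
$ curl http://www.example.com/top-secret/
Bad Robot! You fell into a honey pot.
$ iptables -L -n
The output should now contain a 'banned_ips' chain with a DROP entry for the client's IP.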
All done! It might be handy to add a start-up script for badrobotd with your runlevel editor; just make sure it starts after the MySQL daemon or it will fail to load properly. You can also extend the badrobot system to your needs: all you have to do is feed the database with IP addresses and the badrobot daemon will do its job. In fact, it works not just for web pages. You could, for instance, add a script that scans your system log and bans hosts that try to brute-force your SSH login, as sketched below.
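Here is a minimal sketch of such an extension, assuming a Debian-style /var/log/auth.log, a hypothetical threshold of five failed attempts, and the database credentials from Step 1. It is a starting point, not a finished tool:
require 'rubygems'
require 'mysql'
log_file = "/var/log/auth.log" # assumption: adjust to your distribution
threshold = 5 # failed attempts before an IP gets banned
# Count failed SSH logins per IP
fails = Hash.new(0)
File.foreach(log_file) do |line|
  fails[$1] += 1 if line =~ /Failed password .* from (\d+\.\d+\.\d+\.\d+)/
end
# Feed offenders into the hosts_ban table, doubling bans on repeat offenses
dbh = Mysql.real_connect("localhost", "badrobot", "", "badrobot")
fails.each do |ip, count|
  next if count < threshold
  dbh.query("INSERT INTO hosts_ban (ip, reason, ban_time, expiry, last_access) " +
            "VALUES ('#{ip}', 'SSH brute force', 600, NOW() + INTERVAL 600 SECOND, NOW()) " +
            "ON DUPLICATE KEY UPDATE power = power + 1, " +
            "expiry = NOW() + INTERVAL (POWER(2,(power-1))*ban_time) SECOND, last_access = NOW()")
end
dbh.close
# Notify the daemon so it reloads the firewall
system("touch /tmp/badrobot")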