Since moving from shared hosting to a virtual server, I felt I had to do something about the endless number of robots that harvest my sites for email addresses, steal content, or run security scanners against my web server to find exploits to report to their human owners. This is the solution I came up with:
A security system to ban malicious web bots
This article covers a solution that tries to detect suspicious behavior and ban the offending IPs using the Linux 'iptables' command. Iptables will silently drop all TCP packets from the bot's IP address, so it will appear as if the server is just busy and the bot has to wait until its connection times out. I like this much better than simply blocking spambots with a 403 response, as it is a way to harass them and slow them down. Because IPs are often dynamically assigned to different hosts, they will be banned for a limited time only. Each time another attempt is made from the same IP, it will be banned for twice as long.
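Under the hood, a ban is nothing more than an iptables DROP rule. To illustrate (the IP address here is just a placeholder), this is the kind of rule the daemon described below will manage for you:
# Silently drop all packets coming from one offending address
iptables -A INPUT -s 203.0.113.5 -j DROP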
This software is inspired by Neil Gunton's Spambot Trap article. Be sure to check out his page. I adapted the idea of his Perl script in Ruby and made some improvements: IP ban expiration dates are kept in memory instead of regularly polling the database, which eliminates unnecessary load on the MySQL server. Furthermore, I wanted my software to ban not only email spambots, but all kinds of harmful robots that are common on the internet, like scripts that run brute-force attacks against phpMyAdmin or guestbooks.
Some prerequisites
It is assumed that your server meets the following prerequisites:
- Root access to your server running some sort of Linux/Unix that supports the 'iptables' command. A virtual server is fine.
- Apache
- MySQL
- PHP
- Ruby
How to detect spambots
The first problem for an automatic security system is to detect potentially harmful behavior. I found three patterns that have caught almost every bad robot that visited my server during the few months I have been running this system:
- Using a user agent string in %{HTTP_USER_AGENT} that is known to be harmful.
- Accessing my site directly by requesting the IP address. Very common for brute force attacks.
- Requesting URLs that are disallowed in your robots.txt file.
1. Banning by user agent is a good start, but not enough by itself. Many spambots fake their identity, since spoofing the user agent is easy. To keep your server configuration light and to avoid banning innocent users, I would not recommend using third-party blacklists. Just look through Apache's log files once in a while and add malicious user agents by hand (see the log one-liner after this list).
2. Looking through my Apache log files, I noticed thousands of requests guessing paths and passwords for phpMyAdmin. They never accessed my server through one of my registered domain names but used its IP address instead. It seems to be common practice for script kiddies to scan whole IP ranges known to host virtual servers and brute-force the phpMyAdmin login. So I decided to punish direct access to my IP via TCP ports 80 and 443. Because accessing a site by IP is no crime in itself and such a rule can easily be turned against yourself, it is strongly recommended to ban the host IP for a couple of seconds only.
3. The third rule should be obvious: Whatever user agent reads my robots.txt file and decides to follow disallowed URLs will be banned. Period.
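To get an overview of which user agents are hitting your server, a quick one-liner over the access log helps. This is just a sketch: it assumes Apache's combined log format and a Debian-style log path, so adjust both to your setup.
# List the most frequent user agents in your access log
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20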
Parts of the security system
So how does it all work? These are the parts of the security system used to dynamically ban IPs:
- The badrobot daemon running in the background to update the firewall every time a new IP is added to the database or an IP ban expires.
- A PHP script that saves IPs in the database: badrobot.php.
- Rewrite rules for Apache to redirect requests from spambots to badrobot.php.
- The honey pot: a robots.txt with some URLs that spambots can't resist following.
Step 1: Create the MySQL database
We start off by creating the table that keeps a record of bad robots. Create a database called 'badrobot' first if necessary, then execute this statement in your MySQL console:
CREATE TABLE `hosts_ban` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip` varchar(19) NOT NULL,
`power` int(11) NOT NULL DEFAULT '1',
`ban_time` int(11) NOT NULL DEFAULT '5',
`expiry` datetime NOT NULL,
`last_access` datetime NOT NULL,
`reason` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ip` (`ip`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
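You can check at any time which hosts are currently banned with a quick query against this table:
SELECT ip, reason, power, expiry FROM hosts_ban WHERE expiry > NOW();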
Step 2: Recording IPs with a PHP script
Whenever a robot makes a request that you defined as forbidden, it will be redirected to the following PHP script, which records its IP in the database. You can specify how long the IP should be banned in the variable $ban_time. Every time the robot makes another request, $ban_time will be doubled.
I use different scripts for different $ban_time values, because it is a potential security risk to control the script's variables with URL parameters. Use this as an example and modify it to your taste. Then create a directory to contain this file and all the other files we will create. On my server it is '/srv/www/shared/security'.
<?php
echo "Bad Robot! You fell into a honey pot.";
# Change this line to match your MySQL configuration
$db = mysql_connect("localhost", "mysql_user", "mysql_password");
mysql_select_db("badrobot", $db);
$ip = $_SERVER['REMOTE_ADDR'];
# How long should the IP be banned?
$ban_time = 600; // seconds = 10 minutes
# Describe the reason for ban
$reason = "Fell in Honeypot: ".$_SERVER['HTTP_USER_AGENT'];
$sql = "INSERT INTO hosts_ban (ip, reason, ban_time, expiry, last_access) VALUES('$ip','$reason', '$ban_time', NOW() + INTERVAL '$ban_time' SECOND, NOW()) ON DUPLICATE KEY UPDATE power = power + 1, expiry = NOW() + INTERVAL (POWER(2,(power-1))*ban_time) SECOND, last_access = NOW();";
mysql_query($sql);
# Notify the daemon only after the row has been written
exec("/usr/bin/touch /tmp/badrobot");
?>
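For the direct IP access rule mentioned above, 'badrobot_ipaccess.php' can simply be a copy of this script in which only the ban time and the reason differ, for example:
$ban_time = 10; // seconds: just enough to slow down IP-range scanners
$reason = mysql_real_escape_string("Direct IP access: ".$_SERVER['HTTP_USER_AGENT']);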
Step 3: A robots.txt to lure spambots
Create a file called 'robots.txt' in the security directory you just created and paste in the following text:
# Some honeypots to trap bad robots
User-agent: *
Disallow: /guestbook-emails/
Disallow: /top-secret/
Warning: don't try this out by typing these URLs into your browser! You will just have to believe me: you would be banned for at least ten minutes.
You could leave it like this and trap bots that read robots.txt and request forbidden pages on purpose. You could also link to these forbidden URLs with links hidden from humans, like links around images with zero width and height, as in the example below. Well-behaved bots like Googlebot will not request these URLs, because they read and obey robots.txt. Keep in mind that some bots don't read robots.txt on every visit but use a cached version.
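Such a hidden link could look like this (the image path is just a placeholder):
<!-- Invisible to humans, tempting for harvesters -->
<a href="/guestbook-emails/"><img src="/images/blank.gif" width="0" height="0" alt=""></a>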
Step 4: Apache configuration
Apache will now be configured to redirect all requests that you defined as harmful to our badrobot.php script. To make sure the security system works on all your virtual hosts, create a file called 'security.inc' and place it in your Apache configuration directory.
# Allow web access to the security directory
<Directory /srv/www/shared/security>
Order allow,deny
Allow from all
</Directory>
Alias /robots.txt /srv/www/shared/security/robots.txt
RewriteEngine On
# Honeypot for bad robots
RewriteCond %{REQUEST_URI} ^/(guestbook-emails|top-secret)(/)?$
RewriteRule ^.* /security/badrobot.php [PT,L]
# Uncomment the following lines to permit IP access via WWW. Create 'badrobot_ipaccess.php' first
# Your server's IP address goes on the next line:
# RewriteCond %{SERVER_NAME} 123.123.123.123
# RewriteCond %{REQUEST_URI} !favicon.ico
# RewriteRule ^.* /security/badrobot_ipaccess.php [PT,L]
# Uncomment to ban by user agent. Create 'badrobot_useragent.php' first.
# RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon
# RewriteRule ^.* /security/badrobot_useragent.php [PT,L]
# Place your own bad robot directives here...
# This must follow your last bad robot directive
Alias /security /srv/www/shared/security
If you installed the badrobot.php file to a directory other than /srv/www/shared/security, you will have to modify security.inc accordingly. Note the [PT,L] at the end of every RewriteRule: it passes the rewritten URL on to the alias in the last line. This is a hack to make mod_rewrite work with aliases; usually, rewrites to aliased paths are processed before the alias and will fail.
In your virtual host containers, include security.inc like this (server name and paths are examples; adjust them to your setup):
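<VirtualHost *:80>
ServerName www.example.com
DocumentRoot /srv/www/example
# Pull in the bad robot rules
Include /etc/apache2/security.inc
</VirtualHost>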
Step 5: Download the badrobot daemon
The badrobot daemon is a short script written in Ruby that runs in the background and should not impact your system's performance. Once every second, it checks whether the file /tmp/badrobot has been touched. If so, it reloads the banned IPs from the MySQL database and adds them to your firewall. It also remembers the next expiration date, so it can refresh the firewall rules whenever an IP ban expires.
#!/usr/bin/env ruby
# This is the Badrobot daemon.
# Version 1.1
# Released under the Public Domain
require 'rubygems'
require 'mysql'
# -- Configuration -- #
# MySQL database:
@db_host = "localhost"
@db_name = "badrobot"
@db_user = "badrobot"
@db_password = ""
# The user your web server runs under:
www_user = "wwwrun"
# Your iptables firewall startup script: (Empty string for none)
firewall_script = ""
# Location of touchfile that gets notified when the database changes:
ban_touchfile = "/tmp/badrobot"
# Check touchfile every n seconds:
loop_time = 1
# -- Don't change below this line unless you know what you are doing -- #
# Initial values
last_ban = 0
@current_chain = 0
@next_expiry = nil
# Restart firewall to clean up any mess we may have made
system(firewall_script) unless firewall_script.empty?
# Create touch file, if necessary
system("touch #{ban_touchfile} && chown #{www_user}.root #{ban_touchfile}")
# The IP blocking method
def ipblock
# Alternate chains
@old_chain = @current_chain
@current_chain = (@current_chain == 1) ? 0 : 1
# Flush chain
system("iptables -N banned_ips#{@current_chain} 2>/dev/null")
system("iptables -F banned_ips#{@current_chain}")
dbh = Mysql.real_connect(@db_host, @db_user, @db_password, @db_name)
# Get all banned IPs
result = dbh.query("SELECT ip, expiry FROM hosts_ban WHERE expiry > NOW()")
while row = result.fetch_hash do
# Add IP to chain
system("iptables -A banned_ips#{@current_chain} -s #{row["ip"]} -j DROP")
end
result.free if result
# Get next expiry date
result = dbh.query("SELECT MIN(expiry) AS next_expiry FROM hosts_ban WHERE expiry > NOW()")
if row = result.fetch_hash and row["next_expiry"]
t = row["next_expiry"].split(/-|:|\s/)
@next_expiry = Time.mktime(t[0], t[1], t[2], t[3], t[4], t[5])
else
@next_expiry = nil
end
dbh.close if dbh
# Insert chain in INPUT
system("iptables -I INPUT -j banned_ips#{@current_chain}")
# Delete old chain
system("iptables -D INPUT -j banned_ips#{@old_chain} 2>/dev/null")
system("iptables -F banned_ips#{@old_chain}")
system("iptables -X banned_ips#{@old_chain}")
end
# Main loop
loop do
# IP block
if File.mtime(ban_touchfile) != last_ban or (@next_expiry and Time.now > @next_expiry)
ipblock
last_ban = File.mtime(ban_touchfile)
end
# Wait loop_time seconds
sleep(loop_time)
end
Change the values for db_* to match your MySQL configuration and set www_user to your Apache user. If you are not sure, type 'ps aux' to get a list of all running processes along with their usernames. If you don't already have the mysql gem on your computer, install it like this:
$ gem install mysql
After you have downloaded and put all required files in place, you need to change into the script's directory and make the badrobotd script executable:
$ cd /path/to/badrobot
$ chmod u+x badrobotd
# Start the daemon:
$ ./badrobotd >/dev/null 2>&1 &
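Once the daemon is running, you can test the whole setup by tripping the honeypot yourself. Do it from a machine you can afford to lock out for ten minutes (the domain here is a placeholder), then look at the firewall from a different connection:
$ curl http://www.example.com/top-secret/
Bad Robot! You fell into a honey pot.
$ iptables -L -n
The output should now contain a 'banned_ips' chain with a DROP entry for the client's IP.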
All done! It might be handy to add a start-up script for badrobotd with your runlevel editor; just make sure it starts after the MySQL daemon or it will fail to load properly. You can also extend the badrobot system to your needs: all you have to do is feed the database with IP addresses and the badrobot daemon will do its job. In fact, it works not just for web pages. You could, for instance, add a script that scans your system log and bans hosts that try to brute-force your SSH login, as sketched below.
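Here is a minimal sketch of such an extension, assuming a Debian-style /var/log/auth.log, a hypothetical threshold of five failed attempts, and the database credentials from Step 1. It is a starting point, not a finished tool:
require 'rubygems'
require 'mysql'
log_file = "/var/log/auth.log" # assumption: adjust to your distribution
threshold = 5 # failed attempts before an IP gets banned
# Count failed SSH logins per IP
fails = Hash.new(0)
File.foreach(log_file) do |line|
  fails[$1] += 1 if line =~ /Failed password .* from (\d+\.\d+\.\d+\.\d+)/
end
# Feed offenders into the hosts_ban table, doubling bans on repeat offenses
dbh = Mysql.real_connect("localhost", "badrobot", "", "badrobot")
fails.each do |ip, count|
  next if count < threshold
  dbh.query("INSERT INTO hosts_ban (ip, reason, ban_time, expiry, last_access) " +
            "VALUES ('#{ip}', 'SSH brute force', 600, NOW() + INTERVAL 600 SECOND, NOW()) " +
            "ON DUPLICATE KEY UPDATE power = power + 1, " +
            "expiry = NOW() + INTERVAL (POWER(2,(power-1))*ban_time) SECOND, last_access = NOW()")
end
dbh.close
# Notify the daemon so it reloads the firewall
system("touch /tmp/badrobot")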