Banning a bot IP that iptables could not ban
Recently, I was configuring fail2ban on a server and testing that it actually works by looking at incoming logs. Thankfully, there were a few bots that consistently hit the server, so I had plenty of testers. However, there was one bot that I simply could not block! This post describes how I came to understand why I wasn’t able to block that one IP and how I managed to block it in the end. (Note: I’m far from a networking expert, so advanced readers might find the conclusion obvious.)
The server in question hosts a gitea instance, which exposes both HTTP and SSH services. The same server hosts other services, so for easier maintenance, gitea is deployed as a docker container and sits behind traefik, a reverse proxy that handles TLS termination and is also deployed as a container.
Part 1: Early wins with SSH
I started with gitea’s SSH service. As I understand it, a fail2ban configuration consists of a few components: how to find the logs, how to parse them, and how to ban specific IPs.
As docker is configured to send the logs to journald and fail2ban has support for the systemd backend, defining the log source is as easy as setting backend = systemd. However, this configuration alone would require fail2ban to parse all system logs, most of which are not relevant here. There are multiple ways to filter journal data, but to find the relevant keys and values, one can use journalctl’s -o json-pretty output mode (some output omitted for brevity):
In this case, we’re interested in the logs of one specific docker container, so we can define the filter as follows.
Next, parsing the logs is a matter of defining the failregex and ignoreregex parameters. To validate that the regexes are picking what they need to pick, use fail2ban-regex.
This way I found that the regexes in the default sshd jail didn’t work in this case, and I ended up slightly modifying this gist to arrive at a working configuration. All in all, the gitea-ssh specific configuration looked like this:
And that… seemed to work! 🎉 (Note, all IPs in this post are redacted.)
Part 2: HTTP is going to be easy
Similar configurations were then applied to the HTTP service. The logs now come from an access.log written by traefik; fail2ban reads it and bans IPs based on that. Easy!
fail2ban.filter: INFO [traefik-botsearch] Found 188.8.131.52 - 2023-09-06 23:32:19
fail2ban.actions: WARNING [traefik-botsearch] 184.108.40.206 already banned
Wait a minute…
WARNING: already banned means that fail2ban has already banned the IP in the past, but that IP can still access the resource, thus ending up again in the logs, which are parsed by fail2ban. So, something’s not right here. 🤔
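One quick way to spot this symptom is to scan the fail2ban log for "already banned" warnings and count them per IP. The snippet below is a minimal sketch that assumes log lines shaped like the two shown above; the function name repeat_offenders is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   fail2ban.actions: WARNING [traefik-botsearch] 184.108.40.206 already banned
ALREADY_BANNED = re.compile(
    r"WARNING\s+\[(?P<jail>[^\]]+)\]\s+(?P<ip>\S+) already banned"
)


def repeat_offenders(lines):
    """Count 'already banned' warnings per (jail, ip) pair."""
    counts = Counter()
    for line in lines:
        match = ALREADY_BANNED.search(line)
        if match:
            counts[(match["jail"], match["ip"])] += 1
    return counts


# The two log lines from this post (IPs are redacted, so they differ).
sample = [
    "fail2ban.filter: INFO [traefik-botsearch] Found 188.8.131.52 - 2023-09-06 23:32:19",
    "fail2ban.actions: WARNING [traefik-botsearch] 184.108.40.206 already banned",
]

for (jail, ip), hits in repeat_offenders(sample).items():
    print(f"{jail}: {ip} got through {hits} time(s) while banned")
```

Any IP that shows up here is one where the ban exists on paper but is not actually stopping traffic.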
The good thing here was that there was one particularly persistent bot that kept sending the same request every 1-2 hours. This bot proved to be a good testing companion, as that interval was long enough for me to think up new ideas and implement them before the next request would arrive… and pass straight through:
220.127.116.11 - - [06/Sep/2023:02:27:15 +0000] "GET / HTTP/1.1" 404 19 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1" 834 "-" "-" 0ms
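To make it concrete what a failregex has to pull out of such a line, here is a small Python sketch that parses the access log entry above. The pattern is my own approximation of the common log prefix, not the regex from the actual fail2ban filter, and parse_access_line is an illustrative name.

```python
import re

# Approximates the leading, common-log-format part of a traefik access log line.
ACCESS_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return the interesting fields of one access log line, or None."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# The log line from this post, split only to keep the source readable.
line = (
    '220.127.116.11 - - [06/Sep/2023:02:27:15 +0000] "GET / HTTP/1.1" 404 19 "-" '
    '"Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1" 834 "-" "-" 0ms'
)

entry = parse_access_line(line)
print(entry["host"], entry["status"], entry["path"])
```

The host field is what ends up in the ban action, which matters later: it is whatever traefik believes the client address to be.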
Part 3: Checking iptables rules
Let’s see… if fail2ban is able to notice all of this, then it’s not an issue with the parsing configuration. As defined in the docker-action.conf (see above), the ban and unban actions insert and delete entries in iptables’ DOCKER-USER chain. Let’s verify that these rules are actually created:
Yeah, the rules are created alright. The same command can be augmented to show packet information:
Here it shows that some of the rules are getting used, as iptables has registered non-zero packet statistics on those rules. That is, fail2ban successfully creates iptables rules and iptables successfully uses them to drop traffic from specific hosts… However, the bot IP we’re interested in doesn’t register any packets: traefik continues to log a single request from that IP, approximately once per hour, and iptables shows:
Okay… 🤔 so, what if the rules are arranged in such a way that this request hits an earlier rule and gets a pass that way? iptables is actually a collection of tables and chains, with two main independent paths: through INPUT or through FORWARD. The DOCKER-USER chain is part of the FORWARD chain, so maybe the packets pass through the INPUT chain and hence also pass the filters…?
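As a rough mental model of that split, the toy function below decides which filter chain a packet would traverse. It is a deliberate simplification: it only models destination NAT followed by the routing decision and ignores everything else netfilter does. All addresses are documentation examples, and the DNAT mapping stands in for the kind of rule docker adds for a published port.

```python
def filter_chain(dst, local_addresses, dnat=None):
    """Return the filter-table chain a packet traverses in this toy model.

    dst: (ip, port) the packet was originally sent to.
    local_addresses: set of IPs owned by the host itself.
    dnat: {(ip, port): (new_ip, new_port)}, standing in for nat PREROUTING rules.
    """
    dnat = dnat or {}
    # PREROUTING: destination NAT is applied before the routing decision.
    dst = dnat.get(dst, dst)
    # Routing decision: local delivery goes to INPUT, the rest is forwarded.
    return "INPUT" if dst[0] in local_addresses else "FORWARD"


HOST = "203.0.113.10"  # example host address
# Example published port: host:443 is rewritten to a container address.
PUBLISHED = {(HOST, 443): ("172.17.0.2", 443)}

print(filter_chain((HOST, 443), {HOST}, PUBLISHED))  # container traffic
print(filter_chain((HOST, 22), {HOST}, PUBLISHED))   # traffic for the host itself
```

In this model, traffic to a published container port is rewritten before routing and therefore ends up in FORWARD, while traffic to a service running directly on the host ends up in INPUT.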
There seem to be many different options and solutions that people suggest for debugging iptables routing (ref1, ref2, ref3, ref4). Not all of those solutions seemed to work on my system, but here are some of the commands I executed, for your inspiration. Some of these may work, some may not. This area is the most unclear to me.
Nevertheless, I was able to capture and view a single packet of my own, destined for the docker container. There’s a lot of info, but one thing is clear: the packet goes into the FORWARD chain and then immediately jumps to the DOCKER-USER chain. So, if my packets destined for the docker container end up in the correct chains, then it’s probable that the bot’s packets also pass through them. In short, there’s absolutely no issue with iptables.
Okay, let’s recap. There’s a bot sending HTTP requests to traefik and ending up in the access.log. We can notice this with fail2ban and create iptables rules to block it, but that does absolutely nothing to stop the bot. There aren’t many ideas left to investigate. So, what if we capture the packets as they arrive and see whether their contents can tell us something?
Part 4: Using tcpdump to see the magic packets
Let’s use tcpdump to capture traffic to and from the specific bot IP and compare it against the access.log timestamps.
After a while, the access.log shows a new request… and tcpdump remains quiet. Waaait, what. How is it possible that a packet arrives but skips all the iptables rules and all of the tcpdump things and just magically appears in the access.log?? Let’s remove the filter and see what else is happening on the same machine at the same time.
After a while, a new request comes and is logged to access.log. Okay, open that pcap in wireshark and there’s one HTTP request with the same headers. Looks promising so far. wireshark can show the whole TCP stream as a single conversation, so let’s open that up and see:
The bot IP is there!!
And it’s proxied!!
And 18.104.22.168 is the IP of an ingress server!!
There was one crucial detail to all of this that I had forgotten completely: I had an ingress server. It was one piece of this infrastructure that I set up a long time ago in the cloud and forgot about, as it didn’t require any changes. The ingress server proxies ports 80/443 to traefik and that’s it, that’s all it does. It wraps the request with PROXY protocol and traefik unwraps it, and therefore the access logs show the client IP, even though traefik only ever communicates with the ingress. The packets arriving at this host carry the ingress server’s source address, so a rule matching the bot’s IP never sees a single packet. So, that’s the reason why the iptables rules didn’t work. 🤦
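To illustrate why the logged IP and the packet source differ, here is a small sketch of parsing the human-readable version 1 of the PROXY protocol header. I am assuming v1 purely for readability (there is also a binary v2, and I have not checked which one my ingress sends); the function name and all addresses are examples, and the UNKNOWN form of the header is not handled.

```python
def parse_proxy_v1(data: bytes):
    """Split a PROXY protocol v1 header off the front of a TCP payload.

    Returns (client_ip, client_port, rest); raises ValueError otherwise.
    """
    header, sep, rest = data.partition(b"\r\n")
    if not sep or not header.startswith(b"PROXY "):
        raise ValueError("no PROXY v1 header")
    parts = header.decode("ascii").split(" ")
    # Layout: PROXY <TCP4|TCP6> <src ip> <dst ip> <src port> <dst port>
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported PROXY header")
    _, _, src_ip, _dst_ip, src_port, _dst_port = parts
    return src_ip, int(src_port), rest


# What the reverse proxy sees on the socket: the peer is the ingress server...
tcp_peer = ("198.51.100.5", 40000)
# ...while the real client is only named inside the payload.
payload = b"PROXY TCP4 192.0.2.77 198.51.100.9 51234 80\r\nGET / HTTP/1.1\r\n\r\n"

client_ip, client_port, rest = parse_proxy_v1(payload)
print("packet source (what iptables matches on):", tcp_peer[0])
print("client from PROXY header (what gets logged):", client_ip)
```

A packet filter only ever sees the first address; the second one exists solely as bytes inside the TCP stream.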
Curiously, the SSH fail2ban rules worked because of a misconfiguration of the DNS records. You see, they were configured to skip the ingress server entirely. Hence, the gitea SSH server communicated directly with the clients, without any PROXY protocol stuff.
Part 5: SSH and PROXY protocol
Alright, all of this setup was surprising and not really that great, so let’s do some cleanup. There are two improvements to be made here. First, the ingress server should be used for all traffic, so it should also forward SSH traffic. Second, fail2ban should be configured properly. And that’s it. But there’s an issue: OpenSSH doesn’t handle PROXY protocol. And even if it did, the traefik host cannot do much with fail2ban because all traffic comes from the ingress. Let’s first focus on proxying SSH traffic. To solve this, I tried a couple of things.
For one, using traefik itself to unwrap the TCP stream destined for the SSH service works, but as the traefik host initiates a new connection by doing so, the SSH service sees the traefik host IP as the client. Hence, this solution will not work with fail2ban.
gitea seems to support PROXY protocol as well. However, there seem to be some usability issues with the official gitea/gitea docker container: by default, it seems to use “normal” OpenSSH, but gitea itself can also act as an SSH server. So, setting START_SSH_SERVER=true breaks the container, as there are now two SSH servers fighting for one port. There doesn’t seem to be an option to disable the OpenSSH server and replace it with the one from gitea. And the OpenSSH server doesn’t know how to handle PROXY protocol:
Next, I found out about mmproxy and its alternative, go-mmproxy. There’s a blog post that goes into more detail, but the main point is that this proxy unwraps the PROXY protocol and spoofs the source IP, transparently. That is, any TCP application will “just work” with such a proxy between the ingress and the application. After setting go-mmproxy up, I saw the following logs:
The 172.17.0.1 is a docker internal IP. That is, go-mmproxy unwraps the PROXY protocol and sends the traffic to docker, which now acts as if it were representing the end user. To fix this, the gitea container had to be moved to the host network instead. After all of the changes, the logs show that gitea’s SSH service can correctly see the actual client IP. (d8744aef4ef8 is the
Based on the official go-mmproxy.service.example, with only ExecStart changed to:
Part 6: Wiring up fail2ban with ingress server
The final piece of the puzzle is to make the ingress server aware of the improper use of resources happening on the application servers. Otherwise, the ingress server doesn’t have the context to block anything, and the application servers cannot block anything themselves, as all of their traffic comes from one host. The solution is to transfer that context somehow, and similar solutions exist for Cloudflare: ref1, ref2.
A new application is deployed on the ingress server, with its only purpose being to provide access to the local fail2ban-client to external users. This doesn’t really require much, so below is its full source. It’s then wrapped into a systemd service and managed automatically henceforth.
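For readers who want a rough idea of the shape of such a service, the following is a minimal illustrative sketch in Python. It is written from scratch for this explanation and should not be read as the exact program running on the ingress: the jail name, the /ban/<ip> and /unban/<ip> endpoints, and the bind address are all hypothetical. The only real interface it relies on is fail2ban-client set <jail> banip|unbanip <ip>.

```python
import ipaddress
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

JAIL = "ingress"  # hypothetical jail name on the ingress server
ACTIONS = {"ban": "banip", "unban": "unbanip"}


def build_command(action, ip, jail=JAIL):
    """Translate a request into a fail2ban-client invocation, validating input."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    address = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return ["fail2ban-client", "set", jail, ACTIONS[action], str(address)]


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expected paths (hypothetical): /ban/<ip> and /unban/<ip>
        try:
            action, ip = self.path.strip("/").split("/")
            command = build_command(action, ip)
        except ValueError:
            self.send_error(400, "bad request")
            return
        result = subprocess.run(command, capture_output=True, text=True)
        self.send_response(200 if result.returncode == 0 else 500)
        self.end_headers()
        self.wfile.write((result.stdout or result.stderr).encode())


def serve(address="10.0.0.1", port=8080):
    """Listen on a private (VPN-only) address; there is no authentication here."""
    HTTPServer((address, port), Handler).serve_forever()
```

Validating the IP with the ipaddress module before building the command keeps arbitrary strings from ever reaching the command line; calling serve() from the service’s entry point starts the listener.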
fail2ban on the application server’s side is extended with a new action, which connects to the server above to ban/unban specific IPs.
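On the application server’s side, the piece that such an action calls could be as small as the sketch below. Again, this is an illustration under assumptions rather than the actual action: the base URL and the /ban/<ip> and /unban/<ip> paths are hypothetical and simply need to match whatever the service on the ingress exposes.

```python
import ipaddress
import urllib.request

INGRESS_API = "http://10.0.0.1:8080"  # hypothetical address inside the VPN


def build_request(action, ip, base=INGRESS_API):
    """Build the POST request that asks the ingress to ban or unban an IP."""
    if action not in ("ban", "unban"):
        raise ValueError(f"unknown action: {action!r}")
    address = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return urllib.request.Request(f"{base}/{action}/{address}", method="POST")


def notify_ingress(action, ip, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(action, ip), timeout=timeout) as response:
        return response.status


request = build_request("ban", "192.0.2.7")
print(request.get_method(), request.full_url)
```

A fail2ban action could then point its actionban and actionunban commands at a small script wrapping notify_ingress, passing along the <ip> tag that fail2ban substitutes.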
The connection between these two is naturally kept private with a VPN tunnel. But overall, that’s all there is to making the application server’s fail2ban and the ingress server’s fail2ban work together.
In the end, I also moved the SSH services away from the default port 22, which was probably more effective at reducing log spam than anything else outlined in this post. But overall, I definitely learned a lot from working with different tools and with PROXY protocol in practice.