Hi,

I know this is quite impossible to diagnose from afar, but I came across the posting from lemmy.world admins talking about the attacks they are facing where the database will get overwhelmed and the server doesn’t respond anymore. And something similar seemed to have happened to my own servers.

Now, I’m running my own self-hosted Lemmy and Mastodon instances (on 2 seperate VPS) and had them become completely unresponsive yesterday. Mastodon and Lemmy both showed the “there is an internal/database error” message and my other services (Nextcloud and Synapse) didn’t load or respond.

Login into my VPS console showed me that both servers ran at 100% CPU load since a couple of hours. I can’t currently SSH into these servers, as I’m away for a couple of days and forgot to bring my private SSH key on my Laptop. So, for now I just switched the servers off.

Anyway, the main question is: what should I look at in troubleshooting when I’m back home? I’m a beginner in selfhosting and I run these instances just for myself and don’t mind if I’d have to roll them back a couple days (I have backups). But I would like to learn from this and get better at running my own services.

For reference: I run everything in docker containers behind Nginx Proxy Manager as my reverse proxy. I have only ports 80, 443 and 22 open to the outside. I have fail2ban set up. The Mastodon and Lemmy instances are not open for registration and just have 2 users each (admin + my account).

  • GameGod
    link
    fedilink
    English
    arrow-up
    2
    ·
    1 year ago

    I don’t see anyone else actually telling you how to figure out if you’re being DoSed, so I’ll start:

    Check your logs. Look at what process is eating your CPU in htop and then look at the logs for that process. If it’s a web application, that means the error and access logs for it. If you see a flood of requests to a single URL, or some other suspicious pattern in the log, then you can try blocking the IPs associated with them temporarily and see if it alleviates the load. Repeat until the load goes down.

    If your application uses a database, check your database logs too. IIRC postgres logs queries that take longer than 5 seconds by default, which can make it easy to spot a slow query especially during a time of high load.

    I don’t think DNS amplification attacks over UDP are likely to be a problem as I think most cloud providers filter traffic with forged src addresses (correct me if I’m wrong). You can also try blocking all inbound UDP traffic if you suspect a UDP flood but this will likely break DNS lookups for you temporarily. (your machine should not have any open UDP ports in any case though if you’re just running Lemmy).

    If you want to go next level, you can use “perf” to generate a system-wide profile and flamegraph which will show you where you’re burning CPU cycles. This can be extremely useful for troubleshooting performance or optimizing applications. (you’ll find that even ipfilters takes CPU power, which is why most DDoS protection happens on dedicated hardware upstream)