Hi,

I know this is quite impossible to diagnose from afar, but I came across the posting from lemmy.world admins talking about the attacks they are facing where the database will get overwhelmed and the server doesn’t respond anymore. And something similar seemed to have happened to my own servers.

Now, I’m running my own self-hosted Lemmy and Mastodon instances (on 2 seperate VPS) and had them become completely unresponsive yesterday. Mastodon and Lemmy both showed the “there is an internal/database error” message and my other services (Nextcloud and Synapse) didn’t load or respond.

Login into my VPS console showed me that both servers ran at 100% CPU load since a couple of hours. I can’t currently SSH into these servers, as I’m away for a couple of days and forgot to bring my private SSH key on my Laptop. So, for now I just switched the servers off.

Anyway, the main question is: what should I look at in troubleshooting when I’m back home? I’m a beginner in selfhosting and I run these instances just for myself and don’t mind if I’d have to roll them back a couple days (I have backups). But I would like to learn from this and get better at running my own services.

For reference: I run everything in docker containers behind Nginx Proxy Manager as my reverse proxy. I have only ports 80, 443 and 22 open to the outside. I have fail2ban set up. The Mastodon and Lemmy instances are not open for registration and just have 2 users each (admin + my account).

  • lungdart
    link
    fedilink
    English
    arrow-up
    25
    ·
    10 months ago

    Sounds like you were out of resources. That is the goal of a DoS attack, but you’d need connection logs to detect if that was the case.

    DDoS attacks are very tricky to defend. (Source: I work in DDoS defence). There’s two sections to defense, detection and mitigation.

    Detection is very easy, just look at packets. A very common DDoS attack uses UDP services to amplify your request to a bigger response, but then spoof your src ip to the target. So large amounts of traffic is likely an attack, out of band udp traffic is likely an attack. And large amount of inband traffic could be an attack.

    Mitigation is trickier. You need something that can handle a massive amount of packet inspection and black holing. That’s done serious hardware. A script kiddie can buy a 20Gbe/1mpps attack with their moms credit card very easily.

    Your defence options are a little limited. If your cloud provider has WAF, use it. You may be able to get rules that block common botnets. Cloudflare is another decent option, they’ll man in the middle your services, and run detection and mitigation on all traffic. They also have a decent WAF.

    Best of luck!

    • PriorProject@lemmy.world
      link
      fedilink
      English
      arrow-up
      7
      ·
      edit-2
      10 months ago

      A very common DDoS attack uses UDP services to amplify your request to a bigger response, but then spoof your src ip to the target.

      Having followed many reports of denial of service activity of Lemmy, I don’t think this is the common mode. Attacks I’d heard of involve:

      • Using regular lemmy APIs backed by heavy database queries. I haven’t heard discussion of query rates, but Lemmy instances are typically single-machine deployments on modest 4-core to 32-core hardware. Dozens to thousands of queries per second to the heaviest API endpoints are sufficient to saturate them. There’s no need for distributed attack networks to be involved.
      • Uploading garbage images to fill storage.

      Essentially the low-hanging fruit is low enough that distributed attacks, amplification, and attacks on bandwidth or the networking stack itself are just unnecessary. A WAF is still a good if indeed OPs instance is getting attacked, but I’d be surprised if wafs has built-in rules for lemmy yet. I somewhat suspect one would have to do the DB query analysis to identify slow queries and then write custom waf rules to rate limit the corresponding API calls. But it’s worth noting that OP has provided no evidence of an attack. It’s at least equally likely that they dos’ed themselves by running too many services on a crappy VPS and running out of ram. The place to start is probably basic capacity analysis.

      Some recent sources:

    • Excel@lemmy.megumin.org
      link
      fedilink
      English
      arrow-up
      1
      ·
      10 months ago

      I’ve heard that enabling CloudFlare DDoS protection on Lemmy breaks federation due to the amount of ActivityPub traffic.

      • lungdart
        link
        fedilink
        English
        arrow-up
        1
        ·
        10 months ago

        You could always add them to the allow list so they don’t get blocked.