After dabbling in the world of LLM poisoning, I realised that I simply do not have the skill set (or brain power) to effectively poison LLM web scrapers.

I am trying to work with what I know/understand. I have fail2ban installed on my static web server. Is it possible, then, to get a massive list of IP addresses known to scrape websites and add it to the ban list?
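
For example, if I found such a list as a plain text file of IPs, could I just loop it into fail2ban-client against an existing jail? Something like this (the jail name and file name here are made up, not a recommendation):

    # hypothetical: scraper-ips.txt holds one IP address per line,
    # and "webscrapers" is a jail that is already configured and running
    while read -r ip; do
        sudo fail2ban-client set webscrapers banip "$ip"
    done < scraper-ips.txt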

  • lungdart · 7 days ago

    Fail2ban is not a static security policy.

    It’s a dynamic firewall: it ties log entries to time-boxed firewall rules.
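
    As a rough example, a jail in jail.local ties a log file to a filter and a ban window. The jail name, filter name, and log path below are placeholders you would adapt to your own setup:

        # /etc/fail2ban/jail.local (sketch; names and paths are placeholders)
        [robots-trap]
        enabled  = true
        port     = http,https
        filter   = robots-trap
        logpath  = /var/log/nginx/access.log
        maxretry = 1
        findtime = 600
        bantime  = 3600    # ban for one hour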

    You could auto-ban any source that hits robots.txt on a web server for 1h, for instance, with a filter like the one sketched below. I’ve heard AI data scrapers actually use robots.txt to find content worth targeting rather than respecting the server’s requests.
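
    The matching filter would flag any request for robots.txt in the access log. This sketch assumes a standard nginx/apache combined log format; adjust the regex to whatever your server actually writes:

        # /etc/fail2ban/filter.d/robots-trap.conf (sketch)
        [Definition]
        failregex = ^<HOST> .+"(GET|HEAD) /robots\.txt
        ignoreregex =

    With maxretry = 1 in the jail above, a single hit on robots.txt is enough to trigger the hour-long ban.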