• purrtastic@lemmy.nz · ↑48 · 3 months ago

    It’s not fine. They are not archiving the internet.

    I had to ban their user agent after very aggressive scraping that would have taken down our servers. Fuck this shitty behaviour.
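
    For anyone who needs to do the same, here is a minimal sketch of a user-agent ban written as Python WSGI middleware. It assumes the crawler identifies itself with a token such as Bytespider; the exact string banned here isn't stated, so BLOCKED_UA_TOKENS is a hypothetical denylist, and block_user_agents is an illustrative helper, not part of any library.

```python
from typing import Callable, Iterable, Optional

# HYPOTHETICAL denylist: lowercase substrings to look for in the User-Agent.
# "bytespider" is an assumption; substitute whatever string you actually see.
BLOCKED_UA_TOKENS = ("bytespider",)


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header contains any denylisted token."""
    if not user_agent:
        return False  # missing header: leave it to other defences
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app so denylisted User-Agents get 403 Forbidden."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked_user_agent(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper
```

    This only helps while the crawler keeps an identifiable User-Agent; a client that randomizes its User-Agent walks straight past it.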

      • Mojave@lemmy.world · ↑14 · 3 months ago

        They obfuscate their traffic by randomizing user agents, so it's either add a global rate limit or let them ass fuck you.
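
        A global rate limit is commonly built as a token bucket. This is one possible sketch, not anything from the article: a single shared bucket caps the total request rate no matter what User-Agent a request claims. TokenBucket, GLOBAL_LIMIT and the rate/capacity numbers are hypothetical; the clock parameter only exists to make the class testable.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests and a sustained `rate`
    requests per second. One shared instance acts as one global limit."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last = clock()
        self.lock = threading.Lock()    # requests may arrive on many threads

    def allow(self) -> bool:
        """Spend one token if available; False means reject (e.g. HTTP 429)."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


# HYPOTHETICAL numbers: 50 requests/s sustained, bursts of up to 100.
GLOBAL_LIMIT = TokenBucket(rate=50, capacity=100)
```

        Call allow() once per request and answer 429 Too Many Requests when it returns False. Keeping a dict of buckets keyed on client IP gives a per-client variant, which is gentler on normal users but runs into the same IP-rotation problem as blocking.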

        • WhyJiffie@sh.itjust.works · ↑1 · 3 months ago

          The article said all source IPs can be traced back to ByteDance. Wouldn't it be possible to block them? Maybe even block all IPs of a specific ASN?

          • tempest · ↑2 · 3 months ago

            They can be traced back one by one, but if you have any amount of traffic, it's a constant game of cat and mouse.

            You can block entire ASNs until they start using residential proxies provided by less ethical companies. Then you end up blocking all of France, or destroying the user experience by forcing a CAPTCHA on everyone.
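
            In practice, blocking an ASN means blocking the IP prefixes it announces. A minimal sketch with Python's standard ipaddress module: the prefixes are RFC 5737 / RFC 3849 documentation ranges standing in for the real announced prefixes (which you would have to look up separately), and is_blocked_ip is a hypothetical helper name.

```python
import ipaddress
from typing import Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# HYPOTHETICAL prefix list: documentation ranges standing in for the
# prefixes the offending ASN actually announces.
BLOCKED_PREFIXES: Sequence[Network] = (
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
)


def is_blocked_ip(addr: str,
                  prefixes: Sequence[Network] = BLOCKED_PREFIXES) -> bool:
    """True if addr falls inside any blocked prefix.

    An IPv4 address is never 'in' an IPv6 network (and vice versa),
    so mixed lists are safe to check this way."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in prefixes)
```

            This is exactly the check that stops helping once the traffic moves to residential proxies, because those addresses sit in ordinary ISP prefixes that you can't block without hitting real users.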

        • Melvin_Ferd@lemmy.world · ↑1 · 3 months ago

          Why do they need to hit a website like that? Wouldn't it just need to scrape the data and frig off? What's the point of creating that much traffic?