It's absolutely sustainable. Just cache it. Done.
The bots scrape costly endpoints, like the entire edit history of every page on a wiki. You can't always just cache every possible generated page at once.
Of course you can. This is why people use CDNs.
Put the entire site on a CDN with a cache of 24 hours for unauthenticated users.
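At the origin, that mostly comes down to a Cache-Control header keyed off whether the visitor is logged in. A minimal sketch, assuming a Flask app (the framework and the session cookie name are stand-ins for whatever the wiki actually uses):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/wiki/<path:page>")
def wiki_page(page):
    # Placeholder for an expensive render, e.g. a page's edit history.
    return f"rendered contents of {page}"

@app.after_request
def set_cache_headers(resp):
    # "session" is an assumed cookie name; adjust to the real auth scheme.
    if "session" in request.cookies:
        # Logged-in users may see personalized pages: don't cache shared copies.
        resp.headers["Cache-Control"] = "private, no-store"
    else:
        # Anonymous traffic: let the CDN hold the page for 24 hours.
        resp.headers["Cache-Control"] = "public, max-age=86400"
    return resp
```

The CDN does the rest: anything marked `public, max-age=86400` can be served from the edge without touching the origin again for a day.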
Cache size is limited: it typically only holds the most recently viewed pages. But these bots go through every single page on the website, even old ones that users never view. Since they send only one request per page, caching doesn't really help.
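A toy simulation of that failure mode (the cache size and page count here are made-up numbers, purely for illustration): an LRU cache sized for the popular pages gets essentially zero hits from a scraper that visits every page exactly once.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return True                     # cache hit
        self.store[key] = None              # simulate caching the rendered page
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the oldest entry
        return False                        # cache miss

cache = LRUCache(capacity=1_000)
# A scraper walking 100,000 distinct pages once each: every lookup misses.
hits = sum(cache.get(f"/wiki/page-{i}") for i in range(100_000))
print(f"hit rate: {hits / 100_000:.1%}")  # 0.0%
```

Each page is cached after its one and only request, then evicted long before anyone asks for it again, so the cache absorbs none of the load.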
Cache size is definitely not an issue, especially for these companies using Cloudflare.
It is an issue for the open source projects discussed in the article.
I’m sure that if it were that simple, people would be doing it already…