How do you propose such an architecture works?
probably start with no public API, and gate requests with something like a csrf token.
then borrow the tricks companies use to combat adblock and make the page source spaghetti enough that something like beautifulsoup can't parse it reliably.
at that point you would have to track fluctuations in user actions, or rather the lack of them. if a user takes 0.5 seconds to click an action after a page load, 20 times in an hour, flag the account for further surveillance that watches over n days for higher-impact signals like posting or activity times. correlate with existing users in the same profile range, and if they match known bot activity, ban the account.
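a minimal sketch of that first timing filter, assuming per-account click latencies are already being logged (the 0.5s / 20-per-hour thresholds and the field names are just the made-up numbers from above, not measured ones):

```python
from dataclasses import dataclass

# hypothetical thresholds -- real values would come from measured user behaviour
FAST_CLICK_SECONDS = 0.5
FAST_CLICKS_PER_HOUR = 20

@dataclass
class ClickEvent:
    account_id: str
    seconds_after_page_load: float

def accounts_to_watch(events_last_hour: list[ClickEvent]) -> set[str]:
    """Flag accounts whose clicks land suspiciously soon after page load, too often."""
    fast_counts: dict[str, int] = {}
    for event in events_last_hour:
        if event.seconds_after_page_load <= FAST_CLICK_SECONDS:
            fast_counts[event.account_id] = fast_counts.get(event.account_id, 0) + 1
    return {acct for acct, n in fast_counts.items() if n >= FAST_CLICKS_PER_HOUR}
```

accounts returned here wouldn't be banned outright, just moved into the longer n-day watch described above.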
on the flip side, bots may try to randomize their interactions to defeat this, so another filter could look at long-term patterns in repetitive actions. the comments themselves may not be useful signals, but the way they're entered might be.
how long does it take them to type the comment into the input form? how many words are they using per comment? does the thing they're responding to predict the length of the response they give? (a rough feature sketch follows the examples below.)
for example, if the post was "what's your favorite cheese?" someone might respond with "Gouda" or "I love Swiss on toasted rye. it reminds me of a lake retreat I had where …", but it would almost certainly be under 1000 characters.
as opposed to a post asking for a nuanced political opinion, which invites longer, more considered responses than just "you suck!"
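one way those questions could turn into features, purely as a sketch (the field names, prompt categories, and the 1000-character cap are assumptions pulled from the cheese example, not anything a real site exposes):

```python
from dataclasses import dataclass

@dataclass
class CommentEvent:
    typing_seconds: float   # time between focusing the input and submitting
    text: str
    prompt_kind: str        # e.g. "short_answer" or "discussion", however the site tags posts

# rough expectations per prompt type; invented numbers, not measured ones
EXPECTED_MAX_CHARS = {"short_answer": 1000, "discussion": 10_000}

def comment_features(event: CommentEvent) -> dict:
    """Turn one comment into the kind of signals a long-term filter would aggregate."""
    chars = len(event.text)
    words = len(event.text.split())
    return {
        "typing_seconds": event.typing_seconds,
        "words": words,
        # near-instant paste of a long comment is more suspicious than slow typing
        "chars_per_second": chars / max(event.typing_seconds, 0.001),
        "over_expected_length": chars > EXPECTED_MAX_CHARS.get(event.prompt_kind, 10_000),
    }
```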
further interactions like upvotes/downvotes can also trigger surveillance. I know some users will downvote-bomb another user for whatever reason; that alone could be enough to flag them as a malicious entity interfering with the system.
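a downvote-bomb check could be as simple as looking at how concentrated one account's recent downvotes are on a single target (the window size and threshold here are guesses, not tuned values):

```python
from collections import Counter

def is_downvote_bombing(recent_downvote_targets: list[str],
                        min_votes: int = 15,
                        concentration: float = 0.8) -> bool:
    """True if most of an account's recent downvotes hit one author."""
    if len(recent_downvote_targets) < min_votes:
        return False
    _, top_count = Counter(recent_downvote_targets).most_common(1)[0]
    return top_count / len(recent_downvote_targets) >= concentration
```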
So… Having no public API means people just develop libraries to interact with your private API.
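For example, whatever JSON endpoints the official web client hits can just be replayed directly; the URL and fields below are hypothetical, only there to show how thin that wrapper ends up being:

```python
import requests

session = requests.Session()
# reuse whatever cookies/headers the real web client sends after login
session.headers["User-Agent"] = "Mozilla/5.0 (ordinary browser string)"

def upvote(post_id: str) -> None:
    # hypothetical private endpoint, discovered by watching the browser's network tab
    resp = session.post("https://example.com/api/internal/vote",
                        json={"post_id": post_id, "direction": 1})
    resp.raise_for_status()
```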
Furthermore, Beautiful Soup can work on any page… it’s just a matter of how easily.
CSRF protection doesn’t do what I think you think it does. It only works against a cooperating client (i.e. it exists to protect a user in their own web browser). A bot would just scrape the token and move on.
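To illustrate: a scraper pulls the token out of the form like any other field, assuming the conventional hidden-input layout (the field names here are placeholders):

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()

def post_comment(thread_url: str, text: str) -> None:
    # fetch the page, lift the CSRF token out of the hidden form field, submit with it
    page = session.get(thread_url)
    soup = BeautifulSoup(page.text, "html.parser")
    token_input = soup.find("input", {"name": "csrf_token"})  # placeholder field name
    token = token_input["value"]
    session.post(thread_url, data={"csrf_token": token, "comment": text})
```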
Fluctuations in user actions can also be simulated (a bot can delay its work so its timing looks like what a normal user might do/say/post) … and rate limiting can be overcome by just using more accounts, stolen IP addresses, etc.
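Simulating those fluctuations isn’t much code either; a sketch with made-up timing numbers:

```python
import random
import time

def human_like_pause(mean_seconds: float = 4.0) -> None:
    """Sleep for a lognormal-ish delay so actions don't land at machine-regular intervals."""
    time.sleep(random.lognormvariate(0, 0.5) * mean_seconds)

def run_bot(actions) -> None:
    for act in actions:
        human_like_pause()
        act()  # e.g. post, vote, browse -- whatever the bot is scripted to do
```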
You can do a lot, but it’s always going to be a bit of a war. Things you’re suggesting definitely help (a lot of them echo strategies used by RuneScape to prevent/reduce bots), but … I think saying it’s an architecture problem is a bit disingenuous; some of those suggestions also hurt users.