Reddit is telling its future investors, via recent news and further details in its IPO filing, that it is currently selling, and looking to sell, its users’ data to companies wanting to train their LLMs, including Google.

This is a direct violation of the GDPR with respect to any EU-based users.

Legal Basis?

Under Art. 6 GDPR, Reddit may really only rely on p1 (f) (“p” meaning paragraph): processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.

All other options are impossible, as they have neither consent nor contracts with their user base that would allow for this. Art. 6 p1 (f) is a touchy subject and clearly requires extra provisions being made where the data subject is a child. Reddit has tons of children (meaning anyone under 18) using their site daily. See for example: https://www.reddit.com/r/teenagers/

What’s being processed

Due to the nature of Reddit, they are also processing huge amounts of data of the special categories under Article 9, such as data on sexual orientation, health information, ethnic info, union membership, etc. (basically everything in Article 9 can easily be found on Reddit).

(note users have not given explicit consent according to the requirements of consent under Art. 7 and 8, which would allow for such processing under Art. 9 p2 (a))

These are obviously just a tiny selection of the hundreds of subreddits that are concerned with these types of data, not to mention the unencrypted “private” messages, chats, etc.

My lord, is this legal?

Article 9 p2 (e) states “processing relates to personal data which are manifestly made public by the data subject;”; so they’re out of the woods, right? After all, users posted this stuff and it was made public! Sadly, this doesn’t work with processing data of children, especially with 9 p4 allowing member states to introduce additional limitations and conditions for further processing.

The real kicker comes with Article 5 p1 (b) though for 9 p2 (e). 5 p1 (b) requires the personal data be: “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);”

Yeeeah… People have posted their stuff publicly, BUT with a clear understanding that the processing of the data ends there. Reddit may process the data insofar as they serve it to the public. That’s it. Turning around and selling the data now is a crystal-clear violation of Article 5 p1 (b). There’s no two ways about it. And as per Article 5 p2, Reddit needs to be able to prove they are in compliance with 5 p1.

They’re also processing data under Art. 10 relating to criminal convictions and offences

Processing of such data shall be carried out only under the control of official authority or where processing is authorized by Union or Member State law. Rugh-Roh. I’ll admit that this one might be reaching a bit, as it could easily be read to apply only to “official” records rather than criminal talk overall, but fuck it. Throw it on the pile.

Your rights and how they’re being violated (not in a kinky fun way)

Now let’s look at the Rights of the data subject! Those are always fun :)

Art. 12 p1 “The controller shall take appropriate measures to provide any information referred to in Articles 13 and 14 and any communication under Articles 15 to 22 and 34 relating to processing to the data subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child.”

This is my favorite. Articles 13 and 14 are the provisions on informing the data subject, depending on whether their data was obtained directly from them or not. For Reddit, Article 13 mostly applies, but since people also talk about people they know, Article 14 applies as well.

Let’s check in there real quick. We’ll keep it to Article 13 for brevity. Reddit needs to:

  • give info on the contact details of the controller and the controller’s representative (in the case of selling data to Google for LLM training, that’s info for Google, not Reddit; anyone got that info via DM maybe? No? Oh shit)
  • contact details of their data protection officer (both Reddit’s and Google’s; was anyone informed of that for the LLM stuff? No?)
  • purposes of processing including the legal basis. Love this one. Anyone know that for the sale of their data to Google to train their LLM? No? Shucks.
  • Since they’re likely hinging on Art. 6 p1 (f) they need to tell you what the legitimate interest is - in our case MAD MONEY, not sure if that’ll hold up.
  • recipients or categories of recipients -> so “big evil data churning, election influencing, minority silencing, union busting, mega corp” Sweet.
  • Transfer to third countries -> likely doesn’t apply, as reddit servers are in the US (I believe, no idea if that’s true) if not, weeeeeeell…
  • right to lodge a complaint with supervisory authority (anyone got that notice?)
  • whether the data is provided due to a contract or statutory -> doesn’t apply, users give their data “freely”
  • existence of automated decision-making, INCLUDING PROFILING (as per Art. 22 -> none of the exceptions in that article apply to the current situation) - I’m sure no LLM will ever be used by the largest ad company on the planet to help profile users, noooooooo, that’s craaazy!

Article 13 p3 clearly states in relation to Article 5 p1 (b) that the data subject must be informed about the data that is being collected for further processing BEFORE such processing occurs, including all the info I just listed above from Article 13 p2.

Article 13 p4 states “Paragraphs 1, 2 and 3 shall not apply where and insofar as the data subject already has the information.” Yeeaaah… No Reddit user knew about their data being actively sold for LLM training. (Sure, LLMs might have scraped it before, but that’s an entirely different can of worms - one Reddit has a hand in, too, since under the GDPR they’re supposed to establish safeguards against such things. But Reddit directly selling now, without any upfront info… tut, tut…)

Another element of Article 12 p1 is this bit: “in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child.” I’m sure they’ll find a cool and hip way of explaining to all the teens on Reddit what an LLM is and how it’s using their data.

Send reddit a little e-mail

For added fun, I urge anyone who still has a Reddit account and is an EU citizen to contact Reddit and make use of their rights under the GDPR to be specifically excluded from any use for LLM training, etc. This is your RIGHT under Article 12 p2, specifically the Article 21 right to object. You can contact them via “[email protected]

Let’s be really petty and assume that Reddit and Google are shit at what they do, so they likely haven’t even entered into the Data Processing Agreement required under Article 28 p3 - and if they have, I’d love for my supervisory authority to take a look at that one.

Delving into the Arcane

Let’s kick it up a tiny notch and go into the more arcane bits of the GDPR with Article 35, Data Protection Impact Assessment. I’m sure you’ll love p1:

“Where a type of processing in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data. A single assessment may address a set of similar processing operations that present similar high risks.”

New technologies you say? Likely to result in a high risk, you say? Remember how most chatbots and AIs turn super racist, super quick? Or AIs being easily triggered into revealing their training data 1:1? Oh I’m sure there’s nooooo such risk with LLMs run by evil mega corp known for exploiting the shit out of exactly this kind of info for well over a decade now.
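To make the re-identification risk concrete: even if usernames were stripped before training, verbatim post text is effectively a unique key back into the public site. A toy sketch (all posts and usernames below are invented for illustration):

```python
# Toy sketch (hypothetical posts and usernames): even if usernames are
# stripped before a dataset is handed over, the verbatim post text still
# acts as a unique key back into the public site, so an exact-text
# search re-identifies the author.
public_site = {
    "my union rep told me to document everything at work": "u/example_worker",
    "i was diagnosed last spring and still have not told my family": "u/example_throwaway",
}

# "Anonymized" training set: usernames removed, text kept verbatim.
training_set = list(public_site.keys())

def reidentify(text):
    """Emulates an exact-phrase `site:reddit.com "..."` search."""
    return public_site.get(text)

# Every "anonymized" post maps straight back to its author.
recovered = [reidentify(post) for post in training_set]
```

Anything an LLM regurgitates verbatim is one exact-phrase search away from its author, which is exactly the kind of risk a DPIA is supposed to flag.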

But we needn’t even argue that point. Article 35 p3 (b) clearly states:
“processing on a large scale of special categories of data referred to in Article 9(1), or of personal data relating to criminal convictions and offences referred to in Article 10;”

Oh nooo. Remember the special categories from the start? LLM training on Reddit data sets is 100%, definitely, large-scale processing of all of that.

Any honest assessment would clearly indicate high risk, and thus Article 36 would apply: Reddit has to consult with EU supervisory authorities BEFORE starting this. Knowing Reddit, if they ever even did such an assessment, it came out as “low risk, nothing to see”.

Then there’s Article 32 of the GDPR: Security of processing. p1 (b): “the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services;” yeeeaaah, good luck with that on an LLM there, buddies.

Cool, what now?

Here’s what you do to exercise your rights and defend your data against the highway robbery and continuous violation by US Tech-Bros:

  • Find your supervisory authority (just use google, for added irony) by searching for “Data Protection supervisory authority [the state you live in]”.
  • Find their contact info; usually they have a ready-made complaint form
  • Give them the company info applicable to your state - I’ve gone ahead and fished those addresses out for you (see the end of this post)
  • Tell them about the upcoming reddit Google partnership: https://apnews.com/article/google-reddit-ai-partnership-a7f131c7cb4225307134ef21d3c6a708
  • Tell them you’re an EU citizen and a Reddit user (or a former user whose data they still hold)
  • Tell them you believe them to be in violation of Articles: 5, 6, 9, 10, 12, 13, 14, 32, 35, 36, and possibly more.
  • Link to this thread, if you like

US

Reddit, Inc.
548 Market St. #16093
San Francisco, California 94104

EU

Reddit Netherlands B.V.
Euro Business Center
Keizersgracht 62, 1015CS Amsterdam
Netherlands
[email protected]

UK

Reddit UK Limited,
5 New Street Square,
London, United Kingdom,
EC4A 3TW
[email protected]

Good luck!

  • AlteredStateBlob@kbin.social (OP) · 10 months ago

    Every post is tied to a username and email address, making it personal information, since each poster can be identified. I’m sure they’re also tracking further metrics such as IP addresses, browser fingerprints, etc. It is immaterial if we from the outside are able to identify users, it only matters if it’s possible given the data available to the processor. In this case, it is. Not to mention, there is a good chance texts and posts themselves contain plenty of personal information, such as linking to other user profiles, mentioning and discussing people, etc.

    • FaceDeer@kbin.social · 10 months ago

      If they were GDPR-compliant before, I don’t see how they’ve changed to not be GDPR-compliant now. They allow people to delete their accounts and their posts if they wish, which removes all identifying information from their system.

      Frankly, this looks like just a “I just hate Reddit! There’s gotta be something I can hit them with!” flailing attempt to me.

      • roadkill@kbin.social · 10 months ago

        They ‘allow’ people to delete their posts and accounts…

        But never actually delete anything from their databases. I’ve had years-old comments I deleted mysteriously reappear despite being gone for months.

        • FaceDeer@kbin.social · 10 months ago (edited)

          So contact them about that, then. Sue them if you’re sufficiently offended. This doesn’t change anything. If they were GDPR-compliant before they’re still GDPR-compliant, if they weren’t GDPR-compliant then they still aren’t. My point is that this AI training stuff has nothing to do with that.

          • webghost0101@sopuli.xyz · 10 months ago

            Did you read the part about a clear, informed use case for any further processing? I asked the people I know who still go there; none of them were even aware anything was going on.

    • HeartyBeast@kbin.social · 10 months ago

      True. However, I assume that Reddit is supplying Google with just the text. So yes, Reddit is collecting lots of PII, but that’s not what is going to Google, so Google can’t deduce it - unless you dox yourself in the text.

      Not trying to be deliberately argumentative, just thinking this through; much as I dislike Reddit, the case feels weak.

      • AlteredStateBlob@kbin.social (OP) · 10 months ago

        It doesn’t matter. As long as the text is supplied as-is, a simple Google search for the text plus site:reddit.com will reveal the author, keeping it identifiable. True anonymization under the GDPR almost doesn’t exist here, as it would destroy the dataset and make it unusable.

        • QuaternionsRock@lemmy.world · 10 months ago

          I deleted my first Reddit account a few years ago. When the whole API fiasco happened and I moved here, I realized that Redacted didn’t finish the job. I tried to get them to remove the rest of my stuff through a GDPR request, but they wouldn’t do shit, and they seemed to think that was acceptable under GDPR. When you delete your account, they (claim to) delete your associated email address, so they also “couldn’t” verify that it was mine.

          FWIW, HackerNews has the same policy.

        • HeartyBeast@kbin.social · 10 months ago (edited)

          It will reveal the username, not the identity of the author. If I tell you my Reddit username, what do you know about me?

          • AlteredStateBlob@kbin.social (OP) · 10 months ago

            It doesn’t matter what it tells me. Personal data is clearly defined under GDPR as data that can be used to identify a person. It is irrelevant if you or I can do it with publicly available data, reddit has the data and that is enough to qualify it as such.

            A DPA might absolutely disagree with my reading of the situation, but I would be surprised if a DPA considered usernames non-personally-identifiable information, and I know of no such ruling.

            • HeartyBeast@kbin.social · 10 months ago

              My view is that Reddit has personally identifiable data, but the data being licensed to Google isn’t personally identifiable, because the username by itself is insufficient to identify a person without the additional data that Reddit isn’t passing over.

              But I agree I may well be surprised by a DPA decision.

    • oce 🐆@jlai.lu · 10 months ago

      Isn’t it enough to remove any connection to any personal identifier before sending it? LLM training doesn’t care about your email, it cares about a certain quality of question/answer pairs, and reddit has a lot of those.

      • AlteredStateBlob@kbin.social (OP) · 10 months ago

        It is not enough, no. The LLM might reveal training data verbatim, and the original text is then a simple Google search with site:reddit.com away from identifying the user. It’s trivial, and thus not anonymized.