• tojikomori@kbin.social · 7 points · 2 years ago

      This reply’s interesting:

      How can data licensed under the CC-BY-SA license (that SO content is licensed under) be “misused”? The license explicitly allows others to do essentially anything they want with the data as long as attribution is given, in particular profit off of it.

      When SO content is absorbed as parametric knowledge, I’d expect the outcome to fail both the “BY” and the “SA” clauses, since models can’t provide attribution for it and their output won’t share the license. That’s true even if the output is considered public domain: CC-BY-SA content can’t be relicensed under a public-domain-equivalent license. It seems practically indistinguishable from using any other in-copyright content as training material.

      None of that’s to say SO is right to stop data dumps. It feels like they’re trying to find a technical solution to a legal problem, perhaps even one that rises to criminality on the part of OpenAI and others?

  • lightrush · 12 points · 2 years ago

    Please let this not be a sign that the enshittification of StackExchange has begun.

    • Garrathian@beehaw.org · 3 points · 2 years ago

      Well, I know mods at Stack Overflow were wanting to mutiny because the owners wanted to start incorporating AI responses to questions posted there, or something like that.

      • AbelianGrape@beehaw.org · 6 points · 2 years ago

        They are not allowing moderators to remove replies based solely on the fact that they were written by AI, regardless of how much evidence there is to that fact.

        I only ever interact with Stack Overflow to read like 10-year-old responses to random problems I run into, and even I want the moderators to mutiny over that. It’s arguably more serious than what Reddit is doing, because in many ways SE is the unsung backbone of technology at the moment. AI responses to technical questions are almost always wrong in some important way (even if the main idea is correct), and no moderator or group of moderators can be expected to have sufficiently broad knowledge to always know that an answer is wrong.

        • Garrathian@beehaw.org · 3 points · 2 years ago

          Yeah, that’s what I read; thanks for expounding. It’s definitely not good, as somebody who also uses Stack Overflow pretty regularly.

        • lightrush · 3 points · 2 years ago

          This is madness. Writing responses to queries on SO using AI trained on that same data, or often data of much lower quality, is like… a snake eating its tail while shitting diarrhoea. It would just decrease the signal-to-noise ratio on SO. Wait, did I just describe enshittification… fml