One issue that I don’t think Lemmy has tackled collectively is the licensing of the user data. Lemmy is open source, and that’s one crucial part of the enshittification resistance equation. The other part is doing the equivalent for the user data. If the user data is licensed under the right version of the CC license, it can always be copied to another instance if an instance becomes enshittified. As far as I know, nothing currently specifies who owns the user data, so by default every author holds the copyright to their own content. While this means the instance owner can’t sell it without permission from every user, it also isn’t conducive to moving data in bulk across instances. Individual account migration would improve this significantly, but I believe we should also switch to licensing user data under some CC license.
If all of this sounds strange, think of Wikipedia. Its CC licensing is what guarantees that data contributed to Wikipedia stays in our hands, irrespective of what the Wikimedia Foundation does.
This is a great point. The user data needs to be enshrined in such a way that it can be easily moved in a bulk migration without requiring a direct opt-in from every user. At the same time, it should be clear how it’s being used/kept/sold/not sold/etc.
I’m not against LLMs using the data generated on sites like this to inform useful answers when I ask ChatGPT a question. It genuinely makes AI a better tool, but I feel like the contributors of such content should know how their answers are being used.
LLMs are likely going to scrape no matter the license. I doubt OpenAI got a copyright license from Reddit to ingest its content. In fact, I’m not even sure they need one if ingestion can be made similar enough to “reading the website”. So making content CC probably won’t affect LLM use of public posts.