EDIT
TO EVERYONE ASKING TO OPEN AN ISSUE ON GITHUB, IT HAS BEEN OPEN SINCE JULY 6: https://github.com/LemmyNet/lemmy/issues/3504
Earlier issue, June 24: https://github.com/LemmyNet/lemmy/issues/3236
TO EVERYONE SAYING THAT THIS IS NOT A CONCERN: Laws differ from country to country (in other words, not everyone is American). And whether or not an admin is liable for such content residing on their servers without their knowledge, don't you think it's still an issue? Are you not bothered by the fact that somebody could be sharing illegal images from your server without you ever knowing? Is that okay with you? OR are you only saying this because you're NOT an admin? Several admins have already responded in the comments and suggested ways to solve the problem, because they are genuinely as concerned about it as I am. Thank you to all the hard-working admins. I appreciate and love you all.
ORIGINAL POST
You can upload images to a Lemmy instance without anyone knowing that the image is there if the admins are not regularly checking their pictrs database.
To do this, start creating a post on any Lemmy instance, upload an image, and never click the “Create” button. The post is never created, but the image is uploaded. Because the post doesn't exist, nobody knows the image is there.
You can also go to any post, upload a picture in the comment box, copy the URL, and never post the comment. You can also upload an image as your avatar or banner and just close the tab. The image will still reside on the server.
You can (possibly) do the same with community icons and banners.
Why does this matter?
Because anyone can upload illegal images without the admin knowing, and the admin can be held liable for them. With everything that has been going on lately, I wanted to remind all of you about this. Don't think that disabling the cache is enough. Bad actors can secretly stash illegal images on your Lemmy instance if you aren't checking!
These bad actors can then share the links around and you would never know! They can report it to the FBI, and if you haven't taken it down within a certain period (because you didn't know about it), say goodbye to your instance and see you in court.
Only backend admins with access to the database (or object storage, or wherever the images live) can check this. Non-backend admins and moderators WILL NOT BE ABLE TO MONITOR THESE, and regular users WILL NOT BE ABLE TO REPORT THESE.
Aren’t these images deleted if they aren’t used for the post/comment/banner/avatar/icon?
NOPE! The image actually stays uploaded! Lemmy doesn't check whether the images are used! Try it out yourself. Just make sure to copy the link first, either by copying the link text or by right-clicking the image and choosing “Copy image link”.
How come this hasn’t been addressed before?
I don't know. I'm fairly certain this has been brought up before and nobody paid attention, but I'm bringing it up again after all the shit that happened in the past week. I can't even find it on the GitHub issue tracker.
I’m an instance administrator, what the fuck do I do?
Check your pictrs images (good luck) or nuke it. Disable pictrs, restrict sign ups, or watch your database like a hawk. You can also delete your instance.
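If you do go the "check your pictrs images" route, the core idea is a set difference: every uploaded image that no post, comment, avatar, banner, or icon points to is an orphan worth reviewing. Below is a minimal sketch in Python under a few loud assumptions. It assumes you have already exported (1) a list of uploads with their pict-rs file names and upload times and (2) every text/URL field that might reference an image. How you get those lists depends on your Lemmy and pict-rs versions. It also assumes local image links contain `/pictrs/image/<alias>`. The names `Upload`, `referenced_aliases`, and `find_orphans` are hypothetical, not part of Lemmy or pict-rs.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Assumption: local image links look like https://<instance>/pictrs/image/<alias>
# (query strings such as ?thumbnail=96 are ignored because "?" ends the match).
PICTRS_LINK = re.compile(r"/pictrs/image/([A-Za-z0-9._-]+)")


@dataclass(frozen=True)
class Upload:
    alias: str              # file name pict-rs assigned to the upload (hypothetical export)
    uploaded_at: datetime   # timezone-aware upload time


def referenced_aliases(texts: Iterable[Optional[str]]) -> set[str]:
    """Collect every pictrs alias mentioned in post URLs/bodies, comments, avatars, banners, icons."""
    found: set[str] = set()
    for text in texts:
        found.update(PICTRS_LINK.findall(text or ""))  # None fields are treated as empty
    return found


def find_orphans(uploads: Iterable[Upload], texts: Iterable[Optional[str]],
                 now: datetime, grace: timedelta = timedelta(hours=24)) -> list[str]:
    """Return aliases that nothing references and that are older than the grace period."""
    used = referenced_aliases(texts)
    return sorted(
        u.alias for u in uploads
        if u.alias not in used and now - u.uploaded_at > grace
    )


if __name__ == "__main__":
    now = datetime(2023, 7, 10, tzinfo=timezone.utc)
    uploads = [
        Upload("a1.jpeg", now - timedelta(days=3)),   # embedded in a post
        Upload("b2.png", now - timedelta(days=3)),    # uploaded, never attached to anything
        Upload("c3.webp", now - timedelta(hours=1)),  # too new to judge (may be a draft)
    ]
    texts = ["![cat](https://example.social/pictrs/image/a1.jpeg)", None]
    print(find_orphans(uploads, texts, now))  # ['b2.png']
```

The grace period keeps images from posts or comments that someone is still writing from being flagged. Anything this reports should be reviewed by a human before you remove it through whatever deletion mechanism your setup provides.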
Good luck.
@p @ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 Yeah, it's using a local CLIP model, something I've suggested to both the gr*f and jakparty.soy admins. The problem is that it requires a lot of clock cycles, preferably on a GPU, so it isn't something people with $5 VPSes can afford. I'm not fully sure about its effectiveness either; malicious actors can keep scrambling the image so that it passes the filter yet is still recognizable to the human brain.
@mint @Nerd02 @bmygsbvur @ceo_of_monoeye_dating @db0
> it’s using local CLIP model,
How does this not end up getting used to produce computer-generated CP?
> isn’t something people with $5 VPSes can afford.
Yeah, but when you’re at the $5 VPS stage, you’re usually going to be hosting a couple dozen people at most.
> malicious actors can keep scrambling the image so that it passes the filter yet is still recognizable by human brain.
Yeah. Not foolproof.
@p @Nerd02 @bmygsbvur @db0 @mint
> How does this not end up getting used to produce computer-generated CP?
It was. That’s the problem they wrote this script to try to solve.
@ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint Yeah, presumably it is better at detecting stuff that it produces itself, but my understanding is that this kind of model is legally questionable to possess because of that.
@p @Nerd02 @bmygsbvur @db0 @mint They’ve had the model on github for months. If they were gonna get bonked, they’d’ve gotten bonked by now.
@ceo_of_monoeye_dating @p @Nerd02 @bmygsbvur @db0 @mint
It's not their model; it's an implementation of the OpenAI paper by some academics, hosted here https://github.com/pharmapsychotic/clip-interrogator/
To be specific they use one of the ViT-L/14 models.
These kinds of labeling models have been around for a long time. They used to be called text-from-image or some similarly verbose description.
If the current generative models can produce porn, then they can also produce CSAM; there's no need to go through another layer.
The issue with models trained on actual illegal material is that they could then be reverse-engineered to output the very same material they were trained on, in addition to very realistic generated images. It's similar to how LLMs can be used to extract potentially private information they were trained on.
@ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint @p
*some academics hosted here https://github.com/mlfoundations/open_clip
The above link was just the wrapper.
@laurel @ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint @p HI LAUREL
bearhug.gif
@ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint Yeah, but youtube-dl was on Github for years and then was suddenly declared an evil piracy tool, scrubbed, and banned. The odds that you get bonked are also higher than the odds that Github gets bonked; “I got it from Github” doesn't constitute much of a defense.
In any case, I don't have much invested in the legality of that model because I don't plan to acquire it. It's just that my understanding was that possessing a model trained on some source material, which can be used to produce material resembling that source material, is legally the same as possessing the source material. I'm not an expert on that, and I don't think there have even been any cases yet.
@p @Nerd02 @bmygsbvur @db0 @mint The problem with the models is the fact that training data can be reverse engineered from the model. If the model’s not trained on any CP, there’s not likely to be any problem.
@ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint Ah, okay, so this one wasn’t trained on that material?
@p @ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 Yes, but it should be able to combine two concepts even if there was no overlap between the two in its training data.
@mint @p @Nerd02 @bmygsbvur @db0 This is the type of response I was looking for - and why I’d asked pete. If the big problem’s clock cycles, then maybe there’s something that can be done - after all, the model’s way beefier than what’s needed to solve this particular problem, it does much more.