EDIT

TO EVERYONE ASKING TO OPEN AN ISSUE ON GITHUB, IT HAS BEEN OPEN SINCE JULY 6: https://github.com/LemmyNet/lemmy/issues/3504

June 24 - https://github.com/LemmyNet/lemmy/issues/3236

TO EVERYONE SAYING THAT THIS IS NOT A CONCERN: Everybody has different laws in their countries (in other words, not everyone is American). And whether or not an admin is liable for such content residing on their servers without their knowledge, don’t you think it’s still an issue? Are you not bothered by the fact that somebody could be sharing illegal images from your server without you ever knowing? Is that okay with you? OR are you only saying this because you’re NOT an admin? Different admins have already responded in the comments and have suggested ways to solve the problem because they are as genuinely concerned about it as I am. Thank you to all the hard-working admins. I appreciate and love you all.


ORIGINAL POST

You can upload images to a Lemmy instance without anyone knowing that the image is there if the admins are not regularly checking their pictrs database.

To do this, you create a post on any Lemmy instance, upload an image, and never click the “Create” button. The post is never created but the image is uploaded. Because the post isn’t created, nobody knows that the image is uploaded.

You can also go to any post, upload a picture in the comment box, copy the URL, and never post the comment. You can also upload an image as your avatar or banner and just close the tab. The image will still reside on the server.

You can (possibly) do the same with community icons and banners.

Why does this matter?

Because anyone can upload illegal images without the admin knowing, and the admin may be held liable for them. With everything that has been going on lately, I wanted to remind all of you about this. Don’t think that disabling cache is enough. Bad actors can secretly stash illegal images on your Lemmy instance if you aren’t checking!

These bad actors can then share these links around and you would never know! They can report it to the FBI and if you haven’t taken it down (because you did not know) for a certain period, say goodbye to your instance and see you in court.

Only your backend admins who have access to the database (or object storage or whatever) can check this, meaning non-backend admins and moderators WILL NOT BE ABLE TO MONITOR THESE, and regular users WILL NOT BE ABLE TO REPORT THESE.

Aren’t these images deleted if they aren’t used for the post/comment/banner/avatar/icon?

NOPE! The image actually stays uploaded! Lemmy doesn’t check if the images are used! Try it out yourself. Just make sure to copy the link, either by copying the link text or by clicking the image and choosing “copy image link”.

How come this hasn’t been addressed before?

I don’t know. I am fairly certain that this has been brought up before. Nobody paid attention but I’m bringing it up again after all the shit that happened in the past week. I can’t even find it on the GitHub issue tracker.

I’m an instance administrator, what the fuck do I do?

Check your pictrs images (good luck) or nuke them. Disable pictrs, restrict sign ups, or watch your database like a hawk. You can also delete your instance.
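
If you want to check rather than nuke, here is a minimal sketch in Python of the idea: compare the images your store holds against the image links that actually appear in posts, comments, avatars, and banners. Everything named here is an assumption for illustration. The URL shape `/pictrs/image/<alias>`, the helper names, and the idea that you can export a list of stored aliases and a dump of text fields from your own database are not taken from Lemmy's or pict-rs's code, so adapt it to whatever your instance really stores.

```python
import re

# Assumed URL shape: https://<instance>/pictrs/image/<alias>
PICTRS_URL = re.compile(r"/pictrs/image/([\w.-]+)")


def referenced_aliases(texts):
    """Collect every image alias mentioned in the given strings."""
    found = set()
    for text in texts:
        if text:  # skip None and empty fields
            # Drop a trailing "." picked up from the end of a sentence.
            found.update(a.rstrip(".") for a in PICTRS_URL.findall(text))
    return found


def find_orphans(stored_aliases, texts):
    """Return stored aliases that nothing in `texts` links to."""
    return sorted(set(stored_aliases) - referenced_aliases(texts))
```

Feed `find_orphans` the aliases from your image store and every post body, post URL, comment body, avatar, banner, and icon field you can export. Whatever comes back is uploaded but unused, and that is the pile to review or delete first.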

Good luck.

  • pistolero@freespeechextremist.com
    1 year ago

    @ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 The last time the topic came up, the only publicly available API for this was owned by the feds. I don’t know if this tool downloads a model (I also don’t know how such a model could be legal to possess) or if it consults an API (which would be a privacy concern). In either case, you’d have to be very careful about false positives.

    • @ryona.agency
      1 year ago

      @p @ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 Yeah, it’s using a local CLIP model, something I’ve suggested both to gr*f and the jakparty.soy admin. The problem is that it requires a lot of clock cycles, preferably on a GPU, so it isn’t something people with $5 VPSes can afford. I’m not fully sure about its effectiveness, either: malicious actors can keep scrambling the image so that it passes the filter yet is still recognizable by a human brain.

        • laurel@freespeechextremist.com
          1 year ago

          @p @Nerd02 @bmygsbvur @ceo_of_monoeye_dating @db0

          Compared to what the feds use, yeah, but it is a way to leverage legal training material to detect illegal material.
          Think of it like this: you have a model that detects pornographic content and another one that detects the age of the people depicted. You run the image through both, and if both results are over some threshold, you flag the image.

          In this case they use an off-the-shelf general model that outputs a text description, and they just use the raw keyword weights without the sentence-generating phase.

          • CMD@bae.st
            1 year ago

            @laurel @p @Nerd02 @bmygsbvur @db0 If nothing else, the fact that this model exists and is not getting rekt by fedbois is a sign that the problem *can* be solved. I’m bookmarking this package - the next time everyone starts bitching about CP spam, I’m going to throw it on the table.

            • pistolero@freespeechextremist.com
              1 year ago

              @ceo_of_monoeye_dating @laurel @Nerd02 @bmygsbvur @db0

              > If nothing else, the fact that this model exists and is not getting rekt by fedbois is a sign that

              This is not a sign of anything. “The cops didn’t seem to care yesterday” doesn’t indicate anything about today.

              > the next time everyone starts bitching about CP spam, I’m going to throw it on the table.

              “Why don’t you use a ridiculous amount of bandwidth downloading literally every image and then a ridiculous amount of computer juice processing all of it and then deal with the false positives?”

              I don’t even use the thumbnailer because it is too heavy. sjw regularly posts 12MB JPEGs. It’s so heavyweight that you could DoS it just by posting a lot of very large images, and you could defeat it pretty easily. Even something like hashing the images is too much for most instances.

              • CMD@bae.st
                1 year ago

                @p @laurel @Nerd02 @bmygsbvur @db0

                > “Why don’t you use a ridiculous amount of bandwidth downloading literally every image and then a ridiculous amount of computer juice processing all of it and then deal with the false positives?”

                Right, this is actually the key problem - the model is pretty beefy, and doing this for every instance that ain’t your own is a sure way to get completely wrecked.

                Regardless, this is better than what we believed before - the tools not only can be built, but they exist and are apparently being used (albeit on a smaller scale - the tool posted above *only* checks images on your own instance, and even then only those that are orphaned.)

        • CMD@bae.st
          1 year ago

          @p @laurel @Nerd02 @bmygsbvur @db0 There’s no way to make something like this reliable. The only people holding onto a dataset like this are cops and pedos.

          Cops don’t release models like this because of Dwork’s result, and pedos aren’t exactly invested in stopping other pedos from fapping to CP.

    • CMD@bae.st
      1 year ago

      @p @Nerd02 @bmygsbvur @db0 It’s the code in the horde-safety package, which I’ve linked here: https://github.com/Haidra-Org/horde-safety/blob/main/horde_safety/csam_checker.py

      At first glance, it looks like it takes an image, runs it through a model to return keywords that would’ve been used to generate such an image, then checks them against a pair of lists containing “underage” words and “pornographic” words. In a deep sense, it detects if an image “has children” and “is porn” without ever having trained on a combination of the two.

      The model’s beefier than what’s needed to solve this problem minimally, but it does appear to solve the problem.
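
The two-list check described in this thread can be sketched roughly as follows. This is not the horde-safety code, only an illustration of the logic as the comments describe it: some model (not shown, and assumed here) returns a score per keyword, and an image is flagged only when both term lists have a keyword at or above a threshold. The function name, the `0.5` default, and the placeholder terms are all hypothetical.

```python
def flag_image(keyword_scores, list_a, list_b, threshold=0.5):
    """Flag only when BOTH lists contain a keyword scoring >= threshold.

    keyword_scores: dict mapping keyword -> score from some model (assumed).
    list_a, list_b: the two term lists; the real contents are not shown here.
    """
    def best(terms):
        # Highest score among the list's terms; 0.0 if none are present.
        return max((keyword_scores.get(t, 0.0) for t in terms), default=0.0)

    return best(list_a) >= threshold and best(list_b) >= threshold
```

Requiring both signals is what keeps either list from firing on its own, but as noted above, false positives and deliberately scrambled images are still a concern, so a flag from something like this would be a prompt for human review and not a verdict.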