@ceo_of_monoeye_dating @p @Nerd02 @bmygsbvur @db0 @mint
It’s not their model; it’s an implementation of the OpenAI paper by some academics, hosted here: https://github.com/pharmapsychotic/clip-interrogator/
To be specific, they use one of the ViT-L/14 models.
This type of labeling model has been around for a long time. They used to be called text-from-image models, or something similarly verbose.
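Roughly, a CLIP-style labeler embeds the image and a set of candidate captions into a shared vector space, then picks the caption whose embedding is closest to the image’s. A toy sketch of that idea (the vectors below are made up for illustration; a real system would use actual ViT-L/14 embeddings):

```python
# Toy sketch of CLIP-style zero-shot labeling: embed image and captions
# into the same space, return the caption with highest cosine similarity.
# Embeddings here are fabricated placeholders, not real model outputs.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def best_label(image_emb, candidates):
    # candidates: {caption: text_embedding}
    return max(candidates, key=lambda c: cosine(image_emb, candidates[c]))

# Stand-in vectors for what a real image/text encoder would produce.
image_emb = [0.9, 0.1, 0.2]
candidates = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
}
print(best_label(image_emb, candidates))  # "a photo of a dog"
```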
If the current generative models can produce porn, then they can also produce CSAM; there’s no need to go through another layer.
The issue with models trained on actual illegal material is that they could then be reverse-engineered to output the very material they were trained on, in addition to very realistic generated images. It’s similar to how LLMs can be made to regurgitate potentially private information from their training data.
@ceo_of_monoeye_dating @Nerd02 @bmygsbvur @db0 @mint @p
*some academics hosted here https://github.com/mlfoundations/open_clip
The above link was just the wrapper.