AI and legal experts told the FT this “memorization” ability could have serious ramifications on AI groups’ battle against dozens of copyright lawsuits around the world, as it undermines their core defense that LLMs “learn” from copyrighted works but do not store copies.
Sam Altman would like to remind you that each Old Lady at a Library consumes 284 cubic feet of Oxygen a day from the air.
Also, hey at least they made sure to probably destroy the physical copy they ripped into their hopelessly fragmented CorpoNapster fever dream, the law is the law.



Doesn’t this just mean they copied the original text, and still managed to get some of it wrong?
They don’t copy the book and store the words in a database or anything. LLMs don’t have a brain or storage.
They copy it, convert the pieces into numbers (vectors), and mathematically reconstruct it when you ask it a question.
Since it’s reconstructing it (with math), it hallucinates and gets it wrong…
I like this way of thinking about it, but I would scare-quote that “hallucinates.” It’s more like it’s been encrypted, and then decrypted with an imperfect algorithm. Or like a lossy compression and decompression.
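To make the lossy round-trip analogy concrete, here’s a toy sketch in Python. Everything in it (the text, the vocabulary, the “compression scheme”) is made up for illustration; it has nothing to do with real LLM internals, it just shows how a lossy compress-then-decompress step can hand back a confidently wrong reconstruction:

```python
# "Compress" a sentence by keeping only the first two letters of each word,
# then "decompress" by guessing the full word from a small vocabulary.
text = "the quick brown fox"
compressed = [w[:2] for w in text.split()]  # ['th', 'qu', 'br', 'fo']

# Invented vocabulary; the order determines which guess wins a tie.
vocab = ["the", "quiet", "broom", "fox", "quick", "brown"]

def decompress(prefix):
    # First vocabulary word matching the prefix wins; ambiguous prefixes
    # produce plausible-but-wrong words, i.e. "hallucinations".
    return next((w for w in vocab if w.startswith(prefix)), prefix)

restored = " ".join(decompress(p) for p in compressed)
print(restored)  # -> "the quiet broom fox"
```

The reconstruction comes back fluent and mostly right, with the errors baked in by what was thrown away during compression.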
We have a mathematical understanding of these things. It’s not a mysterious thing like the human brain still is for science. Personification of them is an unfortunate side effect of the fact that they’re designed to emulate human intelligence and use natural language in a sort of “conversation.” It does more to obfuscate their real nature than it does to explain them.
This, and lossy compression is exactly right.
Alternatively, it’s a decomposition of a big matrix (think very large Excel sheet) wherein each cell is the probability of observing some word next (really it’s tokens, of course, but for the sake of argument) given the words you’ve already observed. Like, you could literally make a transformer in Excel. It wouldn’t run, but that’s Excel’s fault, not the math’s.
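To make the big-matrix idea concrete, here’s a toy sketch in Python. The words and probabilities are invented for illustration; a real model’s matrix is learned from data and astronomically larger, but the “given this word, what’s likely next” lookup is the same shape of thing:

```python
# Toy "big matrix": each row is a context word, each column entry is the
# (invented) probability of seeing another word next.
probs = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 1.0},
    "sat": {"the": 1.0},  # "sat on the" collapsed for brevity
}

def most_likely_next(word):
    # Pick the highest-probability next word, like greedy decoding.
    row = probs.get(word, {})
    return max(row, key=row.get) if row else None

print(most_likely_next("the"))  # -> cat
```

Chain those lookups and you get text generation; nothing in the table is a stored copy of any sentence, yet the sentences it was built from tend to fall back out.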
Aside: but I’m pretty sure distributing a lossy compression and decompression of a work is still distribution, and charging for it is infringement on top of that. Realistically, if this is allowed, anyone should be able to legally pirate anything for any reason as long as it’s passed through a lossy compression and decompression first.
Yeah, there isn’t much of a difference, as far as how the data is transformed, between your pirating case and the case of an AI providing copyrighted material. It’s really only because they treat it like an artificial person that they’re able to convince people it should be allowed.
The kick in the teeth is, if I charged people to hear me recite a copyrighted novel that I’d memorized but don’t have explicit permission to use, I’d be sued. There’s really no way to argue this should be allowed that doesn’t immediately fall apart if you pull at it even a little.
I didn’t cheat on you, I just didn’t realize I was making love to an entirely different woman! They are different OK!!!
That’s an interesting question. Think of the Star Trek holodeck. If someone creates a perfect holodeck recreation of their own partner, and sleeps with that simulation, is that cheating on their partner? Let’s assume it’s not one of those fancy sentient holograms like the Doctor, just a regular mindless one.
What if they are the Doctor and have sex with a ghost?
That’s just a good old Blazin’ time
I prefer Blazin’ with Bev’
Bev is implied when one is already Blazin’
Eh I would say it’s masturbating to a “picture” of their partner. It’s just a sexy light show. As long as it’s not sentient it can never have feelings back so it’s just a sex toy. Ever hear of a clone-a-willy?
As with a picture, the important part is consent. Was the picture/3D model created with informed consent from the partner that it might reasonably be used for masturbation? If so, then not cheating. Otherwise it is.