Suing Writers Seethe at OpenAI's Excuses in Court

@floofloof · 8 months ago

@[email protected] · edit-2 8 months ago

Let me replicate the point of contention on this topic:

IANAL so correct me on this - copyright currently protects the expression of works, with the exception of fair use. So let’s ignore fair use for now and focus instead on “expression is copyrighted, idea is fair game”.

Let’s look at how AIs work. AIs that generate text are usually called LLMs (large language models). In training, they are shown some text and then predict either the response or the next word. They then get to look at the right answer, and they learn how to improve in that specific scenario. The way they learn, in its simplest form, is by looking at the previous text, doing some math on it with specific weights, and then adjusting those weights. We’re talking arbitrary math and arbitrary decimals for the most part. So imagine that on your hard drive the AI model looks like some metadata, a blueprint of where the weights are and how they interact, plus the numbers attached to the weights (this is the trained bit).
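To make the "predict, check the right answer, adjust the weights" loop concrete, here is a toy sketch in Python. It is not how a real LLM is built: real models are neural networks with billions of weights, while this is a tiny next-word table trained on one made-up sentence. The function names, the example sentence and the settings (`epochs`, `lr`) are all my own assumptions for illustration.

```python
import math

def train_next_word_model(text, epochs=200, lr=0.5):
    """Toy next-word predictor: one weight per (previous word, next word) pair."""
    words = text.split()
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # The "trained bit": a grid of plain decimals, all starting at zero.
    weights = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]
    for _ in range(epochs):
        for prev, target in pairs:
            scores = weights[prev]
            # Turn the scores into probabilities for the next word (softmax).
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Compare with the right answer and nudge the weights toward it.
            for j in range(n):
                error = probs[j] - (1.0 if j == target else 0.0)
                scores[j] -= lr * error
    return vocab, weights

def predict_next(vocab, weights, word):
    """Return the word with the highest score after `word`."""
    row = weights[vocab.index(word)]
    return vocab[row.index(max(row))]

# Hypothetical training text, just for the demo.
vocab, weights = train_next_word_model("the cat sat on the mat and the cat ran")
print(predict_next(vocab, weights, "the"))  # prints: cat
```

What ends up being stored here is `vocab` (the blueprint) and `weights` (a grid of decimals). In this toy the grid is small enough to inspect by hand, which is not the case for a real model.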

Under current copyright law you would need to prove either that these numbers are specifically representative of the expression of a book itself, or that in tandem with the rest of the AI they give it the ability to replicate the book’s expression closely enough to be a substitute for the book.

The former is probably impossible to argue, as these numbers by their very nature and on their own don’t represent the book. For one, the numbers represent what to do with a given set of inputs. On top of that, any particular section of weights in the AI model is shaped by a wide range of books and texts, not by a single one.

Now the latter argument is interesting. I am not a lawyer, so no clue how one would argue this in court, but there is a point to be made that some of the expression of a book is reflected in the output of the AI. Now this doesn’t look to me like something traditional copyright is set up to measure, but there’s certainly an argument here that this deserves protection.

This is the point of contention. And every time ppl say “this should be easy”, no, it shouldn’t. Law is hard, and the technical details of AI are even harder. I dumbed down a lot of topics here to make them easier to understand for a layperson, but ask experts and they can talk about the wrinkles for days.

Hopefully this helps some ppl understand the vast majority of the issues. Please correct me in the comments, or give me your best arguments, would love to see all the facets of this.