In addition to the possible business threat, forcing OpenAI to identify its use of copyrighted data would expose the company to potential lawsuits. Generative AI systems like ChatGPT and DALL-E are trained using large amounts of data scraped from the web, much of it copyright protected. When companies disclose these data sources it leaves them open to legal challenges. OpenAI rival Stability AI, for example, is currently being sued by stock image maker Getty Images for using its copyrighted data to train its AI image generator.
Aaaaaand there it is. They don’t want to admit how much copyrighted material they’ve been using.
The article says nothing about the models violating copyright. They do say that the laws require them to disclose the use of any copyrighted material, which I believe is pretty black and white under current law.
In any case, I don’t know if I’d call it copyright infringement, but the crux of the matter is that artists do not want their work to be used in this way. There are two main problems with this that I’m aware of (secondhand info from talking to one person involved in the art community):
This is of course assuming you agree with the goal of promoting innovation, both in technology and in arts.