- cross-posted to:
- [email protected]
Millions of articles from The New York Times were used to train chatbots that now compete with it, the lawsuit said.
It seems like it was almost necessary to go through this phase for the sake of developing the tech. Doesn’t a lot of CS research use web crawling algorithms to gather data without verifying that the information is licensed for such use? What about the fediverse? It remains unclear what the copyright and licensing will be should it come into question. There is no EULA to access fedi, just a set of open protocols.
Testing an algorithm for a paper with releasing the weights/data is not the same as selling the output of the algorithm.
It doesn’t matter: scraping data is and always has been legal.
Depends where you live. My academic advisor set limits on scraping due to past experience.