Courts in the US and UK are currently considering whether training artificial intelligence models with copyrighted literature is lawful, and the financial stakes are significant. Numerous authors and publishers have filed lawsuits, pointing out that particular AI models have not only utilized popular texts in their training but also retained much of this material word for word.
The primary issue revolves around whether developers of AI have the right to use protected content without authorization. Previous research has shown that several leading language models were trained using the “Books3” dataset, which contains nearly 200,000 copyrighted works, some of which were illegally obtained. Supporters of AI assert they do not violate copyright laws, arguing that these models generate unique word combinations instead of simply copying the original texts.
Recent tests have explored the extent to which these systems reproduce their training data. While many models do not remember exact phrases, Meta’s model has retained substantial parts of certain texts, proof of direct memorization, which might result in damages of over $1 billion if the courts rule against the company. Mark Lemley from Stanford University noted that this indicates AI models possess a more rote understanding of the text than simple word relationships.
The ainewsarticles.com article you just read is a brief synopsis; the original article can be found here: Read the Full Article…