How Generative AI Is Reshaping the Copyright Landscape

News / By MPTech Insights

The pending class actions against OpenAI have attracted considerable attention in legal and technological circles, prompting discussion of how their outcomes could reshape copyright law worldwide. Renowned authors including George R.R. Martin and Jodi Picoult filed suit against OpenAI in federal court in New York. More recently, Julian Sancton, an author and editor at The Hollywood Reporter, brought another class action against the company in federal court in Manhattan, one of several such suits. The earlier lawsuit, like the subsequent ones, alleges that OpenAI’s products, including ChatGPT, make unauthorised use of copyrighted material. Legal analysts suggest that the outcome could profoundly influence the trajectory and capabilities of generative AI, either setting new boundaries or entrenching a more expansive approach to online content and copyright law.

It is alleged that generative AI applications like ChatGPT generate responses to user prompts by employing an algorithm that picks words based on insights gained from analyzing vast amounts of text spanning billions of sources on the internet. At the core of the lawsuit is the claim of “systematic theft on a mass scale” within the algorithms driving generative AI programs. In the words of the complaint: “These algorithms are at the heart of Defendants’ massive commercial enterprise. And at the heart of these algorithms is systematic theft on a mass scale.” The plaintiffs argue that ingesting copyrighted material as training data for AI models constitutes reproduction of the works without proper authorisation. The lawsuit also contends that OpenAI’s use of copyrighted content contributes to the creation of work that publishers would otherwise pay authors to produce.

It is important to note how generative AI like ChatGPT produces its responses. ChatGPT runs on a large language model developed by OpenAI, such as GPT-3.5, which belongs to the GPT (Generative Pre-trained Transformer) series. The model is trained on a diverse range of internet text and uses the patterns it learned during training to generate coherent and contextually relevant responses. When a user inputs a prompt or question, the model analyzes the context and builds a response by repeatedly predicting which words are most likely to come next. It does not consult specific databases, websites, or real-time information during this process. Its responses rest on the patterns and information present in its training data, up to its training cutoff. In other words, while it can provide information and answer questions, this kind of generative AI has no personal experiences, opinions, or access to real-time data.
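To make the idea of "predicting the next words from learned patterns" concrete, here is a deliberately tiny sketch in Python. It is not how GPT models are built: they use neural networks with billions of parameters, whereas this toy simply counts which word follows which in its training text. The function names (train_bigram_model, generate) and the sample sentence are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words followed it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate(model, prompt_word, length=5, seed=0):
    """Extend the prompt one word at a time, sampling from the learned counts."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [prompt_word]
    for _ in range(length):
        options = model.get(output[-1])
        if not options:
            break  # the last word was never followed by anything in training
        candidates, weights = zip(*options.items())
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Toy "training data" (an assumption for illustration only).
model = train_bigram_model("the cat sat on the mat")
print(generate(model, "cat", length=2))  # prints "cat sat on"
```

Even in this toy, everything the program can say is derived from the text it was trained on, which is why the content of the training data sits at the centre of the dispute.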

However, the exact nature of the data sets used by OpenAI remains a point of contention. Questions linger about the extent to which the company relies on copyrighted material to train its AI models. The lawsuit may shed light on these data sets, exposing the inner workings of generative AI and its use of copyrighted content.

OpenAI has yet to respond to queries regarding the specifics of the data sets used. In a broader context, this lawsuit mirrors a similar legal challenge brought by comedian and actress Sarah Silverman, which raised concerns over the scanning of copyrighted material without permission. Both OpenAI and Meta, the parent company of Facebook, have invoked the “fair use” defence in response to these claims, arguing that their use of copyrighted material is transformative and legally acceptable. The fair use defence allows limited reproduction of text for purposes such as commentary or criticism. The companies argue that it applies equally to the use of material for training their AI products.

Legal experts have opined that the court’s decision in this case could have far-reaching implications for the generative AI industry and the scope of copyright law. If the plaintiffs prevail, AI companies may be compelled to seek permission from authors and publishers, potentially leading to negotiations over licensing agreements. Conversely, if OpenAI prevails, the ruling could pave the way for widespread scanning of the internet to build AI models on diverse data sets.

On another note, recent conversations about the use of copyrighted songs by generative AI have raised a further concern. Generative AI is also trained on vast databases of existing songs, which it uses to generate music from text prompts. It may be recalled that Ed Newton-Rex resigned from Stability AI’s audio team over the company’s stance that training AI models on copyrighted works falls under “fair use.” According to Newton-Rex, some large AI companies avoid dealing with artists and labels because of the time and cost involved. In defence, Emad Mostaque, Stability AI’s co-founder and CEO, argues that fair use supports creative development. Fair use is a legal doctrine permitting limited use of copyrighted work without the owner’s permission for certain purposes, such as research or teaching. Stability’s audio generator, Stable Audio, allows musicians to opt out of its training data pool. Meanwhile, millions of AI-generated songs are produced online daily, and major artists are signing deals with tech giants like Google, YouTube, and Sony to create AI music tools.

While some artists agree to their work being used in these models, there is concern about AI generators that may scrape music without the creator’s consent. The Human Artistry Campaign, representing global music associations, advocates for regulations that safeguard copyright and ensure artists can choose whether to license their work to AI companies for a fee. Further litigation on this issue is anticipated.

Indeed, the decisions in these cases are likely to fundamentally reshape the tech industry’s approach to copyright and could usher in a new era of negotiations and specialized agreements within the information marketplace. These lawsuits and the ongoing conversations therefore stand as pivotal moments that challenge assumptions inherent in the copyright system, comparable to the upheavals that accompanied the rise of the internet and mass media.
