Addressing the Copyright Dilemma in AI’s Training Data

The article discusses copyright infringement in AI training data, an issue often referred to as “AI’s Original Sin”. Tech giants like OpenAI and Google have been accused of violating copyright law by transcribing YouTube videos and using the text as additional training data for their AI models. This practice directly violates YouTube’s terms of service and has led to legal disputes.

The article also highlights the intricate nature of copyright law, which generative AI systems complicate further. These systems raise issues of authorship, similarity, direct and indirect liability, fair use, and licensing. The legality of a generative AI system’s output can depend on how its training datasets were assembled, and the liability of the system’s creator can depend on the prompts its users supply.

The article suggests that, rather than focusing on the fine points of copyright law and arguments over liability for infringement, it is more important to explore the political economy of copyrighted content in the emerging world of AI services. It asks who will profit from generative AI and how the value created by the “generative AI supply chain” should be allocated.

The article concludes by emphasizing the need for new institutions and business models that can distribute the value created by AI in proportion to the role that various parties play in creating it. This would foster a virtuous circle of ongoing value creation, benefiting everyone involved in the ecosystem.

Read more: www.oreilly.com